    mm/filemap.c: clear page error before actual read · faffdfa0
    Xianting Tian authored
    A mount failure occurs in the following scenario: an application forks
    dozens of threads, each of which mounts its own cramfs image in docker,
    and several of the mounts fail with high probability.  The mount fails
    because the page read from the superblock of the loop device is not
    uptodate after wait_on_page_locked(page) returns in cramfs_read:
    
       wait_on_page_locked(page);
       if (!PageUptodate(page)) {
          ...
       }
    
    The page is not uptodate for the following reason: systemd-udevd reads
    the loopX device before the mount.  Because loopX is still in the
    Lo_unbound state at that time, loop_make_request directly invokes the
    io_end handler end_buffer_async_read, which calls SetPageError(page).  As
    a result, end_buffer_async_read cannot set the page uptodate:
    
       if(page_uptodate && !PageError(page)) {
          SetPageUptodate(page);
       }
    
    The mount operation is then performed, and it uses the same page that
    systemd-udevd just accessed.  Because this page is not uptodate, the
    mount starts an actual read via submit_bh and then waits on the page by
    calling wait_on_page_locked(page).  When the I/O for the page completes,
    the io_end handler end_buffer_async_read is called.  Nothing in the mount
    read path has cleared the page error left behind by the systemd-udevd
    read, so the page still has PageError set and end_buffer_async_read
    cannot set it uptodate, which causes the mount to fail.
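
    The sticky error flag described above can be reproduced in a small
    userspace model.  This is only a sketch: the model_* names and flag
    values are hypothetical stand-ins, not the kernel's page flag
    definitions, and the completion logic is reduced to the check quoted
    from end_buffer_async_read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for PG_error and PG_uptodate (not kernel values). */
#define MODEL_PG_ERROR    (1u << 0)
#define MODEL_PG_UPTODATE (1u << 1)

struct model_page {
	unsigned int flags;
};

/*
 * Models the I/O completion handler: a failed read sets the error flag,
 * and the page becomes uptodate only if the I/O succeeded AND no error
 * flag is set.  Nothing here ever clears the error flag.
 */
void model_end_read(struct model_page *page, bool io_ok)
{
	assert(page != NULL);
	if (!io_ok)
		page->flags |= MODEL_PG_ERROR;
	if (io_ok && !(page->flags & MODEL_PG_ERROR))
		page->flags |= MODEL_PG_UPTODATE;
}

bool model_page_uptodate(const struct model_page *page)
{
	return (page->flags & MODEL_PG_UPTODATE) != 0;
}

/*
 * Replays the failing sequence: the first read (systemd-udevd, loop device
 * unbound) fails, then the mount read succeeds at the I/O level.  Returns
 * whether the page ends up uptodate.
 */
bool model_mount_after_failed_read(void)
{
	struct model_page page = { 0 };

	model_end_read(&page, false);	/* read on the unbound loop device */
	model_end_read(&page, true);	/* mount read, I/O itself succeeds */
	return model_page_uptodate(&page);
}
```

    In this model, model_mount_after_failed_read() returns false even
    though the second read completed without an I/O error, because the
    stale error flag from the first read is never cleared.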
    
    But sometimes the mount succeeds even though systemd-udevd read the loopX
    device just before.  The reason is that systemd-udevd issued another read
    of loopX between steps 3.1 and 3.2, as shown below:
    
    1, the default status of the loopX dev is Lo_unbound;
    2, systemd-udevd reads the loopX dev (page is set to PageError);
    3, mount operation
       1) set loopX status to Lo_bound;
       ==>systemd-udevd reads the loopX dev<==
       2) read the loopX dev (page has no error)
       3) mount succeeds
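
    The steps above can be walked through with a toy model of the race.
    It is a sketch under stated assumptions: all model_* names are
    hypothetical, the systemd-udevd read is assumed to clear the error flag
    before reading (as the generic file read path does), and the unpatched
    mount read is assumed not to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_lo_state { MODEL_LO_UNBOUND, MODEL_LO_BOUND };

struct model_loop_page {
	enum model_lo_state state;	/* loop device state */
	bool error;			/* models PageError */
	bool uptodate;			/* models PageUptodate */
};

/* Completion check: uptodate only on successful I/O with no error flag. */
static void model_complete(struct model_loop_page *p, bool io_ok)
{
	assert(p != NULL);
	if (!io_ok)
		p->error = true;
	if (io_ok && !p->error)
		p->uptodate = true;
}

/* systemd-udevd read: clears the error first, fails if device is unbound. */
void model_udev_read(struct model_loop_page *p)
{
	p->error = false;
	model_complete(p, p->state == MODEL_LO_BOUND);
}

/* Unpatched mount read: does not clear the error before reading. */
void model_mount_read(struct model_loop_page *p)
{
	model_complete(p, p->state == MODEL_LO_BOUND);
}

/* Replays steps 1 to 3; the flag selects whether udevd reads in between. */
bool model_mount_succeeds(bool udev_read_between_3_1_and_3_2)
{
	struct model_loop_page p = { MODEL_LO_UNBOUND, false, false }; /* 1 */

	model_udev_read(&p);			/* 2: sets the error flag */
	p.state = MODEL_LO_BOUND;		/* 3.1 */
	if (udev_read_between_3_1_and_3_2)
		model_udev_read(&p);		/* clears the stale error */
	if (!p.uptodate)
		model_mount_read(&p);		/* 3.2 */
	return p.uptodate;			/* 3.3 */
}
```

    In this model the mount succeeds only when the extra systemd-udevd read
    lands between steps 3.1 and 3.2, which matches the intermittent
    behaviour described above.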
    
    Because the loopX status is set to Lo_bound in step 3.1, the second read
    of loopX by systemd-udevd goes through the whole I/O stack.  Part of the
    call trace is shown below:
    
       SYS_read
          vfs_read
              do_sync_read
                  blkdev_aio_read
                     generic_file_aio_read
                         do_generic_file_read:
                            ClearPageError(page);
                            mapping->a_ops->readpage(filp, page);
    
    Here, mapping->a_ops->readpage() is blkdev_readpage.  In the latest
    kernel some function names have changed, and the call trace is as follows:
    
       blkdev_read_iter
          generic_file_read_iter
             generic_file_buffered_read:
                /*
                 * A previous I/O error may have been due to temporary
                 * failures, eg. multipath errors.
                 * PG_error will be set again if readpage fails.
                 */
                ClearPageError(page);
                /* Start the actual read. The read will unlock the page. */
                error = mapping->a_ops->readpage(filp, page);
    
    We can see that ClearPageError(page) is called before the actual read,
    so the read in step 3.2 succeeds.
    
    This patch adds a call to ClearPageError just before the actual read in
    the read path of the cramfs mount.  Without the patch, the call trace
    when performing a cramfs mount is:
    
       do_mount
          cramfs_read
             cramfs_blkdev_read
                read_cache_page
                   do_read_cache_page:
                      filler(data, page);
                      or
                      mapping->a_ops->readpage(data, page);
    
    With the patch, the call trace when performing the mount is:
    
       do_mount
          cramfs_read
             cramfs_blkdev_read
                read_cache_page
                   do_read_cache_page:
                      ClearPageError(page); <== newly added
                      filler(data, page);
                      or
                      mapping->a_ops->readpage(data, page);
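
    The difference between the two traces can be sketched in userspace as
    follows.  This is not the kernel's do_read_cache_page; the model_* names
    and the patched switch are hypothetical and exist only to compare the
    behaviour with and without clearing the error before the read.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_cache_page {
	bool error;	/* models PageError */
	bool uptodate;	/* models PageUptodate */
};

typedef int (*model_filler_t)(struct model_cache_page *page);

/* A filler whose I/O always succeeds; completion still honours the flag. */
int model_filler_ok(struct model_cache_page *page)
{
	if (!page->error)
		page->uptodate = true;
	return 0;
}

/*
 * Simplified read path: when 'patched' is true, the error flag is cleared
 * just before the actual read, as the patch does with ClearPageError().
 * Returns 0 if the page ends up uptodate, a negative errno otherwise.
 */
int model_read_cache_page(struct model_cache_page *page,
			  model_filler_t filler, bool patched)
{
	int err;

	assert(page != NULL && filler != NULL);
	if (page->uptodate)
		return 0;
	if (patched)
		page->error = false;	/* models ClearPageError(page) */
	err = filler(page);
	if (err)
		return err;
	return page->uptodate ? 0 : -EIO;
}
```

    For a page that starts with a stale error flag, the unpatched call
    returns -EIO in this model, while the patched call returns 0 and leaves
    the page uptodate.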
    
    With the patch, the mount operation calls ClearPageError(page) before the
    actual read, so the page has no error unless a new page error occurs when
    the I/O completes.

    Signed-off-by: Xianting Tian <xianting_tian@126.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Jan Kara <jack@suse.cz>
    Cc: <yubin@h3c.com>
    Link: http://lkml.kernel.org/r/1583318844-22971-1-git-send-email-xianting_tian@126.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>