    btrfs: fix the error handling for submit_extent_page() for btrfs_do_readpage() · 10f7f6f8
    Qu Wenruo authored
    
    
    [BUG]
    Test case generic/475 has a very high chance (almost 100%) of hitting
    a fs hang, where a data page is never unlocked and all later
    operations hang.
    
    [CAUSE]
    In btrfs_do_readpage(), if we hit an error from submit_extent_page() we
    will try to do the cleanup for our current io range, and exit.
    
    This works fine for PAGE_SIZE == sectorsize cases, but not for subpage.
    
    For subpage, btrfs_do_readpage() will lock the full page first, which
    can contain several different sectors and extents:
    
     btrfs_do_readpage()
     |- begin_page_read()
     |  |- btrfs_subpage_start_reader();
     |     Now the page will have PAGE_SIZE / sectorsize readers pending,
     |     and the page is locked.
     |
     |- end_page_read() for different branches
     |  This function will reduce subpage readers, and when readers
     |  reach 0, it will unlock the page.
    
    But when submit_extent_page() fails, we only clean up the current
    io range, while the remaining io range is never cleaned up, and the
    page remains locked forever.
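
    The reader accounting above can be illustrated with a small user-space
    model. This is only a sketch, not kernel code: struct model_page and the
    model_*() helpers are hypothetical names, and the model drops the reader
    for a successfully submitted sector immediately, whereas in the kernel
    that happens later when the read completes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; these names are NOT the kernel's. */
struct model_page {
	unsigned int page_size;
	unsigned int sectorsize;
	unsigned int readers;	/* pending sector-sized readers */
	bool locked;
};

/* Models begin_page_read(): lock the page, one reader per sector. */
static void model_begin_page_read(struct model_page *p)
{
	p->locked = true;
	p->readers = p->page_size / p->sectorsize;
}

/* Models end_page_read(): drop readers for len bytes, unlock at zero. */
static void model_end_page_read(struct model_page *p, unsigned int len)
{
	unsigned int nr = len / p->sectorsize;

	assert(nr <= p->readers);
	p->readers -= nr;
	if (p->readers == 0)
		p->locked = false;
}

/*
 * Old error handling: submission fails at sector index fail_sector and
 * only that sector's range is cleaned up before leaving the loop.
 * Returns true if the page is still locked afterwards.
 */
static bool model_old_error_path(unsigned int page_size,
				 unsigned int sectorsize,
				 unsigned int fail_sector)
{
	struct model_page p = {
		.page_size = page_size,
		.sectorsize = sectorsize,
	};
	unsigned int nr_sectors = page_size / sectorsize;

	model_begin_page_read(&p);
	for (unsigned int i = 0; i < nr_sectors; i++) {
		if (i == fail_sector) {
			/* Clean up the current io range only, then exit. */
			model_end_page_read(&p, sectorsize);
			break;
		}
		/* Simplification: treat the read as completed at once. */
		model_end_page_read(&p, sectorsize);
	}
	return p.locked;
}
```

    In this model, a 64K page with 4K sectors that fails at sector 3 still
    has 12 readers pending and stays locked, while a 4K page with 4K
    sectors is unlocked, matching the PAGE_SIZE == sectorsize case.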
    
    [FIX]
    Update the error handling of submit_extent_page() to clean up the
    entire remaining subpage range before exiting the loop.
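
    A minimal user-space sketch of the idea follows. It is not the actual
    patch: struct sketch_page, sketch_end_read() and
    sketch_fixed_error_path() are hypothetical names, and completed reads
    are modelled as finishing immediately. On failure at byte offset cur,
    the cleanup covers everything from cur to the end of the page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a page with per-sector reader accounting. */
struct sketch_page {
	unsigned int sectorsize;
	unsigned int readers;	/* pending sector-sized readers */
	bool locked;
};

/* Drop the readers covering len bytes; unlock once none remain. */
static void sketch_end_read(struct sketch_page *p, unsigned int len)
{
	unsigned int nr = len / p->sectorsize;

	assert(nr <= p->readers);
	p->readers -= nr;
	if (p->readers == 0)
		p->locked = false;
}

/*
 * Fixed error handling: when submission fails at byte offset
 * fail_offset, clean up the whole remaining range [cur, end] before
 * leaving the loop. Returns true if the page is still locked.
 */
static bool sketch_fixed_error_path(unsigned int page_size,
				    unsigned int sectorsize,
				    unsigned int fail_offset)
{
	struct sketch_page p = {
		.sectorsize = sectorsize,
		.readers = page_size / sectorsize,
		.locked = true,
	};
	unsigned int cur = 0;
	const unsigned int end = page_size - 1;	/* inclusive */

	while (cur <= end) {
		if (cur == fail_offset) {
			/* Release every remaining sector, not just one. */
			sketch_end_read(&p, end + 1 - cur);
			break;
		}
		/* Simplification: treat the read as completed at once. */
		sketch_end_read(&p, sectorsize);
		cur += sectorsize;
	}
	return p.locked;
}
```

    With this handling, the modelled page ends up unlocked regardless of
    which sector fails, for example with a 64K page, 4K sectors and a
    failure at offset 12288.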
    
    Please note that submit_extent_page() can now only fail due to the
    sanity check in alloc_new_bio().

    Thus regular IO errors cannot trigger this error path.
    
    CC: stable@vger.kernel.org # 5.15+
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>