• Baokun Li's avatar
    mm/filemap: avoid buffered read/write race to read inconsistent data · e2c27b80
    Baokun Li authored
    The following concurrency may cause the data read to be inconsistent with
    the data on disk:
    
                 cpu1                           cpu2
    ------------------------------|------------------------------
                                   // Buffered write 2048 from 0
                                   ext4_buffered_write_iter
                                    generic_perform_write
                                     copy_page_from_iter_atomic
                                     ext4_da_write_end
                                      ext4_da_do_write_end
                                       block_write_end
                                        __block_commit_write
                                         folio_mark_uptodate
    // Buffered read 4096 from 0          smp_wmb()
    ext4_file_read_iter                   set_bit(PG_uptodate, folio_flags)
     generic_file_read_iter            i_size_write // 2048
      filemap_read                     unlock_page(page)
       filemap_get_pages
        filemap_get_read_batch
        folio_test_uptodate(folio)
         ret = test_bit(PG_uptodate, folio_flags)
         if (ret)
          smp_rmb();
          // Ensure that the data in page 0-2048 is up-to-date.
    
                                   // New buffered write 2048 from 2048
                                   ext4_buffered_write_iter
                                    generic_perform_write
                                     copy_page_from_iter_atomic
                                     ext4_da_write_end
                                      ext4_da_do_write_end
                                       block_write_end
                                        __block_commit_write
                                         folio_mark_uptodate
                                          smp_wmb()
                                          set_bit(PG_uptodate, folio_flags)
                                       i_size_write // 4096
                                       unlock_page(page)
    
       isize = i_size_read(inode) // 4096
       // Read the latest isize 4096, but without smp_rmb(), there may be
       // Load-Load disorder resulting in the data in the 2048-4096 range
       // in the page is not up-to-date.
       copy_page_to_iter
       // copyout 4096
    
    In the concurrency above, we read the updated i_size, but there is no read
    barrier to ensure that the data in the page is the same as the i_size at
    this point, so we may copy the unsynchronized page out.  Hence adding the
    missing read memory barrier to fix this.
    
    This is a Load-Load reordering issue, which only occurs on some weak
    mem-ordering architectures (e.g.  ARM64, ALPHA), but not on strong
    mem-ordering architectures (e.g.  X86).  And theoretically the problem
    doesn't only happen on ext4, filesystems that call filemap_read() but
    don't hold inode lock (e.g.  btrfs, f2fs, ubifs ...) will have this
    problem, while filesystems with inode lock (e.g.  xfs, nfs) won't have
    this problem.
    
    Link: https://lkml.kernel.org/r/20231213062324.739009-1-libaokun1@huawei.comSigned-off-by: default avatarBaokun Li <libaokun1@huawei.com>
    Reviewed-by: default avatarJan Kara <jack@suse.cz>
    Cc: Andreas Dilger <adilger.kernel@dilger.ca>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
    Cc: Theodore Ts'o <tytso@mit.edu>
    Cc: yangerkun <yangerkun@huawei.com>
    Cc: Yu Kuai <yukuai3@huawei.com>
    Cc: Zhang Yi <yi.zhang@huawei.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    e2c27b80
filemap.c 120 KB