    iomap: write iomap validity checks · d7b64041
    Dave Chinner authored
    A multithreaded write data corruption has recently been uncovered in
    the iomap write code. The core of the problem is that a partially
    written folio can be flushed to disk while a new racing write maps
    it and fills the rest of the page:
    
    writeback			new write
    
    allocate blocks
      blocks are unwritten
    submit IO
    .....
    				map blocks
    				iomap indicates UNWRITTEN range
    				loop {
    				  lock folio
    				  copyin data
    .....
    IO completes
      runs unwritten extent conv
        blocks are marked written
    				  <iomap now stale>
    				  get next folio
    				}
    
    Now add memory pressure such that memory reclaim evicts the
    partially written folio that has already been written to disk.
    
    When the new write finally gets to the last partial page of the new
    write, it does not find it in cache, so it instantiates a new page,
    sees the iomap is unwritten, and zeros the part of the page that
    it does not have data for. This overwrites the data on disk that
    was originally written.
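
The overwrite described above can be reproduced in a toy userspace model.
This is only a sketch under stated assumptions: BLK and partial_write() are
hypothetical names, the "disk" is a byte array, and the page cache miss is
modelled by always instantiating a fresh page. It is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLK 8	/* toy block/page size in bytes (assumption) */

/*
 * Hypothetical model of a partial write that misses the page cache.
 * The caller passes in what its cached mapping claims about the block.
 */
static void partial_write(char disk[BLK], bool map_says_unwritten,
			  const char *data, size_t off, size_t len)
{
	char page[BLK];

	assert(off <= BLK && len <= BLK - off);

	if (map_says_unwritten)
		memset(page, 0, BLK);		/* unwritten reads back as zeroes */
	else
		memcpy(page, disk, BLK);	/* written: read before modify */

	memcpy(page + off, data, len);		/* copy in the new data */
	memcpy(disk, page, BLK);		/* later writeback of the whole page */
}
```

With a disk block holding "AAAA" in its first half, calling
partial_write(disk, true, "BBBB", 4, 4) with the stale "unwritten" state
zeroes the "AAAA" bytes, while passing false preserves them.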
    
    The full description of the corruption mechanism can be found here:
    
    https://lore.kernel.org/linux-xfs/20220817093627.GZ3600936@dread.disaster.area/
    
    To solve this problem, we need to check whether the iomap is still
    valid after we lock each folio during the write. We have to do it
    after we lock the page so that we don't end up with state changes
    occurring while we wait for the folio to be locked.
    
    Hence we need a mechanism to be able to check that the cached iomap
    is still valid (similar to what we already do in buffered
    writeback), and we need a way for ->begin_write to back out and
    tell the high level iomap iterator that we need to remap the
    remaining write range.
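
A minimal sketch of that check-after-lock pattern, as a userspace model. All
names here (extent_map, cached_map, write_range() and so on) are hypothetical
and chosen for illustration; the real iomap interfaces differ, and folio
locking is only indicated by comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the filesystem's extent state. */
struct extent_map {
	uint64_t seq;		/* bumped whenever the extents change */
	bool unwritten;
};

/* Hypothetical stand-in for the mapping cached by the writer. */
struct cached_map {
	uint64_t cookie;	/* value of seq when the map was sampled */
	bool unwritten;
};

static struct cached_map map_blocks(const struct extent_map *em)
{
	struct cached_map m = { .cookie = em->seq, .unwritten = em->unwritten };
	return m;
}

static bool map_valid(const struct extent_map *em, const struct cached_map *m)
{
	return m->cookie == em->seq;
}

/* Models unwritten extent conversion at IO completion. */
static void convert_unwritten(struct extent_map *em)
{
	em->unwritten = false;
	em->seq++;
}

/*
 * Write nr_folios folios. If convert_at >= 0, the racing IO completion
 * runs just before that folio is locked. Returns the number of remaps.
 */
static int write_range(struct extent_map *em, int nr_folios, int convert_at)
{
	struct cached_map m = map_blocks(em);
	int remaps = 0;
	int i = 0;

	assert(nr_folios >= 0);
	while (i < nr_folios) {
		if (i == convert_at) {
			convert_unwritten(em);
			convert_at = -1;
		}
		/* the folio would be locked here */
		if (!map_valid(em, &m)) {
			/* back out: unlock the folio, remap the rest */
			m = map_blocks(em);
			remaps++;
			continue;
		}
		/* copy in data using the still-valid mapping */
		i++;
	}
	return remaps;
}
```

The check sits after the (modelled) folio lock, so a conversion that lands
while the writer waits for the lock is still caught before any zeroing
decision is made from the cached map.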
    
    The iomap needs to grow some storage for the validity cookie that
    the filesystem provides to travel with the iomap. XFS, in
    particular, also needs to know some more information about what the
    iomap maps (attribute extents rather than file data extents) for
    the validity cookie to cover all the types of iomaps we might need
    to validate.
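
One way to picture why the cookie needs to know what the iomap maps: a sketch
in which the filesystem keeps separate change counters for data and attribute
extents, and a flag on the mapping selects which counter the cookie is
compared against. The names MAP_F_XATTR, inode_model and map_model are
hypothetical, and how XFS actually encodes its cookie is not described in
this commit message, so treat this purely as an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAP_F_XATTR (1u << 0)	/* hypothetical: map covers attribute extents */

/* Hypothetical per-inode change counters, one per extent type. */
struct inode_model {
	uint32_t data_seq;
	uint32_t attr_seq;
};

/* Hypothetical mapping that carries the cookie along with it. */
struct map_model {
	unsigned int flags;
	uint64_t validity_cookie;
};

/* Pick the counter that matches what the mapping describes. */
static uint64_t sample_cookie(const struct inode_model *ip, unsigned int flags)
{
	return (flags & MAP_F_XATTR) ? ip->attr_seq : ip->data_seq;
}

static struct map_model make_map(const struct inode_model *ip,
				 unsigned int flags)
{
	struct map_model m = {
		.flags = flags,
		.validity_cookie = sample_cookie(ip, flags),
	};
	return m;
}

static bool map_still_valid(const struct inode_model *ip,
			    const struct map_model *m)
{
	return m->validity_cookie == sample_cookie(ip, m->flags);
}
```

In this model a change to the data extents invalidates data mappings but
leaves attribute mappings valid, which is the distinction the extra
information is meant to preserve.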
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
buffered-io.c 52.5 KB