• Eryu Guan's avatar
    iomap: invalidate page caches should be after iomap_dio_complete() in direct write · c771c14b
    Eryu Guan authored
    After XFS switching to iomap based DIO (commit acdda3aa ("xfs:
    use iomap_dio_rw")), I started to notice dio29/dio30 tests failures
    from LTP run on ppc64 hosts, and they can be reproduced on x86_64
    hosts with 512B/1k block size XFS too.
    
    dio29	diotest3 -b 65536 -n 100 -i 1000 -o 1024000
    dio30	diotest6 -b 65536 -n 100 -i 1000 -o 1024000
    
    The failure message is like:
    bufcmp: offset 0: Expected: 0x62, got 0x0
    diotest03    1  TPASS  :  Read with Direct IO, Write without
    diotest03    2  TFAIL  :  diotest3.c:142: comparsion failed; child=98 offset=1425408
    diotest03    3  TFAIL  :  diotest3.c:194: Write Direct-child 98 failed
    
    Direct write wrote 0x62 but buffer read got zero. This is because,
    when doing direct write to a hole or preallocated file, we
    invalidate the page caches before converting the extent from
    unwritten state to normal state, which is done by
    iomap_dio_complete(), thus leave a window for other buffer reader to
    cache the unwritten state extent.
    
    Consider this case, with sub-page blocksize XFS, two processes are
    direct writing to different blocksize-aligned regions (say 512B) of
    the same preallocated file, and reading the region back via buffered
    I/O to compare contents.
    
    process A, region [0,512]		process B, region [512,1024]
    xfs_file_write_iter
     xfs_file_aio_dio_write
      iomap_dio_rw
       iomap_apply
       invalidate_inode_pages2_range
       					xfs_file_write_iter
    				 	xfs_file_aio_dio_write
    					  iomap_dio_rw
    					   iomap_apply
    					   invalidate_inode_pages2_range
    					   iomap_dio_complete
    					xfs_file_read_iter
    					 xfs_file_buffered_aio_read
    					  generic_file_read_iter
    					   do_generic_file_read
    					    <readahead fills pagecache with 0>
       iomap_dio_complete
    xfs_file_read_iter
     <read gets 0 from pagecache>
    
    Process A first invalidates page caches, at this point the
    underlying extent is still in unwritten state (iomap_dio_complete
    not called yet), and process B finishs direct write and populates
    page caches via readahead, which caches zeros in page for region A,
    then process A reads zeros from page cache, instead of the actual
    data.
    
    Fix it by invalidating page caches after converting unwritten extent
    to make sure we read content from disk after extent state changed,
    as what we did before switching to iomap based dio.
    
    Also introduce a new 'start' variable to save the original write
    offset (iomap_dio_complete() updates iocb->ki_pos), and a 'err'
    variable for invalidating caches result, cause we can't reuse 'ret'
    anymore.
    Signed-off-by: default avatarEryu Guan <eguan@redhat.com>
    Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
    Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
    c771c14b
iomap.c 22.8 KB