• Dave Chinner's avatar
    iomap: readpages doesn't zero page tail beyond EOF · 8c110d43
    Dave Chinner authored
    When we read the EOF page of the file via readpages, we need
    to zero the region beyond EOF that we either do not read or
    should not contain data so that mmap does not expose stale data to
    user applications.
    
    However, iomap_adjust_read_range() fails to detect EOF correctly,
    and so fsx on 1k block size filesystems fails very quickly with
    mapreads exposing data beyond EOF. There are two problems here.
    
    Firstly, when calculating the end block of the EOF byte, we have
    to round the size by one to avoid a block aligned EOF from reporting
    a block too large. i.e. a size of 1024 bytes is 1 block, which in
    index terms is block 0. Therefore we have to calculate the end block
    from (isize - 1), not isize.
    
    The second bug is determining if the current page spans EOF, and so
    whether we need split it into two half, one for the IO, and the
    other for zeroing. Unfortunately, the code that checks whether
    we should split the block doesn't actually check if we span EOF, it
    just checks if the read spans the /offset in the page/ that EOF
    sits on. So it splits every read into two if EOF is not page
    aligned, regardless of whether we are reading the EOF block or not.
    
    Hence we need to restrict the "does the read span EOF" check to
    just the page that spans EOF, not every page we read.
    
    This patch results in correct EOF detection through readpages:
    
    xfs_vm_readpages:     dev 259:0 ino 0x43 nr_pages 24
    xfs_iomap_found:      dev 259:0 ino 0x43 size 0x66c00 offset 0x4f000 count 98304 type hole startoff 0x13c startblock 1368 blockcount 0x4
    iomap_readpage_actor: orig pos 323584 pos 323584, length 4096, poff 0 plen 4096, isize 420864
    xfs_iomap_found:      dev 259:0 ino 0x43 size 0x66c00 offset 0x50000 count 94208 type hole startoff 0x140 startblock 1497 blockcount 0x5c
    iomap_readpage_actor: orig pos 327680 pos 327680, length 94208, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 331776 pos 331776, length 90112, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 335872 pos 335872, length 86016, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 339968 pos 339968, length 81920, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 344064 pos 344064, length 77824, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 348160 pos 348160, length 73728, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 352256 pos 352256, length 69632, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 356352 pos 356352, length 65536, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 360448 pos 360448, length 61440, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 364544 pos 364544, length 57344, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 368640 pos 368640, length 53248, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 372736 pos 372736, length 49152, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 376832 pos 376832, length 45056, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 380928 pos 380928, length 40960, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 385024 pos 385024, length 36864, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 389120 pos 389120, length 32768, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 393216 pos 393216, length 28672, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 397312 pos 397312, length 24576, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 401408 pos 401408, length 20480, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 405504 pos 405504, length 16384, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 409600 pos 409600, length 12288, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 413696 pos 413696, length 8192, poff 0 plen 4096, isize 420864
    iomap_readpage_actor: orig pos 417792 pos 417792, length 4096, poff 0 plen 3072, isize 420864
    iomap_readpage_actor: orig pos 420864 pos 420864, length 1024, poff 3072 plen 1024, isize 420864
    
    As you can see, it now does full page reads until the last one which
    is split correctly at the block aligned EOF, reading 3072 bytes and
    zeroing the last 1024 bytes. The original version of the patch got
    this right, but it got another case wrong.
    
    The EOF detection crossing really needs to the the original length
    as plen, while it starts at the end of the block, will be shortened
    as up-to-date blocks are found on the page. This means "orig_pos +
    plen" no longer points to the end of the page, and so will not
    correctly detect EOF crossing. Hence we have to use the length
    passed in to detect this partial page case:
    
    xfs_filemap_fault:    dev 259:1 ino 0x43  write_fault 0
    xfs_vm_readpage:      dev 259:1 ino 0x43 nr_pages 1
    xfs_iomap_found:      dev 259:1 ino 0x43 size 0x2cc00 offset 0x2c000 count 4096 type hole startoff 0xb0 startblock 282 blockcount 0x4
    iomap_readpage_actor: orig pos 180224 pos 181248, length 4096, poff 1024 plen 2048, isize 183296
    xfs_iomap_found:      dev 259:1 ino 0x43 size 0x2cc00 offset 0x2cc00 count 1024 type hole startoff 0xb3 startblock 285 blockcount 0x1
    iomap_readpage_actor: orig pos 183296 pos 183296, length 1024, poff 3072 plen 1024, isize 183296
    
    Heere we see a trace where the first block on the EOF page is up to
    date, hence poff = 1024 bytes. The offset into the page of EOF is
    3072, so the range we want to read is 1024 - 3071, and the range we
    want to zero is 3072 - 4095. You can see this is split correctly
    now.
    
    This fixes the stale data beyond EOF problem that fsx quickly
    uncovers on 1k block size filesystems.
    Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
    Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
    Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
    8c110d43
iomap.c 53.2 KB