    xfs: Use preallocation for inodes with extsz hints · aff3a9ed
    Dave Chinner authored
    xfstest 229 exposes a problem with buffered IO, delayed allocation
    and extent size hints. That is, when we do delayed allocation during
    buffered IO, we reserve space for the extent size hint alignment and
    allocate the physical space to align the extent, but we do not zero
    the regions of the extent that aren't written by the write(2)
    syscall. The result is that we expose stale data in the regions of
    the hint-aligned extent that the write did not cover.
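    
    As a rough illustration of the ranges involved, the sketch below
    rounds a write out to the extent size hint and reports how many
    allocated bytes fall outside the write. It is a userspace model
    under simplifying assumptions, not kernel code: hint_align() and
    struct hint_range are hypothetical names, all values are in bytes,
    and the real allocator works in filesystem blocks and handles more
    cases than simple rounding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not kernel code: all values are in bytes. */
struct hint_range {
	uint64_t alloc_start;	/* write offset rounded down to the hint */
	uint64_t alloc_end;	/* write end rounded up to the hint (exclusive) */
	uint64_t stale_head;	/* allocated bytes before the write */
	uint64_t stale_tail;	/* allocated bytes after the write */
};

/* Round the write [off, off + len) out to extsz-aligned boundaries. */
static struct hint_range hint_align(uint64_t off, uint64_t len, uint64_t extsz)
{
	struct hint_range r;
	uint64_t end = off + len;

	assert(extsz != 0);

	r.alloc_start = off - (off % extsz);
	r.alloc_end = ((end + extsz - 1) / extsz) * extsz;
	r.stale_head = off - r.alloc_start;
	r.stale_tail = r.alloc_end - end;
	return r;
}
```

    For a 4096 byte write at offset 8192 with a 65536 byte hint,
    hint_align(8192, 4096, 65536) gives the range [0, 65536), with 8192
    bytes before the write and 53248 bytes after it that the write
    never touches. Those are the regions where stale data can show up.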
    
    There are two ways to fix this. The first is to detect that we are
    doing unaligned writes, check if there is already a mapping or data
    over the extent size hint range, and, if not, zero the page cache
    before doing the real write. This can be very expensive for large
    extent size hints, especially if the subsequent writes fill the
    entire extent before the data is written to disk.
    
    The second, simpler way is to turn off delayed
    allocation when the extent size hint is set and use preallocation
    instead. This results in unwritten extents being laid down on disk
    and so only the written portions will be converted. This matches the
    behaviour for direct IO, and will also work for the real time
    device. The disadvantage of this approach is that for small extent
    size hints we can get file fragmentation, but in general extent size
    hints are fairly large (e.g. stripe width sized) so this isn't a big
    deal.
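    
    The effect of preallocation can be modelled the same way. The
    sketch below starts from one unwritten extent covering the
    hint-aligned range and splits it around a completed write, so only
    the written bytes change state. Reads from unwritten extents return
    zeroes, which is why no stale data is exposed. This is again a
    hypothetical userspace model: convert_written() and struct extent
    are illustrative names, and the real conversion operates on
    filesystem blocks rather than bytes.

```c
#include <assert.h>
#include <stdint.h>

enum ext_state { EXT_UNWRITTEN, EXT_WRITTEN };

/* Hypothetical model of an extent record, byte granularity. */
struct extent {
	uint64_t start;
	uint64_t end;		/* exclusive */
	enum ext_state state;
};

/*
 * Split one preallocated unwritten extent [alloc_start, alloc_end)
 * around a completed write [off, off + len). Returns the number of
 * extents stored in out[], which is at most 3.
 */
static int convert_written(uint64_t alloc_start, uint64_t alloc_end,
			   uint64_t off, uint64_t len, struct extent out[3])
{
	uint64_t end = off + len;
	int n = 0;

	assert(len > 0 && alloc_start <= off && end <= alloc_end);

	/* Anything before the write stays unwritten. */
	if (off > alloc_start)
		out[n++] = (struct extent){ alloc_start, off, EXT_UNWRITTEN };

	/* Only the range covered by the write is converted. */
	out[n++] = (struct extent){ off, end, EXT_WRITTEN };

	/* Anything after the write stays unwritten. */
	if (end < alloc_end)
		out[n++] = (struct extent){ end, alloc_end, EXT_UNWRITTEN };

	return n;
}
```

    With a preallocated range of [0, 65536) and a 4096 byte write at
    offset 8192, this yields three extents: unwritten [0, 8192),
    written [8192, 12288) and unwritten [12288, 65536).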
    
    Implement the second approach as it is simple and effective.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Mark Tinguely <tinguely@sgi.com>
    Signed-off-by: Ben Myers <bpm@sgi.com>
xfs_aops.c 39.4 KB