    xfs: validate writeback mapping using data fork seq counter · d9252d52
    Brian Foster authored
    The writeback code caches the current extent mapping across multiple
    xfs_do_writepage() calls to avoid repeated lookups for sequential
    pages backed by the same extent. This is known to be slightly racy
    with extent fork changes in certain difficult-to-reproduce
    scenarios. The cached extent is trimmed to EOF to avoid the most
    common trigger of this problem, speculative preallocation
    management, but this is a band-aid that does not address the
    fundamental problem.
    
    Now that we have an xfs_ifork sequence counter mechanism used to
    facilitate COW writeback, we can use the same mechanism to validate
    consistency between the data fork and cached writeback mappings. On
    its face, this is somewhat of a big-hammer approach because any
    change to the data fork invalidates any mapping currently cached by
    a writeback in progress regardless of whether the data fork change
    overlaps with the range under writeback. In practice, however, the
    impact of this approach is minimal in most cases.
    
    First, data fork changes (delayed allocations) caused by sustained
    sequential buffered writes are amortized across speculative
    preallocations. This means that a cached mapping won't be
    invalidated by each buffered write of a common file copy workload,
    but rather only on less frequent allocation events. Second, the
    extent tree is always entirely in-core, so an additional lookup of a
    usable extent mostly costs a shared ilock cycle and an in-memory tree
    lookup. This means that revalidating a cached mapping is relatively cheap
    compared to the I/O itself. Third, spurious invalidations don't
    impact ioend construction. This means that even if the same extent
    is revalidated multiple times across multiple writepage instances,
    we still construct and submit the same size ioend (and bio) if the
    blocks are physically contiguous.
    
    Update struct xfs_writepage_ctx with a new field to hold the
    sequence number of the data fork associated with the currently
    cached mapping. Check the wpc seqno against the data fork when the
    mapping is validated and reestablish the mapping whenever the fork
    has changed since the mapping was cached. This ensures that
    writeback always uses a valid extent mapping and thus prevents lost
    writebacks and stale delalloc block problems.

    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>