    xfs: don't reuse busy extents on extent trim · 06058bc4
    Brian Foster authored
    Freed extents are marked busy from the point the freeing transaction
    commits until the associated CIL context is checkpointed to the log.
    This prevents reuse and overwrite of recently freed blocks before
    the changes are committed to disk, which could otherwise lead to corruption
    after a crash. The exception to this rule is that metadata
    allocation is allowed to reuse busy extents because metadata changes
    are also logged.
    
    As of commit 97d3ac75 ("xfs: exact busy extent tracking"), XFS
    has allowed modification or complete invalidation of outstanding
    busy extents for metadata allocations. This implementation assumes
    that use of the associated extent is imminent, which is not always
    the case. For example, the trimmed extent might not satisfy the
    minimum length of the allocation request, or the allocation
    algorithm might be involved in a search for the optimal result based
    on locality.
    
    generic/019 reproduces a corruption caused by this scenario. First,
    a metadata block (usually a bmbt or symlink block) is freed from an
    inode. A subsequent bmbt split on an unrelated inode attempts a near
    mode allocation request that invalidates the busy block during the
    search, but does not ultimately allocate it. Due to the busy state
    invalidation, the block is no longer considered busy to subsequent
    allocations. A direct I/O write request immediately allocates the
    block and writes to it. Finally, the filesystem crashes while in a
    state where the initial metadata block free had not committed to the
    on-disk log. After recovery, the original metadata block is in its
    original location as expected, but has been corrupted by the
    aforementioned dio.
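
The sequence above can be modelled with a toy sketch. This is not the kernel code: `struct block_state` and every function name below are hypothetical, and the model collapses the whole allocator into a single block with two flags. It only shows why clearing busy state during a search that does not allocate the block opens the window for the data write.

```c
#include <stdbool.h>

/* Hypothetical one-block model; not an XFS data structure. */
struct block_state {
	bool free_in_memory;	/* freed by a transaction committed in memory */
	bool busy;		/* the free has not reached the on-disk log yet */
};

/*
 * Behaviour described above: a metadata allocation search invalidates
 * the busy state merely by considering the block, even if it goes on
 * to allocate a different block.
 */
static void trim_with_reuse(struct block_state *b)
{
	b->busy = false;
}

/* Behaviour after this change: the trim path never alters busy state. */
static void trim_without_reuse(struct block_state *b)
{
	(void)b;
}

/* A data (e.g. direct I/O) allocation may only take non-busy free blocks. */
static bool data_alloc_allowed(const struct block_state *b)
{
	return b->free_in_memory && !b->busy;
}
```

In this model, once `trim_with_reuse()` has run, `data_alloc_allowed()` returns true even though the free is not yet on disk, which corresponds to the dio overwrite in the reproducer.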
    
    This demonstrates that it is fundamentally unsafe to modify busy
    extent state for extents that are not guaranteed to be allocated.
    This applies to pretty much all of the code paths that currently
    trim busy extents for one reason or another. Therefore, to address
    this problem, drop the reuse mechanism from the busy extent trim
    path. This code already knows how to return partial non-busy ranges
    of the targeted free extent and higher level code tracks the busy
    state of the allocation attempt. If a block allocation fails where
    one or more candidate extents is busy, we force the log and retry
    the allocation.
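
The resulting flow can be sketched as a minimal, self-contained model. All names here (`struct busy_list`, `trim_extent`, `force_log`, `alloc_blocks`) are hypothetical stand-ins rather than the functions in xfs_extent_busy.c, and the sketch assumes a log force clears every busy extent. The trim only reads the busy list and reports whether it saw busy blocks; the caller decides whether to force the log and retry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model; not the kernel's data structures. */
struct busy_extent {
	uint64_t bno;
	uint64_t len;
};

struct busy_list {
	struct busy_extent ext[16];
	size_t count;
	unsigned int gen;	/* bumped each time a log force clears the list */
};

static bool is_busy(const struct busy_list *bl, uint64_t blk)
{
	for (size_t i = 0; i < bl->count; i++)
		if (blk >= bl->ext[i].bno &&
		    blk < bl->ext[i].bno + bl->ext[i].len)
			return true;
	return false;
}

/*
 * Trim [*bno, *bno + *len) to its longest non-busy run (first one on a
 * tie). The busy list is never modified. Returns true if any block in
 * the original range was busy.
 */
static bool trim_extent(const struct busy_list *bl, uint64_t *bno,
			uint64_t *len)
{
	uint64_t best_bno = *bno, best_len = 0;
	uint64_t run_bno = *bno, run_len = 0;
	bool saw_busy = false;

	assert(bl && bno && len);
	for (uint64_t b = *bno; b < *bno + *len; b++) {
		if (is_busy(bl, b)) {
			saw_busy = true;
			run_len = 0;
			run_bno = b + 1;
			continue;
		}
		if (++run_len > best_len) {
			best_len = run_len;
			best_bno = run_bno;
		}
	}
	*bno = best_bno;
	*len = best_len;
	return saw_busy;
}

/* Stand-in for a log force: assume it checkpoints all busy extents. */
static void force_log(struct busy_list *bl)
{
	bl->count = 0;
	bl->gen++;
}

/*
 * Allocate minlen blocks from one free extent. If the trimmed range is
 * too short and busy blocks were seen, force the log once and retry.
 * Returns the start block, or UINT64_MAX on failure.
 */
static uint64_t alloc_blocks(struct busy_list *bl, uint64_t free_bno,
			     uint64_t free_len, uint64_t minlen)
{
	for (int attempt = 0; attempt < 2; attempt++) {
		uint64_t bno = free_bno, len = free_len;
		bool busy = trim_extent(bl, &bno, &len);

		if (len >= minlen)
			return bno;
		if (!busy)
			break;	/* a log force would not free anything more */
		force_log(bl);
	}
	return UINT64_MAX;
}
```

With blocks 10-13 busy inside a free extent covering blocks 8-15, a two-block request is satisfied from the non-busy head without touching the busy list, while a four-block request triggers one log force before succeeding.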
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs_extent_busy.c 15.7 KB