    xfs: don't account extra agfl blocks as available · 1ca89fbc
    Brian Foster authored
    The block allocation AG selection code has parameters that allow a
    caller to perform multiple allocations from a single AG and
    transaction (under certain conditions). The parameters specify the
    total block allocation count required by the transaction and the AG
    selection code selects and locks an AG that will be able to satisfy
    the overall requirement. If the available block accounting
    calculation turns out to be inaccurate and a subsequent allocation
    call fails with -ENOSPC, the resulting transaction cancel leads to
    filesystem shutdown because the transaction is dirty.
    
    This exact problem can be reproduced by running a highly parallel
    space consumer and fsstress workload long enough against a large
    filesystem under -ENOSPC conditions. A bmbt block allocation
    request made for inode extent-to-bmap format conversion after an
    extent allocation is expected to be satisfied by the same AG and the
    same transaction as the extent allocation. The bmbt block allocation
    fails, however, because the block availability of the AG has changed
    since the AG was selected (outside of the blocks used for the extent
    itself).
    
    The inconsistent block availability calculation is caused by the
    deferred block freeing behavior of the AGFL. This behavior immediately
    removes extra blocks from the AGFL to free up AGFL slots, but rather
    than immediately freeing such blocks as was done in the past, the
    block free is deferred such that said blocks are not available for
    allocation until the current transaction commits. The AG selection
    logic currently considers all AGFL blocks as available and executes
    shortly before any extra AGFL blocks are freed. This means the block
    availability of the current AG can change before the first
    allocation even occurs, but in practice a failure is more likely to
    manifest via a subsequent allocation because extent allocation
    usually has a contiguity requirement larger than a single block that
    can't be satisfied from the AGFL.
    
    In general, XFS prefers operational robustness to absolute
    allocation efficiency. In other words, we prefer to return -ENOSPC
    slightly earlier at the expense of not being able to allocate every
    last block in an AG to avoid this kind of problem. As such, update
    the AG block availability calculation to consider extra AGFL blocks
    as unavailable since they are immediately removed following the
    calculation and will not become available until the current
    transaction commits.
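    
    As a rough illustration of the accounting change described above, the
    following standalone sketch compares an estimate that counts every
    AGFL block as available with one that counts AGFL blocks only up to
    the target freelist size. It is not the kernel code: the struct, the
    function names, and the min_free, reservation and minleft parameters
    are hypothetical simplifications chosen for this example.

```c
#include <stdint.h>

/* Hypothetical, simplified per-AG counters; not the kernel's structures. */
struct ag_counters {
    uint32_t freeblks;  /* free blocks tracked outside the AGFL */
    uint32_t flcount;   /* blocks currently sitting on the AGFL */
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/*
 * Estimate that treats every AGFL block as available. Blocks beyond
 * min_free (the assumed target AGFL size) are counted even though they
 * are about to be removed from the AGFL and freed in a deferred manner.
 */
int64_t avail_all_agfl(const struct ag_counters *ag, uint32_t min_free,
                       uint32_t reservation, uint32_t minleft)
{
    return (int64_t)ag->freeblks + ag->flcount
           - reservation - min_free - minleft;
}

/*
 * Estimate that treats extra AGFL blocks as unavailable: only AGFL
 * blocks up to min_free are counted, so the result cannot shrink when
 * the extras are removed shortly after the calculation.
 */
int64_t avail_capped_agfl(const struct ag_counters *ag, uint32_t min_free,
                          uint32_t reservation, uint32_t minleft)
{
    uint32_t agfl = min_u32(ag->flcount, min_free);

    return (int64_t)ag->freeblks + agfl
           - reservation - min_free - minleft;
}
```

    With 100 free blocks, 10 AGFL blocks and a target AGFL size of 6, the
    first estimate reports 104 blocks while the second reports 100. The
    difference of 4 is exactly the set of extra AGFL blocks that would
    not be allocatable until the transaction commits. When the AGFL holds
    no extra blocks, the two estimates agree.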
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
xfs_alloc.c 87.2 KB