Commit 1ca89fbc authored by Brian Foster, committed by Darrick J. Wong

xfs: don't account extra agfl blocks as available

The block allocation AG selection code has parameters that allow a
caller to perform multiple allocations from a single AG and
transaction (under certain conditions). The parameters specify the
total block allocation count required by the transaction and the AG
selection code selects and locks an AG that will be able to satisfy
the overall requirement. If the available block accounting
calculation turns out to be inaccurate and a subsequent allocation
call fails with -ENOSPC, the resulting transaction cancel leads to
filesystem shutdown because the transaction is dirty.

This exact problem can be reproduced by running a highly parallel
space consumer and fsstress workload against a large filesystem for
long enough to reach -ENOSPC conditions. A bmbt block allocation
request made for inode extent to bmap format conversion after an
extent allocation is expected to be satisfied by the same AG and the
same transaction as the extent allocation. The bmbt block allocation
fails, however, because the block availability of the AG has changed
since the AG was selected (outside of the blocks used for the extent
itself).

The inconsistent block availability calculation is caused by the
deferred block freeing behavior of the AGFL. This immediately
removes extra blocks from the AGFL to free up AGFL slots, but rather
than immediately freeing such blocks as was done in the past, the
block free is deferred such that said blocks are not available for
allocation until the current transaction commits. The AG selection
logic currently considers all AGFL blocks as available and executes
shortly before any extra AGFL blocks are freed. This means the block
availability of the current AG can change before the first
allocation even occurs, but in practice a failure is more likely to
manifest via a subsequent allocation because extent allocation
usually has a contiguity requirement larger than a single block that
can't be satisfied from the AGFL.

In general, XFS prefers operational robustness to absolute
allocation efficiency. In other words, we prefer to return -ENOSPC
slightly earlier at the expense of not being able to allocate every
last block in an AG to avoid this kind of problem. As such, update
the AG block availability calculation to consider extra AGFL blocks
as unavailable since they are immediately removed following the
calculation and will not become available until the current
transaction commits.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
parent 22fedd80
@@ -2042,6 +2042,7 @@ xfs_alloc_space_available(
 	xfs_extlen_t		alloc_len, longest;
 	xfs_extlen_t		reservation; /* blocks that are still reserved */
 	int			available;
+	xfs_extlen_t		agflcount;
 
 	if (flags & XFS_ALLOC_FLAG_FREEING)
 		return true;
@@ -2054,8 +2055,13 @@ xfs_alloc_space_available(
 	if (longest < alloc_len)
 		return false;
 
-	/* do we have enough free space remaining for the allocation? */
-	available = (int)(pag->pagf_freeblks + pag->pagf_flcount -
+	/*
+	 * Do we have enough free space remaining for the allocation? Don't
+	 * account extra agfl blocks because we are about to defer free them,
+	 * making them unavailable until the current transaction commits.
+	 */
+	agflcount = min_t(xfs_extlen_t, pag->pagf_flcount, min_free);
+	available = (int)(pag->pagf_freeblks + agflcount -
 			  reservation - min_free - args->minleft);
 	if (available < (int)max(args->total, alloc_len))
 		return false;
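
The effect of the change on the availability calculation can be illustrated with a small standalone sketch. This is not kernel code: struct ag_counters, min_u(), available_old(), available_new() and ag_can_satisfy() are hypothetical names made up for illustration, and the numbers in the note below are invented. Only the two arithmetic expressions are modelled on the hunk above.

```c
/*
 * Hypothetical, simplified stand-in for the per-AG counters used by
 * xfs_alloc_space_available(); this is not the kernel structure.
 */
struct ag_counters {
	unsigned int freeblks;	/* free blocks tracked outside the AGFL */
	unsigned int flcount;	/* blocks currently on the AGFL */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Old calculation: every AGFL block is counted as available. */
static int available_old(const struct ag_counters *ag,
			 unsigned int reservation,
			 unsigned int min_free,
			 unsigned int minleft)
{
	return (int)(ag->freeblks + ag->flcount -
		     reservation - min_free - minleft);
}

/*
 * New calculation: AGFL blocks beyond min_free are about to be removed
 * and defer-freed, so they are not counted until the transaction commits.
 */
static int available_new(const struct ag_counters *ag,
			 unsigned int reservation,
			 unsigned int min_free,
			 unsigned int minleft)
{
	unsigned int agflcount = min_u(ag->flcount, min_free);

	return (int)(ag->freeblks + agflcount -
		     reservation - min_free - minleft);
}

/* Returns 1 if the AG would pass the availability check for 'total' blocks. */
static int ag_can_satisfy(int available, unsigned int total)
{
	return available >= (int)total;
}
```

As an assumed example, take an AG with freeblks = 10 and flcount = 12, with min_free = 8 and no reservation or minleft. The old calculation reports 10 + 12 - 8 = 14 available blocks and accepts a transaction whose total is 12, even though the 4 extra AGFL blocks are unusable until commit. The new calculation reports 10 + 8 - 8 = 10 and rejects that AG up front.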