• Brian Foster's avatar
    xfs: account only rmapbt-used blocks against rmapbt perag res · 0ab32086
    Brian Foster authored
    The rmapbt perag metadata reservation reserves blocks for the
    reverse mapping btree (rmapbt). Since the rmapbt uses blocks from
    the agfl and perag accounting is updated as blocks are allocated
    from the allocation btrees, the reservation actually accounts blocks
    as they are allocated to (or freed from) the agfl rather than the
    rmapbt itself.
    
    While this works for blocks that are eventually used for the rmapbt,
    not all agfl blocks are destined for the rmapbt. Blocks that are
    allocated to the agfl (and thus "reserved" for the rmapbt) but then
    used by another structure leads to a growing inconsistency over time
    between the runtime tracking of rmapbt usage vs. actual rmapbt
    usage. Since the runtime tracking thinks all agfl blocks are rmapbt
    blocks, it essentially believes that less future reservation is
    required to satisfy the rmapbt than what is actually necessary.
    
    The inconsistency is rectified across mount cycles because the perag
    reservation is initialized based on the actual rmapbt usage at mount
    time. The problem, however, is that the excessive drain of the
    reservation at runtime opens a window to allocate blocks for other
    purposes that might be required for the rmapbt on a subsequent
    mount. This problem can be demonstrated by a simple test that runs
    an allocation workload to consume agfl blocks over time and then
    observe the difference in the agfl reservation requirement across an
    unmount/mount cycle:
    
      mount ...: xfs_ag_resv_init: ... resv 3193 ask 3194 len 3194
      ...
      ...      : xfs_ag_resv_alloc_extent: ... resv 2957 ask 3194 len 1
      umount...: xfs_ag_resv_free: ... resv 2956 ask 3194 len 0
      mount ...: xfs_ag_resv_init: ... resv 3052 ask 3194 len 3194
    
    As the above tracepoints show, the reservation requirement reduces
    from 3194 blocks to 2956 blocks as the workload runs.  Without any
    other changes in the filesystem, the same reservation requirement
    jumps from 2956 to 3052 blocks over a umount/mount cycle.
    
    To address this divergence, update the RMAPBT reservation to account
    blocks used for the rmapbt only rather than all blocks filled into
    the agfl. This patch makes several high-level changes toward that
    end:
    
    1.) Reintroduce an AGFL reservation type to serve as an accounting
        no-op for blocks allocated to (or freed from) the AGFL.
    2.) Invoke RMAPBT usage accounting from the actual rmapbt block
        allocation path rather than the AGFL allocation path.
    
    The first change is required because agfl blocks are considered free
    blocks throughout their lifetime. The perag reservation subsystem is
    invoked unconditionally by the allocation subsystem, so we need a
    way to tell the perag subsystem (via the allocation subsystem) to
    not make any accounting changes for blocks filled into the AGFL.
    
    The second change causes the in-core RMAPBT reservation usage
    accounting to remain consistent with the on-disk state at all times
    and eliminates the risk of leaving the rmapbt reservation
    underfilled.
    Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
    Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
    0ab32086
xfs_rmap_btree.c 15.4 KB