• Darrick J. Wong's avatar
    xfs: fix off-by-one error in xfs_btree_space_to_height · c0f399ff
    Darrick J. Wong authored
    Lately I've been stress-testing extreme-sized rmap btrees by using the
    (new) xfs_db bmap_inflate command to clone bmbt mappings billions of
    times and then using xfs_repair to build new rmap and refcount btrees.
    This of course is /much/ faster than actually FICLONEing a file billions
    of times.
    
    Unfortunately, xfs_repair fails in xfs_btree_bload_compute_geometry with
    EOVERFLOW, which indicates that xfs_mount.m_rmap_maxlevels is not
    sufficiently large for the test scenario.  For a 1TB filesystem (~67
    million AG blocks, 4 AGs) the btheight command reports:
    
    $ xfs_db -c 'btheight -n 4400801200 -w min rmapbt' /dev/sda
    rmapbt: worst case per 4096-byte block: 84 records (leaf) / 45 keyptrs (node)
    level 0: 4400801200 records, 52390491 blocks
    level 1: 52390491 records, 1164234 blocks
    level 2: 1164234 records, 25872 blocks
    level 3: 25872 records, 575 blocks
    level 4: 575 records, 13 blocks
    level 5: 13 records, 1 block
    6 levels, 53581186 blocks total
    
    The AG is sufficiently large to build this rmap btree.  Unfortunately,
    m_rmap_maxlevels is 5.  Augmenting the loop in the space->height
    function to report height, node blocks, and blocks remaining produces
    this:
    
    ht 1 node_blocks 45 blockleft 67108863
    ht 2 node_blocks 2025 blockleft 67108818
    ht 3 node_blocks 91125 blockleft 67106793
    ht 4 node_blocks 4100625 blockleft 67015668
    final height: 5
    
    The goal of this function is to compute the maximum height btree that
    can be stored in the given number of ondisk fsblocks.  Starting with the
    top level of the tree, each iteration through the loop adds the fanout
    factor of the next level down until we run out of blocks.  IOWs, maximum
    height is achieved by using the smallest fanout factor that can apply
    to that level.
    
    However, the loop setup is not correct.  Top level btree blocks are
    allowed to contain fewer than minrecs items, so the computation is
    incorrect because the first time through the loop it should be using a
    fanout factor of 2.  With this corrected, the above becomes:
    
    ht 1 node_blocks 2 blockleft 67108863
    ht 2 node_blocks 90 blockleft 67108861
    ht 3 node_blocks 4050 blockleft 67108771
    ht 4 node_blocks 182250 blockleft 67104721
    ht 5 node_blocks 8201250 blockleft 66922471
    final height: 6
    
    Fixes: 9ec69120 ("xfs: compute the maximum height of the rmap btree when reflink enabled")
    Signed-off-by: default avatarDarrick J. Wong <djwong@kernel.org>
    Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
    c0f399ff
xfs_btree.c 131 KB