    btrfs: use an xarray to track open inodes in a root (commit 310b2f5d)

    Currently we use a red black tree (rb-tree) to track the currently open
    inodes of a root (in struct btrfs_root::inode_tree). This, however, is
    not very efficient when the number of inodes is large, since rb-trees
    are binary trees. For example, with 100K open inodes the tree has a
    depth of 17. Besides that, inserting into the tree requires navigating
    through it, pulling in useless cache lines along the way, since the red
    black tree nodes are embedded within the btrfs inode. On the other
    hand, being embedded means the nodes require no extra memory
    allocations.
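    For illustration only, here is a minimal sketch of the embedded rb-tree
    insertion pattern described above (hypothetical type and function
    names; keyed by inode number, not the actual btrfs code):

        #include <linux/rbtree.h>
        #include <linux/types.h>

        struct example_inode {
            u64 ino;                /* key: the inode number */
            struct rb_node rb_node; /* embedded node, no extra allocation */
        };

        /* Walk down from the root; each visited node pulls in a cache line. */
        static void example_tree_add(struct rb_root *root,
                                     struct example_inode *inode)
        {
            struct rb_node **p = &root->rb_node;
            struct rb_node *parent = NULL;

            while (*p) {
                struct example_inode *entry;

                parent = *p;
                entry = rb_entry(parent, struct example_inode, rb_node);
                if (inode->ino < entry->ino)
                    p = &parent->rb_left;
                else
                    p = &parent->rb_right;
            }
            rb_link_node(&inode->rb_node, parent, p);
            rb_insert_color(&inode->rb_node, root);
        }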
    
    We can improve this by using an xarray instead, which is efficient
    when indices are densely clustered (as inode numbers are), is more
    cache friendly and behaves like a resizable array, with much better
    search and insertion complexity than a red black tree. The only small
    disadvantage is that insertion will sometimes require allocating
    memory for the xarray - which may fail, though not that often since
    the xarray uses a kmem_cache. On the other hand, removing the embedded
    red black tree node reduces the btrfs inode structure size by 24 bytes
    (from 1080 down to 1056 bytes), and after the next patches we will be
    able to reduce the structure to 1024 bytes, meaning we can store 4
    inodes per 4K page instead of 3.
    
    This patch does a straightforward conversion to an xarray, and results
    in a transaction abort if we can't allocate memory for the xarray when
    creating an inode - the next patch changes things so that we don't
    need to abort.
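    A minimal sketch of the xarray-based approach, with hypothetical names
    (the patch itself adds btrfs_add_inode_to_root(); the structure, error
    handling and GFP flags below are illustrative assumptions, not the
    actual btrfs code):

        #include <linux/types.h>
        #include <linux/xarray.h>

        struct example_root {
            struct xarray inodes; /* indexed by inode number */
        };

        /* Called once when the root is set up. */
        static void example_root_init(struct example_root *root)
        {
            xa_init(&root->inodes);
        }

        static int example_add_inode(struct example_root *root, u64 ino,
                                     void *inode)
        {
            /*
             * xa_insert() may have to allocate internal xarray nodes
             * (from a kmem_cache), so it can fail with -ENOMEM; -EBUSY
             * means the index is already in use.
             */
            return xa_insert(&root->inodes, ino, inode, GFP_NOFS);
        }

        static void example_del_inode(struct example_root *root, u64 ino)
        {
            xa_erase(&root->inodes, ino);
        }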
    
    Running the following fs_mark test showed some improvements:
    
        $ cat test.sh
        #!/bin/bash
    
        DEV=/dev/nullb0
        MNT=/mnt/nullb0
        MOUNT_OPTIONS="-o ssd"
        FILES=100000
        THREADS=$(nproc --all)
    
        echo "performance" | \
            tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    
        mkfs.btrfs -f $DEV
        mount $MOUNT_OPTIONS $DEV $MNT
    
        OPTS="-S 0 -L 5 -n $FILES -s 0 -t $THREADS -k"
        for ((i = 1; i <= $THREADS; i++)); do
            OPTS="$OPTS -d $MNT/d$i"
        done
    
        fs_mark $OPTS
    
        umount $MNT
    
    Before this patch:
    
        FSUse%        Count         Size    Files/sec     App Overhead
            10      1200000            0      92081.6         12505547
            16      2400000            0     138222.6         13067072
            23      3600000            0     148833.1         13290336
            43      4800000            0      97864.7         13931248
            53      6000000            0      85597.3         14384313
    
    After this patch:
    
        FSUse%        Count         Size    Files/sec     App Overhead
            10      1200000            0      93225.1         12571078
            16      2400000            0     146720.3         12805007
            23      3600000            0     160626.4         13073835
            46      4800000            0     116286.2         13802927
            53      6000000            0      90087.9         14754892
    
    The test was run with a release kernel config (Debian's default config).
    
    Also, capturing the insertion times into the rb-tree and into the
    xarray - that is, measuring the duration of the old inode_tree_add()
    function and of the new btrfs_add_inode_to_root() function - gave the
    following results (in nanoseconds):
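    The measurement tooling is not part of the commit; purely as an
    illustration, per-call durations like these could be captured with a
    ktime-based wrapper along these lines (hypothetical code, reusing the
    sketch above):

        #include <linux/ktime.h>

        /* Hypothetical wrapper: time one call and log the delta. */
        static int timed_add_inode(struct example_root *root, u64 ino,
                                   void *inode)
        {
            u64 start = ktime_get_ns();
            int ret = example_add_inode(root, ino, inode);

            /* trace_printk() output is read from the ftrace buffer. */
            trace_printk("add_inode took %llu ns\n",
                         ktime_get_ns() - start);
            return ret;
        }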
    
    Before this patch, inode_tree_add() execution times:
    
       Count: 5000000
       Range:  0.000 - 5536887.000; Mean: 775.674; Median: 729.000; Stddev: 4820.961
       Percentiles:  90th: 1015.000; 95th: 1139.000; 99th: 1397.000
             0.000 -       7.816:      40 |
             7.816 -      37.858:     209 |
            37.858 -     170.278:    6059 |
           170.278 -     753.961: 2754890 #####################################################
           753.961 -    3326.728: 2232312 ###########################################
          3326.728 -   14667.018:    4366 |
         14667.018 -   64652.943:     852 |
         64652.943 -  284981.761:     550 |
        284981.761 - 1256150.914:     221 |
       1256150.914 - 5536887.000:       7 |
    
    After this patch, btrfs_add_inode_to_root() execution times:
    
       Count: 5000000
       Range:  0.000 - 2900652.000; Mean: 272.148; Median: 241.000; Stddev: 2873.369
       Percentiles:  90th: 342.000; 95th: 432.000; 99th: 572.000
            0.000 -       7.264:     104 |
            7.264 -      33.145:     352 |
           33.145 -     140.081:  109606 #
          140.081 -     581.930: 4840090 #####################################################
          581.930 -    2407.590:   43532 |
         2407.590 -    9950.979:    2245 |
         9950.979 -   41119.278:     514 |
        41119.278 -  169902.616:     155 |
       169902.616 -  702018.539:      47 |
       702018.539 - 2900652.000:       9 |
    
    The average, percentiles, standard deviation, etc., are all much
    better.

    Reviewed-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>