• Filipe Manana's avatar
    btrfs: speedup checking for extent sharedness during fiemap · 12a824dc
    Filipe Manana authored
    One of the most expensive tasks performed during fiemap is to check if
    an extent is shared. This task has two major steps:
    
    1) Check if the data extent is shared. This implies checking the extent
       item in the extent tree, checking delayed references, etc. If we
       find the data extent is directly shared, we terminate immediately;
    
    2) If the data extent is not directly shared (its extent item has a
       refcount of 1), then it may be shared if we have snapshots that share
       subtrees of the inode's subvolume b+tree. So we check if the leaf
       containing the file extent item is shared, then its parent node, then
       the parent node of the parent node, etc, until we reach the root node
       or we find one of them is shared - in which case we stop immediately.
    
    During fiemap we process the extents of a file from left to right, from
    file offset 0 to EOF. This means that we iterate b+tree leaves from left
    to right, and has the implication that we keep repeating that second step
    above several times for the same b+tree path of the inode's subvolume
    b+tree.
    
    For example, if we have two file extent items in leaf X, and the path to
    leaf X is A -> B -> C -> X, then when we try to determine if the data
    extent referenced by the first extent item is shared, we check if the data
    extent is shared - if it's not, then we check if leaf X is shared, if not,
    then we check if node C is shared, if not, then check if node B is shared,
    if not than check if node A is shared. When we move to the next file
    extent item, after determining the data extent is not shared, we repeat
    the checks for X, C, B and A - doing all the expensive searches in the
    extent tree, delayed refs, etc. If we have thousands of tile extents, then
    we keep repeating the sharedness checks for the same paths over and over.
    
    On a file that has no shared extents or only a small portion, it's easy
    to see that this scales terribly with the number of extents in the file
    and the sizes of the extent and subvolume b+trees.
    
    This change eliminates the repeated sharedness check on extent buffers
    by caching the results of the last path used. The results can be used as
    long as no snapshots were created since they were cached (for not shared
    extent buffers) or no roots were dropped since they were cached (for
    shared extent buffers). This greatly reduces the time spent by fiemap for
    files with thousands of extents and/or large extent and subvolume b+trees.
    
    Example performance test:
    
        $ cat fiemap-perf-test.sh
        #!/bin/bash
    
        DEV=/dev/sdi
        MNT=/mnt/sdi
    
        mkfs.btrfs -f $DEV
        mount -o compress=lzo $DEV $MNT
    
        # 40G gives 327680 128K file extents (due to compression).
        xfs_io -f -c "pwrite -S 0xab -b 1M 0 40G" $MNT/foobar
    
        umount $MNT
        mount -o compress=lzo $DEV $MNT
    
        start=$(date +%s%N)
        filefrag $MNT/foobar
        end=$(date +%s%N)
        dur=$(( (end - start) / 1000000 ))
        echo "fiemap took $dur milliseconds (metadata not cached)"
    
        start=$(date +%s%N)
        filefrag $MNT/foobar
        end=$(date +%s%N)
        dur=$(( (end - start) / 1000000 ))
        echo "fiemap took $dur milliseconds (metadata cached)"
    
        umount $MNT
    
    Before this patch:
    
        $ ./fiemap-perf-test.sh
        (...)
        /mnt/sdi/foobar: 327680 extents found
        fiemap took 3597 milliseconds (metadata not cached)
        /mnt/sdi/foobar: 327680 extents found
        fiemap took 2107 milliseconds (metadata cached)
    
    After this patch:
    
        $ ./fiemap-perf-test.sh
        (...)
        /mnt/sdi/foobar: 327680 extents found
        fiemap took 1646 milliseconds (metadata not cached)
        /mnt/sdi/foobar: 327680 extents found
        fiemap took 698 milliseconds (metadata cached)
    
    That's about 2.2x faster when no metadata is cached, and about 3x faster
    when all metadata is cached. On a real filesystem with many other files,
    data, directories, etc, the b+trees will be 2 or 3 levels higher,
    therefore this optimization will have a higher impact.
    
    Several reports of a slow fiemap show up often, the two Link tags below
    refer to two recent reports of such slowness. This patch, together with
    the next ones in the series, is meant to address that.
    
    Link: https://lore.kernel.org/linux-btrfs/21dd32c6-f1f9-f44a-466a-e18fdc6788a7@virtuozzo.com/
    Link: https://lore.kernel.org/linux-btrfs/Ysace25wh5BbLd5f@atmark-techno.com/Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    12a824dc
backref.h 10.8 KB