    btrfs: only commit delayed items at fsync if we are logging a directory
    When logging an inode, we commit its delayed items if either the inode is
    a directory or it is a new inode, created in the current transaction.
    
    We need to do it for directories, since new directory indexes are stored
    as delayed items of the inode and when logging a directory we need to be
    able to access all indexes from the fs/subvolume tree in order to figure
    out which index ranges need to be logged.
    
    However, for new inodes that are not directories, we do not need to do it,
    because the only type of delayed item they can have is the inode item, and
    we are guaranteed to always log an up to date version of the inode item
    (see the sketch after this list):
    
    *) for a full fsync we do it by committing the delayed inode and then
       copying the item from the fs/subvolume tree with
       copy_inode_items_to_log();
    
    *) for a fast fsync we always log the inode item based on the contents of
       the in-memory struct btrfs_inode. We guarantee this is always done since
       commit e4545de5 ("Btrfs: fix fsync data loss after append write").
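
    A rough sketch of the two cases, using the names mentioned above (the
    exact control flow in btrfs_log_inode() is more involved; this is an
    illustration, not the verbatim code):

      if (full_sync) {
              /*
               * Full fsync: flush the delayed inode into the fs/subvolume
               * tree, so that copy_inode_items_to_log() finds an up to date
               * inode item there.
               */
              ret = btrfs_commit_inode_delayed_inode(inode);
              if (ret)
                      return ret;
      }
      /*
       * Fast fsync: the inode item is logged straight from the in-memory
       * struct btrfs_inode, so committing the delayed inode is not needed.
       */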
    
    So stop running delayed items for new inodes that are not directories,
    since that forces committing the delayed inode into the fs/subvolume tree,
    wasting time and adding contention to the tree when a full fsync is not
    required. We will only commit the delayed inode when a full fsync is
    needed.
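
    The change itself boils down to narrowing the condition that guards
    btrfs_commit_inode_delayed_items() in btrfs_log_inode(). A minimal
    before/after sketch (the generation-based check for new inodes is an
    assumption about the pre-patch code, not a verbatim quote):

      /* Before (sketch): directories and inodes created in the current
       * transaction had their delayed items committed. */
      if (S_ISDIR(inode->vfs_inode.i_mode) ||
          inode->generation > fs_info->last_trans_committed)
              ret = btrfs_commit_inode_delayed_items(trans, inode);

      /* After: only directories, since their delayed items hold the new
       * directory indexes needed when logging the directory. */
      if (S_ISDIR(inode->vfs_inode.i_mode))
              ret = btrfs_commit_inode_delayed_items(trans, inode);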
    
    This patch is part of a series that has the following patches:
    
    1/4 btrfs: only commit the delayed inode when doing a full fsync
    2/4 btrfs: only commit delayed items at fsync if we are logging a directory
    3/4 btrfs: stop incrementing log_batch for the log root tree when syncing log
    4/4 btrfs: remove no longer needed use of log_writers for the log root tree
    
    With the entire patchset applied, I saw about a 12% decrease in the max
    latency reported by dbench. The test was done on a QEMU VM with 8 cores
    and 16GiB of RAM, using KVM and a raw NVMe device directly (no
    intermediary fs on the host). The test was invoked like the following:
    
      mkfs.btrfs -f /dev/sdk
      mount -o ssd -o nospace_cache /dev/sdk /mnt/sdk
      dbench -D /mnt/sdk -t 300 8
      umount /mnt/sdk
    
    CC: stable@vger.kernel.org # 5.4+
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>