    btrfs: only commit the delayed inode when doing a full fsync · 8c8648dd
    Filipe Manana authored
    Commit 2c2c452b ("Btrfs: fix fsync when extend references are added
    to an inode") forced a commit of the delayed inode when logging an inode
    in order to ensure we would end up logging the inode item during a full
    fsync. By committing the delayed inode, we updated the inode item in the
    fs/subvolume tree and then later when copying items from leaves modified in
    the current transaction into the log tree (with copy_inode_items_to_log())
    we ended up copying the inode item from the fs/subvolume tree into the log
    tree. Logging an up to date version of the inode item is required to make
    sure at log replay time we get the link count fixup triggered among other
    things (replay xattr deletes, etc). The test case generic/040 from fstests
    exercises the bug which that commit fixed.
    
    However, for a fast fsync we don't need to commit the delayed inode because
    we always log an up to date version of the inode item based on the struct
    btrfs_inode we have in-memory. We started doing this for fast fsyncs since
    commit e4545de5 ("Btrfs: fix fsync data loss after append write").
    
    So stop committing the delayed inode when doing a fast fsync, since doing
    so only wastes time and adds contention on the fs/subvolume tree.
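
    The decision described above can be sketched as a small userspace model.
    This is an illustration only, not the kernel code: struct model_inode,
    enum fsync_mode, model_log_inode() and the counter are hypothetical names
    made up for this sketch. It assumes, as stated above, that a fast fsync
    logs the inode item from the in-memory inode while a full fsync copies it
    from the fs/subvolume tree after committing the delayed inode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; these are not the kernel's data structures. */
enum fsync_mode { FSYNC_FAST, FSYNC_FULL };

enum item_source {
	ITEM_NOT_LOGGED,
	ITEM_FROM_MEMORY,      /* built from the in-memory inode */
	ITEM_FROM_SUBVOL_TREE, /* copied from the fs/subvolume tree */
};

struct model_inode {
	enum fsync_mode mode;
	unsigned int delayed_inode_commits; /* updates done to the fs/subvolume tree */
	enum item_source logged_from;       /* where the logged inode item came from */
};

/* Only a full fsync needs the inode item in the fs/subvolume tree updated. */
static bool must_commit_delayed_inode(enum fsync_mode mode)
{
	return mode == FSYNC_FULL;
}

/* Model of logging one inode: commit the delayed inode only when required. */
static void model_log_inode(struct model_inode *inode)
{
	assert(inode != NULL);

	if (must_commit_delayed_inode(inode->mode)) {
		/* Full fsync: update the tree, then copy the item from it. */
		inode->delayed_inode_commits++;
		inode->logged_from = ITEM_FROM_SUBVOL_TREE;
	} else {
		/* Fast fsync: the in-memory inode is already up to date. */
		inode->logged_from = ITEM_FROM_MEMORY;
	}
}
```

    In this model a fast fsync never touches the fs/subvolume tree, which is
    the source of the saved time and reduced contention, while both paths
    still end up logging an inode item.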
    
    This patch is part of a series that has the following patches:
    
    1/4 btrfs: only commit the delayed inode when doing a full fsync
    2/4 btrfs: only commit delayed items at fsync if we are logging a directory
    3/4 btrfs: stop incrementing log_batch for the log root tree when syncing log
    4/4 btrfs: remove no longer needed use of log_writers for the log root tree
    
    With the entire patchset applied, I saw about a 12% decrease in the max
    latency reported by dbench. The test was done on a QEMU VM with 8 cores and
    16GB of RAM, using KVM and a raw NVMe device directly (no intermediary fs
    on the host). The test was invoked as follows:
    
      mkfs.btrfs -f /dev/sdk
      mount -o ssd -o nospace_cache /dev/sdk /mnt/sdk
      dbench -D /mnt/sdk -t 300 8
      umount /mnt/sdk
    
    CC: stable@vger.kernel.org # 5.4+
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
tree-log.c 172 KB