Commit 5aa7d1a7 authored by Filipe Manana's avatar Filipe Manana Committed by David Sterba

btrfs: only commit delayed items at fsync if we are logging a directory

When logging an inode we are committing its delayed items if either the
inode is a directory or if it is a new inode, created in the current
transaction.

We need to do it for directories, since new directory indexes are stored
as delayed items of the inode and when logging a directory we need to be
able to access all indexes from the fs/subvolume tree in order to figure
out which index ranges need to be logged.

However for new inodes that are not directories, we do not need to do it
because the only type of delayed item they can have is the inode item, and
we are guaranteed to always log an up to date version of the inode item:

*) for a full fsync we do it by committing the delayed inode and then
   copying the item from the fs/subvolume tree with
   copy_inode_items_to_log();

*) for a fast fsync we always log the inode item based on the contents of
   the in-memory struct btrfs_inode. We guarantee this is always done since
   commit e4545de5 ("Btrfs: fix fsync data loss after append write").

So stop running delayed items for a new inodes that are not directories,
since that forces committing the delayed inode into the fs/subvolume tree,
wasting time and adding contention to the tree when a full fsync is not
required. We will only do it in case a fast fsync is needed.

This patch is part of a series that has the following patches:

1/4 btrfs: only commit the delayed inode when doing a full fsync
2/4 btrfs: only commit delayed items at fsync if we are logging a directory
3/4 btrfs: stop incremening log_batch for the log root tree when syncing log
4/4 btrfs: remove no longer needed use of log_writers for the log root tree

After the entire patchset applied I saw about 12% decrease on max latency
reported by dbench. The test was done on a qemu vm, with 8 cores, 16Gb of
ram, using kvm and using a raw NVMe device directly (no intermediary fs on
the host). The test was invoked like the following:

  mkfs.btrfs -f /dev/sdk
  mount -o ssd -o nospace_cache /dev/sdk /mnt/sdk
  dbench -D /mnt/sdk -t 300 8
  umount /mnt/dsk

CC: stable@vger.kernel.org # 5.4+
Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
parent 8c8648dd
...@@ -5123,7 +5123,6 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans, ...@@ -5123,7 +5123,6 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
const loff_t end, const loff_t end,
struct btrfs_log_ctx *ctx) struct btrfs_log_ctx *ctx)
{ {
struct btrfs_fs_info *fs_info = root->fs_info;
struct btrfs_path *path; struct btrfs_path *path;
struct btrfs_path *dst_path; struct btrfs_path *dst_path;
struct btrfs_key min_key; struct btrfs_key min_key;
...@@ -5166,15 +5165,17 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans, ...@@ -5166,15 +5165,17 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
max_key.offset = (u64)-1; max_key.offset = (u64)-1;
/* /*
* Only run delayed items if we are a dir or a new file. * Only run delayed items if we are a directory. We want to make sure
* all directory indexes hit the fs/subvolume tree so we can find them
* and figure out which index ranges have to be logged.
*
* Otherwise commit the delayed inode only if the full sync flag is set, * Otherwise commit the delayed inode only if the full sync flag is set,
* as we want to make sure an up to date version is in the subvolume * as we want to make sure an up to date version is in the subvolume
* tree so copy_inode_items_to_log() / copy_items() can find it and copy * tree so copy_inode_items_to_log() / copy_items() can find it and copy
* it to the log tree. For a non full sync, we always log the inode item * it to the log tree. For a non full sync, we always log the inode item
* based on the in-memory struct btrfs_inode which is always up to date. * based on the in-memory struct btrfs_inode which is always up to date.
*/ */
if (S_ISDIR(inode->vfs_inode.i_mode) || if (S_ISDIR(inode->vfs_inode.i_mode))
inode->generation > fs_info->last_trans_committed)
ret = btrfs_commit_inode_delayed_items(trans, inode); ret = btrfs_commit_inode_delayed_items(trans, inode);
else if (test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags)) else if (test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags))
ret = btrfs_commit_inode_delayed_inode(inode); ret = btrfs_commit_inode_delayed_inode(inode);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment