    btrfs: do not commit delayed inode when logging a file in full sync mode · f6df27dd
    Filipe Manana authored
    When logging a regular file in full sync mode, we are currently committing
    its delayed inode item. This is to ensure that we never miss copying the
    inode item, with its most up to date data, into the log tree.
    
    However, that is not necessary since commit e4545de5 ("Btrfs: fix fsync
    data loss after append write"), because even if we don't find the leaf
    with the inode item when looking for leaves that changed in the current
    transaction, we end up logging the inode item later using the in-memory
    content. In case we do find the leaf containing the inode item, we already
    end up using the in-memory inode to fill the inode item in the log tree,
    and not the inode item that is in the fs/subvolume tree, as it might not
    be up to date (copy_items() -> fill_inode_item()).
    
    So don't commit the delayed inode item, which brings a couple of benefits:
    
    1) Avoid writing the inode item to the fs/subvolume btree, saving time and
       reducing lock contention on the btree;
    
    2) In case no other item for the inode was changed, added or deleted in
       the same leaf where the inode item is located, we end up copying all
       the items in that leaf to the log tree - it's harmless from a
       functional point of view, but it wastes time and log tree space.
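
    To make the change more concrete, below is a minimal sketch, not the
    actual hunk removed from fs/btrfs/tree-log.c, of the kind of call that is
    no longer made when logging a regular file in full sync mode. The helper
    name is invented for illustration only; btrfs_commit_inode_delayed_inode()
    and the BTRFS_INODE_NEEDS_FULL_SYNC runtime flag are the existing names in
    the btrfs code.

      /*
       * Illustrative sketch only, assuming the usual fs/btrfs headers
       * ("btrfs_inode.h", "delayed-inode.h"). The helper name is
       * hypothetical; the real patch simply stops making a call like the
       * one below in the file logging path.
       */
      static int sketch_commit_delayed_inode(struct btrfs_inode *inode)
      {
              /* Directories are not affected by this change. */
              if (S_ISDIR(inode->vfs_inode.i_mode))
                      return 0;

              /*
               * Old behaviour: for a regular file in full sync mode, flush
               * the delayed inode item into the fs/subvolume btree so that
               * an up to date inode item exists there before logging.
               */
              if (test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags))
                      return btrfs_commit_inode_delayed_inode(inode);

              return 0;
      }

    With the patch applied, the full sync branch above goes away for regular
    files: the inode item copied into the log tree is always built from the
    in-memory inode (copy_items() -> fill_inode_item()), so the copy in the
    fs/subvolume tree does not need to be refreshed just for logging.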
    
    This patch is part of a patch set comprised of the following patches:
    
      btrfs: check if a log tree exists at inode_logged()
      btrfs: remove no longer needed checks for NULL log context
      btrfs: do not log new dentries when logging that a new name exists
      btrfs: always update the logged transaction when logging new names
      btrfs: avoid expensive search when dropping inode items from log
      btrfs: add helper to truncate inode items when logging inode
      btrfs: avoid expensive search when truncating inode items from the log
      btrfs: avoid search for logged i_size when logging inode if possible
      btrfs: avoid attempt to drop extents when logging inode for the first time
      btrfs: do not commit delayed inode when logging a file in full sync mode
    
    This is patch 10/10 and the following test results compare a branch with
    the whole patch set applied versus a branch without any of the patches
    applied.
    
    The following script was used to test dbench with 8 and 16 jobs on a
    machine with 12 cores, 64G of RAM, an NVMe device and a non-debug
    kernel config (Debian's default):
    
      $ cat test.sh
      #!/bin/bash
    
      if [ $# -ne 1 ]; then
          echo "Use $0 NUM_JOBS"
          exit 1
      fi
    
      NUM_JOBS=$1
    
      DEV=/dev/nvme0n1
      MNT=/mnt/nvme0n1
      MOUNT_OPTIONS="-o ssd"
      MKFS_OPTIONS="-m single -d single"
    
      echo "performance" | \
          tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    
      mkfs.btrfs -f $MKFS_OPTIONS $DEV
      mount $MOUNT_OPTIONS $DEV $MNT
    
      dbench -D $MNT -t 120 $NUM_JOBS
    
      umount $MNT
    
    The results were the following:
    
    8 jobs, before patchset:
    
     Operation      Count    AvgLat    MaxLat
     ----------------------------------------
     NTCreateX    4113896     0.009   238.665
     Close        3021699     0.001     0.590
     Rename        174215     0.082   238.733
     Unlink        830977     0.049   238.642
     Deltree           96     2.232     8.022
     Mkdir             48     0.003     0.005
     Qpathinfo    3729013     0.005     2.672
     Qfileinfo     653206     0.001     0.152
     Qfsinfo       683866     0.002     0.526
     Sfileinfo     335055     0.004     1.571
     Find         1441800     0.016     4.288
     WriteX       2049644     0.010     3.982
     ReadX        6449786     0.003     0.969
     LockX          13400     0.002     0.043
     UnlockX        13400     0.001     0.075
     Flush         288349     2.521   245.516
    
    Throughput 1075.73 MB/sec  8 clients  8 procs  max_latency=245.520 ms
    
    8 jobs, after patchset:
    
     Operation      Count    AvgLat    MaxLat
     ----------------------------------------
     NTCreateX    4154282     0.009   156.675
     Close        3051450     0.001     0.843
     Rename        175912     0.072     4.444
     Unlink        839067     0.048    66.050
     Deltree           96     2.131     5.979
     Mkdir             48     0.002     0.004
     Qpathinfo    3765575     0.005     3.079
     Qfileinfo     659582     0.001     0.099
     Qfsinfo       690474     0.002     0.155
     Sfileinfo     338366     0.004     1.419
     Find         1455816     0.016     3.423
     WriteX       2069538     0.010     4.328
     ReadX        6512429     0.003     0.840
     LockX          13530     0.002     0.078
     UnlockX        13530     0.001     0.051
     Flush         291158     2.500   163.468
    
    Throughput 1105.45 MB/sec  8 clients  8 procs  max_latency=163.474 ms
    
    +2.7% throughput, -40.1% max latency
    
    16 jobs, before patchset:
    
     Operation      Count    AvgLat    MaxLat
     ----------------------------------------
     NTCreateX    5457602     0.033   337.098
     Close        4008979     0.002     2.018
     Rename        231051     0.323   254.054
     Unlink       1102209     0.202   337.243
     Deltree          160     6.521    31.720
     Mkdir             80     0.003     0.007
     Qpathinfo    4946147     0.014     6.988
     Qfileinfo     867440     0.001     1.642
     Qfsinfo       907081     0.003     1.821
     Sfileinfo     444433     0.005     2.053
     Find         1912506     0.067     7.854
     WriteX       2724852     0.018     7.428
     ReadX        8553883     0.003     2.059
     LockX          17770     0.003     0.350
     UnlockX        17770     0.002     0.627
     Flush         382533     2.810   353.691
    
    Throughput 1413.09 MB/sec  16 clients  16 procs  max_latency=353.696 ms
    
    16 jobs, after patchset:
    
     Operation      Count    AvgLat    MaxLat
     ----------------------------------------
     NTCreateX    5393156     0.034   303.181
     Close        3961986     0.002     1.502
     Rename        228359     0.320   253.379
     Unlink       1088920     0.206   303.409
     Deltree          160     6.419    30.088
     Mkdir             80     0.003     0.004
     Qpathinfo    48879676     0.015     7.722
     Qfileinfo     857408     0.001     1.651
     Qfsinfo       896343     0.002     2.147
     Sfileinfo     439317     0.005     4.298
     Find         1890018     0.073     8.347
     WriteX       2693356     0.018     6.373
     ReadX        8453485     0.003     3.836
     LockX          17562     0.003     0.486
     UnlockX        17562     0.002     0.635
     Flush         378023     2.802   315.904
    
    Throughput 1454.46 MB/sec  16 clients  16 procs  max_latency=315.910 ms
    
    +2.9% throughput, -11.3% max latency

    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>