    btrfs: do not take the log_mutex of the subvolume when pinning the log
    During a rename we pin the log to make sure no one commits a log that
    reflects an ongoing rename operation, as that could result in a
    committed log which records the unlink of the old name without
    recording the new name. However, we are taking the subvolume's
    log_mutex before incrementing the log_writers counter, which is not
    necessary: that counter is atomic, and we only remove the old name
    from the log and add the new name after incrementing log_writers,
    which guarantees that no one can commit the log between the removal
    of the old name and the addition of the new one.
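
    To illustrate why the atomic counter alone is enough, here is a
    minimal userspace model of that protocol (a sketch with hypothetical
    names, using C11 atomics and pthreads; it is not the kernel code):
    the renamer pins the log with a single atomic increment and the log
    committer waits for the counter to drop to zero before committing,
    so a commit can never land between the removal of the old name and
    the addition of the new one.

      #include <pthread.h>
      #include <stdatomic.h>
      #include <stdio.h>

      /* Models root->log_writers: tasks currently pinning the log. */
      static atomic_int log_writers;

      static void *renamer(void *arg)
      {
              (void)arg;
              atomic_fetch_add(&log_writers, 1);  /* pin the log */
              puts("log: remove old name");
              puts("log: add new name");
              atomic_fetch_sub(&log_writers, 1);  /* unpin the log */
              return NULL;
      }

      static void *committer(void *arg)
      {
              (void)arg;
              /* A log commit must wait for all pins to be released
               * (the kernel sleeps on a waitqueue, it does not spin). */
              while (atomic_load(&log_writers) > 0)
                      ;
              puts("log: commit");
              return NULL;
      }

      int main(void)
      {
              pthread_t r, c;

              pthread_create(&r, NULL, renamer, NULL);
              pthread_create(&c, NULL, committer, NULL);
              pthread_join(r, NULL);
              pthread_join(c, NULL);
              return 0;
      }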
    
    Taking the log_mutex here therefore only adds unnecessary contention
    on the lock, which can become visible in workloads that mix renames
    with fsyncs, writes to files opened with O_SYNC and unlink operations
    (if the inode or its parent was fsynced earlier in the current
    transaction).
    
    So just remove the lock and unlock of the subvolume's log_mutex at
    btrfs_pin_log_trans().
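
    Concretely, this is a sketch of what btrfs_pin_log_trans() looks
    like with the change applied (see tree-log.c for the exact code):

      /*
       * Pin the log, preventing a log commit until the pin is released.
       * The mutex_lock(&root->log_mutex) / mutex_unlock(&root->log_mutex)
       * pair that used to surround the increment is gone: the atomic
       * increment alone is enough.
       */
      void btrfs_pin_log_trans(struct btrfs_root *root)
      {
              atomic_inc(&root->log_writers);
      }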
    
    Running dbench, which mixes different types of operations that end up
    taking that mutex (fsyncs, renames, unlinks and writes to files
    opened with O_SYNC), revealed some small gains. The following script
    was used to invoke dbench:
    
      #!/bin/bash
    
      DEV=/dev/nvme0n1
      MNT=/mnt/btrfs
      MOUNT_OPTIONS="-o ssd -o space_cache=v2"
      MKFS_OPTIONS="-m single -d single"
      THREADS=32
    
      echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
      mkfs.btrfs -f $MKFS_OPTIONS $DEV
      mount $MOUNT_OPTIONS $DEV $MNT
    
      dbench -s -t 600 -D $MNT $THREADS
    
      umount $MNT
    
    The test was run on bare metal, without virtualization, on a box with
    12 cores (Intel i7-8700), 64GB of RAM and an NVMe device, with a
    kernel configuration matching the defaults of typical distributions
    (Debian in this case), without debug options enabled (kasan,
    kmemleak, slub debug, debug of page allocations, lock debugging,
    etc.).
    
    Results before this patch:
    
     Operation      Count    AvgLat    MaxLat
     ----------------------------------------
     NTCreateX    4410848     0.017   738.640
     Close        3240222     0.001     0.834
     Rename        186850     7.478  1272.476
     Unlink        890875     0.128   785.018
     Deltree          128     2.846    12.081
     Mkdir             64     0.002     0.003
     Qpathinfo    3997659     0.009    11.171
     Qfileinfo     701307     0.001     0.478
     Qfsinfo       733494     0.002     1.103
     Sfileinfo     359362     0.004     3.266
     Find         1546226     0.041     4.128
     WriteX       2202803     7.905  1376.989
     ReadX        6917775     0.003     3.887
     LockX          14392     0.002     0.043
     UnlockX        14392     0.001     0.085
     Flush         309225     0.128  1033.936
    
    Throughput 231.555 MB/sec (sync open)  32 clients  32 procs  max_latency=1376.993 ms
    
    Results after this patch:
    
     Operation      Count    AvgLat    MaxLat
     ----------------------------------------
     NTCreateX    4603244     0.017   232.776
     Close        3381299     0.001     1.041
     Rename        194871     7.251  1073.165
     Unlink        929730     0.133   119.233
     Deltree          128     2.871    10.199
     Mkdir             64     0.002     0.004
     Qpathinfo    4171343     0.009    11.317
     Qfileinfo     731227     0.001     1.635
     Qfsinfo       765079     0.002     3.568
     Sfileinfo     374881     0.004     1.220
     Find         1612964     0.041     4.675
     WriteX       2296720     7.569  1178.204
     ReadX        7213633     0.003     3.075
     LockX          14976     0.002     0.076
     UnlockX        14976     0.001     0.061
     Flush         322635     0.102   579.505
    
    Throughput 241.4 MB/sec (sync open)  32 clients  32 procs  max_latency=1178.207 ms
    (+4.3% throughput, -14.4% max latency)

    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>