• Filipe Manana's avatar
    btrfs: improve batch deletion of delayed dir index items · 4bd02d90
    Filipe Manana authored
    Currently we group delayed dir index items for deletion in a single batch
    (single btree operation) as long as they all exist in the same leaf and as
    long as their keys are sequential in the key space. For example if we have
    a leaf that has dir index items with offsets:
    
        2, 3, 4, 6, 7, 10
    
    And we have delayed dir index items for deleting all these indexes, and
    no delayed items for any other index keys in between, then we end up
    deleting in 3 batches:
    
    1) First batch for indexes 2, 3 and 4;
    2) Second batch for indexes 6 and 7;
    3) Third batch for index 10.
    
    This is a waste because we can delete all the index keys in a single
    batch. What matters is that each consecutive delayed index key matches
    each consecutive dir index key in a leaf.
    
    So update the logic at btrfs_batch_delete_items() to check only for a
    key match between delayed dir index items and dir index items in a leaf.
    Also avoid the useless first iteration on comparing the key of the
    first slot to delete with the key of the first delayed item, as it's
    silly since they always match, as the delayed item's key was used for
    the btree search that gave us the path we have.
    
    This is more efficient and reduces runtime of running delayed items, as
    well as lock contention on the subvolume's tree.
    
    For example, the following test script:
    
      $ cat test.sh
      #!/bin/bash
    
      DEV=/dev/sdj
      MNT=/mnt/sdj
    
      mkfs.btrfs -f $DEV
      mount $DEV $MNT
    
      NUM_FILES=1000
    
      mkdir $MNT/testdir
      for ((i = 1; i <= $NUM_FILES; i++)); do
          echo -n > $MNT/testdir/file_$i
      done
    
      # Now delete every other file, to create gaps in the dir index keys.
      for ((i = 1; i <= $NUM_FILES; i += 2)); do
          rm -f $MNT/testdir/file_$i
      done
    
      # Sync to force any delayed items to be flushed to the tree.
      sync
    
      start=$(date +%s%N)
      rm -fr $MNT/testdir
      end=$(date +%s%N)
      dur=$(( (end - start) / 1000000 ))
    
      echo -e "\nrm -fr took $dur milliseconds"
    
      umount $MNT
    
    Running that test script while having the following bpftrace script
    running in another shell:
    
      $ cat bpf-measure.sh
      #!/usr/bin/bpftrace
    
      /* Add 'noinline' to btrfs_delete_delayed_items()'s definition. */
      k:btrfs_delete_delayed_items
      {
          @start_delete_delayed_items[tid] = nsecs;
      }
    
      k:btrfs_del_items
      /@start_delete_delayed_items[tid]/
      {
          @delete_batches = count();
      }
    
      kr:btrfs_delete_delayed_items
      /@start_delete_delayed_items[tid]/
      {
          $dur = (nsecs - @start_delete_delayed_items[tid]) / 1000;
          @btrfs_delete_delayed_items_total_time = sum($dur);
          delete(@start_delete_delayed_items[tid]);
      }
    
    Before this change:
    
    @btrfs_delete_delayed_items_total_time: 9563
    @delete_batches: 1001
    
    After this change:
    
    @btrfs_delete_delayed_items_total_time: 7328
    @delete_batches: 509
    Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    4bd02d90
delayed-inode.c 50.8 KB