    btrfs: continue readahead of siblings even if target node is in memory · 069a2e37
    Filipe Manana authored
    At reada_for_search(), when attempting to readahead a node or leaf's
    siblings, we skip the readahead of the siblings if the node/leaf is
    already in memory. That is probably fine for the READA_FORWARD and
    READA_BACK readahead types, as they are used in contexts where we
    end up reading some consecutive leaves, but usually not the whole btree.
    
    However, for the READA_FORWARD_ALWAYS mode, currently only used for full
    send operations, it does not make sense to skip the readahead if the
    target node or leaf is already loaded in memory, since we know the caller
    is visiting every node and leaf of the btree in ascending order.
    
    So change the behaviour to not skip the readahead when the target node is
    already in memory and the readahead mode is READA_FORWARD_ALWAYS.
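
    The decision described above can be modelled in user space as a
    minimal sketch. This is not the code in reada_for_search(): the enum
    is a stand-in that reuses only the mode names mentioned in this
    message, and skip_sibling_readahead() is a hypothetical helper
    introduced for illustration.

```c
#include <stdbool.h>

/*
 * Stand-in for the readahead modes named in the commit message.
 * This is an illustrative enum, not the kernel's definition.
 */
enum reada_mode {
	READA_BACK,
	READA_FORWARD,
	READA_FORWARD_ALWAYS,
};

/*
 * Hypothetical helper: returns true when readahead of the siblings
 * should be skipped because the target node/leaf is already in memory.
 */
static bool skip_sibling_readahead(enum reada_mode mode, bool target_in_memory)
{
	/*
	 * The caller visits every node and leaf in ascending order, so the
	 * siblings will be needed regardless of whether the target is cached.
	 */
	if (mode == READA_FORWARD_ALWAYS)
		return false;

	/* Other modes keep the previous behaviour: skip if target is cached. */
	return target_in_memory;
}
```

    With this model, only the (READA_FORWARD_ALWAYS, target in memory)
    case changes its answer compared to the old behaviour; every other
    combination is unaffected.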
    
    The following test script was used to measure the improvement on a box
    using an average, consumer grade, spinning disk, with 32GiB of RAM and
    using a non-debug kernel config (Debian's default config).
    
      $ cat test.sh
      #!/bin/bash
    
      DEV=/dev/sdj
      MNT=/mnt/sdj
      MKFS_OPTIONS="--nodesize 16384"     # default, just to be explicit
      MOUNT_OPTIONS="-o max_inline=2048"  # default, just to be explicit
    
      mkfs.btrfs -f $MKFS_OPTIONS $DEV > /dev/null
      mount $MOUNT_OPTIONS $DEV $MNT
    
      # Create files with inline data to make it easier and faster to create
      # large btrees.
      add_files()
      {
          local total=$1
          local start_offset=$2
          local number_jobs=$3
          local total_per_job=$(($total / $number_jobs))
    
          echo "Creating $total new files using $number_jobs jobs"
          for ((n = 0; n < $number_jobs; n++)); do
              (
                  local start_num=$(($start_offset + $n * $total_per_job))
                  for ((i = 1; i <= $total_per_job; i++)); do
                      local file_num=$((start_num + $i))
                      local file_path="$MNT/file_${file_num}"
                      xfs_io -f -c "pwrite -S 0xab 0 2000" $file_path > /dev/null
                      if [ $? -ne 0 ]; then
                          echo "Failed creating file $file_path"
                          break
                      fi
                  done
              ) &
              worker_pids[$n]=$!
          done
    
          wait ${worker_pids[@]}
    
          sync
          echo
          echo "btree node/leaf count: $(btrfs inspect-internal dump-tree -t 5 $DEV | grep -cE '^(node|leaf) ')"
      }
    
      file_count=2000000
      add_files $file_count 0 4
    
      echo
      echo "Creating snapshot..."
      btrfs subvolume snapshot -r $MNT $MNT/snap1
    
      umount $MNT
    
      echo 3 > /proc/sys/vm/drop_caches
      blockdev --flushbufs $DEV &> /dev/null
      hdparm -F $DEV &> /dev/null
    
      mount $MOUNT_OPTIONS $DEV $MNT
    
      echo
      echo "Testing full send..."
      start=$(date +%s)
      btrfs send $MNT/snap1 > /dev/null
      end=$(date +%s)
      echo
      echo "Full send took $((end - start)) seconds"
    
      umount $MNT
    
    The durations of the full send operations, in seconds, were the following:
    
    Before this change:  85 seconds
    After this change:   76 seconds (-11.2%)
    
    Reviewed-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
ctree.c 119 KB