    md: introduce get_priority_stripe() to improve raid456 write performance · 8b3e6cdc
    Dan Williams authored
    Improve write performance by preventing the delayed_list from dumping all its
    stripes onto the handle_list in one shot.  Delayed stripes are now further
    delayed by being held on the 'hold_list'.  The 'hold_list' is bypassed when
    (see the sketch after this list):
    
      * a STRIPE_IO_STARTED stripe is found at the head of 'handle_list'
      * 'handle_list' is empty and i/o is being done to satisfy full stripe-width
        write requests
      * 'bypass_count' is less than 'bypass_threshold'.  By default the threshold
        is 1, i.e. when the first two conditions are false, every other stripe
        handled is a preread stripe.
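
    A minimal userspace sketch in C of the selection policy described above --
    not the kernel function itself.  The struct layout, the singly linked
    lists, and the pop() helper are invented stand-ins for the real
    stripe_head/lru machinery; only the names handle_list, hold_list,
    bypass_count, and bypass_threshold come from this changelog:

      #include <stdbool.h>
      #include <stddef.h>

      struct stripe {
              struct stripe *next;        /* singly linked for brevity */
              bool io_started;            /* stands in for STRIPE_IO_STARTED */
      };

      struct conf {
              struct stripe *handle_list; /* stripes ready to be handled */
              struct stripe *hold_list;   /* further-delayed preread stripes */
              int bypass_count;
              int bypass_threshold;
              int pending_full_writes;    /* full stripe-width writes queued */
      };

      struct stripe *pop(struct stripe **list)
      {
              struct stripe *sh = *list;

              if (sh)
                      *list = sh->next;
              return sh;
      }

      /* Pick the next stripe to handle, or NULL if nothing is eligible. */
      struct stripe *get_priority_stripe(struct conf *conf)
      {
              if (conf->handle_list) {
                      /* Service handle_list first; hold_list keeps waiting. */
                      if (!conf->hold_list)
                              conf->bypass_count = 0; /* nobody is starved */
                      else if (!conf->handle_list->io_started)
                              conf->bypass_count++;   /* hold_list bypassed */
                      return pop(&conf->handle_list);
              }

              /*
               * handle_list is empty: promote a delayed (preread) stripe,
               * unless full stripe-width writes are still pending and the
               * delayed stripes have not been bypassed often enough yet.
               */
              if (conf->hold_list &&
                  (conf->bypass_count > conf->bypass_threshold ||
                   conf->pending_full_writes == 0)) {
                      conf->bypass_count -= conf->bypass_threshold;
                      if (conf->bypass_count < 0)
                              conf->bypass_count = 0;
                      return pop(&conf->hold_list);
              }

              return NULL;
      }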
    
    Benchmark data:
    System: 2x Xeon 5150, 4x SATA, mem=1GB
    Baseline: 2.6.24-rc7
    Configuration: mdadm --create /dev/md0 /dev/sd[b-e] -n 4 -l 5 --assume-clean
    Test1: dd if=/dev/zero of=/dev/md0 bs=1024k count=2048
      * patched:  +33% (stripe_cache_size = 256), +25% (stripe_cache_size = 512)
    
    Test2: tiobench --size 2048 --numruns 5 --block 4096 --block 131072 (XFS)
      * patched: +13%
      * patched + preread_bypass_threshold = 0: +37%
    
    Changes since v1:
    * reduce bypass_threshold from (chunk_size / sectors_per_chunk) to (1) and
      make it configurable.  This defaults to fairness and modest performance
      gains out of the box.
    Changes since v2:
    * [neilb@suse.de]: kill STRIPE_PRIO_HI and preread_needed as they are not
      necessary; the important change was clearing STRIPE_DELAYED in
      add_stripe_bio, and this has been moved out to make_request for the hang
      fix.
    * [neilb@suse.de]: simplify get_priority_stripe
    * [dan.j.williams@intel.com]: reset the bypass_count when ->hold_list is
      sampled empty (+11%)
    * [dan.j.williams@intel.com]: decrement the bypass_count when stripes are
      detected being naturally promoted off of hold_list (+2%).  Note, resetting
      bypass_count instead of decrementing on these events yields +4%, but that
      is probably too aggressive.  A sketch of this bookkeeping follows these
      notes.
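
    The two bypass_count heuristics above might look like the following
    illustrative C (again not the kernel source; tracking the previous
    hold_list head in 'last_hold' is an assumed way of detecting that a
    stripe was promoted naturally rather than by the bypass path):

      struct stripe;                      /* only pointer identity matters */

      struct bypass_state {
              struct stripe *hold_head;   /* current head of hold_list */
              struct stripe *last_hold;   /* head seen at the previous sample */
              int bypass_count;
              int bypass_threshold;
      };

      void account_bypass(struct bypass_state *st)
      {
              if (!st->hold_head) {
                      /* hold_list sampled empty: nothing is being starved,
                       * so forget any accumulated bypass debt (the +11%) */
                      st->bypass_count = 0;
              } else if (st->hold_head != st->last_hold) {
                      /* head changed since the last sample: a delayed stripe
                       * was naturally promoted off of hold_list, so pay back
                       * one threshold's worth of bypass credit (the +2%) */
                      st->last_hold = st->hold_head;
                      st->bypass_count -= st->bypass_threshold;
                      if (st->bypass_count < 0)
                              st->bypass_count = 0;
              } else {
                      /* same stripe still waiting: count another bypass */
                      st->bypass_count++;
              }
      }
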
    Changes since v3:
    * cosmetic fixups
    Tested-by: James W. Laferriere <babydr@baby-dragons.com>
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    Signed-off-by: Neil Brown <neilb@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>