1. 31 Jul, 2012 5 commits
  2. 19 Jul, 2012 7 commits
    • Shaohua Li's avatar
      raid5: add a per-stripe lock · b17459c0
      Shaohua Li authored
      Add a per-stripe lock to protect stripe specific data. The purpose is to reduce
      lock contention of conf->device_lock.
      
      stripe ->toread, ->towrite are protected by per-stripe lock.  Accessing bio
      list of the stripe is always serialized by this lock, so adding bio to the
      lists (add_stripe_bio()) and removing bio from the lists (like
      ops_run_biofill()) not race.
      
      If bio in ->read, ->written ... list are not shared by multiple stripes, we
      don't need any lock to protect ->read, ->written, because STRIPE_ACTIVE will
      protect them. If the bio are shared,  there are two protections:
      1. bi_phys_segments acts as a reference count
      2. traverse the list uses r5_next_bio, which makes traverse never access bio
      not belonging to the stripe
      
      Let's have an example:
      |  stripe1 |  stripe2    |  stripe3  |
      ...bio1......|bio2|bio3|....bio4.....
      
      stripe2 has 4 bios, when it's finished, it will decrement bi_phys_segments for
      all bios, but only end_bio for bio2 and bio3. bio1->bi_next still points to
      bio2, but this doesn't matter. When stripe1 is finished, it will not touch bio2
      because of r5_next_bio check. Next time stripe1 will end_bio for bio1 and
      stripe3 will end_bio bio4.
      
      before add_stripe_bio() addes a bio to a stripe, we already increament the bio
      bi_phys_segments, so don't worry other stripes release the bio.
      Signed-off-by: default avatarShaohua Li <shli@fusionio.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      b17459c0
    • Shaohua Li's avatar
      raid5: remove unnecessary bitmap write optimization · 7eaf7e8e
      Shaohua Li authored
      Neil pointed out the bitmap write optimization in handle_stripe_clean_event()
      is unnecessary, because the chance one stripe gets written twice in the mean
      time is rare. We can always do a bitmap_startwrite when a write request is
      added to a stripe and bitmap_endwrite after write request is done.  Delete the
      optimization. With it, we can delete some cases of device_lock.
      Signed-off-by: default avatarShaohua Li <shli@fusionio.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      7eaf7e8e
    • Shaohua Li's avatar
      raid5: lockless access raid5 overrided bi_phys_segments · e7836bd6
      Shaohua Li authored
      Raid5 overrides bio->bi_phys_segments, accessing it is with device_lock hold,
      which is unnecessary, We can make it lockless actually.
      Signed-off-by: default avatarShaohua Li <shli@fusionio.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      e7836bd6
    • Shaohua Li's avatar
      raid5: reduce chance release_stripe() taking device_lock · 4eb788df
      Shaohua Li authored
      release_stripe() is a place conf->device_lock is heavily contended. We take the
      lock even stripe count isn't 1, which isn't required.
      Signed-off-by: default avatarShaohua Li <shli@fusionio.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      4eb788df
    • NeilBrown's avatar
      md/raid1: close some possible races on write errors during resync · 58e94ae1
      NeilBrown authored
      commit 4367af55
         md/raid1: clear bad-block record when write succeeds.
      
      Added a 'reschedule_retry' call possibility at the end of
      end_sync_write, but didn't add matching code at the end of
      sync_request_write.  So if the writes complete very quickly, or
      scheduling makes it seem that way, then we can miss rescheduling
      the request and the resync could hang.
      
      Also commit 73d5c38a
          md: avoid races when stopping resync.
      
      Fix a race condition in this same code in end_sync_write but didn't
      make the change in sync_request_write.
      
      This patch updates sync_request_write to fix both of those.
      Patch is suitable for 3.1 and later kernels.
      Reported-by: default avatarAlexander Lyakas <alex.bolshoy@gmail.com>
      Original-version-by: default avatarAlexander Lyakas <alex.bolshoy@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      58e94ae1
    • NeilBrown's avatar
      md: avoid crash when stopping md array races with closing other open fds. · a05b7ea0
      NeilBrown authored
      md will refuse to stop an array if any other fd (or mounted fs) is
      using it.
      When any fs is unmounted of when the last open fd is closed all
      pending IO will be flushed (e.g. sync_blockdev call in __blkdev_put)
      so there will be no pending IO to worry about when the array is
      stopped.
      
      However in order to send the STOP_ARRAY ioctl to stop the array one
      must first get and open fd on the block device.
      If some fd is being used to write to the block device and it is closed
      after mdadm open the block device, but before mdadm issues the
      STOP_ARRAY ioctl, then there will be no last-close on the md device so
      __blkdev_put will not call sync_blockdev.
      
      If this happens, then IO can still be in-flight while md tears down
      the array and bad things can happen (use-after-free and subsequent
      havoc).
      
      So in the case where do_md_stop is being called from an open file
      descriptor, call sync_block after taking the mutex to ensure there
      will be no new openers.
      
      This is needed when setting a read-write device to read-only too.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarmajianpeng <majianpeng@gmail.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      a05b7ea0
    • NeilBrown's avatar
      md: fix bug in handling of new_data_offset · 25f7fd47
      NeilBrown authored
      commit c6563a8c
          md: add possibility to change data-offset for devices.
      
      introduced a 'new_data_offset' attribute which should normally
      be the same as 'data_offset', but can be explicitly set to a different
      value to allow a reshape operation to move the data.
      
      Unfortunately when the 'data_offset' is explicitly set through
      sysfs, the new_data_offset is not also set, so the two would become
      out-of-sync incorrectly.
      
      One result of this is that trying to set the 'size' after the
      'data_offset' would fail because it is not permitted to set the size
      when the 'data_offset' and 'new_data_offset' are different - as that
      can be confusing.
      Consequently when mdadm tried to do this while assembling an IMSM
      array it would fail.
      
      This bug was introduced in 3.5-rc1.
      Reported-by: default avatarBrian Downing <bdowning@lavos.net>
      Bisected-by: default avatarBrian Downing <bdowning@lavos.net>
      Tested-by: default avatarBrian Downing <bdowning@lavos.net>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      25f7fd47
  3. 14 Jul, 2012 11 commits
  4. 13 Jul, 2012 17 commits