    md/raid5: fix newly-broken locking in get_active_stripe. · 6d183de4
    NeilBrown authored
    commit 566c09c5 raid5: relieve lock contention in get_active_stripe()
    
    modified the locking in get_active_stripe(), reducing the range
    protected by the (highly contended) device_lock.
    Unfortunately it reduced the range too much, opening up some races.
    
    One race can occur if get_priority_stripe runs between the
    test on sh->count and device_lock being taken.
    This will mean that sh->lru is empty while get_active_stripe
    thinks ->count is zero, resulting in a 'BUG' firing.
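A minimal user-space model of this race, offered as a sketch rather than the kernel code. All model_* names are hypothetical, plain ints stand in for atomic_t, the lru list is reduced to an on_lru flag, and get_priority_stripe is assumed to unlink the stripe and take its first reference. The race window is a scripted callback, so the interleaving is deterministic rather than concurrent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct r5conf and struct stripe_head. */
struct model_conf {
	int active_stripes;	/* models conf->active_stripes */
};

struct model_stripe {
	int count;		/* models sh->count */
	bool on_lru;		/* models !list_empty(&sh->lru) */
	bool handle;		/* models STRIPE_HANDLE */
};

/* Work another CPU performs inside the race window. */
typedef void (*other_cpu_fn)(struct model_conf *, struct model_stripe *);

/* Assumed behaviour of get_priority_stripe (which runs under
 * device_lock): unlink the stripe and take the first reference. */
static void model_get_priority_stripe(struct model_conf *conf,
				      struct model_stripe *sh)
{
	(void)conf;
	assert(sh->count == 0);	/* assumed: listed stripes hold no refs */
	sh->on_lru = false;	/* list_del_init(&sh->lru) */
	sh->count++;
}

/* Pre-fix shape: ->count is tested before device_lock is taken.
 * 'window' (may be NULL) runs between the test and the lock.
 * Returns true when the model's BUG check would fire. */
static bool model_get_active_stripe_old(struct model_conf *conf,
					struct model_stripe *sh,
					other_cpu_fn window)
{
	bool bug = false;

	if (sh->count == 0) {		/* unlocked test */
		if (window)
			window(conf, sh);	/* other CPU sneaks in */
		/* spin_lock(&conf->device_lock) would be here */
		if (!sh->handle)
			conf->active_stripes++;
		if (!sh->on_lru)	/* "count is zero" yet unlinked */
			bug = true;
		sh->on_lru = false;	/* list_del_init(&sh->lru) */
		/* spin_unlock(&conf->device_lock) */
	}
	sh->count++;
	return bug;
}
```

With a NULL window the model takes the first reference cleanly; passing model_get_priority_stripe as the window makes it report the BUG, since the stripe was unlinked after the unlocked test saw a zero count.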
    
    Another race happens if __release_stripe is called immediately
    after sh->count is tested and found to be non-zero.  If STRIPE_HANDLE
    is not set, get_active_stripe should increment ->active_stripes
    when it increments ->count from 0, but because it saw a non-zero
    count, it doesn't.
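The same style of hypothetical model for the second race. Here __release_stripe is assumed to drop the last reference of a stripe without STRIPE_HANDLE, put it back on an idle list and decrement active_stripes; the real function does more than this, and all model_* names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct r5conf and struct stripe_head. */
struct model_conf {
	int active_stripes;	/* models conf->active_stripes */
};

struct model_stripe {
	int count;		/* models sh->count */
	bool on_lru;		/* models !list_empty(&sh->lru) */
	bool handle;		/* models STRIPE_HANDLE */
};

typedef void (*other_cpu_fn)(struct model_conf *, struct model_stripe *);

/* Assumed behaviour of __release_stripe when it drops the last
 * reference of a stripe without STRIPE_HANDLE: the stripe goes back
 * on an idle list and no longer counts as active. */
static void model_release_stripe(struct model_conf *conf,
				 struct model_stripe *sh)
{
	assert(sh->count > 0);	/* releasing needs a reference */
	if (--sh->count == 0 && !sh->handle) {
		conf->active_stripes--;
		sh->on_lru = true;
	}
}

/* Pre-fix shape: a non-zero ->count seen without device_lock is
 * trusted, so only the count == 0 path does the accounting. */
static void model_get_active_stripe_old(struct model_conf *conf,
					struct model_stripe *sh,
					other_cpu_fn window)
{
	if (sh->count != 0) {		/* unlocked test */
		if (window)
			window(conf, sh);	/* count may drop to 0 here */
	} else {
		/* locked path: taking the first reference */
		if (!sh->handle)
			conf->active_stripes++;
		sh->on_lru = false;
	}
	sh->count++;			/* may go 0 -> 1 unaccounted */
}
```

In the scripted race the stripe ends up holding a reference (count == 1) while active_stripes has dropped to 0, and in this model it is also still marked as on a list.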
    
    Extending device_lock to cover the test on sh->count closes these
    races.
    
    While we are here, fix the two BUG tests:
     - If count is zero, then lru really must not be empty, or we've
       lost the stripe_head somehow - no other tests are relevant.
     - STRIPE_ON_RELEASE_LIST is completely independent of ->lru, so
       testing it is pointless.
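A sketch of the fixed shape, using the same kind of hypothetical model (all model_* names are illustrative, not kernel functions). With the ->count test inside device_lock, another CPU's locked work can only complete before or after the critical section, so the model scripts it to run entirely before. Only the count-is-zero check is kept; STRIPE_ON_RELEASE_LIST and other state bits are not represented.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct r5conf and struct stripe_head;
 * other state bits (e.g. STRIPE_ON_RELEASE_LIST) are not modelled. */
struct model_conf {
	int active_stripes;
};

struct model_stripe {
	int count;
	bool on_lru;
	bool handle;
};

typedef void (*other_cpu_fn)(struct model_conf *, struct model_stripe *);

/* Assumed get_priority_stripe: unlink and take the first reference. */
static void model_get_priority_stripe(struct model_conf *conf,
				      struct model_stripe *sh)
{
	(void)conf;
	assert(sh->count == 0);
	sh->on_lru = false;
	sh->count++;
}

/* Assumed __release_stripe dropping the last reference of a stripe
 * without STRIPE_HANDLE. */
static void model_release_stripe(struct model_conf *conf,
				 struct model_stripe *sh)
{
	assert(sh->count > 0);
	if (--sh->count == 0 && !sh->handle) {
		conf->active_stripes--;
		sh->on_lru = true;
	}
}

/* Fixed shape: the ->count test sits inside device_lock. The other CPU
 * also needs device_lock, so it finishes before the critical section
 * starts and can no longer run between the test and the update.
 * Returns true when the remaining BUG check would fire. */
static bool model_get_active_stripe_fixed(struct model_conf *conf,
					  struct model_stripe *sh,
					  other_cpu_fn before)
{
	bool bug = false;

	if (before)
		before(conf, sh);	/* completes before we get the lock */
	/* spin_lock(&conf->device_lock); */
	if (sh->count == 0) {
		if (!sh->handle)
			conf->active_stripes++;
		/* count is zero, so lru must be non-empty */
		if (!sh->on_lru)
			bug = true;
		sh->on_lru = false;	/* list_del_init(&sh->lru) */
	}
	sh->count++;
	/* spin_unlock(&conf->device_lock); */
	return bug;
}
```

Both scripted races now leave a consistent state and the single remaining check stays quiet. This is only a model of the ordering argument; it does not exercise real concurrency.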

    Reported-and-tested-by: Brassow Jonathan <jbrassow@redhat.com>
    Reviewed-by: Shaohua Li <shli@kernel.org>
    Fixes: 566c09c5
    Signed-off-by: NeilBrown <neilb@suse.de>