    md: add checkings before flush md_misc_wq · 21e0958e
    Guoqing Jiang authored
    Coly reported a possible circular locking dependency with LOCKDEP enabled;
    the relevant info from the detailed report [1] is quoted below.
    
    [ 1607.673903] Chain exists of:
    [ 1607.673903]   kn->count#256 --> (wq_completion)md_misc -->
    (work_completion)(&rdev->del_work)
    [ 1607.673903]
    [ 1607.827946]  Possible unsafe locking scenario:
    [ 1607.827946]
    [ 1607.898780]        CPU0                    CPU1
    [ 1607.952980]        ----                    ----
    [ 1608.007173]   lock((work_completion)(&rdev->del_work));
    [ 1608.069690]                                lock((wq_completion)md_misc);
    [ 1608.149887]                                lock((work_completion)(&rdev->del_work));
    [ 1608.242563]   lock(kn->count#256);
    [ 1608.283238]
    [ 1608.283238]  *** DEADLOCK ***
    [ 1608.283238]
    [ 1608.354078] 2 locks held by kworker/5:0/843:
    [ 1608.405152]  #0: ffff8889eecc9948 ((wq_completion)md_misc){+.+.}, at:
    process_one_work+0x42b/0xb30
    [ 1608.512399]  #1: ffff888a1d3b7e10
    ((work_completion)(&rdev->del_work)){+.+.}, at: process_one_work+0x42b/0xb30
    [ 1608.632130]
    
    Both works (rdev->del_work and mddev->del_work) are queued on md_misc_wq,
    so the workqueue's lockdep_map is held whenever either of them is running,
    and both of them try to take the kernfs lock by calling kobject_del.
    Meanwhile, a write to the related sysfs node triggers new_dev_store or
    array_state_store; the write operation acquires the kernfs lock first and
    then needs the same lockdep_map, because both paths eventually call
    flush_workqueue(md_misc_wq). This completes the circular dependency.
    
    To suppress the lockdep warning, we should flush the workqueue only when
    the related work is pending. Since several works are attached to
    md_misc_wq, we need to decide which work each caller should check:
    
    1. For __md_stop_writes, the purpose of flushing the workqueue is to
    ensure the sync thread is started if it was in the process of starting,
    so it is enough to check whether mddev->del_work is pending, since
    md_start_sync is attached to mddev->del_work.
    
    2. __md_stop flushes md_misc_wq to ensure event_work is done, so checking
    event_work is enough. We assume the raid_{ctr,dtr} -> md_stop -> __md_stop
    path does not need the kernfs lock.
    
    3. Both new_dev_store (which holds the kernfs lock) and the ADD_NEW_DISK
    ioctl (which holds bdev->bd_mutex) call flush_workqueue to ensure
    md_delayed_delete has completed; this case will be handled in the next
    patch.
    
    4. md_open flushes the workqueue to ensure the previous md device has
    disappeared, but it holds bdev->bd_mutex and then tries to flush the
    workqueue, so it is better to check mddev->del_work there as well to
    avoid a potential lock issue; this will be done in another patch.
    
    [1]: https://marc.info/?l=linux-raid&m=158518958031584&w=2
    
    Cc: Coly Li <colyli@suse.de>
    Reported-by: Coly Li <colyli@suse.de>
    Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
    Signed-off-by: Song Liu <songliubraving@fb.com>