    btrfs: delay blkdev_put until after the device remove · 3fa421de
    Josef Bacik authored
    When removing the device we call blkdev_put() on the device once we've
    removed it, and because we have an EXCL open we need to take the
    ->open_mutex on the block device to clean it up.  Unfortunately during
    device remove we are holding the sb writers lock, which results in the
    following lockdep splat:
    
    ======================================================
    WARNING: possible circular locking dependency detected
    5.14.0-rc2+ #407 Not tainted
    ------------------------------------------------------
    losetup/11595 is trying to acquire lock:
    ffff973ac35dd138 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0
    
    but task is already holding lock:
    ffff973ac9812c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
    
    which lock already depends on the new lock.
    
    the existing dependency chain (in reverse order) is:
    
    -> #4 (&lo->lo_mutex){+.+.}-{3:3}:
           __mutex_lock+0x7d/0x750
           lo_open+0x28/0x60 [loop]
           blkdev_get_whole+0x25/0xf0
           blkdev_get_by_dev.part.0+0x168/0x3c0
           blkdev_open+0xd2/0xe0
           do_dentry_open+0x161/0x390
           path_openat+0x3cc/0xa20
           do_filp_open+0x96/0x120
           do_sys_openat2+0x7b/0x130
           __x64_sys_openat+0x46/0x70
           do_syscall_64+0x38/0x90
           entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    -> #3 (&disk->open_mutex){+.+.}-{3:3}:
           __mutex_lock+0x7d/0x750
           blkdev_put+0x3a/0x220
           btrfs_rm_device.cold+0x62/0xe5
           btrfs_ioctl+0x2a31/0x2e70
           __x64_sys_ioctl+0x80/0xb0
           do_syscall_64+0x38/0x90
           entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    -> #2 (sb_writers#12){.+.+}-{0:0}:
           lo_write_bvec+0xc2/0x240 [loop]
           loop_process_work+0x238/0xd00 [loop]
           process_one_work+0x26b/0x560
           worker_thread+0x55/0x3c0
           kthread+0x140/0x160
           ret_from_fork+0x1f/0x30
    
    -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
           process_one_work+0x245/0x560
           worker_thread+0x55/0x3c0
           kthread+0x140/0x160
           ret_from_fork+0x1f/0x30
    
    -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
           __lock_acquire+0x10ea/0x1d90
           lock_acquire+0xb5/0x2b0
           flush_workqueue+0x91/0x5e0
           drain_workqueue+0xa0/0x110
           destroy_workqueue+0x36/0x250
           __loop_clr_fd+0x9a/0x660 [loop]
           block_ioctl+0x3f/0x50
           __x64_sys_ioctl+0x80/0xb0
           do_syscall_64+0x38/0x90
           entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    other info that might help us debug this:
    
    Chain exists of:
      (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
    
     Possible unsafe locking scenario:
    
           CPU0                    CPU1
           ----                    ----
      lock(&lo->lo_mutex);
                                   lock(&disk->open_mutex);
                                   lock(&lo->lo_mutex);
      lock((wq_completion)loop0);
    
     *** DEADLOCK ***
    
    1 lock held by losetup/11595:
     #0: ffff973ac9812c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
    
    stack backtrace:
    CPU: 0 PID: 11595 Comm: losetup Not tainted 5.14.0-rc2+ #407
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
    Call Trace:
     dump_stack_lvl+0x57/0x72
     check_noncircular+0xcf/0xf0
     ? stack_trace_save+0x3b/0x50
     __lock_acquire+0x10ea/0x1d90
     lock_acquire+0xb5/0x2b0
     ? flush_workqueue+0x67/0x5e0
     ? lockdep_init_map_type+0x47/0x220
     flush_workqueue+0x91/0x5e0
     ? flush_workqueue+0x67/0x5e0
     ? verify_cpu+0xf0/0x100
     drain_workqueue+0xa0/0x110
     destroy_workqueue+0x36/0x250
     __loop_clr_fd+0x9a/0x660 [loop]
     ? blkdev_ioctl+0x8d/0x2a0
     block_ioctl+0x3f/0x50
     __x64_sys_ioctl+0x80/0xb0
     do_syscall_64+0x38/0x90
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    RIP: 0033:0x7fc21255d4cb
    
    So instead save the bdev and do the put once we've dropped the sb
    writers lock, which avoids the circular locking dependency that
    lockdep reports.
    Reviewed-by: Anand Jain <anand.jain@oracle.com>
    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
volumes.h 16.8 KB