    io_uring: drop ctx->uring_lock before acquiring sqd->lock · 009ad9f0
    Jens Axboe authored
    The SQPOLL thread dictates the lock order, and we hold the ctx->uring_lock
    for all the registration opcodes. We also hold a ref to the ctx, and we
    do drop the lock for other reasons to quiesce, so it's fine to drop the
    ctx lock temporarily to grab the sqd->lock. This fixes the following
    lockdep splat:
    
    ======================================================
    WARNING: possible circular locking dependency detected
    5.14.0-syzkaller #0 Not tainted
    ------------------------------------------------------
    syz-executor.5/25433 is trying to acquire lock:
    ffff888023426870 (&sqd->lock){+.+.}-{3:3}, at: io_register_iowq_max_workers fs/io_uring.c:10551 [inline]
    ffff888023426870 (&sqd->lock){+.+.}-{3:3}, at: __io_uring_register fs/io_uring.c:10757 [inline]
    ffff888023426870 (&sqd->lock){+.+.}-{3:3}, at: __do_sys_io_uring_register+0x10aa/0x2e70 fs/io_uring.c:10792
    
    but task is already holding lock:
    ffff8880885b40a8 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_register+0x2e1/0x2e70 fs/io_uring.c:10791
    
    which lock already depends on the new lock.
    
    the existing dependency chain (in reverse order) is:
    
    -> #1 (&ctx->uring_lock){+.+.}-{3:3}:
           __mutex_lock_common kernel/locking/mutex.c:596 [inline]
           __mutex_lock+0x131/0x12f0 kernel/locking/mutex.c:729
           __io_sq_thread fs/io_uring.c:7291 [inline]
           io_sq_thread+0x65a/0x1370 fs/io_uring.c:7368
           ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
    
    -> #0 (&sqd->lock){+.+.}-{3:3}:
           check_prev_add kernel/locking/lockdep.c:3051 [inline]
           check_prevs_add kernel/locking/lockdep.c:3174 [inline]
           validate_chain kernel/locking/lockdep.c:3789 [inline]
           __lock_acquire+0x2a07/0x54a0 kernel/locking/lockdep.c:5015
           lock_acquire kernel/locking/lockdep.c:5625 [inline]
           lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590
           __mutex_lock_common kernel/locking/mutex.c:596 [inline]
           __mutex_lock+0x131/0x12f0 kernel/locking/mutex.c:729
           io_register_iowq_max_workers fs/io_uring.c:10551 [inline]
           __io_uring_register fs/io_uring.c:10757 [inline]
           __do_sys_io_uring_register+0x10aa/0x2e70 fs/io_uring.c:10792
           do_syscall_x64 arch/x86/entry/common.c:50 [inline]
           do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
           entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    other info that might help us debug this:
    
     Possible unsafe locking scenario:
    
           CPU0                    CPU1
           ----                    ----
      lock(&ctx->uring_lock);
                                   lock(&sqd->lock);
                                   lock(&ctx->uring_lock);
      lock(&sqd->lock);
    
     *** DEADLOCK ***
    
    Fixes: 2e480058 ("io-wq: provide a way to limit max number of workers")
    Reported-by: syzbot+97fa56483f69d677969f@syzkaller.appspotmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
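
The locking pattern described in the message can be sketched in userspace with pthreads. This is only an illustration of the drop-and-reacquire idea, not the kernel patch: `struct sq_data`, `struct ring_ctx`, `poller_iteration()` and `register_max_workers()` are hypothetical stand-ins, and the reference the kernel holds on the ctx (which is what makes dropping the lock safe there) is assumed rather than modelled.

```c
#include <pthread.h>

/* Hypothetical stand-ins for the kernel objects; not the real structs. */
struct sq_data {
    pthread_mutex_t lock;        /* plays the role of sqd->lock */
    int max_workers;
};

struct ring_ctx {
    pthread_mutex_t uring_lock;  /* plays the role of ctx->uring_lock */
    struct sq_data *sqd;
};

/* Poller side: this path dictates the order sqd->lock -> uring_lock. */
void poller_iteration(struct ring_ctx *ctx)
{
    pthread_mutex_lock(&ctx->sqd->lock);
    pthread_mutex_lock(&ctx->uring_lock);
    /* ... submission work would happen here ... */
    pthread_mutex_unlock(&ctx->uring_lock);
    pthread_mutex_unlock(&ctx->sqd->lock);
}

/*
 * Registration side: entered with uring_lock held and returns with it
 * held. Taking sqd->lock directly here would invert the poller's order,
 * so uring_lock is dropped first and both locks are then taken in the
 * poller's order. Returns the previous limit.
 */
int register_max_workers(struct ring_ctx *ctx, int new_max)
{
    int old;

    pthread_mutex_unlock(&ctx->uring_lock);  /* drop the outer lock */
    pthread_mutex_lock(&ctx->sqd->lock);     /* take sqd->lock first */
    pthread_mutex_lock(&ctx->uring_lock);    /* then retake uring_lock */

    old = ctx->sqd->max_workers;
    ctx->sqd->max_workers = new_max;

    pthread_mutex_unlock(&ctx->sqd->lock);   /* caller still holds uring_lock */
    return old;
}
```

A caller would lock `uring_lock`, call `register_max_workers()`, and unlock afterwards. Because the lock is released for a moment, any state read under `uring_lock` before the call has to be revalidated after it; in this sketch nothing is cached across the gap.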