    bonding: avoid possible dead-lock · dddf6a3e
    Mahesh Bandewar authored
    BugLink: https://bugs.launchpad.net/bugs/1801900
    
    [ Upstream commit d4859d74 ]
    
    Syzkaller reported this on a slightly older kernel, but it still
    applies to the current kernel:
    
    ======================================================
    WARNING: possible circular locking dependency detected
    4.18.0-next-20180823+ #46 Not tainted
    ------------------------------------------------------
    syz-executor4/26841 is trying to acquire lock:
    00000000dd41ef48 ((wq_completion)bond_dev->name){+.+.}, at: flush_workqueue+0x2db/0x1e10 kernel/workqueue.c:2652
    
    but task is already holding lock:
    00000000768ab431 (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:77 [inline]
    00000000768ab431 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x412/0xc30 net/core/rtnetlink.c:4708
    
    which lock already depends on the new lock.
    
    the existing dependency chain (in reverse order) is:
    
    -> #2 (rtnl_mutex){+.+.}:
           __mutex_lock_common kernel/locking/mutex.c:925 [inline]
           __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
           mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
           rtnl_lock+0x17/0x20 net/core/rtnetlink.c:77
           bond_netdev_notify drivers/net/bonding/bond_main.c:1310 [inline]
           bond_netdev_notify_work+0x44/0xd0 drivers/net/bonding/bond_main.c:1320
           process_one_work+0xc73/0x1aa0 kernel/workqueue.c:2153
           worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
           kthread+0x35a/0x420 kernel/kthread.c:246
           ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415
    
    -> #1 ((work_completion)(&(&nnw->work)->work)){+.+.}:
           process_one_work+0xc0b/0x1aa0 kernel/workqueue.c:2129
           worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
           kthread+0x35a/0x420 kernel/kthread.c:246
           ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415
    
    -> #0 ((wq_completion)bond_dev->name){+.+.}:
           lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901
           flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655
           drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820
           destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155
           __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138
           bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734
           register_netdevice+0x337/0x1100 net/core/dev.c:8410
           bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453
           rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099
           rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711
           netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454
           rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729
           netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
           netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
           netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908
           sock_sendmsg_nosec net/socket.c:622 [inline]
           sock_sendmsg+0xd5/0x120 net/socket.c:632
           ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115
           __sys_sendmsg+0x11d/0x290 net/socket.c:2153
           __do_sys_sendmsg net/socket.c:2162 [inline]
           __se_sys_sendmsg net/socket.c:2160 [inline]
           __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160
           do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
           entry_SYSCALL_64_after_hwframe+0x49/0xbe
    
    other info that might help us debug this:
    
    Chain exists of:
      (wq_completion)bond_dev->name --> (work_completion)(&(&nnw->work)->work) --> rtnl_mutex
    
     Possible unsafe locking scenario:
    
           CPU0                    CPU1
           ----                    ----
      lock(rtnl_mutex);
                                   lock((work_completion)(&(&nnw->work)->work));
                                   lock(rtnl_mutex);
      lock((wq_completion)bond_dev->name);
    
     *** DEADLOCK ***
    
    1 lock held by syz-executor4/26841:
    
    stack backtrace:
    CPU: 1 PID: 26841 Comm: syz-executor4 Not tainted 4.18.0-next-20180823+ #46
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
     __dump_stack lib/dump_stack.c:77 [inline]
     dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
     print_circular_bug.isra.34.cold.55+0x1bd/0x27d kernel/locking/lockdep.c:1222
     check_prev_add kernel/locking/lockdep.c:1862 [inline]
     check_prevs_add kernel/locking/lockdep.c:1975 [inline]
     validate_chain kernel/locking/lockdep.c:2416 [inline]
     __lock_acquire+0x3449/0x5020 kernel/locking/lockdep.c:3412
     lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901
     flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655
     drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820
     destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155
     __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138
     bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734
     register_netdevice+0x337/0x1100 net/core/dev.c:8410
     bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453
     rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099
     rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711
     netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454
     rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729
     netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
     netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
     netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908
     sock_sendmsg_nosec net/socket.c:622 [inline]
     sock_sendmsg+0xd5/0x120 net/socket.c:632
     ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115
     __sys_sendmsg+0x11d/0x290 net/socket.c:2153
     __do_sys_sendmsg net/socket.c:2162 [inline]
     __se_sys_sendmsg net/socket.c:2160 [inline]
     __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160
     do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
     entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x457089
    Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007f2df20a5c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00007f2df20a66d4 RCX: 0000000000457089
    RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003
    RBP: 0000000000930140 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
    R13: 00000000004d40b8 R14: 00000000004c8ad8 R15: 0000000000000001
    Signed-off-by: Mahesh Bandewar <maheshb@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Juerg Haefliger <juergh@canonical.com>
    Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>