    RDMA/mlx5: Use xa_lock_irq when access to SRQ table · c3d6057e
    Maor Gottlieb authored
    The SRQ table is accessed from both interrupt and process context, so
    process-context accesses must use xa_lock_irq to keep interrupts
    disabled on the local CPU while the lock is held.
    
       inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
       kworker/u17:9/8573   takes:
       ffff8883e3503d30 (&xa->xa_lock#13){?...}-{2:2}, at: mlx5_cmd_get_srq+0x18/0x70 [mlx5_ib]
       {IN-HARDIRQ-W} state was registered at:
         lock_acquire+0xb9/0x3a0
         _raw_spin_lock+0x25/0x30
         srq_event_notifier+0x2b/0xc0 [mlx5_ib]
         notifier_call_chain+0x45/0x70
         __atomic_notifier_call_chain+0x69/0x100
         forward_event+0x36/0xc0 [mlx5_core]
         notifier_call_chain+0x45/0x70
         __atomic_notifier_call_chain+0x69/0x100
         mlx5_eq_async_int+0xc5/0x160 [mlx5_core]
         notifier_call_chain+0x45/0x70
         __atomic_notifier_call_chain+0x69/0x100
         mlx5_irq_int_handler+0x19/0x30 [mlx5_core]
         __handle_irq_event_percpu+0x43/0x2a0
         handle_irq_event_percpu+0x30/0x70
         handle_irq_event+0x34/0x60
         handle_edge_irq+0x7c/0x1b0
         do_IRQ+0x60/0x110
         ret_from_intr+0x0/0x2a
         default_idle+0x34/0x160
         do_idle+0x1ec/0x220
         cpu_startup_entry+0x19/0x20
         start_secondary+0x153/0x1a0
         secondary_startup_64+0xa4/0xb0
       irq event stamp: 20907
       hardirqs last  enabled at (20907):   _raw_spin_unlock_irq+0x24/0x30
       hardirqs last disabled at (20906):   _raw_spin_lock_irq+0xf/0x40
       softirqs last  enabled at (20746):   __do_softirq+0x2c9/0x436
       softirqs last disabled at (20681):   irq_exit+0xb3/0xc0
    
       other info that might help us debug this:
        Possible unsafe locking scenario:
    
              CPU0
              ----
         lock(&xa->xa_lock#13);
         <Interrupt>
           lock(&xa->xa_lock#13);
    
        *** DEADLOCK ***
    
       2 locks held by kworker/u17:9/8573:
        #0: ffff888295218d38 ((wq_completion)mlx5_ib_page_fault){+.+.}-{0:0}, at: process_one_work+0x1f1/0x5f0
        #1: ffff888401647e78 ((work_completion)(&pfault->work)){+.+.}-{0:0}, at: process_one_work+0x1f1/0x5f0
    
       stack backtrace:
       CPU: 0 PID: 8573 Comm: kworker/u17:9 Tainted: GO      5.7.0_for_upstream_min_debug_2020_06_14_11_31_46_41 #1
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
       Workqueue: mlx5_ib_page_fault mlx5_ib_eqe_pf_action [mlx5_ib]
       Call Trace:
        dump_stack+0x71/0x9b
        mark_lock+0x4f2/0x590
        ? print_shortest_lock_dependencies+0x200/0x200
        __lock_acquire+0xa00/0x1eb0
        lock_acquire+0xb9/0x3a0
        ? mlx5_cmd_get_srq+0x18/0x70 [mlx5_ib]
        _raw_spin_lock+0x25/0x30
        ? mlx5_cmd_get_srq+0x18/0x70 [mlx5_ib]
        mlx5_cmd_get_srq+0x18/0x70 [mlx5_ib]
        mlx5_ib_eqe_pf_action+0x257/0xa30 [mlx5_ib]
        ? process_one_work+0x209/0x5f0
        process_one_work+0x27b/0x5f0
        ? __schedule+0x280/0x7e0
        worker_thread+0x2d/0x3c0
        ? process_one_work+0x5f0/0x5f0
        kthread+0x111/0x130
        ? kthread_park+0x90/0x90
        ret_from_fork+0x24/0x30
    
    Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
    Link: https://lore.kernel.org/r/20200712102641.15210-1-leon@kernel.org
    Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>