• Maor Gottlieb's avatar
    RDMA/mlx5: Add missing srcu_read_lock in ODP implicit flow · d4d7f596
    Maor Gottlieb authored
    According to the locking scheme, mlx5_ib_update_xlt() should be called
    with srcu_read_lock(dev->odp->srcu). Prefetch missed this. This fixes the
    below WARN from lockdep_assert_held():
    
      WARNING: CPU: 1 PID: 1130 at drivers/infiniband/hw/mlx5/odp.c:132 mlx5_odp_populate_xlt+0x175/0x180 [mlx5_ib]
      Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter overlay ib_srp scsi_transport_srp rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core mlx5_core mlxfw ptp pps_core
      CPU: 1 PID: 1130 Comm: kworker/u16:11 Tainted: G        W 5.8.0-rc5_for_upstream_debug_2020_07_13_11_04 #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
      Workqueue: events_unbound mlx5_ib_prefetch_mr_work [mlx5_ib]
      RIP: 0010:mlx5_odp_populate_xlt+0x175/0x180 [mlx5_ib]
      Code: 08 e2 85 c0 0f 84 65 ff ff ff 49 8b 87 60 01 00 00 be ff ff ff ff 48 8d b8 b0 39 00 00 e8 93 e0 50 e1 85 c0 0f 85 45 ff ff ff <0f> 0b e9 3e ff ff ff 0f 0b eb c7 0f 1f 44 00 00 48 8b 87 98 0f 00
      RSP: 0018:ffff88840f44fc68 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff88840cc9d000 RCX: ffff88840efcd940
      RDX: 0000000000000000 RSI: ffff88844871b9b0 RDI: ffff88840efce100
      RBP: ffff88840cc9d040 R08: 0000000000000040 R09: 0000000000000001
      R10: ffff88846ced3068 R11: 0000000000000000 R12: 00000000000156ec
      R13: 0000000000000004 R14: 0000000000000004 R15: ffff888439941000
      FS:  0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f8536d12430 CR3: 0000000437a5e006 CR4: 0000000000360ea0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       mlx5_ib_update_xlt+0x37c/0x7c0 [mlx5_ib]
       pagefault_mr+0x315/0x440 [mlx5_ib]
       mlx5_ib_prefetch_mr_work+0x56/0xa0 [mlx5_ib]
       process_one_work+0x215/0x5c0
       worker_thread+0x3c/0x380
       ? process_one_work+0x5c0/0x5c0
       kthread+0x133/0x150
       ? kthread_park+0x90/0x90
       ret_from_fork+0x1f/0x30
    
    Hold the SRCU during prefetch, even though it strictly isn't needed since
    prefetch is holding the num_deferred_work it does make it easier to reason
    about.
    
    Fixes: 5256edcb ("RDMA/mlx5: Rework implicit ODP destroy")
    Link: https://lore.kernel.org/r/20200719065747.131157-1-leon@kernel.orgSigned-off-by: default avatarMaor Gottlieb <maorg@mellanox.com>
    Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
    Signed-off-by: default avatarJason Gunthorpe <jgg@nvidia.com>
    d4d7f596
verbs.c 77.8 KB