    RDMA/mlx5: Fix access to wrong pointer while performing flush due to error · 950bf4f1
    Leon Romanovsky authored
    The main difference between send and receive SW completions is how each
    one walks its work queue (WQ). For receive completions, the initial index
    to be flushed is stored in "tail", while for send completions it is
    stored in "last_poll", which was deleted.
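
    The distinction can be illustrated with a minimal, self-contained sketch.
    It is not the driver's code: struct sketch_wq, sketch_flush(), the "next"
    array and the queue size are hypothetical names chosen for illustration,
    and the model assumes that a send WQE may span several slots, so its
    entry count ("tail") and its starting slot ("last_poll") can diverge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_WQE_CNT 8u /* hypothetical queue size, must be a power of two */

/* Hypothetical, simplified work queue; not the layout used by the driver. */
struct sketch_wq {
	uint64_t wrid[SKETCH_WQE_CNT];     /* work request ID stored per slot */
	unsigned int next[SKETCH_WQE_CNT]; /* send side: slot of the following WQE */
	unsigned int head;                 /* number of entries posted */
	unsigned int tail;                 /* number of entries completed */
	unsigned int last_poll;            /* send side: slot to flush next */
};

/*
 * Flush up to "max" outstanding entries and copy their work request IDs
 * into "out". Receive queues start from "tail"; send queues start from
 * "last_poll" and follow the "next" chain. Returns the number flushed.
 */
static unsigned int sketch_flush(struct sketch_wq *wq, bool is_send,
				 uint64_t *out, unsigned int max)
{
	unsigned int pending = wq->head - wq->tail;
	unsigned int n = 0;

	/* The index mask below only works for power-of-two sizes. */
	assert((SKETCH_WQE_CNT & (SKETCH_WQE_CNT - 1)) == 0);

	while (n < pending && n < max) {
		unsigned int idx = is_send ? wq->last_poll : wq->tail;

		idx &= SKETCH_WQE_CNT - 1;
		out[n++] = wq->wrid[idx];
		wq->tail++;
		if (is_send)
			wq->last_poll = wq->next[idx];
	}
	return n;
}
```

    In this model, a send queue holding two entries that start at slots 0
    and 3 is flushed correctly only when the walk begins at "last_poll";
    starting from "tail" would read slots 0 and 1, and slot 1 holds no
    valid work request ID.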
    
      CPU: 54 PID: 53405 Comm: kworker/u161:0 Kdump: loaded Tainted: G           OE    --------- -t - 4.18.0-147.el8.ppc64le #1
      Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
      NIP:  c000003c7c00a000 LR: c00800000e586af4 CTR: c000003c7c00a000
      REGS: c0000036cc9db940 TRAP: 0400   Tainted: G           OE    --------- -t -  (4.18.0-147.el8.ppc64le)
      MSR:  9000000010009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24004488  XER: 20040000
      CFAR: c00800000e586af0 IRQMASK: 0
      GPR00: c00800000e586ab4 c0000036cc9dbbc0 c00800000e5f1a00 c0000037d8433800
      GPR04: c000003895a26800 c0000037293f2000 0000000000000201 0000000000000011
      GPR08: c000003895a26c80 c000003c7c00a000 0000000000000000 c00800000ed30438
      GPR12: c000003c7c00a000 c000003fff684b80 c00000000017c388 c00000396ec4be40
      GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      GPR20: c00000000151e498 0000000000000010 c000003895a26848 0000000000000010
      GPR24: 0000000000000010 0000000000010000 c000003895a26800 0000000000000000
      GPR28: 0000000000000010 c0000037d8433800 c000003895a26c80 c000003895a26800
      NIP [c000003c7c00a000] 0xc000003c7c00a000
      LR [c00800000e586af4] __ib_process_cq+0xec/0x1b0 [ib_core]
      Call Trace:
      [c0000036cc9dbbc0] [c00800000e586ab4] __ib_process_cq+0xac/0x1b0 [ib_core] (unreliable)
      [c0000036cc9dbc40] [c00800000e586c88] ib_cq_poll_work+0x40/0xb0 [ib_core]
      [c0000036cc9dbc70] [c000000000171f44] process_one_work+0x2f4/0x5c0
      [c0000036cc9dbd10] [c000000000172a0c] worker_thread+0xcc/0x760
      [c0000036cc9dbdc0] [c00000000017c52c] kthread+0x1ac/0x1c0
      [c0000036cc9dbe30] [c00000000000b75c] ret_from_kernel_thread+0x5c/0x80
    
    Fixes: 8e3b6883 ("RDMA/mlx5: Delete unreachable handle_atomic code by simplifying SW completion")
    Link: https://lore.kernel.org/r/20200318091640.44069-1-leon@kernel.org
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>