    RDMA/bnxt_re: Prevent handling any completions after qp destroy · b5bbc655
    Kashyap Desai authored
    HW may generate completions that indicate the QP is destroyed.
    The driver should not schedule any more completion handlers
    for this QP after the QP is destroyed. Since CQs are active
    during QP destroy, the driver may still schedule completion
    handlers. This can cause a race where destroy_cq and poll_cq
    run simultaneously.
    
    Snippet of a kernel panic seen while loading and unloading the
    bnxt_re driver in a loop. This indicates a poll after the CQ is freed.
    
    [77786.481636] Call Trace:
    [77786.481640]  <TASK>
    [77786.481644]  bnxt_re_poll_cq+0x14a/0x620 [bnxt_re]
    [77786.481658]  ? kvm_clock_read+0x14/0x30
    [77786.481693]  __ib_process_cq+0x57/0x190 [ib_core]
    [77786.481728]  ib_cq_poll_work+0x26/0x80 [ib_core]
    [77786.481761]  process_one_work+0x1e5/0x3f0
    [77786.481768]  worker_thread+0x50/0x3a0
    [77786.481785]  ? __pfx_worker_thread+0x10/0x10
    [77786.481790]  kthread+0xe2/0x110
    [77786.481794]  ? __pfx_kthread+0x10/0x10
    [77786.481797]  ret_from_fork+0x2c/0x50
    
    To avoid this, complete all completion handlers before returning
    from destroy QP. If free_cq is called soon after destroy_qp, the IB
    stack will cancel the CQ work before invoking the destroy_cq verb,
    which prevents the race described above.
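
The ordering argument can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions, not the driver code: every name in it (`model_qp`, `model_cq`, `model_destroy_qp`, and so on) is hypothetical, and the real fix relies on kernel synchronization rather than a counter. It shows only why draining pending completion handlers inside destroy QP leaves nothing to poll once the CQ is freed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; these are NOT the bnxt_re structures. */
struct model_cq {
    bool freed;            /* set by model_destroy_cq() */
    int  polls;            /* polls that ran while the CQ was alive */
    int  polls_after_free; /* would be a use-after-free in a real driver */
};

struct model_qp {
    struct model_cq *cq;
    bool destroyed;
    int  pending_handlers; /* handlers scheduled but not yet run */
};

/* A completion arrived: schedule a handler unless the QP is already gone. */
static bool model_schedule_handler(struct model_qp *qp)
{
    if (qp->destroyed)
        return false;
    qp->pending_handlers++;
    return true;
}

/* Run one pending handler, which polls the CQ. */
static void model_run_handler(struct model_qp *qp)
{
    if (qp->pending_handlers == 0)
        return;
    qp->pending_handlers--;
    if (qp->cq->freed)
        qp->cq->polls_after_free++;
    else
        qp->cq->polls++;
}

/* Destroy the QP; with drain set, finish all pending handlers first. */
static void model_destroy_qp(struct model_qp *qp, bool drain)
{
    qp->destroyed = true;
    while (drain && qp->pending_handlers > 0)
        model_run_handler(qp);
}

static void model_destroy_cq(struct model_cq *cq)
{
    cq->freed = true;
}

/* Tear down a QP and its CQ; return how many polls hit the freed CQ. */
static int model_teardown(bool drain)
{
    struct model_cq cq = { 0 };
    struct model_qp qp = { .cq = &cq };

    model_schedule_handler(&qp);   /* two completions in flight */
    model_schedule_handler(&qp);
    model_destroy_qp(&qp, drain);
    model_schedule_handler(&qp);   /* late completion: rejected */
    model_destroy_cq(&cq);
    while (qp.pending_handlers > 0) /* a worker that runs late */
        model_run_handler(&qp);
    return cq.polls_after_free;
}
```

In this model, `model_teardown(true)` finishes with no poll touching the freed CQ, while `model_teardown(false)` leaves both in-flight handlers to run after the free, which corresponds to the poll-after-free seen in the trace.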
    
    Fixes: 1ac5a404 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
    Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
    Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
    Link: https://lore.kernel.org/r/1689322969-25402-2-git-send-email-selvin.xavier@broadcom.com
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
qplib_fp.h 17.4 KB