    IB/mlx5: Fix initializing CQ fragments buffer · 2ba0aa2f
    Alaa Hleihel authored
    The function init_cq_frag_buf() can be called to initialize the current CQ
    fragments buffer cq->buf, or the temporary cq->resize_buf that is filled
    during CQ resize operation.
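
    In a simplified model where the CQEs are spread over equally sized
    fragments, locating CQE n means choosing a fragment and an offset
    inside it. The C sketch below models only that index arithmetic. It
    assumes that both the entry size and the number of entries per
    fragment are powers of two. struct frag_model and
    frag_model_get_entry() are hypothetical names, not the mlx5 driver's
    structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not the mlx5 structures: nent entries of
 * (1 << log_stride) bytes each, spread over fragments that each hold
 * (1 << log_frag_strides) entries. */
struct frag_model {
	uint8_t **frags;               /* base pointer of each fragment */
	unsigned int log_stride;       /* log2(entry size in bytes) */
	unsigned int log_frag_strides; /* log2(entries per fragment) */
	uint32_t nent;                 /* total number of entries */
};

/* Map entry index ix to its address: the high bits of ix pick the
 * fragment, and the low bits pick the slot within that fragment. */
static void *frag_model_get_entry(const struct frag_model *fb, uint32_t ix)
{
	uint32_t frag = ix >> fb->log_frag_strides;
	uint32_t slot = ix & ((1u << fb->log_frag_strides) - 1);

	assert(ix < fb->nent);
	return fb->frags[frag] + ((size_t)slot << fb->log_stride);
}
```

    With 64-byte entries (log_stride = 6) and four entries per fragment
    (log_frag_strides = 2), entry 5 lands in the second fragment at byte
    offset 64.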
    
    However, the offending commit started using get_cqe() to get the CQEs.
    The issue with this change is that get_cqe() always returns CQEs from
    cq->buf, so init_cq_frag_buf() initializes the wrong buffer when it is
    called for cq->resize_buf. When enlarging the CQ, it also accesses
    elements beyond the size of the current cq->buf, which eventually
    triggers a kernel panic:
    
     [exception RIP: init_cq_frag_buf+103]
      [ffff9f799ddcbcd8] mlx5_ib_resize_cq at ffffffffc0835d60 [mlx5_ib]
      [ffff9f799ddcbdb0] ib_resize_cq at ffffffffc05270df [ib_core]
      [ffff9f799ddcbdc0] llt_rdma_setup_qp at ffffffffc0a6a712 [llt]
      [ffff9f799ddcbe10] llt_rdma_cc_event_action at ffffffffc0a6b411 [llt]
      [ffff9f799ddcbe98] llt_rdma_client_conn_thread at ffffffffc0a6bb75 [llt]
      [ffff9f799ddcbec8] kthread at ffffffffa66c5da1
      [ffff9f799ddcbf50] ret_from_fork_nospec_begin at ffffffffa6d95ddd
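
    To make the failure mode concrete, here is a simplified userspace
    model. It uses flat arrays instead of fragments and stores one
    ownership byte per CQE. The names model_cq, model_get_cqe(),
    model_init_buggy() and MODEL_OWNER_INVALID are hypothetical. As with
    get_cqe(), the lookup is tied to cq->buf, so initializing
    cq->resize_buf writes to the wrong buffer. Where the driver read past
    the end of cq->buf, this model counts the out-of-range indices
    instead of overrunning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_OWNER_INVALID 0xf0 /* hypothetical "invalid CQE" marker */

struct model_buf {
	uint8_t *op_own; /* one ownership byte per entry (simplified CQE) */
	uint32_t nent;   /* number of entries */
};

struct model_cq {
	struct model_buf buf;        /* current CQ buffer */
	struct model_buf resize_buf; /* buffer being prepared by a resize */
};

/* Mirrors the bug: the lookup always uses cq->buf, whatever buffer the
 * caller is initializing. Returns NULL instead of reading past the end. */
static uint8_t *model_get_cqe(struct model_cq *cq, uint32_t n)
{
	if (n >= cq->buf.nent)
		return NULL;
	return &cq->buf.op_own[n];
}

/* Buggy init: iterates over target->nent entries but fetches them from
 * cq->buf. Returns how many indices fell outside cq->buf. */
static uint32_t model_init_buggy(struct model_cq *cq, struct model_buf *target)
{
	uint32_t i, out_of_range = 0;

	for (i = 0; i < target->nent; i++) {
		uint8_t *cqe = model_get_cqe(cq, i);

		if (!cqe) {
			out_of_range++;
			continue;
		}
		*cqe = MODEL_OWNER_INVALID; /* lands in cq->buf, not target */
	}
	return out_of_range;
}
```

    When the routine is called for an 8-entry resize_buf while cq->buf
    holds 4 entries, it stamps cq->buf instead and reports 4 out-of-range
    indices. resize_buf is left uninitialized.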
    
    Fix it by getting the needed CQE with mlx5_frag_buf_get_wqe(), which
    takes the correct source buffer as a parameter.
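
    The shape of the fix can be sketched with a simplified userspace model
    that uses hypothetical names, flat arrays and one ownership byte per
    CQE. The lookup takes the buffer to read from, in the spirit of
    mlx5_frag_buf_get_wqe() taking the source buffer as a parameter. As a
    result, initializing resize_buf touches only resize_buf. The signature
    of model_buf_get_entry() is an assumption, not the mlx5 API.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_OWNER_INVALID 0xf0 /* hypothetical "invalid CQE" marker */

struct model_buf {
	uint8_t *op_own; /* one ownership byte per entry (simplified CQE) */
	uint32_t nent;   /* number of entries */
};

/* Fixed lookup: the caller names the buffer explicitly. */
static uint8_t *model_buf_get_entry(struct model_buf *b, uint32_t ix)
{
	assert(ix < b->nent);
	return &b->op_own[ix];
}

/* Fixed init: every entry comes from the buffer being initialized. */
static void model_init_fixed(struct model_buf *target)
{
	for (uint32_t i = 0; i < target->nent; i++)
		*model_buf_get_entry(target, i) = MODEL_OWNER_INVALID;
}
```

    Because the target buffer is the only buffer the loop sees, the loop
    cannot index past the target's own nent, whatever the size of the
    current CQ buffer.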
    
    Fixes: 388ca8be ("IB/mlx5: Implement fragmented completion queue (CQ)")
    Link: https://lore.kernel.org/r/90a0e8c924093cfa50a482880ad7e7edb73dc19a.1623309971.git.leonro@nvidia.com
    Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>