    IB/mlx5: Fetch soft WQE's on fatal error state · 7b74a83c
    Erez Shitrit authored
    On fatal error, the driver simulates CQEs for ULPs that rely on
    completion of all their posted work requests.
    
    For GSI traffic, mlx5 has its own mechanism that delivers
    completions via software CQEs directly to the relevant CQ.
    
    This mechanism must keep working in the fatal error state too, so the
    driver should simulate such CQEs with the specified error state in order
    to complete GSI QP work requests.
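
The idea above can be sketched in plain C. This is a simplified user-space model, not the driver code: `struct sim_cq`, `struct soft_wc`, `poll_soft`, `poll_simulated` and `sim_poll_cq` are hypothetical names introduced only for illustration, and the singly linked list stands in for whatever queue the driver uses to hold software completions. The sketch assumes that, in the fatal error state, pending software completions are drained first and reported with a flush-error status, and that simulated completions for regular QPs fill the remaining slots.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
enum wc_status { WC_SUCCESS, WC_WR_FLUSH_ERR };

struct soft_wc {
    unsigned long wr_id;     /* identifies the posted work request */
    enum wc_status status;   /* completion status reported to the ULP */
    struct soft_wc *next;    /* next pending software completion */
};

struct sim_cq {
    struct soft_wc *soft_head; /* pending software completions (e.g. GSI) */
    int fatal;                 /* nonzero once the device hit a fatal error */
};

/*
 * Move up to max pending software completions into out[].
 * If is_fatal_err is set, every entry is reported as flushed in error,
 * so the owner of the work request still sees it complete.
 */
static int poll_soft(struct sim_cq *cq, struct soft_wc *out, int max,
                     int is_fatal_err)
{
    int n = 0;

    while (cq->soft_head && n < max) {
        struct soft_wc *e = cq->soft_head;

        cq->soft_head = e->next;
        out[n] = *e;
        out[n].next = NULL;
        if (is_fatal_err)
            out[n].status = WC_WR_FLUSH_ERR;
        n++;
    }
    return n;
}

/* Placeholder for simulated completions of regular QPs; none in this model. */
static int poll_simulated(struct sim_cq *cq, struct soft_wc *out, int max)
{
    (void)cq;
    (void)out;
    (void)max;
    return 0;
}

/* Poll entry point: in fatal error, soft completions are drained first. */
static int sim_poll_cq(struct sim_cq *cq, struct soft_wc *out, int max)
{
    int n;

    if (cq->fatal) {
        n = poll_soft(cq, out, max, 1);
        n += poll_simulated(cq, out + n, max - n);
        return n;
    }

    /* Normal path: soft completions keep their original status. */
    n = poll_soft(cq, out, max, 0);
    /* A real driver would continue with hardware CQEs here. */
    return n;
}
```

In this model, a caller waiting for all of its posted work requests to complete makes progress even after the fatal error, because every queued software completion is still handed back, only with an error status instead of success.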
    
    Without this fix, the following deadlock might appear:
            schedule_timeout+0x274/0x350
            wait_for_common+0xec/0x240
            mcast_remove_one+0xd0/0x120 [ib_core]
            ib_unregister_device+0x12c/0x230 [ib_core]
            mlx5_ib_remove+0xc4/0x270 [mlx5_ib]
            mlx5_detach_device+0x184/0x1a0 [mlx5_core]
            mlx5_unload_one+0x308/0x340 [mlx5_core]
            mlx5_pci_err_detected+0x74/0xe0 [mlx5_core]
    
    Cc: <stable@vger.kernel.org> # 4.7
    Fixes: 89ea94a7 ("IB/mlx5: Reset flow support for IB kernel ULPs")
    Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
cq.c 35.9 KB