    nvme-rdma: Fix command completion race at error recovery · c947657b
    Israel Rukshin authored
    The race is between the error recovery work completing a request and
    the RDMA completion handler.  If we cancel the request before its good
    RDMA completion arrives, we get a NULL dereference of the request MR in
    nvme_rdma_process_nvme_rsp().
    
    When canceling a request we return its MR to the MR pool (setting mr to
    NULL) and also unmap its data.  Canceling requests while the RDMA
    queues are active is not safe: good RDMA completions can still arrive
    and use the mr pointer, which may already be NULL.  Completing the
    request too soon may also lead to DMA to/from user buffers that have
    already been unmapped.
    
    This commit fixes the race by draining the QP before starting the
    command abort mechanism.
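
The ordering problem described above can be illustrated with a small userspace model. This is only a sketch, not the driver code: `struct mr`, `struct request`, `struct queue`, `cancel_request()`, `process_rsp()`, `deliver_pending()` and `recover()` are all hypothetical names invented for the illustration, and "draining" is modelled simply as delivering every pending completion before any request is canceled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's objects (not the kernel types). */
struct mr { int key; };

struct request {
    struct mr *mr;   /* NULL once the MR has been returned to the pool */
    bool mapped;     /* true while the data buffers are still mapped */
    bool done;       /* request has been completed */
};

struct queue {
    struct request *pending[4]; /* good completions not yet delivered */
    size_t n_pending;
};

/* Cancel path: give the MR back (mr = NULL) and unmap the data. */
static void cancel_request(struct request *rq)
{
    rq->mr = NULL;
    rq->mapped = false;
    rq->done = true;
}

/*
 * Completion path: needs a valid MR. Returns false where the real
 * handler would have dereferenced a NULL mr pointer.
 */
static bool process_rsp(struct request *rq)
{
    if (rq->mr == NULL)
        return false;
    rq->mr = NULL;
    rq->mapped = false;
    rq->done = true;
    return true;
}

/* Deliver every pending completion; count the unsafe MR accesses. */
static int deliver_pending(struct queue *q)
{
    int unsafe = 0;

    for (size_t i = 0; i < q->n_pending; i++)
        if (!process_rsp(q->pending[i]))
            unsafe++;
    q->n_pending = 0;
    return unsafe;
}

/* Model of error recovery; returns the number of unsafe MR accesses. */
static int recover(bool drain_first)
{
    struct mr mr = { .key = 1 };
    struct request rq = { .mr = &mr, .mapped = true, .done = false };
    struct queue q = { .pending = { &rq }, .n_pending = 1 };
    int unsafe = 0;

    if (drain_first)
        unsafe += deliver_pending(&q); /* drain before canceling */

    if (!rq.done)
        cancel_request(&rq);

    /* Without a drain, the good completion arrives after the cancel. */
    unsafe += deliver_pending(&q);
    return unsafe;
}
```

In this model `recover(false)` reports one unsafe MR access (the completion runs after the cancel), while `recover(true)` reports none, because no completion can arrive once the queue has been drained.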
    Signed-off-by: Israel Rukshin <israelr@mellanox.com>
    Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
    Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
rdma.c 52.6 KB