• Chuck Lever's avatar
    xprtrdma: Fix occasional transport deadlock · 05eb06d8
    Chuck Lever authored
    Under high I/O workloads, I've noticed that an RPC/RDMA transport
    occasionally deadlocks (IOPS goes to zero, and doesn't recover).
    Diagnosis shows that the sendctx queue is empty, but when sendctxs
    are returned to the queue, the xprt_write_space wake-up never
    occurs. The wake-up logic in rpcrdma_sendctx_put_locked is racy.
    
    I noticed that both EMPTY_SCQ and XPRT_WRITE_SPACE are implemented
    via an atomic bit. Just one of those is sufficient. Removing
    EMPTY_SCQ in favor of the generic bit mechanism makes the deadlock
    un-reproducible.
    
    Without EMPTY_SCQ, rpcrdma_buffer::rb_flags is no longer used and
    is therefore removed.
    
    Unfortunately this patch does not apply cleanly to stable. If
    needed, someone will have to port it and test it.
    
    Fixes: 2fad6592 ("xprtrdma: Wait on empty sendctx queue")
    Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
    Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
    05eb06d8
frwr_ops.c 16.8 KB