Commit 8d75483a authored by Chuck Lever, committed by Anna Schumaker

xprtrdma: Fix FRWR invalidation error recovery

When ib_post_send() fails, every LOCAL_INV WR from @bad_wr onward
has to be examined, and its MR reset by hand.
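The @bad_wr contract can be modeled in user space. This is a minimal sketch, not the verbs API: `struct mock_wr`, `mock_post_send()` and its `fail_at` parameter are hypothetical stand-ins for `struct ib_send_wr` and ib_post_send(). WRs ahead of *bad_wr were handed to the provider and will complete on their own. *bad_wr and everything after it were never posted, so only those MRs need a manual reset.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ib_send_wr: only the chain link matters. */
struct mock_wr {
	struct mock_wr *next;
	int posted;	/* set when the mock "provider" accepts the WR */
	int reset;	/* set when recovery resets the MR by hand */
};

/*
 * Hypothetical mock of ib_post_send(): accepts WRs until it reaches
 * index @fail_at, then stops, stores the first rejected WR in *bad_wr,
 * and returns an error. A negative @fail_at accepts the whole chain.
 */
static int mock_post_send(struct mock_wr *first, int fail_at,
			  struct mock_wr **bad_wr)
{
	struct mock_wr *wr;
	int i = 0;

	for (wr = first; wr; wr = wr->next, i++) {
		if (i == fail_at) {
			*bad_wr = wr;
			return -ENOMEM;
		}
		wr->posted = 1;
	}
	return 0;
}

/* Reset every WR from @bad_wr to the end of the chain; return the count. */
static int reset_unposted(struct mock_wr *bad_wr)
{
	int n = 0;

	while (bad_wr) {
		bad_wr->reset = 1;
		n++;
		bad_wr = bad_wr->next;
	}
	return n;
}
```

For a chain of four WRs that fails at index 2, the first two are marked posted, *bad_wr points at the third, and reset_unposted() touches exactly the last two.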

I'm not sure how the existing code can work by matching MRs to WRs
through R_key comparisons. Restructure the logic so that it instead
walks the chain of WRs, starting from the first bad one.
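The new loop can go straight from a WR to its MR because the LOCAL_INV WR is embedded in struct rpcrdma_frmr (as fr_invwr), which is itself embedded in struct rpcrdma_mw (as frmr). Here is a user-space sketch of that walk under simplified, hypothetical struct layouts: only the embedding mirrors the patch, `reset_mr()` is an illustrative stand-in for __frwr_reset_mr(), and `container_of` is rebuilt from offsetof() without the kernel macro's type checking.

```c
#include <stddef.h>

/* User-space equivalent of the kernel's container_of(), minus its type check. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified, hypothetical layouts; only the nesting mirrors the patch. */
struct send_wr {
	struct send_wr *next;
};

struct frmr {
	struct send_wr fr_invwr;	/* LOCAL_INV WR embedded in the FRMR */
};

struct mw {
	int id;				/* illustrative payload */
	int was_reset;
	struct frmr frmr;		/* FRMR embedded in the MW */
};

/* Hypothetical stand-in for __frwr_reset_mr(): records the reset. */
static void reset_mr(struct mw *mw)
{
	mw->was_reset = 1;
}

/* Walk the unposted tail of the chain, mapping each WR back to its MW. */
static int reset_from_bad_wr(struct send_wr *bad_wr)
{
	int n = 0;

	while (bad_wr) {
		struct frmr *f = container_of(bad_wr, struct frmr, fr_invwr);
		struct mw *mw = container_of(f, struct mw, frmr);

		reset_mr(mw);
		n++;
		bad_wr = bad_wr->next;
	}
	return n;
}
```

Unlike a scan of the whole MW list, this touches only the WRs that were actually rejected, and it needs no key comparison to decide which MW owns each WR.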

Make sure to wait for completion if at least one WR was actually
posted. Otherwise, when ib_post_send() fails, we can end up
DMA-unmapping the MRs while LOCAL_INV operations are still in flight.
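The wait decision reduces to one pointer comparison. This sketch assumes, as the patch does by setting bad_wr to NULL before posting, that the provider writes bad_wr only on failure. `must_wait_for_linv()`, `unmap_sync_actions()` and the `ACT_*` flags are hypothetical names; the statement order in `unmap_sync_actions()` mirrors the patch (wait, then reset, then unmap). When bad_wr == first, nothing was posted, so no completion can arrive to wake a waiter.

```c
#include <stdbool.h>
#include <stddef.h>

struct send_wr {
	struct send_wr *next;
};

/*
 *   bad_wr == NULL       -> whole chain posted   -> wait
 *   bad_wr == later WR   -> a prefix was posted  -> wait
 *   bad_wr == first      -> nothing was posted   -> don't wait
 */
static bool must_wait_for_linv(const struct send_wr *first,
			       const struct send_wr *bad_wr)
{
	return bad_wr != first;
}

/* Hypothetical action flags for the three phases of unmap_sync. */
enum { ACT_WAIT = 1, ACT_RESET = 2, ACT_UNMAP = 4 };

static int unmap_sync_actions(int rc, const struct send_wr *first,
			      const struct send_wr *bad_wr)
{
	int actions = 0;

	if (must_wait_for_linv(first, bad_wr))
		actions |= ACT_WAIT;	/* some LOCAL_INV WRs may be in flight */
	if (rc)
		actions |= ACT_RESET;	/* reset MRs from bad_wr onward */
	actions |= ACT_UNMAP;		/* DMA unmap only after the steps above */
	return actions;
}
```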

Commit 7a89f9c6 ("xprtrdma: Honor ->send_request API contract")
added the rdma_disconnect() call site. The disconnect actually
causes more problems than it solves, and SQ overruns happen only as
a result of software bugs. So remove it.

Fixes: d7a21c1b ("xprtrdma: Reset MRs in frwr_op_unmap_sync()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
parent 431af645
@@ -521,12 +521,13 @@ frwr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct list_head *mws)
 	 * unless ri_id->qp is a valid pointer.
 	 */
 	r_xprt->rx_stats.local_inv_needed++;
+	bad_wr = NULL;
 	rc = ib_post_send(ia->ri_id->qp, first, &bad_wr);
+	if (bad_wr != first)
+		wait_for_completion(&f->fr_linv_done);
 	if (rc)
 		goto reset_mrs;
-	wait_for_completion(&f->fr_linv_done);
 	/* ORDER: Now DMA unmap all of the MRs, and return
 	 * them to the free MW list.
 	 */
@@ -543,17 +544,19 @@ frwr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct list_head *mws)
 reset_mrs:
 	pr_err("rpcrdma: FRMR invalidate ib_post_send returned %i\n", rc);
-	rdma_disconnect(ia->ri_id);
 	/* Find and reset the MRs in the LOCAL_INV WRs that did not
-	 * get posted. This is synchronous, and slow.
+	 * get posted.
 	 */
-	list_for_each_entry(mw, mws, mw_list) {
-		f = &mw->frmr;
-		if (mw->mw_handle == bad_wr->ex.invalidate_rkey) {
-			__frwr_reset_mr(ia, mw);
-			bad_wr = bad_wr->next;
-		}
+	rpcrdma_init_cqcount(&r_xprt->rx_ep, -count);
+	while (bad_wr) {
+		f = container_of(bad_wr, struct rpcrdma_frmr,
+				 fr_invwr);
+		mw = container_of(f, struct rpcrdma_mw, frmr);
+		__frwr_reset_mr(ia, mw);
+		bad_wr = bad_wr->next;
 	}
 	goto unmap;
 }
...