Commit fd89099d authored by Leon Romanovsky, committed by Jason Gunthorpe

RDMA/mlx5: Issue FW command to destroy SRQ on reentry

The HW release can fail and leave the system in a limbo state, where the
SRQ is removed from the table but can't be destroyed later. On every
reentry, the initial xa_erase_irq() check will fail.

Rewrite the erase logic to keep the index occupied without storing the
entry itself. This way, the entry can be safely reinserted if the destroy
fails.
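
To illustrate the reserve-then-restore idea described in the message, here is a minimal userspace sketch. It is a model only, not the kernel code: the fixed-size `table`, the `RESERVED` sentinel (standing in for XA_ZERO_ENTRY), `slot_cmpxchg()`, `fw_destroy()` and the return codes are all hypothetical names chosen for this example, and no locking or IRQ handling is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: none of these names belong to the kernel xarray API. */
#define SLOTS 8

struct srq {
	unsigned int srqn;
};

/* Sentinel meaning "index occupied, no entry stored" (plays the role of XA_ZERO_ENTRY). */
static struct srq reserved_marker;
#define RESERVED (&reserved_marker)

static struct srq *table[SLOTS];

/* Toggle used to simulate the firmware destroy command failing. */
static bool fw_destroy_fails;

/* Replace the slot content only if it currently equals 'old'; return the previous content. */
static struct srq *slot_cmpxchg(unsigned int idx, struct srq *old, struct srq *new)
{
	struct srq *cur = table[idx];

	if (cur == old)
		table[idx] = new;
	return cur;
}

/* Hypothetical stand-in for the firmware command. */
static int fw_destroy(struct srq *s)
{
	(void)s;
	return fw_destroy_fails ? -1 : 0;
}

/*
 * Returns 0 on success, -1 if 's' is not the entry stored at its index,
 * and -2 if the firmware command failed and the entry was put back.
 */
static int destroy_srq(struct srq *s)
{
	/* Hide the entry, but keep the index occupied. */
	if (slot_cmpxchg(s->srqn, s, RESERVED) != s)
		return -1;

	if (fw_destroy(s)) {
		/* The slot is still reserved, so restoring the entry cannot fail. */
		slot_cmpxchg(s->srqn, RESERVED, s);
		return -2;
	}

	/* Only now release the index itself. */
	table[s->srqn] = NULL;
	return 0;
}

/* Walks the reentry scenario: one failed destroy followed by a successful retry. */
int demo_reentry(void)
{
	struct srq s = { .srqn = 3 };

	table[s.srqn] = &s;

	fw_destroy_fails = true;
	assert(destroy_srq(&s) == -2);  /* firmware failed */
	assert(table[s.srqn] == &s);    /* entry was reinserted */

	fw_destroy_fails = false;
	assert(destroy_srq(&s) == 0);   /* the retry finds the entry again */
	assert(table[s.srqn] == NULL);  /* index released */
	return 0;
}
```

Calling `demo_reentry()` from a `main()` exercises the scenario. With a plain erase as the first step, the second `destroy_srq()` call would find an empty slot and return early, which is the limbo state the message describes.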

Link: https://lore.kernel.org/r/20200907120921.476363-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
parent 9a9ebf8c
@@ -596,13 +596,22 @@ void mlx5_cmd_destroy_srq(struct mlx5_ib_dev *dev, struct mlx5_core_srq *srq)
 	struct mlx5_core_srq *tmp;
 	int err;
-	tmp = xa_erase_irq(&table->array, srq->srqn);
-	if (!tmp || tmp != srq)
+	/* Delete entry, but leave index occupied */
+	tmp = xa_cmpxchg_irq(&table->array, srq->srqn, srq, XA_ZERO_ENTRY, 0);
+	if (WARN_ON(tmp != srq))
 		return;
 	err = destroy_srq_split(dev, srq);
-	if (err)
+	if (err) {
+		/*
+		 * We don't need to check returned result for an error,
+		 * because we are storing in pre-allocated space xarray
+		 * entry and it can't fail at this stage.
+		 */
+		xa_cmpxchg_irq(&table->array, srq->srqn, XA_ZERO_ENTRY, srq, 0);
 		return;
+	}
+	xa_erase_irq(&table->array, srq->srqn);
 	mlx5_core_res_put(&srq->common);
 	wait_for_completion(&srq->common.free);