Commits · 2ee9bf346fbfd1dad0933b9eb3a4c2c0979b633e · Kirill Smelkov / linux

30 Sep, 2020 3 commits

RDMA/addr: Fix race with netevent_callback()/rdma_addr_cancel() · 2ee9bf34

Jason Gunthorpe authored Sep 30, 2020

This three thread race can result in the work being run once the callback
becomes NULL:

       CPU1                 CPU2                   CPU3
 netevent_callback()
                     process_one_req()       rdma_addr_cancel()
                      [..]
     spin_lock_bh()
  	set_timeout()
     spin_unlock_bh()

						spin_lock_bh()
						list_del_init(&req->list);
						spin_unlock_bh()

		     req->callback = NULL
		     spin_lock_bh()
		       if (!list_empty(&req->list))
                         // Skipped!
		         // cancel_delayed_work(&req->work);
		     spin_unlock_bh()

		    process_one_req() // again
		     req->callback() // BOOM
						cancel_delayed_work_sync()

The solution is to always cancel the work once it is completed so any
in between set_timeout() does not result in it running again.

Cc: stable@vger.kernel.org
Fixes: 44e75052 ("RDMA/rdma_cm: Make rdma_addr_cancel into a fence")
Link: https://lore.kernel.org/r/20200930072007.1009692-1-leon@kernel.orgReported-by: Dan Aloni <dan@kernelim.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

2ee9bf34

RDMA/core: Remove ucontext->closing · a6f0b08d

Jason Gunthorpe authored Sep 29, 2020

Nothing reads this any more, and the reason for its existence has passed
due to the deferred fput() scheme.

Fixes: 8ea1f989 ("drivers/IB,usnic: reduce scope of mmap_sem")
Link: https://lore.kernel.org/r/0-v1-df64ff042436+42-uctx_closing_jgg@nvidia.comReviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

a6f0b08d

RDMA/rtrs: Remove unused field of rtrs_iu · 220aee30

Gioh Kim authored Sep 30, 2020

list field is not used anywhere

Link: https://lore.kernel.org/r/20200930131407.6438-1-gi-oh.kim@clous.ionos.comSigned-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

220aee30

29 Sep, 2020 11 commits

RDMA/hns: Remove unused variables and definitions · cf4c0fb0

Lang Cheng authored Sep 29, 2020

Some code was removed but the variables were still there, and some
parameters have been changed to be queried from firmware. So the
definitions of them are no longer needed.

Fixes: 2a3d923f ("RDMA/hns: Replace magic numbers with #defines")
Fixes: 82e620d9 ("RDMA/hns: Modify the data structure of hns_roce_av")
Fixes: 82547469 ("IB/hns: Implement the add_gid/del_gid and optimize the GIDs management")
Fixes: 21b97f53 ("RDMA/hns: Fixup qp release bug")
Link: https://lore.kernel.org/r/1601371934-40003-1-git-send-email-liweihang@huawei.comSigned-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

cf4c0fb0

RDMA/i40iw: Remove intermediate pointer that points to the same struct · d4f40a1f

Leon Romanovsky authored Sep 26, 2020

There is no real need to have an intermediate pointer for the same struct,
remove it, and use struct directly.

Link: https://lore.kernel.org/r/20200926102450.2966017-11-leon@kernel.orgAcked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

d4f40a1f

RDMA/mthca: Combine special QP struct with mthca QP · 21c2fe94

Leon Romanovsky authored Sep 26, 2020

As preparation for the removal of QP allocation logic, we need to ensure
that ib_core allocates the right amount of memory before a call to the
driver create_qp(). It requires from driver to have the same structs for
all types of QPs.

Link: https://lore.kernel.org/r/20200926102450.2966017-10-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

21c2fe94

RDMA/drivers: Remove udata check from special QP · b925c555

Leon Romanovsky authored Sep 26, 2020

GSI QP can't be created from the user space, hence the udata check is
always false (udata == NULL). Remove that check and simplify the flow.

Link: https://lore.kernel.org/r/20200926102450.2966017-9-leon@kernel.orgReviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

b925c555

RDMA/core: Align write and ioctl checks of QP types · 5807bb32

Leon Romanovsky authored Sep 26, 2020

The ioctl flow checks that the user provides only a supported list of QP
types, while write flow didn't do it and relied on the driver to check
it. Align those flows to fail as early as possible.

Link: https://lore.kernel.org/r/20200926102450.2966017-8-leon@kernel.orgReviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

5807bb32

RDMA/mlx4: Prepare QP allocation to remove from the driver · 8fd3cd2a

Leon Romanovsky authored Sep 26, 2020

Since all mlx4 QP have same storage type, move the QP allocation to be in
one place. This change is preparation to removal of such allocation from
the driver.

Link: https://lore.kernel.org/r/20200926102450.2966017-7-leon@kernel.orgReviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

8fd3cd2a

RDMA/mlx4: Embed GSI QP into general mlx4_ib QP · 915ec7ed

Leon Romanovsky authored Sep 26, 2020

Refactor the storage struct of mlx4 GSI QP to be embedded in mlx4_ib
QP. This allows to remove internal memory allocation of QP struct which is
hidden inside the mlx4_ib_create_qp() flow.

Link: https://lore.kernel.org/r/20200926102450.2966017-6-leon@kernel.orgReviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

915ec7ed

RDMA/mlx5: Delete not needed GSI QP signal QP type · eebe580f

Leon Romanovsky authored Sep 26, 2020

GSI QP doesn't need signal QP type because it is initialized statically to
zero, which is IB_SIGNAL_ALL_WR also wr->send_flags isn't set too. This
means that the GSI QP signal QP type can be removed.

Link: https://lore.kernel.org/r/20200926102450.2966017-5-leon@kernel.orgReviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

eebe580f

RDMA/mlx5: Change GSI QP to have same creation flow like other QPs · 2dc4d672

Leon Romanovsky authored Sep 26, 2020

There is no reason to have separate create flow for the GSI QP, while
general create_qp routine has all needed checks and ability to allocate
and free the proper struct mlx5_ib_qp.

Link: https://lore.kernel.org/r/20200926102450.2966017-4-leon@kernel.orgReviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

2dc4d672

RDMA/mlx5: Reuse existing fields in parent QP storage object · f8225e34

Leon Romanovsky authored Sep 26, 2020

Remove duplication of mlx5_ib_qp and mlx5_ib_gsi_qp fields. This change
returns the memory footprint of mlx5_ib QP to be as it was before
embedding GSI QP.

Link: https://lore.kernel.org/r/20200926102450.2966017-3-leon@kernel.orgReviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

f8225e34

RDMA/mlx5: Embed GSI QP into general mlx5_ib QP · 0d9aef86

Leon Romanovsky authored Sep 26, 2020

The GSI QPs have different create flow from the regular QPs, but it is not
really needed. Update the code to use mlx5_ib_qp as a storage class for
all outside of GSI calls.

Link: https://lore.kernel.org/r/20200926102450.2966017-2-leon@kernel.orgReviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

0d9aef86

25 Sep, 2020 1 commit

RDMA/mlx5: Fix type warning of sizeof in __mlx5_ib_alloc_counters() · b942fc03

Liu Shixin authored Sep 17, 2020

sizeof() when applied to a pointer typed expression should give the size
of the pointed data, even if the data is a pointer.

Fixes: e1f24a79 ("IB/mlx5: Support congestion related counters")
Link: https://lore.kernel.org/r/20200917081354.2083293-1-liushixin2@huawei.comSigned-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

b942fc03

24 Sep, 2020 13 commits

RDMA/hns: Support inline data in extented sge space for RC · 30b70788

Weihang Li authored Sep 10, 2020

HIP08 supports RC inline up to size of 32 Bytes, and all data should be
put into SQWQE. For HIP09, this capability is extended to 1024 Bytes, if
length of data is longer than 32 Bytes, they will be filled into extended
sge space.

Link: https://lore.kernel.org/r/1599744069-9968-1-git-send-email-liweihang@huawei.comSigned-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

30b70788

RDMA/hns: Fix missing sq_sig_type when querying QP · 05df4927

Weihang Li authored Sep 19, 2020

The sq_sig_type field should be filled when querying QP, or the users may
get a wrong value.

Fixes: 926a01dc ("RDMA/hns: Add QP operations support for hip08 SoC")
Link: https://lore.kernel.org/r/1600509802-44382-9-git-send-email-liweihang@huawei.comSigned-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

05df4927

RDMA/hns: Fix configuration of ack_req_freq in QPC · fbed9d2b

Weihang Li authored Sep 19, 2020

The hardware will add AckReq flag in BTH header according to the value of
ack_req_freq to request ACK from responder for the packets with this flag.
It should be greater than or equal to lp_pktn_ini instead of using a fixed
value.

Fixes: 7b9bd73e ("RDMA/hns: Fix wrong assignment of lp_pktn_ini in QPC")
Link: https://lore.kernel.org/r/1600509802-44382-8-git-send-email-liweihang@huawei.comSigned-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

fbed9d2b

RDMA/hns: Fix the wrong value of rnr_retry when querying qp · 99fcf825

Wenpeng Liang authored Sep 19, 2020

The rnr_retry returned to the user is not correct, it should be got from
another fields in QPC.

Fixes: bfe86035 ("RDMA/hns: Fix cast from or to restricted __le32 for driver")
Link: https://lore.kernel.org/r/1600509802-44382-7-git-send-email-liweihang@huawei.comSigned-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

99fcf825

RDMA/hns: Solve the overflow of the calc_pg_sz() · 768202a0

Jiaran Zhang authored Sep 19, 2020

calc_pg_sz() may gets a data calculation overflow if the PAGE_SIZE is 64 KB
and hop_num is 2. It is because that all variables involved in calculation
are defined in type of int. So change the type of bt_chunk_size,
buf_chunk_size and obj_per_chunk_default to u64.

Fixes: ba6bb7e9 ("RDMA/hns: Add interfaces to get pf capabilities from firmware")
Link: https://lore.kernel.org/r/1600509802-44382-6-git-send-email-liweihang@huawei.comSigned-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

768202a0

RDMA/hns: Add check for the validity of sl configuration · 172505cf

Jiaran Zhang authored Sep 19, 2020

According to the RoCE v1 specification, the sl (service level) 0-7 are
mapped directly to priorities 0-7 respectively, sl 8-15 are reserved. The
driver should verify whether the the value of sl is larger than 7, if so,
an exception should be returned.

Fixes: 926a01dc ("RDMA/hns: Add QP operations support for hip08 SoC")
Link: https://lore.kernel.org/r/1600509802-44382-5-git-send-email-liweihang@huawei.comSigned-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

172505cf

RDMA/hns: Correct typo of hns_roce_create_cq() · c19893fd

Lang Cheng authored Sep 19, 2020

Change "initialze" to "initialize".

Fixes: 8f3e9f3e ("IB/hns: Add code for refreshing CQ CI using TPTR")
Link: https://lore.kernel.org/r/1600509802-44382-4-git-send-email-liweihang@huawei.comSigned-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

c19893fd

RDMA/hns: Add interception for resizing SRQs · 221109e6

Yangyang Li authored Sep 19, 2020

HIP08 doesn't support modifying the maximum number of outstanding WR in an
SRQ. However, the driver does not return a failure message, and users may
mistakenly think that the resizing is executed successfully. So the driver
needs to intercept this operation.

Fixes: ffb1308b ("RDMA/hns: Move SRQ code to the reasonable place")
Link: https://lore.kernel.org/r/1600509802-44382-3-git-send-email-liweihang@huawei.comSigned-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

221109e6

RDMA/hns: Refactor process about opcode in post_send() · 12542f1d

Weihang Li authored Sep 19, 2020

According to the IB specifications, the verbs should return an immediate
error when the users set an unsupported opcode. Furthermore, refactor codes
about opcode in process of post_send to make the difference between opcodes
clearer.

Link: https://lore.kernel.org/r/1600509802-44382-2-git-send-email-liweihang@huawei.comSigned-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

12542f1d

RDMA/hns: Add support for SCCC in size of 64 Bytes · 3cb2c996

Yangyang Li authored Sep 16, 2020

For HIP09, size of SCCC (Soft Congestion Control Context) is increased to
64 Bytes from 32 Bytes. The hardware will get the configuration of SCCC
from driver instead of using a fixed value.

Link: https://lore.kernel.org/r/1600245806-56321-5-git-send-email-liweihang@huawei.comSigned-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

3cb2c996

RDMA/hns: Add support for QPC in size of 512 Bytes · 98912ee8

Wenpeng Liang authored Sep 16, 2020

The new version of RoCEE supports using QPC in size of 256B or 512B, so
that HIP09 can supports new congestion control algorithms by using QPC in
larger size.

Link: https://lore.kernel.org/r/1600245806-56321-4-git-send-email-liweihang@huawei.comSigned-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

98912ee8

RDMA/hns: Add support for CQE in size of 64 Bytes · 09a5f210

Wenpeng Liang authored Sep 16, 2020

The new version of RoCEE supports using CQE in size of 32B or 64B. The
performance of bus can be improved by using larger size of CQE.

Link: https://lore.kernel.org/r/1600245806-56321-3-git-send-email-liweihang@huawei.comSigned-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

09a5f210

RDMA/hns: Add support for EQE in size of 64 Bytes · 247fc16d

Wenpeng Liang authored Sep 16, 2020

The new version of RoCEE supports using CEQE in size of 4B or 64B, AEQE in
size of 16B or 64B. The performance of bus can be improved by using larger
size of EQE.

Link: https://lore.kernel.org/r/1600245806-56321-2-git-send-email-liweihang@huawei.comSigned-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

247fc16d

22 Sep, 2020 9 commits

RDMA/efa: Drop double zeroing for sg_init_table() · 3de3c478

Julia Lawall authored Sep 20, 2020

sg_init_table() zeroes its first argument, so the allocation of that
argument doesn't have to.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,n,flags;
@@

x =
- kcalloc
+ kmalloc_array
  (n,sizeof(*x),flags)
...
sg_init_table(x,n)
// </smpl>

Link: https://lore.kernel.org/r/1600601186-7420-6-git-send-email-Julia.Lawall@inria.frSigned-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Acked-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

3de3c478

i40iw: Add support to make destroy QP synchronous · f2334964

Sindhu, Devale authored Sep 16, 2020

Occasionally ib_write_bw crash is seen due to access of a pd object in
i40iw_sc_qp_destroy after it is freed. Destroy qp is not synchronous in
i40iw and thus the iwqp object could be referencing a pd object that is
freed by ib core as a result of successful return from i40iw_destroy_qp.

Wait in i40iw_destroy_qp till all QP references are released and destroy
the QP and its associated resources before returning.  Switch to use the
refcount API vs atomic API for lifetime management of the qp.

 RIP: 0010:i40iw_sc_qp_destroy+0x4b/0x120 [i40iw]
 [...]
 RSP: 0018:ffffb4a7042e3ba8 EFLAGS: 00010002
 RAX: 0000000000000000 RBX: 0000000000000001 RCX: dead000000000122
 RDX: ffffb4a7042e3bac RSI: ffff8b7ef9b1e940 RDI: ffff8b7efbf09080
 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
 R10: 8080808080808080 R11: 0000000000000010 R12: ffff8b7efbf08050
 R13: 0000000000000001 R14: ffff8b7f15042928 R15: ffff8b7ef9b1e940
 FS:  0000000000000000(0000) GS:ffff8b7f2fa00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000400 CR3: 000000020d60a006 CR4: 00000000001606e0
 Call Trace:
  i40iw_exec_cqp_cmd+0x4d3/0x5c0 [i40iw]
  ? try_to_wake_up+0x1ea/0x5d0
  ? __switch_to_asm+0x40/0x70
  i40iw_process_cqp_cmd+0x95/0xa0 [i40iw]
  i40iw_handle_cqp_op+0x42/0x1a0 [i40iw]
  ? cm_event_handler+0x13c/0x1f0 [iw_cm]
  i40iw_rem_ref+0xa0/0xf0 [i40iw]
  cm_work_handler+0x99c/0xd10 [iw_cm]
  process_one_work+0x1a1/0x360
  worker_thread+0x30/0x380
  ? process_one_work+0x360/0x360
  kthread+0x10c/0x130
  ? kthread_park+0x80/0x80
  ret_from_fork+0x35/0x40

Fixes: d3749841 ("i40iw: add files for iwarp interface")
Link: https://lore.kernel.org/r/20200916131811.2077-1-shiraz.saleem@intel.comReported-by: Kamal Heib <kheib@redhat.com>
Signed-off-by: Sindhu, Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

f2334964

RDMA/efa: Add messages and RDMA read work requests HW stats · b0cff387

Daniel Kranzdorf authored Sep 15, 2020

Add separate stats types for send messages and RDMA read work requests.

Link: https://lore.kernel.org/r/20200915141449.8428-3-galpress@amazon.comSigned-off-by: Daniel Kranzdorf <dkkranzd@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

b0cff387

RDMA/efa: Group keep alive received counter with other SW stats · 215b88ac

Gal Pressman authored Sep 15, 2020

The keep alive received counter is a software stat, keep it grouped with
all other software stats. Since all stored stats are software stats,
remove the efa_sw_stats struct and use efa_stats instead.

Link: https://lore.kernel.org/r/20200915141449.8428-2-galpress@amazon.comReviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

215b88ac

RDMA/restrack: Improve readability in task name management · b09c4d70

Leon Romanovsky authored Sep 22, 2020

Use rdma_restrack_set_name() and rdma_restrack_parent_name() instead of
tricky uses of rdma_restrack_attach_task()/rdma_restrack_uadd().

This uniformly makes all restracks add'd using rdma_restrack_add().

Link: https://lore.kernel.org/r/20200922091106.2152715-6-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

b09c4d70

RDMA/restrack: Simplify restrack tracking in kernel flows · c34a23c2

Leon Romanovsky authored Sep 22, 2020

Have a single rdma_restrack_add() that adds an entry, there is no reason
to split the user/kernel here, the rdma_restrack_set_task() is responsible
for this difference.

This patch prepares the code to the future requirement of making restrack
is mandatory for managing ib objects.

Link: https://lore.kernel.org/r/20200922091106.2152715-5-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

c34a23c2

RDMA/restrack: Count references to the verbs objects · 13ef5539

Leon Romanovsky authored Sep 22, 2020

Refactor the restrack code to make sure the kref inside the restrack entry
properly kref's the object in which it is embedded. This slight change is
needed for future conversions of MR and QP which are refcounted before the
release and kfree.

The ideal flow from ib_core perspective as follows:
* Allocate ib_* structure with rdma_zalloc_*.
* Set everything that is known to ib_core to that newly created object.
* Initialize kref with restrack help
* Call to driver specific allocation functions.
* Insert into restrack DB
....
* Return and release restrack with restrack_put.

Largely this means a rdma_restrack_new() should be called near allocating
the containing structure.

Link: https://lore.kernel.org/r/20200922091106.2152715-4-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

13ef5539

RDMA/mlx5: Don't call to restrack recursively · d7ecab1e

Leon Romanovsky authored Sep 22, 2020

The restrack is going to manage memory of all IB objects and must be
called before object is created. GSI QP in the mlx5_ib separated between
creating dummy interface and HW object beneath. This was achieved by
double call to ib_create_qp().

In order to skip such reentry call to internal driver create_qp code.

Link: https://lore.kernel.org/r/20200922091106.2152715-3-leon@kernel.orgReviewed-by: Mark Zhang <markz@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

d7ecab1e

RDMA/cma: Delete from restrack DB after successful destroy · 60aaeffa

Leon Romanovsky authored Sep 22, 2020

Update the code to have similar destroy pattern like other IB objects.

This change create asymmetry to the rdma_id_private create flow to make
sure that memory is managed by restrack.

Link: https://lore.kernel.org/r/20200922091106.2152715-2-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

60aaeffa

18 Sep, 2020 3 commits

RDMA/ucma: Rework ucma_migrate_id() to avoid races with destroy · f5449e74

Jason Gunthorpe authored Sep 14, 2020

ucma_destroy_id() assumes that all things accessing the ctx will do so via
the xarray. This assumption violated only in the case the FD is being
closed, then the ctx is reached via the ctx_list. Normally this is OK
since ucma_destroy_id() cannot run concurrenty with release(), however
with ucma_migrate_id() is involved this can violated as the close of the
2nd FD can run concurrently with destroy on the first:

                CPU0                      CPU1
        ucma_destroy_id(fda)
                                  ucma_migrate_id(fda -> fdb)
                                       ucma_get_ctx()
        xa_lock()
         _ucma_find_context()
         xa_erase()
        xa_unlock()
                                       xa_lock()
                                        ctx->file = new_file
                                        list_move()
                                       xa_unlock()
                                      ucma_put_ctx()

                                   ucma_close(fdb)
                                      _destroy_id()
                                      kfree(ctx)

        _destroy_id()
          wait_for_completion()
          // boom, ctx was freed

The ctx->file must be modified under the handler and xa_lock, and prior to
modification the ID must be rechecked that it is still reachable from
cur_file, ie there is no parallel destroy or migrate.

To make this work remove the double locking and streamline the control
flow. The double locking was obsoleted by the handler lock now directly
preventing new uevents from being created, and the ctx_list cannot be read
while holding fgets on both files. Removing the double locking also
removes the need to check for the same file.

Fixes: 88314e4d ("RDMA/cma: add support for rdma_migrate_id()")
Link: https://lore.kernel.org/r/0-v1-05c5a4090305+3a872-ucma_syz_migrate_jgg@nvidia.com
Reported-and-tested-by: syzbot+cc6fc752b3819e082d0c@syzkaller.appspotmail.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

f5449e74

RDMA/mlx5: Clarify what the UMR is for when creating MRs · 8383da3e

Jason Gunthorpe authored Sep 14, 2020

Once a mkey is created it can be modified using UMR. This is desirable for
performance reasons. However, different hardware has restrictions on what
modifications are possible using UMR. Make sense of these checks:

- mlx5_ib_can_reconfig_with_umr() returns true if the access flags can be
altered. Most cases create MRs using 0 access flags (now made clear by
consistent use of set_mkc_access_pd_addr_fields()), but the old logic
here was tormented. Make it clear that this is checking if the current
access_flags can be modified using UMR to different access_flags. It is
always OK to use UMR to change flags that all HW supports.

- mlx5_ib_can_load_pas_with_umr() returns true if UMR can be used to
enable and update the PAS/XLT. Enabling requires updating the entity
size, so UMR ends up completely disabled on this old hardware. Make it
clear why it is disabled. FRWR, ODP and cache always requires
mlx5_ib_can_load_pas_with_umr().

- mlx5_ib_pas_fits_in_mr() is used to tell if an existing MR can be
resized to hold a new PAS list. This only works for cached MR's because
we don't store the PAS list size in other cases.

To be very clear, arrange things so any pre-created MR's in the cache
check the newly requested access_flags before allowing the MR to leave the
cache. If UMR cannot set the required access_flags the cache fails to
create the MR.

This in turn means relaxed ordering and atomic are now correctly blocked
early for implicit ODP on older HW.

Link: https://lore.kernel.org/r/20200914112653.345244-6-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

8383da3e

RDMA/mlx5: Disable IB_DEVICE_MEM_MGT_EXTENSIONS if IB_WR_REG_MR can't work · 0ec52f01

Jason Gunthorpe authored Sep 14, 2020

set_reg_wr() always fails if !umr_modify_entity_size_disabled because
mlx5_ib_can_use_umr() always fails. Without set_reg_wr() IB_WR_REG_MR
doesn't work and that means the device should not advertise
IB_DEVICE_MEM_MGT_EXTENSIONS.

Fixes: 841b07f9 ("IB/mlx5: Block MR WR if UMR is not possible")
Link: https://lore.kernel.org/r/20200914112653.345244-5-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

0ec52f01