- 17 Sep, 2020 1 commit
-
-
Jason Gunthorpe authored
It is currently a bit confusing, but the design is if the handler_mutex is held, and the state is in RDMA_CM_CONNECT, then the state cannot leave RDMA_CM_CONNECT without also serializing with the handler_mutex. Make this clearer by adding a direct assertion, fixing the usage in rdma_connect and generally using READ_ONCE to read the state value. Link: https://lore.kernel.org/r/20200902081122.745412-2-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
- 16 Sep, 2020 3 commits
-
-
Liu Shixin authored
Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code. Link: https://lore.kernel.org/r/20200916025022.3992627-1-liushixin2@huawei.comSigned-off-by: Liu Shixin <liushixin2@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Parav Pandit authored
hw_context stores a pci_device pointer. Use the correct type instead of void * and avoid typecasts. Link: https://lore.kernel.org/r/20200914123528.270382-1-parav@mellanox.comSigned-off-by: Parav Pandit <parav@nvidia.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Yuval Basson authored
Implement the XRC specific verbs. The additional QP type introduced new logic to the rest of the verbs that now require distinguishing whether a QP has an "RQ" or an "SQ" or both. Link: https://lore.kernel.org/r/20200722102339.30104-1-ybason@marvell.comSigned-off-by: Michal Kalderon <mkalderon@marvell.com> Signed-off-by: Yuval Basson <ybason@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
- 11 Sep, 2020 21 commits
-
-
Michal Kalderon authored
Alignment of parameters was off by one Link: https://lore.kernel.org/r/20200902165741.8355-9-michal.kalderon@marvell.comSigned-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Michal Kalderon authored
commit 59e8970b ("RDMA/qedr: Return max inline data in QP query result") changed query_qp max_inline size to return the max roce inline size. When iwarp was introduced, this should have been modified to return the max inline size based on protocol. This size is cached in the device attributes Fixes: 69ad0e7f ("RDMA/qedr: Add support for iWARP in user space") Link: https://lore.kernel.org/r/20200902165741.8355-8-michal.kalderon@marvell.comSigned-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Michal Kalderon authored
Currently iWARP does not support mtu-change. Notify user when MTU changes that reload of qedr is required for mtu to change. Display the correct active mtu. Fixes: f5b1b177 ("RDMA/qedr: Add iWARP support in existing verbs") Link: https://lore.kernel.org/r/20200902165741.8355-7-michal.kalderon@marvell.comSigned-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Michal Kalderon authored
MTU change on ethtool is currently not supported for iWARP. Notify qedr driver so that appropriate logging can take place. Link: https://lore.kernel.org/r/20200902165741.8355-6-michal.kalderon@marvell.comSigned-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Michal Kalderon authored
In iWARP, accept could be called after a QP is already destroyed. In this case an error should be returned and not success. Fixes: 82af6d19 ("RDMA/qedr: Fix synchronization methods and memory leaks in qedr") Link: https://lore.kernel.org/r/20200902165741.8355-5-michal.kalderon@marvell.comSigned-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Michal Kalderon authored
dev->attr.page_size_caps was used uninitialized when setting device attributes Fixes: ec72fce4 ("qedr: Add support for RoCE HW init") Link: https://lore.kernel.org/r/20200902165741.8355-4-michal.kalderon@marvell.comSigned-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Michal Kalderon authored
Change the doorbell setting so that the maximum value between the last and current value is set. This is to avoid doorbells being lost. Fixes: a7efd777 ("qedr: Add support for PD,PKEY and CQ verbs") Link: https://lore.kernel.org/r/20200902165741.8355-3-michal.kalderon@marvell.comSigned-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Michal Kalderon authored
The qedr_qp structure wasn't freed when the protocol was RoCE. kmemleak output when running basic RoCE scenario. unreferenced object 0xffff927ad7e22c00 (size 1024): comm "ib_send_bw", pid 7082, jiffies 4384133693 (age 274.698s) hex dump (first 32 bytes): 00 b0 cd a2 79 92 ff ff 00 3f a1 a2 79 92 ff ff ....y....?..y... 00 ee 5c dd 80 92 ff ff 00 f6 5c dd 80 92 ff ff ..\.......\..... backtrace: [<00000000b2ba0f35>] qedr_create_qp+0xb3/0x6c0 [qedr] [<00000000e85a43dd>] ib_uverbs_handler_UVERBS_METHOD_QP_CREATE+0x555/0xad0 [ib_uverbs] [<00000000fee4d029>] ib_uverbs_cmd_verbs+0xa5a/0xb80 [ib_uverbs] [<000000005d622660>] ib_uverbs_ioctl+0xa4/0x110 [ib_uverbs] [<00000000eb4cdc71>] ksys_ioctl+0x87/0xc0 [<00000000abe6b23a>] __x64_sys_ioctl+0x16/0x20 [<0000000046e7cef4>] do_syscall_64+0x4d/0x90 [<00000000c6948f76>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 1212767e ("qedr: Add wrapping generic structure for qpidr and adjust idr routines.") Link: https://lore.kernel.org/r/20200902165741.8355-2-michal.kalderon@marvell.comSigned-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Bob Pearson authored
Add work completion opcodes to a new ib_uverbs_wc_opcode enum in ib_user_verbs.h. This plays the same role as ib_uverbs_wr_opcode documenting the opcodes in the user space API. Assigned the IB_WC_XXX opcodes in ib_verbs.h to the IB_UVERBS_WC_XXX where they are defined. This follows the same pattern as the IB_WR_XXX opcodes. This fixes an incorrect value for LSO that had crept in but is not currently being used. Added a missing IB_WR_BIND_MW opcode in ib_verbs.h. Link: https://lore.kernel.org/r/20200903224039.437391-2-rpearson@hpe.comSigned-off-by: Bob Pearson <rpearson@hpe.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
This is always the same value as IOVA masked by the page size, just use that clearer calculation directly. It is unclear of ocrdma hardware can actually support a true fbo, if so it could use a different algorithm to compute the best page size. Link: https://lore.kernel.org/r/17-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.comSigned-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
zbva is always false, so fbo is never read. A 'zero-based-virtual-address' is simply IOVA == 0, and the driver already supports this. Link: https://lore.kernel.org/r/16-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
For the calls linked to mlx4_ib_umem_calc_optimal_mtt_size() use ib_umem_num_dma_blocks() inside the function, it is just some weird static default. All other places are just using it with PAGE_SIZE, switch to ib_umem_num_dma_blocks(). As this is the last call site, remove ib_umem_num_count(). Link: https://lore.kernel.org/r/15-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.comSigned-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
This driver always uses PAGE_SIZE. Link: https://lore.kernel.org/r/14-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.comSigned-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
This driver always uses a DMA array made up of PAGE_SIZE elements, so just use ib_umem_num_dma_blocks(). Since rdma_for_each_dma_block() always iterates exactly ib_umem_num_dma_blocks() there is no need for the early exit check in build_user_pbes(), delete it. Link: https://lore.kernel.org/r/13-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.comSigned-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
mtr_umem_page_count() does the same thing, replace it with the core code. Also, ib_umem_find_best_pgsz() should always be called to check that the umem meets the page_size requirement. If there is a limited set of page_sizes that work it the pgsz_bitmap should be set to that set. 0 is a failure and the umem cannot be used. Lightly tidy the control flow to implement this flow properly. Link: https://lore.kernel.org/r/12-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.comAcked-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
ib_umem_page_count() returns the number of 4k entries required for a DMA map, but bnxt_re already computes a variable page size. The correct API to determine the size of the page table array is ib_umem_num_dma_blocks(). Fix the overallocation of the page array in fill_umem_pbl_tbl() when working with larger page sizes by using the right function. Lightly re-organize this function to make it clearer. Replace the other calls to ib_umem_num_pages(). Fixes: d8558251 ("RDMA/bnxt_re: Use core helpers to get aligned DMA address") Link: https://lore.kernel.org/r/11-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.comAcked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
The length of the list populated by qedr_populate_pbls() should be calculated using ib_umem_num_dma_blocks() with the same size/shift passed to qedr_populate_pbls(). Link: https://lore.kernel.org/r/10-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.comAcked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
This loop is splitting the DMA SGL into pg_shift sized pages, use the core code for this directly. Link: https://lore.kernel.org/r/9-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.comAcked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
If ib_umem_find_best_pgsz() returns > PAGE_SIZE then the equation here is not correct. 'start' should be 'virt'. Change it to use the core code for page_num and the canonical calculation of page_shift. Fixes: eb52c033 ("RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size") Link: https://lore.kernel.org/r/8-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.comSigned-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
If ib_umem_find_best_pgsz() returns > PAGE_SIZE then the equation here is not correct. 'start' should be 'virt'. Change it to use the core code for page_num and the canonical calculation of page_shift. Fixes: 40ddb3f0 ("RDMA/efa: Use API to get contiguous memory blocks aligned to device supported page size") Link: https://lore.kernel.org/r/7-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.comTested-by: Gal Pressman <galpress@amazon.com> Acked-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
ib_umem_num_pages() should only be used by things working with the SGL in CPU pages directly. Drivers building DMA lists should use the new ib_num_dma_blocks() which returns the number of blocks rdma_umem_for_each_block() will return. To make this general for DMA drivers requires a different implementation. Computing DMA block count based on umem->address only works if the requested page size is < PAGE_SIZE and/or the IOVA == umem->address. Instead the number of DMA pages should be computed in the IOVA address space, not umem->address. Thus the IOVA has to be stored inside the umem so it can be used for these calculations. For now set it to umem->address by default and fix it up if ib_umem_find_best_pgsz() was called. This allows drivers to be converted to ib_umem_num_dma_blocks() safely. Link: https://lore.kernel.org/r/6-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.comSigned-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
- 09 Sep, 2020 15 commits
-
-
Jason Gunthorpe authored
Generally drivers should be using this core helper to split up the umem into DMA pages. These drivers are all probably wrong in some way to pass PAGE_SIZE in as the HW page size. Either the driver doesn't support other page sizes and it should use 4096, or the driver does support other page sizes and should use ib_umem_find_best_pgsz() to select the best HW pages size of the HW supported set. The only case it could be correct is if the HW has a global setting for PAGE_SIZE set at driver initialization time. Link: https://lore.kernel.org/r/5-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.comSigned-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
This helper does the same as rdma_for_each_block(), except it works on a umem. This simplifies most of the call sites. Link: https://lore.kernel.org/r/4-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.comAcked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
The calculation in rdma_find_pg_bit() is fairly complicated, and the function is never called anywhere else. Inline a simpler version into ib_umem_find_best_pgsz() Link: https://lore.kernel.org/r/3-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.comSigned-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
rdma_for_each_block() makes assumptions about how the SGL is constructed that don't work if the block size is below the page size used to to build the SGL. The rules for umem SGL construction require that the SG's all be PAGE_SIZE aligned and we don't encode the actual byte offset of the VA range inside the SGL using offset and length. So rdma_for_each_block() has no idea where the actual starting/ending point is to compute the first/last block boundary if the starting address should be within a SGL. Fixing the SGL construction turns out to be really hard, and will be the subject of other patches. For now block smaller pages. Fixes: 4a353399 ("RDMA/umem: Add API to find best driver supported page size in an MR") Link: https://lore.kernel.org/r/2-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.comReviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
It is possible for a single SGL to span an aligned boundary, eg if the SGL is 61440 -> 90112 Then the length is 28672, which currently limits the block size to 32k. With a 32k page size the two covering blocks will be: 32768->65536 and 65536->98304 However, the correct answer is a 128K block size which will span the whole 28672 bytes in a single block. Instead of limiting based on length figure out which high IOVA bits don't change between the start and end addresses. That is the highest useful page size. Fixes: 4a353399 ("RDMA/umem: Add API to find best driver supported page size in an MR") Link: https://lore.kernel.org/r/1-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.comReviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Leon Romanovsky authored
Change counters to return failure like any other verbs destroy, however this flow shouldn't return error at all. Link: https://lore.kernel.org/r/20200907120921.476363-10-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Leon Romanovsky authored
Make this interface symmetrical to other destroy paths. Fixes: a49b1dc7 ("RDMA: Convert destroy_wq to be void") Link: https://lore.kernel.org/r/20200907120921.476363-9-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Leon Romanovsky authored
Update XRCD destroy flow to allow command failure. Fixes: 28ad5f65 ("RDMA: Move XRCD to be under ib_core responsibility") Link: https://lore.kernel.org/r/20200907120921.476363-8-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Leon Romanovsky authored
Like any other verbs objects, CQ shouldn't fail during destroy, but mlx5_ib didn't follow this contract with mixed IB verbs objects with DEVX. Such mix causes to the situation where FW and kernel are fully interdependent on the reference counting of each side. Kernel verbs and drivers that don't have DEVX flows shouldn't fail. Fixes: e39afe3d ("RDMA: Convert CQ allocations to be under core responsibility") Link: https://lore.kernel.org/r/20200907120921.476363-7-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Leon Romanovsky authored
The ib_alloc_cq*() and ib_free_cq*() are solely kernel verbs to manage CQs and doesn't need extra indirection just to call same functions with constant parameter NULL as udata. Link: https://lore.kernel.org/r/20200907120921.476363-6-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Leon Romanovsky authored
In similar way to other IB objects, restore the ability to return error on SRQ destroy. Strictly speaking, this change is not necessary, and provided here to ensure a symmetrical interface like other destroy functions. Fixes: 68e326de ("RDMA: Handle SRQ allocations by IB/core") Link: https://lore.kernel.org/r/20200907120921.476363-5-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Leon Romanovsky authored
The HW release can fail and leave the system in limbo state, where SRQ is removed from the table, but can't be destroyed later. In every reentry, the initial xa_erase_irq() check will fail. Rewrite the erase logic to keep index, but don't store the entry itself. By doing it, we can safely reinsert entry back in the case of destroy failure. Link: https://lore.kernel.org/r/20200907120921.476363-4-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Leon Romanovsky authored
Like any other IB verbs objects, AH are refcounted by ib_core. The release of those objects are controlled by ib_core with promise that AH destroy can't fail. Being SW object for now, this change makes dealloc_ah() to behave like any other destroy IB flows. Fixes: d3456914 ("RDMA: Handle AH allocations by IB/core") Link: https://lore.kernel.org/r/20200907120921.476363-3-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Leon Romanovsky authored
The IB verbs objects are counted by the kernel and ib_core ensures that deallocate PD will success so it will be called once all other objects that depends on PD will be released. This is achieved by managing various reference counters on such objects. The mlx5 driver didn't follow this standard flow when allowed DEVX objects that are not managed by ib_core to be interleaved with the ones under ib_core responsibility. In such interleaved scenarios deallocate command can fail and ib_core will leave uobject in internal DB and attempt to clean it later to free resources anyway. This change partially restores returned value from dealloc_pd() for all drivers, but keeping in mind that non-DEVX devices and kernel verbs paths shouldn't fail. Fixes: 21a428a0 ("RDMA: Handle PD allocations by IB/core") Link: https://lore.kernel.org/r/20200907120921.476363-2-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Md Haris Iqbal authored
The rnbd_server module's communication manager (cm) initialization depends on the registration of the "network namespace subsystem" of the RDMA CM agent module. As such, when the kernel is configured to load the rnbd_server and the RDMA cma module during initialization; and if the rnbd_server module is initialized before RDMA cma module, a null ptr dereference occurs during the RDMA bind operation. Call trace: Call Trace: ? xas_load+0xd/0x80 xa_load+0x47/0x80 cma_ps_find+0x44/0x70 rdma_bind_addr+0x782/0x8b0 ? get_random_bytes+0x35/0x40 rtrs_srv_cm_init+0x50/0x80 rtrs_srv_open+0x102/0x180 ? rnbd_client_init+0x6e/0x6e rnbd_srv_init_module+0x34/0x84 ? rnbd_client_init+0x6e/0x6e do_one_initcall+0x4a/0x200 kernel_init_freeable+0x1f1/0x26e ? rest_init+0xb0/0xb0 kernel_init+0xe/0x100 ret_from_fork+0x22/0x30 Modules linked in: CR2: 0000000000000015 All this happens cause the cm init is in the call chain of the module init, which is not a preferred practice. So remove the call to rdma_create_id() from the module init call chain. Instead register rtrs-srv as an ib client, which makes sure that the rdma_create_id() is called only when an ib device is added. Fixes: 9cb83748 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20200907103106.104530-1-haris.iqbal@cloud.ionos.comReported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-