1. 08 Nov, 2019 7 commits
  2. 06 Nov, 2019 20 commits
  3. 05 Nov, 2019 1 commit
  4. 31 Oct, 2019 2 commits
  5. 29 Oct, 2019 1 commit
    • RDMA/hns: Fix build error again · d5b60e26
      Arnd Bergmann authored
      This is not the first attempt to fix building random configurations.
      Unfortunately, the attempt in commit a07fc0bb ("RDMA/hns: Fix build
      error") caused a new problem when CONFIG_INFINIBAND_HNS_HIP06=m and
      CONFIG_INFINIBAND_HNS_HIP08=y:
      
      drivers/infiniband/hw/hns/hns_roce_main.o:(.rodata+0xe60): undefined reference to `__this_module'
      
      Revert commits a07fc0bb ("RDMA/hns: Fix build error") and
      a3e2d4c7 ("RDMA/hns: remove obsolete Kconfig comment") to get back to
      the previous state, then fix the issues described there differently, by
      adding more specific dependencies: INFINIBAND_HNS can now only be built-in
      if at least one of HNS or HNS3 is built-in, and the individual back-ends
      are only available if that code is reachable from the main driver.
      
      Fixes: a07fc0bb ("RDMA/hns: Fix build error")
      Fixes: a3e2d4c7 ("RDMA/hns: remove obsolete Kconfig comment")
      Fixes: dd74282d ("RDMA/hns: Initialize the PCI device for hip08 RoCE")
      Fixes: 08805fdb ("RDMA/hns: Split hw v1 driver from hns roce driver")
      Link: https://lore.kernel.org/r/20191007211826.3361202-1-arnd@arndb.de
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  6. 28 Oct, 2019 9 commits
    • Merge branch 'odp_rework' into rdma.git for-next · bb3dba33
      Jason Gunthorpe authored
      Jason Gunthorpe says:
      
      ====================
      In order to hoist the interval tree code out of the drivers and into the
      mmu_notifiers it is necessary for the drivers to not use the interval tree
      for other things.
      
      This series replaces the interval tree with an xarray and along the way
      re-aligns all the locking to use a sensible SRCU model where the 'update'
      step is done by modifying an xarray.
      
      The result is overall much simpler and with less locking in the critical
      path. Many functions were reworked for clarity and small details like
      using 'imr' to refer to the implicit MR make the entire code flow here
      more readable.
      
      This also squashes at least two race bugs on its own, and quite possibly
      more that haven't been identified.
      ====================
      
      Merge conflicts with the odp statistics patch resolved.
      
      * branch 'odp_rework':
        RDMA/odp: Remove broken debugging call to invalidate_range
        RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy
        RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray
        RDMA/mlx5: Rework implicit ODP destroy
        RDMA/mlx5: Avoid double lookups on the pagefault path
        RDMA/mlx5: Reduce locking in implicit_mr_get_data()
        RDMA/mlx5: Use an xarray for the children of an implicit ODP
        RDMA/mlx5: Split implicit handling from pagefault_mr
        RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree
        RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it
        RDMA/mlx5: Rework implicit_mr_get_data
        RDMA/mlx5: Delete struct mlx5_priv->mkey_table
        RDMA/mlx5: Use a dedicated mkey xarray for ODP
        RDMA/mlx5: Split sig_err MR data into its own xarray
        RDMA/mlx5: Use SRCU properly in ODP prefetch
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/odp: Remove broken debugging call to invalidate_range · 46870b23
      Jason Gunthorpe authored
      invalidate_range() also obtains the umem_mutex which is being held at this
      point, so if this path were ever called it would deadlock. Thus
      conclude the debugging never triggers and rework it into a simple WARN_ON
      and leave things as they are.
      
      While here add a note to explain how we could possibly get inconsistent
      page pointers.
      
      Link: https://lore.kernel.org/r/20191009160934.3143-16-jgg@ziepe.ca
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy · 09689703
      Jason Gunthorpe authored
      For creation, as soon as the umem_odp is created the notifier can be
      called, however the underlying MR may not have been set up yet. This would
      cause problems if mlx5_ib_invalidate_range() runs. There is some
      confusing/unlocked/racy code that might be trying to solve this, but
      without locks it isn't going to work right.
      
      Instead trivially solve the problem by short-circuiting the invalidation
      if there are not yet any DMA mapped pages. By definition there is nothing
      to invalidate in this case.
      
      The create code will have the umem fully set up before anything is DMA
      mapped, and npages is fully locked by the umem_mutex.
      
      For destroy, invalidate the entire MR at the HW to stop DMA then DMA unmap
      the pages before destroying the MR. This drives npages to zero and
      prevents similar racing with invalidate while the MR is undergoing
      destruction.
      
      Arguably it would be better if the umem was created after the MR and
      destroyed before, but that would require a big rework of the MR code.
      
      Fixes: 6aec21f6 ("IB/mlx5: Page faults handling infrastructure")
      Link: https://lore.kernel.org/r/20191009160934.3143-15-jgg@ziepe.ca
      Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
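
The short-circuit described in this commit can be illustrated with a small userspace sketch. This is only a toy model under stated assumptions, not the mlx5 code: `struct toy_umem` and every `toy_*` function are hypothetical names, and a pthread mutex stands in for the kernel's umem_mutex. It shows the invalidation returning early when no pages are DMA mapped, so it never touches an MR that is not yet set up.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an ODP umem; not the kernel's real structure. */
struct toy_umem {
	pthread_mutex_t umem_mutex;    /* protects npages */
	size_t npages;                 /* pages currently DMA mapped */
	bool mr_ready;                 /* true once the MR is fully set up */
	unsigned int hw_invalidations; /* counts simulated HW invalidations */
};

static void toy_umem_init(struct toy_umem *u)
{
	pthread_mutex_init(&u->umem_mutex, NULL);
	u->npages = 0;
	u->mr_ready = false;
	u->hw_invalidations = 0;
}

/* Create path: the MR becomes ready before anything is DMA mapped. */
static void toy_finish_create(struct toy_umem *u)
{
	pthread_mutex_lock(&u->umem_mutex);
	u->mr_ready = true;
	pthread_mutex_unlock(&u->umem_mutex);
}

static void toy_map_pages(struct toy_umem *u, size_t n)
{
	pthread_mutex_lock(&u->umem_mutex);
	assert(u->mr_ready); /* mapping only happens after setup */
	u->npages += n;
	pthread_mutex_unlock(&u->umem_mutex);
}

/* Returns true if a (simulated) HW invalidation was performed. */
static bool toy_invalidate_range(struct toy_umem *u)
{
	bool did_work = false;

	pthread_mutex_lock(&u->umem_mutex);
	/* Nothing is DMA mapped, so by definition nothing to invalidate. */
	if (u->npages == 0)
		goto out;

	/* npages != 0 implies the create path finished setting up the MR. */
	assert(u->mr_ready);
	u->hw_invalidations++;
	u->npages = 0; /* unmap everything in this toy model */
	did_work = true;
out:
	pthread_mutex_unlock(&u->umem_mutex);
	return did_work;
}
```

Because npages is only changed under the mutex, a notifier that fires between umem creation and MR setup sees zero pages and does nothing; the same check covers destroy once the pages have been unmapped.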
    • RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray · d561987f
      Jason Gunthorpe authored
      These mkeys are entirely internal and are never used by the HW for
      page fault. They should also never be used by userspace for prefetch.
      Simplify & optimize things by not including them in the xarray.
      
      Since the prefetch path can now never see a child mkey there is no need
      for the second synchronize_srcu() during imr destroy.
      
      Link: https://lore.kernel.org/r/20191009160934.3143-14-jgg@ziepe.ca
      Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/mlx5: Rework implicit ODP destroy · 5256edcb
      Jason Gunthorpe authored
      Use SRCU in a sensible way by removing all MRs in the implicit tree from
      the two xarrays (the update operation), then a synchronize, followed by a
      normal single threaded teardown.
      
      This is only a little unusual from the normal pattern as there can still
      be some work pending in the unbound wq that may also require a workqueue
      flush. This is tracked with a single atomic, consolidating the redundant
      existing atomics and wait queue.
      
      For understandability the entire ODP implicit create/destroy flow now
      largely exists in a single pair of functions within odp.c, with a few
      support functions for tearing down an unused child.
      
      Link: https://lore.kernel.org/r/20191009160934.3143-13-jgg@ziepe.ca
      Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/mlx5: Avoid double lookups on the pagefault path · b70d785d
      Jason Gunthorpe authored
      Now that the locking is simplified, combine pagefault_implicit_mr() with
      implicit_mr_get_data() so that we sweep over the idx range only once,
      and do the single xlt update at the end, after the child umems are
      set up.
      
      This avoids double iteration/xa_loads plus the sketchy failure path if the
      xa_load() fails.
      
      Link: https://lore.kernel.org/r/20191009160934.3143-12-jgg@ziepe.ca
      Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/mlx5: Reduce locking in implicit_mr_get_data() · 3389baa8
      Jason Gunthorpe authored
      Now that the child MRs are stored in an xarray we can rely on the SRCU
      lock to protect the xa_load and use xa_cmpxchg on the slow allocation path
      to resolve races with concurrent page fault.
      
      This reduces the scope of the critical section of umem_mutex for implicit
      MRs to only cover mlx5_ib_update_xlt, and avoids taking a lock at all if
      the child MR is already in the xarray. This makes it consistent with the
      normal ODP MR critical section for umem_lock, and the locking approach
      used for destroying an unused implicit child MR.
      
      The MLX5_IB_UPD_XLT_ATOMIC is no longer needed in implicit_get_child_mr()
      since it is no longer called with any locks.
      
      Link: https://lore.kernel.org/r/20191009160934.3143-11-jgg@ziepe.ca
      Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
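
The lookup-then-compare-exchange pattern this commit relies on can be sketched in userspace C. This is an illustration under assumptions, not the driver code: a fixed array of C11 atomic pointers stands in for the xarray, `atomic_compare_exchange_strong()` stands in for xa_cmpxchg(), and all `toy_*` names are hypothetical. The toy never frees a published child, which is the guarantee SRCU provides to readers in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NCHILDREN 64 /* arbitrary size for this sketch */

/* Hypothetical child MR; only records which slot it belongs to. */
struct toy_child {
	unsigned long idx;
};

/* Hypothetical implicit MR: an array of atomic slots models the xarray. */
struct toy_imr {
	_Atomic(struct toy_child *) children[TOY_NCHILDREN];
};

static void toy_imr_init(struct toy_imr *imr)
{
	for (size_t i = 0; i < TOY_NCHILDREN; i++)
		atomic_init(&imr->children[i], NULL);
}

static struct toy_child *toy_get_child(struct toy_imr *imr, unsigned long idx)
{
	struct toy_child *expected = NULL;
	struct toy_child *child;

	if (idx >= TOY_NCHILDREN)
		return NULL;

	/* Fast path: the child already exists, no lock is taken. */
	child = atomic_load(&imr->children[idx]);
	if (child)
		return child;

	/* Slow path: allocate outside of any lock. */
	child = malloc(sizeof(*child));
	if (!child)
		return NULL;
	child->idx = idx;

	/* Publish only if the slot is still empty. */
	if (atomic_compare_exchange_strong(&imr->children[idx], &expected, child))
		return child;

	/* Lost the race: discard our copy and use the winner's child. */
	free(child);
	return expected;
}
```

Two concurrent faults on the same index both end up with the same child: one publishes its allocation, the other frees its own and adopts the published one.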
    • RDMA/mlx5: Use an xarray for the children of an implicit ODP · 423f52d6
      Jason Gunthorpe authored
      Currently the child leaves are stored in the shared interval tree and
      every lookup for a child must be done under the interval tree rwsem.
      
      This is further complicated by dropping the rwsem during iteration (i.e. the
      odp_lookup(), odp_next() pattern), which requires a very tricky and
      difficult to understand locking scheme with SRCU.
      
      Instead reserve the interval tree for the exclusive use of the mmu
      notifier related code in umem_odp.c and give each implicit MR an xarray
      containing all the child MRs.
      
      Since the size of each child is 1GB of VA, a 1 level xarray will index 64G
      of VA, and a 2 level will index 2TB, making xarray a much better
      data structure choice than an interval tree.
      
      The locking properties of xarray will be used in the next patches to
      rework the implicit ODP locking scheme into something simpler.
      
      At this point, the xarray is locked by the implicit MR's umem_mutex, and
      read can also be locked by the odp_srcu.
      
      Link: https://lore.kernel.org/r/20191009160934.3143-10-jgg@ziepe.ca
      Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/mlx5: Split implicit handling from pagefault_mr · 54375e73
      Jason Gunthorpe authored
      The single routine has a very confusing scheme to advance to the next
      child MR when working on an implicit parent. This scheme can only be used
      when working with an implicit parent and must not be triggered when
      working on a normal MR.
      
      Re-arrange things by directly putting all the single-MR stuff into one
      function and calling it in a loop for the implicit case. Simplify some of
      the error handling in the new pagefault_real_mr() to remove unneeded gotos.
      
      Link: https://lore.kernel.org/r/20191009160934.3143-9-jgg@ziepe.ca
      Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
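
The restructuring described here, one function for a single MR and a loop around it for the implicit case, might look roughly like the following userspace sketch. It assumes 1 GiB children (the child size stated elsewhere in this series); the `toy_*` names and the stats structure are hypothetical, and the single-MR handler only counts what it is asked to do.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_CHILD_SHIFT 30 /* assumed 1 GiB of VA per child */
#define TOY_CHILD_SIZE  (UINT64_C(1) << TOY_CHILD_SHIFT)

/* Hypothetical bookkeeping so the sketch has something observable. */
struct toy_stats {
	unsigned int calls; /* number of single-MR faults handled */
	uint64_t bytes;     /* total bytes covered */
};

/* Stand-in for the single-MR handler: handles one range inside one MR. */
static int toy_pagefault_real_mr(uint64_t va, uint64_t len,
				 struct toy_stats *st)
{
	(void)va; /* a real handler would map pages starting here */
	st->calls++;
	st->bytes += len;
	return 0;
}

/* Implicit case: split the range at child boundaries and loop. */
static int toy_pagefault_implicit_mr(uint64_t va, uint64_t bcnt,
				     struct toy_stats *st)
{
	while (bcnt) {
		/* Bytes left before the end of the child containing va. */
		uint64_t len = TOY_CHILD_SIZE - (va & (TOY_CHILD_SIZE - 1));
		int ret;

		if (len > bcnt)
			len = bcnt;

		ret = toy_pagefault_real_mr(va, len, st);
		if (ret)
			return ret;

		va += len;
		bcnt -= len;
	}
	return 0;
}
```

A fault that straddles a child boundary becomes two calls to the single-MR function, and the single-MR function itself never needs to know that an implicit parent exists.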