1. 15 Aug, 2018 6 commits
  2. 14 Aug, 2018 8 commits
  3. 13 Aug, 2018 5 commits
    • Jason Gunthorpe's avatar
      IB/uverbs: Do not check for device disassociation during ioctl · 4ce719f8
      Jason Gunthorpe authored
      Now that the ioctl path and uobjects are converted to use uverbs_api, it
      is now safe to remove the disassociation protection from the common ioctl
      code.
      
      This completes the work to make destroy functions continue to work even
      after device disassociation.
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      4ce719f8
    • Jason Gunthorpe's avatar
      IB/uverbs: Remove struct uverbs_root_spec and all supporting code · 51d0a2b4
      Jason Gunthorpe authored
      Everything now uses the uverbs_uapi data structure.
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      51d0a2b4
    • Jason Gunthorpe's avatar
      IB/uverbs: Use uverbs_api to unmarshal ioctl commands · 3a863577
      Jason Gunthorpe authored
      Convert the ioctl method syscall path to use the uverbs_api data
      structures. The new uapi structure includes all the same information, just
      in a different and more optimal way.
      
       - Use attr_bkey instead of 2 level radix trees for everything related to
         attributes. This includes the attribute storage, presence, and
         detection of missing mandatory attributes.
       - Avoid iterating over all attribute storage at finish, instead use
         find_first_bit with the attr_bkey to locate only those attrs that need
         cleanup.
       - Organize things to always run, and always rely on, cleanup. This
         avoids a bunch of tricky error unwind cases.
       - Locate the method using the radix tree, and locate the attributes
         using a very efficient incremental radix tree lookup
       - Use the precomputed destroy_bkey to handle uobject destruction
       - Use the precomputed allocation sizes and precomputed 'need_stack'
         to avoid maths in the fast path. This is optimal if userspace
         does not pass (many) unsupported attributes.
      
      Overall this results in much better codegen for the attribute accessors,
      everything is now stored in bitmaps or linear arrays indexed by attr_bkey.
      The compiler can compute attr_bkey values at compile time for all method
      attributes, meaning things like uverbs_attr_is_valid() now compile into
      single instruction bit tests.
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      3a863577
    • Jason Gunthorpe's avatar
      IB/uverbs: Use uverbs_alloc for allocations · b61815e2
      Jason Gunthorpe authored
      Several handlers need temporary allocations for the life of the method,
      switch them to use the uverbs_alloc allocator.
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      b61815e2
    • Jason Gunthorpe's avatar
      IB/uverbs: Add a simple allocator to uverbs_attr_bundle · 461bb2ee
      Jason Gunthorpe authored
      This is similar in spirit to devm, it keeps track of any allocations
      linked to this method call and ensures they are all freed when the method
      exits. Further, if there is space in the internal/onstack buffer then the
      allocator will hand out that memory and avoid an expensive call to
      kalloc/kfree in the syscall path.
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      461bb2ee
  4. 10 Aug, 2018 5 commits
    • Jason Gunthorpe's avatar
      IB/uverbs: Remove the ib_uverbs_attr pointer from each attr · 6a1f444f
      Jason Gunthorpe authored
      Memory in the bundle is valuable, do not waste it holding an 8 byte
      pointer for the rare case of writing to a PTR_OUT. We can compute the
      pointer by storing a small 1 byte array offset and the base address of the
      uattr memory in the bundle private memory.
      
      This also means we can access the kernel's copy of the ib_uverbs_attr, so
      drop the copy of flags as well.
      
      Since the uattr base should be private bundle information this also
      de-inlines the already too big uverbs_copy_to inline and moves
      create_udata into uverbs_ioctl.c so they can see the private struct
      definition.
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      6a1f444f
    • Jason Gunthorpe's avatar
      IB/uverbs: Provide implementation private memory for the uverbs_attr_bundle · 4b3dd2bb
      Jason Gunthorpe authored
      This already existed as the anonymous 'ctx' structure, but this was not
      really a useful form. Hoist this struct into bundle_priv and rework the
      internal things to use it instead.
      
      Move a bunch of the processing internal state into the priv and reduce the
      excessive use of function arguments.
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      4b3dd2bb
    • Jason Gunthorpe's avatar
      IB/uverbs: Use uverbs_api to manage the object type inside the uobject · 6b0d08f4
      Jason Gunthorpe authored
      Currently the struct uverbs_obj_type stored in the ib_uobject is part of
      the .rodata segment of the module that defines the object. This is a
      problem if drivers define new uapi objects as we will be left with a
      dangling pointer after device disassociation.
      
      Switch the uverbs_obj_type for struct uverbs_api_object, which is
      allocated memory that is part of the uverbs_api and is guaranteed to
      always exist. Further this moves the 'type_class' into this memory which
      means access to the IDR/FD function pointers is also guaranteed. Drivers
      cannot define new types.
      
      This makes it safe to continue to use all uobjects, including driver
      defined ones, after disassociation.
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      6b0d08f4
    • Jason Gunthorpe's avatar
      IB/uverbs: Build the specs into a radix tree at runtime · 9ed3e5f4
      Jason Gunthorpe authored
      This radix tree datastructure is intended to replace the 'hash' structure
      used today for parsing ioctl methods during system calls. This first
      commit introduces the structure and builds it from the existing .rodata
      descriptions.
      
      The so-called hash arrangement is actually a 5 level open coded radix tree.
      This new version uses a 3 level radix tree built using the radix tree
      library.
      
      Overall this is much less code and much easier to build as the radix tree
      API allows for dynamic modification during the building. There is a small
      memory penalty to pay for this, but since the radix tree is allocated on
      a per device basis, a few kb of RAM seems immaterial considering the
      gained simplicity.
      
      The radix tree is similar to the existing tree, but also has a 'attr_bkey'
      concept, which is a small value'd index for each method attribute. This is
      used to simplify and improve performance of everything in the next
      patches.
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Reviewed-by: default avatarMichael J. Ruhl <michael.j.ruhl@intel.com>
      9ed3e5f4
    • Jason Gunthorpe's avatar
      IB/uverbs: Have the core code create the uverbs_root_spec · 7d96c9b1
      Jason Gunthorpe authored
      There is no reason for drivers to do this, the core code should take of
      everything. The drivers will provide their information from rodata to
      describe their modifications to the core's base uapi specification.
      
      The core uses this to build up the runtime uapi for each device.
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Reviewed-by: default avatarMichael J. Ruhl <michael.j.ruhl@intel.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      7d96c9b1
  5. 09 Aug, 2018 1 commit
  6. 08 Aug, 2018 5 commits
  7. 07 Aug, 2018 1 commit
  8. 03 Aug, 2018 9 commits
    • Jason Gunthorpe's avatar
      IB/ipoib: Consolidate checking of the proposed child interface · 76010976
      Jason Gunthorpe authored
      Move all the checking for pkey and other validity to the __ipoib_vlan_add
      function. This removes the last difference from the control flow
      of the __ipoib_vlan_add to make the overall design simpler to
      understand.
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarErez Shitrit <erezsh@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      76010976
    • Jason Gunthorpe's avatar
      IB/ipoib: Maintain the child_intfs list from ndo_init/uninit · 13476d35
      Jason Gunthorpe authored
      This fixes a bug in the netlink path where the vlan_rwsem was not
      held around __ipoib_vlan_add causing the child_intfs to be manipulated
      unsafely.
      
      In the process this greatly simplifies the vlan_rwsem write side locking
      to only cover a single non-sleeping statement.
      
      This also further increases the safety of the removal ordering by holding
      the netdev of the parent while the child is active to ensure most bugs
      become either an oops on a NULL priv or a deadlock on the netdev refcount.
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      13476d35
    • Jason Gunthorpe's avatar
      IB/ipoib: Do not remove child devices from within the ndo_uninit · 25405d98
      Jason Gunthorpe authored
      Switching to priv_destructor and needs_free_netdev created a subtle
      ordering problem in ipoib_remove_one.
      
      Now that unregister_netdev frees the netdev and priv we must ensure that
      the children are unregistered before trying to unregister the parent,
      or child unregister will use after free.
      
      The solution is to unregister the children, then parent, in the same batch
      all while holding the rtnl_lock. This closes all the races where a new
      child could have been added and ensures proper ordering.
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      25405d98
    • Jason Gunthorpe's avatar
      IB/ipoib: Get rid of the sysfs_mutex · ee190ab7
      Jason Gunthorpe authored
      This mutex was introduced to deal with the deadlock formed by calling
      unregister_netdev from within the sysfs callback of a netdev.
      
      Now that we have priv_destructor and needs_free_netdev we can switch
      to the more targeted solution of running the unregister from a
      work queue. This avoids the deadlock and gets rid of the mutex.
      
      The next patch in the series needs this mutex eliminated to create
      atomicity of unregisteration.
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      ee190ab7
    • Jason Gunthorpe's avatar
      RDMA/netdev: Use priv_destructor for netdev cleanup · 9f49a5b5
      Jason Gunthorpe authored
      Now that the unregister_netdev flow for IPoIB no longer relies on external
      code we can now introduce the use of priv_destructor and
      needs_free_netdev.
      
      The rdma_netdev flow is switched to use the netdev common priv_destructor
      instead of the special free_rdma_netdev and the IPOIB ULP adjusted:
       - priv_destructor needs to switch to point to the ULP's destructor
         which will then call the rdma_ndev's in the right order
       - We need to be careful around the error unwind of register_netdev
         as it sometimes calls priv_destructor on failure
       - ULPs need to use ndo_init/uninit to ensure proper ordering
         of failures around register_netdev
      
      Switching to priv_destructor is a necessary pre-requisite to using
      the rtnl new_link mechanism.
      
      The VNIC user for rdma_netdev should also be revised, but that is left for
      another patch.
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarDenis Drozdov <denisd@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      9f49a5b5
    • Jason Gunthorpe's avatar
      IB/ipoib: Move init code to ndo_init · eaeb3984
      Jason Gunthorpe authored
      Now that we have a proper ndo_uninit, move code that naturally pairs
      with the ndo_uninit into ndo_init. This allows the netdev core to natually
      handle ordering.
      
      This fixes the situation where register_netdev can fail before calling
      ndo_init, in which case it wouldn't call ndo_uninit either.
      
      Also move a bunch of duplicated init code that is shared between child
      and parent for clarity. Now the child and parent register functions look
      very similar.
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      eaeb3984
    • Jason Gunthorpe's avatar
      IB/ipoib: Move all uninit code into ndo_uninit · 7cbee87c
      Jason Gunthorpe authored
      Currently uninit is sometimes done twice in error flows, and is sprinkled
      a bit all over the place.
      
      Improve the clarity of the design by moving all uninit only into
      ndo_uinit.
      
      Some duplication is removed:
       - Sometimes IPOIB_STOP_NEIGH_GC was done before unregister, but
         this duplicates the process in ipoib_neigh_hash_init
       - Flushing priv->wq was sometimes done before unregister,
         but that duplicates what has been done in ndo_uninit
      
      Uniniting the IB event queue must remain before unregister_netdev as it
      requires the RTNL lock to be dropped, this is moved to a helper to make
      that flow really clear and remove some duplication in error flows.
      
      If register_netdev fails (and ndo_init is NULL) then it almost always
      calls ndo_uninit, which lets us remove all the extra code from the error
      unwinds. The next patch in the series will close the 'almost always' hole
      by pairing a proper ndo_init with ndo_uninit.
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      7cbee87c
    • Erez Shitrit's avatar
      IB/ipoib: Use cancel_delayed_work_sync for neigh-clean task · cda8daf1
      Erez Shitrit authored
      The neigh_reap_task is self restarting, but so long as we call
      cancel_delayed_work_sync() it will be guaranteed to not be running and
      never start again. Thus we don't need to have the racy
      IPOIB_STOP_NEIGH_GC bit, or the confusing mismatch of places sometimes
      calling flush_workqueue after the cancel.
      
      This fixes a situation where the GC work could have been left running
      in some rare situations.
      Signed-off-by: default avatarErez Shitrit <erezsh@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      cda8daf1
    • Jason Gunthorpe's avatar
      IB/ipoib: Get rid of IPOIB_FLAG_GOING_DOWN · 577e07ff
      Jason Gunthorpe authored
      This essentially duplicates the netdev's reg_state, so just use that
      directly. The reg_state is updated under the rntl_lock, and all places
      using GOING_DOWN already acquire the rtnl_lock so checking is safe.
      
      Since the only place we use GOING_DOWN is for the parent device this
      does not fix any bugs, but it is a step to tidy up the unregister flow
      so that after later patches the flow is uniform and sane.
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      577e07ff