1. 19 Jul, 2023 4 commits
    • Dave Marchevsky's avatar
      selftests/bpf: Disable newly-added 'owner' field test until refcount re-enabled · f3514a5d
      Dave Marchevsky authored
      The test added in previous patch will fail with bpf_refcount_acquire
      disabled. Until all races are fixed and bpf_refcount_acquire is
      re-enabled on bpf-next, disable the test so CI doesn't complain.
      Signed-off-by: default avatarDave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20230718083813.3416104-6-davemarchevsky@fb.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      f3514a5d
    • Dave Marchevsky's avatar
      selftests/bpf: Add rbtree test exercising race which 'owner' field prevents · fdf48dc2
      Dave Marchevsky authored
      This patch adds a runnable version of one of the races described by
      Kumar in [0]. Specifically, this interleaving:
      
      (rbtree1 and list head protected by lock1, rbtree2 protected by lock2)
      
      Prog A                          Prog B
      ======================================
      n = bpf_obj_new(...)
      m = bpf_refcount_acquire(n)
      kptr_xchg(map, m)
      
                                      m = kptr_xchg(map, NULL)
                                      lock(lock2)
      				bpf_rbtree_add(rbtree2, m->r, less)
      				unlock(lock2)
      
      lock(lock1)
      bpf_list_push_back(head, n->l)
      /* make n non-owning ref */
      bpf_rbtree_remove(rbtree1, n->r)
      unlock(lock1)
      
      The above interleaving, the node's struct bpf_rb_node *r can be used to
      add it to either rbtree1 or rbtree2, which are protected by different
      locks. If the node has been added to rbtree2, we should not be allowed
      to remove it while holding rbtree1's lock.
      
      Before changes in the previous patch in this series, the rbtree_remove
      in the second part of Prog A would succeed as the verifier has no way of
      knowing which tree owns a particular node at verification time. The
      addition of 'owner' field results in bpf_rbtree_remove correctly
      failing.
      
      The test added in this patch splits "Prog A" above into two separate BPF
      programs - A1 and A2 - and uses a second mapval + kptr_xchg to pass n
      from A1 to A2 similarly to the pass from A1 to B. If the test is run
      without the fix applied, the remove will succeed.
      
      Kumar's example had the two programs running on separate CPUs. This
      patch doesn't do this as it's not necessary to exercise the broken
      behavior / validate fixed behavior.
      
        [0]: https://lore.kernel.org/bpf/d7hyspcow5wtjcmw4fugdgyp3fwhljwuscp3xyut5qnwivyeru@ysdq543otzv2Signed-off-by: default avatarDave Marchevsky <davemarchevsky@fb.com>
      Suggested-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20230718083813.3416104-5-davemarchevsky@fb.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      fdf48dc2
    • Dave Marchevsky's avatar
      bpf: Add 'owner' field to bpf_{list,rb}_node · c3c510ce
      Dave Marchevsky authored
      As described by Kumar in [0], in shared ownership scenarios it is
      necessary to do runtime tracking of {rb,list} node ownership - and
      synchronize updates using this ownership information - in order to
      prevent races. This patch adds an 'owner' field to struct bpf_list_node
      and bpf_rb_node to implement such runtime tracking.
      
      The owner field is a void * that describes the ownership state of a
      node. It can have the following values:
      
        NULL           - the node is not owned by any data structure
        BPF_PTR_POISON - the node is in the process of being added to a data
                         structure
        ptr_to_root    - the pointee is a data structure 'root'
                         (bpf_rb_root / bpf_list_head) which owns this node
      
      The field is initially NULL (set by bpf_obj_init_field default behavior)
      and transitions states in the following sequence:
      
        Insertion: NULL -> BPF_PTR_POISON -> ptr_to_root
        Removal:   ptr_to_root -> NULL
      
      Before a node has been successfully inserted, it is not protected by any
      root's lock, and therefore two programs can attempt to add the same node
      to different roots simultaneously. For this reason the intermediate
      BPF_PTR_POISON state is necessary. For removal, the node is protected
      by some root's lock so this intermediate hop isn't necessary.
      
      Note that bpf_list_pop_{front,back} helpers don't need to check owner
      before removing as the node-to-be-removed is not passed in as input and
      is instead taken directly from the list. Do the check anyways and
      WARN_ON_ONCE in this unexpected scenario.
      
      Selftest changes in this patch are entirely mechanical: some BTF
      tests have hardcoded struct sizes for structs that contain
      bpf_{list,rb}_node fields, those were adjusted to account for the new
      sizes. Selftest additions to validate the owner field are added in a
      further patch in the series.
      
        [0]: https://lore.kernel.org/bpf/d7hyspcow5wtjcmw4fugdgyp3fwhljwuscp3xyut5qnwivyeru@ysdq543otzv2Signed-off-by: default avatarDave Marchevsky <davemarchevsky@fb.com>
      Suggested-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20230718083813.3416104-4-davemarchevsky@fb.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      c3c510ce
    • Dave Marchevsky's avatar
      bpf: Introduce internal definitions for UAPI-opaque bpf_{rb,list}_node · 0a1f7bfe
      Dave Marchevsky authored
      Structs bpf_rb_node and bpf_list_node are opaquely defined in
      uapi/linux/bpf.h, as BPF program writers are not expected to touch their
      fields - nor does the verifier allow them to do so.
      
      Currently these structs are simple wrappers around structs rb_node and
      list_head and linked_list / rbtree implementation just casts and passes
      to library functions for those data structures. Later patches in this
      series, though, will add an "owner" field to bpf_{rb,list}_node, such
      that they're not just wrapping an underlying node type. Moreover, the
      bpf linked_list and rbtree implementations will deal with these owner
      pointers directly in a few different places.
      
      To avoid having to do
      
        void *owner = (void*)bpf_list_node + sizeof(struct list_head)
      
      with opaque UAPI node types, add bpf_{list,rb}_node_kern struct
      definitions to internal headers and modify linked_list and rbtree to use
      the internal types where appropriate.
      Signed-off-by: default avatarDave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20230718083813.3416104-3-davemarchevsky@fb.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      0a1f7bfe
  2. 17 Jul, 2023 24 commits
  3. 14 Jul, 2023 12 commits
    • Jesper Dangaard Brouer's avatar
      gve: trivial spell fix Recive to Receive · 68af9000
      Jesper Dangaard Brouer authored
      Spotted this trivial spell mistake while casually reading
      the google GVE driver code.
      Signed-off-by: default avatarJesper Dangaard Brouer <hawk@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      68af9000
    • David S. Miller's avatar
      Merge branch 'mlxsw-rif-pvid' · 382d7dcf
      David S. Miller authored
      Petr Machata says:
      
      ====================
      mlxsw: Manage RIF across PVID changes
      
      The mlxsw driver currently makes the assumption that the user applies
      configuration in a bottom-up manner. Thus netdevices need to be added to
      the bridge before IP addresses are configured on that bridge or SVI added
      on top of it. Enslaving a netdevice to another netdevice that already has
      uppers is in fact forbidden by mlxsw for this reason. Despite this safety,
      it is rather easy to get into situations where the offloaded configuration
      is just plain wrong.
      
      As an example, take a front panel port, configure an IP address: it gets a
      RIF. Now enslave the port to the bridge, and the RIF is gone. Remove the
      port from the bridge again, but the RIF never comes back. There is a number
      of similar situations, where changing the configuration there and back
      utterly breaks the offload.
      
      The situation is going to be made better by implementing a range of replays
      and post-hoc offloads.
      
      In this patch set, address the ordering issues related to creation of
      bridge RIFs. Currently, mlxsw has several shortcomings with regards to RIF
      handling due to PVID changes:
      
      - In order to cause RIF for a bridge device to be created, the user is
        expected first to set PVID, then to add an IP address. The reverse
        ordering is disallowed, which is not very user-friendly.
      
      - When such bridge gets a VLAN upper whose VID was the same as the existing
        PVID, and this VLAN netdevice gets an IP address, a RIF is created for
        this netdevice. The new RIF is then assigned to the 802.1Q FID for the
        given VID. This results in a working configuration. However, then, when
        the VLAN netdevice is removed again, the RIF for the bridge itself is
        never reassociated to the PVID.
      
      - PVID cannot be changed once the bridge has uppers. Presumably this is
        because the driver does not manage RIFs properly in face of PVID changes.
        However, as the previous point shows, it is still possible to get into
        invalid configurations.
      
      This patch set addresses these issues and relaxes some of the ordering
      requirements that mlxsw had. The patch set proceeds as follows:
      
      - In patch #1, pass extack to mlxsw_sp_br_ban_rif_pvid_change()
      
      - To relax ordering between setting PVID and adding an IP address to a
        bridge, mlxsw must be able to request that a RIF is created with a given
        VLAN ID, instead of trying to deduce it from the current netdevice
        settings, which do not reflect the user-requested values yet. This is
        done in patches #2 and #3.
      
      - Similarly, mlxsw_sp_inetaddr_bridge_event() will need to make decisions
        based on the user-requested value of PVID, not the current value. Thus in
        patches #4 and #5, add a new argument which carries the requested PVID
        value.
      
      - Finally in patch #6 relax the ban on PVID changes when a bridge has
        uppers. Instead, add the logic necessary for creation of a RIF as a
        result of PVID change.
      
      - Relevant selftests are presented afterwards. In patch #7 a preparatory
        helper is added to lib.sh. Patches #8, #9, #10 and #11 include selftests
        themselves.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      382d7dcf
    • Petr Machata's avatar
      selftests: router_bridge_pvid_vlan_upper: Add a new selftest · 9cbb3da4
      Petr Machata authored
      This tests whether addition and deletion of a VLAN upper that coincides
      with the current PVID setting throws off forwarding.
      
      This selftests is specifically geared towards offloading drivers. In
      particular, mlxsw used to fail this selftest, and an earlier patch in this
      patchset fixes the issue. However, there's nothing HW-specific in the test
      itself (it absolutely is supposed to pass on SW datapath), and therefore it
      is put into the generic forwarding directory.
      Signed-off-by: default avatarPetr Machata <petrm@nvidia.com>
      Reviewed-by: default avatarDanielle Ratson <danieller@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9cbb3da4
    • Petr Machata's avatar
      selftests: router_bridge_vlan_upper_pvid: Add a new selftest · b0307b77
      Petr Machata authored
      This tests whether changes to PVID that coincide with an existing VLAN
      upper throw off forwarding. This selftests is specifically geared towards
      offloading drivers, but since there's nothing HW-specific in the test
      itself (it absolutely is supposed to pass on SW datapath), it is put into
      the generic forwarding directory.
      Signed-off-by: default avatarPetr Machata <petrm@nvidia.com>
      Reviewed-by: default avatarDanielle Ratson <danieller@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b0307b77
    • Petr Machata's avatar
      selftests: router_bridge_vlan: Add PVID change test · d4172a93
      Petr Machata authored
      Add an alternative path involving VLAN 777 instead of the current 555. Then
      add tests that verify that marking 777 as PVID makes the 555 path not work,
      and the 777 path work.
      Signed-off-by: default avatarPetr Machata <petrm@nvidia.com>
      Reviewed-by: default avatarDanielle Ratson <danieller@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d4172a93
    • Petr Machata's avatar
      selftests: router_bridge: Add tests to remove and add PVID · c7203a29
      Petr Machata authored
      This test relies on PVID being configured on the bridge itself. Thus when
      it is deconfigured, the system should lose the ability to forward traffic.
      Later when it is added again, the ability to forward traffic should be
      regained. Add tests to exercise these configuration changes and verify
      results.
      Signed-off-by: default avatarPetr Machata <petrm@nvidia.com>
      Reviewed-by: default avatarDanielle Ratson <danieller@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c7203a29
    • Petr Machata's avatar
      selftests: forwarding: lib: Add ping6_, ping_test_fails() · 5f44a714
      Petr Machata authored
      Add two helpers to run a ping test that succeeds when the pings themselves
      fail.
      Signed-off-by: default avatarPetr Machata <petrm@nvidia.com>
      Reviewed-by: default avatarDanielle Ratson <danieller@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5f44a714
    • Petr Machata's avatar
      mlxsw: spectrum_switchdev: Manage RIFs on PVID change · a5b52692
      Petr Machata authored
      Currently, mlxsw has several shortcomings with regards to RIF handling due
      to PVID changes:
      
      - In order to cause RIF for a bridge device to be created, the user is
        expected first to set PVID, then to add an IP address. The reverse
        ordering is disallowed, which is not very user-friendly.
      
      - When such bridge gets a VLAN upper whose VID was the same as the existing
        PVID, and this VLAN netdevice gets an IP address, a RIF is created for
        this netdevice. The new RIF is then assigned to the 802.1Q FID for the
        given VID. This results in a working configuration. However, then, when
        the VLAN netdevice is removed again, the RIF for the bridge itself is
        never reassociated to the VLAN.
      
      - PVID cannot be changed once the bridge has uppers. Presumably this is
        because the driver does not manage RIFs properly in face of PVID changes.
        However, as the previous point shows, it is still possible to get into
        invalid configurations.
      
      In this patch, add the logic necessary for creation of a RIF as a result of
      PVID change. Moreover, when a VLAN upper is created whose VID matches lower
      PVID, do not create RIF for this netdevice.
      
      These changes obviate the need for ordering of IP address additions and
      PVID configuration, so stop forbidding addition of an IP address to a
      PVID-less bridge. Instead, bail out quietly. Also stop preventing PVID
      changes when the bridge has uppers.
      Signed-off-by: default avatarPetr Machata <petrm@nvidia.com>
      Reviewed-by: default avatarDanielle Ratson <danieller@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a5b52692
    • Petr Machata's avatar
      mlxsw: spectrum_router: mlxsw_sp_inetaddr_bridge_event: Add an argument · 3430f2cf
      Petr Machata authored
      For purposes of replay, mlxsw_sp_inetaddr_bridge_event() will need to make
      decisions based on the proposed value of PVID. Querying PVID reveals the
      current settings, not the in-flight values that the user requested and that
      the notifiers are acting upon. Add a parameter, lower_pvid, which carries
      the proposed PVID of the lower bridge, or -1 if the lower is not a bridge.
      Signed-off-by: default avatarPetr Machata <petrm@nvidia.com>
      Reviewed-by: default avatarDanielle Ratson <danieller@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3430f2cf
    • Petr Machata's avatar
      mlxsw: spectrum_router: Adjust mlxsw_sp_inetaddr_vlan_event() coding style · a24a4d29
      Petr Machata authored
      The bridge branch of the dispatch in this function is going to get more
      code and will need curly braces. Per the doctrine, that means the whole
      if-else chain should get them.
      Signed-off-by: default avatarPetr Machata <petrm@nvidia.com>
      Reviewed-by: default avatarDanielle Ratson <danieller@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a24a4d29
    • Petr Machata's avatar
      mlxsw: spectrum_router: Take VID for VLAN FIDs from RIF params · a0944b24
      Petr Machata authored
      Currently, when an IP address is added to a bridge that has no PVID, the
      operation is rejected. An IP address addition is interpreted as a request
      to create a RIF for the bridge device, but without a PVID there is no VLAN
      for which the RIF should be created. Thus the correct way to create a RIF
      for a bridge as a user is to first add a PVID, and then add the IP address.
      
      Ideally this ordering requirement would not exist. RIF would be created
      either because an IP address is added, or because a PVID is added,
      depending on which comes last.
      
      For that, the switchdev code (which notices the PVID change request) must
      be able to request that a RIF is created with a given VLAN ID, because at
      the time that the PVID notification is distributed, the PVID setting is not
      yet visible for querying.
      
      Therefore when creating a VLAN-based RIF, use mlxsw_sp_rif_params.vid to
      communicate the VID, and do not determine it ad-hoc in the fid_get
      callback.
      Signed-off-by: default avatarPetr Machata <petrm@nvidia.com>
      Reviewed-by: default avatarDanielle Ratson <danieller@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a0944b24
    • Petr Machata's avatar
      mlxsw: spectrum_router: Pass struct mlxsw_sp_rif_params to fid_get · 5ca9f42c
      Petr Machata authored
      The fid_get callback is called to allocate a FID for the newly-created RIF.
      In a following patch, the fid_get implementation for VLANs will be modified
      to take the VLAN ID from the parameters instead of deducing it from the
      netdevice. To that end, propagate the RIF parameters to the fid_get
      callback.
      Signed-off-by: default avatarPetr Machata <petrm@nvidia.com>
      Reviewed-by: default avatarDanielle Ratson <danieller@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5ca9f42c