1. 16 Feb, 2022 10 commits
    • net: switchdev: introduce switchdev_handle_port_obj_{add,del} for foreign interfaces · c4076cdd
      Vladimir Oltean authored
      The switchdev_handle_port_obj_add() helper is good for replicating a
      port object on the lower interfaces of @dev, if that object was emitted
      on a bridge, or on a bridge port that is a LAG.
      
      However, drivers that use this helper limit themselves to a box from
      which they can no longer intercept port objects notified on neighbor
      ports ("foreign interfaces").
      
      One such driver is DSA, where software bridging with foreign interfaces
      such as standalone NICs or Wi-Fi APs is an important use case. There, a
      VLAN installed on a neighbor bridge port roughly corresponds to a
      forwarding VLAN installed on the DSA switch's CPU port.
      
      To support this use case while also making use of the benefits of the
      switchdev_handle_* replication helper for port objects, introduce a new
      variant of these functions that crawls through the neighbor ports of
      @dev, in search of potentially compatible switchdev ports that are
      interested in the event.
      
      The strategy is identical to switchdev_handle_fdb_event_to_device():
      if @dev wasn't a switchdev interface, then go one step up, and
      recursively call this function on the bridge that this port belongs to.
      At the next recursion step, __switchdev_handle_port_obj_add() will
      iterate through the bridge's lower interfaces. Among those, some will be
      switchdev interfaces, and one will be the original @dev that we came
      from. To prevent infinite recursion, we must suppress reentry into the
      original @dev, and just call the @add_cb for the switchdev interfaces.
      
      It looks like this:
      
                      br0
                     / | \
                    /  |  \
                   /   |   \
                 swp0 swp1 eth0
      
      1. __switchdev_handle_port_obj_add(eth0)
         -> check_cb(eth0) returns false
         -> eth0 has no lower interfaces
         -> eth0's bridge is br0
         -> switchdev_lower_dev_find(br0, check_cb, foreign_dev_check_cb)
            finds br0
      
      2. __switchdev_handle_port_obj_add(br0)
         -> check_cb(br0) returns false
         -> netdev_for_each_lower_dev
            -> check_cb(swp0) returns true, so we don't skip this interface
      
      3. __switchdev_handle_port_obj_add(swp0)
         -> check_cb(swp0) returns true, so we call add_cb(swp0)
      
      (back to netdev_for_each_lower_dev from 2)
            -> check_cb(swp1) returns true, so we don't skip this interface
      
      4. __switchdev_handle_port_obj_add(swp1)
         -> check_cb(swp1) returns true, so we call add_cb(swp1)
      
      (back to netdev_for_each_lower_dev from 2)
            -> check_cb(eth0) returns false, so we skip this interface to
               avoid infinite recursion
      
      Note: eth0 could have been a LAG, and we don't want to suppress the
      recursion through its lowers if those exist, so when check_cb() returns
      false, we still call switchdev_lower_dev_find() to estimate whether
      there's anything worth a recursion beneath that LAG. Using check_cb()
      and foreign_dev_check_cb(), switchdev_lower_dev_find() not only figures
      out whether the lowers of the LAG are switchdev, but also whether they
      actively offload the LAG or not (whether the LAG is "foreign" to the
      switchdev interface or not).
      
      The port_obj_info->orig_dev is preserved across recursive calls, so
      switchdev drivers still know on which device this notification was
      originally emitted.
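The walk described above can be modelled in userspace as a minimal sketch. Everything prefixed `toy_` is hypothetical and is not kernel API: `is_switchdev` stands in for what check_cb() reports, `add_calls` counts add_cb() invocations, and `toy_has_switchdev()` is a simplified stand-in for the role of switchdev_lower_dev_find() together with foreign_dev_check_cb().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_LOWERS 4

/* Toy stand-in for struct net_device; not the kernel structure. */
struct toy_dev {
	const char *name;
	bool is_bridge;
	bool is_switchdev;                      /* what check_cb() would report */
	struct toy_dev *upper;                  /* bridge or LAG above, or NULL */
	struct toy_dev *lowers[TOY_MAX_LOWERS];
	size_t num_lowers;
	int add_calls;                          /* times add_cb() ran here */
};

/* Attach @lower beneath @upper (hypothetical tree-building helper). */
static void toy_link(struct toy_dev *upper, struct toy_dev *lower)
{
	assert(upper->num_lowers < TOY_MAX_LOWERS);
	upper->lowers[upper->num_lowers++] = lower;
	lower->upper = upper;
}

/* True if @dev is a switchdev port or has one somewhere beneath it. */
static bool toy_has_switchdev(const struct toy_dev *dev)
{
	if (dev->is_switchdev)
		return true;
	for (size_t i = 0; i < dev->num_lowers; i++)
		if (toy_has_switchdev(dev->lowers[i]))
			return true;
	return false;
}

static void toy_handle_port_obj_add(struct toy_dev *dev)
{
	assert(dev != NULL);

	if (dev->is_switchdev) {        /* check_cb(dev) returns true */
		dev->add_calls++;       /* add_cb(dev) */
		return;
	}

	/*
	 * Recurse into lowers, skipping those with no switchdev port
	 * beneath them. This skip suppresses re-entry into the original
	 * foreign device.
	 */
	for (size_t i = 0; i < dev->num_lowers; i++) {
		struct toy_dev *lower = dev->lowers[i];

		if (!toy_has_switchdev(lower))
			continue;
		toy_handle_port_obj_add(lower);
	}

	/* Only a purely foreign device goes one step up to its bridge. */
	if (toy_has_switchdev(dev))
		return;
	if (dev->upper == NULL || !dev->upper->is_bridge)
		return;
	if (!toy_has_switchdev(dev->upper))
		return;
	toy_handle_port_obj_add(dev->upper);
}
```

Building the br0/swp0/swp1/eth0 tree from the diagram with `toy_link()` and calling `toy_handle_port_obj_add()` on eth0 runs the add callback once on each of swp0 and swp1 and never on eth0 or br0.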
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: switchdev: rename switchdev_lower_dev_find to switchdev_lower_dev_find_rcu · 7b465f4c
      Vladimir Oltean authored
      switchdev_lower_dev_find() assumes RCU read-side critical section
      calling context, since it uses netdev_walk_all_lower_dev_rcu().
      
      Rename it appropriately, in preparation for adding a similar iterator
      that assumes writer-side rtnl_mutex protection.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: switchdev: replay all VLAN groups · b28d580e
      Vladimir Oltean authored
      The major user of replayed switchdev objects is DSA, and so far it
      hasn't needed information about anything other than bridge port VLANs,
      so this is all that br_switchdev_vlan_replay() knows to handle.
      
      DSA has managed to get by through replicating every VLAN addition on a
      user port such that the same VLAN is also added on all DSA and CPU
      ports, but there is a corner case where this does not work.
      
      The mv88e6xxx DSA driver currently prints this error message as soon as
      the first port of a switch joins a bridge:
      
      mv88e6085 0x0000000008b96000:00: port 0 failed to add a6:ef:77:c8:5f:3d vid 1 to fdb: -95
      
      where a6:ef:77:c8:5f:3d vid 1 is a local FDB entry corresponding to the
      bridge MAC address in the default_pvid.
      
      The -EOPNOTSUPP is returned by mv88e6xxx_port_db_load_purge() because it
      tries to map VID 1 to a FID (the ATU is indexed by FID not VID), but
      fails to do so. This is because ->port_fdb_add() is called before
      ->port_vlan_add() for VID 1.
      
      The abridged timeline of the calls is:
      
      br_add_if
      -> netdev_master_upper_dev_link
         -> dsa_port_bridge_join
            -> switchdev_bridge_port_offload
               -> br_switchdev_vlan_replay (*)
               -> br_switchdev_fdb_replay
                  -> mv88e6xxx_port_fdb_add
      -> nbp_vlan_init
         -> nbp_vlan_add
            -> mv88e6xxx_port_vlan_add
      
      and the issue is that at the time of (*), the bridge port isn't in VID 1
      (nbp_vlan_init hasn't been called), therefore br_switchdev_vlan_replay()
      won't have anything to replay, therefore VID 1 won't be in the VTU by
      the time mv88e6xxx_port_fdb_add() is called.
      
      This happens only when the first port of a switch joins. For further
      ports, the initial mv88e6xxx_port_vlan_add() is sufficient for VID 1 to
      be loaded in the VTU (which is switch-wide, not per port).
      
      The problem is somewhat unique to mv88e6xxx by chance, because most
      other drivers offload an FDB entry by VID, so FDBs and VLANs can be
      added asynchronously with respect to each other, but addressing the
      issue at the bridge layer makes sense, since what mv88e6xxx requires
      isn't absurd.
      
      To fix this problem, we need to recognize that it isn't the VLAN group
      of the port that we're interested in, but the VLAN group of the bridge
      itself (so it isn't a timing issue, but rather insufficient information
      being passed from switchdev to drivers).
      
      As mentioned, currently nbp_switchdev_sync_objs() only calls
      br_switchdev_vlan_replay() for VLANs corresponding to the port, but the
      VLANs corresponding to the bridge itself, for local termination, also
      need to be replayed. In this case, VID 1 is not (yet) present in the
      port's VLAN group but is present in the bridge's VLAN group.
      
      So to fix this bug, DSA is now obligated to explicitly handle VLANs
      pointing towards the bridge in order to "close this race" (which isn't
      really a race). As Tobias Waldekranz noticed, this also implies that it
      must explicitly handle port VLANs on foreign interfaces, something that
      worked implicitly before:
      https://patchwork.kernel.org/project/netdevbpf/patch/20220209213044.2353153-6-vladimir.oltean@nxp.com/#24735260
      
      So in the end, br_switchdev_vlan_replay() must replay all VLANs from all
      VLAN groups: all the ports, and the bridge itself.
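The ordering problem can be reproduced in a toy model. This is a sketch under stated assumptions, not the mv88e6xxx or bridge code: `toy_switch` holds a switch-wide VLAN table, `toy_port_fdb_add()` fails with -EOPNOTSUPP when the VID has no table entry, and `toy_bridge_join()` is a hypothetical stand-in for the replay done on bridge join, with a flag to choose whether the bridge's own VLAN group is replayed too.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_VIDS 4096

/* Toy switch: a switch-wide VLAN table plus an FDB entry counter. */
struct toy_switch {
	bool vtu[TOY_NUM_VIDS];
	int fdb_entries;
};

static int toy_port_vlan_add(struct toy_switch *sw, int vid)
{
	assert(vid > 0 && vid < TOY_NUM_VIDS);
	sw->vtu[vid] = true;
	return 0;
}

/* An FDB entry can only be installed once its VID is in the VLAN table. */
static int toy_port_fdb_add(struct toy_switch *sw, int vid)
{
	assert(vid > 0 && vid < TOY_NUM_VIDS);
	if (!sw->vtu[vid])
		return -EOPNOTSUPP;
	sw->fdb_entries++;
	return 0;
}

/* Replay one VLAN group, given as an array of VIDs. */
static void toy_vlan_replay(struct toy_switch *sw, const int *vids, size_t n)
{
	for (size_t i = 0; i < n; i++)
		toy_port_vlan_add(sw, vids[i]);
}

/*
 * Replay on bridge join: VLANs first, then one local FDB entry in @fdb_vid.
 * @replay_bridge_group selects between port-only replay and replaying the
 * bridge's own VLAN group as well.
 */
static int toy_bridge_join(struct toy_switch *sw,
			   const int *port_vids, size_t n_port,
			   const int *br_vids, size_t n_br,
			   bool replay_bridge_group, int fdb_vid)
{
	toy_vlan_replay(sw, port_vids, n_port);
	if (replay_bridge_group)
		toy_vlan_replay(sw, br_vids, n_br);
	return toy_port_fdb_add(sw, fdb_vid);
}
```

With an empty port VLAN group and VID 1 present only in the bridge's group, the join fails with -EOPNOTSUPP when only the port group is replayed, and succeeds once the bridge group is replayed as well.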
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: make nbp_switchdev_unsync_objs() follow reverse order of sync() · 263029ae
      Vladimir Oltean authored
      There may be switchdev drivers that can add/remove a FDB or MDB entry
      only as long as the VLAN it's in has been notified and offloaded first.
      The nbp_switchdev_sync_objs() method satisfies this requirement on
      addition, but nbp_switchdev_unsync_objs() first deletes VLANs, then
      deletes MDBs and FDBs. Reverse the order of the function calls to cater
      to this requirement.
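One way to keep the two directions consistent is to drive both from a single ordered table, walked forwards for sync and backwards for unsync. The sketch below is illustrative only; the `toy_` names are hypothetical, and the VLAN, MDB, FDB order in the table is an assumption chosen so that VLANs come first on addition.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NUM_OBJS 3

/* One tag per object class: V = VLANs, M = MDBs, F = FDBs. */
static const char toy_sync_order[TOY_NUM_OBJS] = { 'V', 'M', 'F' };

/* Record the order in which object classes are added on sync. */
static void toy_sync_objs(char *log, size_t len)
{
	assert(len > TOY_NUM_OBJS);
	for (size_t i = 0; i < TOY_NUM_OBJS; i++)
		log[i] = toy_sync_order[i];
	log[TOY_NUM_OBJS] = '\0';
}

/* Record the order of deletion on unsync: the exact reverse of sync. */
static void toy_unsync_objs(char *log, size_t len)
{
	assert(len > TOY_NUM_OBJS);
	for (size_t i = 0; i < TOY_NUM_OBJS; i++)
		log[i] = toy_sync_order[TOY_NUM_OBJS - 1 - i];
	log[TOY_NUM_OBJS] = '\0';
}
```

Because the VLAN step is first in the table, it is automatically last on removal, so FDB and MDB entries are always deleted while their VLAN is still offloaded.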
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: switchdev: differentiate new VLANs from changed ones · 8d23a54f
      Vladimir Oltean authored
      br_switchdev_port_vlan_add() currently emits a SWITCHDEV_PORT_OBJ_ADD
      event with a SWITCHDEV_OBJ_ID_PORT_VLAN for 2 distinct cases:
      
      - a struct net_bridge_vlan got created
      - an existing struct net_bridge_vlan was modified
      
      This makes it impossible for switchdev drivers to properly balance
      PORT_OBJ_ADD with PORT_OBJ_DEL events, so if we want to allow that to
      happen, we must provide a way for drivers to distinguish between a
      VLAN with changed flags and a new one.
      
      Annotate struct switchdev_obj_port_vlan with a "bool changed" that
      distinguishes the 2 cases above.
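A driver could use such a flag to keep its additions and deletions balanced. The following is a minimal sketch, not code from any driver: `toy_obj_port_vlan` mirrors only the fields needed here, and the per-port reference counter is a hypothetical accounting scheme.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NUM_VIDS 4096

/* Toy subset of a port VLAN object; only the fields this sketch needs. */
struct toy_obj_port_vlan {
	uint16_t vid;
	uint16_t flags;
	bool changed;   /* true: existing VLAN modified; false: newly created */
};

/* Hypothetical per-port driver state. */
struct toy_port {
	int vlan_refs[TOY_NUM_VIDS];
	uint16_t vlan_flags[TOY_NUM_VIDS];
};

/* ADD handler: take a reference only for a genuinely new VLAN. */
static void toy_port_vlan_add(struct toy_port *p,
			      const struct toy_obj_port_vlan *v)
{
	assert(v->vid < TOY_NUM_VIDS);
	if (!v->changed)
		p->vlan_refs[v->vid]++;
	p->vlan_flags[v->vid] = v->flags;
}

/* DEL handler: drop the reference taken by the matching ADD. */
static void toy_port_vlan_del(struct toy_port *p, uint16_t vid)
{
	assert(vid < TOY_NUM_VIDS && p->vlan_refs[vid] > 0);
	if (--p->vlan_refs[vid] == 0)
		p->vlan_flags[vid] = 0;
}
```

Adding VID 100 and then modifying its flags leaves the reference count at 1, so a single deletion brings it back to 0.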
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: vlan: notify switchdev only when something changed · 27c5f74c
      Vladimir Oltean authored
      Currently, when a VLAN entry is added multiple times in a row to a
      bridge port, nbp_vlan_add() calls br_switchdev_port_vlan_add() each
      time, even if the VLAN already exists and nothing about it has changed:
      
      bridge vlan add dev lan12 vid 100 master static
      
      Similarly, when a VLAN is added multiple times in a row to a bridge,
      br_vlan_add_existing() doesn't filter the calls to
      br_switchdev_port_vlan_add() at all:
      
      bridge vlan add dev br0 vid 100 self
      
      This behavior makes driver-level accounting of VLANs impossible, since
      it is enough for a single deletion event to remove a VLAN, but the
      addition event can be emitted an unlimited number of times.
      
      The cause for this can be identified as follows: we rely on
      __vlan_add_flags() to retroactively tell us whether it has changed
      anything about the VLAN flags or VLAN group pvid. So we'd first have to
      call __vlan_add_flags() before calling br_switchdev_port_vlan_add(), in
      order to have access to the "bool *changed" information. But we don't
      want to change the event ordering, because we'd have to revert the
      struct net_bridge_vlan changes we've made if switchdev returns an error.
      
      So to solve this, we need another function that tells us whether any
      change is going to occur in the VLAN or VLAN group, _prior_ to calling
      __vlan_add_flags().
      
      Split __vlan_add_flags() into a precommit and a commit stage, and rename
      it to __vlan_flags_update(). The precommit stage,
      __vlan_flags_would_change(), will determine whether there is any reason
      to notify switchdev due to a change of flags (note: the BRENTRY flag
      transition from false to true is treated separately: as a new switchdev
      entry, because we skipped notifying the master VLAN when it wasn't a
      brentry yet, and therefore not as a change of flags).
      
      With this lookahead/precommit function in place, we can avoid notifying
      switchdev if nothing changed for the VLAN and VLAN group.
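The precommit/commit split can be sketched in userspace as follows. This is a toy model under stated assumptions: the `toy_` types and flag values are hypothetical, BRENTRY handling is left out, and the switchdev call is replaced by a counter plus an injected return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_VLAN_PVID     0x1
#define TOY_VLAN_UNTAGGED 0x2

struct toy_vlan_group {
	uint16_t pvid;                  /* 0 means no PVID */
};

struct toy_vlan {
	uint16_t vid;
	uint16_t flags;
	struct toy_vlan_group *vg;
};

/* Precommit: report whether applying @flags would change anything. */
static bool toy_vlan_flags_would_change(const struct toy_vlan *v,
					uint16_t flags)
{
	bool want_pvid = flags & TOY_VLAN_PVID;
	bool is_pvid = v->vg->pvid == v->vid;

	return want_pvid != is_pvid ||
	       ((flags ^ v->flags) & TOY_VLAN_UNTAGGED);
}

/* Commit: apply @flags to the VLAN and to its group's PVID. */
static void toy_vlan_flags_commit(struct toy_vlan *v, uint16_t flags)
{
	if (flags & TOY_VLAN_PVID)
		v->vg->pvid = v->vid;
	else if (v->vg->pvid == v->vid)
		v->vg->pvid = 0;

	if (flags & TOY_VLAN_UNTAGGED)
		v->flags |= TOY_VLAN_UNTAGGED;
	else
		v->flags &= (uint16_t)~TOY_VLAN_UNTAGGED;
}

/*
 * Update an existing VLAN. @notify_err is what a hypothetical switchdev
 * notification would return; @notifications counts how many were sent.
 */
static int toy_vlan_update(struct toy_vlan *v, uint16_t flags,
			   int notify_err, int *notifications)
{
	assert(v != NULL && v->vg != NULL && notifications != NULL);

	if (!toy_vlan_flags_would_change(v, flags))
		return 0;               /* nothing to tell switchdev */

	(*notifications)++;
	if (notify_err)
		return notify_err;      /* nothing committed, nothing to revert */

	toy_vlan_flags_commit(v, flags);
	return 0;
}
```

Notifying before committing keeps the original event ordering: if the notification fails, the VLAN state has not been touched, so there is nothing to roll back. Repeating an identical update sends no second notification.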
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: vlan: make __vlan_add_flags react only to PVID and UNTAGGED · cab2cd77
      Vladimir Oltean authored
      Currently there is a very subtle aspect to the behavior of
      __vlan_add_flags(): it changes the struct net_bridge_vlan flags and
      pvid, yet it returns true ("changed") even if none of those changed,
      just a transition of br_vlan_is_brentry(v) took place from false to
      true.
      
      This can be seen in br_vlan_add_existing(), however we do not actually
      rely on this subtle behavior, since the "if" condition that checks that
      the vlan wasn't a brentry before had a useless (until now) assignment:
      
      	*changed = true;
      
      Make things more obvious by actually making __vlan_add_flags() do what's
      written on the box, and be more specific about what is actually written
      on the box. This is needed because further transformations will be done
      to __vlan_add_flags().
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: vlan: don't notify to switchdev master VLANs without BRENTRY flag · 3116ad06
      Vladimir Oltean authored
      When a VLAN is added to a bridge port and it doesn't exist on the bridge
      device yet, it gets created for the multicast context, but it is
      'hidden', since it doesn't have the BRENTRY flag yet:
      
      ip link add br0 type bridge && ip link set swp0 master br0
      bridge vlan add dev swp0 vid 100 # the master VLAN 100 gets created
      bridge vlan add dev br0 vid 100 self # that VLAN becomes brentry just now
      
      All switchdev drivers ignore switchdev notifiers for VLAN entries which
      have the BRENTRY unset, and for good reason: these are merely private
      data structures used by the bridge driver. So we might just as well not
      notify those at all.
      
      Cleanup in the switchdev drivers that check for the BRENTRY flag is now
      possible, and will be handled separately, since those checks just became
      dead code.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: vlan: check early for lack of BRENTRY flag in br_vlan_add_existing · b2bc58d4
      Vladimir Oltean authored
      When a VLAN is added to a bridge port, a master VLAN gets created on the
      bridge for context, but it doesn't have the BRENTRY flag.
      
      Then, when the same VLAN is added to the bridge itself, that enters
      through the br_vlan_add_existing() code path and gains the BRENTRY flag,
      thus it becomes "existing".
      
      It seems natural to check for this condition early, because the current
      code flow is to notify switchdev of the addition of a VLAN that isn't a
      brentry, just to delete it immediately afterwards.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gve: enhance no queue page list detection · b0471c26
      Haiyue Wang authored
      The commit
      a5886ef4 ("gve: Introduce per netdev `enum gve_queue_format`")
      introduced three queue format types, of which only GVE_GQI_QPL_FORMAT
      queues have a page list. The queue page list count should therefore be
      used to detect a zero-size queue page list. Correct the design logic.
      
      Checking 'queue_format == GVE_GQI_RDA_FORMAT' may lead to a request for
      a zero-sized memory allocation, for example if the queue format is
      GVE_DQO_RDA_FORMAT.
      
      The kernel memory subsystem will return ZERO_SIZE_PTR, which is not a
      NULL address, so the driver can run successfully. Also, the code still
      checks the queue page list number first and only then accesses the
      allocated memory, so a zero-count queue page list allocation will not
      lead to an access fault.
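The difference between the two checks can be shown with a small userspace sketch. The `toy_` enum and helpers are hypothetical and only mirror the three formats named above; the rule of one page list per queue is an assumption made for illustration, not the driver's actual sizing.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Toy mirror of the three queue formats named in the commit message. */
enum toy_queue_format {
	TOY_GQI_RDA_FORMAT,
	TOY_GQI_QPL_FORMAT,
	TOY_DQO_RDA_FORMAT,
};

/* Only the QPL format uses queue page lists (one per queue in this toy). */
static unsigned int toy_num_qpls(enum toy_queue_format fmt,
				 unsigned int tx_queues,
				 unsigned int rx_queues)
{
	assert(tx_queues <= UINT_MAX - rx_queues);
	return fmt == TOY_GQI_QPL_FORMAT ? tx_queues + rx_queues : 0;
}

/* Format-based check: skips the allocation for one format only. */
static bool toy_skip_alloc_by_format(enum toy_queue_format fmt)
{
	return fmt == TOY_GQI_RDA_FORMAT;
}

/* Count-based check: skips whenever there is nothing to allocate. */
static bool toy_skip_alloc_by_count(unsigned int num_qpls)
{
	return num_qpls == 0;
}
```

For the DQO RDA format the page list count is zero, yet the format-based check does not skip the allocation, which is how a zero-sized request arises. The count-based check skips it for every format without page lists.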
      Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
      Reviewed-by: Bailey Forrest <bcf@google.com>
      Link: https://lore.kernel.org/r/20220215051751.260866-1-haiyue.wang@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 15 Feb, 2022 22 commits
  3. 14 Feb, 2022 8 commits