1. 16 Feb, 2024 4 commits
    • Merge branch 'bridge-mdb-events' · 82a678e2
      David S. Miller authored
      Tobias Waldekranz says:
      
      ====================
      net: bridge: switchdev: Ensure MDB events are delivered exactly once
      
      When a device is attached to a bridge, drivers will request a replay
      of objects that were created before the device joined the bridge, that
      are still of interest to the joining port. Typical examples include
      FDB entries and MDB memberships on other ports ("foreign interfaces")
      or on the bridge itself.
      
      Conversely, when a device is detached, the bridge will synthesize
      deletion events for all those objects that are still live, but no
      longer applicable to the device in question.
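
      The replay/synthesis contract can be modelled in a few lines of
      userspace C. This is a minimal sketch under stated assumptions.
      struct mdb_entry, HOST_PORT, of_interest(), replay_on_join() and
      synth_del_on_leave() are hypothetical names, not kernel or switchdev
      API. The "still of interest" test is also simplified to "held by any
      other port or by the bridge itself". The property shown is that every
      replayed addition is later balanced by exactly one synthesized
      deletion.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Hypothetical model, not the kernel's data structures. */
      #define HOST_PORT (-1)  /* membership held by the bridge itself */

      struct mdb_entry {
          int group;  /* stand-in for a multicast group address */
          int port;   /* bridge port holding the membership, or HOST_PORT */
      };

      struct toy_driver {
          int adds;   /* add notifications received */
          int dels;   /* delete notifications received */
      };

      /* Simplified "still of interest" test: held elsewhere than on port. */
      static int of_interest(const struct mdb_entry *e, int port)
      {
          return e->port != port;
      }

      /* Join: replay entries that existed before the port joined. */
      static void replay_on_join(const struct mdb_entry *mdb, size_t n,
                                 int port, struct toy_driver *drv)
      {
          assert(drv != NULL);
          for (size_t i = 0; i < n; i++)
              if (of_interest(&mdb[i], port))
                  drv->adds++;
      }

      /* Leave: synthesize deletions for the same set of live entries. */
      static void synth_del_on_leave(const struct mdb_entry *mdb, size_t n,
                                     int port, struct toy_driver *drv)
      {
          assert(drv != NULL);
          for (size_t i = 0; i < n; i++)
              if (of_interest(&mdb[i], port))
                  drv->dels++;
      }
      ```

      With a host-joined group and groups on two other ports, a joining
      port receives three replayed additions and, on leaving, three
      synthesized deletions.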
      
      This series eliminates two races related to the synching and
      unsynching phases of a bridge's MDB with a joining or leaving device,
      that would cause notifications of such objects to be either delivered
      twice (1/2), or not at all (2/2).
      
      A similar race to the one solved by 1/2 still remains for the
      FDB. This is much harder to solve, due to the lockless operation of
      the FDB's rhashtable, and is therefore knowingly left out of this
      series.
      
      v1 -> v2:
      - Squash the previously separate addition of
        switchdev_port_obj_act_is_deferred into first consumer.
      - Use ether_addr_equal to compare MAC addresses.
      - Document switchdev_port_obj_act_is_deferred (renamed from
        switchdev_port_obj_is_deferred in v1, to indicate that we also match
        on the action).
      - Delay allocations of MDB objects until we know they're needed.
      - Use non-RCU version of the hash list iterator, now that the MDB is
        not scanned while holding the RCU read lock.
      - Add Fixes tag to commit message
      
      v2 -> v3:
      - Fix unlocking in error paths
      - Access RCU protected port list via mlock_dereference, since MDB is
        guaranteed to remain constant for the duration of the scan.
      
      v3 -> v4:
      - Limit the search for existing deferred events in 1/2 to only apply to
        additions, since the problem does not exist in the deletion case.
      - Add 2/2, to plug a related race when unoffloading an indirectly
        associated device.
      
      v4 -> v5:
      - Fix grammatical errors in kerneldoc of
        switchdev_port_obj_act_is_deferred
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: switchdev: Ensure deferred event delivery on unoffload · f7a70d65
      Tobias Waldekranz authored
      When unoffloading a device, it is important to ensure that all
      relevant deferred events are delivered to it before it disassociates
      itself from the bridge.
      
      Before this change, this was true for the normal case when a device
      maps 1:1 to a net_bridge_port, i.e.
      
         br0
         /
      swp0
      
      When swp0 leaves br0, the call to switchdev_deferred_process() in
      del_nbp() makes sure to process any outstanding events while the
      device is still associated with the bridge.
      
      In the case when the association is indirect though, i.e. when the
      device is attached to the bridge via an intermediate device, like a
      LAG...
      
          br0
          /
        lag0
        /
      swp0
      
      ...then detaching swp0 from lag0 does not cause any net_bridge_port to
      be deleted, so there was no guarantee that all events had been
      processed before the device disassociated itself from the bridge.
      
      Fix this by always synchronously processing all deferred events before
      signaling completion of unoffloading back to the driver.
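
      The ordering this fix enforces can be modelled with a toy deferred
      queue. This is a sketch under stated assumptions. struct
      deferred_queue, deferred_process() and port_unoffload() are
      hypothetical stand-ins; the real helper referred to above is
      switchdev_deferred_process(). The hardware state is reduced to a
      per-port group counter. A pending deletion that is processed only
      after the port has disassociated is dropped, and the group is left
      orphaned.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      #define QLEN 16

      enum ev_kind { EV_ADD, EV_DEL };

      struct toy_port {
          bool offloaded;  /* still associated with the bridge? */
          int groups;      /* groups programmed in "hardware" */
      };

      struct deferred_queue {
          enum ev_kind ev[QLEN];
          size_t head, tail;
      };

      static void deferred_enqueue(struct deferred_queue *q, enum ev_kind k)
      {
          assert(q->tail < QLEN);
          q->ev[q->tail++] = k;
      }

      /* Deliver all pending events; events for a departed port are lost. */
      static void deferred_process(struct deferred_queue *q, struct toy_port *p)
      {
          while (q->head < q->tail) {
              enum ev_kind k = q->ev[q->head++];

              if (!p->offloaded)
                  continue;  /* nobody left to deliver to */
              p->groups += (k == EV_ADD) ? 1 : -1;
          }
      }

      /* Hypothetical unoffload: optionally flush deferred work first. */
      static void port_unoffload(struct deferred_queue *q, struct toy_port *p,
                                 bool flush_first)
      {
          if (flush_first)
              deferred_process(q, p);
          p->offloaded = false;
      }
      ```

      With flush_first set, a queued deletion reaches the port and the
      group count drops to 0. Without it, the deletion is processed after
      disassociation and the count stays at 1.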
      
      Fixes: 4e51bf44 ("net: bridge: move the switchdev object replay helpers to "push" mode")
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: switchdev: Skip MDB replays of deferred events on offload · dc489f86
      Tobias Waldekranz authored
      Before this change, generation of the list of MDB events to replay
      would race against the creation of new group memberships, either from
      the IGMP/MLD snooping logic or from user configuration.
      
      While new memberships are immediately visible to walkers of
      br->mdb_list, the notification of their existence to switchdev event
      subscribers is deferred until a later point in time. So if a replay
      list was generated during a time that overlapped with such a window,
      it would also contain a replay of the not-yet-delivered event.
      
      The driver would thus receive two copies of what the bridge internally
      considered to be one single event. On destruction of the bridge,
      however, only a single membership deletion event was sent. As a
      consequence, drivers that reference-count memberships (at least DSA)
      would be left with orphan groups in their hardware database when the
      bridge was destroyed.
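
      The reference-count arithmetic behind the orphan groups can be
      sketched as follows. This assumes a driver that installs an entry on
      its first reference and removes it on its last, which is a
      simplification of what drivers like DSA do. struct hw_group,
      group_add() and group_del() are hypothetical names.

      ```c
      #include <assert.h>

      /* Hypothetical driver-side state for one group on one port. */
      struct hw_group {
          int refcount;   /* outstanding "add" notifications */
          int installed;  /* 1 while present in the hardware database */
      };

      static void group_add(struct hw_group *g)
      {
          if (g->refcount++ == 0)
              g->installed = 1;  /* first reference: program hardware */
      }

      static void group_del(struct hw_group *g)
      {
          assert(g->refcount > 0);
          if (--g->refcount == 0)
              g->installed = 0;  /* last reference: remove from hardware */
      }
      ```

      Two additions (the original deferred event plus its replay) followed
      by a single deletion leave the count at 1, so the entry stays
      installed.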
      
      This is only an issue when replaying additions. While deletion events
      may still be pending on the deferred queue, they will already have
      been removed from br->mdb_list, so no duplicates can be generated in
      that scenario.
      
      To a user, this meant that old group memberships, from a bridge to
      which a port was previously attached, could be reanimated (in
      hardware) when the port joined a new bridge, without the new bridge's
      knowledge.
      
      For example, on an mv88e6xxx system, create a snooping bridge and
      immediately add a port to it:
      
          root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \
          > ip link set dev x3 up master br0
      
      And then destroy the bridge:
      
          root@infix-06-0b-00:~$ ip link del dev br0
          root@infix-06-0b-00:~$ mvls atu
          ADDRESS             FID  STATE      Q  F  0  1  2  3  4  5  6  7  8  9  a
          DEV:0 Marvell 88E6393X
          33:33:00:00:00:6a     1  static     -  -  0  .  .  .  .  .  .  .  .  .  .
          33:33:ff:87:e4:3f     1  static     -  -  0  .  .  .  .  .  .  .  .  .  .
          ff:ff:ff:ff:ff:ff     1  static     -  -  0  1  2  3  4  5  6  7  8  9  a
          root@infix-06-0b-00:~$
      
      The two IPv6 groups remain in the hardware database because the
      port (x3) is notified of the host's membership twice: once via the
      original event and once via a replay. Since only a single delete
      notification is sent, the count remains at 1 when the bridge is
      destroyed.
      
      Then add the same port (or another port belonging to the same hardware
      domain) to a new bridge, this time with snooping disabled:
      
          root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \
          > ip link set dev x3 up master br1
      
      All multicast, including the two IPv6 groups from br0, should now be
      flooded, according to the policy of br1. But instead the old
      memberships are still active in the hardware database, causing the
      switch to only forward traffic to those groups towards the CPU (port
      0).
      
      Eliminate the race in two steps:
      
      1. Grab the write-side lock of the MDB while generating the replay
         list.
      
      This prevents new memberships from showing up while we are generating
      the replay list. But it leaves the scenario in which a deferred event
      was already generated, but not delivered, before we grabbed the
      lock. Therefore:
      
      2. Make sure that no deferred version of a replay event is already
         enqueued to the switchdev deferred queue, before adding it to the
         replay list, when replaying additions.
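
      Both steps can be modelled roughly in userspace. This sketch assumes
      flat arrays for the MDB and for the deferred queue; the real code
      walks br->mdb_list and the switchdev deferred queue. struct
      toy_state, build_replay_list() and obj_act_is_deferred() are
      hypothetical, and the last is only loosely analogous to
      switchdev_port_obj_act_is_deferred. A boolean stands in for the
      write-side lock. memcmp() replaces the kernel's ether_addr_equal()
      because this is not kernel code.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <string.h>

      #define ETH_ALEN 6
      #define MAX_ITEMS 16

      enum obj_act { ACT_ADD, ACT_DEL };

      struct mdb_obj {
          unsigned char addr[ETH_ALEN];
      };

      struct deferred_item {
          enum obj_act act;
          struct mdb_obj obj;
      };

      struct toy_state {
          bool mdb_write_locked;                 /* models the MDB lock */
          struct mdb_obj mdb[MAX_ITEMS];         /* visible to walkers */
          size_t n_mdb;
          struct deferred_item queue[MAX_ITEMS]; /* generated, undelivered */
          size_t n_queue;
      };

      /* Is an identical (action, object) pair already queued? */
      static bool obj_act_is_deferred(const struct toy_state *s,
                                      enum obj_act act,
                                      const struct mdb_obj *obj)
      {
          for (size_t i = 0; i < s->n_queue; i++)
              if (s->queue[i].act == act &&
                  memcmp(s->queue[i].obj.addr, obj->addr, ETH_ALEN) == 0)
                  return true;
          return false;
      }

      /* Build the replay list; returns the number of objects in out. */
      static size_t build_replay_list(const struct toy_state *s, bool adding,
                                      struct mdb_obj *out)
      {
          size_t n = 0;

          assert(s->mdb_write_locked); /* step 1: no new entries appear */
          for (size_t i = 0; i < s->n_mdb; i++) {
              /* step 2: skip additions still waiting in the queue */
              if (adding && obj_act_is_deferred(s, ACT_ADD, &s->mdb[i]))
                  continue;
              out[n++] = s->mdb[i];
          }
          return n;
      }
      ```

      The duplicate check applies only when replaying additions. This
      matches the reasoning above that pending deletions are already gone
      from br->mdb_list.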
      
      Fixes: 4f2673b3 ("net: bridge: add helper to replay port and host-joined mdb entries")
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/iucv: fix the allocation size of iucv_path_table array · b4ea9b6a
      Alexander Gordeev authored
      iucv_path_table is a dynamically allocated array of pointers to
      struct iucv_path items. Yet, its size is calculated as if it were
      an array of struct iucv_path items.
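
      This class of bug is easy to reproduce in userspace. In this sketch,
      struct iucv_path is a hypothetical stand-in with made-up fields, not
      the real s390 definition. calloc() replaces the kernel allocator, and
      the helper names are illustrative only. The point is that a table of
      pointers must be sized by the pointer type, which sizeof(*table)
      expresses without restating the type.

      ```c
      #include <assert.h>
      #include <stdlib.h>

      /* Hypothetical stand-in; fields are illustrative only. */
      struct iucv_path {
          unsigned short pathid;
          unsigned short msglim;
          unsigned char flags;
          void *private_data;
          void *handler;
      };

      /* Buggy sizing: one whole struct per slot. */
      static size_t table_bytes_wrong(size_t n)
      {
          return n * sizeof(struct iucv_path);
      }

      /* Correct sizing: one pointer per slot. */
      static size_t table_bytes_right(size_t n)
      {
          return n * sizeof(struct iucv_path *);
      }

      /* Zeroed table of n pointers; sizeof(*table) tracks the element type. */
      static struct iucv_path **alloc_path_table(size_t n)
      {
          struct iucv_path **table = calloc(n, sizeof(*table));

          return table;  /* NULL on failure; the caller must check */
      }
      ```

      For this stand-in, the buggy formula requests more memory than the
      pointer table needs.
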
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
      Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 15 Feb, 2024 28 commits
  3. 14 Feb, 2024 8 commits