1. 29 Jun, 2021 31 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · b6df0078
      Jakub Kicinski authored
      Trivial conflict in net/netfilter/nf_tables_api.c.
      
      Duplicate fix in tools/testing/selftests/net/devlink_port_split.py
      - take the net-next version.
      
      skmsg, and L4 bpf - keep the bpf code but remove the flags
      and err params.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp: change ICSK_CA_PRIV_SIZE definition · 3f8ad50a
      Eric Dumazet authored
      Instead of a magic number (13 currently) and having
      to change it every other year, use sizeof_field() macro.
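A minimal user-space sketch of the idea, assuming C11 and a hypothetical `struct demo_sock` in place of the real socket structure. The macro body follows the usual shape of `sizeof_field()`, and the array length 13 is only the magic number the message mentions; every `demo_`/`DEMO_` name is an illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of a struct member, computed without needing an instance. */
#define sizeof_field(TYPE, MEMBER) sizeof(((TYPE *)0)->MEMBER)

/* Hypothetical stand-in for the real socket structure. */
struct demo_sock {
	int state;
	uint64_t ca_priv[13]; /* private area for congestion control modules */
};

/* Derived from the declaration, so resizing ca_priv needs no second edit. */
#define DEMO_CA_PRIV_SIZE sizeof_field(struct demo_sock, ca_priv)

/* Documents the relationship; holds by construction. */
static_assert(DEMO_CA_PRIV_SIZE == 13 * sizeof(uint64_t),
	      "size follows the array declaration");

static size_t demo_ca_priv_size(void)
{
	return DEMO_CA_PRIV_SIZE;
}
```

If the array in the sketch is later grown or shrunk, `DEMO_CA_PRIV_SIZE` changes with it and no separate constant has to be kept in sync.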
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp_yeah: check struct yeah size at compile time · 6706721d
      Eric Dumazet authored
      Compiler can perform the sanity check instead of waiting
      to load the module and crash the host.
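A hedged sketch of a compile-time size check in plain C11. The kernel has its own `BUILD_BUG_ON()` helper for this kind of check; `static_assert` is swapped in here as the user-space equivalent. `DEMO_CA_PRIV_SIZE` and `struct demo_cc_state` are hypothetical stand-ins for the private area and the module's state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical size of the per-socket private area, in bytes. */
#define DEMO_CA_PRIV_SIZE (13 * sizeof(uint64_t))

/* Hypothetical per-connection state of a congestion control module. */
struct demo_cc_state {
	uint32_t cwnd_history[16];
	uint32_t rtt_min;
	uint32_t flags;
};

/* Fails the build, not the running host, if the state outgrows the area. */
static_assert(sizeof(struct demo_cc_state) <= DEMO_CA_PRIV_SIZE,
	      "demo_cc_state does not fit in the private area");

/* Runtime view of the same condition, only useful for demonstration. */
static int demo_cc_state_fits(void)
{
	return sizeof(struct demo_cc_state) <= DEMO_CA_PRIV_SIZE;
}
```

Growing `cwnd_history` past the available space in this sketch stops compilation with the message above, which is the behaviour the commit asks the compiler to provide.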
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gve: DQO: Fix off by one in gve_rx_dqo() · ecd89c02
      Dan Carpenter authored
      The rx->dqo.buf_states[] array is allocated in gve_rx_alloc_ring_dqo()
      and it has rx->dqo.num_buf_states elements, so this > needs to be >= to
      prevent an out of bounds access.
      
      Fixes: 9b8dd5e5 ("gve: DQO: Add RX path")
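The off-by-one can be illustrated with a small sketch. The structures and `demo_lookup()` are hypothetical and only mirror the shape of the problem: an array with `num_buf_states` elements has valid indices `0 .. num_buf_states - 1`.

```c
#include <assert.h>
#include <stddef.h>

struct demo_buf_state {
	int in_use;
};

/* Hypothetical ring: buf_states points at num_buf_states elements. */
struct demo_rx {
	struct demo_buf_state *buf_states;
	unsigned int num_buf_states;
};

/* Returns the buffer state for id, or NULL when id is out of range. */
static struct demo_buf_state *demo_lookup(struct demo_rx *rx, unsigned int id)
{
	assert(rx != NULL);
	/* A plain '>' would let id == num_buf_states through. */
	if (id >= rx->num_buf_states)
		return NULL;
	return &rx->buf_states[id];
}
```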
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'stmmac-phy-wol' · 66f1546d
      David S. Miller authored
      Ling Pei Lee says:
      
      ====================
      stmmac: Add option to enable PHY WOL with PMT enabled
      
      The main objective of this patchset is to provide an option to enable
      PHY WOL even when PMT is enabled by default in the HW features.
      
      The current stmmac driver WOL implementation enables MAC WOL if the
      MAC HW PMT feature is on. Otherwise, the driver checks for PHY WOL
      support. Intel EHL mgbe is designed to wake up through PHY WOL
      although HW PMT is enabled. Hence, introduce the use_phy_wol platform
      data to provide this PHY WOL option. Setting use_phy_wol disables
      plat->pmt, which is currently used to determine whether the system
      wakes up by MAC WOL or PHY WOL.
      
      This WOL patchset also sets the device power state to D3hot. This is
      because the EHL PSE needs the PSE mgbe to be in the D3 state in order
      for the PSE to go into suspend mode.
      
      Change Log:
       V2: Drop Patch #3 net: stmmac: Reconfigure the PHY WOL settings in stmmac_resume().
      ====================
    • stmmac: intel: set PCI_D3hot in suspend · 1dd53a61
      Voon Weifeng authored
      During suspend, set the Intel mgbe to the D3hot state
      to reduce power consumption.
      Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
      Signed-off-by: Ling Pei Lee <pei.lee.ling@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • stmmac: intel: Enable PHY WOL option in EHL · 945beb75
      Ling Pei Lee authored
      Enable PHY Wake On LAN on the Intel EHL platform.
      The PHY Wake On LAN option is enabled because
      the Intel EHL platform is designed for
      PHY Wake On LAN but not MAC Wake On LAN.
      Signed-off-by: Ling Pei Lee <pei.lee.ling@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: option to enable PHY WOL with PMT enabled · 5a9b876e
      Ling Pei Lee authored
      The current stmmac driver WOL implementation enables MAC WOL
      if the MAC HW PMT feature is on. Otherwise, the driver checks for
      PHY WOL support. There is another case where MAC HW PMT is
      enabled but the platform still goes for the PHY WOL option.
      E.g., Intel platforms are designed for PHY WOL but not MAC WOL
      although HW MAC PMT features are enabled.
      
      Introduce use_phy_wol platform data to select PHY WOL
      instead of depending on HW PMT features. Setting use_phy_wol
      disables plat->pmt, which is currently used to determine
      whether the system wakes up by MAC WOL or PHY WOL.
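The selection described in this message can be sketched as follows. This is a simplified model, not the driver code: `pmt` and `use_phy_wol` are named after the fields in the message, while `hw_pmt`, `struct demo_plat` and the helper functions are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_wol_path { DEMO_WOL_MAC, DEMO_WOL_PHY };

struct demo_plat {
	bool hw_pmt;      /* MAC reports the PMT hardware feature */
	bool use_phy_wol; /* platform asks for PHY WOL regardless of PMT */
	bool pmt;         /* effective setting consulted when arming WOL */
};

/* use_phy_wol overrides the hardware capability by clearing pmt. */
static void demo_plat_init(struct demo_plat *plat)
{
	assert(plat != NULL);
	plat->pmt = plat->hw_pmt && !plat->use_phy_wol;
}

/* MAC WOL only when pmt is still set, PHY WOL otherwise. */
static enum demo_wol_path demo_wol_path(const struct demo_plat *plat)
{
	return plat->pmt ? DEMO_WOL_MAC : DEMO_WOL_PHY;
}
```

A platform with PMT hardware and no override keeps MAC WOL, while the same hardware with `use_phy_wol` set falls through to the PHY path.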
      Signed-off-by: Ling Pei Lee <pei.lee.ling@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ndo_dflt_fdb-print' · b03cfe6f
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Trivial print improvements in ndo_dflt_fdb_{add,del}
      
      These are some changes brought to the informational messages printed in
      the default .ndo_fdb_add and .ndo_fdb_del method implementations.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del} · 78ecc890
      Vladimir Oltean authored
      "Static" is a loaded word, and probably not what the author meant when
      the code was written.
      
      In particular, this looks weird:
      $ bridge fdb add dev swp0 00:01:02:03:04:05 local        # totally fine, but
      $ bridge fdb add dev swp0 00:01:02:03:04:05 static
      [ 2020.708298] swp0: FDB only supports static addresses  # hmm what?
      
      By looking at the implementation which uses dev_uc_add/dev_uc_del it is
      absolutely clear that only local addresses are supported, and the proper
      Neighbor Unreachability Detection state is being used for this purpose
      (user space indeed sets NUD_PERMANENT when local addresses are meant).
      So it is just the message that is wrong, fix it.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: use netdev_info in ndo_dflt_fdb_{add,del} · 23ac0b42
      Vladimir Oltean authored
      Use the more modern printk helper for network interfaces, which also
      contains information about the associated struct device, and results in
      overall shorter line lengths compared to printing an open-coded
      dev->name.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp: Set lookup cookie when creating a PTP PPS source. · 8602e40f
      Jonathan Lemon authored
      When creating a PTP device, the configuration block allows
      creation of an associated PPS device.  However, there isn't
      any way to associate the two devices after creation.
      
      Set the PPS cookie, so pps_lookup_dev(ptp) performs correctly.
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'inet-sk_error-tracers' · c79fa61c
      David S. Miller authored
      Alexander Aring says:
      
      ====================
      net: sock: add tracers for inet socket errors
      
      This patch series introduces tracers for sk_error_report socket callback
      calls. The use case is that a user space application can monitor them
      and build its own heuristic about bad peer connections, independent of
      socket lifetime. As a specific example, it could be used in the Linux
      cluster world to fence a "badly behaving" node. For now it's okay to only
      trace inet sockets. Other socket families can introduce their own tracers
      easily.
      
      Example output with trace-cmd:
      
      <idle>-0     [003]   201.799437: inet_sk_error_report: family=AF_INET protocol=IPPROTO_TCP sport=21064 dport=38941 saddr=192.168.122.57 daddr=192.168.122.251 saddrv6=::ffff:192.168.122.57 daddrv6=::ffff:192.168.122.251 error=104
      
      - Alex
      
      changes since v2:
      
      - change "sk.sk_error_report(&ipc->sk);" to "sk_error_report(&ipc->sk);"
        in net/qrtr/qrtr.c
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sock: add trace for socket errors · e6a3e443
      Alexander Aring authored
      This patch adds tracers for inet socket errors only. A user
      space monitor application can track connection errors independent of
      socket lifetime and do additional handling. For example, a cluster
      manager can fence a node if errors occur according to a specific heuristic.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sock: introduce sk_error_report · e3ae2365
      Alexander Aring authored
      This patch introduces a function wrapper to call the sk_error_report
      callback. That will prepare to add additional handling whenever
      sk_error_report is called, for example to trace socket errors.
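The wrapper pattern can be sketched in user space as below. All `demo_` names are hypothetical, and a plain counter stands in for whatever extra handling (such as a tracepoint) is hooked in later; this is an illustration of the pattern, not the kernel function.

```c
#include <assert.h>
#include <stddef.h>

struct demo_sock {
	int err;
	void (*error_report)(struct demo_sock *sk); /* per-socket callback */
};

static unsigned long demo_traced_errors; /* stands in for a tracepoint */
static unsigned long demo_wakeups;       /* side effect of the sample callback */

/* Sample callback, as a protocol might install on its sockets. */
static void demo_default_error_report(struct demo_sock *sk)
{
	(void)sk;
	demo_wakeups++;
}

/* Single entry point: callers use this instead of the raw callback. */
static void demo_sk_error_report(struct demo_sock *sk)
{
	assert(sk != NULL && sk->error_report != NULL);
	sk->error_report(sk);
	demo_traced_errors++; /* extra handling lives in exactly one place */
}
```

Because every caller goes through `demo_sk_error_report()`, additional handling can be added there without touching the call sites again.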
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'dsa-rx-filtering' · 7f4e5c5b
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      RX filtering in DSA
      
      This is my fourth stab (identical to the third one except sent as
      non-RFC) at creating a list of unicast and multicast addresses that the
      DSA CPU ports must trap. I am reusing a lot of Tobias's work which he
      submitted here:
      https://patchwork.kernel.org/project/netdevbpf/cover/20210116012515.3152-1-tobias@waldekranz.com/
      
      My additions to Tobias' work come in the form of taking some care that
      additions and removals of host addresses are properly balanced, so that
      we can do reference counting on them for cross-chip setups and multiple
      bridges spanning the same switch (I am working on an NXP board where
      both are real requirements).
      
      During the last attempted submission of multiple CPU ports for DSA:
      https://patchwork.kernel.org/project/netdevbpf/cover/20210410133454.4768-1-ansuelsmth@gmail.com/
      
      it became clear that the concept of multiple CPU ports would not be
      compatible with the idea of address learning on those CPU ports (when
      those CPU ports are statically assigned to user ports, not in a LAG)
      unless the switch supports complete FDB isolation, which most switches
      do not. So DSA needs to manage in software all addresses that are
      installed on the CPU port(s), which is what this patch set does.
      
      Compared to all earlier attempts, this series does not fiddle with how
      DSA operates the ports in standalone mode at all, just when bridged.
      We need to sort that out properly, then any optimization that comes in
      standalone mode (i.e. IFF_UNICAST_FLT) can come later.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: replay the local bridge FDB entries pointing to the bridge dev too · 63c51453
      Vladimir Oltean authored
      When we join a bridge that already has some local addresses pointing to
      itself, we do not get those notifications. Similarly, when we leave that
      bridge, we do not get notifications for the deletion of those entries.
      The only switchdev notifications we get are those of entries added while
      the DSA port is enslaved to the bridge.
      
      This makes use cases such as the following work properly (with the
      number of additions and removals properly balanced):
      
      ip link add br0 type bridge
      ip link add br1 type bridge
      ip link set br0 address 00:01:02:03:04:05
      ip link set br1 address 00:01:02:03:04:05
      ip link set swp0 up
      ip link set swp1 up
      ip link set swp0 master br0
      ip link set swp1 master br1
      ip link set br0 up
      ip link set br1 up
      ip link del br1 # 00:01:02:03:04:05 still installed on the CPU port
      ip link del br0 # 00:01:02:03:04:05 finally removed from the CPU port
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev · 4bed397c
      Vladimir Oltean authored
      When
      (a) "dev" is a bridge port which the DSA switch tree offloads, but is
          otherwise not a dsa slave (such as a LAG netdev), or
      (b) "dev" is the bridge net device itself
      
      then strange things happen to the dev_hold/dev_put pair:
      dsa_schedule_work() will still be called with a DSA port that offloads
      that netdev, but dev_hold() will be called on the non-DSA netdev.
      Then the "if" condition in dsa_slave_switchdev_event_work() does not
      pass, because "dev" is not a DSA netdev, so dev_put() is not called.
      
      This results in the simple fact that we have a reference counting
      mismatch on the "dev" net device.
      
      This can be seen when we add support for host addresses installed on the
      bridge net device.
      
      ip link add br1 type bridge
      ip link set br1 address 00:01:02:03:04:05
      ip link set swp0 master br1
      ip link del br1
      [  968.512278] unregister_netdevice: waiting for br1 to become free. Usage count = 5
      
      It seems foolish to do penny pinching and not add the net_device pointer
      in the dsa_switchdev_event_work structure, so let's finally do that.
      As an added bonus, when we start offloading local entries pointing
      towards the bridge, these will now properly appear as 'offloaded' in
      'bridge fdb' (this was not possible before, because 'dev' was assumed to
      only be a DSA net device):
      
      00:01:02:03:04:05 dev br0 vlan 1 offload master br0 permanent
      00:01:02:03:04:05 dev br0 offload master br0 permanent
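A small model of why storing the net device pointer in the deferred work item keeps the hold/put pair balanced. The types and helpers are hypothetical simplifications (a plain integer instead of the kernel's reference counting, no real work queue).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_netdev {
	int refcnt;
	bool is_dsa; /* false for a bridge or LAG netdev */
};

/* Deferred work remembers exactly which netdev was held. */
struct demo_work {
	struct demo_netdev *dev;
};

static void demo_hold(struct demo_netdev *dev)
{
	dev->refcnt++;
}

static void demo_put(struct demo_netdev *dev)
{
	assert(dev->refcnt > 0);
	dev->refcnt--;
}

static void demo_schedule(struct demo_work *work, struct demo_netdev *dev)
{
	assert(work != NULL && dev != NULL);
	demo_hold(dev);
	work->dev = dev;
}

/* The put no longer depends on dev being a DSA netdev. */
static void demo_run(struct demo_work *work)
{
	/* ... offload handling would go here ... */
	demo_put(work->dev);
}
```

With the pointer carried in the work item, a bridge netdev that is not a DSA netdev gets its reference dropped just like any other device.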
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: include fdb entries pointing to bridge in the host fdb list · 81a619f7
      Vladimir Oltean authored
      The bridge supports a legacy way of adding local (non-forwarded) FDB
      entries, which works on an individual port basis:
      
      bridge fdb add dev swp0 00:01:02:03:04:05 master local
      
      As well as a new way, added by Roopa Prabhu in commit 3741873b
      ("bridge: allow adding of fdb entries pointing to the bridge device"):
      
      bridge fdb add dev br0 00:01:02:03:04:05 self local
      
      The two commands are functionally equivalent, except that the first one
      produces an entry with fdb->dst == swp0, and the other an entry with
      fdb->dst == NULL. The confusing part, though, is that even if fdb->dst
      is swp0 for the 'local on port' entry, that destination is not used.
      
      Nonetheless, the idea is that the bridge has reference counting for
      local entries, and local entries pointing towards the bridge are still
      'as local' as local entries for a port.
      
      The bridge adds the MAC addresses of the interfaces automatically as
      FDB entries with is_local=1. For the MAC address of the ports, fdb->dst
      will be equal to the port, and for the MAC address of the bridge,
      fdb->dst will point towards the bridge (i.e. be NULL). Therefore, if the
      MAC address of the bridge is not inherited from either of the physical
      ports, then we must explicitly catch local FDB entries emitted towards
      the br0, otherwise we'll miss the MAC address of the bridge (and, of
      course, any entry with 'bridge fdb add dev br0 ... self local').
      Co-developed-by: Tobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: include bridge addresses which are local in the host fdb list · 10fae4ac
      Tobias Waldekranz authored
      The bridge automatically creates local (not forwarded) fdb entries
      pointing towards physical ports with their interface MAC addresses.
      For switchdev, the significance of these fdb entries is the exact
      opposite of that of non-local entries: instead of sending these frames
      outwards, we must send them inwards (towards the host).
      
      NOTE: The bridge's own MAC address is also "local". If that address is
      not shared with any port, the bridge's MAC is not added by this
      functionality - but the following commit takes care of that case.
      
      NOTE 2: We mark these addresses as host-filtered regardless of the value
      of ds->assisted_learning_on_cpu_port. This is because, as opposed to the
      speculative logic done for dynamic address learning on foreign
      interfaces, the local FDB entries are rather fixed, so there isn't any
      risk of them migrating from one bridge port to another.
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sync static FDB entries on foreign interfaces to hardware · 3068d466
      Vladimir Oltean authored
      DSA is able to install FDB entries towards the CPU port for addresses
      which were dynamically learnt by the software bridge on foreign
      interfaces that are in the same bridge with a DSA switch interface.
      Since this behavior is opportunistic, it is guarded by the
      "assisted_learning_on_cpu_port" property which can be enabled by drivers
      and is not done automatically (since certain switches may support
      address learning of packets coming from the CPU port).
      
      But if those FDB entries added on the foreign interfaces are static
      (added by the user) instead of dynamically learnt, currently DSA does
      not do anything (and arguably it should).
      
      Because static FDB entries are not supposed to move on their own, there
      is no downside in reusing the "assisted_learning_on_cpu_port" logic to
      sync static FDB entries to the DSA CPU port unconditionally, even if
      assisted_learning_on_cpu_port is not requested by the driver.
      
      For example, this situation:
      
         br0
         / \
      swp0 dummy0
      
      $ bridge fdb add 02:00:de:ad:00:01 dev dummy0 vlan 1 master static
      
      Results in DSA adding an entry in the hardware FDB, pointing this
      address towards the CPU port.
      
      The same is true for entries added to the bridge itself, e.g:
      
      $ bridge fdb add 02:00:de:ad:00:01 dev br0 vlan 1 self local
      
      (except that right now, DSA still ignores 'local' FDB entries, this will
      be changed in a later patch)
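The decision this message describes can be summarised in a short sketch. The structure and function are hypothetical; only `assisted_learning_on_cpu_port` is a name taken from the message, and the handling of local entries reflects the state at this point in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of an FDB event seen on a foreign interface. */
struct demo_fdb_event {
	bool added_by_user; /* static entry, as opposed to dynamically learnt */
	bool is_local;      /* 'local' entry, still ignored at this point */
};

/* Should the address be installed towards the CPU port? */
static bool demo_sync_to_cpu_port(const struct demo_fdb_event *ev,
				  bool assisted_learning_on_cpu_port)
{
	assert(ev != NULL);
	if (ev->is_local)
		return false; /* handled by a later patch in the series */
	if (ev->added_by_user)
		return true;  /* static entries do not move on their own */
	return assisted_learning_on_cpu_port; /* opportunistic, opt-in */
}
```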
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: install the host MDB and FDB entries in the master's RX filter · 26ee7b06
      Vladimir Oltean authored
      If the DSA master implements strict address filtering, then the unicast
      and multicast addresses kept by the DSA CPU ports should be synchronized
      with the address lists of the DSA master.
      
      Note that we want the synchronization of the master's address lists even
      if the DSA switch doesn't support unicast/multicast database operations,
      on the premise that the packets will be flooded to the CPU in that
      case, and we should still instruct the master to receive them. This is
      why we do the dev_uc_add() etc first, even if dsa_port_notify() returns
      -EOPNOTSUPP. In turn, dev_uc_add() and friends return error only if
      memory allocation fails, so it is probably ok to check and propagate
      that error code and not just ignore it.
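The ordering argued for above can be sketched like this. The model is an assumption-laden simplification: `demo_master_uc_add()` and `demo_switch_notify_add()` are hypothetical stand-ins for the master address list update and the switch notification, and they only count calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_master {
	int uc_count;     /* addresses in the master's RX filter */
	bool alloc_fails; /* simulate a memory allocation failure */
};

struct demo_switch {
	bool supports_fdb; /* false: packets are flooded to the CPU instead */
	int hw_entries;
};

static int demo_master_uc_add(struct demo_master *m)
{
	if (m->alloc_fails)
		return -ENOMEM;
	m->uc_count++;
	return 0;
}

static int demo_switch_notify_add(struct demo_switch *ds)
{
	if (!ds->supports_fdb)
		return -EOPNOTSUPP;
	ds->hw_entries++;
	return 0;
}

/* Master first, so its filter is updated even without switch support. */
static int demo_host_fdb_add(struct demo_master *m, struct demo_switch *ds)
{
	int err;

	assert(m != NULL && ds != NULL);
	err = demo_master_uc_add(m);
	if (err)
		return err; /* propagated rather than ignored */
	return demo_switch_notify_add(ds);
}
```

In this sketch a switch without database support still leaves the master's filter updated, and the caller sees `-EOPNOTSUPP` and can decide how to treat it.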
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: reference count the FDB addresses at the cross-chip notifier level · 3f6e32f9
      Vladimir Oltean authored
      The same concerns expressed for host MDB entries are valid for host FDBs
      just as well:
      
      - in the case of multiple bridges spanning the same switch chip, deleting
        a host FDB entry that belongs to one bridge will result in breakage to
        the other bridge
      - not deleting FDB entries across DSA links means that the switch's
        hardware tables will eventually run out, given enough wear&tear
      
      So do the same thing and introduce reference counting for CPU ports and
      DSA links using the same data structures as we have for MDB entries.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: introduce a separate cross-chip notifier type for host FDBs · 3dc80afc
      Vladimir Oltean authored
      DSA treats some bridge FDB entries by trapping them to the CPU port.
      Currently, the only class of such entries are FDB addresses learnt by
      the software bridge on a foreign interface. However, there are many more
      to be added:
      
      - FDB entries with the is_local flag (for termination) added by the
        bridge on the user ports (typically containing the MAC address of the
        bridge port)
      - FDB entries pointing towards the bridge net device (for termination).
        Typically these contain the MAC address of the bridge net device.
      - Static FDB entries installed on a foreign interface that is in the
        same bridge with a DSA user port.
      
      The reason why a separate cross-chip notifier for host FDBs is justified
      compared to normal FDBs is the same as in the case of host MDBs: the
      cross-chip notifier matching function in switch.c should avoid
      installing these entries on routing ports that route towards the
      targeted switch, but not towards the CPU. This is required in order to
      have proper support for H-like multi-chip topologies.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: reference count the MDB entries at the cross-chip notifier level · 161ca59d
      Vladimir Oltean authored
      Ever since the cross-chip notifiers were introduced, the design was
      meant to be simplistic and just get the job done without worrying too
      much about dangling resources left behind.
      
      For example, somebody installs an MDB entry on sw0p0 in this daisy chain
      topology. It gets installed using ds->ops->port_mdb_add() on sw0p0,
      sw1p4 and sw2p4.
      
                                                          |
                 sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
              [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
              [   x   ] [       ] [       ] [       ] [       ]
                                                |
                                                +---------+
                                                          |
                 sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
              [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
              [       ] [       ] [       ] [       ] [   x   ]
                                                |
                                                +---------+
                                                          |
                 sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
              [  user ] [  user ] [  user ] [  user ] [  dsa  ]
              [       ] [       ] [       ] [       ] [   x   ]
      
      Then the same person deletes that MDB entry. The cross-chip notifier for
      deletion only matches sw0p0:
      
                                                          |
                 sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
              [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
              [   x   ] [       ] [       ] [       ] [       ]
                                                |
                                                +---------+
                                                          |
                 sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
              [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
              [       ] [       ] [       ] [       ] [       ]
                                                |
                                                +---------+
                                                          |
                 sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
              [  user ] [  user ] [  user ] [  user ] [  dsa  ]
              [       ] [       ] [       ] [       ] [       ]
      
      Why?
      
      Because the DSA links are 'trunk' ports, if we just go ahead and delete
      the MDB from sw1p4 and sw2p4 directly, we might delete those multicast
      entries when they are still needed. Just consider the fact that somebody
      does:
      
      - add a multicast MAC address towards sw0p0 [ via the cross-chip
        notifiers it gets installed on the DSA links too ]
      - add the same multicast MAC address towards sw0p1 (another port of that
        same switch)
      - delete the same multicast MAC address from sw0p0.
      
      At this point, if we deleted the MAC address from the DSA links, it
      would be flooded, even though there is still an entry on switch 0 which
      needs it not to.
      
      So that is why deletions only match the targeted source port and nothing
      on DSA links. Of course, dangling resources means that the hardware
      tables will eventually run out given enough additions/removals, but hey,
      at least it's simple.
      
      But there is a bigger concern which needs to be addressed, and that is
      our support for SWITCHDEV_OBJ_ID_HOST_MDB. DSA simply translates such an
      object into a dsa_port_host_mdb_add() which ends up as ds->ops->port_mdb_add()
      on the upstream port, and a similar thing happens on deletion:
      dsa_port_host_mdb_del() will trigger ds->ops->port_mdb_del() on the
      upstream port.
      
      When there are 2 VLAN-unaware bridges spanning the same switch (which is
      a use case DSA proudly supports), each bridge will install its own
      SWITCHDEV_OBJ_ID_HOST_MDB entries. But upon deletion, DSA goes ahead and
      emits a DSA_NOTIFIER_MDB_DEL for dp->cpu_dp, which is shared between the
      user ports enslaved to br0 and the user ports enslaved to br1. Not good.
      The host-trapped multicast addresses installed by br1 will be deleted
      when any state changes in br0 (IGMP timers expire, or ports leave, etc).
      
      To avoid this, we could of course go the route of the zero-sum game and
      delete the DSA_NOTIFIER_MDB_DEL call for dp->cpu_dp. But the better
      design is to just admit that on shared ports like DSA links and CPU
      ports, we should be reference counting calls, even if this consumes some
      dynamic memory which DSA has traditionally avoided. On the flip side,
      the hardware tables of switches are limited in size, so it would be good
      if the OS managed them properly instead of having them eventually
      overflow.
      
      To address the memory usage concern, we only apply the refcounting of
      MDB entries on ports that are really shared (CPU ports and DSA links)
      and not on user ports. In a typical single-switch setup, this means only
      the CPU port (and the host MDB entries are not that many, really).
      
      The name of the newly introduced data structures (dsa_mac_addr) is
      chosen in such a way that will be reusable for host FDB entries (next
      patch).
      
      With this change, we can finally have the same matching logic for the
      MDB additions and deletions, as well as for their host-trapped variants.
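The reference counting described in this message can be modelled with a minimal sketch. It assumes a fixed-size table and ignores VLAN IDs for brevity; the structure name loosely echoes `dsa_mac_addr`, but every `demo_` identifier and the hardware counters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_ETH_ALEN 6
#define DEMO_MAX_ADDRS 8

struct demo_mac_addr {
	unsigned char addr[DEMO_ETH_ALEN];
	unsigned int refcount; /* 0 means the slot is free */
};

/* A shared port (CPU port or DSA link) with its tracked addresses. */
struct demo_shared_port {
	struct demo_mac_addr addrs[DEMO_MAX_ADDRS];
	int hw_adds; /* times the hardware table was written */
	int hw_dels; /* times a hardware entry was removed */
};

static struct demo_mac_addr *demo_find(struct demo_shared_port *p,
				       const unsigned char *addr)
{
	assert(p != NULL && addr != NULL);
	for (int i = 0; i < DEMO_MAX_ADDRS; i++)
		if (p->addrs[i].refcount &&
		    !memcmp(p->addrs[i].addr, addr, DEMO_ETH_ALEN))
			return &p->addrs[i];
	return NULL;
}

/* Only the first user of an address touches the hardware. */
static int demo_mdb_add(struct demo_shared_port *p, const unsigned char *addr)
{
	struct demo_mac_addr *a = demo_find(p, addr);

	if (a) {
		a->refcount++;
		return 0;
	}
	for (int i = 0; i < DEMO_MAX_ADDRS; i++) {
		a = &p->addrs[i];
		if (!a->refcount) {
			memcpy(a->addr, addr, DEMO_ETH_ALEN);
			a->refcount = 1;
			p->hw_adds++;
			return 0;
		}
	}
	return -1; /* table full */
}

/* Only the last user of an address removes it from the hardware. */
static int demo_mdb_del(struct demo_shared_port *p, const unsigned char *addr)
{
	struct demo_mac_addr *a = demo_find(p, addr);

	if (!a)
		return -1;
	if (--a->refcount == 0)
		p->hw_dels++;
	return 0;
}
```

With two bridges adding the same address on a shared CPU port, the first deletion in this model leaves the hardware entry in place and only the second one removes it.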
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: introduce a separate cross-chip notifier type for host MDBs · b8e997c4
      Vladimir Oltean authored
      Commit abd49535 ("net: dsa: execute dsa_switch_mdb_add only for
      routing port in cross-chip topologies") does a surprisingly good job
      even for the SWITCHDEV_OBJ_ID_HOST_MDB use case, where DSA simply
      translates a switchdev object received on dp into a cross-chip notifier
      for dp->cpu_dp.
      
      To visualize how that works, imagine the daisy chain topology below and
      consider a SWITCHDEV_OBJ_ID_HOST_MDB object emitted on sw2p0. How does
      the cross-chip notifier know to match on all the right ports (sw0p4, the
      dedicated CPU port, sw1p4, an upstream DSA link, and sw2p4, another
      upstream DSA link)?
      
                                                      |
             sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
          [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
          [       ] [       ] [       ] [       ] [   x   ]
                                            |
                                            +---------+
                                                      |
             sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
          [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
          [       ] [       ] [       ] [       ] [   x   ]
                                            |
                                            +---------+
                                                      |
             sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
          [  user ] [  user ] [  user ] [  user ] [  dsa  ]
          [       ] [       ] [       ] [       ] [   x   ]
      
      The answer is simple: the dedicated CPU port of sw2p0 is sw0p4, and
      dsa_routing_port returns the upstream port for all switches.
      
      That is fine, but there are other topologies where this does not work as
      well. There are trees with "H" topologies in the wild, where there are 2
      or more switches with DSA links between them, but every switch has its
      dedicated CPU port. For these topologies, it makes little sense for the
      neighbor switches to install an MDB entry on the routing port, since these
      multicast addresses are fundamentally different from the usual ones we
      support. That is the justification for this patch: to introduce the
      concept of a termination plane multicast MAC address, as opposed to a
      forwarding plane multicast MAC address.
      
      For example, without this patch, a SWITCHDEV_OBJ_ID_HOST_MDB added on
      sw0p0 would be treated as a regular port MDB on sw0p2 and would match
      on the ports below (including the sw1p3 routing port).
      
                               |                                  |
          sw0p0     sw0p1     sw0p2     sw0p3          sw1p3     sw1p2     sw1p1     sw1p0
       [  user ] [  user ] [  cpu  ] [  dsa  ]      [  dsa  ] [  cpu  ] [  user ] [  user ]
       [       ] [       ] [   x   ] [       ] ---- [   x   ] [       ] [       ] [       ]
      
      With the patch, the host MDB notifier on sw0p0 matches only on the local
      switch, which is what we want for a termination plane address.
      
                               |                                  |
          sw0p0     sw0p1     sw0p2     sw0p3          sw1p3     sw1p2     sw1p1     sw1p0
       [  user ] [  user ] [  cpu  ] [  dsa  ]      [  dsa  ] [  cpu  ] [  user ] [  user ]
       [       ] [       ] [   x   ] [       ] ---- [       ] [       ] [       ] [       ]
      
      Name this new matching function "dsa_switch_host_address_match" since we
      will be reusing it soon for host FDB entries as well.
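
      The matching rule described above can be sketched in userspace C. This
      is only a simplified model under stated assumptions, not the kernel
      code: struct tree_model, towards_port(), is_upstream_of() and
      host_address_match() are hypothetical names, and the routing tables are
      filled in by hand from the two diagrams. A (switch, port) pair matches
      only if the switch lies on the path from the targeted switch to its
      dedicated CPU port, and the port is the one leading towards that CPU
      port.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SW 3

/* A (switch, port) pair. */
struct port_ref {
	int sw;
	int port;
};

/* Hypothetical, simplified model of a DSA tree (not the kernel's structures). */
struct tree_model {
	/* rtable[a][b]: port on switch a that leads to switch b (unused for a == b) */
	int rtable[MAX_SW][MAX_SW];
	/* cpu[a]: dedicated CPU port serving the user ports of switch a */
	struct port_ref cpu[MAX_SW];
};

/* Port on switch sw that leads towards dst. */
static int towards_port(const struct tree_model *t, int sw, struct port_ref dst)
{
	assert(sw >= 0 && sw < MAX_SW);
	return sw == dst.sw ? dst.port : t->rtable[sw][dst.sw];
}

/* True if switch up lies on the path from switch down to down's CPU port. */
static bool is_upstream_of(const struct tree_model *t, int up, int down)
{
	if (up == down)
		return true;
	return t->rtable[down][up] == towards_port(t, down, t->cpu[down]);
}

/* Should (sw, port) install a host address emitted on a user port of target_sw? */
static bool host_address_match(const struct tree_model *t, int sw, int port,
			       int target_sw)
{
	if (!is_upstream_of(t, sw, target_sw))
		return false;
	return port == towards_port(t, sw, t->cpu[target_sw]);
}

/* Daisy chain from the first diagram: sw0p4 is the only CPU port. */
static void init_daisy_chain(struct tree_model *t)
{
	memset(t, 0, sizeof(*t));
	for (int i = 0; i < MAX_SW; i++)
		t->cpu[i] = (struct port_ref){ .sw = 0, .port = 4 };
	t->rtable[0][1] = t->rtable[0][2] = t->rtable[1][2] = 3; /* downstream links */
	t->rtable[1][0] = t->rtable[2][0] = t->rtable[2][1] = 4; /* upstream links */
}

/* "H" topology from the second diagram: each switch has its own CPU port (p2). */
static void init_h_topology(struct tree_model *t)
{
	memset(t, 0, sizeof(*t));
	t->cpu[0] = (struct port_ref){ .sw = 0, .port = 2 };
	t->cpu[1] = (struct port_ref){ .sw = 1, .port = 2 };
	t->rtable[0][1] = 3;
	t->rtable[1][0] = 3;
}
```

      In this model, a host address emitted on sw2p0 in the daisy chain
      matches sw0p4, sw1p4 and sw2p4, while in the "H" topology a host
      address emitted on sw0p0 matches sw0p2 only and nothing on sw1.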
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b8e997c4
    • net: dsa: introduce dsa_is_upstream_port and dsa_switch_is_upstream_of · 63609c8f
      Vladimir Oltean authored
      In preparation for the new cross-chip notifiers for host addresses,
      let's introduce some more topology helpers which we are going to use to
      discern switches that are in our path towards the dedicated CPU port
      from switches that aren't.
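
      What such helpers decide can be sketched with a small userspace model.
      Everything below is hypothetical (struct sw_model, the model_* function
      names, the hand-written routing tables); it only illustrates the idea
      of telling apart switches that sit on the path towards the dedicated
      CPU port from switches that do not, using the daisy chain of three
      switches where sw0 holds the CPU port.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SWITCHES 3

/* Hypothetical, simplified view of one switch in the tree. */
struct sw_model {
	int index;                /* position of this switch in the tree */
	int upstream_port;        /* local port leading towards the CPU port */
	int rtable[MAX_SWITCHES]; /* rtable[i]: local port used to reach switch i */
};

/* Is this local port the one that leads towards the dedicated CPU port? */
static bool model_is_upstream_port(const struct sw_model *sw, int port)
{
	return port == sw->upstream_port;
}

/* Is "up" on the path from "down" to the dedicated CPU port? */
static bool model_switch_is_upstream_of(const struct sw_model *up,
					const struct sw_model *down)
{
	if (up == down)
		return true;
	assert(up->index >= 0 && up->index < MAX_SWITCHES);
	return model_is_upstream_port(down, down->rtable[up->index]);
}

/* Daisy chain: sw0p4 is the CPU port, p3 points downstream, p4 upstream. */
static void model_init_daisy_chain(struct sw_model sw[MAX_SWITCHES])
{
	sw[0] = (struct sw_model){ .index = 0, .upstream_port = 4,
				   .rtable = { -1, 3, 3 } };
	sw[1] = (struct sw_model){ .index = 1, .upstream_port = 4,
				   .rtable = { 4, -1, 3 } };
	sw[2] = (struct sw_model){ .index = 2, .upstream_port = 4,
				   .rtable = { 4, 4, -1 } };
}
```

      With this setup, sw0 and sw1 are upstream of sw2, but sw2 is not
      upstream of sw0, because sw0 reaches sw2 through a downstream-facing
      port.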
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      63609c8f
    • net: dsa: delete dsa_legacy_fdb_add and dsa_legacy_fdb_del · b117e1e8
      Vladimir Oltean authored
      We want to add reference counting for FDB entries in cross-chip
      topologies, and in order for that to have any chance of working and not
      be unbalanced (leading to entries which are never deleted), we need to
      ensure that higher layers are sane, because if they aren't, it's garbage
      in, garbage out.
      
      For example, if we add a bridge FDB entry twice, the bridge properly
      errors out:
      
      $ bridge fdb add dev swp0 00:01:02:03:04:07 master static
      $ bridge fdb add dev swp0 00:01:02:03:04:07 master static
      RTNETLINK answers: File exists
      
      However, the same thing cannot be said about the bridge bypass
      operations:
      
      $ bridge fdb add dev swp0 00:01:02:03:04:07
      $ bridge fdb add dev swp0 00:01:02:03:04:07
      $ bridge fdb add dev swp0 00:01:02:03:04:07
      $ bridge fdb add dev swp0 00:01:02:03:04:07
      $ echo $?
      0
      
      But one 'bridge fdb del' is enough to remove the entry, no matter how
      many times it was added.
      
      The bridge bypass operations are impossible to maintain in these
      circumstances, and the lack of support for reference counting the
      cross-chip notifiers is holding us back from making further progress, so
      just drop support for them. The only way left for users to install static
      bridge FDB entries is the proper one, using the "master static" flags.
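
      Why unbalanced callers defeat reference counting can be shown with a
      toy model. The types and functions below (struct hw_fdb, struct
      upper_layer, upper_add(), upper_del()) are hypothetical and exist only
      for illustration. One upper layer rejects duplicate adds, as the
      "master static" path does; the other accepts every add but forgets the
      address after a single delete, as described above for the bypass
      operations.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical reference-counted hardware FDB entry. */
struct hw_fdb {
	unsigned int refcount;
};

static void hw_fdb_get(struct hw_fdb *e)
{
	e->refcount++;
}

static void hw_fdb_put(struct hw_fdb *e)
{
	assert(e->refcount > 0);
	e->refcount--;
}

/* The entry stays in hardware for as long as someone holds a reference. */
static bool hw_fdb_installed(const struct hw_fdb *e)
{
	return e->refcount > 0;
}

/* Hypothetical upper layer sitting above the reference-counted entry. */
struct upper_layer {
	bool present;            /* does this layer currently know the address? */
	bool rejects_duplicates; /* true: a second add fails with -EEXIST */
};

static int upper_add(struct upper_layer *u, struct hw_fdb *e)
{
	if (u->present && u->rejects_duplicates)
		return -EEXIST;
	u->present = true;
	hw_fdb_get(e); /* every accepted add takes a reference */
	return 0;
}

static int upper_del(struct upper_layer *u, struct hw_fdb *e)
{
	if (!u->present)
		return -ENOENT;
	u->present = false;
	hw_fdb_put(e); /* a delete drops exactly one reference */
	return 0;
}
```

      In this model, four accepted adds followed by one delete leave the
      reference count at 3, so the entry is never removed from hardware,
      whereas the duplicate-rejecting layer always returns to zero.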
      
      With this change, rtnl_fdb_add() falls back to calling
      ndo_dflt_fdb_add() which uses the duplicate-exclusive variant of
      dev_uc_add(): dev_uc_add_excl(). Because DSA does not (yet) declare
      IFF_UNICAST_FLT, this results in us going to promiscuous mode:
      
      $ bridge fdb add dev swp0 00:01:02:03:04:05
      [   28.206743] device swp0 entered promiscuous mode
      $ bridge fdb add dev swp0 00:01:02:03:04:05
      RTNETLINK answers: File exists
      
      So even if it does not completely fail, there is at least some indication
      that it is behaving differently from before, and closer to user space
      expectations, I would argue (the lack of a "local|static" specifier
      defaults to "local", or "host-only", so dev_uc_add() is a reasonable
      default implementation). If the generic implementation of .ndo_fdb_add
      provided by Vlad Yasevich is proof of anything, it only proves that
      the implementation provided by DSA was always wrong, by not looking at
      "ndm->ndm_state & NUD_NOARP" (the "static" flag which means that the FDB
      entry points outwards) and "ndm->ndm_state & NUD_PERMANENT" (the "local"
      flag which means that the FDB entry points towards the host). It all
      used to mean the same thing to DSA.
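
      The distinction between the two flags can be sketched as a small
      classifier. This is an illustration only: the SKETCH_* constants are
      redefined locally and are assumed to mirror NUD_NOARP and NUD_PERMANENT
      from the Linux uapi header linux/neighbour.h, the enum and function
      names are hypothetical, and checking the "local" flag first when both
      are set is an arbitrary choice of this sketch.

```c
#include <stdint.h>

/* Assumed to mirror NUD_NOARP / NUD_PERMANENT from linux/neighbour.h. */
#define SKETCH_NUD_NOARP     0x40 /* "static": the entry points outwards */
#define SKETCH_NUD_PERMANENT 0x80 /* "local": the entry points towards the host */

enum fdb_direction {
	FDB_TOWARDS_HOST,
	FDB_OUTWARDS,
	FDB_UNSPECIFIED,
};

/* Classify an FDB request by the state bits carried in ndm_state. */
static enum fdb_direction fdb_classify(uint16_t ndm_state)
{
	if (ndm_state & SKETCH_NUD_PERMANENT)
		return FDB_TOWARDS_HOST;
	if (ndm_state & SKETCH_NUD_NOARP)
		return FDB_OUTWARDS;
	return FDB_UNSPECIFIED;
}
```

      An implementation that ignores these bits treats all three cases the
      same way, which is the behavior criticized above.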
      
      Update the documentation so that the users are not confused about what's
      going on.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b117e1e8
    • net: bridge: allow br_fdb_replay to be called for the bridge device · f851a721
      Vladimir Oltean authored
      When a port joins a bridge which already has local FDB entries pointing
      to the bridge device itself, we would like to offload those, so allow
      the "dev" argument to be equal to the bridge too. The code already does
      what we need in that case.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f851a721
    • net: bridge: switchdev: send FDB notifications for host addresses · 6eb38bf8
      Tobias Waldekranz authored
      Treat addresses added to the bridge itself in the same way as regular
      ports and send out a notification so that drivers may sync it down to
      the hardware FDB.
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6eb38bf8
    • net: bridge: use READ_ONCE() and WRITE_ONCE() compiler barriers for fdb->dst · 3e19ae7c
      Vladimir Oltean authored
      Annotate the writer side of fdb->dst:
      
      - fdb_create()
      - br_fdb_update()
      - fdb_add_entry()
      - br_fdb_external_learn_add()
      
      with WRITE_ONCE() and the reader side:
      
      - br_fdb_test_addr()
      - br_fdb_update()
      - fdb_fill_info()
      - fdb_add_entry()
      - fdb_delete_by_addr_and_port()
      - br_fdb_external_learn_add()
      - br_switchdev_fdb_notify()
      
      with READ_ONCE(), so that the compiler does not reload fdb->dst
      multiple times within one reader, which could yield different
      destination ports when the fdb entry is updated concurrently.
      
      This is especially important in read-side sections where fdb->dst is
      used more than once, but let's convert all accesses for the sake of
      uniformity.
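
      The pattern can be sketched in userspace C. The SKETCH_READ_ONCE() and
      SKETCH_WRITE_ONCE() macros below are simplified stand-ins built on a
      volatile access (they rely on the GCC/Clang __typeof__ extension) and
      are not the kernel's definitions; struct port_model, struct fdb_model
      and the helper names are hypothetical. The point is that the reader
      takes one snapshot of the pointer and uses only that snapshot
      afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins: force exactly one access through a volatile lvalue. */
#define SKETCH_READ_ONCE(x)       (*(volatile __typeof__(x) *)&(x))
#define SKETCH_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical destination port. */
struct port_model {
	int ifindex;
	int flags;
};

/* Hypothetical FDB entry whose destination may change concurrently. */
struct fdb_model {
	struct port_model *dst;
};

/* Writer side: publish the new destination with a single store. */
static void fdb_set_dst(struct fdb_model *fdb, struct port_model *port)
{
	SKETCH_WRITE_ONCE(fdb->dst, port);
}

/*
 * Reader side: load fdb->dst once into a local, then use only the local.
 * Reading fdb->dst again for the second field could observe another port.
 * Returns 0 and fills both outputs, or -1 if there is no destination port.
 */
static int fdb_read_dst(struct fdb_model *fdb, int *ifindex, int *flags)
{
	struct port_model *dst = SKETCH_READ_ONCE(fdb->dst);

	if (!dst)
		return -1;
	*ifindex = dst->ifindex;
	*flags = dst->flags;
	return 0;
}
```

      Both fields returned by fdb_read_dst() are guaranteed to come from the
      same port, even if a writer calls fdb_set_dst() in between. The sketch
      covers only the compiler-level single-access aspect and says nothing
      about the lifetime of the port object.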
      Suggested-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3e19ae7c
  2. 28 Jun, 2021 9 commits