1. 14 Dec, 2022 7 commits
  2. 13 Dec, 2022 1 commit
  3. 12 Dec, 2022 32 commits
    • Merge branch 'net-ipa-enable-ipa-v4-7-support' · c4b7a297
      Jakub Kicinski authored
      Alex Elder says:
      
      ====================
      net: ipa: enable IPA v4.7 support
      
      The first patch in this series adds "qcom,sm6350-ipa" as a possible
      IPA compatible string, for the Qualcomm SM6350 SoC.  That SoC uses
      IPA v4.7.
      
      The second patch in this series adds code that enables support for
      IPA v4.7.  DTS updates that make use of these will be merged later.
      ====================
      
      Link: https://lore.kernel.org/r/20221208211529.757669-1-elder@linaro.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipa: add IPA v4.7 support · b310de78
      Alex Elder authored
      Add the necessary register and data definitions needed for IPA v4.7,
      which is found on the SM6350 SoC.
      Co-developed-by: Luca Weiss <luca.weiss@fairphone.com>
      Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • dt-bindings: net: qcom,ipa: Add SM6350 compatible · 5071429f
      Luca Weiss authored
      Add support for SM6350, which uses IPA v4.7.
      Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
      Signed-off-by: Alex Elder <elder@linaro.org>
      Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bnxt: Use generic HBH removal helper in tx path · b6488b16
      Coco Li authored
      Eric Dumazet implemented Big TCP that allowed bigger TSO/GRO packet sizes
      for IPv6 traffic. See patch series:
      commit 89527be8 ("net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes")
      
      This reduces the number of packets traversing the networking stack and
      should usually improve performance. However, it also inserts a
      temporary Hop-by-hop IPv6 extension header.
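For context on why the header exists at all: the IPv6 Payload Length field is 16 bits wide, so a payload above 65,535 bytes has to be described by the Jumbo Payload option of RFC 2675, carried in a Hop-by-Hop header, while Payload Length itself is set to 0. The sketch below uses a hypothetical helper name and a plain byte buffer instead of an sk_buff; it only shows the 8-byte header such a packet carries in front of TCP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HBH_JUMBO_LEN 8          /* whole Hop-by-Hop header, in bytes */
#define IPV6_TLV_JUMBO_TYPE 0xC2 /* Jumbo Payload option type (RFC 2675) */
#define IPV6_MAX_PAYLOAD 65535u  /* largest value of the 16-bit Payload Length */

/* Hypothetical helper: write a Hop-by-Hop header carrying only the Jumbo
 * Payload option. 'l4_len' is the TCP header plus data length. Returns 0 on
 * success, -1 if a jumbogram is not needed (or the length would overflow). */
static int build_hbh_jumbo(uint8_t out[HBH_JUMBO_LEN], uint8_t next_hdr,
			   uint32_t l4_len)
{
	uint32_t jumbo_len;

	if (l4_len > UINT32_MAX - HBH_JUMBO_LEN)
		return -1;
	/* Jumbo Payload Length counts the HBH header itself, but not the
	 * fixed 40-byte IPv6 header. */
	jumbo_len = l4_len + HBH_JUMBO_LEN;
	if (jumbo_len <= IPV6_MAX_PAYLOAD)
		return -1; /* fits in the normal Payload Length field */

	out[0] = next_hdr;            /* e.g. 6 (TCP) */
	out[1] = 0;                   /* Hdr Ext Len: (0 + 1) * 8 = 8 bytes */
	out[2] = IPV6_TLV_JUMBO_TYPE; /* option type */
	out[3] = 4;                   /* option data length */
	out[4] = (uint8_t)(jumbo_len >> 24); /* 32-bit length, big endian */
	out[5] = (uint8_t)(jumbo_len >> 16);
	out[6] = (uint8_t)(jumbo_len >> 8);
	out[7] = (uint8_t)jumbo_len;
	return 0;
}
```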
      
      Using the HBH header removal helper from the previous patch, the extra
      header can be removed in bnxt drivers to allow them to send big TCP
      packets (bigger TSO packets) as well.
      
      Tested:
      Compiled locally
      
      To further test functional correctness, update the GSO/GRO limit on the
      physical NIC:
      
      ip link set eth0 gso_max_size 181000
      ip link set eth0 gro_max_size 181000
      
      Note that if there are bonding or ipvlan devices on top of the physical
      NIC, their GSO sizes need to be updated as well.
      
      Then, IPv6/TCP packets with sizes larger than 64k can be observed.
      Signed-off-by: Coco Li <lixiaoyan@google.com>
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Tested-by: Michael Chan <michael.chan@broadcom.com>
      Link: https://lore.kernel.org/r/20221210041646.3587757-2-lixiaoyan@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver · 89300468
      Coco Li authored
      IPv6/TCP and GRO stacks can build big TCP packets with an added
      temporary Hop By Hop header.
      
      If GSO is not involved, then the temporary header needs to be removed in
      the driver. This patch provides a generic helper for drivers that need
      to modify their headers in place.
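A rough user-space illustration of the in-place removal, assuming a linear buffer laid out as [Ethernet][IPv6][HBH][TCP]: the Ethernet and IPv6 headers slide forward by 8 bytes over the Hop-by-Hop header, and the IPv6 Next Header field is rewritten to the value the HBH header pointed to. The kernel helper added by this patch, ipv6_hopopt_jumbo_remove(), works on an sk_buff and also adjusts its header offsets; the function below is a hypothetical stand-in, not that API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN  14
#define IPV6_HDR_LEN 40
#define HBH_LEN      8
#define NEXTHDR_HOP_BY_HOP 0
#define NEXTHDR_TCP_PROTO  6

/* Hypothetical in-place removal of a temporary HBH Jumbo header from a
 * frame laid out as [Ethernet][IPv6][HBH][TCP...]. Returns the offset at
 * which the shortened frame now starts, or 0 if no jumbo HBH was found. */
static size_t hbh_jumbo_remove(uint8_t *frame, size_t len)
{
	uint8_t *ip6 = frame + ETH_HDR_LEN;
	uint8_t *hbh = ip6 + IPV6_HDR_LEN;

	if (len < ETH_HDR_LEN + IPV6_HDR_LEN + HBH_LEN)
		return 0;
	/* IPv6 Next Header is byte 6 of the fixed header. */
	if (ip6[6] != NEXTHDR_HOP_BY_HOP)
		return 0;
	/* Only handle the 8-byte header with a single Jumbo option before TCP. */
	if (hbh[0] != NEXTHDR_TCP_PROTO || hbh[1] != 0 || hbh[2] != 0xC2)
		return 0;

	uint8_t next = hbh[0];
	/* Slide Ethernet + IPv6 headers forward, overwriting the HBH header;
	 * the regions overlap, so memmove() is required. */
	memmove(frame + HBH_LEN, frame, ETH_HDR_LEN + IPV6_HDR_LEN);
	frame[HBH_LEN + ETH_HDR_LEN + 6] = next;
	return HBH_LEN;
}
```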
      
      Tested:
      Compiled and ran with ethtool -K eth1 tso off
      Could send Big TCP packets
      Signed-off-by: Coco Li <lixiaoyan@google.com>
      Link: https://lore.kernel.org/r/20221210041646.3587757-1-lixiaoyan@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'bridge-mcast-extensions-for-evpn' · 8150f0cf
      Jakub Kicinski authored
      Ido Schimmel says:
      
      ====================
      bridge: mcast: Extensions for EVPN
      
      tl;dr
      =====
      
      This patchset creates feature parity between user space and the kernel
      and allows the former to install and replace MDB port group entries with
      a source list and associated filter mode. This is required for EVPN use
      cases where multicast state is not derived from snooped IGMP/MLD
      packets, but instead derived from EVPN routes exchanged by the control
      plane in user space.
      
      Background
      ==========
      
      IGMPv3 [1] and MLDv2 [2] differ from earlier versions of the protocols
      in that they add support for source-specific multicast. That is, hosts
      can advertise interest in listening to a particular multicast address
      only from specific source addresses or from all sources except for
      specific source addresses.
      
      In kernel 5.10 [3][4], the bridge driver gained the ability to snoop
      IGMPv3/MLDv2 packets and install corresponding MDB port group entries.
      For example, a snooped IGMPv3 Membership Report that contains a single
      MODE_IS_EXCLUDE record for group 239.10.10.10 with sources 192.0.2.1,
      192.0.2.2, 192.0.2.20 and 192.0.2.21 would trigger the creation of these
      entries:
      
       # bridge -d mdb show
       dev br0 port veth1 grp 239.10.10.10 src 192.0.2.21 temp filter_mode include proto kernel  blocked
       dev br0 port veth1 grp 239.10.10.10 src 192.0.2.20 temp filter_mode include proto kernel  blocked
       dev br0 port veth1 grp 239.10.10.10 src 192.0.2.2 temp filter_mode include proto kernel  blocked
       dev br0 port veth1 grp 239.10.10.10 src 192.0.2.1 temp filter_mode include proto kernel  blocked
       dev br0 port veth1 grp 239.10.10.10 temp filter_mode exclude source_list 192.0.2.21/0.00,192.0.2.20/0.00,192.0.2.2/0.00,192.0.2.1/0.00 proto kernel
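The "blocked" (S, G) entries above follow from the IGMPv3/MLDv2 filter-mode rules: in INCLUDE mode only the listed sources are wanted, in EXCLUDE mode every source except the listed ones. A simplified per-port forwarding decision, ignoring timers and using hypothetical type and function names, might look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum filter_mode { MODE_INCLUDE, MODE_EXCLUDE };

/* Hypothetical per-port group state: a filter mode plus a source list
 * (IPv4 addresses in host byte order, for simplicity). */
struct port_group {
	enum filter_mode mode;
	const uint32_t *srcs;
	size_t n_srcs;
};

static bool src_listed(const struct port_group *pg, uint32_t src)
{
	for (size_t i = 0; i < pg->n_srcs; i++)
		if (pg->srcs[i] == src)
			return true;
	return false;
}

/* INCLUDE forwards only listed sources; EXCLUDE forwards every source
 * except the listed (blocked) ones. */
static bool should_forward(const struct port_group *pg, uint32_t src)
{
	bool listed = src_listed(pg, src);

	return pg->mode == MODE_INCLUDE ? listed : !listed;
}
```

With the EXCLUDE record from the example, traffic from 192.0.2.1 is dropped on veth1 while traffic from, say, 192.0.2.3 is forwarded.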
      
      While the kernel can install and replace entries with a filter mode and
      source list, user space cannot. It can only add EXCLUDE entries with an
      empty source list, which is sufficient for IGMPv2/MLDv1, but not for
      IGMPv3/MLDv2.
      
      Use cases where the multicast state is not derived from snooped packets,
      but instead derived from routes exchanged by the user space control
      plane require feature parity between user space and the kernel in terms
      of MDB configuration. Such a use case is detailed in the next section.
      
      Motivation
      ==========
      
      RFC 7432 [5] defines a "MAC/IP Advertisement route" (type 2) [6] that
      allows NVE switches in the EVPN network to advertise and learn
      reachability information for unicast MAC addresses. Traffic destined to
      a unicast MAC address can therefore be selectively forwarded to a single
      NVE switch behind which the MAC is located.
      
      The same is not true for IP multicast traffic. Such traffic is simply
      flooded as BUM to all NVE switches in the broadcast domain (BD),
      regardless of whether a switch has interested receivers for the multicast stream
      or not. This is especially problematic for overlay networks that make
      heavy use of multicast.
      
      The issue is addressed by RFC 9251 [7] that defines a "Selective
      Multicast Ethernet Tag Route" (type 6) [8] which allows NVE switches in
      the EVPN network to advertise multicast streams that they are interested
      in. This is done by having each switch suppress IGMP/MLD packets from
      being transmitted to the NVE network and instead communicate the
      information over BGP to other switches.
      
      As far as the bridge driver is concerned, the above means that the
      multicast state (i.e., {multicast address, group timer, filter-mode,
      (source records)}) for the VXLAN bridge port is not populated by the
      kernel from snooped IGMP/MLD packets (they are suppressed), but instead
      by user space. Specifically, by the routing daemon that is exchanging
      EVPN routes with other NVE switches.
      
      Changes are obviously also required in the VXLAN driver, but they are
      the subject of future patchsets. See the "Future work" section.
      
      Implementation
      ==============
      
      The user interface is extended to allow user space to specify the filter
      mode of the MDB port group entry and its source list. Replace support is
      also added so that user space would not need to remove an entry and
      re-add it only to edit its source list or filter mode, as that would
      result in packet loss. Example usage:
      
       # bridge mdb replace dev br0 port dummy10 grp 239.1.1.1 permanent \
      	source_list 192.0.2.1,192.0.2.3 filter_mode exclude proto zebra
       # bridge -d -s mdb show
       dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.3 permanent filter_mode include proto zebra  blocked    0.00
       dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto zebra  blocked    0.00
       dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode exclude source_list 192.0.2.3/0.00,192.0.2.1/0.00 proto zebra     0.00
      
      The netlink interface is extended with a few new attributes in the
      RTM_NEWMDB request message:
      
      [ struct nlmsghdr ]
      [ struct br_port_msg ]
      [ MDBA_SET_ENTRY ]
      	struct br_mdb_entry
      [ MDBA_SET_ENTRY_ATTRS ]
      	[ MDBE_ATTR_SOURCE ]
      		struct in_addr / struct in6_addr
      	[ MDBE_ATTR_SRC_LIST ]		// new
      		[ MDBE_SRC_LIST_ENTRY ]
      			[ MDBE_SRCATTR_ADDRESS ]
      				struct in_addr / struct in6_addr
      		[ ...]
      	[ MDBE_ATTR_GROUP_MODE ]	// new
      		u8
      	[ MDBE_ATTR_RTPROT ]		// new
      		u8
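To make the nesting concrete, here is a hand-rolled encoder for only the new MDBE_ATTR_SRC_LIST and MDBE_ATTR_GROUP_MODE part of MDBA_SET_ENTRY_ATTRS. It is a sketch: the attribute numbers are local placeholders (real code should use the enums from <linux/if_bridge.h>), only IPv4 sources are shown, and the nlmsghdr, br_port_msg and MDBA_SET_ENTRY parts of the request are omitted. The 4-byte attribute header, 4-byte alignment and the NLA_F_NESTED bit (0x8000) follow the usual netlink conventions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Netlink attribute header layout and alignment (host byte order). */
struct tlv_hdr { uint16_t len; uint16_t type; };
#define TLV_ALIGN(n) (((n) + 3) & ~(size_t)3)
#define TLV_F_NESTED 0x8000u

/* Placeholder attribute numbers, NOT taken from the uapi header. */
enum { X_ATTR_SRC_LIST = 2, X_ATTR_GROUP_MODE = 3 };
enum { X_SRC_LIST_ENTRY = 1 };
enum { X_SRCATTR_ADDRESS = 1 };

struct buf { uint8_t data[512]; size_t len; };

/* Append one attribute (header + payload + padding); returns its offset. */
static size_t put_attr(struct buf *b, uint16_t type, const void *p, size_t n)
{
	size_t off = b->len;
	size_t total = sizeof(struct tlv_hdr) + n;
	struct tlv_hdr h = { (uint16_t)total, type };

	assert(off + TLV_ALIGN(total) <= sizeof(b->data));
	memcpy(b->data + off, &h, sizeof(h));
	if (n)
		memcpy(b->data + off + sizeof(h), p, n);
	/* zero the padding up to the next 4-byte boundary */
	memset(b->data + off + total, 0, TLV_ALIGN(total) - total);
	b->len = off + TLV_ALIGN(total);
	return off;
}

/* Close a nest opened at 'off': its length covers all nested attributes. */
static void end_nest(struct buf *b, size_t off)
{
	uint16_t len = (uint16_t)(b->len - off);

	memcpy(b->data + off, &len, sizeof(len));
}

/* Encode a source list of IPv4 addresses followed by a group mode byte. */
static void put_src_list(struct buf *b, const uint32_t *addrs_be, size_t n,
			 uint8_t mode)
{
	size_t list = put_attr(b, X_ATTR_SRC_LIST | TLV_F_NESTED, NULL, 0);

	for (size_t i = 0; i < n; i++) {
		size_t ent = put_attr(b, X_SRC_LIST_ENTRY | TLV_F_NESTED, NULL, 0);

		put_attr(b, X_SRCATTR_ADDRESS, &addrs_be[i], sizeof(addrs_be[i]));
		end_nest(b, ent);
	}
	end_nest(b, list);
	put_attr(b, X_ATTR_GROUP_MODE, &mode, sizeof(mode));
}
```

In practice a netlink library or iproute2's own helpers would do this encoding; the point here is only the shape of the nesting.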
      
      No changes are required in RTM_NEWMDB responses and notifications, as
      all the information can already be dumped by the kernel today.
      
      Testing
      =======
      
      Tested with existing bridge multicast selftests: bridge_igmp.sh,
      bridge_mdb_port_down.sh, bridge_mdb.sh, bridge_mld.sh,
      bridge_vlan_mcast.sh.
      
      In addition, many new test cases were added for existing as well as for new
      MDB functionality.
      
      Patchset overview
      =================
      
      Patches #1-#8 are non-functional preparations for the core changes in
      later patches.
      
      Patches #9-#10 allow user space to install (*, G) entries with a source
      list and associated filter mode. Specifically, patch #9 adds the
      necessary kernel plumbing and patch #10 exposes the new functionality to
      user space via a few new attributes.
      
      Patch #11 allows user space to specify the routing protocol of new MDB
      port group entries so that a routing daemon could differentiate between
      entries installed by it and those installed by an administrator.
      
      Patch #12 allows user space to replace MDB port group entries. This is
      useful, for example, when user space wants to add a new source to a
      source list. Instead of deleting a (*, G) entry and re-adding it with an
      extended source list (which would result in packet loss), user space can
      simply replace the current entry.
      
      Patches #13-#14 add tests for existing MDB functionality as well as for
      all new functionality added in this patchset.
      
      Future work
      ===========
      
      The VXLAN driver will need to be extended with an MDB so that it could
      selectively forward IP multicast traffic to NVE switches with interested
      receivers instead of simply flooding it to all switches as BUM.
      
      The idea is to reuse the existing MDB interface for the VXLAN driver in
      a similar way to how the FDB interface is shared between the bridge and
      VXLAN drivers.
      
      From a command line perspective, configuration will look as follows:
      
       # bridge mdb add dev br0 port vxlan0 grp 239.1.1.1 permanent \
      	filter_mode exclude source_list 198.51.100.1,198.51.100.2
      
       # bridge mdb add dev vxlan0 port vxlan0 grp 239.1.1.1 permanent \
      	filter_mode include source_list 198.51.100.3,198.51.100.4 \
      	dst 192.0.2.1 dst_port 4789 src_vni 2
      
       # bridge mdb add dev vxlan0 port vxlan0 grp 239.1.1.1 permanent \
      	filter_mode exclude source_list 198.51.100.1,198.51.100.2 \
      	dst 192.0.2.2 dst_port 4789 src_vni 2
      
      The first command is enabled by this set, while the next two will be
      the subject of future work.
      
      From a netlink perspective, the existing PF_BRIDGE/RTM_*MDB messages will
      be extended to the VXLAN driver. This means that a few new attributes
      will be added (e.g., 'MDBE_ATTR_SRC_VNI') and that the handlers for
      these messages will need to move to net/core/rtnetlink.c. The rtnetlink
      code will call into the appropriate driver based on the ifindex
      specified in the ancillary header.
      
      iproute2 patches can be found here [9].
      
      Changelog
      =========
      
      Since v1 [10]:
      
      * Patch #12: Remove extack from br_mdb_replace_group_sg().
      * Patch #12: Change 'nlflags' to u16 and move it after 'filter_mode' to
        pack the structure.
      
      Since RFC [11]:
      
      * Patch #6: New patch.
      * Patch #9: Use an array instead of a list to store source entries.
      * Patch #10: Use an array instead of a list to store source entries.
      * Patch #10: Drop br_mdb_config_attrs_fini().
      * Patch #11: Reject protocol for host entries.
      * Patch #13: New patch.
      * Patch #14: New patch.
      
      [1] https://datatracker.ietf.org/doc/html/rfc3376
      [2] https://www.rfc-editor.org/rfc/rfc3810
      [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6af52ae2ed14a6bc756d5606b29097dfd76740b8
      [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=68d4fd30c83b1b208e08c954cd45e6474b148c87
      [5] https://datatracker.ietf.org/doc/html/rfc7432
      [6] https://datatracker.ietf.org/doc/html/rfc7432#section-7.2
      [7] https://datatracker.ietf.org/doc/html/rfc9251
      [8] https://datatracker.ietf.org/doc/html/rfc9251#section-9.1
      [9] https://github.com/idosch/iproute2/commits/submit/mdb_v1
      [10] https://lore.kernel.org/netdev/20221208152839.1016350-1-idosch@nvidia.com/
      [11] https://lore.kernel.org/netdev/20221018120420.561846-1-idosch@nvidia.com/
      ====================
      
      Link: https://lore.kernel.org/r/20221210145633.1328511-1-idosch@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: forwarding: Add bridge MDB test · b6d00da0
      Ido Schimmel authored
      Add a selftest that includes the following test cases:
      
      1. Configuration tests. Both valid and invalid configurations are
         tested across all entry types (e.g., L2, IPv4).
      
      2. Forwarding tests. Both host and port group entries are tested across
         all entry types.
      
      3. Interaction between user installed MDB entries and IGMP / MLD control
         packets.
      
      Example output:
      
      INFO: # Host entries configuration tests
      TEST: Common host entries configuration tests (IPv4)                [ OK ]
      TEST: Common host entries configuration tests (IPv6)                [ OK ]
      TEST: Common host entries configuration tests (L2)                  [ OK ]
      
      INFO: # Port group entries configuration tests - (*, G)
      TEST: Common port group entries configuration tests (IPv4 (*, G))   [ OK ]
      TEST: Common port group entries configuration tests (IPv6 (*, G))   [ OK ]
      TEST: IPv4 (*, G) port group entries configuration tests            [ OK ]
      TEST: IPv6 (*, G) port group entries configuration tests            [ OK ]
      
      INFO: # Port group entries configuration tests - (S, G)
      TEST: Common port group entries configuration tests (IPv4 (S, G))   [ OK ]
      TEST: Common port group entries configuration tests (IPv6 (S, G))   [ OK ]
      TEST: IPv4 (S, G) port group entries configuration tests            [ OK ]
      TEST: IPv6 (S, G) port group entries configuration tests            [ OK ]
      
      INFO: # Port group entries configuration tests - L2
      TEST: Common port group entries configuration tests (L2 (*, G))     [ OK ]
      TEST: L2 (*, G) port group entries configuration tests              [ OK ]
      
      INFO: # Forwarding tests
      TEST: IPv4 host entries forwarding tests                            [ OK ]
      TEST: IPv6 host entries forwarding tests                            [ OK ]
      TEST: L2 host entries forwarding tests                              [ OK ]
      TEST: IPv4 port group "exclude" entries forwarding tests            [ OK ]
      TEST: IPv6 port group "exclude" entries forwarding tests            [ OK ]
      TEST: IPv4 port group "include" entries forwarding tests            [ OK ]
      TEST: IPv6 port group "include" entries forwarding tests            [ OK ]
      TEST: L2 port entries forwarding tests                              [ OK ]
      
      INFO: # Control packets tests
      TEST: IGMPv3 MODE_IS_INCLUDE tests                                  [ OK ]
      TEST: MLDv2 MODE_IS_INCLUDE tests                                   [ OK ]
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: forwarding: Rename bridge_mdb test · f9923a67
      Ido Schimmel authored
      The test is only concerned with host MDB entries and not with MDB
      entries as a whole. Rename the test to reflect that.
      
      Subsequent patches will add a more general test that will contain the
      test cases for host MDB entries and remove the current test.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bridge: mcast: Support replacement of MDB port group entries · 61f21835
      Ido Schimmel authored
      Now that user space can specify additional attributes of port group
      entries such as filter mode and source list, it makes sense to allow
      user space to atomically modify these attributes by replacing entries
      instead of forcing user space to delete the entries and add them back.
      
      Replace MDB port group entries when the 'NLM_F_REPLACE' flag is
      specified in the netlink message header.
      
      When a (*, G) entry is replaced, update the following attributes: Source
      list, state, filter mode, protocol and flags. If the entry is temporary
      and in EXCLUDE mode, reset the group timer to the group membership
      interval. If the entry is temporary and in INCLUDE mode, reset the
      source timers of associated sources to the group membership interval.
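The replacement rules can be summarized in a small model. Everything here is hypothetical and simplified: timers are plain expiry times in seconds, the (S, G) entries and source addresses are not modeled, and 'gmi' stands for the group membership interval. On a bridge with default settings that interval is 260 seconds, which is consistent with the 259.44 seconds left on the source timers in the last example below.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NLMSG_F_REPLACE 0x100 /* same value as NLM_F_REPLACE in <linux/netlink.h> */
#define MAX_SRCS 32

enum filter_mode { MODE_EXCLUDE, MODE_INCLUDE };

/* Hypothetical, simplified (*, G) port group entry. Timers hold the
 * absolute expiry time; 0 means "not armed". */
struct pg_entry {
	bool exists;
	bool permanent;
	enum filter_mode mode;
	size_t n_srcs;
	unsigned long group_timer;
	unsigned long src_timer[MAX_SRCS];
};

/* Add a new entry, or replace an existing one when NLM_F_REPLACE is set. */
static int pg_add_or_replace(struct pg_entry *pg, unsigned int nlflags,
			     bool permanent, enum filter_mode mode,
			     size_t n_srcs, unsigned long now, unsigned long gmi)
{
	assert(n_srcs <= MAX_SRCS);
	if (pg->exists && !(nlflags & NLMSG_F_REPLACE))
		return -EEXIST; /* a plain add of an existing entry fails */

	pg->exists = true;
	pg->permanent = permanent;
	pg->mode = mode;
	pg->n_srcs = n_srcs;
	pg->group_timer = 0;
	for (size_t i = 0; i < MAX_SRCS; i++)
		pg->src_timer[i] = 0;

	if (permanent)
		return 0; /* permanent entries never age out */

	if (mode == MODE_EXCLUDE)
		pg->group_timer = now + gmi; /* temporary EXCLUDE: group timer */
	else
		for (size_t i = 0; i < n_srcs; i++)
			pg->src_timer[i] = now + gmi; /* temporary INCLUDE: source timers */
	return 0;
}
```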
      
      Examples:
      
       # bridge mdb replace dev br0 port dummy10 grp 239.1.1.1 permanent source_list 192.0.2.1,192.0.2.2 filter_mode include
       # bridge -d -s mdb show
       dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.2 permanent filter_mode include proto static     0.00
       dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto static     0.00
       dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode include source_list 192.0.2.2/0.00,192.0.2.1/0.00 proto static     0.00
      
       # bridge mdb replace dev br0 port dummy10 grp 239.1.1.1 permanent source_list 192.0.2.1,192.0.2.3 filter_mode exclude proto zebra
       # bridge -d -s mdb show
       dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.3 permanent filter_mode include proto zebra  blocked    0.00
       dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto zebra  blocked    0.00
       dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode exclude source_list 192.0.2.3/0.00,192.0.2.1/0.00 proto zebra     0.00
      
       # bridge mdb replace dev br0 port dummy10 grp 239.1.1.1 temp source_list 192.0.2.4,192.0.2.3 filter_mode include proto bgp
       # bridge -d -s mdb show
       dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.4 temp filter_mode include proto bgp     0.00
       dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.3 temp filter_mode include proto bgp     0.00
       dev br0 port dummy10 grp 239.1.1.1 temp filter_mode include source_list 192.0.2.4/259.44,192.0.2.3/259.44 proto bgp     0.00
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bridge: mcast: Allow user space to specify MDB entry routing protocol · 1d7b66a7
      Ido Schimmel authored
      Add the 'MDBE_ATTR_RTPROT' attribute to allow user space to specify the
      routing protocol of the MDB port group entry. Enforce a minimum value of
      'RTPROT_STATIC' to prevent user space from using protocol values that
      should only be set by the kernel (e.g., 'RTPROT_KERNEL'). Maintain
      backward compatibility by defaulting to 'RTPROT_STATIC'.
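The validation and defaulting described above amount to the following check. The constant values match RTPROT_KERNEL and RTPROT_STATIC in <linux/rtnetlink.h>; the function itself is a hypothetical sketch. The kernel expresses the minimum in its netlink policy instead, which is presumably where the "integer out of range" error in the first example below comes from.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in <linux/rtnetlink.h>. */
#define PROTO_KERNEL 2 /* RTPROT_KERNEL */
#define PROTO_STATIC 4 /* RTPROT_STATIC */

/* Hypothetical parse step: 'has_attr' says whether the routing protocol
 * attribute was present in the request. */
static int mdb_parse_rtprot(bool has_attr, uint8_t attr_val, uint8_t *rtprot)
{
	if (!has_attr) {
		*rtprot = PROTO_STATIC; /* backward-compatible default */
		return 0;
	}
	if (attr_val < PROTO_STATIC)
		return -ERANGE; /* values below are reserved, e.g. RTPROT_KERNEL */
	*rtprot = attr_val;
	return 0;
}
```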
      
      The protocol is already visible to user space in RTM_NEWMDB responses
      and notifications via the 'MDBA_MDB_EATTR_RTPROT' attribute.
      
      The routing protocol allows a routing daemon to distinguish between
      entries configured by it and those configured by the administrator. Once
      MDB flush is supported, the protocol can be used as a criterion
      according to which the flush is performed.
      
      Examples:
      
       # bridge mdb add dev br0 port dummy10 grp 239.1.1.1 permanent proto kernel
       Error: integer out of range.
      
       # bridge mdb add dev br0 port dummy10 grp 239.1.1.1 permanent proto static
      
       # bridge mdb add dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent proto zebra
      
       # bridge mdb add dev br0 port dummy10 grp 239.1.1.2 permanent source_list 198.51.100.1,198.51.100.2 filter_mode include proto 250
      
       # bridge -d mdb show
       dev br0 port dummy10 grp 239.1.1.2 src 198.51.100.2 permanent filter_mode include proto 250
       dev br0 port dummy10 grp 239.1.1.2 src 198.51.100.1 permanent filter_mode include proto 250
       dev br0 port dummy10 grp 239.1.1.2 permanent filter_mode include source_list 198.51.100.2/0.00,198.51.100.1/0.00 proto 250
       dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto zebra
       dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode exclude proto static
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bridge: mcast: Allow user space to add (*, G) with a source list and filter mode · 6afaae6d
      Ido Schimmel authored
      Add new netlink attributes to the RTM_NEWMDB request that allow user
      space to add (*, G) with a source list and filter mode.
      
      The RTM_NEWMDB message can already dump such entries (created by the
      kernel) so there is no need to add dump support. However, the message
      contains a different set of attributes depending on whether it is a request or a
      response. The naming and structure of the new attributes try to follow
      the existing ones used in the response.
      
      Request:
      
      [ struct nlmsghdr ]
      [ struct br_port_msg ]
      [ MDBA_SET_ENTRY ]
      	struct br_mdb_entry
      [ MDBA_SET_ENTRY_ATTRS ]
      	[ MDBE_ATTR_SOURCE ]
      		struct in_addr / struct in6_addr
      	[ MDBE_ATTR_SRC_LIST ]		// new
      		[ MDBE_SRC_LIST_ENTRY ]
      			[ MDBE_SRCATTR_ADDRESS ]
      				struct in_addr / struct in6_addr
      		[ ...]
      	[ MDBE_ATTR_GROUP_MODE ]	// new
      		u8
      
      Response:
      
      [ struct nlmsghdr ]
      [ struct br_port_msg ]
      [ MDBA_MDB ]
      	[ MDBA_MDB_ENTRY ]
      		[ MDBA_MDB_ENTRY_INFO ]
      			struct br_mdb_entry
      		[ MDBA_MDB_EATTR_TIMER ]
      			u32
      		[ MDBA_MDB_EATTR_SOURCE ]
      			struct in_addr / struct in6_addr
      		[ MDBA_MDB_EATTR_RTPROT ]
      			u8
      		[ MDBA_MDB_EATTR_SRC_LIST ]
      			[ MDBA_MDB_SRCLIST_ENTRY ]
      				[ MDBA_MDB_SRCATTR_ADDRESS ]
      					struct in_addr / struct in6_addr
      				[ MDBA_MDB_SRCATTR_TIMER ]
      					u32
      			[...]
      		[ MDBA_MDB_EATTR_GROUP_MODE ]
      			u8
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bridge: mcast: Add support for (*, G) with a source list and filter mode · b1c8fec8
      Ido Schimmel authored
      In preparation for allowing user space to add (*, G) entries with a
      source list and associated filter mode, add the necessary plumbing to
      handle such requests.
      
      Extend the MDB configuration structure with a currently empty source
      array and filter mode that is currently hard coded to EXCLUDE.
      
      Add the source entries and the corresponding (S, G) entries before
      making the new (*, G) port group entry visible to the data path.
      
      Handle the creation of each source entry in a similar fashion to how it
      is created from the data path in response to received Membership
      Reports: Create the source entry, arm the source timer (if needed), add
      a corresponding (S, G) forwarding entry and finally mark the source
      entry as installed (by user space).
      
      Add the (S, G) entry by populating an MDB configuration structure and
      calling br_mdb_add_group_sg() as if a new entry is created by user
      space, with the sole difference that the 'src_entry' field is set to
      make sure that the group timer of such entries is never armed.
      
      Note that it is not currently possible to add more than 32 source
      entries to a port group entry. If this proves to be a problem we can
      either increase 'PG_SRC_ENT_LIMIT' or avoid forcing a limit on entries
      created by user space.
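A minimal sketch of a bounded source array, assuming a limit of 32 entries as described above. The structure, the function names and the choice of -E2BIG as the error code are all hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SRC_ENT_LIMIT 32 /* stands in for PG_SRC_ENT_LIMIT */

struct src_array {
	uint32_t *addrs;
	size_t n;
};

/* Copy a user supplied source list into a freshly allocated array,
 * refusing lists longer than the limit. */
static int src_array_init(struct src_array *sa, const uint32_t *addrs, size_t n)
{
	sa->addrs = NULL;
	sa->n = 0;
	if (n > SRC_ENT_LIMIT)
		return -E2BIG;
	if (n == 0)
		return 0;
	sa->addrs = calloc(n, sizeof(*sa->addrs));
	if (!sa->addrs)
		return -ENOMEM;
	for (size_t i = 0; i < n; i++)
		sa->addrs[i] = addrs[i];
	sa->n = n;
	return 0;
}

static void src_array_fini(struct src_array *sa)
{
	free(sa->addrs);
	sa->addrs = NULL;
	sa->n = 0;
}
```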
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source · 079afd66
      Ido Schimmel authored
      User space will soon be able to install a (*, G) with a source list,
      prompting the creation of an (S, G) entry for each source.
      
      In this case, the group timer of the (S, G) entry should never be set.
      
      Solve this by adding a new field to the MDB configuration structure that
      denotes whether the (S, G) corresponds to a source or not.
      
      The field will be set in a subsequent patch where br_mdb_add_group_sg()
      is called in order to create an (S, G) entry for each user provided
      source.
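The resulting rule for the group timer of a new (S, G) entry fits in a one-line predicate. The structure below is a hypothetical subset of the MDB configuration structure, not the kernel's definition:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the MDB configuration structure. */
struct mdb_cfg {
	bool permanent;
	bool src_entry; /* (S, G) created on behalf of a (*, G) source */
};

/* Only a temporary (S, G) added directly by user space gets a group
 * timer; one backing a source entry follows that source's lifetime. */
static bool sg_arm_group_timer(const struct mdb_cfg *cfg)
{
	return !cfg->permanent && !cfg->src_entry;
}
```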
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bridge: mcast: Add a flag for user installed source entries · a01ecb17
      Ido Schimmel authored
      There are a few places where the bridge driver differentiates between
      (S, G) entries installed by the kernel (in response to Membership
      Reports) and those installed by user space. One of them is when deleting
      an (S, G) entry corresponding to a source entry that is being deleted.
      
      While user space cannot currently add a source entry to a (*, G), it can
      add an (S, G) entry that later corresponds to a source entry created by
      the reception of a Membership Report. If this source entry is later
      deleted because its source timer expired or because the (*, G) entry is
      being deleted, the bridge driver will not delete the corresponding (S,
      G) entry if it was added by user space as permanent.
      
      This is going to be a problem when the ability to install a (*, G) with
      a source list is exposed to user space. In this case, when user space
      installs the (*, G) as permanent, then all the (S, G) entries
      corresponding to its source list will also be installed as permanent.
      When user space deletes the (*, G), all the source entries will be
      deleted and the expectation is that the corresponding (S, G) entries
      will be deleted as well.
      
      Solve this by introducing a new source entry flag denoting that the
      entry was installed by user space. When the entry is deleted, delete the
      corresponding (S, G) entry even if it was installed by user space as
      permanent, as the flag tells us that it was installed in response to the
      source entry being created.
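The deletion rule can be sketched as a predicate over the source entry's flags. The flag names and values below are hypothetical stand-ins for the kernel's own definitions:

```c
#include <assert.h>
#include <stdbool.h>

#define SRC_F_INSTALLED  0x1 /* an (S, G) forwarding entry was installed */
#define SRC_F_USER_ADDED 0x2 /* source entry installed by user space */

/* When a source entry goes away, decide whether its (S, G) forwarding
 * entry should be deleted too. */
static bool del_sg_with_source(unsigned int src_flags, bool sg_permanent)
{
	if (!(src_flags & SRC_F_INSTALLED))
		return false; /* nothing was installed for this source */
	if (sg_permanent && !(src_flags & SRC_F_USER_ADDED))
		return false; /* independently added by user space; keep it */
	return true;
}
```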
      
      The flag will be set in a subsequent patch where source entries are
      created in response to user requests.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bridge: mcast: Expose __br_multicast_del_group_src() · 083e3534
      Ido Schimmel authored
      Expose __br_multicast_del_group_src() which is symmetric to
      br_multicast_new_group_src() and does not remove the installed (S, G)
      forwarding entry, unlike br_multicast_del_group_src().
      
      The function will be used in the error path when user space was able to
      add a new source entry, but failed to install a corresponding forwarding
      entry.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bridge: mcast: Expose br_multicast_new_group_src() · fd0c6961
      Ido Schimmel authored
      Currently, new group source entries are only created in response to
      received Membership Reports. Subsequent patches are going to allow user
      space to install (*, G) entries with a source list.
      
      As a preparatory step, expose br_multicast_new_group_src() so that it
      could later be invoked from the MDB code (i.e., br_mdb.c) that handles
      RTM_NEWMDB messages.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bridge: mcast: Add a centralized error path · 160dd931
      Ido Schimmel authored
      Subsequent patches will add memory allocations in br_mdb_config_init()
      as the MDB configuration structure will include a linked list of source
      entries. This memory will need to be freed regardless of whether br_mdb_add()
      succeeded or failed.
      
      As a preparation for this change, add a centralized error path where the
      memory will be freed.
      
      Note that br_mdb_del() already has one error path and therefore does not
      require any changes.
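A minimal userspace sketch of the centralized error path idea follows. It assumes a simplified configuration that holds a singly linked list of source entries. The names mdb_config, mdb_config_init(), mdb_config_fini() and mdb_add() are hypothetical stand-ins, not the bridge's actual br_mdb_config_init()/br_mdb_add() code. Every exit goes through one label that frees the list, whether or not the add succeeded.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the MDB configuration; not the kernel's types. */
struct src_entry {
	struct src_entry *next;
	int addr;
};

struct mdb_config {
	struct src_entry *src_list;
};

/* Frees every source entry; safe on an empty or partially built list. */
static void mdb_config_fini(struct mdb_config *cfg)
{
	struct src_entry *e = cfg->src_list;

	while (e) {
		struct src_entry *next = e->next;

		free(e);
		e = next;
	}
	cfg->src_list = NULL;
}

/* Builds the source list; on failure the partial list stays in cfg. */
static int mdb_config_init(struct mdb_config *cfg, const int *srcs, int n)
{
	assert(n >= 0);
	cfg->src_list = NULL;
	for (int i = 0; i < n; i++) {
		struct src_entry *e = malloc(sizeof(*e));

		if (!e)
			return -ENOMEM;
		e->addr = srcs[i];
		e->next = cfg->src_list;
		cfg->src_list = e;
	}
	return 0;
}

/* fail_install simulates an error after the configuration was built. */
static int mdb_add(const int *srcs, int n, int fail_install)
{
	struct mdb_config cfg;
	int err;

	err = mdb_config_init(&cfg, srcs, n);
	if (err)
		goto out;

	if (fail_install) {
		err = -EINVAL;
		goto out;
	}

	/* ... install the entry here ... */
	err = 0;
out:
	mdb_config_fini(&cfg); /* single cleanup point for success and failure */
	return err;
}
```

Because the free happens after the label, adding a new failure point later only needs a `goto out`. No extra cleanup code is required.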
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bridge: mcast: Place netlink policy before validation functions · 1870a2d3
      Ido Schimmel authored
      Subsequent patches are going to add additional validation functions and
      netlink policies. Some of these functions will need to perform parsing
      using nla_parse_nested() and the new policies.
      
      In order to keep all the policies next to each other, move the current
      policy so that it precedes the validation functions.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bridge: mcast: Split (*, G) and (S, G) addition into different functions · 6ff1e68e
      Ido Schimmel authored
      When the bridge is using IGMP version 3 or MLD version 2, it handles the
      addition of (*, G) and (S, G) entries differently.
      
      When a new (S, G) port group entry is added, all the (*, G) EXCLUDE
      ports need to be added to the port group of the new entry. Similarly,
      when a new (*, G) EXCLUDE port group entry is added, the port needs to
      be added to the port group of all the matching (S, G) entries.
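To make the two update rules concrete, here is a toy model in C that represents port groups as bitmasks. The struct demo_group layout and the function names are hypothetical, and the model ignores per-source state. It only shows two things: a new (S, G) entry inherits the (*, G) EXCLUDE ports, and a new (*, G) EXCLUDE port is propagated to the existing (S, G) entries of the same group.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one group G; port sets are bitmasks. */
#define DEMO_MAX_SG 8

struct demo_group {
	uint32_t star_g_exclude_ports;  /* ports with a (*, G) EXCLUDE entry */
	uint32_t sg_ports[DEMO_MAX_SG]; /* port set of each (S, G) entry */
	int n_sg;
};

/* New (S, G) entry: it inherits every (*, G) EXCLUDE port. */
static int demo_add_sg_entry(struct demo_group *g, uint32_t port_mask)
{
	assert(g->n_sg < DEMO_MAX_SG);
	g->sg_ports[g->n_sg] = port_mask | g->star_g_exclude_ports;
	return g->n_sg++;
}

/* New (*, G) EXCLUDE port: add it to every (S, G) entry of this group. */
static void demo_add_star_g_exclude_port(struct demo_group *g, unsigned int port)
{
	assert(port < 32);
	uint32_t bit = UINT32_C(1) << port;

	g->star_g_exclude_ports |= bit;
	for (int i = 0; i < g->n_sg; i++)
		g->sg_ports[i] |= bit;
}
```

The order of operations does not matter in this model. An (S, G) entry created before or after the (*, G) EXCLUDE port ends up containing that port either way.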
      
      Subsequent patches will create more differences between the two entry
      types. Namely, the filter mode and source list can only be specified for
      (*, G) entries.
      
      Given the current and future differences between the two entry types,
      handle the addition of each entry type in a different function, thereby
      avoiding the creation of one complex function.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bridge: mcast: Do not derive entry type from its filter mode · b63e3065
      Ido Schimmel authored
      Currently, the filter mode (i.e., INCLUDE / EXCLUDE) of MDB entries
      cannot be set from user space. Instead, it is set by the kernel
      according to the entry type: (*, G) entries are treated as EXCLUDE and
      (S, G) entries are treated as INCLUDE. This allows the kernel to derive
      the entry type from its filter mode.
      
      Subsequent patches will allow user space to set the filter mode of (*,
      G) entries, making the current assumption incorrect.
      
      In preparation, remove this assumption and instead determine the entry
      type from its key, which is more direct.
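The sketch below determines the entry type from the key alone. It assumes a hypothetical IPv4-only key in which a zero source address means "any source". The struct and function names are illustrative and are not the bridge's br_ip type. The point is that the check never consults the filter mode, so an INCLUDE (*, G) entry is still classified correctly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum demo_filter_mode { DEMO_MODE_INCLUDE, DEMO_MODE_EXCLUDE };

/* Hypothetical IPv4-only key: a source of 0 means "any source". */
struct demo_mdb_key {
	uint32_t group;
	uint32_t src;
};

struct demo_mdb_entry {
	struct demo_mdb_key key;
	enum demo_filter_mode mode; /* either mode may now be set on (*, G) */
};

/* Derive the entry type from the key only; the filter mode is ignored. */
static bool demo_entry_is_star_g(const struct demo_mdb_entry *e)
{
	return e->key.src == 0;
}
```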
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • qlcnic: Clean up some inconsistent indenting · 02abf84a
      Jiapeng Chong authored
      No functional modification involved.
      
      drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c:714 qlcnic_validate_ring_count() warn: inconsistent indenting.
      
      Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3419
      Reported-by: Abaci Robot <abaci@linux.alibaba.com>
      Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20221212055813.91154-1-jiapeng.chong@linux.alibaba.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • i40e: allow toggling loopback mode via ndo_set_features callback · b1746fba
      Tirthendu Sarkar authored
      Add support for NETIF_F_LOOPBACK. This feature can be set via:
      $ ethtool -K eth0 loopback <on|off>
      
      This sets the MAC Tx->Rx loopback.
      
      This feature is used for the xsk selftests, and might have other uses
      too.
      Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
      Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Tested-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20221209185553.2520088-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'net-add-iff_no_addrconf-to-prevent-ipv6-addrconf' · 2a78dd22
      Jakub Kicinski authored
      Xin Long says:
      
      ====================
      net: add IFF_NO_ADDRCONF to prevent ipv6 addrconf
      
      This patchset adds an IFF_NO_ADDRCONF flag to dev->priv_flags to
      prevent IPv6 addrconf, as suggested by Jiri Pirko.
      
      Patch 1 makes bonding use this flag instead of the IFF_SLAVE flag, and
      Patches 2 and 3 make team and net failover set this flag before calling
      dev_open().
      ====================
      
      Link: https://lore.kernel.org/r/cover.1670599241.git.lucien.xin@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: failover: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf · cb54d392
      Xin Long authored
      As with bonding and team, net failover also needs to set
      IFF_NO_ADDRCONF in slave_dev->priv_flags to prevent IPv6 addrconf on
      its slave ports.
      
      Note that dev_open(slave_dev) is called in .slave_register,
      which is called after the IFF_NO_ADDRCONF flag is set in
      failover_slave_register().
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: team: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf · 0aa64df3
      Xin Long authored
      Use the IFF_NO_ADDRCONF flag to prevent IPv6 addrconf on team ports.
      The flag is set in team_port_enter(), which is called before
      dev_open(), and cleared in team_port_leave(), which is called after
      dev_close() and on the error path of team_port_add().
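The ordering matters because addrconf runs when the port is opened. Below is a toy model of the set-before-open, clear-after-close sequence. DEMO_IFF_NO_ADDRCONF is a hypothetical bit value; the real flag is defined in include/linux/netdevice.h and its value may differ. The demo_* functions only simulate dev_open() and the addrconf hook and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit value, for illustration only. */
#define DEMO_IFF_NO_ADDRCONF (1ULL << 0)

struct demo_netdev {
	unsigned long long priv_flags;
	bool up;
	int addrconf_runs; /* counts simulated IPv6 autoconfiguration runs */
};

/* Simulated addrconf hook: skip devices that opted out. */
static void demo_addrconf_notify_up(struct demo_netdev *dev)
{
	if (dev->priv_flags & DEMO_IFF_NO_ADDRCONF)
		return;
	dev->addrconf_runs++;
}

/* Simulated dev_open(): bringing the device up triggers addrconf. */
static void demo_dev_open(struct demo_netdev *dev)
{
	dev->up = true;
	demo_addrconf_notify_up(dev);
}

/* Port enter: set the flag *before* opening the port. */
static void demo_port_add(struct demo_netdev *port)
{
	port->priv_flags |= DEMO_IFF_NO_ADDRCONF;
	demo_dev_open(port);
}

/* Port leave: close the port, then clear the flag. */
static void demo_port_del(struct demo_netdev *port)
{
	port->up = false;
	port->priv_flags &= ~DEMO_IFF_NO_ADDRCONF;
}
```

If the flag were set only after demo_dev_open(), the hook would already have run once. The patches avoid exactly this by ordering the flag before dev_open().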
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: add IFF_NO_ADDRCONF and use it in bonding to prevent ipv6 addrconf · 8a321cf7
      Xin Long authored
      Currently, bonding reuses the IFF_SLAVE flag, which IPv6 addrconf
      checks in order to skip autoconfiguration.
      
      However, IFF_SLAVE is not the proper flag for this purpose: to use it,
      bonding has to move the IFF_SLAVE flag setting ahead of dev_open() in
      bond_enslave(). Also, as Jiri mentioned, IFF_MASTER/IFF_SLAVE are
      historical flags used by bonding and eql; newer devices such as team
      and failover do not use them.
      
      So, as Jiri suggested, this patch adds IFF_NO_ADDRCONF to the device's
      priv_flags to indicate that IPv6 addrconf should not run, uses it in
      bonding, and moves the IFF_SLAVE flag setting back to its original place.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • lib: packing: replace bit_reverse() with bitrev8() · 1280d4b7
      Uladzislau Koshchanka authored
      Remove the bit_reverse() function and instead use bitrev8() from
      linux/bitrev.h followed by a bit shift. This reduces code repetition.
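The following check shows that reversing the whole byte with bitrev8() and then shifting right by (8 - width) gives the same result as reversing the low width bits in a loop. The loop is written in the style of the helper being removed. The bitrev8() here is a portable userspace stand-in built from mask-and-shift steps, not the kernel's own implementation. width is assumed to be between 1 and 8.

```c
#include <assert.h>
#include <stdint.h>

/* Loop-style helper: reverse the low `width` bits of val. */
static uint8_t bit_reverse(uint8_t val, unsigned int width)
{
	uint8_t new_val = 0;

	for (unsigned int i = 0; i < width; i++) {
		unsigned int bit = (val >> i) & 1u;

		new_val |= (uint8_t)(bit << (width - i - 1));
	}
	return new_val;
}

/* Userspace stand-in for bitrev8(): reverse all 8 bits of a byte by
 * swapping nibbles, then bit pairs, then adjacent bits. */
static uint8_t bitrev8(uint8_t b)
{
	b = (uint8_t)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4));
	b = (uint8_t)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2));
	b = (uint8_t)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1));
	return b;
}

/* Replacement: reverse the full byte, then shift so that only the
 * reversed low `width` bits remain (bits above width are discarded). */
static uint8_t bit_reverse_shift(uint8_t val, unsigned int width)
{
	assert(width >= 1 && width <= 8);
	return (uint8_t)(bitrev8(val) >> (8 - width));
}
```

Bit i of val lands at bit 7 - i after bitrev8(), and at bit width - 1 - i after the shift. This is the same position the loop writes it to, and bits at positions width and above are shifted out.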
      Signed-off-by: Uladzislau Koshchanka <koshchanka@gmail.com>
      Link: https://lore.kernel.org/r/20221210004423.32332-1-koshchanka@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • dt-bindings: net: dsa: hellcreek: Sync DSA maintainers · 93e637a3
      Kurt Kanzenbach authored
      The current DSA maintainers are Florian Fainelli, Andrew Lunn and Vladimir
      Oltean. Update the hellcreek binding accordingly.
      Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Acked-by: Rob Herring <robh@kernel.org>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20221212081546.6916-1-kurt@linutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: tso: inline tso_count_descs() · d7b061b8
      Yunsheng Lin authored
      tso_count_descs() is a small function that does a simple calculation
      and is used in the fast path, so inline it to reduce function-call
      overhead.
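As we understand the upstream helper, it multiplies the number of GSO segments by two and adds the number of paged fragments. Below is a sketch of it as a static inline function. It uses a hypothetical demo_shinfo struct in place of skb_shared_info so that it compiles in userspace. Defining the function static inline in a header lets the compiler fold this arithmetic directly into each caller instead of emitting a call.

```c
#include <assert.h>

/* Hypothetical stand-in for the two skb_shared_info fields involved. */
struct demo_shinfo {
	unsigned short gso_segs; /* number of GSO/TSO segments */
	unsigned char nr_frags;  /* number of paged fragments */
};

/* Expected number of TX descriptors (Ethernet header not counted). */
static inline int demo_tso_count_descs(const struct demo_shinfo *shinfo)
{
	return shinfo->gso_segs * 2 + shinfo->nr_frags;
}
```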
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Link: https://lore.kernel.org/r/20221212032426.16050-1-linyunsheng@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: don't call ptp_classify_raw() if switch doesn't provide RX timestamping · 8f18655c
      Vladimir Oltean authored
      ptp_classify_raw() is not exactly cheap, since it invokes a BPF program
      for every skb in the receive path. For switches which do not provide
      ds->ops->port_rxtstamp(), running ptp_classify_raw() provides precisely
      nothing, so check for the presence of the function pointer first, since
      that is much cheaper.
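The sketch below shows the check ordering described above. Hypothetical demo_* types stand in for the DSA switch ops and the skb. The NULL test on the port_rxtstamp pointer runs first, so the stand-in classifier is never reached for switches without RX timestamping. This classifier only counts its calls instead of running a BPF program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the receive path; not the DSA API. */
struct demo_pkt {
	int is_ptp;
};

struct demo_switch_ops {
	/* NULL when the switch cannot timestamp received packets. */
	int (*port_rxtstamp)(const struct demo_pkt *pkt);
};

static int classify_calls; /* counts runs of the "expensive" classifier */

/* Stand-in for ptp_classify_raw(). */
static int demo_ptp_classify(const struct demo_pkt *pkt)
{
	classify_calls++;
	return pkt->is_ptp;
}

static int demo_rx_timestamp(const struct demo_switch_ops *ops,
			     const struct demo_pkt *pkt)
{
	/* Cheap pointer test first: skip classification if it cannot help. */
	if (!ops->port_rxtstamp)
		return 0;
	if (!demo_ptp_classify(pkt))
		return 0;
	return ops->port_rxtstamp(pkt);
}

/* Example callback: pretend the switch took the timestamp. */
static int demo_port_rxtstamp(const struct demo_pkt *pkt)
{
	(void)pkt;
	return 1;
}
```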
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Link: https://lore.kernel.org/r/20221209175840.390707-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'trace-points-for-mv88e6xxx' · cd2aafa2
      Jakub Kicinski authored
      Vladimir Oltean says:
      
      ====================
      Trace points for mv88e6xxx
      
      While testing Hans Schultz' attempt at offloading MAB on mv88e6xxx:
      https://patchwork.kernel.org/project/netdevbpf/cover/20221205185908.217520-1-netdev@kapio-technology.com/
      I noticed that he still hadn't gotten rid of the huge log spam caused by
      ATU and VTU violations, even though we had discussed this:
      https://patchwork.kernel.org/project/netdevbpf/cover/20221112203748.68995-1-netdev@kapio-technology.com/#25091076
      
      It seems unlikely that he is ever going to do this, so here is my own stab at
      converting those messages to trace points. This is IMO an improvement
      regardless of whether Hans' work with MAB lands or not, especially the
      VTU violations which were quite annoying to me as well.
      
      A small sample of before:
      
      $ ./bridge_locked_port.sh lan1 lan2 lan3 lan4
      [  114.465272] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
      [  119.550508] mv88e6xxx_g1_vtu_prob_irq_thread_fn: 34 callbacks suppressed
      [  120.369586] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
      [  120.473658] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
      [  125.535209] mv88e6xxx_g1_vtu_prob_irq_thread_fn: 21 callbacks suppressed
      [  125.535243] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
      [  126.174558] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
      [  130.234055] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 fid 3 portvec 4 spid 2
      [  130.338193] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 fid 3 portvec 4 spid 2
      [  134.626099] mv88e6xxx_g1_atu_prob_irq_thread_fn: 38 callbacks suppressed
      [  134.626132] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 fid 3 portvec 4 spid 2
      
      and after:
      
      $ trace-cmd record -e mv88e6xxx ./bridge_locked_port.sh lan1 lan2 lan3 lan4
      $ trace-cmd report
         irq/35-moxtet-60    [001]    93.929734: mv88e6xxx_vtu_miss_violation: dev d0032004.mdio-mii:10 spid 9 vid 100
         irq/35-moxtet-60    [001]    94.183209: mv88e6xxx_vtu_miss_violation: dev d0032004.mdio-mii:10 spid 9 vid 100
         irq/35-moxtet-60    [001]   101.865545: mv88e6xxx_vtu_miss_violation: dev d0032004.mdio-mii:10 spid 9 vid 100
         irq/35-moxtet-60    [001]   121.831261: mv88e6xxx_vtu_member_violation: dev d0032004.mdio-mii:10 spid 9 vid 100
         irq/35-moxtet-60    [001]   122.371238: mv88e6xxx_vtu_member_violation: dev d0032004.mdio-mii:10 spid 9 vid 100
         irq/35-moxtet-60    [001]   148.452932: mv88e6xxx_atu_miss_violation: dev d0032004.mdio-mii:10 spid 2 portvec 0x4 addr 00:01:02:03:04:01 fid 0
      
      v1 at:
      https://patchwork.kernel.org/project/netdevbpf/cover/20221207233954.3619276-1-vladimir.oltean@nxp.com/
      ====================
      
      Link: https://lore.kernel.org/r/20221209172817.371434-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: mv88e6xxx: replace VTU violation prints with trace points · 9e3d9ae5
      Vladimir Oltean authored
      These VTU violation messages are very easy to trigger: it is only
      necessary to send packets with an unknown VLAN ID to a port that
      belongs to a VLAN-aware bridge.
      
      Do the same as for the ATU violation messages, and hide them in the
      kernel's trace buffer.
      
      New usage model:
      
      $ trace-cmd list | grep mv88e6xxx
      mv88e6xxx
      mv88e6xxx:mv88e6xxx_vtu_miss_violation
      mv88e6xxx:mv88e6xxx_vtu_member_violation
      $ trace-cmd report
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Saeed Mahameed <saeed@kernel.org>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>