1. 09 Feb, 2023 6 commits
    • net: mscc: ocelot: fix all IPv6 getting trapped to CPU when PTP timestamping is used · 2fcde9fe
      Vladimir Oltean authored
      While running this selftest which usually passes:
      
      ~/selftests/drivers/net/dsa# ./local_termination.sh eno0 swp0
      TEST: swp0: Unicast IPv4 to primary MAC address                     [ OK ]
      TEST: swp0: Unicast IPv4 to macvlan MAC address                     [ OK ]
      TEST: swp0: Unicast IPv4 to unknown MAC address                     [ OK ]
      TEST: swp0: Unicast IPv4 to unknown MAC address, promisc            [ OK ]
      TEST: swp0: Unicast IPv4 to unknown MAC address, allmulti           [ OK ]
      TEST: swp0: Multicast IPv4 to joined group                          [ OK ]
      TEST: swp0: Multicast IPv4 to unknown group                         [ OK ]
      TEST: swp0: Multicast IPv4 to unknown group, promisc                [ OK ]
      TEST: swp0: Multicast IPv4 to unknown group, allmulti               [ OK ]
      TEST: swp0: Multicast IPv6 to joined group                          [ OK ]
      TEST: swp0: Multicast IPv6 to unknown group                         [ OK ]
      TEST: swp0: Multicast IPv6 to unknown group, promisc                [ OK ]
      TEST: swp0: Multicast IPv6 to unknown group, allmulti               [ OK ]
      
      if I start PTP timestamping then run it again (debug prints added by me),
      the unknown IPv6 MC traffic is seen by the CPU port even when it should
      have been dropped:
      
      ~/selftests/drivers/net/dsa# ptp4l -i swp0 -2 -P -m
      ptp4l[225.410]: selected /dev/ptp1 as PTP clock
      [  225.445746] mscc_felix 0000:00:00.5: ocelot_l2_ptp_trap_add: port 0 adding L2 PTP trap
      [  225.453815] mscc_felix 0000:00:00.5: ocelot_ipv4_ptp_trap_add: port 0 adding IPv4 PTP event trap
      [  225.462703] mscc_felix 0000:00:00.5: ocelot_ipv4_ptp_trap_add: port 0 adding IPv4 PTP general trap
      [  225.471768] mscc_felix 0000:00:00.5: ocelot_ipv6_ptp_trap_add: port 0 adding IPv6 PTP event trap
      [  225.480651] mscc_felix 0000:00:00.5: ocelot_ipv6_ptp_trap_add: port 0 adding IPv6 PTP general trap
      ptp4l[225.488]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
      ptp4l[225.488]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
      ^C
      ~/selftests/drivers/net/dsa# ./local_termination.sh eno0 swp0
      TEST: swp0: Unicast IPv4 to primary MAC address                     [ OK ]
      TEST: swp0: Unicast IPv4 to macvlan MAC address                     [ OK ]
      TEST: swp0: Unicast IPv4 to unknown MAC address                     [ OK ]
      TEST: swp0: Unicast IPv4 to unknown MAC address, promisc            [ OK ]
      TEST: swp0: Unicast IPv4 to unknown MAC address, allmulti           [ OK ]
      TEST: swp0: Multicast IPv4 to joined group                          [ OK ]
      TEST: swp0: Multicast IPv4 to unknown group                         [ OK ]
      TEST: swp0: Multicast IPv4 to unknown group, promisc                [ OK ]
      TEST: swp0: Multicast IPv4 to unknown group, allmulti               [ OK ]
      TEST: swp0: Multicast IPv6 to joined group                          [ OK ]
      TEST: swp0: Multicast IPv6 to unknown group                         [FAIL]
              reception succeeded, but should have failed
      TEST: swp0: Multicast IPv6 to unknown group, promisc                [ OK ]
      TEST: swp0: Multicast IPv6 to unknown group, allmulti               [ OK ]
      
      The PGID_MCIPV6 is configured correctly to not flood to the CPU,
      I checked that.
      
      Furthermore, when I disable PTP RX timestamping again (ptp4l doesn't do
      that when it exits), packets are RX filtered again as they should be:
      
      ~/selftests/drivers/net/dsa# hwstamp_ctl -i swp0 -r 0
      [  218.202854] mscc_felix 0000:00:00.5: ocelot_l2_ptp_trap_del: port 0 removing L2 PTP trap
      [  218.212656] mscc_felix 0000:00:00.5: ocelot_ipv4_ptp_trap_del: port 0 removing IPv4 PTP event trap
      [  218.222975] mscc_felix 0000:00:00.5: ocelot_ipv4_ptp_trap_del: port 0 removing IPv4 PTP general trap
      [  218.233133] mscc_felix 0000:00:00.5: ocelot_ipv6_ptp_trap_del: port 0 removing IPv6 PTP event trap
      [  218.242251] mscc_felix 0000:00:00.5: ocelot_ipv6_ptp_trap_del: port 0 removing IPv6 PTP general trap
      current settings:
      tx_type 1
      rx_filter 12
      new settings:
      tx_type 1
      rx_filter 0
      ~/selftests/drivers/net/dsa# ./local_termination.sh eno0 swp0
      TEST: swp0: Unicast IPv4 to primary MAC address                     [ OK ]
      TEST: swp0: Unicast IPv4 to macvlan MAC address                     [ OK ]
      TEST: swp0: Unicast IPv4 to unknown MAC address                     [ OK ]
      TEST: swp0: Unicast IPv4 to unknown MAC address, promisc            [ OK ]
      TEST: swp0: Unicast IPv4 to unknown MAC address, allmulti           [ OK ]
      TEST: swp0: Multicast IPv4 to joined group                          [ OK ]
      TEST: swp0: Multicast IPv4 to unknown group                         [ OK ]
      TEST: swp0: Multicast IPv4 to unknown group, promisc                [ OK ]
      TEST: swp0: Multicast IPv4 to unknown group, allmulti               [ OK ]
      TEST: swp0: Multicast IPv6 to joined group                          [ OK ]
      TEST: swp0: Multicast IPv6 to unknown group                         [ OK ]
      TEST: swp0: Multicast IPv6 to unknown group, promisc                [ OK ]
      TEST: swp0: Multicast IPv6 to unknown group, allmulti               [ OK ]
      
      So it's clear that something in the PTP RX trapping logic went wrong.
      
      Looking a bit at the code, I can see that there are 4 typos, which
      populate "ipv4" VCAP IS2 key filter fields for IPv6 keys.
      
      VCAP IS2 keys of type OCELOT_VCAP_KEY_IPV4 and OCELOT_VCAP_KEY_IPV6 are
      handled by is2_entry_set(). OCELOT_VCAP_KEY_IPV4 looks at
      &filter->key.ipv4, and OCELOT_VCAP_KEY_IPV6 at &filter->key.ipv6.
      Simply put, when we populate the wrong key field, &filter->key.ipv6
      fields "proto.mask" and "proto.value" remain all zeroes (or "don't care").
      So is2_entry_set() will enter the "else" of this "if" condition:
      
      	if (msk == 0xff && (val == IPPROTO_TCP || val == IPPROTO_UDP))
      
      and proceed to ignore the "proto" field. The resulting rule will match
      on all IPv6 traffic, trapping it to the CPU.
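
A minimal sketch of why an unpopulated field matches everything, assuming the usual TCAM convention that a key bit is compared only where the mask bit is set. The struct and function names here are hypothetical and are not the driver's own:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a value/mask pair of a TCAM key field. */
struct proto_key {
	uint8_t value;
	uint8_t mask;	/* bits set to 0 mean "don't care" */
};

/* A field matches when packet and key agree on every masked bit. */
static bool proto_key_matches(struct proto_key key, uint8_t pkt_proto)
{
	return ((pkt_proto ^ key.value) & key.mask) == 0;
}
```

With value 17 and mask 0xff only UDP matches, while a key left at all zeroes matches every protocol number, which is the situation described above for the IPv6 traps.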
      
      This is the reason why the local_termination.sh selftest sees it,
      because control traps are stronger than the PGID_MCIPV6 used for
      flooding (from the forwarding data path).
      
      But the problem is in fact much deeper. We trap all IPv6 traffic to the
      CPU, but if we're bridged, we set skb->offload_fwd_mark = 1, so software
      forwarding will not take place and IPv6 traffic will never reach its
      destination.
      
      The fix is simple - correct the typos.
      
      I was intentionally inaccurate in the commit message about the breakage
      occurring when any PTP timestamping is enabled. In fact it only happens
      when L4 timestamping is requested (HWTSTAMP_FILTER_PTP_V2_EVENT or
      HWTSTAMP_FILTER_PTP_V2_L4_EVENT). But ptp4l requests a larger RX
      timestamping filter than it needs for "-2": HWTSTAMP_FILTER_PTP_V2_EVENT.
      I wanted people skimming through git logs to not think that the bug
      doesn't affect them because they only use ptp4l in L2 mode.
      
      Fixes: 96ca08c0 ("net: mscc: ocelot: set up traps for PTP packets")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230207183117.1745754-1-vladimir.oltean@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • rds: rds_rm_zerocopy_callback() use list_first_entry() · f753a689
      Pietro Borrello authored
      rds_rm_zerocopy_callback() uses list_entry() on the head of a list
      causing a type confusion.
      Use list_first_entry() to actually access the first element of the
      rs_zcookie_queue list.
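
The difference between the two macros can be sketched as follows. The list helpers below are simplified stand-ins written for this illustration (they are not the kernel's headers), and `struct cookie_node` is a hypothetical element type:

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel's list helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
#define list_first_entry(head, type, member) \
	list_entry((head)->next, type, member)

/* Hypothetical element type queued on the list. */
struct cookie_node {
	int cookie;
	struct list_head link;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}
```

list_entry() applied to the list head itself computes a pointer to an element that does not exist, because the head is not embedded in an element. list_first_entry() follows head->next first, so the result points at a real element as long as the list is not empty.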
      
      Fixes: 9426bbc6 ("rds: use list structure to track information for zerocopy completion notification")
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it>
      Link: https://lore.kernel.org/r/20230202-rds-zerocopy-v3-1-83b0df974f9a@diag.uniroma1.it
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge tag 'ipsec-2023-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 646be03e
      Jakub Kicinski authored
      Steffen Klassert says:
      
      ====================
      ipsec 2023-02-08
      
      1) Fix policy checks for nested IPsec tunnels when using
         xfrm interfaces. From Benedict Wong.
      
      2) Fix netlink message expression on 32=>64-bit
         message translators. From Anastasia Belova.
      
      3) Prevent potential spectre v1 gadget in xfrm_xlate32_attr.
         From Eric Dumazet.
      
      4) Always consistently use time64_t in xfrm_timer_handler.
         From Eric Dumazet.
      
      5) Fix KCSAN reported bug: Multiple cpus can update use_time
         at the same time. From Eric Dumazet.
      
      6) Fix DSCP copy from IPv4 to IPv6 on interfamily tunnel.
         From Christian Hopps.
      
      * tag 'ipsec-2023-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
        xfrm: fix bug with DSCP copy to v6 from v4 tunnel
        xfrm: annotate data-race around use_time
        xfrm: consistently use time64_t in xfrm_timer_handler()
        xfrm/compat: prevent potential spectre v1 gadget in xfrm_xlate32_attr()
        xfrm: compat: change expression for switch in xfrm_xlate64
        Fix XFRM-I support for nested ESP tunnels
      ====================
      
      Link: https://lore.kernel.org/r/20230208114322.266510-1-steffen.klassert@secunet.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: txgbe: Update support email address · 363d7c22
      Jiawen Wu authored
      Update new email address for Wangxun 10Gb NIC support team.
      Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
      Link: https://lore.kernel.org/r/20230208023035.3371250-1-jiawenwu@trustnetic.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'mlx5-fixes-2023-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · ff8ced4e
      Jakub Kicinski authored
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2023-02-07
      
      This series provides bug fixes to mlx5 driver.
      
      * tag 'mlx5-fixes-2023-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
        net/mlx5: Serialize module cleanup with reload and remove
        net/mlx5: fw_tracer, Zero consumer index when reloading the tracer
        net/mlx5: fw_tracer, Clear load bit when freeing string DBs buffers
        net/mlx5: Expose SF firmware pages counter
        net/mlx5: Store page counters in a single array
        net/mlx5e: IPoIB, Show unknown speed instead of error
        net/mlx5e: Fix crash unsetting rx-vlan-filter in switchdev mode
        net/mlx5: Bridge, fix ageing of peer FDB entries
        net/mlx5: DR, Fix potential race in dr_rule_create_rule_nic
        net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change
      ====================
      
      Link: https://lore.kernel.org/r/20230208030302.95378-1-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: Fix failing VXLAN VNI filtering test · b963d9d5
      Ido Schimmel authored
      iproute2 does not recognize the "group6" and "remote6" keywords. Fix by
      using "group" and "remote" instead.
      
      Before:
      
       # ./test_vxlan_vnifiltering.sh
       [...]
       Tests passed:  25
       Tests failed:   2
      
      After:
      
       # ./test_vxlan_vnifiltering.sh
       [...]
       Tests passed:  27
       Tests failed:   0
      
      Fixes: 3edf5f66 ("selftests: add new tests for vxlan vnifiltering")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Link: https://lore.kernel.org/r/20230207141819.256689-1-idosch@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 08 Feb, 2023 28 commits
  3. 07 Feb, 2023 5 commits
    • can: j1939: do not wait 250 ms if the same addr was already claimed · 4ae5e1e9
      Devid Antonio Filoni authored
      The ISO 11783-5 standard, in "4.5.2 - Address claim requirements", states:
        d) No CF shall begin, or resume, transmission on the network until 250
           ms after it has successfully claimed an address except when
           responding to a request for address-claimed.
      
      But "Figure 6" and "Figure 7" in "4.5.4.2 - Address-claim
      prioritization" show that the CF begins the transmission after 250 ms
      from the first AC (address-claimed) message even if it sends another AC
      message during that time window to resolve the address contention with
      another CF.
      
      As stated in "4.4.2.3 - Address-claimed message":
        In order to successfully claim an address, the CF sending an address
        claimed message shall not receive a contending claim from another CF
        for at least 250 ms.
      
      As stated in "4.4.3.2 - NAME management (NM) message":
        1) A commanding CF can
           d) request that a CF with a specified NAME transmit the address-
              claimed message with its current NAME.
        2) A target CF shall
           d) send an address-claimed message in response to a request for a
              matching NAME
      
      Taking the above arguments into account, the 250 ms wait is requested
      only during network initialization.
      
      Do not restart the timer on an AC message if both the NAME and the
      address match, that is, if the address has already been claimed (timer
      has expired) or the AC message has been sent to resolve the contention
      with another CF (timer is still running).
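
The rule above can be sketched as a small predicate. This is an illustration under stated assumptions, not the j1939 stack's code: `struct cf_claim` and `ac_restarts_timer()` are hypothetical names, and the only protocol fact relied on is that a NAME is a 64-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of a control function's (CF) claim state. */
struct cf_claim {
	uint64_t name;	/* 64-bit NAME */
	uint8_t addr;	/* address last claimed with this NAME */
	bool claimed;	/* an AC message was already sent for name/addr */
};

/*
 * Return true when sending an address-claimed (AC) message for
 * (name, addr) should restart the 250 ms hold-off timer.
 */
static bool ac_restarts_timer(const struct cf_claim *cf, uint64_t name,
			      uint8_t addr)
{
	if (cf->claimed && cf->name == name && cf->addr == addr)
		return false;	/* same NAME and address: leave the timer alone */
	return true;		/* new NAME/address pair: wait 250 ms again */
}
```

A repeated AC message for the same NAME and address leaves the timer untouched, whether it has already expired or is still running; only a claim for a different pair starts a new 250 ms wait.
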
      Signed-off-by: Devid Antonio Filoni <devid.filoni@egluetechnologies.com>
      Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Link: https://lore.kernel.org/all/20221125170418.34575-1-devid.filoni@egluetechnologies.com
      Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
      Cc: stable@vger.kernel.org
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • devlink: change port event netdev notifier from per-net to global · 565b4824
      Jiri Pirko authored
      Currently only the network namespace of the devlink instance is monitored
      for port events. If a netdev is moved to a different namespace and then
      unregistered, NETDEV_PRE_UNINIT is missed, which triggers the following
      WARN_ON in devl_port_unregister():
      WARN_ON(devlink_port->type != DEVLINK_PORT_TYPE_NOTSET);
      
      Fix this by changing the netdev notifier from per-net to global so no
      event is missed.
      
      Fixes: 02a68a47 ("net: devlink: track netdev with devlink_port assigned")
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://lore.kernel.org/r/20230206094151.2557264-1-jiri@resnulli.us
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • selftests: ocelot: tc_flower_chains: make test_vlan_ingress_modify() more comprehensive · bbb253b2
      Vladimir Oltean authored
      We have two IS1 filters of the OCELOT_VCAP_KEY_ANY key type (the one with
      "action vlan pop" and the one with "action vlan modify") and one of the
      OCELOT_VCAP_KEY_IPV4 key type (the one with "action skbedit priority").
      But we have no IS1 filter with the OCELOT_VCAP_KEY_ETYPE key type, and
      there was an uncaught breakage there.
      
      To increase test coverage, convert one of the OCELOT_VCAP_KEY_ANY
      filters to OCELOT_VCAP_KEY_ETYPE, by making the filter also match on the
      MAC SA of the traffic sent by mausezahn, $h1_mac.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20230205192409.1796428-2-vladimir.oltean@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: mscc: ocelot: fix VCAP filters not matching on MAC with "protocol 802.1Q" · f964f839
      Vladimir Oltean authored
      Alternative short title: don't instruct the hardware to match on
      EtherType with "protocol 802.1Q" flower filters. It doesn't work for the
      reasons detailed below.
      
      With a command such as the following:
      
      tc filter add dev $swp1 ingress chain $(IS1 2) pref 3 \
      	protocol 802.1Q flower skip_sw vlan_id 200 src_mac $h1_mac \
      	action vlan modify id 300 \
      	action goto chain $(IS2 0 0)
      
      the created filter is set by ocelot_flower_parse_key() to be of type
      OCELOT_VCAP_KEY_ETYPE, and etype is set to {value=0x8100, mask=0xffff}.
      This gets propagated all the way to is1_entry_set() which commits it to
      hardware (the VCAP_IS1_HK_ETYPE field of the key). Compare this to the
      case where src_mac isn't specified - the key type is OCELOT_VCAP_KEY_ANY,
      and is1_entry_set() doesn't populate VCAP_IS1_HK_ETYPE.
      
      The problem is that for VLAN-tagged frames, the hardware interprets the
      ETYPE field as holding the encapsulated VLAN protocol. So the above
      filter will only match those packets which have an encapsulated protocol
      of 0x8100, rather than all packets with VLAN ID 200 and the given src_mac.
      
      This is allowed to occur because, although we have a block of code in
      ocelot_flower_parse_key() which sets "match_protocol" to false when
      VLAN keys are present, that code executes too late.
      There is another block of code, which executes for Ethernet addresses,
      and has a "goto finished_key_parsing" and skips the VLAN header parsing.
      By skipping it, "match_protocol" remains with the value it was
      initialized with, i.e. "true", and "proto" is set to f->common.protocol,
      or 0x8100.
      
      The concept of ignoring some keys rather than erroring out when they are
      present but can't be offloaded is dubious in itself, but is present
      since the initial commit fe3490e6 ("net: mscc: ocelot: Hardware
      ofload for tc flower filter"), and it's outside of the scope of this
      patch to change that.
      
      The problem was introduced when the driver started to interpret the
      flower filter's protocol, and populate the VCAP filter's ETYPE field
      based on it.
      
      To fix this, it is sufficient to move the code that parses the VLAN keys
      earlier than the "goto finished_key_parsing" instruction. This will
      ensure that if we have a flower filter with both VLAN and Ethernet
      address keys, it won't match on ETYPE 0x8100, because the VLAN key
      parsing sets "match_protocol = false".
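
The ordering problem can be sketched in isolation. This is a reduced illustration, not the driver's ocelot_flower_parse_key(): `struct rule_keys` and both functions are hypothetical, and only the "match_protocol" variable and the "finished_key_parsing" label are taken from the description above.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of which keys a flower rule carries. */
struct rule_keys {
	bool has_vlan;
	bool has_eth_addrs;
};

/* Old order: the Ethernet-address branch jumps past the VLAN parsing. */
static bool match_protocol_before_fix(const struct rule_keys *k)
{
	bool match_protocol = true;

	if (k->has_eth_addrs)
		goto finished_key_parsing;
	if (k->has_vlan)
		match_protocol = false;
finished_key_parsing:
	return match_protocol;
}

/* New order: VLAN keys are parsed before the early exit. */
static bool match_protocol_after_fix(const struct rule_keys *k)
{
	bool match_protocol = true;

	if (k->has_vlan)
		match_protocol = false;
	if (k->has_eth_addrs)
		goto finished_key_parsing;
finished_key_parsing:
	return match_protocol;
}
```

For a rule with both VLAN and Ethernet address keys, the first function keeps match_protocol at its initial value of true, while the second clears it before the jump is taken.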
      
      Fixes: 86b956de ("net: mscc: ocelot: support matching on EtherType")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230205192409.1796428-1-vladimir.oltean@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dsa: mt7530: don't change PVC_EG_TAG when CPU port becomes VLAN-aware · 0b6d6425
      Vladimir Oltean authored
      Frank reports that in a mt7530 setup where some ports are standalone and
      some are in a VLAN-aware bridge, 8021q uppers of the standalone ports
      lose their VLAN tag on xmit, as seen by the link partner.
      
      This seems to occur because once the other ports join the VLAN-aware
      bridge, mt7530_port_vlan_filtering() also calls
      mt7530_port_set_vlan_aware(ds, cpu_dp->index), and this affects the way
      that the switch processes the traffic of the standalone port.
      
      Relevant is the PVC_EG_TAG bit. The MT7530 documentation says about it:
      
      EG_TAG: Incoming Port Egress Tag VLAN Attribution
      0: disabled (system default)
      1: consistent (keep the original ingress tag attribute)
      
      My interpretation is that this setting applies on the ingress port, and
      "disabled" is basically the normal behavior, where the egress tag format
      of the packet (tagged or untagged) is decided by the VLAN table
      (MT7530_VLAN_EGRESS_UNTAG or MT7530_VLAN_EGRESS_TAG).
      
      But there is also an option of overriding the system default behavior,
      and for the egress tagging format of packets to be decided not by the
      VLAN table, but simply by copying the ingress tag format (if ingress was
      tagged, egress is tagged; if ingress was untagged, egress is untagged;
      aka "consistent"). This is useful in 2 scenarios:
      
      - VLAN-unaware bridge ports will always encounter a miss in the VLAN
        table. They should forward a packet as-is, though. So we use
        "consistent" there. See commit e045124e ("net: dsa: mt7530: fix
        tagged frames pass-through in VLAN-unaware mode").
      
      - Traffic injected from the CPU port. The operating system is in god
        mode; if it wants a packet to exit as VLAN-tagged, it sends it as
        VLAN-tagged. Otherwise it sends it as VLAN-untagged*.
      
      *This is true only if we don't consider the bridge TX forwarding offload
      feature, which mt7530 doesn't support.
      
      So for now, make the CPU port always stay in "consistent" mode to allow
      software VLANs to be forwarded to their egress ports with the VLAN tag
      intact, and not stripped.
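
The interpretation of EG_TAG given above can be sketched as a decision function. This models only the behaviour as described in this message, not the hardware or the mt7530 driver; the enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical names for the two EG_TAG settings quoted above. */
enum eg_tag_mode {
	EG_TAG_DISABLED,	/* system default: the VLAN table decides */
	EG_TAG_CONSISTENT,	/* copy the ingress tag format */
};

/* Return true when the frame leaves the egress port VLAN-tagged. */
static bool egress_is_tagged(enum eg_tag_mode ingress_port_mode,
			     bool ingress_tagged, bool vlan_table_says_tag)
{
	if (ingress_port_mode == EG_TAG_CONSISTENT)
		return ingress_tagged;
	return vlan_table_says_tag;
}
```

Under this model, a VLAN-tagged frame injected through a CPU port in "consistent" mode stays tagged on egress regardless of what the VLAN table says, which is the behaviour the fix keeps for the CPU port.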
      
      Link: https://lore.kernel.org/netdev/trinity-e6294d28-636c-4c40-bb8b-b523521b00be-1674233135062@3c-app-gmx-bs36/
      Fixes: e045124e ("net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode")
      Reported-by: Frank Wunderlich <frank-w@public-files.de>
      Tested-by: Frank Wunderlich <frank-w@public-files.de>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20230205140713.1609281-1-vladimir.oltean@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  4. 06 Feb, 2023 1 commit