Commits · 5d3ac705c281977db7a5ba3854d710881dbbceea · Kirill Smelkov / linux

06 Feb, 2023 34 commits

net: txgbe: Add interrupt support · 5d3ac705

Jiawen Wu authored Feb 03, 2023

Determine proper interrupt scheme to enable and handle interrupt.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5d3ac705

net: ngbe: Add irqs request flow · e7956139

Mengyuan Lou authored Feb 03, 2023

Add request_irq for tx/rx rings and misc other events.
If the application is successful, config vertors for interrupts.
Enable some base interrupts mask in ngbe_irq_enable.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e7956139

net: libwx: Add irq flow functions · 3f703186

Mengyuan Lou authored Feb 03, 2023

Add irq flow functions for ngbe and txgbe.
Alloc pcie msix irqs for drivers, otherwise fall back to msi/legacy.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3f703186

net: page_pool: use in_softirq() instead · 542bcea4

Qingfang DENG authored Feb 03, 2023

We use BH context only for synchronization, so we don't care if it's
actually serving softirq or not.

As a side node, in case of threaded NAPI, in_serving_softirq() will
return false because it's in process context with BH off, making
page_pool_recycle_in_cache() unreachable.
Signed-off-by: Qingfang DENG <qingfang.deng@siflower.com.cn>
Tested-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>

542bcea4

Merge tag 'mlx5-updates-2023-02-04' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 637bc8f0

David S. Miller authored Feb 06, 2023

Saeed Mahameed says:

====================
mlx5-updates-2023-02-04

This series provides misc updates to mlx5 driver:

1) Trivial LAG code cleanup patches from Roi

2) Rahul improves mlx5's documentation structure
Separates the documentation into multiple pages related to different
components in the device driver. Adds Kconfig parameters, devlink
parameters, and tracepoints that were previously introduced but not added
to the documentation. Introduces a new page on ethtool statistics counters
with information about counters previously implemented in the mlx5_core
driver but not documented in the kernel tree.

3) From Raed, policy/state selector support for IPSec.

4) From Fragos, add support for XDR speed in IPoIB mlx5 netdev

5) Few more misc cleanups and trivial changes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

637bc8f0

virtio-net: Maintain reverse cleanup order · 27369c9c

Parav Pandit authored Feb 03, 2023

To easily audit the code, better to keep the device stop()
sequence to be mirror of the device open() sequence.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27369c9c

Merge branch 'bridge-mdb-limit' · cb3086ce

David S. Miller authored Feb 06, 2023

Petr Machata says:

====================
bridge: Limit number of MDB entries per port, port-vlan

The MDB maintained by the bridge is limited. When the bridge is configured
for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
capacity. In SW datapath, the capacity is configurable through the
IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a
similar limit exists in the HW datapath for purposes of offloading.

In order to prevent the issue of unilateral exhaustion of MDB resources,
introduce two parameters in each of two contexts:

- Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
  per-port-VLAN number of MDB entries that the port is member in.

- Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
  per-port-VLAN maximum permitted number of MDB entries, or 0 for
  no limit.

Per-port number of entries keeps track of the total number of MDB entries
configured on a given port. The per-port-VLAN value then keeps track of the
subset of MDB entries configured specifically for the given VLAN, on that
port. The number is adjusted as port_groups are created and deleted, and
therefore under multicast lock.

A maximum value, if non-zero, then places a limit on the number of entries
that can be configured in a given context. Attempts to add entries above
the maximum are rejected.

Rejection reason of netlink-based requests to add MDB entries is
communicated through extack. This channel is unavailable for rejections
triggered from the control path. To address this lack of visibility, the
patchset adds a tracepoint, bridge:br_mdb_full:

	# perf record -e bridge:br_mdb_full &
	# [...]
	# perf script | cut -d: -f4-
	 dev v2 af 2 src ::ffff:0.0.0.0 grp ::ffff:239.1.1.112/00:00:00:00:00:00 vid 0
	 dev v2 af 10 src :: grp ff0e::112/00:00:00:00:00:00 vid 0
	 dev v2 af 2 src ::ffff:0.0.0.0 grp ::ffff:239.1.1.112/00:00:00:00:00:00 vid 10
	 dev v2 af 10 src 2001:db8:1::1 grp ff0e::1/00:00:00:00:00:00 vid 10
	 dev v2 af 2 src ::ffff:192.0.2.1 grp ::ffff:239.1.1.1/00:00:00:00:00:00 vid 10

Another option to consume the tracepoint is e.g. through the bpftrace tool:

	# bpftrace -e ' tracepoint:bridge:br_mdb_full /args->af != 0/ {
			    printf("dev %s src %s grp %s vid %u\n",
				   str(args->dev), ntop(args->src),
				   ntop(args->grp), args->vid);
			}
			tracepoint:bridge:br_mdb_full /args->af == 0/ {
			    printf("dev %s grp %s vid %u\n",
				   str(args->dev),
				   macaddr(args->grpmac), args->vid);
			}'

This tracepoint is triggered for mcast_hash_max exhaustions as well.

The following is an example of how the feature is used. A more extensive
example is available in patch #8:

	# bridge vlan set dev v1 vid 1 mcast_max_groups 1
	# bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 1
	# bridge mdb add dev br port v1 grp 230.1.2.4 temp vid 1
	Error: bridge: Port-VLAN is already in 1 groups, and mcast_max_groups=1.

The patchset progresses as follows:

- In patch #1, set strict_start_type at two bridge-related policies. The
  reason is we are adding a new attribute to one of these, and want the new
  attribute to be parsed strictly. The other was adjusted for completeness'
  sake.

- In patches #2 to #5, br_mdb and br_multicast code is adjusted to make the
  following additions smoother.

- In patch #6, add the tracepoint.

- In patch #7, the code to maintain number of MDB entries is added as
  struct net_bridge_mcast_port::mdb_n_entries. The maximum is added, too,
  as struct net_bridge_mcast_port::mdb_max_entries, however at this point
  there is no way to set the value yet, and since 0 is treated as "no
  limit", the functionality doesn't change at this point. Note however,
  that mcast_hash_max violations already do trigger at this point.

- In patch #8, netlink plumbing is added: reading of number of entries, and
  reading and writing of maximum.

  The per-port values are passed through RTM_NEWLINK / RTM_GETLINK messages
  in IFLA_BRPORT_MCAST_N_GROUPS and _MAX_GROUPS, inside IFLA_PROTINFO nest.

  The per-port-vlan values are passed through RTM_GETVLAN / RTM_NEWVLAN
  messages in BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS, _MAX_GROUPS, inside
  BRIDGE_VLANDB_ENTRY.

The following patches deal with the selftest:

- Patches #9 and #10 clean up and move around some selftest code.

- Patches #11 to #14 add helpers and generalize the existing IGMP / MLD
  support to allow generating packets with configurable group addresses and
  varying source lists for (S,G) memberships.

- Patch #15 adds code to generate IGMP leave and MLD done packets.

- Patch #16 finally adds the selftest itself.

v3:
- Patch #7:
    - Access mdb_max_/_n_entries through READ_/WRITE_ONCE
    - Move extack setting to br_multicast_port_ngroups_inc_one().
      Since we use NL_SET_ERR_MSG_FMT_MOD, the correct context
      (port / port-vlan) can be passed through an argument.
      This also removes the need for more READ/WRITE_ONCE's
      at the extack-setting site.
- Patch #8:
    - Move the br_multicast_port_ctx_vlan_disabled() check
      out to the _vlan_ helpers callers. Thus these helpers
      cannot fail, which makes them very similar to the
      _port_ helpers. Have them take the MC context directly
      and unify them.

v2:
- Cover letter:
    - Add an example of a bpftrace-based probe script
- Patch #6:
    - Report IPv4 as an IPv6-mapped address through the IPv6 buffer
      as well, to save ring buffer space.
- Patch #7:
    - In br_multicast_port_ngroups_inc_one(), bounce
      if n>=max, not if n==max
    - Adjust extack messages to mention ngroups, now
      that the bounces appear when n>=max, not n==max
    - In __br_multicast_enable_port_ctx(), do not reset
      max to 0. Also do not count number of entries by
      going through _inc, as that would end up incorrectly
      bouncing the entries.
- Patch #8:
    - Drop locks around accesses in
      br_multicast_{port,vlan}_ngroups_{get,set_max}(),
    - Drop bounces due to max<n in
      br_multicast_{port,vlan}_ngroups_set_max().
- Patch #12:
    - In the comment at payload_template_calc_checksum(),
      s/%#02x/%02x/, that's the mausezahn payload format.
- Patch #16:
    - Adjust the tests that check setting max below n and
      reset of max on VLAN snooping enablement
    - Make test naming uniform
    - Enable testing of control path (IGMP/MLD) in
      mcast_vlan_snooping bridge
    - Reorganize the code so that test instances (per bridge
      type and configuration type) always come right after
      the test, in order of {d,q,qvs}{4,6}{cfg,ctl}.
      Then groups of selftests are at the end of the file.
      Similarly adjust invocation order of the tests.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

cb3086ce

selftests: forwarding: bridge_mdb_max: Add a new selftest · 3446dcd7

Petr Machata authored Feb 02, 2023

Add a suite covering mcast_n_groups and mcast_max_groups bridge features.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

3446dcd7

selftests: forwarding: lib: Add helpers to build IGMP/MLD leave packets · 9ae85469

Petr Machata authored Feb 02, 2023

The testsuite that checks for mcast_max_groups functionality will need to
wipe the added groups as well. Add helpers to build an IGMP or MLD packets
announcing that host is leaving a given group.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9ae85469

selftests: forwarding: lib: Allow list of IPs for IGMPv3/MLDv2 · 705d4bc7

Petr Machata authored Feb 02, 2023

The testsuite that checks for mcast_max_groups functionality will need
to generate IGMP and MLD packets with configurable number of (S,G)
addresses. To that end, further extend igmpv3_is_in_get() and
mldv2_is_in_get() to allow a list of IP addresses instead of one
address.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

705d4bc7

selftests: forwarding: lib: Parameterize IGMPv3/MLDv2 generation · 506a1ac9

Petr Machata authored Feb 02, 2023

In order to generate IGMPv3 and MLDv2 packets on the fly, the
functions that generate these packets need to be able to generate
packets for different groups and different sources. Generating MLDv2
packets further needs the source address of the packet for purposes of
checksum calculation. Add the necessary parameters, and generate the
payload accordingly by dispatching to helpers added in the previous
patches.

Adjust the sole client, bridge_mdb.sh, as well.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

506a1ac9

selftests: forwarding: lib: Add helpers for checksum handling · 952e0ee3

Petr Machata authored Feb 02, 2023

In order to generate IGMPv3 and MLDv2 packets on the fly, we will need
helpers to calculate the packet checksum.

The approach presented in this patch revolves around payload templates
for mausezahn. These are mausezahn-like payload strings (01:23:45:...)
with possibly one 2-byte sequence replaced with the word PAYLOAD. The
main function is payload_template_calc_checksum(), which calculates
RFC 1071 checksum of the message. There are further helpers to then
convert the checksum to the payload format, and to expand it.

For IPv6, MLDv2 message checksum is computed using a pseudoheader that
differs from the header used in the payload itself. The fact that the
two messages are different means that the checksum needs to be
returned as a separate quantity, instead of being expanded in-place in
the payload itself. Furthermore, the pseudoheader includes a length of
the message. Much like the checksum, this needs to be expanded in
mausezahn format. And likewise for number of addresses for (S,G)
entries. Thus we have several places where a computed quantity needs
to be presented in the payload format. Add a helper u16_to_bytes(),
which will be used in all these cases.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

952e0ee3

selftests: forwarding: lib: Add helpers for IP address handling · fcf49276

Petr Machata authored Feb 02, 2023

In order to generate IGMPv3 and MLDv2 packets on the fly, we will need
helpers to expand IPv4 and IPv6 addresses given as parameters in
mausezahn payload notation. Add helpers that do it.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

fcf49276

selftests: forwarding: bridge_mdb: Fix a typo · f7ccf60c

Petr Machata authored Feb 02, 2023

Add the letter missing from the word "INCLUDE".
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f7ccf60c

selftests: forwarding: Move IGMP- and MLD-related functions to lib · 344dd2c9

Petr Machata authored Feb 02, 2023

These functions will be helpful for other testsuites as well. Extract them
to a common place.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

344dd2c9

net: bridge: Add netlink knobs for number / maximum MDB entries · a1aee20d

Petr Machata authored Feb 02, 2023

The previous patch added accounting for number of MDB entries per port and
per port-VLAN, and the logic to verify that these values stay within
configured bounds. However it didn't provide means to actually configure
those bounds or read the occupancy. This patch does that.

Two new netlink attributes are added for the MDB occupancy:
IFLA_BRPORT_MCAST_N_GROUPS for the per-port occupancy and
BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS for the per-port-VLAN occupancy.
And another two for the maximum number of MDB entries:
IFLA_BRPORT_MCAST_MAX_GROUPS for the per-port maximum, and
BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS for the per-port-VLAN one.

Note that the two new IFLA_BRPORT_ attributes prompt bumping of
RTNL_SLAVE_MAX_TYPE to size the slave attribute tables large enough.

The new attributes are used like this:

 # ip link add name br up type bridge vlan_filtering 1 mcast_snooping 1 \
                                      mcast_vlan_snooping 1 mcast_querier 1
 # ip link set dev v1 master br
 # bridge vlan add dev v1 vid 2

 # bridge vlan set dev v1 vid 1 mcast_max_groups 1
 # bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 1
 # bridge mdb add dev br port v1 grp 230.1.2.4 temp vid 1
 Error: bridge: Port-VLAN is already in 1 groups, and mcast_max_groups=1.

 # bridge link set dev v1 mcast_max_groups 1
 # bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 2
 Error: bridge: Port is already in 1 groups, and mcast_max_groups=1.

 # bridge -d link show
 5: v1@v2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br [...]
     [...] mcast_n_groups 1 mcast_max_groups 1

 # bridge -d vlan show
 port              vlan-id
 br                1 PVID Egress Untagged
                     state forwarding mcast_router 1
 v1                1 PVID Egress Untagged
                     [...] mcast_n_groups 1 mcast_max_groups 1
                   2
                     [...] mcast_n_groups 0 mcast_max_groups 0
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a1aee20d

net: bridge: Maintain number of MDB entries in net_bridge_mcast_port · b57e8d87

Petr Machata authored Feb 02, 2023

The MDB maintained by the bridge is limited. When the bridge is configured
for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
capacity. In SW datapath, the capacity is configurable through the
IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a
similar limit exists in the HW datapath for purposes of offloading.

In order to prevent the issue of unilateral exhaustion of MDB resources,
introduce two parameters in each of two contexts:

- Per-port and per-port-VLAN number of MDB entries that the port
  is member in.

- Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
  per-port-VLAN maximum permitted number of MDB entries, or 0 for
  no limit.

The per-port multicast context is used for tracking of MDB entries for the
port as a whole. This is available for all bridges.

The per-port-VLAN multicast context is then only available on
VLAN-filtering bridges on VLANs that have multicast snooping on.

With these changes in place, it will be possible to configure MDB limit for
bridge as a whole, or any one port as a whole, or any single port-VLAN.

Note that unlike the global limit, exhaustion of the per-port and
per-port-VLAN maximums does not cause disablement of multicast snooping.
It is also permitted to configure the local limit larger than hash_max,
even though that is not useful.

In this patch, introduce only the accounting for number of entries, and the
max field itself, but not the means to toggle the max. The next patch
introduces the netlink APIs to toggle and read the values.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b57e8d87

net: bridge: Add a tracepoint for MDB overflows · d47230a3

Petr Machata authored Feb 02, 2023

The following patch will add two more maximum MDB allowances to the global
one, mcast_hash_max, that exists today. In all these cases, attempts to add
MDB entries above the configured maximums through netlink, fail noisily and
obviously. Such visibility is missing when adding entries through the
control plane traffic, by IGMP or MLD packets.

To improve visibility in those cases, add a trace point that reports the
violation, including the relevant netdevice (be it a slave or the bridge
itself), and the MDB entry parameters:

	# perf record -e bridge:br_mdb_full &
	# [...]
	# perf script | cut -d: -f4-
	 dev v2 af 2 src ::ffff:0.0.0.0 grp ::ffff:239.1.1.112/00:00:00:00:00:00 vid 0
	 dev v2 af 10 src :: grp ff0e::112/00:00:00:00:00:00 vid 0
	 dev v2 af 2 src ::ffff:0.0.0.0 grp ::ffff:239.1.1.112/00:00:00:00:00:00 vid 10
	 dev v2 af 10 src 2001:db8:1::1 grp ff0e::1/00:00:00:00:00:00 vid 10
	 dev v2 af 2 src ::ffff:192.0.2.1 grp ::ffff:239.1.1.1/00:00:00:00:00:00 vid 10

CC: Steven Rostedt <rostedt@goodmis.org>
CC: linux-trace-kernel@vger.kernel.org
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d47230a3

net: bridge: Change a cleanup in br_multicast_new_port_group() to goto · eceb3085

Petr Machata authored Feb 02, 2023

This function is getting more to clean up in the following patches.
Structuring the cleanups in one labeled block will allow reusing the same
cleanup from several places.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

eceb3085

net: bridge: Add br_multicast_del_port_group() · 976b3858

Petr Machata authored Feb 02, 2023

Since cleaning up the effects of br_multicast_new_port_group() just
consists of delisting and freeing the memory, the function
br_mdb_add_group_star_g() inlines the corresponding code. In the following
patches, number of per-port and per-port-VLAN MDB entries is going to be
maintained, and that counter will have to be updated. Because that logic
is going to be hidden in the br_multicast module, introduce a new hook
intended to again remove a newly-created group.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

976b3858

net: bridge: Move extack-setting to br_multicast_new_port_group() · 1c85b80b

Petr Machata authored Feb 02, 2023

Now that br_multicast_new_port_group() takes an extack argument, move
setting the extack there. The downside is that the error messages end
up being less specific (the function cannot distinguish between (S,G)
and (*,G) groups). However, the alternative is to check in the caller
whether the callee set the extack, and if it didn't, set it. But that
is only done when the callee is not exactly known. (E.g. in case of a
notifier invocation.)
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

1c85b80b

net: bridge: Add extack to br_multicast_new_port_group() · 60977a0c

Petr Machata authored Feb 02, 2023

Make it possible to set an extack in br_multicast_new_port_group().
Eventually, this function will check for per-port and per-port-vlan
MDB maximums, and will use the extack to communicate the reason for
the bounce.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

60977a0c

net: bridge: Set strict_start_type at two policies · c00041cf

Petr Machata authored Feb 02, 2023

Make any attributes newly-added to br_port_policy or vlan_tunnel_policy
parsed strictly, to prevent userspace from passing garbage. Note that this
patchset only touches the former policy. The latter was adjusted for
completeness' sake. There do not appear to be other _deprecated calls
with non-NULL policies.
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c00041cf

Merge branch 'sparx5-PSFP-support' · 8b7018fa

David S. Miller authored Feb 06, 2023

Daniel Machon says:

====================
net: Add support for PSFP in Sparx5

================================================================================
Add support for Per-Stream Filtering and Policing (802.1Q-2018, 8.6.5.1).
================================================================================

The VCAP CLM (VCAP IS0 ingress classifier) classifies streams,
identified by ISDX (Ingress Service Index, frame metadata), and maps
ISDX to streams.

Flow meters are also classified by ISDX, and implemented using service
policers (Service Dual Leacky Buckets, SDLB). Leacky buckets are linked
together in a leak chain of a leak group. Leak groups a preconfigured to serve
buckets within a certain rate interval.

Stream gates are time-based policers used by PSFP. Frames are dropped
based on the gate state (OPEN/ CLOSE), whose state will be altered based
on the Gate Control List (GCL) and current PTP time. Apart from
time-based policing, stream gates can alter egress queue selection for
the frames that pass through the Gate. This is done through Internal
Priority Selector (IPS). Stream gates are mapped from stream filters.

Support for tc actions gate and police, have been added to the VCAP IS0 set of
supported actions.

Examples:

// tc filter with gate action
$ tc filter add dev eth1 ingress chain 1100000 prio 1 handle 1001 protocol \
802.1q flower skip_sw vlan_id 100 action gate base-time 0 sched-entry open \
700000 7 8m sched-entry close 300000 action goto chain 1200000

// tc filter with police action
$ tc filter add dev eth1 ingress chain 1100000 prio 1 handle 1002 protocol \
802.1q flower skip_sw vlan_id 100 action police rate 1gbit burst 8096      \
conform-exceed drop action goto chain 1200000

================================================================================
Patches
================================================================================
Patch #1:  Adds new register needed for PSFP.
Patch #2:  Adds resource pools to control PSFP needed chip resources.
Patch #3:  Adds support for SDLB's needed for flow-meters.
Patch #4:  Adds support for service policers.
Patch #5:  Adds support for PSFP flow-meters, using service policers.
Patch #6:  Adds a new function to calculate basetime, required by flow-meters.
Patch #7:  Adds support for PSFP stream gates.
Patch #8:  Adds support for PSFP stream filters.
Patch #9:  Adds a function to initialize flow-meters, stream gates and stream
           filters.
Patch #10: Adds the required flower code to configure PSFP using the tc command.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

8b7018fa

sparx5: add support for configuring PSFP via tc · 6ebf182b

Daniel Machon authored Feb 02, 2023

Add support for tc actions gate and police, in order to implement
support for configuring PSFP through tc.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6ebf182b

net: microchip: sparx5: initialize PSFP · e116b19d

Daniel Machon authored Feb 02, 2023

Initialize the SDLB's, stream gates and stream filters.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e116b19d

net: microchip: sparx5: add support for PSFP stream filters · ae3e691f

Daniel Machon authored Feb 02, 2023

Add support for configuring PSFP stream filters (IEEE 802.1Q-2018,
8.6.5.1.1).

The VCAP CLM (VCAP IS0 ingress classifier) classifies streams,
identified by ISDX (Ingress Service Index, frame metadata), and maps
ISDX to streams.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ae3e691f

net: microchip: sparx5: add support for PSFP stream gates · c70a5e2c

Daniel Machon authored Feb 02, 2023

Add support for configuring PSFP stream gates (IEEE 802.1Q-2018,
8.6.5.1.2).

Stream gates are time-based policers used by PSFP. Frames are dropped
based on the gate state (OPEN/ CLOSE), whose state will be altered based
on the Gate Control List (GCL) and current PTP time. Apart from
time-based policing, stream gates can alter egress queue selection for
the frames that pass through the Gate. This is done through Internal
Priority Selector (IPS). Stream gates are mapped from stream filters.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c70a5e2c

net: microchip: sparx5: add function for calculating PTP basetime · 9e02131e

Daniel Machon authored Feb 02, 2023

Add a new function for calculating PTP basetime, required by the stream
gate scheduler to calculate gate state (open / close).
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9e02131e

net: microchip: sparx5: add support for PSFP flow-meters · d2185e79

Daniel Machon authored Feb 02, 2023

Add support for configuring PSFP flow-meters (IEEE 802.1Q-2018,
8.6.5.1.3).

The VCAP CLM (VCAP IS0 ingress classifier) classifies streams,
identified by ISDX (Ingress Service Index, frame metadata), and maps
ISDX to flow-meters. SDLB's provide the flow-meter parameters.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d2185e79

net: microchip: sparx5: add support for service policers · 1db82abf

Daniel Machon authored Feb 02, 2023

Add initial API for configuring policers. This patch add support for
service policers.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1db82abf

net: microchip: sparx5: add support for Service Dual Leacky Buckets · 9bf50889

Daniel Machon authored Feb 02, 2023

Add support for Service Dual Leacky Buckets (SDLB), used to implement
PSFP flow-meters. Buckets are linked together in a leak chain of a leak
group. Leak groups a preconfigured to serve buckets within a certain
rate interval.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9bf50889

net: microchip: sparx5: add resource pools · bb535c0d

Daniel Machon authored Feb 02, 2023

Add resource pools and accessor functions. These pools can be queried by
the driver, whenever a finite resource is required. Some resources can
be reused, in which case an index and a reference count is used to keep
track of users.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bb535c0d

net: microchip: add registers needed for PSFP · edad83e2

Daniel Machon authored Feb 02, 2023

Add registers needed for PSFP. This patch also renames a single
register, shortening its name (SYS_CLK_PER_100PS). Uses have been update
accordingly.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

edad83e2

04 Feb, 2023 6 commits

net/mlx5e: Trigger NAPI after activating an SQ · 79efecb4

Maxim Mikityanskiy authored Aug 30, 2022

If an SQ is deactivated and reactivated again, some packets could be
sent after MLX5E_SQ_STATE_ENABLED is cleared, but before
netif_tx_stop_queue, meaning that NAPI might miss some completions. In
order to handle them, make sure to trigger NAPI after SQ activation in
all cases where it can be relevant. Regular SQs, XDP SQs and XSK SQs are
good. Missing cases added: after recovery, after activating HTB SQs and
after activating PTP SQs.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

79efecb4

net/mlx5e: IPsec, support upper protocol selector field offload · a7385187

Raed Salem authored Jan 11, 2023

Add support to policy/state upper protocol selector field offload,
this will enable to select traffic for IPsec operation based on l4
protocol (TCP/UDP) with specific source/destination port.
Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

a7385187

net/mlx5e: IPoIB, Add support for XDR speed · ce231772

Dragos Tatulea authored Jan 19, 2023

Add XDR IB PTYS coding and XDR speed 200Gbps.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

ce231772

net/mlx5: Enhance debug print in page allocation failure · 7eef9300

Jack Morgenstein authored Jan 18, 2023

Provide more details to aid debugging.

Fixes: bf0bf77f ("mlx5: Support communicating arbitrary host page size to firmware")
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Majd Dibbiny <majd@nvidia.com>
Signed-off-by: Jack Morgenstein <jackm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

7eef9300

net/mlx5: Add firmware support for MTUTC scaled_ppm frequency adjustments · b63636b6

Rahul Rameshbabu authored Nov 21, 2022

When device is capable of handling scaled ppm values for adjusting
frequency, conversion to ppb will not be done by the driver. Instead, the
scaled ppm value will be passed directly to the device for the frequency
adjustment operation.
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

b63636b6

net/mlx5: Document support for RoCE HCA disablement capability · 04937a0f

Rahul Rameshbabu authored Jan 19, 2023

Some mlx5 devices are capable of disabling RoCE. In this situation,
disablement does not need to be handled at the driver level.
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

04937a0f