Commits · 1d294b4f903fc3beb18f8f20da3cec46d333daab · Kirill Smelkov / linux

01 Jul, 2024 11 commits

bnxt_en: Add TX timestamp completion logic · 1d294b4f

Michael Chan authored Jun 28, 2024

The new BCM5760X chips will return the timestamp of TX packets in a
new completion.  Add logic in __bnxt_poll_work() to handle this
completion type to retrieve the timestamp.  This feature eliminates
the limit on the number of in-flight PTP TX packets.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1d294b4f

bnxt_en: Allow some TX packets to be unprocessed in NAPI · ba0155f1

Michael Chan authored Jun 28, 2024

The driver's current logic will always free all the TX SKBs up to
txr->tx_hw_cons within NAPI.  In the next patches, we'll be adding
logic to handle TX timestamp completion and we may need to hold
some remaining TX SKBs if we don't have the timestamp completions
yet.

Modify __bnxt_poll_work_done() to clear each event bit separately to
allow bnapi->tx_int() to decide whether to clear BNXT_TX_CMP_EVENT or
not.  bnapi->tx_int() will not clear BNXT_TX_CMP_EVENT if some TX
SKBs are held waiting for TX timestamps.  Note that legacy chips will
never hold any SKBs this way.  The SKB is always deferred to the PTP
worker slow path to retrieve the timestamp from firmware.  On the new
P7 chips, the timestamp is returned by the hardware directly and we
can retrieve it directly from NAPI.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ba0155f1

bnxt_en: Add is_ts_pkt field to struct bnxt_sw_tx_bd · 449da975

Michael Chan authored Jun 28, 2024

Remove the unused is_gso field and add the is_ts_pkt field to struct
bnxt_sw_tx_bd. This field will mark the TX BD that has requested
HW TX timestamp. The field needs to be cleared if the timestamp packet
is later aborted. This field will be useful when processing the
new TX timestamp completion from the hardware in the next patches.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

449da975

bnxt_en: Add new TX timestamp completion definitions · be6b7ca3

Michael Chan authored Jun 28, 2024

The new BCM5760X chips will generate this new TX timestamp completion
when a TX packet's timestamp has been taken right before transmission.
The driver logic to retrieve the timestamp will be added in the next
few patches.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be6b7ca3

octeontx2-af: Sync NIX and NPA contexts from NDC to LLC/DRAM · 42c45ac1

Nithin Dabilpuram authored Jun 28, 2024

Octeontx2 hardware uses Near Data Cache(NDC) block to cache
contexts in it so that access to LLC/DRAM can be avoided.
It is recommended in HRM to sync the NDC contents before
releasing/resetting LF resources. Hence implement NDC_SYNC
mailbox and sync contexts during driver teardown.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

42c45ac1

net: tn40xx: add initial ethtool_ops support · 7433d034

FUJITA Tomonori authored Jun 28, 2024

Call phylink_ethtool_ksettings_get() for get_link_ksettings method and
ethtool_op_get_link() for get_link method.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

7433d034

Merge tag 'nf-next-24-06-28' of... · 1c5fc27b

David S. Miller authored Jul 01, 2024

Merge tag 'nf-next-24-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next into main

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next:

Patch #1 to #11 to shrink memory consumption for transaction objects:

  struct nft_trans_chain { /* size: 120 (-32), cachelines: 2, members: 10 */
  struct nft_trans_elem { /* size: 72 (-40), cachelines: 2, members: 4 */
  struct nft_trans_flowtable { /* size: 80 (-48), cachelines: 2, members: 5 */
  struct nft_trans_obj { /* size: 72 (-40), cachelines: 2, members: 4 */
  struct nft_trans_rule { /* size: 80 (-32), cachelines: 2, members: 6 */
  struct nft_trans_set { /* size: 96 (-24), cachelines: 2, members: 8 */
  struct nft_trans_table { /* size: 56 (-40), cachelines: 1, members: 2 */

  struct nft_trans_elem can now be allocated from kmalloc-96 instead of
  kmalloc-128 slab.

  Series from Florian Westphal. For the record, I have mangled patch #1
  to add nft_trans_container_*() and use if for every transaction object.
   I have also added BUILD_BUG_ON to ensure struct nft_trans always comes
  at the beginning of the container transaction object. And few minor
  cleanups, any new bugs are of my own.

Patch #12 simplify check for SCTP GSO in IPVS, from Ismael Luceno.

Patch #13 nf_conncount key length remains in the u32 bound, from Yunjian Wang.

Patch #14 removes unnecessary check for CTA_TIMEOUT_L3PROTO when setting
          default conntrack timeouts via nfnetlink_cttimeout API, from
          Lin Ma.

Patch #15 updates NFT_SECMARK_CTX_MAXLEN to 4096, SELinux could use
          larger secctx names than the existing 256 bytes length.

Patch #16 adds a selftest to exercise nfnetlink_queue listeners leaving
          nfnetlink_queue, from Florian Westphal.

Patch #17 increases hitcount from 255 to 65535 in xt_recent, from Phil Sutter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

1c5fc27b

Merge branch 'tcp_metrics-netlink-specs' into main · a051091c

David S. Miller authored Jul 01, 2024

Jakub Kicinski says:

====================
tcp_metrics: add netlink protocol spec in YAML

Add a netlink protocol spec for the tcp_metrics generic netlink family.
First patch adjusts the uAPI header guards to make it easier to build
tools/ with non-system headers.

v1: https://lore.kernel.org/all/20240626201133.2572487-1-kuba@kernel.org
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a051091c

tcp_metrics: add netlink protocol spec in YAML · 85674625

Jakub Kicinski authored Jun 27, 2024

Add a protocol spec for tcp_metrics, so that it's accessible via YNL.
Useful at the very least for testing fixes.

In this episode of "10,000 ways to complicate netlink" the metric
nest has defines which are off by 1. iproute2 does:

        struct rtattr *m[TCP_METRIC_MAX + 1 + 1];

        parse_rtattr_nested(m, TCP_METRIC_MAX + 1, a);

        for (i = 0; i < TCP_METRIC_MAX + 1; i++) {
                // ...
                attr = m[i + 1];

This is too weird to support in YNL, add a new set of defines
with _correct_ values to the official kernel header.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

85674625

tcp_metrics: add UAPI to the header guard · 7c811005

Jakub Kicinski authored Jun 27, 2024

tcp_metrics' header lacks the customary _UAPI in the header guard.
This makes YNL build rules work less seamlessly.
We can easily fix that on YNL side, but this could also be
problematic if we ever needed to create a kernel-only tcp_metrics.h.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7c811005

net: phy: realtek: Add support for PHY LEDs on RTL8211F · 17784801

Marek Vasut authored Jun 25, 2024

Realtek RTL8211F Ethernet PHY supports 3 LED pins which are used to
indicate link status and activity. Add minimal LED controller driver
supporting the most common uses with the 'netdev' trigger.
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

17784801

29 Jun, 2024 18 commits

Merge branch 'ethtool-track-custom-rss-contexts-in-the-core' · 30972a4e

Jakub Kicinski authored Jun 28, 2024

Edward Cree says:

====================
ethtool: track custom RSS contexts in the core

Make the core responsible for tracking the set of custom RSS contexts,
 their IDs, indirection tables, hash keys, and hash functions; this
 lets us get rid of duplicative code in drivers, and will allow us to
 support netlink dumps later.

This series only moves the sfc EF10 & EF100 driver over to the new API;
 other drivers (mvpp2, octeontx2, mlx5, sfc/siena, bnxt_en) can be converted
 afterwards and the legacy API removed.
====================

Link: https://patch.msgid.link/cover.1719502239.git.ecree.xilinx@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

30972a4e

sfc: remove get_rxfh_context dead code · b859316e

Edward Cree authored Jun 27, 2024

The core now always satisfies 'ethtool -x context nonzero' from its own
tracking, so our lookup code for that case is never called. Remove it.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/b426fcc416dedc8f203e52eebef6891eccebe4c1.1719502240.git.ecree.xilinx@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b859316e

net: ethtool: use the tracking array for get_rxfh on custom RSS contexts · 7964e788

Edward Cree authored Jun 27, 2024

On 'ethtool -x' with rss_context != 0, instead of calling the driver to
read the RSS settings for the context, just get the settings from the
rss_ctx xarray, and return them to the user with no driver involvement.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/2d0190fa29638f307ea720f882ebd41f6f867694.1719502240.git.ecree.xilinx@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

7964e788

sfc: use new rxfh_context API · a9ee8d4a

Edward Cree authored Jun 27, 2024

The core is now responsible for allocating IDs and a memory region for
 us to store our state (struct efx_rss_context_priv), so we no longer
 need efx_alloc_rss_context_entry() and friends.
Since the contexts are now maintained by the core, use the core's lock
 (net_dev->ethtool->rss_lock), rather than our own mutex (efx->rss_lock),
 to serialise access against changes; and remove the now-unused
 efx->rss_lock from struct efx_nic.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/150274740ea8cc137fef5502541ce573d32fb319.1719502240.git.ecree.xilinx@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

a9ee8d4a

net: ethtool: add a mutex protecting RSS contexts · 87925151

Edward Cree authored Jun 27, 2024

While this is not needed to serialise the ethtool entry points (which
are all under RTNL), drivers may have cause to asynchronously access
dev->ethtool->rss_ctx; taking dev->ethtool->rss_lock allows them to
do this safely without needing to take the RTNL.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/7f9c15eb7525bf87af62c275dde3a8570ee8bf0a.1719502240.git.ecree.xilinx@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

87925151

net: ethtool: add an extack parameter to new rxfh_context APIs · 30a32cdf

Edward Cree authored Jun 27, 2024

Currently passed as NULL, but will allow drivers to report back errors
when ethnl support for these ops is added.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/6e0012347d175fdd1280363d7bfa76a2f2777e17.1719502240.git.ecree.xilinx@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

30a32cdf

net: ethtool: let the core choose RSS context IDs · 847a8ab1

Edward Cree authored Jun 27, 2024

Add a new API to create/modify/remove RSS contexts, that passes in the
newly-chosen context ID (not as a pointer) rather than leaving the
driver to choose it on create. Also pass in the ctx, allowing drivers
to easily use its private data area to store their hardware-specific
state.
Keep the existing .set_rxfh API for now as a fallback, but deprecate it
for custom contexts (rss_context != 0).
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/45f1fe61df2163c091ec394c9f52000c8b16cc3b.1719502240.git.ecree.xilinx@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

847a8ab1

net: ethtool: record custom RSS contexts in the XArray · eac9122f

Edward Cree authored Jun 27, 2024

Since drivers are still choosing the context IDs, we have to force the
XArray to use the ID they've chosen rather than picking one ourselves,
and handle the case where they give us an ID that's already in use.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/801f5faa4cec87c65b2c6e27fb220c944bce593a.1719502240.git.ecree.xilinx@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

eac9122f

net: ethtool: attach an XArray of custom RSS contexts to a netdevice · 6ad2962f

Edward Cree authored Jun 27, 2024

Each context stores the RXFH settings (indir, key, and hfunc) as well
as optionally some driver private data.
Delete any still-existing contexts at netdev unregister time.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/cbd1c402cec38f2e03124f2ab65b4ae4e08bd90d.1719502240.git.ecree.xilinx@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

6ad2962f

net: move ethtool-related netdev state into its own struct · 3ebbd9f6

Edward Cree authored Jun 27, 2024

net_dev->ethtool is a pointer to new struct ethtool_netdev_state, which
currently contains only the wol_enabled field.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/293a562278371de7534ed1eb17531838ca090633.1719502239.git.ecree.xilinx@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

3ebbd9f6

Merge branch 'selftests-drv-net-add-ability-to-schedule-cleanup-with-defer' · c2dd2139

Jakub Kicinski authored Jun 28, 2024

Jakub Kicinski says:

====================
selftests: drv-net: add ability to schedule cleanup with defer()

Introduce a defer / cleanup mechanism for driver selftests.
More detailed info in the second patch.
====================

Link: https://patch.msgid.link/20240627185502.3069139-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

c2dd2139

selftests: drv-net: rss_ctx: convert to defer() · 0759356b

Jakub Kicinski authored Jun 27, 2024

Use just added defer().
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20240627185502.3069139-4-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

0759356b

selftests: drv-net: add ability to schedule cleanup with defer() · 8510801a

Jakub Kicinski authored Jun 27, 2024

This implements what I was describing in [1]. When writing a test
author can schedule cleanup / undo actions right after the creation
completes, eg:

  cmd("touch /tmp/file")
  defer(cmd, "rm /tmp/file")

defer() takes the function name as first argument, and the rest are
arguments for that function. defer()red functions are called in
inverse order after test exits. It's also possible to capture them
and execute earlier (in which case they get automatically de-queued).

  undo = defer(cmd, "rm /tmp/file")
  # ... some unsafe code ...
  undo.exec()

As a nice safety all exceptions from defer()ed calls are captured,
printed, and ignored (they do make the test fail, however).
This addresses the common problem of exceptions in cleanup paths
often being unhandled, leading to potential leaks.

There is a global action queue, flushed by ksft_run(). We could support
function level defers too, I guess, but there's no immediate need..

Link: https://lore.kernel.org/all/877cedb2ki.fsf@nvidia.com/ # [1]
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20240627185502.3069139-3-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

8510801a

selftests: net: ksft: avoid continue when handling results · 147997af

Jakub Kicinski authored Jun 27, 2024

Exception handlers print the result and use continue
to skip the non-exception result printing. This makes
inserting common post-test code hard. Refactor to
avoid the continues and have only one ktap_result() call.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20240627185502.3069139-2-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

147997af

enic: add ethtool get_channel support · bf7bb7b4

Jon Kohler authored Jun 27, 2024

Add .get_channel to enic_ethtool_ops to enable basic ethtool -l
support to get the current channel configuration.

Note that the driver does not support dynamically changing queue
configuration, so .set_channel is intentionally unused. Instead, users
should use Cisco's hardware management tools (UCSM/IMC) to modify
virtual interface card configuration out of band.
Signed-off-by: Jon Kohler <jon@nutanix.com>
Link: https://patch.msgid.link/20240627202013.2398217-1-jon@nutanix.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

bf7bb7b4

Merge branch 'lift-udp_segment-restriction-for-egress-via-device-w-o-csum-offload' · db2dede2

Jakub Kicinski authored Jun 28, 2024

Jakub Sitnicki says:

====================
Lift UDP_SEGMENT restriction for egress via device w/o csum offload

This is a follow-up to an earlier question [1] if we can make UDP GSO work
with any egress device, even those with no checksum offload capability.
That's the default setup for TUN/TAP.

Because there is a change in behavior - sendmsg() does no longer return
EIO error - I'm submitting through net-next tree, rather than net,
as per Willem's advice.

[1] https://lore.kernel.org/netdev/87jzqsld6q.fsf@cloudflare.com/

v1: https://lore.kernel.org/r/20240622-linux-udpgso-v1-0-d2344157ab2a@cloudflare.com
====================

Link: https://patch.msgid.link/20240626-linux-udpgso-v2-0-422dfcbd6b48@cloudflare.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

db2dede2

selftests/net: Add test coverage for UDP GSO software fallback · 3e400219

Jakub Sitnicki authored Jun 26, 2024

Extend the existing test to exercise UDP GSO egress through devices with
various offload capabilities, including lack of checksum offload, which is
the default case for TUN/TAP devices.

Test against a dummy device because it is simpler to set up then TUN/TAP.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240626-linux-udpgso-v2-2-422dfcbd6b48@cloudflare.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

3e400219

udp: Allow GSO transmit from devices with no checksum offload · 10154dbd

Jakub Sitnicki authored Jun 26, 2024

Today sending a UDP GSO packet from a TUN device results in an EIO error:

  import fcntl, os, struct
  from socket import *

  TUNSETIFF = 0x400454CA
  IFF_TUN = 0x0001
  IFF_NO_PI = 0x1000
  UDP_SEGMENT = 103

  tun_fd = os.open("/dev/net/tun", os.O_RDWR)
  ifr = struct.pack("16sH", b"tun0", IFF_TUN | IFF_NO_PI)
  fcntl.ioctl(tun_fd, TUNSETIFF, ifr)

  os.system("ip addr add 192.0.2.1/24 dev tun0")
  os.system("ip link set dev tun0 up")

  s = socket(AF_INET, SOCK_DGRAM)
  s.setsockopt(SOL_UDP, UDP_SEGMENT, 1200)
  s.sendto(b"x" * 3000, ("192.0.2.2", 9)) # EIO

This is due to a check in the udp stack if the egress device offers
checksum offload. While TUN/TAP devices, by default, don't advertise this
capability because it requires support from the TUN/TAP reader.

However, the GSO stack has a software fallback for checksum calculation,
which we can use. This way we don't force UDP_SEGMENT users to handle the
EIO error and implement a segmentation fallback.

Lift the restriction so that UDP_SEGMENT can be used with any egress
device. We also need to adjust the UDP GSO code to match the GSO stack
expectation about ip_summed field, as set in commit 8d63bee6 ("net:
avoid skb_warn_bad_offload false positives on UFO"). Otherwise we will hit
the bad offload check.

Users should, however, expect a potential performance impact when
batch-sending packets with UDP_SEGMENT without checksum offload on the
egress device. In such case the packet payload is read twice: first during
the sendmsg syscall when copying data from user memory, and then in the GSO
stack for checksum computation. This double memory read can be less
efficient than a regular sendmsg where the checksum is calculated during
the initial data copy from user memory.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240626-linux-udpgso-v2-1-422dfcbd6b48@cloudflare.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

10154dbd

28 Jun, 2024 11 commits

netfilter: xt_recent: Lift restrictions on max hitcount value · f4ebd034

Phil Sutter authored Jun 27, 2024

Support tracking of up to 65535 packets per table entry instead of just
255 to better facilitate longer term tracking or higher throughput
scenarios.

Note how this aligns sizes of struct recent_entry's 'nstamps' and
'index' fields when 'nstamps' was larger before. This is unnecessary as
the value of 'nstamps' grows along with that of 'index' after being
initialized to 1 (see recent_entry_update()). Its value will thus never
exceed that of 'index' and therefore does not need to provide space for
larger values.
Requested-by: Fabio <pedretti.fabio@gmail.com>
Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1745Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

f4ebd034

selftests: netfilter: nft_queue.sh: add test for disappearing listener · 742ad979

Florian Westphal authored Jun 25, 2024

If userspace program exits while the queue its subscribed to has packets
those need to be discarded.

commit dc21c6cc ("netfilter: nfnetlink_queue: acquire rcu_read_lock()
in instance_destroy_rcu()") fixed a (harmless) rcu splat that could be
triggered in this case.

Add a test case to cover this.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

742ad979

Merge branch 'net-selftests-mirroring-cleanup' into main · 748e3bbf

David S. Miller authored Jun 28, 2024

Petr Machata says:

====================
selftest: Clean-up and stabilize mirroring tests

The mirroring selftests work by sending ICMP traffic between two hosts.
Along the way, this traffic is mirrored to a gretap netdevice, and counter
taps are then installed strategically along the path of the mirrored
traffic to verify the mirroring took place.

The problem with this is that besides mirroring the primary traffic, any
other service traffic is mirrored as well. At the same time, because the
tests need to work in HW-offloaded scenarios, the ability of the device to
do arbitrary packet inspection should not be taken for granted. Most tests
therefore simply use matchall, one uses flower to match on IP address.
As a result, the selftests are noisy.

mirror_test() accommodated this noisiness by giving the counters an
allowance of several packets. But that only works up to a point, and on
busy systems won't be always enough.

In this patch set, clean up and stabilize the mirroring selftests. The
original intention was to port the tests over to UDP, but the logic of
ICMP ends up being so entangled in the mirroring selftests that the
changes feel overly invasive. Instead, ICMP is kept, but where possible,
we match on ICMP message type, thus filtering out hits by other ICMP
messages.

Where this is not practical (where the counter tap is put on a device
that carries encapsulated packets), switch the counter condition to _at
least_ X observed packets. This is less robust, but barely so --
probably the only scenario that this would not catch is something like
erroneous packet duplication, which would hopefully get caught by the
numerous other tests in this extensive suite.

- Patches #1 to #3 clean up parameters at various helpers.

- Patches #4 to #6 stabilize the mirroring selftests as described above.

- Mirroring tests currently allow testing SW datapath even on HW
  netdevices by trapping traffic to the SW datapath. This complicates
  the tests a bit without a good reason: to test SW datapath, just run
  the selftests on the veth topology. Thus in patch #7, drop support for
  this dual SW/HW testing.

- At this point, some cleanups were either made possible by the previous
  patches, or were always possible. In patches #8 to #11, realize these
  cleanups.

- In patch #12, fix mlxsw mirror_gre selftest to respect setting TESTS.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

748e3bbf

selftests: mlxsw: mirror_gre: Obey TESTS · 098ba97d

Petr Machata authored Jun 27, 2024

This test is unusual in that overriding TESTS does not change the tests to
be run. Split the individual tests into several functions and invoke them
through tests_run() as appropriate.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

098ba97d

selftests: libs: Drop unused functions · 06704a0d

Petr Machata authored Jun 27, 2024

Nothing calls these.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06704a0d

selftests: libs: Drop slow_path_trap_install()/_uninstall() · 4e9cd3d0

Petr Machata authored Jun 27, 2024

These functions are not used anymore.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4e9cd3d0

selftests: mirror_gre_lag_lacp: Drop unnecessary code · 95d33989

Petr Machata authored Jun 27, 2024

The selftest does not use functions from mirror_gre_lib, ditch the import.

It does not use arping either, so drop the require_command as well.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

95d33989

selftests: mlxsw: mirror_gre: Simplify · 388b2d98

Petr Machata authored Jun 27, 2024

After the previous patch, the function test_span_failable() is always
called with should_fail=1. Drop the argument and streamline the code.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

388b2d98

selftests: mirror: Drop dual SW/HW testing · d361d78f

Petr Machata authored Jun 27, 2024

The mirroring tests are currently run in a skip_hw and optionally a skip_sw
mode. The former tests the SW datapath, the latter the HW datapath, if
available. In order to be able to test SW datapath on HW loopbacks, traps
are installed on ingress to get traffic from the HW datapath to the SW one.
This adds an unnecessary complexity when it would be much simpler to just
use a veth-based topology to test the SW datapath. Thus drop all the code
that supports this dual testing.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d361d78f

selftests: mirror: mirror_test(): Allow exact count of packets · a86e0df9

Petr Machata authored Jun 27, 2024

The mirroring selftests work by sending ICMP traffic between two hosts.
Along the way, this traffic is mirrored to a gretap netdevice, and counter
taps are then installed strategically along the path of the mirrored
traffic to verify the mirroring took place.

The problem with this is that besides mirroring the primary traffic, any
other service traffic is mirrored as well. At the same time, because the
tests need to work in HW-offloaded scenarios, the ability of the device to
do arbitrary packet inspection should not be taken for granted. Most tests
therefore simply use matchall, one uses flower to match on IP address.

As a result, the selftests are noisy, because besides the primary ICMP
traffic, any amount of other service traffic is mirrored as well.

mirror_test() accommodated this noisiness by giving the counters an
allowance of several packets. But in the previous patch, where possible,
counter taps were changed to match only on an exact ICMP message. At least
in those cases, we can demand an exact number of packets to match.

Where the tap is installed on a connective netdevice, the exact matching is
not practical (though with u32, anything is possible). In those places,
there should still be some leeway -- and probably bigger than before,
because experience shows that these tests are very noisy.

To that end, change mirror_test() so that it can be either called with an
exact number to expect, or with an expression. Where leeway is needed,
adjust callers to pass a ">= 10" instead of mere 10.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a86e0df9

selftests: mirror: do_test_span_dir_ips(): Install accurate taps · 83341535

Petr Machata authored Jun 27, 2024

As a result, the selftests are noisy, because besides the primary ICMP
traffic, any amount of other service traffic is mirrored as well.

However, often the counter tap is installed at the remote end of the gretap
tunnel. Since this is a SW-datapath scenario anyway, we can make the filter
arbitrarily accurate.

Thus in this patch, add parameters forward_type and backward_type to
several mirroring test helpers, as some other helpers already have. Then
change do_test_span_dir_ips() to instead of installing one generic tap and
using it for test in both directions, install the tap for each direction
separately, matching on the ICMP type given by these parameters.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

83341535