- 06 Jun, 2024 10 commits
-
-
Yoray Zack authored
The SHAMPO SKB can be flushed in mlx5e_shampo_complete_rx_cqe(). If the SKB was flushed, rq->hw_gro_data->skb was also set to NULL. We can skip flushing the SKB in mlx5e_shampo_flush_skb() if rq->hw_gro_data->skb == NULL. Signed-off-by: Yoray Zack <yorayz@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240603212219.1037656-9-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
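For illustration, a minimal sketch of the early-return described above, with the GRO SKB slot passed in explicitly (the real driver keeps it in rq->hw_gro_data; the wrapper and its arguments are assumptions):

    /* Sketch: skip the flush when the SKB was already flushed and the
     * slot cleared by mlx5e_shampo_complete_rx_cqe().
     */
    static void shampo_flush_skb_sketch(struct napi_struct *napi,
                                        struct sk_buff **gro_skb)
    {
        struct sk_buff *skb = *gro_skb;

        if (!skb)               /* nothing left to flush */
            return;

        napi_gro_receive(napi, skb);
        *gro_skb = NULL;
    }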
-
Dragos Tatulea authored
mlx5e_fill_skb_data() used to have multiple callers. But after the XDP multibuf refactoring from commit 2cb0e27d ("net/mlx5e: RX, Prepare non-linear striding RQ for XDP multi-buffer support") the SHAMPO code path is the only caller. Take advantage of this and specialize the function: - Drop the redundant check. - Assume that data_bcnt is > 0. This is needed in a downstream patch. Rename the function as well to make things clear. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Suggested-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240603212219.1037656-8-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Dragos Tatulea authored
The function that releases SHAMPO header pages (mlx5e_shampo_dealloc_hd) has some complicated logic that comes from the fact that it is called twice during teardown: 1) To release the posted header pages that didn't get any completions. 2) To release all remaining header pages. This flow is not necessary: all header pages can be released from the driver side in one go. Furthermore, the above flow is buggy. Taking the example of 8 headers per page: 1) Release fragments 5-7. The page will be released. 2) Release the remaining fragments 0-4. The bits in the header will indicate that the page needs releasing. But this is incorrect: the page was already released in step 1. This patch releases all header pages in one go, which simplifies the header page cleanup function. For consistency, the datapath header page release API (mlx5e_free_rx_shampo_hd_entry()) is used. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240603212219.1037656-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Dragos Tatulea authored
When HW GRO is enabled, forwarding of packets is broken due to gso_size being set incorrectly on non-GRO packets. Non-GRO packets have an skb GRO count of 1. mlx5 always sets gso_size on the skb, even for non-GRO packets. It leans on the fact that gso_size is normally reset in napi_gro_complete(). But this happens only for packets from GRO'able protocols (TCP/UDP) that have a gro_receive() handler. The problematic scenarios are: 1) Non-GRO protocol packets are received; validate_xmit_skb() will drop them (see EPROTONOSUPPORT in skb_mac_gso_segment()). The fix for this case would be to not set gso_size at all for SHAMPO packets with header size 0. 2) Packets from a GRO'ed protocol (TCP) are received but immediately flushed because they are not GRO'able (a TCP SYN, for example). mlx5e_shampo_update_hdr(), which updates the remaining GRO state on the skb, is not called because the skb GRO count is 1. The fix here would be to always call mlx5e_shampo_update_hdr(), regardless of the skb GRO count, but this call is expensive. The unified fix for both cases is to reset gso_size before calling napi_gro_receive(). It is a change that is more effective (no call to mlx5e_shampo_update_hdr() necessary) and simpler (smallest code footprint). Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240603212219.1037656-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
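A rough sketch of the chosen fix, assuming a helper that hands the SKB to the stack (only napi_gro_receive() and skb_shinfo() are real kernel APIs here; the wrapper itself is hypothetical):

    /* Sketch: clear GSO metadata on packets that were never aggregated
     * before passing them up, so forwarding does not choke on a bogus
     * gso_size.
     */
    static void flush_skb_sketch(struct napi_struct *napi,
                                 struct sk_buff *skb, bool aggregated)
    {
        if (!aggregated)
            skb_shinfo(skb)->gso_size = 0;

        napi_gro_receive(napi, skb);
    }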
-
Dragos Tatulea authored
For the following scenario: ethtool --features eth3 rx-gro-hw on ethtool --features eth3 rx-fcs on ethtool --features eth3 rx-fcs off ... there is a firmware error because the driver enables HW GRO first while FCS is still enabled. This patch fixes this by swapping the order of HW GRO and FCS for this specific case. Take LRO into consideration as well for consistency. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240603212219.1037656-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Dragos Tatulea authored
When all the strides in a WQE have been consumed, the WQE is unlinked from the WQ linked list (mlx5_wq_ll_pop()). For SHAMPO, it is possible to receive CQEs with 0 consumed strides for the same WQE even after the WQE is fully consumed and unlinked. This triggers an additional unlink for the same WQE, which corrupts the linked list. Fix this scenario by accepting CQEs with 0 consumed strides without unlinking the WQE again. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240603212219.1037656-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
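The shape of the fix might look roughly like this; types and bookkeeping are simplified, and only mlx5_wq_ll_pop() is an existing driver helper:

    /* Sketch: a CQE reporting zero consumed strides must not trigger a
     * second unlink of a WQE that may already be fully consumed.
     */
    static void mpwqe_cqe_sketch(struct mlx5_wq_ll *wq, __be16 wqe_id,
                                 u16 cstrides, u16 *consumed,
                                 u16 num_strides, __be16 *next_wqe_index)
    {
        if (!cstrides)          /* WQE may already be unlinked */
            return;

        *consumed += cstrides;
        if (*consumed == num_strides)
            mlx5_wq_ll_pop(wq, wqe_id, next_wqe_index);
    }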
-
Dragos Tatulea authored
Under the following conditions: 1) No skb created yet 2) header_size == 0 (no SHAMPO header) 3) (header_index + 1) % MLX5E_SHAMPO_WQ_HEADER_PER_PAGE == 0 (this is the last page fragment of a SHAMPO header page) a new skb is formed with a page that is NOT a SHAMPO header page (it is a regular data page). Further down in the same function (mlx5e_handle_rx_cqe_mpwrq_shampo()), a SHAMPO header page from header_index is released. This is wrong and leads to SHAMPO header pages being released more than once. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240603212219.1037656-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tariq Toukan authored
Let the SHAMPO functions use the net-specific prefetch API, similar to all other usages. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240603212219.1037656-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Lukasz Majewski authored
Fixed MAC addresses help with debugging, as the last four bytes identify the network namespace. Signed-off-by: Lukasz Majewski <lukma@denx.de> Link: https://lore.kernel.org/r/20240603093322.3150030-1-lukma@denx.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Lukasz Majewski authored
Fixed MAC addresses help with debugging, as the last four bytes identify the network namespace. Moreover, they make it possible to mimic a real-life setup with, for example, a bridge having the same MAC address on each port. Signed-off-by: Lukasz Majewski <lukma@denx.de> Link: https://lore.kernel.org/r/20240603093322.3150030-2-lukma@denx.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 05 Jun, 2024 22 commits
-
-
Jakub Kicinski authored
Ronak Doshi says: ==================== vmxnet3: upgrade to version 9 vmxnet3 emulation has recently added a timestamping feature which allows the hypervisor (ESXi) to calculate latency from the guest virtual NIC driver all the way up to the physical NIC. This patch series extends the vmxnet3 driver to leverage this new feature. Compatibility is maintained using the existing vmxnet3 versioning mechanism as follows: - new features added to the vmxnet3 emulation are associated with a new vmxnet3 version, viz. vmxnet3 version 9. - the emulation advertises all the versions it supports to the driver. - during initialization, the vmxnet3 driver picks the highest version number supported by both the emulation and the driver and configures the emulation to run at that version. In particular, the following changes are introduced: Patch 1: This patch introduces utility macros for vmxnet3 version 9 comparison and updates the Copyright information. Patch 2: This patch adds support to timestamp the packets so as to allow latency measurement in ESXi. Patch 3: This patch adds support to disable certain offloads on the device based on the request specified by the user in the VM configuration. Patch 4: With all vmxnet3 version 9 changes incorporated in the vmxnet3 driver, this patch lets the driver configure the emulation to run at vmxnet3 version 9. ==================== Link: https://lore.kernel.org/r/20240531193050.4132-1-ronak.doshi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ronak Doshi authored
With all vmxnet3 version 9 changes incorporated in the vmxnet3 driver, the driver can configure the emulation to run at vmxnet3 version 9, provided the emulation advertises support for version 9. Signed-off-by: Ronak Doshi <ronak.doshi@broadcom.com> Acked-by: Guolin Yang <guolin.yang@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240531193050.4132-5-ronak.doshi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ronak Doshi authored
This patch adds a new command to disable certain offloads. This allows the user to specify, using the VM configuration, whether certain offloads need to be disabled. Signed-off-by: Ronak Doshi <ronak.doshi@broadcom.com> Acked-by: Guolin Yang <guolin.yang@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240531193050.4132-4-ronak.doshi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ronak Doshi authored
This patch enhances vmxnet3 to support latency measurement. This support will help track the latency in packet processing between the guest virtual NIC driver and the host. For this purpose, we introduce a new timestamp ring in vmxnet3 which will be per Tx/Rx queue. This ring will be used to carry the timestamps of the packets, which will be used to calculate the latency. Users can enable latency measurement using the realtime knob in the vNIC settings in vCenter. Signed-off-by: Ronak Doshi <ronak.doshi@broadcom.com> Acked-by: Guolin Yang <guolin.yang@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240531193050.4132-3-ronak.doshi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ronak Doshi authored
vmxnet3 is currently at version 7, and this patch initiates the preparation to accommodate changes for up to version 9. It introduces utility macros for vmxnet3 version 9 comparison and updates the Copyright information. Signed-off-by: Ronak Doshi <ronak.doshi@broadcom.com> Acked-by: Guolin Yang <guolin.yang@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240531193050.4132-2-ronak.doshi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
David Christensen authored
Current ionic devices only support 52 internal physical address lines. This is sufficient for x86_64 systems, which have similar limitations, but does not apply to all other architectures, notably IBM POWER (ppc64). To ensure that MSI/MSI-X vectors are not set outside the physical address limits of the NIC, set the no_64bit_msi value of the pci_dev structure during device probe. Signed-off-by: David Christensen <drc@linux.ibm.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240603212747.1079134-1-drc@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
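A sketch of the probe-time quirk (the flag is a real pci_dev bitfield; the function body is otherwise a placeholder):

    /* Sketch: keep MSI/MSI-X message addresses below 4 GB because the
     * device only decodes 52 physical address bits.
     */
    static int ionic_probe_sketch(struct pci_dev *pdev,
                                  const struct pci_device_id *ent)
    {
        pdev->no_64bit_msi = 1;

        /* ... usual probe sequence continues here ... */
        return 0;
    }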
-
Vadim Fedorenko authored
atomic_dec_if_positive() returns the new value regardless of whether it was updated or not. The commit in the Fixes tag changed the behavior of the condition to one that differs from the original code. Restore the original condition to properly maintain the atomic counter. Fixes: 165f8769 ("bnxt_en: add timestamping statistics support") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240604091939.785535-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
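A generic illustration of the intended pattern (not the driver code): atomic_dec_if_positive() only decrements when the result stays non-negative, but it always returns old - 1, so the caller must test the return value rather than assume the counter changed:

    /* Returns true only when a credit was actually consumed. */
    static bool try_consume_credit(atomic_t *credits)
    {
        return atomic_dec_if_positive(credits) >= 0;
    }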
-
David S. Miller authored
Kevin Yang says: ==================== tcp: add sysctl_tcp_rto_min_us Add a sysctl knob to allow the user to specify a default rto_min at socket init time. After this patch series, rto_min will have multiple sources: the route option has the highest precedence, followed by the TCP_BPF_RTO_MIN socket option, followed by this new tcp_rto_min_us sysctl. v3: fix typo, simplify min/max_t to min/max v2: fit line width to 80 columns. v2: https://lore.kernel.org/netdev/20240530153436.2202800-1-yyd@google.com/ v1: https://lore.kernel.org/netdev/20240528171320.1332292-1-yyd@google.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kevin Yang authored
Add a sysctl knob to allow the user to specify a default rto_min at socket init time, rather than using the hard-coded 200 ms default rto_min. Note that the rto_min route option has the highest precedence for configuring this setting, followed by the TCP_BPF_RTO_MIN socket option, followed by the tcp_rto_min_us sysctl. Signed-off-by: Kevin Yang <yyd@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
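A hedged sketch of the precedence logic, in the style of the tcp_rto_min() helper mentioned in the next entry: the route option wins, otherwise the per-socket value applies, which is seeded from the sysctl (or TCP_BPF_RTO_MIN) at socket init time. The helper name and exact fields are assumptions, not quoted from the series:

    static u32 tcp_rto_min_sketch(const struct sock *sk)
    {
        const struct dst_entry *dst = __sk_dst_get(sk);
        u32 rto_min = inet_csk(sk)->icsk_rto_min;   /* sysctl- or BPF-seeded */

        if (dst && dst_metric_locked(dst, RTAX_RTO_MIN))
            rto_min = dst_metric_rtt(dst, RTAX_RTO_MIN);  /* route option wins */
        return rto_min;
    }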
-
Kevin Yang authored
rto_min now has multiple sources, ordered by precedence from high to low: the ip route option rto_min, then icsk->icsk_rto_min. When deriving delack_max from rto_min, we should not only use the ip route option, but should use the tcp_rto_min() helper to get the correct rto_min. Signed-off-by: Kevin Yang <yyd@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
These fields can be read and written locklessly; add annotations around these minor races. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
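Illustrative only; the exact fields touched by the patch are not listed here, but the annotation pattern is the standard one for intentional lockless accesses:

    /* Writer and lockless reader agree the race is benign. */
    static void set_rate_sketch(struct sock *sk, unsigned long rate)
    {
        WRITE_ONCE(sk->sk_max_pacing_rate, rate);
    }

    static unsigned long get_rate_sketch(const struct sock *sk)
    {
        return READ_ONCE(sk->sk_max_pacing_rate);
    }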
-
Anshumali Gaur authored
This patch adds support to dump the NIX transmit queue topology. There are multiple levels of scheduling/shaping supported by NIX, and a packet traverses through multiple levels before being sent out. At each level, a set of scheduling/shaping rules is applied to a packet flow. Each packet traverses through multiple levels (SQ->SMQ->TL4->TL3->TL2->TL1), and these levels are mapped in a parent-child relationship. This patch dumps the debug information related to all TM levels in the following way. Example: $ echo <nixlf> > /sys/kernel/debug/octeontx2/nix/tm_tree $ cat /sys/kernel/debug/octeontx2/nix/tm_tree A more descriptive set of registers at each level can be dumped in the following way. Example: $ echo <nixlf> > /sys/kernel/debug/octeontx2/nix/tm_topo $ cat /sys/kernel/debug/octeontx2/nix/tm_topo Signed-off-by: Anshumali Gaur <agaur@marvell.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Christophe JAILLET says: ==================== devlink: Constify struct devlink_dpipe_table_ops Patch 1 updates devl_dpipe_table_register() and struct devlink_dpipe_table to accept "const struct devlink_dpipe_table_ops". Then patch 2 updates the only user of this function. This is compile tested only. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christophe JAILLET authored
Structures of type 'struct devlink_dpipe_table_ops' are not modified in this driver. Constifying these structures moves some data to a read-only section, thereby increasing overall security. On x86_64, with allmodconfig: Before: ====== text data bss dec hex filename 15557 712 0 16269 3f8d drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.o After: ====== text data bss dec hex filename 15789 488 0 16277 3f95 drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christophe JAILLET authored
"struct devlink_dpipe_table_ops" only contains some function pointers. Update "struct devlink_dpipe_table" and the 'table_ops' parameter of devl_dpipe_table_register() so that structures in drivers can be constified. Constifying these structures will move some data to a read-only section, so increase overall security. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Golle authored
Aquantia Ethernet PHYs have 3 LED output pins which are typically used to indicate link status and activity. Add a minimal LED controller driver supporting the most common uses with the 'netdev' trigger as well as software-driven forced control of the LEDs. Signed-off-by: Daniel Golle <daniel@makrotopia.org> [ rework indentation, fix checkpatch error and improve some functions ] Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christian Marangi authored
In preparation for LED support, move the priv and HW stat definitions to the header so that the priv struct can also be referenced in .c files other than aquantia_main.c. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dr. David Alan Gilbert authored
'cable_test_tdr_req_info' is unused since the original commit f2bc8ad3 ("net: ethtool: Allow PHY cable test TDR data to configured"). Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dr. David Alan Gilbert authored
'cfpktq' has been unused since commit 73d6ac63 ("caif: code cleanup"). 'caif_packet_funcs' is declared but never defined. Remove both of them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jason Xing authored
While doing some experiments, I found that using the first parameter, namely struct net, in ip_metrics_convert() always triggers a NULL pointer crash. I then dug into this part and realized that we can remove this parameter because it is unused. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chen Hanxiao authored
Smatch complains: net/bridge/br_netlink_tunnel.c: 318 br_process_vlan_tunnel_info() warn: inconsistent indenting Fix it with proper indentation. Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Linus Walleij authored
This rewrites the Vitesse VSC73xx DSA switch DT binding as a schema. It was a bit tricky since I needed to come up with some way of applying the SPI properties only to SPI devices and not platform devices, but I figured something out that works. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 04 Jun, 2024 8 commits
-
-
Jakub Kicinski authored
This reverts commit 727c94c9. Stephen reports that this commit causes a circular module dependency for him. Revert, and we'll try to address the problem again. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/all/20240531152223.25591c8e@canb.auug.org.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Breno Leitao authored
Commit 3e2f544d ("net: get stats64 if device if driver is configured") moved the fallback to dev_get_tstats64() into the net core, so, unless the driver is doing some custom stats collection, it does not need to set .ndo_get_stats64. Since this driver now relies on NETDEV_PCPU_STAT_TSTATS, it doesn't need to set the generic dev_get_tstats64() as its .ndo_get_stats64 function pointer. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com> Link: https://lore.kernel.org/r/20240531111552.3209198-2-leitao@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
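A sketch of the pattern relied on here (the enum and field are core netdev API; the setup function is hypothetical):

    static void my_setup_sketch(struct net_device *dev)
    {
        /* Ask the core to allocate per-CPU tstats; dev_get_stats()
         * can then fall back to dev_get_tstats64() without the driver
         * providing .ndo_get_stats64.
         */
        dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS;
    }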
-
Breno Leitao authored
With commit 34d21de9 ("net: Move {l,t,d}stats allocation to core and convert veth & vrf"), stats allocation can be done in the net core instead of in this driver. With this new approach, the driver doesn't have to bother with error handling (allocation failure checking, making sure the free happens in the right spot, etc.); this is now the core's responsibility. Move the openvswitch driver to leverage the core allocation. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240531111552.3209198-1-leitao@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Paolo Abeni authored
Jakub Kicinski says: ==================== tcp: refactor skb_cmp_decrypted() checks Refactor the input path coalescing checks and wrap the "EOR forcing" logic into a helper. This will hopefully make the code easier to follow. While at it, throw some DEBUG_NET checks into skb_shift(). ==================== Link: https://lore.kernel.org/r/20240530233616.85897-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Jakub Kicinski authored
According to current semantics we should never try to shift data between skbs which differ on decrypted or pp_recycle status. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Jakub Kicinski authored
TLS (and hopefully soon PSP) uses EOR to prevent skbs with different decrypted state from getting merged, without adding new tests to the skb handling. In both cases, once the connection switches to an "encrypted" state, all subsequent skbs will be encrypted, so a single "EOR fence" is sufficient to prevent mixing. Add a helper for setting the EOR bit, to make this arrangement more explicit. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
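A hedged sketch of such a helper (the name in the tree may differ): mark the current write-queue tail with EOR so encrypted and unencrypted SKBs are never collapsed across the boundary:

    static inline void tcp_write_collapse_fence_sketch(struct sock *sk)
    {
        struct sk_buff *skb = tcp_write_queue_tail(sk);

        if (skb)
            TCP_SKB_CB(skb)->eor = 1;
    }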
-
Jakub Kicinski authored
tcp_skb_can_collapse() checks for conditions which don't make sense on input. Because of this we ended up sprinkling a few pairs of mptcp_skb_can_collapse() and skb_cmp_decrypted() calls on the input path. Group them in a new helper. This should make it less likely that someone will check mptcp and not decrypted or vice versa when adding new code. This implicitly adds a decrypted check early in tcp_collapse(). AFAIU this will very slightly increase our ability to collapse packets under memory pressure, not a real bug. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
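A hedged sketch of the grouped RX-side check (helper name assumed); both the MPTCP constraint and the decrypted-state comparison must hold before two input SKBs may be coalesced:

    static inline bool tcp_skb_can_collapse_rx_sketch(const struct sk_buff *to,
                                                      const struct sk_buff *from)
    {
        return likely(mptcp_skb_can_collapse(to, from) &&
                      !skb_cmp_decrypted(to, from));
    }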
-
Paolo Abeni authored
Davide Caratti says: ==================== net: allow dissecting/matching tunnel control flags Ilya says: "for correct matching on decapsulated packets, we should match on not only tunnel id and headers, but also on tunnel configuration flags like TUNNEL_NO_CSUM and TUNNEL_DONT_FRAGMENT. This is done to distinguish similar tunnels with slightly different configs. And it is important since tunnel configuration is flow based, i.e. can be different for every packet, even though the main tunnel port is the same." - patch 1 extends the kernel's flow dissector to extract these flags from the packet's tunnel metadata. - patch 2 extends TC flower to match on any combination of TUNNEL_NO_CSUM, TUNNEL_DONT_FRAGMENT, TUNNEL_OAM, TUNNEL_CRIT_OPT v4: - fix kernel-doc warning in flow_dissector.h (thanks Jakub) v3: - rebase on top of new uAPI bits and internals after commit 5832c4a7 ("ip_tunnel: convert __be16 tunnel flags to bitmaps"). Use of network byte order is no longer needed, since these bits match on metadata: convert netlink attributes to be u32. - also include TUNNEL_CRIT_OPT v2: - use NL_REQ_ATTR_CHECK() where possible (thanks Jamal) - don't overwrite 'ret' in the error path of fl_set_key_flags() ==================== Link: https://lore.kernel.org/r/cover.1717088241.git.dcaratti@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-