- 08 Mar, 2024 40 commits
-
Eric Dumazet authored
There is no point cloning an skb and having to free the clone if the receive queue of the raw socket is full. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240307162943.2523817-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
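A minimal kernel-context sketch of the idea, assuming a simplified delivery path; the helper below and its use of sk_rmem_alloc/sk_rcvbuf are illustrative, not the literal patch:

	/* Illustrative only: skip the clone when the raw socket's receive
	 * queue is already full, since the clone would be freed immediately.
	 */
	static void raw_deliver_sketch(struct sock *sk, struct sk_buff *skb)
	{
		struct sk_buff *clone;

		/* Cheap occupancy check before paying for skb_clone(). */
		if (atomic_read(&sk->sk_rmem_alloc) >= READ_ONCE(sk->sk_rcvbuf))
			return;	/* would be dropped on enqueue anyway */

		clone = skb_clone(skb, GFP_ATOMIC);
		if (clone)
			raw_rcv(sk, clone);
	}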
-
Ido Schimmel authored
The only error that can happen during a nexthop dump is insufficient space in the skb carrying the netlink messages (EMSGSIZE). If this happens and some messages were already filled in, the nexthop code returns the skb length to signal the netlink core that more objects need to be dumped. After commit b5a89915 ("netlink: handle EMSGSIZE errors in the core") there is no need to handle this error in the nexthop code as it is now handled in the core. Simplify the code by simply returning the error to the core. No regressions in nexthop tests:
 # ./fib_nexthops.sh
 Tests passed: 234
 Tests failed: 0
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240307154727.3555462-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
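A hedged sketch of the pattern being removed; the fill helper below is hypothetical, not the actual nexthop code:

	int nh_fill_objects_sketch(struct sk_buff *skb, struct netlink_callback *cb); /* hypothetical */

	static int nh_dump_sketch(struct sk_buff *skb, struct netlink_callback *cb)
	{
		int err = nh_fill_objects_sketch(skb, cb);

		/* Old pattern, now unnecessary because the netlink core turns
		 * -EMSGSIZE into "partial dump, call again":
		 *	if (err == -EMSGSIZE && skb->len)
		 *		err = skb->len;
		 */
		return err;
	}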
-
Eric Dumazet authored
Similar to skb_unref(), add skb_data_unref() to save an expensive atomic operation (and cache line dirtying) when the last reference on shinfo->dataref is released. I saw this opportunity on hosts with RAW sockets accidentally bound to UDP protocol, forcing an skb_clone() on all received packets. These RAW sockets had their receive queue full, so all cloned packets were immediately dropped. When UDP recvmsg() later consumes the original skb, skb_release_data() takes the atomic_sub_return() hit, because skb->cloned has been set permanently. Note that this patch also helps TCP TX performance, because the TCP stack also uses (fast) clones. This means that at least one of the two packets (the main skb or its clone) will no longer have to perform this atomic operation in skb_release_data(). Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240307123446.2302230-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
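A sketch of the helper's idea, following the existing skb_shared_info layout in include/linux/skbuff.h; treat the body as an illustration rather than the exact upstream code:

	static inline bool skb_data_unref(const struct sk_buff *skb,
					  struct skb_shared_info *shinfo)
	{
		int bias;

		if (!skb->cloned)
			return true;

		bias = skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1;

		if (atomic_read(&shinfo->dataref) == bias)
			smp_rmb();	/* last reference: no atomic RMW, no cache line dirtying */
		else if (atomic_sub_return(bias, &shinfo->dataref) != bias - 1)
			return false;	/* other references remain, keep the data */

		return true;		/* caller may free head/frags */
	}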
-
Jakub Kicinski authored
Merge tag 'wireless-next-2024-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:
====================
wireless-next patches for v6.9

The fourth "new features" pull request for v6.9 with changes both in stack and in drivers. The theme in this pull request is to fix sparse warnings but we still have some left in wireless subsystem. Otherwise quite normal.

Major changes:

rtw89
 * NL80211_EXT_FEATURE_SCAN_RANDOM_SN support
 * NL80211_EXT_FEATURE_SET_SCAN_DWELL support

rtw88
 * support for more rtw8811cu and rtw8821cu devices

mt76
 * mt76x2u: add Netgear WNDA3100v3 USB
 * mt7915: newer ADIE version support
 * mt7925: radio temperature sensor support
 * mt7996: remove GCMP IGTK offload

* tag 'wireless-next-2024-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (125 commits)
  wifi: rtw89: wow: move release offload packet earlier for WoWLAN mode
  wifi: rtw89: wow: set security engine options for 802.11ax chips only
  wifi: rtw89: update suspend/resume for different generation
  wifi: rtw89: wow: update config mac function with different generation
  wifi: rtw89: update DMA function with different generation
  wifi: rtw89: wow: update WoWLAN status register for different generation
  wifi: rtw89: wow: update WoWLAN reason register for different chips
  wifi: brcm80211: handle pmk_op allocation failure
  wifi: rtw89: coex: Add coexistence policy to decrease WiFi packet CRC-ERR
  wifi: rtw89: coex: When Bluetooth not available don't set power/gain
  wifi: rtw89: coex: add return value to ensure H2C command is success or not
  wifi: rtw89: coex: Reorder H2C command index to align with firmware
  wifi: rtw89: coex: add BTC ctrl_info version 7 and related logic
  wifi: rtw89: coex: add init_info H2C command format version 7
  wifi: rtw89: 8922a: add coexistence helpers of SW grant
  wifi: rtw89: mac: add coexistence helpers {cfg/get}_plt
  wifi: cw1200: restore endian swapping
  wifi: wlcore: sdio: Rate limit wl12xx_sdio_raw_{read,write}() failures warns
  wifi: rtlwifi: Remove rtl_intf_ops.read_efuse_byte
  wifi: rtw88: 8821c: Fix false alarm count
  ...
====================

Link: https://lore.kernel.org/r/20240308100429.B8EA2C433F1@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
David S. Miller authored
Jijie Shao says:
====================
There are some bugfixes for the HNS3 ethernet driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jian Shen authored
Add a check for the VF id carried in mailbox messages, in order to avoid the risk of an array out-of-bounds access. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
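A hedged sketch of such a bounds check; the struct and field names below are placeholders, not the actual HNS3 code:

	/* Illustrative only: reject a mailbox request whose VF index falls
	 * outside the number of allocated VF ports before it is used to
	 * index any per-VF array.
	 */
	struct pf_dev_sketch {
		unsigned int num_alloc_vport;	/* PF itself + allocated VFs */
	};

	static bool mbx_vf_id_valid(const struct pf_dev_sketch *pdev_s,
				    unsigned int vf_id)
	{
		return vf_id < pdev_s->num_alloc_vport;
	}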
-
Jie Wang authored
Currently, the MAC port is always configured to full duplex mode in hclge_mac_init() during driver initialization or reset restore. Users may change the mode to half duplex with ethtool, so the user configuration may be dropped after a reset. To fix it, don't change the duplex mode when resetting. Fixes: 2d03eacc ("net: hns3: Only update mac configuation when necessary") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Peiyang Wang authored
The cmdq reset command times out when all VFs are enabled and the queue is full: the hardware processing time exceeds the timeout set by the driver. To avoid such extreme situations, extend the driver's reset timeout to 1 second. Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jijie Shao authored
When a tc is removed during reset, the hns3 driver returns an error code, but the kernel ignores it. As a result, the driver status becomes inconsistent with the kernel status. This patch retains the deletion state when the deletion fails and continues the deletion after the reset, to ensure that the status of the driver is consistent with that of the kernel. Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yonglong Liu authored
HIP08 devices do not register a PTP device, so hdev->ptp is NULL. However, the hardware can still receive 1588 messages and set the HNS3_RXD_TS_VLD_B bit, in which case the access of hdev->ptp->flags causes a kernel crash:

[ 5888.946472] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018
[ 5888.946475] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018
...
[ 5889.266118] pc : hclge_ptp_get_rx_hwts+0x40/0x170 [hclge]
[ 5889.272612] lr : hclge_ptp_get_rx_hwts+0x34/0x170 [hclge]
[ 5889.279101] sp : ffff800012c3bc50
[ 5889.283516] x29: ffff800012c3bc50 x28: ffff2040002be040
[ 5889.289927] x27: ffff800009116484 x26: 0000000080007500
[ 5889.296333] x25: 0000000000000000 x24: ffff204001c6f000
[ 5889.302738] x23: ffff204144f53c00 x22: 0000000000000000
[ 5889.309134] x21: 0000000000000000 x20: ffff204004220080
[ 5889.315520] x19: ffff204144f53c00 x18: 0000000000000000
[ 5889.321897] x17: 0000000000000000 x16: 0000000000000000
[ 5889.328263] x15: 0000004000140ec8 x14: 0000000000000000
[ 5889.334617] x13: 0000000000000000 x12: 00000000010011df
[ 5889.340965] x11: bbfeff4d22000000 x10: 0000000000000000
[ 5889.347303] x9 : ffff800009402124 x8 : 0200f78811dfbb4d
[ 5889.353637] x7 : 2200000000191b01 x6 : ffff208002a7d480
[ 5889.359959] x5 : 0000000000000000 x4 : 0000000000000000
[ 5889.366271] x3 : 0000000000000000 x2 : 0000000000000000
[ 5889.372567] x1 : 0000000000000000 x0 : ffff20400095c080
[ 5889.378857] Call trace:
[ 5889.382285] hclge_ptp_get_rx_hwts+0x40/0x170 [hclge]
[ 5889.388304] hns3_handle_bdinfo+0x324/0x410 [hns3]
[ 5889.394055] hns3_handle_rx_bd+0x60/0x150 [hns3]
[ 5889.399624] hns3_clean_rx_ring+0x84/0x170 [hns3]
[ 5889.405270] hns3_nic_common_poll+0xa8/0x220 [hns3]
[ 5889.411084] napi_poll+0xcc/0x264
[ 5889.415329] net_rx_action+0xd4/0x21c
[ 5889.419911] __do_softirq+0x130/0x358
[ 5889.424484] irq_exit+0x134/0x154
[ 5889.428700] __handle_domain_irq+0x88/0xf0
[ 5889.433684] gic_handle_irq+0x78/0x2c0
[ 5889.438319] el1_irq+0xb8/0x140
[ 5889.442354] arch_cpu_idle+0x18/0x40
[ 5889.446816] default_idle_call+0x5c/0x1c0
[ 5889.451714] cpuidle_idle_call+0x174/0x1b0
[ 5889.456692] do_idle+0xc8/0x160
[ 5889.460717] cpu_startup_entry+0x30/0xfc
[ 5889.465523] secondary_start_kernel+0x158/0x1ec
[ 5889.470936] Code: 97ffab78 f9411c14 91408294 f9457284 (f9400c80)
[ 5889.477950] SMP: stopping secondary CPUs
[ 5890.514626] SMP: failed to stop secondary CPUs 0-69,71-95
[ 5890.522951] Starting crashdump kernel...

Fixes: 0bf5eb78 ("net: hns3: add support for PTP") Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
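A hedged sketch of the kind of guard the fix implies; the struct and flag names below are minimal stand-ins, not the real hclge definitions:

	/* Illustrative only: minimal stand-ins for the driver structures. */
	struct ptp_state_sketch {
		unsigned long flags;
	};

	struct hclge_dev_sketch {
		struct ptp_state_sketch *ptp;	/* NULL on HIP08 */
	};

	#define RX_HWTS_EN_SKETCH	0	/* hypothetical flag bit */

	/* Guard the RX hardware-timestamp path: never dereference hdev->ptp
	 * when PTP was not registered, even if HNS3_RXD_TS_VLD_B is set in
	 * the descriptor.
	 */
	static bool rx_hwts_usable_sketch(const struct hclge_dev_sketch *hdev)
	{
		return hdev->ptp && test_bit(RX_HWTS_EN_SKETCH, &hdev->ptp->flags);
	}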
-
Hao Lan authored
When the hilink version is H60, the serdes serial loopback test is not supported. This patch adds hilink version detection; when the version is H60, the serdes serial loopback test is disabled. Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hao Lan authored
The hisilicon device now supports a new 200G link interface, which is queried from the firmware via a new bit. Therefore, the HCLGE_SUPPORT_200G_R4_BIT capability bit has been added. HCLGE_SUPPORT_200G_BIT has been renamed to HCLGE_SUPPORT_200G_R4_EXT_BIT, and the firmware has extended support for this mode. Fixes: ae6f010c ("net: hns3: add support for 200G device") Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jijie Shao authored
In hns3_dcbnl_ieee_delapp(), the driver should check ieee_delapp, not ieee_setapp. This patch fixes the wrong check. Fixes: 0ba22bcb ("net: hns3: add support config dscp map to tc") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Shannon Nelson says:
====================
ionic: putting ionic on a diet

Building on the performance work done in the previous patchset
[Link] https://lore.kernel.org/netdev/20240229193935.14197-1-shannon.nelson@amd.com/
this patchset puts the ionic driver on a diet, decreasing the memory requirements per queue, and simplifies a few more bits of logic.

We trimmed the queue management structs and gained some ground, but the most savings came from trimming the individual buffer descriptors. The original design used a single generic buffer descriptor for Tx, Rx and Adminq needs, but the Rx and Adminq descriptors really don't need all the info that the Tx descriptors track. By splitting up the descriptor types we can significantly reduce the descriptor sizes for Rx and Adminq use.

There is a small reduction in the queue management structs, saving about 3 cachelines per queuepair:

ionic_qcq:
 Before: /* size: 2176, cachelines: 34, members: 23 */
 After:  /* size: 2048, cachelines: 32, members: 23 */

We also remove an array of completion descriptor pointers, or about 8 Kbytes per queue. But the biggest savings came from splitting the desc_info struct into queue specific structs and trimming out what was unnecessary.

Before:
 ionic_desc_info:       /* size: 496, cachelines: 8, members: 10 */
After:
 ionic_tx_desc_info:    /* size: 496, cachelines: 8, members: 6 */
 ionic_rx_desc_info:    /* size: 224, cachelines: 4, members: 2 */
 ionic_admin_desc_info: /* size: 8, cachelines: 1, members: 1 */

In a 64 core host the ionic driver will default to 64 queuepairs of 1024 descriptors for Rx, 1024 for Tx, and 80 for Adminq and Notifyq.

The total memory usage for 64 queues:

Before:
   65 * sizeof(ionic_qcq)                     141,440
 + 64 * 1024 * sizeof(ionic_desc_info)     32,505,856
 + 64 * 1024 * sizeof(ionic_desc_info)     32,505,856
 + 64 * 1024 * 2 * sizeof(ionic_qc_info)       16,384
 + 1 * 80 * sizeof(ionic_desc_info)            39,690
                                           ----------
                                           65,201,038

After:
   65 * sizeof(ionic_qcq)                     133,120
 + 64 * 1024 * sizeof(ionic_tx_desc_info)  32,505,856
 + 64 * 1024 * sizeof(ionic_rx_desc_info)  14,680,064
 + (removed)                                        0
 + 1 * 80 * sizeof(ionic_admin_desc_info)         640
                                           ----------
                                           47,319,680

This saves us approximately 18 Mbytes per port in a 64 core machine, a 28% savings in our memory needs.

In addition, this improves our simple single thread / single queue iperf case on a 9100 MTU connection from 86.7 to 95 Gbits/sec.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shannon Nelson authored
When possible, keep the stats struct references strictly in the error handling blocks and out of the fastpath. Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shannon Nelson authored
Fix up a couple of small dma_addr handling issues:
 - don't double-count the dma-map-err stat in ionic_tx_map_skb() or ionic_xdp_post_frame()
 - return 0 on error from both ionic_tx_map_single() and ionic_tx_map_frag(), and check for !dma_addr in ionic_tx_map_skb() and ionic_xdp_post_frame()
 - be sure to unmap buf_info[0] in the ionic_tx_map_skb() error path
 - don't assign rx buf->dma_addr until after the error check in ionic_rx_page_alloc()
 - remove unnecessary dma_addr_t casts
Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
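A hedged sketch of the mapping-error convention described in the second bullet; the wrapper and stats field below are placeholders, while dma_map_single()/dma_mapping_error() are the standard kernel DMA API:

	struct tx_stats_sketch {
		u64 dma_map_err;
	};

	/* Return 0 on mapping failure so callers can simply test !dma_addr,
	 * and count the error exactly once at the point of failure.
	 */
	static dma_addr_t tx_map_single_sketch(struct device *dev, void *buf,
					       size_t len,
					       struct tx_stats_sketch *stats)
	{
		dma_addr_t dma_addr = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

		if (dma_mapping_error(dev, dma_addr)) {
			stats->dma_map_err++;	/* counted here, and only here */
			return 0;
		}
		return dma_addr;
	}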
-
Shannon Nelson authored
We call ionic_rx_page_alloc() only on existing buf_info structs from ionic_rx_fill(). There's no need for the additional NULL test. Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shannon Nelson authored
A simple change to the struct ionic_queue layout removes some unnecessary padding and saves us a cacheline in the struct ionic_qcq layout.

struct ionic_queue {
 Before: /* size: 256, cachelines: 4, members: 29 */
 After:  /* size: 192, cachelines: 3, members: 29 */

struct ionic_qcq {
 Before: /* size: 2112, cachelines: 33, members: 23 */
 After:  /* size: 2048, cachelines: 32, members: 23 */

Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shannon Nelson authored
Rearrange a few fields for better cache use and to put the flags field up into the first cacheline rather than the last.

struct ionic_qcq
 Before: /* size: 2176, cachelines: 34, members: 23 */
 After:  /* size: 2112, cachelines: 33, members: 23 */

Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shannon Nelson authored
Remove the idev field from ionic_queue, which saves us a bit of space, and add it into ionic_cq where there's room within some cacheline padding. Use this pointer rather than doing a multi level reference from lif->ionic. Suggested-by: Neel Patel <npatel2@amd.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shannon Nelson authored
The existing ionic_rx_frags() code is a bit of a mess and can be cleaned up by unrolling the first frag/header setup from the loop, then reworking the do-while-loop into a for-loop. We rename the function to a more descriptive ionic_rx_build_skb(). We also change a couple of related variable names for readability. Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shannon Nelson authored
Since the AdminQ clean is a simple action called from only one place, fold it back into the service routine. Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shannon Nelson authored
Make the desc_info structure specific to the queue type, which allows us to cut down the Rx and AdminQ descriptor sizes by not including all the fields needed for the Tx descriptors.

Before:
 struct ionic_desc_info { /* size: 464, cachelines: 8, members: 6 */
After:
 struct ionic_tx_desc_info { /* size: 464, cachelines: 8, members: 6 */
 struct ionic_rx_desc_info { /* size: 224, cachelines: 4, members: 2 */
 struct ionic_admin_desc_info { /* size: 8, cachelines: 1, members: 1 */

Suggested-by: Neel Patel <npatel2@amd.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shannon Nelson authored
With a little simple math we don't need another struct array to find the completion structs, so we can remove the ionic_cq_info altogether. This doesn't really save anything in the ionic_cq since it gets padded out to the cacheline, but it does remove the parallel array allocation of 8 * num_descriptors, or about 8 Kbytes per queue in a default configuration. Suggested-by: Neel Patel <npatel2@amd.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
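A minimal sketch of the index math that replaces the parallel info array; the names are illustrative, not the driver's:

	/* Illustrative only: with fixed-size completion descriptors laid out
	 * contiguously, the entry for a given index is plain pointer
	 * arithmetic, so no parallel array of per-entry info pointers is
	 * needed.
	 */
	static inline void *cq_entry_sketch(void *cq_base, size_t desc_size,
					    unsigned int index)
	{
		return (char *)cq_base + (size_t)index * desc_size;
	}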
-
Shannon Nelson authored
By reworking the queue service routines to have their own servicing loops we can remove the cb pointer from desc_info to save another 8 bytes per descriptor. This simplifies some of the queue handling indirection, makes the code a little easier to follow, and keeps the service code in one place rather than jumping between code files.

struct ionic_desc_info
 Before: /* size: 472, cachelines: 8, members: 7 */
 After:  /* size: 464, cachelines: 8, members: 6 */

Suggested-by: Neel Patel <npatel2@amd.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shannon Nelson authored
Move the AdminQ and NotifyQ queue handling to ionic_main.c with the rest of the adminq code. Suggested-by: Neel Patel <npatel2@amd.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shannon Nelson authored
Now that we're not using desc_info pointers mapped in every q we can simplify and drop the unnecessary utility functions. Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shannon Nelson authored
Remove the struct pointers from desc_info to use less space. Instead of pointers in every desc_info to its descriptor, we can use the queue descriptor index to find the individual desc, desc_info, and sgl structs in their parallel arrays.

struct ionic_desc_info
 Before: /* size: 496, cachelines: 8, members: 10 */
 After:  /* size: 472, cachelines: 8, members: 7 */

Suggested-by: Neel Patel <npatel2@amd.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
David S. Miller authored
Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-03-06 (iavf, i40e, ixgbe) This series contains updates to iavf, i40e, and ixgbe drivers. Alexey Kodanev removes duplicate calls related to cloud filters on iavf and unnecessary null checks on i40e. Maciej adds helper functions for common code relating to updating statistics for ixgbe. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Jeff was retired as the Intel driver maintainer in commit 6667df91 ("MAINTAINERS: Update MAINTAINERS for Intel ethernet drivers"), and his address bounces. But he has signed-off a lot of patches over the years so get_maintainer insists on CCing him. We haven't heard from him since he left Intel, so remapping the address via mailmap is also pointless. Add to ignored addresses. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric Dumazet says: ==================== ipv6: lockless inet6_dump_addr() This series removes RTNL locking to dump ipv6 addresses. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
We can now remove RTNL acquisition while running inet6_dump_addr(), inet6_dump_ifmcaddr() and inet6_dump_ifacaddr(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
inet6_dump_addr() can use the new xa_array iterator for better scalability. Make it ready for RCU-only protection. RTNL use is removed in the following patch. Also properly return 0 at the end of a dump to avoid an extra recvmsg() to get NLMSG_DONE. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
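A hedged sketch of the dump-return convention that change relies on; the fill helper below is hypothetical:

	bool fill_addresses_sketch(struct sk_buff *skb, struct netlink_callback *cb); /* hypothetical: true while objects remain */

	static int addr_dump_sketch(struct sk_buff *skb, struct netlink_callback *cb)
	{
		bool more = fill_addresses_sketch(skb, cb);

		/* A positive return (bytes used) means "call me again, more
		 * objects remain"; returning 0 once the walk has finished
		 * lets the netlink core append NLMSG_DONE to this same
		 * buffer, saving one recvmsg() round trip.
		 */
		return more ? skb->len : 0;
	}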
-
Eric Dumazet authored
in6_dump_addrs() is called with RCU protection. There is no need to hold idev->lock to iterate through unicast addresses. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Make inet6_fill_ifaddr() lockless, and add appropriate annotations on ifa->tstamp, ifa->valid_lft, ifa->preferred_lft, ifa->ifa_proto and ifa->rt_priority. Also constify the 2nd argument of inet6_fill_ifaddr(), inet6_fill_ifmcaddr() and inet6_fill_ifacaddr(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
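An illustrative fragment of the kind of annotation this implies; the helper name is made up, while IFA_RT_PRIORITY and nla_put_u32() are the existing netlink interfaces:

	/* Hypothetical fill helper: the READ_ONCE() pairs with a WRITE_ONCE()
	 * on the update side, so the RCU dump path can read ifa->rt_priority
	 * without taking idev->lock or RTNL.
	 */
	static int fill_rt_priority_sketch(struct sk_buff *skb,
					   const struct inet6_ifaddr *ifa)
	{
		u32 prio = READ_ONCE(ifa->rt_priority);

		if (prio && nla_put_u32(skb, IFA_RT_PRIORITY, prio))
			return -EMSGSIZE;

		return 0;
	}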
-
David S. Miller authored
Merge tag 'ipsec-next-2024-03-06' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== 1) Introduce forwarding of ICMP Error messages. That is specified in RFC 4301 but was never implemented. From Antony Antony. 2) Use KMEM_CACHE instead of kmem_cache_create in xfrm6_tunnel_init() and xfrm_policy_init(). From Kunwu Chan. 3) Do not allocate stats in the xfrm interface driver; this can now be done by the net core. From Breno Leitao. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Petr Machata says:
====================
Support for nexthop group statistics

ECMP is a fundamental component in L3 designs. However, it's fragile. Many factors influence whether an ECMP group will operate as intended: hash policy (i.e. the set of fields that contribute to ECMP hash calculation), neighbor validity, hash seed (which might lead to polarization) or the type of ECMP group used (hash-threshold or resilient). At the same time, collection of statistics that would help an operator determine that the group performs as desired is difficult.

A solution that we present in this patchset is to add counters to next hop group entries. For SW-datapath deployments, this will on its own allow collection and evaluation of relevant statistics. For HW-datapath deployments, we further add a way to request that HW counters be installed for a given group, in-kernel interfaces to collect the HW statistics, and netlink interfaces to query them.

For example:

 # ip nexthop replace id 4000 group 4001/4002 hw_stats on
 # ip -s -d nexthop show id 4000
 id 4000 group 4001/4002 scope global proto unspec offload hw_stats on used on
 stats: id 4001 packets 5002 packets_hw 5000
        id 4002 packets 4999 packets_hw 4999

The point of the patchset is visibility of ECMP balance, and that is influenced by packet headers, not their payload. Correspondingly, we only include packet counters in the statistics, not byte counters.

We also decided to model HW statistics as a nexthop group attribute, not an arbitrary nexthop one. The latter would count any traffic going through a given nexthop, regardless of which ECMP group it is in, or any at all. The reason is again that the point of the patchset is ECMP balance visibility, not arbitrary inspection of how busy a particular nexthop is.

Implementation of individual-nexthop statistics is certainly possible, and could well follow the general approach we are taking in this patchset. For resilient groups, per-bucket statistics could be done in a similar manner as well.

This patchset contains the core code. mlxsw support will be sent in a follow-up patch set.

This patchset progresses as follows:

- Patches #1 and #2 add support for a new next-hop object attribute, NHA_OP_FLAGS. That is meant to carry various op-specific signaling, in particular whether SW- and HW-collected nexthop stats should be part of the get or dump response. The idea is to avoid wasting message space, and time for collection of HW statistics, when the values are not needed.

- Patches #3 and #4 add SW-datapath stats and corresponding UAPI.

- Patches #5, #6 and #7 add support for HW-datapath stats and UAPI. Individual drivers still need to contribute the appropriate HW-specific support code.

v4:
- Patch #2:
  - s/nla_get_bitfield32/nla_get_u32/ in __nh_valid_dump_req().

v3:
- Patch #3:
  - Convert to u64_stats_t
- Patch #4:
  - Give a symbolic name to the set of all valid dump flags for the NHA_OP_FLAGS attribute.
  - Convert to u64_stats_t
- Patch #6:
  - Use a named constant for the NHA_HW_STATS_ENABLE policy.

v2:
- Patch #2:
  - Change OP_FLAGS to u32, enforce through NLA_POLICY_MASK
- Patch #3:
  - Set err on nexthop_create_group() error path
- Patch #4:
  - Use uint to encode NHA_GROUP_STATS_ENTRY_PACKETS
  - Rename jump target in nla_put_nh_group_stats() to avoid having to rename further in the patchset.
- Patch #7:
  - Use uint to encode NHA_GROUP_STATS_ENTRY_PACKETS_HW
  - Do not cancel outside of nesting in nla_put_nh_group_stats()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Add netlink support for reading NH group hardware stats. Stats collection is done through a new notifier, NEXTHOP_EVENT_HW_STATS_REPORT_DELTA. Drivers that implement HW counters for a given NH group are thereby asked to collect the stats and report back to the core by calling nh_grp_hw_stats_report_delta(). This is similar to what netdevice L3 stats do. Besides exposing the number of packets that passed in the HW datapath, also include information on whether any driver actually realizes the counters. The core can tell based on whether it got any _report_delta() reports from the drivers. This allows enabling the statistics at the group at any time, with drivers opting into supporting them. This is also in line with what netdevice L3 stats are doing. So as not to waste time and space, tie the collection and reporting of HW stats with a new op flag, NHA_OP_FLAG_DUMP_HW_STATS. Co-developed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Kees Cook <keescook@chromium.org> # For the __counted_by bits Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
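A heavily hedged driver-side sketch of the reporting flow described above. NEXTHOP_EVENT_HW_STATS_REPORT_DELTA and nh_grp_hw_stats_report_delta() come from the commit itself; the notifier-info field names and the exact function signature used here are assumptions, so check include/net/nexthop.h for the real interface:

	u64 drv_read_hw_packets_sketch(unsigned int nh_index); /* hypothetical HW counter read */

	/* Sketch only: on the new notifier event, read the HW counters for
	 * each nexthop in the group and report the packet deltas to the core.
	 */
	static int drv_nexthop_event_sketch(struct notifier_block *nb,
					    unsigned long event, void *ptr)
	{
		struct nh_notifier_info *info = ptr;
		unsigned int i;

		if (event != NEXTHOP_EVENT_HW_STATS_REPORT_DELTA)
			return NOTIFY_DONE;

		/* Field names below are assumptions based on the description. */
		for (i = 0; i < info->nh_grp_hw_stats->num_nh; i++) {
			u64 packets = drv_read_hw_packets_sketch(i);

			nh_grp_hw_stats_report_delta(info->nh_grp_hw_stats, i, packets);
		}

		return NOTIFY_OK;
	}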
-
Ido Schimmel authored
Add netlink support for enabling collection of HW statistics on nexthop groups. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Add hw_stats field to several notifier structures to communicate to the drivers that HW statistics should be configured for nexthops within a given group. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-