Commits · b22800f9d3b142bf2550dd47ff738b9feedc1093 · Kirill Smelkov / linux

24 Jun, 2024 11 commits

dev: Use nested-BH locking for softnet_data.process_queue. · b22800f9

Sebastian Andrzej Siewior authored Jun 20, 2024

softnet_data::process_queue is a per-CPU variable and relies on disabled
BH for its locking. Without per-CPU locking in local_bh_disable() on
PREEMPT_RT this data structure requires explicit locking.

softnet_data::input_queue_head can be updated lockless. This is fine
because this value is only update CPU local by the local backlog_napi
thread.

Add a local_lock_t to softnet_data and use local_lock_nested_bh() for locking
of process_queue. This change adds only lockdep coverage and does not
alter the functional behaviour for !PREEMPT_RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240620132727.660738-11-bigeasy@linutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b22800f9

dev: Remove PREEMPT_RT ifdefs from backlog_lock.*(). · a8760d0d

Sebastian Andrzej Siewior authored Jun 20, 2024

The backlog_napi locking (previously RPS) relies on explicit locking if
either RPS or backlog NAPI is enabled. If both are disabled then locking
was achieved by disabling interrupts except on PREEMPT_RT. PREEMPT_RT
was excluded because the needed synchronisation was already provided
local_bh_disable().

Since the introduction of backlog NAPI and making it mandatory for
PREEMPT_RT the ifdef within backlog_lock.*() is obsolete and can be
removed.

Remove the ifdefs in backlog_lock.*().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240620132727.660738-10-bigeasy@linutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

a8760d0d

net: softnet_data: Make xmit per task. · ecefbc09

Sebastian Andrzej Siewior authored Jun 20, 2024

Softirq is preemptible on PREEMPT_RT. Without a per-CPU lock in
local_bh_disable() there is no guarantee that only one device is
transmitting at a time.
With preemption and multiple senders it is possible that the per-CPU
`recursion' counter gets incremented by different threads and exceeds
XMIT_RECURSION_LIMIT leading to a false positive recursion alert.
The `more' member is subject to similar problems if set by one thread
for one driver and wrongly used by another driver within another thread.

Instead of adding a lock to protect the per-CPU variable it is simpler
to make xmit per-task. Sending and receiving skbs happens always
in thread context anyway.

Having a lock to protected the per-CPU counter would block/ serialize two
sending threads needlessly. It would also require a recursive lock to
ensure that the owner can increment the counter further.

Make the softnet_data.xmit a task_struct member on PREEMPT_RT. Add
needed wrapper.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240620132727.660738-9-bigeasy@linutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ecefbc09

netfilter: br_netfilter: Use nested-BH locking for brnf_frag_data_storage. · c67ef53a

Sebastian Andrzej Siewior authored Jun 20, 2024

brnf_frag_data_storage is a per-CPU variable and relies on disabled BH
for its locking. Without per-CPU locking in local_bh_disable() on
PREEMPT_RT this data structure requires explicit locking.

Add a local_lock_t to the data structure and use local_lock_nested_bh()
for locking. This change adds only lockdep coverage and does not alter
the functional behaviour for !PREEMPT_RT.

Cc: Florian Westphal <fw@strlen.de>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: bridge@lists.linux.dev
Cc: coreteam@netfilter.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240620132727.660738-8-bigeasy@linutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

c67ef53a

net/ipv4: Use nested-BH locking for ipv4_tcp_sk. · ebad6d03

Sebastian Andrzej Siewior authored Jun 20, 2024

ipv4_tcp_sk is a per-CPU variable and relies on disabled BH for its
locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
this data structure requires explicit locking.

Make a struct with a sock member (original ipv4_tcp_sk) and a
local_lock_t and use local_lock_nested_bh() for locking. This change
adds only lockdep coverage and does not alter the functional behaviour
for !PREEMPT_RT.

Cc: David Ahern <dsahern@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240620132727.660738-7-bigeasy@linutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ebad6d03

net/tcp_sigpool: Use nested-BH locking for sigpool_scratch. · 585aa621

Sebastian Andrzej Siewior authored Jun 20, 2024

sigpool_scratch is a per-CPU variable and relies on disabled BH for its
locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
this data structure requires explicit locking.

Make a struct with a pad member (original sigpool_scratch) and a
local_lock_t and use local_lock_nested_bh() for locking. This change
adds only lockdep coverage and does not alter the functional behaviour
for !PREEMPT_RT.

Cc: David Ahern <dsahern@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240620132727.660738-6-bigeasy@linutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

585aa621

net: Use nested-BH locking for napi_alloc_cache. · bdacf3e3

Sebastian Andrzej Siewior authored Jun 20, 2024

napi_alloc_cache is a per-CPU variable and relies on disabled BH for its
locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
this data structure requires explicit locking.

Add a local_lock_t to the data structure and use local_lock_nested_bh()
for locking. This change adds only lockdep coverage and does not alter
the functional behaviour for !PREEMPT_RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240620132727.660738-5-bigeasy@linutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

bdacf3e3

net: Use __napi_alloc_frag_align() instead of open coding it. · 43d7ca29

Sebastian Andrzej Siewior authored Jun 20, 2024

The else condition within __netdev_alloc_frag_align() is an open coded
__napi_alloc_frag_align().

Use __napi_alloc_frag_align() instead of open coding it.
Move fragsz assignment before page_frag_alloc_align() invocation because
__napi_alloc_frag_align() also contains this statement.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240620132727.660738-4-bigeasy@linutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

43d7ca29

locking/local_lock: Add local nested BH locking infrastructure. · c5bcab75

Sebastian Andrzej Siewior authored Jun 20, 2024

Add local_lock_nested_bh() locking. It is based on local_lock_t and the
naming follows the preempt_disable_nested() example.

For !PREEMPT_RT + !LOCKDEP it is a per-CPU annotation for locking
assumptions based on local_bh_disable(). The macro is optimized away
during compilation.
For !PREEMPT_RT + LOCKDEP the local_lock_nested_bh() is reduced to
the usual lock-acquire plus lockdep_assert_in_softirq() - ensuring that
BH is disabled.

For PREEMPT_RT local_lock_nested_bh() acquires the specified per-CPU
lock. It does not disable CPU migration because it relies on
local_bh_disable() disabling CPU migration.
With LOCKDEP it performans the usual lockdep checks as with !PREEMPT_RT.
Due to include hell the softirq check has been moved spinlock.c.

The intention is to use this locking in places where locking of a per-CPU
variable relies on BH being disabled. Instead of treating disabled
bottom halves as a big per-CPU lock, PREEMPT_RT can use this to reduce
the locking scope to what actually needs protecting.
A side effect is that it also documents the protection scope of the
per-CPU variables.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240620132727.660738-3-bigeasy@linutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

c5bcab75

locking/local_lock: Introduce guard definition for local_lock. · 07e4fd4c

Sebastian Andrzej Siewior authored Jun 20, 2024

Introduce lock guard definition for local_lock_t. There are no users
yet.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240620132727.660738-2-bigeasy@linutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

07e4fd4c

MAINTAINERS: adjust file entry in FREESCALE QORIQ DPAA FMAN DRIVER · 568ebdab

Lukas Bulwahn authored Jun 24, 2024

Commit 243996d1 ("dt-bindings: net: Convert fsl-fman to yaml") splits
the previous dt text file into four yaml files. It adjusts a corresponding
file entry in MAINTAINERS from txt to yaml, but this adjustment misses
that the file was split and renamed.

Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a
broken reference.

Adjust the file entry to match the four yaml files resulting from this
commit above.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

568ebdab

23 Jun, 2024 1 commit

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · 84562f99

David S. Miller authored Jun 23, 2024

Tony Nguyen says:

====================
ice: prepare representor for SF support

Michal Swiatkowski says:

This is a series to prepare port representor for supporting also
subfunctions. We need correct devlink locking and the possibility to
update parent VSI after port representor is created.

Refactor how devlink lock is taken to suite the subfunction use case.

VSI configuration needs to be done after port representor is created.
Port representor needs only allocated VSI. It doesn't need to be
configured before.

VSI needs to be reconfigured when update function is called.

The code for this patchset was split from (too big) patchset [1].

[1] https://lore.kernel.org/netdev/20240213072724.77275-1-michal.swiatkowski@linux.intel.com/
---
Originally from https://lore.kernel.org/netdev/20240605-next-2024-06-03-intel-next-batch-v2-0-39c23963fa78@intel.com/
Changes:
- delete ice_repr_get_by_vsi() from header
- rephrase commit message in moving devlink locking
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

84562f99

22 Jun, 2024 2 commits

net: xilinx: axienet: Enable multicast by default · 185d7211

Sean Anderson authored Jun 20, 2024

We support multicast addresses, so enable it by default.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>

185d7211

Merge tag 'linux-can-next-for-6.11-20240621' of... · e9212f9d

Jakub Kicinski authored Jun 21, 2024

Merge tag 'linux-can-next-for-6.11-20240621' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2024-06-21

The first 2 patches are by Andy Shevchenko, one cleans up the includes
in the mcp251x driver, the other one updates the sja100 plx_pci driver
to make use of predefines PCI subvendor ID.

Mans Rullgard's patch cleans up the Kconfig help text of for the slcan
driver.

Oliver Hartkopp provides a patch to update the documentation, which
removes the ISO 15675-2 specification version where possible.

The next 2 patches are by Harini T and update the documentation of the
xilinx_can driver.

Francesco Valla provides documentation for the ISO 15765-2 protocol.

A patch by Dr. David Alan Gilbert removes an unused struct from the
mscan driver.

12 patches are by Martin Jocic. The first three add support for 3 new
devices to the kvaser_usb driver. The remaining 9 first clean up the
kvaser_pciefd driver, and then add support for MSI.

Krzysztof Kozlowski contributes 3 patches simplifies the CAN SPI
drivers by making use of spi_get_device_match_data().

The last patch is by Martin Hundebøll, which reworks the m_can driver
to not enable the CAN transceiver during probe.

* tag 'linux-can-next-for-6.11-20240621' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (24 commits)
  can: m_can: don't enable transceiver when probing
  can: mcp251xfd: simplify with spi_get_device_match_data()
  can: mcp251x: simplify with spi_get_device_match_data()
  can: hi311x: simplify with spi_get_device_match_data()
  can: kvaser_pciefd: Add MSI interrupts
  can: kvaser_pciefd: Move reset of DMA RX buffers to the end of the ISR
  can: kvaser_pciefd: Change name of return code variable
  can: kvaser_pciefd: Rename board_irq to pci_irq
  can: kvaser_pciefd: Add unlikely
  can: kvaser_pciefd: Add inline
  can: kvaser_pciefd: Remove unnecessary comment
  can: kvaser_pciefd: Skip redundant NULL pointer check in ISR
  can: kvaser_pciefd: Group #defines together
  can: kvaser_usb: Add support for Kvaser Mini PCIe 1xCAN
  can: kvaser_usb: Add support for Kvaser USBcan Pro 5xCAN
  can: kvaser_usb: Add support for Vining 800
  can: mscan: remove unused struct 'mscan_state'
  Documentation: networking: document ISO 15765-2
  can: xilinx_can: Document driver description to list all supported IPs
  can: isotp: remove ISO 15675-2 specification version where possible
  ...
====================

Link: https://patch.msgid.link/20240621080201.305471-1-mkl@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

e9212f9d

21 Jun, 2024 26 commits

ice: update representor when VSI is ready · fff5cca3

Michal Swiatkowski authored Jun 10, 2024

In case of reset of VF VSI can be reallocated. To handle this case it
should be properly updated.

Reload representor as vsi->vsi_num can be different than the one stored
when representor was created.

Instead of only changing antispoof do whole VSI configuration for
eswitch.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

fff5cca3

ice: move VSI configuration outside repr setup · 4d364df2

Michal Swiatkowski authored Jun 10, 2024

It is needed because subfunction port representor shouldn't configure
the source VSI during representor creation.

Move the code to separate function and call it only in case the VF port
representor is being created.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

4d364df2

ice: move devlink locking outside the port creation · 8d2f518c

Michal Swiatkowski authored Jun 10, 2024

In case of subfunction lock will be taken for whole port creation and
removing. Do the same in VF case.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

8d2f518c

ice: store representor ID in bridge port · 639ac8ce

Michal Swiatkowski authored Jun 10, 2024

It is used to get representor structure during cleaning.
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

639ac8ce

selftests: net: change shebang to bash in amt.sh · 32266073

Taehee Yoo authored Jun 21, 2024

amt.sh is written in bash, not sh.
So, shebang should be bash.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

32266073

net: ethernet: rtsn: Add support for Renesas Ethernet-TSN · b0d3969d

Niklas Söderlund authored Jun 20, 2024

Add initial support for Renesas Ethernet-TSN End-station device of R-Car
V4H. The Ethernet End-station can connect to an Ethernet network using a
10 Mbps, 100 Mbps, or 1 Gbps full-duplex link via MII/GMII/RMII/RGMII.
Depending on the connected PHY.

The driver supports Rx checksum and offload and hardware timestamps.

While full power management and suspend/resume is not yet supported the
driver enables runtime PM in order to enable the module clock. While
explicit clock management using clk_enable() would suffice for the
supported SoC, the module could be reused on SoCs where the module is
part of a power domain.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

b0d3969d

Merge branch 'qca8k-cleanup-and-port-isolation' · d7527fe9

David S. Miller authored Jun 21, 2024

Matthias Schiffer says:

====================
net: dsa: qca8k: cleanup and port isolation

A small cleanup patch, and basically the same changes that were just
accepted for mt7530 to implement port isolation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d7527fe9

net: dsa: qca8k: add support for bridge port isolation · 422b6402

Matthias Schiffer authored Jun 20, 2024

Remove a pair of ports from the port matrix when both ports have the
isolated flag set.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

422b6402

net: dsa: qca8k: factor out bridge join/leave logic · 412e1775

Matthias Schiffer authored Jun 20, 2024

Most of the logic in qca8k_port_bridge_join() and qca8k_port_bridge_leave()
is the same. Refactor to reduce duplication and prepare for reusing the
code for implementing bridge port isolation.

dsa_port_offloads_bridge_dev() is used instead of
dsa_port_offloads_bridge(), passing the bridge in as a struct netdevice *,
as we won't have a struct dsa_bridge in qca8k_port_bridge_flags().

The error handling is changed slightly in the bridge leave case,
returning early and emitting an error message when a regmap access fails.
This shouldn't matter in practice, as there isn't much we can do if
communication with the switch breaks down in the middle of reconfiguration.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

412e1775

net: dsa: qca8k: do not write port mask twice in bridge join/leave · e85d3e6f

Matthias Schiffer authored Jun 20, 2024

qca8k_port_bridge_join() set QCA8K_PORT_LOOKUP_CTRL() for i == port twice,
once in the loop handling all other port's masks, and finally at the end
with the accumulated port_mask.

The first time it would incorrectly set the port's own bit in the mask,
only to correct the mistake a moment later. qca8k_port_bridge_leave() had
the same issue, but here the regmap_clear_bits() was a no-op rather than
setting an unintended value.

Remove the duplicate assignment by skipping the whole loop iteration for
i == port. The unintended bit setting doesn't seem to have any negative
effects (even when not reverted right away), so the change is submitted
as a simple cleanup rather than a fix.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e85d3e6f

Merge branch 'net-mscc-miim-switch-reset' · 28ba5c11

David S. Miller authored Jun 21, 2024

Herve Codina says:

====================
Handle switch reset in mscc-miim

These two patches were previously sent as part of a bigger series:
  https://lore.kernel.org/lkml/20240527161450.326615-1-herve.codina@bootlin.com/

v1 and v2 iterations were handled during the v1 and v2 reviews of this
bigger series. As theses two patches are now ready to be applied, they
were extracted from the bigger series and sent alone in this current
series.

This current v3 series takes into account feedback received during the
bigger series v2 review.

Changes v2 -> v3
  - patch 1
    Drop one useless sentence.
    Add 'Reviewed-by: Andrew Lunn <andrew@lunn.ch>'
    Add 'Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>'

  - patch 2
   Add 'Reviewed-by: Andrew Lunn <andrew@lunn.ch>'

Changes v1 -> v2 (as part of the bigger series iterations)
  - Patch 1
    Improve the reset property description

  - Patch 2
    Fix a wrong reverse x-mass tree declaration
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

28ba5c11

net: mdio: mscc-miim: Handle the switch reset · 9e6d3393

Herve Codina authored Jun 20, 2024

The mscc-miim device can be impacted by the switch reset, at least when
this device is part of the LAN966x PCI device.

Handle this newly added (optional) resets property.
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

9e6d3393

dt-bindings: net: mscc-miim: Add resets property · e5efa3ff

Herve Codina authored Jun 20, 2024

Add the (optional) resets property.
The mscc-miim device is impacted by the switch reset especially when the
mscc-miim device is used as part of the LAN966x PCI device.
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e5efa3ff

Merge branch 'l2tp-sk_user_data' · 4fce809e

David S. Miller authored Jun 21, 2024

James Chapman says:

====================
l2tp: don't use the tunnel socket's sk_user_data in datapath

This series refactors l2tp to not use the tunnel socket's sk_user_data
in the datapath. The main reasons for doing this are

 * to allow for simplifying internal socket cleanup code (to be done
   in a later series)
 * to support multiple L2TPv3 UDP tunnels using the same 5-tuple
   address

When handling received UDP frames, l2tp's current approach is to look
up a session in a per-tunnel list. l2tp uses the tunnel socket's
sk_user_data to derive the tunnel context from the receiving socket.

But this results in the socket and tunnel lifetimes being very tightly
coupled and the tunnel/socket cleanup paths being complicated. The
latter has historically been a source of l2tp bugs and makes the code
more difficult to maintain. Also, if sockets are aliased, we can't
trust that the socket's sk_user_data references the right tunnel
anyway. Hence the desire to not use sk_user_data in the datapath.

The new approach is to lookup sessions in a per-net session list
without first deriving the tunnel:

 * For L2TPv2, the l2tp header has separate tunnel ID and session ID
   fields which can be trivially combined to make a unique 32-bit key
   for per-net session lookup.

 * For L2TPv3, there is no tunnel ID in the packet header, only a
   session ID, which should be unique over all tunnels so can be used
   as a key for per-net session lookup. However, this only works when
   the L2TPv3 session ID really is unique over all tunnels. At least
   one L2TPv3 application is known to use the same session ID in
   different L2TPv3 UDP tunnels, relying on UDP address/ports to
   distinguish them. This worked previously because sessions in UDP
   tunnels were kept in a per-tunnel list. To retain support for this,
   L2TPv3 session ID collisions are managed using a separate per-net
   session hlist, keyed by ID and sk. When looking up a session by ID,
   if there's more than one match, sk is used to find the right one.

L2TPv3 sessions in IP-encap tunnels are already looked up by session
ID in a per-net list. This work has UDP sessions also use the per-net
session list, while allowing for session ID collisions. The existing
per-tunnel hlist becomes a plain list since it is used only in
management and cleanup paths to walk a list of sessions in a given
tunnel.

For better performance, the per-net session lists use IDR. Separate
IDRs are used for L2TPv2 and L2TPv3 sessions to avoid potential key
collisions.

These changes pass l2tp regression tests and improve data forwarding
performance by about 10% in some of my test setups.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4fce809e

l2tp: replace hlist with simple list for per-tunnel session list · d18d3f0a

James Chapman authored Jun 20, 2024

The per-tunnel session list is no longer used by the
datapath. However, we still need a list of sessions in the tunnel for
l2tp_session_get_nth, which is used by management code. (An
alternative might be to walk each session IDR list, matching only
sessions of a given tunnel.)

Replace the per-tunnel hlist with a per-tunnel list. In functions
which walk a list of sessions of a tunnel, walk this list instead.
Signed-off-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d18d3f0a

l2tp: drop the now unused l2tp_tunnel_get_session · 8c6245af

James Chapman authored Jun 20, 2024

All users of l2tp_tunnel_get_session are now gone so it can be
removed.
Signed-off-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8c6245af

l2tp: use IDR for all session lookups · 5f77c18e

James Chapman authored Jun 20, 2024

Add generic session getter which uses IDR. Replace all users of
l2tp_tunnel_get_session which uses the per-tunnel session list to use
the generic getter.
Signed-off-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f77c18e

l2tp: don't use sk_user_data in l2tp_udp_encap_err_recv · c37e0138

James Chapman authored Jun 20, 2024

If UDP sockets are aliased, sk might be the wrong socket. There's no
benefit to using sk_user_data to do some checks on the associated
tunnel context. Just report the error anyway, like udp core does.
Signed-off-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c37e0138

l2tp: refactor udp recv to lookup to not use sk_user_data · ff6a2ac2

James Chapman authored Jun 20, 2024

Modify UDP decap to not use the tunnel pointer which comes from the
sock's sk_user_data when parsing the L2TP header. By looking up the
destination session using only the packet contents we avoid potential
UDP 5-tuple aliasing issues which arise from depending on the socket
that received the packet.

Drop the useless error messages on short packet or on failing to find
a session since the tunnel pointer might point to a different tunnel
if multiple sockets use the same 5-tuple.

Short packets (those not big enough to contain an L2TP header) are no
longer counted in the tunnel's invalid counter because we can't derive
the tunnel until we parse the l2tp header to lookup the session.

l2tp_udp_encap_recv was a small wrapper around l2tp_udp_recv_core which
used sk_user_data to derive a tunnel pointer in an RCU-safe way. But
we no longer need the tunnel pointer, so remove that code and combine
the two functions.
Signed-off-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff6a2ac2

l2tp: store l2tpv2 sessions in per-net IDR · 2a3339f6

James Chapman authored Jun 20, 2024

L2TPv2 sessions are currently kept in a per-tunnel hashlist, keyed by
16-bit session_id. When handling received L2TPv2 packets, we need to
first derive the tunnel using the 16-bit tunnel_id or sock, then
lookup the session in a per-tunnel hlist using the 16-bit session_id.

We want to avoid using sk_user_data in the datapath and double lookups
on every packet. So instead, use a per-net IDR to hold L2TPv2
sessions, keyed by a 32-bit value derived from the 16-bit tunnel_id
and session_id. This will allow the L2TPv2 UDP receive datapath to
lookup a session with a single lookup without deriving the tunnel
first.

L2TPv2 sessions are held in their own IDR to avoid potential
key collisions with L2TPv3 sessions.
Signed-off-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2a3339f6

l2tp: store l2tpv3 sessions in per-net IDR · aa5e17e1

James Chapman authored Jun 20, 2024

L2TPv3 sessions are currently held in one of two fixed-size hash
lists: either a per-net hashlist (IP-encap), or a per-tunnel hashlist
(UDP-encap), keyed by the L2TPv3 32-bit session_id.

In order to lookup L2TPv3 sessions in UDP-encap tunnels efficiently
without finding the tunnel first via sk_user_data, UDP sessions are
now kept in a per-net session list, keyed by session ID. Convert the
existing per-net hashlist to use an IDR for better performance when
there are many sessions and have L2TPv3 UDP sessions use the same IDR.

Although the L2TPv3 RFC states that the session ID alone identifies
the session, our implementation has allowed the same session ID to be
used in different L2TP UDP tunnels. To retain support for this, a new
per-net session hashtable is used, keyed by the sock and session
ID. If on creating a new session, a session already exists with that
ID in the IDR, the colliding sessions are added to the new hashtable
and the existing IDR entry is flagged. When looking up sessions, the
approach is to first check the IDR and if no unflagged match is found,
check the new hashtable. The sock is made available to session getters
where session ID collisions are to be considered. In this way, the new
hashtable is used only for session ID collisions so can be kept small.

For managing session removal, we need a list of colliding sessions
matching a given ID in order to update or remove the IDR entry of the
ID. This is necessary to detect session ID collisions when future
sessions are created. The list head is allocated on first collision
of a given ID and refcounted.
Signed-off-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aa5e17e1

l2tp: remove unused list_head member in l2tp_tunnel · a744e2d0

James Chapman authored Jun 20, 2024

Remove an unused variable in struct l2tp_tunnel which was left behind
by commit c4d48a58 ("l2tp: convert l2tp_tunnel_list to idr").
Signed-off-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a744e2d0

octeontx2-pf: Add ucast filter count configurability via devlink. · 39c46918

Sai Krishna authored Jun 20, 2024

The existing method of reserving unicast filter count leads to wasted
MCAM entries if the functionality is not used or fewer entries are used.
Furthermore, the amount of MCAM entries differs amongst Octeon SoCs.
We implemented a means to adjust the UC filter count via devlink,
allowing for better use of MCAM entries across Netdev apps.

commands:

To get the current unicast filter count
 # devlink dev param show pci/0002:02:00.0 name unicast_filter_count

To change/set the unicast filter count
 # devlink dev param  set  pci/0002:02:00.0  name unicast_filter_count
 value 5 cmode runtime
Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

39c46918

docs: net: document guidance of implementing the SR-IOV NDOs · 4558645d

Jakub Kicinski authored Jun 19, 2024

New drivers were prevented from adding ndo_set_vf_* callbacks
over the last few years. This was expected to result in broader
switchdev adoption, but seems to have had little effect.

Based on recent netdev meeting there is broad support for allowing
adding those ops.

There is a problem with the current API supporting a limited number
of VFs (100+, which is less than some modern HW supports).
We can try to solve it by adding similar functionality on devlink
ports, but that'd be another API variation to maintain.
So a netlink attribute reshuffling is a more likely outcome.

Document the guidance, make it clear that the API is frozen.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

4558645d

net: dsa: ksz_common: Allow only up to two HSR HW offloaded ports for KSZ9477 · dcec8d29

Lukasz Majewski authored Jun 19, 2024

The KSZ9477 allows HSR in-HW offloading for any of two selected ports.
This patch adds check if one tries to use more than two ports with
HSR offloading enabled.

The problem is with RedBox configuration (HSR-SAN) - when configuring:
ip link add name hsr0 type hsr slave1 lan1 slave2 lan2 interlink lan3 \
	supervision 45 version 1

The lan1 (port0) and lan2 (port1) are correctly configured as ports, which
can use HSR offloading on ksz9477.

However, when we do already have two bits set in hsr_ports, we need to
return (-ENOTSUPP), so the interlink port (lan3) would be used with
SW based HSR RedBox support.

Otherwise, I do see some strange network behavior, as some HSR frames are
visible on non-HSR network and vice versa.

This causes the switch connected to interlink port (lan3) to drop frames
and no communication is possible.

Moreover, conceptually - the interlink (i.e. HSR-SAN port - lan3/port2)
shall be only supported in software as it is also possible to use ksz9477
with only SW based HSR (i.e. port0/1 -> hsr0 with offloading, port2 ->
HSR-SAN/interlink, port4/5 -> hsr1 with SW based HSR).

Fixes: 5055cccf ("net: hsr: Provide RedBox support (HSR-SAN)")
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dcec8d29

net: fec: Fix FEC_ECR_EN1588 being cleared on link-down · c32fe198

Csókás, Bence authored Jun 19, 2024

FEC_ECR_EN1588 bit gets cleared after MAC reset in `fec_stop()`, which
makes all 1588 functionality shut down, and all the extended registers
disappear, on link-down, making the adapter fall back to compatibility
"dumb mode". However, some functionality needs to be retained (e.g. PPS)
even without link.

Fixes: 6605b730 ("FEC: Add time stamping code and a PTP hardware clock")
Cc: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/netdev/5fa9fadc-a89d-467a-aae9-c65469ff5fe1@lunn.ch/Signed-off-by: Csókás, Bence <csokas.bence@prolan.hu>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c32fe198