Commits · 2c955856da4faec3a36df1e85b3ba3dfe230d6fd · Kirill Smelkov / linux

15 Feb, 2022 22 commits

net: dm9051: Fix spelling mistake "eror" -> "error" · 2c955856

Colin Ian King authored Feb 15, 2022

There are spelling mistakes in debug messages. Fix them.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2c955856

dpaa2-eth: Simplify bool conversion · 99cd6a64

Yang Li authored Feb 15, 2022

Fix the following coccicheck warnings:
./drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c:1199:42-47: WARNING:
conversion to bool not needed here
./drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c:1218:54-59: WARNING:
conversion to bool not needed here
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

99cd6a64

net: bridge: vlan: check for errors from __vlan_del in __vlan_flush · 5454f5c2

Vladimir Oltean authored Feb 15, 2022

If the following call path returns an error from switchdev:

nbp_vlan_flush
-> __vlan_del
   -> __vlan_vid_del
      -> br_switchdev_port_vlan_del
-> __vlan_group_free
   -> WARN_ON(!list_empty(&vg->vlan_list));

then the deletion of the net_bridge_vlan is silently halted, which will
trigger the WARN_ON from __vlan_group_free().

The WARN_ON is rather unhelpful, because nothing about the source of the
error is printed. Add a print to catch errors from __vlan_del.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5454f5c2

net: hso: Use GFP_KERNEL instead of GFP_ATOMIC when possible · 25ce79db

Christophe JAILLET authored Feb 14, 2022

hso_create_device() is only called from functions that already use
GFP_KERNEL. And all the callers are called from the probe function.

So there is no need here to explicitly require a GFP_ATOMIC when
allocating memory.

Use GFP_KERNEL instead.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

25ce79db

virtio_net: Fix code indent error · 4f50ef15

Michael Catanzaro authored Feb 13, 2022

This patch fixes the checkpatch.pl warning:

ERROR: code indent should use tabs where possible #3453: FILE: drivers/net/virtio_net.c:3453: ret = register_virtio_driver(&virtio_net_driver);$

An unnecessary newline was also removed, making line 3453 now 3452.
Signed-off-by: Michael Catanzaro <mcatanzaro.kernel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4f50ef15

Merge tag 'mlx5-updates-2022-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 9b3e446c

David S. Miller authored Feb 15, 2022

Saeed Mahameed says:

====================
mlx5-updates-2022-02-14

mlx5 TX routines improvements

1) From Aya and Tariq, first 3 patches: use the max size of the TX descriptor
as advertised by the device instead of the fixed value of 16 that the driver
always assumed. This is not a bug fix, as all existing devices have a max value
larger than 16, but the series is necessary for future-proofing the driver.

2) TX Synchronization improvements from Maxim, last 12 patches

Maxim Mikityanskiy Says:
=======================
mlx5e: Synchronize ndo_select_queue with configuration changes

The kernel can call ndo_select_queue at any time, and there is no direct
way to block it. The implementation of ndo_select_queue in mlx5e expects
the parameters to be consistent and may crash (invalid pointer, division
by zero) if they aren't.

There were attempts to partially fix some of the most frequent crashes,
see commit 846d6da1 ("net/mlx5e: Fix division by 0 in
mlx5e_select_queue") and commit 84c8a874 ("net/mlx5e: Fix division
by 0 in mlx5e_select_queue for representors"). However, they don't
address the issue completely.

This series introduces the proper synchronization mechanism between
mlx5e configuration and TX data path:

1. txq2sq updates are synchronized properly with ndo_start_xmit
   (mlx5e_xmit). The TX queue is stopped when its configuration is being
   updated, and memory barriers ensure the changes are visible before
   restarting.

2. The set of parameters needed for mlx5e_select_queue is reduced, and
   synchronization using RCU is implemented. This way, changes are
   atomic, and the state in mlx5e_select_queue is always consistent.

3. A few optimizations are applied to the new implementation of
   mlx5e_select_queue.

=======================

====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9b3e446c

net/mlx5e: Optimize the common case condition in mlx5e_select_queue · 71753b8e

Maxim Mikityanskiy authored Jan 25, 2022

Check all booleans for special queues at once, when deciding whether to
go to the fast path in mlx5e_select_queue. Pack them into bitfields to
have some room for extensibility.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

71753b8e
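
The idea of testing all special-queue booleans at once can be sketched in userspace C as below. This is only an illustration, not the driver's code: struct queue_flags, its members, and takes_fast_path() are hypothetical names, and the one-byte layout of the bitfields is an assumption that holds for GCC and Clang.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag container: each special-queue boolean is a 1-bit
 * field, and the union lets all of them be read as a single byte. */
struct queue_flags {
	union {
		unsigned char is_special;	/* all flags viewed at once */
		struct {
			bool is_htb : 1;
			bool is_ptp : 1;
			/* spare bits leave room for future flags */
		};
	};
};

/* Assumption: the compiler packs the bitfields into one byte. */
static_assert(sizeof(struct queue_flags) == sizeof(unsigned char),
	      "flags must fit in the combined byte");

/* One comparison decides the common case: no special queue is in use. */
static bool takes_fast_path(const struct queue_flags *flags)
{
	return flags->is_special == 0;
}
```

A caller would zero-initialize the struct, set individual flags when HTB or PTP is configured, and call takes_fast_path() on the hot path instead of testing each flag separately.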

net/mlx5e: Optimize modulo in mlx5e_select_queue · 3a9e5fff

Maxim Mikityanskiy authored Jan 25, 2022

To improve the performance of the modulo operation (%), it's replaced by
subtracting the divisor in a loop. The modulo is used to fix up an
out-of-bounds value that might be returned by netdev_pick_tx or to
convert the queue number to the channel number when num_tcs > 1. Both
situations are unlikely, because XPS is configured not to pick higher
queues (qid >= num_channels) by default, so under normal circumstances
the flow won't go inside the loop, and it will be faster than %.

num_tcs == 8 adds at most 7 iterations to the loop. PTP adds at most 1
iteration to the loop. HTB would add at most 256 iterations (when
num_channels == 1), so there is an additional boundary check in the HTB
flow, which falls back to % if more than 7 iterations are expected.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

3a9e5fff
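
A minimal userspace sketch of the technique described above: reduce a value by repeated subtraction, and fall back to % when too many iterations would be needed. The function names and the max_iters parameter are hypothetical and are not taken from the driver.

```c
#include <assert.h>

/* Reduce value into [0, divisor) by repeated subtraction. When value is
 * usually already below divisor, the loop body never runs. */
static unsigned int reduce_by_subtraction(unsigned int value,
					  unsigned int divisor)
{
	assert(divisor != 0);
	while (value >= divisor)
		value -= divisor;
	return value;
}

/* Bounded variant: if more than max_iters subtractions would be needed,
 * use the regular modulo instead of looping. */
static unsigned int reduce_bounded(unsigned int value, unsigned int divisor,
				   unsigned int max_iters)
{
	/* 64-bit product avoids overflow in the boundary check */
	unsigned long long limit =
		(unsigned long long)divisor * (max_iters + 1ULL);

	assert(divisor != 0);
	if (value >= limit)
		return value % divisor;
	return reduce_by_subtraction(value, divisor);
}
```

Both helpers return the same result as value % divisor; the difference is only in how much work is done when the value is already in range or only slightly above it.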

net/mlx5e: Optimize mlx5e_select_queue · 3c87aedd

Maxim Mikityanskiy authored Jan 25, 2022

This commit optimizes mlx5e_select_queue for HTB and PTP cases by
short-cutting some checks, without sacrificing performance of the common
non-HTB non-PTP flow.

1. The HTB flow uses the fact that num_tcs == 1 to drop these checks
(it's not possible to attach both mqprio and htb as the root qdisc).
It's also enough to calculate `txq_ix % num_channels` only once, instead
of twice.

2. The PTP flow drops the check for HTB and the second calculation of
`txq_ix % num_channels`.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

3c87aedd

net/mlx5e: Use READ_ONCE/WRITE_ONCE for DCBX trust state · ed5f9cf0

Maxim Mikityanskiy authored Jan 25, 2022

trust_state can be written while mlx5e_select_queue() is reading it. To
avoid inconsistencies, use READ_ONCE and WRITE_ONCE for access and
updates, and touch the variable only once per operation.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

ed5f9cf0
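
The "touch the variable only once per operation" pattern can be sketched in userspace as follows. The macros below are simplified int-only stand-ins for the kernel's READ_ONCE/WRITE_ONCE (an assumption made for illustration; the real macros are generic), and the TRUST_* values and function names are hypothetical. Volatile accesses only prevent the compiler from re-reading or tearing the access; they do not add memory ordering.

```c
#include <assert.h>

/* Simplified, int-only stand-ins for the kernel macros. */
#define READ_ONCE_INT(x)	(*(const volatile int *)&(x))
#define WRITE_ONCE_INT(x, v)	(*(volatile int *)&(x) = (v))

enum { TRUST_PCP = 0, TRUST_DSCP = 1 };	/* hypothetical values */

static int trust_state = TRUST_PCP;

/* Writer side: a single store publishes the new state. */
static void set_trust_state(int new_state)
{
	WRITE_ONCE_INT(trust_state, new_state);
}

/* Reader side: take one snapshot and use only the snapshot afterwards,
 * so a concurrent update cannot change the decision midway. */
static int pick_priority(int pcp_prio, int dscp_prio)
{
	int state = READ_ONCE_INT(trust_state);

	assert(state == TRUST_PCP || state == TRUST_DSCP);
	return state == TRUST_DSCP ? dscp_prio : pcp_prio;
}
```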

net/mlx5e: Move repeating code that gets TC prio into a function · 62f7991f

Maxim Mikityanskiy authored Jan 25, 2022

Both mlx5e_select_queue and mlx5e_select_ptpsq contain the same logic to
get user priority of a packet, according to the current trust state
settings. This commit moves this repeating code to its own function.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

62f7991f

net/mlx5e: Use select queue parameters to sync with control flow · 3ab45777

Maxim Mikityanskiy authored Jan 25, 2022

Start using the select queue parameters introduced in the previous
commit to have proper synchronization with changing the configuration
(such as number of channels and queues). It ensures that the state that
mlx5e_select_queue() sees is always consistent and stays the same while
the function is running. Also it allows mlx5e_select_queue to stop using
data structures that weren't synchronized properly: txq2sq,
channel_tc2realtxq, port_ptp_tc2realtxq. The last two are removed
completely, as they were used only in mlx5e_select_queue.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

3ab45777

net/mlx5e: Move mlx5e_select_queue to en/selq.c · 6b23f6ab

Maxim Mikityanskiy authored Jan 25, 2022

This commit moves mlx5e_select_queue and all stuff related to
ndo_select_queue to en/selq.c to put all stuff working with selq into a
separate file.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

6b23f6ab

net/mlx5e: Introduce select queue parameters · 8bf30be7

Maxim Mikityanskiy authored Jan 25, 2022

ndo_select_queue can be called at any time, and there is no way to stop
the kernel from calling it to synchronize with configuration changes
(real_num_tx_queues, num_tc). This commit introduces an internal way in
mlx5e to sync mlx5e_select_queue() with these changes. The configuration
needed by this function is stored in a struct mlx5e_selq_params, which
is modified and accessed in an atomic way using RCU methods. The whole
ndo_select_queue is called under an RCU lock, providing the necessary
guarantees.

The parameters stored in the new struct mlx5e_selq_params should only be
used from inside mlx5e_select_queue. It's the minimal set of parameters
needed for mlx5e_select_queue to do its job efficiently, derived from
parameters stored elsewhere. That means that when the configuration
changes, mlx5e_selq_params may need to be updated. In such cases, the
mlx5e_selq_prepare/mlx5e_selq_apply API should be used.

struct mlx5e_selq contains two slots for the params: active and standby.
mlx5e_selq_prepare updates the standby slot, and mlx5e_selq_apply swaps
the slots in a safe atomic way using the RCU API. It integrates well
with the open/activate stages of the configuration change flow.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

8bf30be7
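
The active/standby scheme described above can be approximated in userspace with C11 atomics. This is a sketch under loud assumptions: it is not the driver's code, all names are hypothetical, and it uses an atomic pointer exchange where the kernel uses the RCU API. In particular, it omits the grace period that real RCU code needs before the old slot may be rewritten while readers might still hold it.

```c
#include <assert.h>
#include <stdatomic.h>

struct selq_params {
	unsigned int num_channels;
	unsigned int num_tcs;
};

struct selq {
	struct selq_params slots[2];
	_Atomic(struct selq_params *) active;	/* what readers see */
	struct selq_params *standby;		/* what the writer edits */
};

static void selq_init(struct selq *s, unsigned int channels, unsigned int tcs)
{
	s->slots[0] = (struct selq_params){ .num_channels = channels,
					    .num_tcs = tcs };
	s->slots[1] = s->slots[0];
	atomic_init(&s->active, &s->slots[0]);
	s->standby = &s->slots[1];
}

/* Stage a new configuration without touching what readers see. */
static void selq_prepare(struct selq *s, unsigned int channels,
			 unsigned int tcs)
{
	assert(channels != 0);	/* readers divide by num_channels */
	s->standby->num_channels = channels;
	s->standby->num_tcs = tcs;
}

/* Publish the staged configuration; the old active slot becomes the
 * new standby. A real RCU writer would wait for readers here. */
static void selq_apply(struct selq *s)
{
	s->standby = atomic_exchange_explicit(&s->active, s->standby,
					      memory_order_acq_rel);
}

/* Reader: one load yields a complete, self-consistent parameter set. */
static struct selq_params selq_read(struct selq *s)
{
	return *atomic_load_explicit(&s->active, memory_order_acquire);
}
```

Because readers only ever dereference the active pointer, they observe either the old parameter set or the new one as a whole, never a mix of the two.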

net/mlx5e: Sync txq2sq updates with mlx5e_xmit for HTB queues · 17c84cb4

Maxim Mikityanskiy authored Jan 25, 2022

This commit makes necessary changes to guarantee that txq2sq remains
stable while mlx5e_xmit is running. Proper synchronization is added for
HTB TX queues.

All updates to txq2sq are performed while the corresponding queue is
disabled (i.e. mlx5e_xmit doesn't run on that queue). smp_wmb after each
change guarantees that mlx5e_xmit can see the updated value after the
queue is enabled. Comments explaining this mechanism are added to
mlx5e_xmit.

When an HTB SQ can be deleted (after deleting an HTB node), synchronize
with RCU to wait for mlx5e_select_queue to finish and stop selecting
that queue, before we re-enable it to avoid TX timeout watchdog alarms.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

17c84cb4

net/mlx5e: Use a barrier after updating txq2sq · 6ce204ea

Maxim Mikityanskiy authored Jan 25, 2022

mlx5e_build_txq_maps updates txq2sq while TX queues are stopped. Add a
barrier to ensure that these changes are visible before the queues are
started and mlx5e_xmit reads from txq2sq.

This commit handles regular TX queues. Synchronization between HTB TX
queues and mlx5e_xmit is handled in the following commit.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

6ce204ea

net/mlx5e: Disable TX queues before registering the netdev · d08c6e2a

Maxim Mikityanskiy authored Jan 25, 2022

Normally, the queues are disabled when the channels are deactivated, and
enabled when the channels are activated. However, on register, the
channels are not active, but the queues are enabled by default. This
change fixes it, preventing mlx5e_xmit from running when the channels
are deactivated in the beginning.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

d08c6e2a

net/mlx5e: Cleanup of start/stop all queues · befa4177

Maxim Mikityanskiy authored Jan 25, 2022

mlx5e_activate_priv_channels() and mlx5e_deactivate_priv_channels()
start and stop all netdev TX queues. This commit removes the unneeded
call to netif_tx_stop_all_queues and adds explanatory comments why these
operations are needed.

netif_tx_disable() does the same thing as netif_tx_stop_all_queues(),
but takes the TX lock, thus guaranteeing that ndo_start_xmit is not
running after return. That means that the netif_tx_stop_all_queues()
call is not really necessary.

The comments are improved: the TX watchdog timeout explanation is moved
to the start stage where it really belongs (it used to be in both
places, but was lost during some old refactoring) and rephrased in more
detail; the explanation for stopping all TX queues is added.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

befa4177

net/mlx5e: Use FW limitation for max MPW WQEBBs · 76c31e5f

Aya Levin authored May 10, 2021

Calculate maximal count of MPW WQEBBs on SQ's creation and store it
there. Remove MLX5E_TX_MPW_MAX_NUM_DS and MLX5E_TX_MPW_MAX_WQEBBS.
Update mlx5e_tx_mpwqe_is_full() and mlx5e_xdp_mpqwe_is_full().
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

76c31e5f

net/mlx5e: Read max WQEBBs on the SQ from firmware · c27bd171

Aya Levin authored Jan 17, 2022

Prior to this patch the maximal value for max WQEBBs (WQE Basic Blocks,
where WQE is a Work Queue Element) on the TX side was assumed to be 16
(fixed value). All firmware versions to date comply with this. In order
to be more flexible and resilient, read the corresponding value from FW:
max_wqe_sz_sq. This value describes the maximum WQE size given in bytes,
thus max WQEBBs is given by dividing it by the WQEBB size in bytes. The
driver uses the top between 16 and the division result. This ensures
synchronization between driver and firmware and avoids unexpected
behavior. Store this value on the different SQs (Send Queues) for easy
access.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

c27bd171

net/mlx5e: Remove unused tstamp SQ field · 9536923d

Tariq Toukan authored May 20, 2021

Remove tstamp pointer in mlx5e_txqsq as it's no longer used after
commit 7c39afb3 ("net/mlx5: PTP code migration to driver core section").
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

9536923d

net: dsa: mv88e6xxx: Fix validation of built-in PHYs on 6095/6097 · d0b78ab1

Tobias Waldekranz authored Feb 13, 2022

These chips have 8 built-in FE PHYs and 3 SERDES interfaces that can
run at 1G. With the blamed commit, the built-in PHYs could no longer
be connected to, using an MII PHY interface mode.

Create a separate .phylink_get_caps callback for these chips, which
takes the FE/GE split into consideration.

Fixes: 2ee84cfe ("net: dsa: mv88e6xxx: convert to phylink_generic_validate()")
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20220213185154.3262207-1-tobias@waldekranz.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

d0b78ab1

14 Feb, 2022 18 commits

selftests: net: cmsg_sender: Fix spelling mistake "MONOTINIC" -> "MONOTONIC" · 12d8c111

Colin Ian King authored Feb 14, 2022

There is a spelling mistake in an error message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12d8c111

net: prestera: acl: add multi-chain support offload · fa5d824c

Volodymyr Mytnyk authored Feb 14, 2022

Add support for offloading rules added to a chain with a non-zero index,
which was previously forbidden. Also, the goto action is offloaded,
allowing a jump to the desired chain for processing.

Note that only implicit chain 0 is bound to the device port(s) for
processing. The rest of the chains have to be reached by goto actions.
Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fa5d824c

Merge branch 'wwan-debugfs' · e81f1e0d

David S. Miller authored Feb 14, 2022

M Chetan Kumar says:

====================
net: wwan: debugfs dev reference not dropped

This patch series contains WWAN subsystem & IOSM Driver changes to
drop dev reference obtained as part of wwan debugfs dir entry retrieval.

PATCH1: A new debugfs interface is introduced in wwan subsystem so
that wwan driver can drop the obtained dev reference post debugfs use.

PATCH2: IOSM Driver uses new debugfs interface to drop dev reference.

Please refer to commit messages for details.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e81f1e0d

net: wwan: iosm: drop debugfs dev reference · 163f69ae

M Chetan Kumar authored Feb 14, 2022

After debugfs use, call wwan_put_debugfs_dir() to drop the
debugfs dev reference.
Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

163f69ae

net: wwan: debugfs obtained dev reference not dropped · 76f05d88

M Chetan Kumar authored Feb 14, 2022

WWAN driver calls wwan_get_debugfs_dir() to obtain
WWAN debugfs dir entry. As part of this procedure it
returns a reference to a found device.

Since there is no debugfs interface available in the WWAN
subsystem, it is not possible to drop the dev reference after
debugfs use. This leads to side effects: for example, after a wwan
driver load and reload, the wwan instance is incremented
from wwanX to wwanX+1.

A new debugfs interface is added in wwan subsystem so that
wwan driver can drop the obtained dev reference post debugfs
use.

void wwan_put_debugfs_dir(struct dentry *dir)
Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

76f05d88

Merge branch 'dsa-realtek-next' · 1e997d04

David S. Miller authored Feb 14, 2022

Luiz Angelo Daros de Luca says:

====================
net: dsa: realtek: realtek-mdio: reset before setup

This patch series cleans up the realtek-smi reset code and copies it to
realtek-mdio.

v1-v2)
- do not run reset code block if GPIO is missing. It was printing "RESET
  deasserted" even when there is no GPIO configured.
- reset switch after dsa_unregister_switch()
- demote reset messages to debug

v2-v3)
- do not assert the reset on gpiod_get. Do it explicitly afterwards.
- split the commit into two (one for each module)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

1e997d04

net: dsa: realtek: realtek-mdio: reset before setup · 05f7b042

Luiz Angelo Daros de Luca authored Feb 13, 2022

Some devices, like the switch in the Banana Pi BPI R64, only start to answer
after a HW reset. It is the same reset code from realtek-smi.
Reported-by: Frank Wunderlich <frank-w@public-files.de>
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Tested-by: Frank Wunderlich <frank-w@public-files.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05f7b042

net: dsa: realtek: realtek-smi: clean-up reset · 9a236b54

Luiz Angelo Daros de Luca authored Feb 13, 2022

When reset GPIO was missing, the driver was still printing an info
message and still trying to assert the reset. Although gpiod_set_value()
will silently ignore calls with NULL gpio_desc, it is better to make it
clear the driver might allow gpio_desc to be NULL.

The initial value for the reset pin was changed to GPIOD_OUT_LOW,
followed by a gpiod_set_value() asserting the reset. This way, it will
be easier to spot if and where the reset really happens.

A new "asserted RESET" message was added just after the reset is
asserted, similar to the existing "deasserted RESET" message. Both
messages were demoted to dbg. The code comment is not needed anymore.
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9a236b54

ipv6: blackhole_netdev needs snmp6 counters · dd263a8c

Ido Schimmel authored Feb 13, 2022

Whenever rt6_uncached_list_flush_dev() swaps rt->rt6_idev
to the blackhole device, parts of IPv6 stack might still need
to increment one SNMP counter.

Root cause, patch from Ido, changelog from Eric :)

This bug suggests that we need to audit rt->rt6_idev usages
and make sure they are properly using RCU protection.

Fixes: e5f80fcf ("ipv6: give an IPv6 dev to blackhole_netdev")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dd263a8c

net: dsa: realtek: rename macro to match filename · 7db45f8d

Luiz Angelo Daros de Luca authored Feb 11, 2022

The macro was missed while renaming realtek-smi.h to realtek.h.

Fixes: f5f11907 (net: dsa: realtek: rename realtek_smi to)
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7db45f8d

Merge branch 'netdev-RT' · da54d75b

David S. Miller authored Feb 14, 2022

Sebastian Andrzej Siewior says:

====================
net: dev: PREEMPT_RT fixups.

this series removes or replaces preempt_disable() and local_irq_save()
sections which are problematic on PREEMPT_RT.
Patch 2 makes netif_rx() work from any context after I found suggestions
for it in an old thread. Should that work, then the context-specific
variants could be removed.

v2…v3:
   - #2
     - Export __netif_rx() so it can be used by everyone.
     - Add a lockdep assert to check for interrupt context.
     - Update the kernel doc and mention that the skb is posted to
       backlog NAPI.
     - Use __netif_rx() also in drivers/net/*.c.
     - Added Toke's review tag and kept Eric's despite the changes
       made.

v1…v2:
  - #1 and #2
    - merge patch 1 and 2 from the series (as per Toke).
    - updated patch description and corrected the first commit number (as
      per Eric).
   - #2
     - Provide netif_rx() as in v1 and additionally __netif_rx() without
       local_bh disable()+enable() for the loopback driver. __netif_rx() is
       not exported (loopback is built-in only) so it won't be used by
       drivers. If this doesn't work then we can still export/define a
       wrapper as Eric suggested.
     - Added a comment that netif_rx() is considered legacy.
   - #3
     - Moved ____napi_schedule() into rps_ipi_queued() and
       renamed it napi_schedule_rps().
   https://lore.kernel.org/all/20220204201259.1095226-1-bigeasy@linutronix.de/

v1:
   https://lore.kernel.org/all/20220202122848.647635-1-bigeasy@linutronix.de
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

da54d75b

net: dev: Make rps_lock() disable interrupts. · e722db8d

Sebastian Andrzej Siewior authored Feb 12, 2022

Disabling interrupts and in the RPS case locking input_pkt_queue is
split into local_irq_disable() and optional spin_lock().

This breaks on PREEMPT_RT because the spinlock_t typed lock can not be
acquired with disabled interrupts.
The sections in which the lock is acquired are usually short, in the sense
that they do not cause long and unbounded latencies. One exception is the
skb_flow_limit() invocation which may invoke a BPF program (and may
require sleeping locks).

By moving local_irq_disable() + spin_lock() into rps_lock(), we can keep
interrupts disabled on !PREEMPT_RT and enabled on PREEMPT_RT kernels.
Without RPS on a PREEMPT_RT kernel, the needed synchronisation happens
as part of local_bh_disable() on the local CPU.
____napi_schedule() is only invoked if sd is from the local CPU. Replace
it with __napi_schedule_irqoff() which already disables interrupts on
PREEMPT_RT as needed. Move this call to rps_ipi_queued() and rename the
function to napi_schedule_rps as suggested by Jakub.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e722db8d

net: dev: Makes sure netif_rx() can be invoked in any context. · baebdf48

Sebastian Andrzej Siewior authored Feb 12, 2022

Dave suggested a while ago (eleven years by now) "Let's make netif_rx()
work in all contexts and get rid of netif_rx_ni()". Eric agreed and
pointed out that modern devices should use netif_receive_skb() to avoid
the overhead.
In the meantime someone added another variant, netif_rx_any_context(),
which behaves as suggested.

netif_rx() must be invoked with disabled bottom halves to ensure that
pending softirqs, which were raised within the function, are handled.
netif_rx_ni() can be invoked only from process context (bottom halves
must be enabled) because the function handles pending softirqs without
checking if bottom halves were disabled or not.
netif_rx_any_context() invokes one of the former functions by checking
in_interrupt().

netif_rx() could be taught to handle both cases (disabled and enabled
bottom halves) by simply disabling bottom halves while invoking
netif_rx_internal(). The local_bh_enable() invocation will then invoke
pending softirqs only if the BH-disable counter drops to zero.

Eric is concerned about the overhead of BH-disable+enable especially in
regard to the loopback driver. As critical as this driver is, it will
receive a shortcut to avoid the additional overhead which is not needed.

Add a local_bh_disable() section in netif_rx() to ensure softirqs are
handled if needed.
Provide __netif_rx() which does not disable BH and has a lockdep assert
to ensure that interrupts are disabled. Use this shortcut in the
loopback driver and in drivers/net/*.c.
Make netif_rx_ni() and netif_rx_any_context() invoke netif_rx() so they
can be removed once there are no more users left.

Link: https://lkml.kernel.org/r/20100415.020246.218622820.davem@davemloft.net
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

baebdf48
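
The nesting behaviour relied on above (pending softirqs run only when the BH-disable counter drops to zero) can be modelled with a small userspace simulation. Everything here is hypothetical: the sim_ functions are not kernel APIs, and a simple counter and flag stand in for the real per-CPU state.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned int sim_bh_disable_count;	/* nesting depth */
static bool sim_softirq_pending;
static unsigned int sim_softirqs_run;

static void sim_local_bh_disable(void)
{
	sim_bh_disable_count++;
}

/* Pending work is handled only by the outermost enable. */
static void sim_local_bh_enable(void)
{
	assert(sim_bh_disable_count > 0);
	if (--sim_bh_disable_count == 0 && sim_softirq_pending) {
		sim_softirq_pending = false;
		sim_softirqs_run++;
	}
}

/* Model of the reworked netif_rx(): safe with BH enabled or disabled. */
static void sim_netif_rx(void)
{
	sim_local_bh_disable();
	sim_softirq_pending = true;	/* stands in for queueing the skb */
	sim_local_bh_enable();
}
```

Called with bottom halves enabled, sim_netif_rx() processes the pending work before returning. Called inside an outer sim_local_bh_disable() section, the work stays pending until the outer section ends.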

net: dev: Remove preempt_disable() and get_cpu() in netif_rx_internal(). · f234ae29

Sebastian Andrzej Siewior authored Feb 12, 2022

The preempt_disable() section was introduced in commit
    cece1945 ("net: disable preemption before call smp_processor_id()")

and adds it in case this function is invoked from preemptible context and
because get_cpu() later on has been added.

The get_cpu() usage was added in commit
    b0e28f1e ("net: netif_rx() must disable preemption")

because ip_dev_loopback_xmit() invoked netif_rx() with enabled preemption
causing a warning in smp_processor_id(). The function netif_rx() should
only be invoked from an interrupt context which implies disabled
preemption. The commit
   e30b38c2 ("ip: Fix ip_dev_loopback_xmit()")

was addressing this and replaced netif_rx() with netif_rx_ni() in
ip_dev_loopback_xmit().

Based on the discussion on the list, the former patch (b0e28f1e)
should not have been applied, only the latter (e30b38c2).

Remove get_cpu() and preempt_disable() since the function is supposed to
be invoked from context with stable per-CPU pointers. Bottom halves have
to be disabled at this point because the function may raise softirqs
which need to be processed.

Link: https://lkml.kernel.org/r/20100415.013347.98375530.davem@davemloft.net
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f234ae29

ice: Simplify tracking status of RDMA support · 88f62aea

Dave Ertman authored Feb 11, 2022

The status of support for RDMA is currently being tracked with two
separate status flags. This is unnecessary with the current state of
the driver.

Simplify status tracking down to a single flag.

Rename the helper function to denote the RDMA specific status and
universally use the helper function to test the status bit.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Leszek Kaliszczuk <leszek.kaliszczuk@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

88f62aea

Merge branch 'ocelot-stats' · d4e7592b

David S. Miller authored Feb 14, 2022

Colin Foster says:

====================
use bulk reads for ocelot statistics

Ocelot loops over memory regions to gather stats on different ports.
These regions are mostly continuous, and are ordered. This patch set
uses that information to break the stats reads into regions that can get
read in bulk.

The motivation is general cleanup, but also SPI. Performing two
back-to-back reads on a SPI bus requires toggling the CS line, holding,
re-toggling the CS line, sending 3 address bytes, sending N padding
bytes, then actually performing the read. Bulk reads could reduce almost
all of that overhead, but require that the reads are performed via
regmap_bulk_read.

Verified with eth0 hooked up to the CPU port:
NIC statistics:
     Good Rx Frames: 905
     Rx Octets: 78848
     Good Tx Frames: 691
     Tx Octets: 52516
     Rx + Tx 65-127 Octet Frames: 1574
     Rx + Tx 128-255 Octet Frames: 22
     Net Octets: 131364
     Rx DMA chan 0: head_enqueue: 1
     Rx DMA chan 0: tail_enqueue: 1032
     Rx DMA chan 0: busy_dequeue: 628
     Rx DMA chan 0: good_dequeue: 905
     Tx DMA chan 0: head_enqueue: 346
     Tx DMA chan 0: tail_enqueue: 345
     Tx DMA chan 0: misqueued: 345
     Tx DMA chan 0: empty_dequeue: 346
     Tx DMA chan 0: good_dequeue: 691
     p00_rx_octets: 52516
     p00_rx_unicast: 691
     p00_rx_frames_65_to_127_octets: 691
     p00_tx_octets: 78848
     p00_tx_unicast: 905
     p00_tx_frames_65_to_127_octets: 883
     p00_tx_frames_128_255_octets: 22
     p00_tx_green_prio_0: 905

And with swp2 connected to swp3 with STP enabled:
NIC statistics:
     tx_packets: 379
     tx_bytes: 19708
     rx_packets: 1
     rx_bytes: 46
     rx_octets: 64
     rx_multicast: 1
     rx_frames_below_65_octets: 1
     rx_classified_drops: 1
     tx_octets: 44630
     tx_multicast: 387
     tx_broadcast: 290
     tx_frames_below_65_octets: 379
     tx_frames_65_to_127_octets: 294
     tx_frames_128_255_octets: 4
     tx_green_prio_0: 298
     tx_green_prio_7: 379
NIC statistics:
     tx_packets: 1
     tx_bytes: 52
     rx_packets: 713
     rx_bytes: 34148
     rx_octets: 46982
     rx_multicast: 407
     rx_broadcast: 306
     rx_frames_below_65_octets: 399
     rx_frames_65_to_127_octets: 310
     rx_frames_128_to_255_octets: 4
     rx_classified_drops: 399
     rx_green_prio_0: 314
     tx_octets: 64
     tx_multicast: 1
     tx_frames_below_65_octets: 1
     tx_green_prio_7: 1

v1 > v2: reword commit messages
v2 > v3: correctly mark this for net-next when sending
v3 > v4: calloc array instead of zalloc per review
v4 > v5:
    Apply CR suggestions for whitespace
    Fix calloc / zalloc mixup
    Properly destroy workqueues
    Add third commit to split long macros
v5 > v6:
    Fix functionality - v5 was improperly tested
    Add bugfix for ethtool mutex lock
    Remove unnecessary ethtool stats reads
v6 > v7:
    Remove mutex bug patch that was applied via net
    Rename function based on CR
    Add missed error check
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d4e7592b

net: mscc: ocelot: use bulk reads for stats · d87b1c08

Colin Foster authored Feb 13, 2022

Create and utilize bulk regmap reads instead of single access for gathering
stats. The background reading of statistics happens frequently, and over
a few contiguous memory regions.

High speed PCIe buses and MMIO access will probably see negligible
performance increase. Lower speed buses like SPI and I2C could see
significant performance increase, since the bus configuration and register
access times account for a large percentage of data transfer time.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d87b1c08
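
The grouping step behind bulk reads can be sketched as follows: given sorted register offsets, merge adjacent ones into regions so that each region needs a single bulk read. This is an illustrative userspace sketch only; struct stat_region and build_regions() are hypothetical names, offsets are assumed to be expressed in register units, and the actual bus read is left out.

```c
#include <assert.h>
#include <stddef.h>

/* One contiguous run of registers that can be fetched in a single read. */
struct stat_region {
	unsigned int offset;	/* first register of the run */
	size_t count;		/* number of consecutive registers */
};

/* Merge ascending offsets into contiguous regions.
 * Returns the number of regions written to regions[]. */
static size_t build_regions(const unsigned int *offsets, size_t n,
			    struct stat_region *regions, size_t max_regions)
{
	size_t used = 0;

	for (size_t i = 0; i < n; i++) {
		struct stat_region *last = used ? &regions[used - 1] : NULL;

		/* Extend the current region if this offset directly follows */
		if (last && last->offset + last->count == offsets[i]) {
			last->count++;
			continue;
		}
		assert(used < max_regions);
		regions[used].offset = offsets[i];
		regions[used].count = 1;
		used++;
	}
	return used;
}
```

With the regions built once at setup time, the periodic statistics update would issue one bulk read per region rather than one read per counter.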

net: mscc: ocelot: add ability to perform bulk reads · 40f3a5c8

Colin Foster authored Feb 13, 2022

Regmap supports bulk register reads. Ocelot does not. This patch adds
support for Ocelot to invoke bulk regmap reads. That will allow any driver
that performs consecutive reads over memory regions to optimize that
access.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

40f3a5c8