Commits · 9b19e57a3c78f1f7c08a48bafb7d84caf6e80b68 · Kirill Smelkov / linux

12 May, 2022 20 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 9b19e57a

Jakub Kicinski authored May 12, 2022

No conflicts.

Build issue in drivers/net/ethernet/sfc/ptp.c
  54fccfdd ("sfc: efx_default_channel_type APIs can be static")
  49e6123c ("net: sfc: fix memory leak due to ptp channel")
https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/Signed-off-by: Jakub Kicinski <kuba@kernel.org>

9b19e57a

Merge tag 'net-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · f3f19f93

Linus Torvalds authored May 12, 2022

Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireless, and bluetooth.

  No outstanding fires.

  Current release - regressions:

   - eth: atlantic: always deep reset on pm op, fix null-deref

  Current release - new code bugs:

   - rds: use maybe_get_net() when acquiring refcount on TCP sockets
     [refinement of a previous fix]

   - eth: ocelot: mark traps with a bool instead of guessing type based
     on list membership

  Previous releases - regressions:

   - net: fix skipping features in for_each_netdev_feature()

   - phy: micrel: fix null-derefs on suspend/resume and probe

   - bcmgenet: check for Wake-on-LAN interrupt probe deferral

  Previous releases - always broken:

   - ipv4: drop dst in multicast routing path, prevent leaks

   - ping: fix address binding wrt vrf

   - net: fix wrong network header length when BPF protocol translation
     is used on skbs with a fraglist

   - bluetooth: fix the creation of hdev->name

   - rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition

   - wifi: iwlwifi: iwl-dbg: use del_timer_sync() before freeing

   - wifi: ath11k: reduce the wait time of 11d scan and hw scan while
     adding an interface

   - mac80211: fix rx reordering with non explicit / psmp ack policy

   - mac80211: reset MBSSID parameters upon connection

   - nl80211: fix races in nl80211_set_tx_bitrate_mask()

   - tls: fix context leak on tls_device_down

   - sched: act_pedit: really ensure the skb is writable

   - batman-adv: don't skb_split skbuffs with frag_list

   - eth: ocelot: fix various issues with TC actions (null-deref; bad
     stats; ineffective drops; ineffective filter removal)"

* tag 'net-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
  tls: Fix context leak on tls_device_down
  net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe()
  net/smc: non blocking recvmsg() return -EAGAIN when no data and signal_pending
  net: dsa: bcm_sf2: Fix Wake-on-LAN with mac_link_down()
  mlxsw: Avoid warning during ip6gre device removal
  net: bcmgenet: Check for Wake-on-LAN interrupt probe deferral
  net: ethernet: mediatek: ppe: fix wrong size passed to memset()
  Bluetooth: Fix the creation of hdev->name
  i40e: i40e_main: fix a missing check on list iterator
  net/sched: act_pedit: really ensure the skb is writable
  s390/lcs: fix variable dereferenced before check
  s390/ctcm: fix potential memory leak
  s390/ctcm: fix variable dereferenced before check
  net: atlantic: verify hw_head_ lies within TX buffer ring
  net: atlantic: add check for MAX_SKB_FRAGS
  net: atlantic: reduce scope of is_rsc_complete
  net: atlantic: fix "frag[0] not initialized"
  net: stmmac: fix missing pci_disable_device() on error in stmmac_pci_probe()
  net: phy: micrel: Fix incorrect variable type in micrel
  decnet: Use container_of() for struct dn_neigh casts
  ...

f3f19f93

Merge branch 'for-5.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 0ac824f3

Linus Torvalds authored May 12, 2022

Pull cgroup fix from Tejun Heo:
 "Waiman's fix for a cgroup2 cpuset bug where it could miss nodes which
  were hot-added"

* 'for-5.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp()

0ac824f3

Merge tag 'fixes_for_v5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · c37dba6a

Linus Torvalds authored May 12, 2022

Pull fs fixes from Jan Kara:
 "Three fixes that I'd still like to get to 5.18:

   - add a missing sanity check in the fanotify FAN_RENAME feature
     (added in 5.17, let's fix it before it gets wider usage in
     userspace)

   - udf fix for recently introduced filesystem corruption issue

   - writeback fix for a race in inode list handling that can lead to
     delayed writeback and possible dirty throttling stalls"

* tag 'fixes_for_v5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Avoid using stale lengthOfImpUse
  writeback: Avoid skipping inode writeback
  fanotify: do not allow setting dirent events in mask of non-dir

c37dba6a

tls: Fix context leak on tls_device_down · 3740651b

Maxim Mikityanskiy authored May 12, 2022

The commit cited below claims to fix a use-after-free condition after
tls_device_down. Apparently, the description wasn't fully accurate. The
context stayed alive, but ctx->netdev became NULL, and the offload was
torn down without a proper fallback, so a bug was present, but a
different kind of bug.

Due to misunderstanding of the issue, the original patch dropped the
refcount_dec_and_test line for the context to avoid the alleged
premature deallocation. That line has to be restored, because it matches
the refcount_inc_not_zero from the same function, otherwise the contexts
that survived tls_device_down are leaked.

This patch fixes the described issue by restoring refcount_dec_and_test.
After this change, there is no leak anymore, and the fallback to
software kTLS still works.

Fixes: c55dcdd4 ("net/tls: Fix use-after-free after the TLS device goes down and up")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220512091830.678684-1-maximmi@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

3740651b

net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe() · 1fa89ffb

Taehee Yoo authored May 12, 2022

In the NIC ->probe() callback, ->mtd_probe() callback is called.
If NIC has 2 ports, ->probe() is called twice and ->mtd_probe() too.
In the ->mtd_probe(), which is efx_ef10_mtd_probe() it allocates and
initializes mtd partiion.
But mtd partition for sfc is shared data.
So that allocated mtd partition data from last called
efx_ef10_mtd_probe() will not be used.
Therefore it must be freed.
But it doesn't free a not used mtd partition data in efx_ef10_mtd_probe().

kmemleak reports:
unreferenced object 0xffff88811ddb0000 (size 63168):
  comm "systemd-udevd", pid 265, jiffies 4294681048 (age 348.586s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffffa3767749>] kmalloc_order_trace+0x19/0x120
    [<ffffffffa3873f0e>] __kmalloc+0x20e/0x250
    [<ffffffffc041389f>] efx_ef10_mtd_probe+0x11f/0x270 [sfc]
    [<ffffffffc0484c8a>] efx_pci_probe.cold.17+0x3df/0x53d [sfc]
    [<ffffffffa414192c>] local_pci_probe+0xdc/0x170
    [<ffffffffa4145df5>] pci_device_probe+0x235/0x680
    [<ffffffffa443dd52>] really_probe+0x1c2/0x8f0
    [<ffffffffa443e72b>] __driver_probe_device+0x2ab/0x460
    [<ffffffffa443e92a>] driver_probe_device+0x4a/0x120
    [<ffffffffa443f2ae>] __driver_attach+0x16e/0x320
    [<ffffffffa4437a90>] bus_for_each_dev+0x110/0x190
    [<ffffffffa443b75e>] bus_add_driver+0x39e/0x560
    [<ffffffffa4440b1e>] driver_register+0x18e/0x310
    [<ffffffffc02e2055>] 0xffffffffc02e2055
    [<ffffffffa3001af3>] do_one_initcall+0xc3/0x450
    [<ffffffffa33ca574>] do_init_module+0x1b4/0x700
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Fixes: 8127d661 ("sfc: Add support for Solarflare SFC9100 family")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://lore.kernel.org/r/20220512054709.12513-1-ap420073@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

1fa89ffb

net/smc: non blocking recvmsg() return -EAGAIN when no data and signal_pending · f3c46e41

Guangguan Wang authored May 12, 2022

Non blocking sendmsg will return -EAGAIN when any signal pending
and no send space left, while non blocking recvmsg return -EINTR
when signal pending and no data received. This may makes confused.
As TCP returns -EAGAIN in the conditions described above. Align the
behavior of smc with TCP.

Fixes: 846e344e ("net/smc: add receive timeout check")
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20220512030820.73848-1-guangguan.wang@linux.alibaba.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

f3c46e41

net: dsa: bcm_sf2: Fix Wake-on-LAN with mac_link_down() · b7be130c

Florian Fainelli authored May 11, 2022

After commit 2d1f90f9 ("net: dsa/bcm_sf2: fix incorrect usage of
state->link") the interface suspend path would call our mac_link_down()
call back which would forcibly set the link down, thus preventing
Wake-on-LAN packets from reaching our management port.

Fix this by looking at whether the port is enabled for Wake-on-LAN and
not clearing the link status in that case to let packets go through.

Fixes: 2d1f90f9 ("net: dsa/bcm_sf2: fix incorrect usage of state->link")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220512021731.2494261-1-f.fainelli@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b7be130c

mlxsw: Avoid warning during ip6gre device removal · 810c2f0a

Amit Cohen authored May 11, 2022

IPv6 addresses which are used for tunnels are stored in a hash table
with reference counting. When a new GRE tunnel is configured, the driver
is notified and configures it in hardware.

Currently, any change in the tunnel is not applied in the driver. It
means that if the remote address is changed, the driver is not aware of
this change and the first address will be used.

This behavior results in a warning [1] in scenarios such as the
following:

 # ip link add name gre1 type ip6gre local 2000::3 remote 2000::fffe tos inherit ttl inherit
 # ip link set name gre1 type ip6gre local 2000::3 remote 2000::ffff ttl inherit
 # ip link delete gre1

The change of the address is not applied in the driver. Currently, the
driver uses the remote address which is stored in the 'parms' of the
overlay device. When the tunnel is removed, the new IPv6 address is
used, the driver tries to release it, but as it is not aware of the
change, this address is not configured and it warns about releasing non
existing IPv6 address.

Fix it by using the IPv6 address which is cached in the IPIP entry, this
address is the last one that the driver used, so even in cases such the
above, the first address will be released, without any warning.

[1]:

WARNING: CPU: 1 PID: 2197 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2920 mlxsw_sp_ipv6_addr_put+0x146/0x220 [mlxsw_spectrum]
...
CPU: 1 PID: 2197 Comm: ip Not tainted 5.17.0-rc8-custom-95062-gc1e5ded51a9a #84
Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 07/12/2021
RIP: 0010:mlxsw_sp_ipv6_addr_put+0x146/0x220 [mlxsw_spectrum]
...
Call Trace:
 <TASK>
 mlxsw_sp2_ipip_rem_addr_unset_gre6+0xf1/0x120 [mlxsw_spectrum]
 mlxsw_sp_netdevice_ipip_ol_event+0xdb/0x640 [mlxsw_spectrum]
 mlxsw_sp_netdevice_event+0xc4/0x850 [mlxsw_spectrum]
 raw_notifier_call_chain+0x3c/0x50
 call_netdevice_notifiers_info+0x2f/0x80
 unregister_netdevice_many+0x311/0x6d0
 rtnl_dellink+0x136/0x360
 rtnetlink_rcv_msg+0x12f/0x380
 netlink_rcv_skb+0x49/0xf0
 netlink_unicast+0x233/0x340
 netlink_sendmsg+0x202/0x440
 ____sys_sendmsg+0x1f3/0x220
 ___sys_sendmsg+0x70/0xb0
 __sys_sendmsg+0x54/0xa0
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: e846efe2 ("mlxsw: spectrum: Add hash table for IPv6 address mapping")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220511115747.238602-1-idosch@nvidia.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

810c2f0a

Merge branch 'nfp-vf-rate-limit-support' · b33177f1

Paolo Abeni authored May 12, 2022

Simon Horman says:

====================
*nfp: VF rate limit support

this short series adds VF rate limiting to the NFP driver.

The first patch, as suggested by Jakub Kicinski, adds a helper
to check that ndo_set_vf_rate() rate parameters are sane.
It also provides a place for further parameter checking to live,
if needed in future.

The second patch adds VF rate limit support to the NFP driver.
It addresses several comments made on v1, including removing
the parameter check that is now provided by the helper added
in the first patch.
====================

Link: https://lore.kernel.org/r/20220511113932.92114-1-simon.horman@corigine.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

b33177f1

nfp: VF rate limit support · e0d0e1fd

Bin Chen authored May 11, 2022

Add VF rate limit feature

This patch enhances the NFP driver to supports assignment of
both max_tx_rate and min_tx_rate to VFs

The template of configurations below is all supported.
e.g.
 # ip link set $DEV vf $VF_NUM max_tx_rate $RATE_VALUE
 # ip link set $DEV vf $VF_NUM min_tx_rate $RATE_VALUE
 # ip link set $DEV vf $VF_NUM max_tx_rate $RATE_VALUE \
			       min_tx_rate $RATE_VALUE
 # ip link set $DEV vf $VF_NUM min_tx_rate $RATE_VALUE \
			       max_tx_rate $RATE_VALUE

The max RATE_VALUE is limited to 0xFFFF which is about
63Gbps (using 1024 for 1G)
Signed-off-by: Bin Chen <bin.chen@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

e0d0e1fd

rtnetlink: verify rate parameters for calls to ndo_set_vf_rate · a14857c2

Bin Chen authored May 11, 2022

When calling ndo_set_vf_rate() the max_tx_rate parameter may be zero,
in which case the setting is cleared, or it must be greater or equal to
min_tx_rate.

Enforce this requirement on all calls to ndo_set_vf_rate via a wrapper
which also only calls ndo_set_vf_rate() if defined by the driver.

Based on work by Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Bin Chen <bin.chen@corigine.com>
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

a14857c2

net: ethernet: SP7021: Fix spelling mistake "Interrput" -> "Interrupt" · 982c97ee

Colin Ian King authored May 11, 2022

There is a spelling mistake in a dev_dbg message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20220511104448.150800-1-colin.i.king@gmail.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

982c97ee

net: enetc: kill PHY-less mode for PFs · 0f84d403

Vladimir Oltean authored May 11, 2022

Right now, a PHY-less port (no phy-mode, no fixed-link, no phy-handle)
doesn't register with phylink, but calls netif_carrier_on() from
enetc_start().

This makes sense for a VF, but for a PF, this is braindead, because we
never call enetc_mac_enable() so the MAC is left inoperational.
Furthermore, commit 71b77a7a ("enetc: Migrate to PHYLINK and
PCS_LYNX") put the nail in the coffin because it removed the initial
netif_carrier_off() call done right after register_netdev().

Without that call, netif_carrier_on() does not call
linkwatch_fire_event(), so the operstate remains IF_OPER_UNKNOWN.

Just deny the broken configuration by requiring that a phy-mode is
present, and always register a PF with phylink.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20220511094200.558502-1-vladimir.oltean@nxp.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

0f84d403

fortify: Provide a memcpy trap door for sharp corners · 43213dae

Kees Cook authored May 10, 2022

As we continue to narrow the scope of what the FORTIFY memcpy() will
accept and build alternative APIs that give the compiler appropriate
visibility into more complex memcpy scenarios, there is a need for
"unfortified" memcpy use in rare cases where combinations of compiler
behaviors, source code layout, etc, result in cases where the stricter
memcpy checks need to be bypassed until appropriate solutions can be
developed (i.e. fix compiler bugs, code refactoring, new API, etc). The
intention is for this to be used only if there's no other reasonable
solution, for its use to include a justification that can be used
to assess future solutions, and for it to be temporary.

Example usage included, based on analysis and discussion from:
https://lore.kernel.org/netdev/CANn89iLS_2cshtuXPyNUGDPaic=sJiYfvTb_wNLgWrZRyBxZ_g@mail.gmail.com

Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Coco Li <lixiaoyan@google.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: netdev@vger.kernel.org
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220511025301.3636666-1-keescook@chromium.orgSigned-off-by: Paolo Abeni <pabeni@redhat.com>

43213dae

net: bcmgenet: Check for Wake-on-LAN interrupt probe deferral · 6b77c066

Florian Fainelli authored May 10, 2022

The interrupt controller supplying the Wake-on-LAN interrupt line maybe
modular on some platforms (irq-bcm7038-l1.c) and might be probed at a
later time than the GENET driver. We need to specifically check for
-EPROBE_DEFER and propagate that error to ensure that we eventually
fetch the interrupt descriptor.

Fixes: 9deb48b5 ("bcmgenet: add WOL IRQ check")
Fixes: 5b1f0e62 ("net: bcmgenet: Avoid touching non-existent interrupt")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Stefan Wahren <stefan.wahren@i2se.com>
Link: https://lore.kernel.org/r/20220511031752.2245566-1-f.fainelli@gmail.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

6b77c066

net: ethernet: mediatek: ppe: fix wrong size passed to memset() · 00832b1d

Yang Yingliang authored May 11, 2022

'foe_table' is a pointer, the real size of struct mtk_foe_entry
should be pass to memset().

Fixes: ba37b7ca ("net: ethernet: mtk_eth_soc: add support for initializing the PPE")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20220511030829.3308094-1-yangyingliang@huawei.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

00832b1d

Merge tag 'for-net-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · a48ab883

Jakub Kicinski authored May 11, 2022

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - Fix the creation of hdev->name when index is greater than 9999

* tag 'for-net-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: Fix the creation of hdev->name
====================

Link: https://lore.kernel.org/r/20220512002901.823647-1-luiz.dentz@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

a48ab883

Merge tag 'wireless-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · 8bf6008c

Jakub Kicinski authored May 11, 2022

Kalle Valo says:

====================
wireless fixes for v5.18

Second set of fixes for v5.18 and hopefully the last one. We have a
new iwlwifi maintainer, a fix to rfkill ioctl interface and important
fixes to both stack and two drivers.

* tag 'wireless-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition
  nl80211: fix locking in nl80211_set_tx_bitrate_mask()
  mac80211_hwsim: call ieee80211_tx_prepare_skb under RCU protection
  mac80211_hwsim: fix RCU protected chanctx access
  mailmap: update Kalle Valo's email
  mac80211: Reset MBSSID parameters upon connection
  cfg80211: retrieve S1G operating channel number
  nl80211: validate S1G channel width
  mac80211: fix rx reordering with non explicit / psmp ack policy
  ath11k: reduce the wait time of 11d scan and hw scan while add interface
  MAINTAINERS: update iwlwifi driver maintainer
  iwlwifi: iwl-dbg: Use del_timer_sync() before freeing
====================

Link: https://lore.kernel.org/r/20220511154535.A1A12C340EE@smtp.kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

8bf6008c

Bluetooth: Fix the creation of hdev->name · 103a2f32

Itay Iellin authored May 07, 2022

Set a size limit of 8 bytes of the written buffer to "hdev->name"
including the terminating null byte, as the size of "hdev->name" is 8
bytes. If an id value which is greater than 9999 is allocated,
then the "snprintf(hdev->name, sizeof(hdev->name), "hci%d", id)"
function call would lead to a truncation of the id value in decimal
notation.

Set an explicit maximum id parameter in the id allocation function call.
The id allocation function defines the maximum allocated id value as the
maximum id parameter value minus one. Therefore, HCI_MAX_ID is defined
as 10000.
Signed-off-by: Itay Iellin <ieitayie@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

103a2f32

11 May, 2022 20 commits

Merge branch 'count-tc-taprio-window-drops-in-enetc-driver' · bb709987

Jakub Kicinski authored May 11, 2022

Vladimir Oltean says:

====================
Count tc-taprio window drops in enetc driver

This series includes a patch from Po Liu (no longer with NXP) which
counts frames dropped by the tc-taprio offload in ethtool -S and in
ndo_get_stats64. It also contains a preparation patch from myself.
====================

Link: https://lore.kernel.org/r/20220510163615.6096-1-vladimir.oltean@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

bb709987

net: enetc: count the tc-taprio window drops · 285e8ded

Po Liu authored May 10, 2022

The enetc scheduler for IEEE 802.1Qbv has 2 options (depending on
PTGCR[TG_DROP_DISABLE]) when we attempt to send an oversized packet
which will never fit in its allotted time slot for its traffic class:
either block the entire port due to head-of-line blocking, or drop the
packet and set a bit in the writeback format of the transmit buffer
descriptor, allowing other packets to be sent.

We obviously choose the second option in the driver, but we do not
detect the drop condition, so from the perspective of the network stack,
the packet is sent and no error counter is incremented.

This change checks the writeback of the TX BD when tc-taprio is enabled,
and increments a specific ethtool statistics counter and a generic
"tx_dropped" counter in ndo_get_stats64.
Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

285e8ded

net: enetc: manage ENETC_F_QBV in priv->active_offloads only when enabled · 32bf8e1f

Vladimir Oltean authored May 10, 2022

Future work in this driver would like to look at priv->active_offloads &
ENETC_F_QBV to determine whether a tc-taprio qdisc offload was
installed, but this does not produce the intended effect.

All the other flags in priv->active_offloads are managed dynamically,
except ENETC_F_QBV which is set statically based on the probed SI capability.

This change makes priv->active_offloads & ENETC_F_QBV really track the
presence of a tc-taprio schedule on the port.

Some existing users, like the enetc_sched_speed_set() call from
phylink_mac_link_up(), are best kept using the old logic: the tc-taprio
offload does not re-trigger another link mode resolve, so the scheduler
needs to be functional from the get go, as long as Qbv is supported at
all on the port. So to preserve functionality there, look at the static
station interface capability from pf->si->hw_features instead.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

32bf8e1f

Merge branch 'macb-napi-improvements' · d7722973

Jakub Kicinski authored May 11, 2022

Robert Hancock says:

====================
MACB NAPI improvements

Simplify the logic in the Cadence MACB/GEM driver for determining
when to reschedule NAPI processing, and update it to use NAPI for the
TX path as well as the RX path.

Changes since v1: Changed to use separate TX and RX NAPI instances and
poll functions to avoid unnecessary checks of the other ring (TX/RX)
states during polling and to use budget handling for both RX and TX.
Fixed locking to protect against concurrent access to TX ring on
TX transmit and TX poll paths.
====================

Link: https://lore.kernel.org/r/20220509194635.3094080-1-robert.hancock@calian.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

d7722973

net: macb: use NAPI for TX completion path · 138badbc

Robert Hancock authored May 09, 2022

This driver was using the TX IRQ handler to perform all TX completion
tasks. Under heavy TX network load, this can cause significant irqs-off
latencies (found to be in the hundreds of microseconds using ftrace).
This can cause other issues, such as overrunning serial UART FIFOs when
using high baud rates with limited UART FIFO sizes.

Switch to using a NAPI poll handler to perform the TX completion work
to get this out of hard IRQ context and avoid the IRQ latency impact. A
separate NAPI instance is used for TX and RX to avoid checking the other
ring's state unnecessarily when doing the poll, and so that the NAPI
budget handling can work for both TX and RX packets.

A new per-queue tx_ptr_lock spinlock has been added to avoid using the
main device lock (with IRQs needing to be disabled) across the entire TX
mapping operation, and also to protect the TX queue pointers from
concurrent access between the TX start and TX poll operations.

The TX Used Bit Read interrupt (TXUBR) handling also needs to be moved into
the TX NAPI poll handler to maintain the proper order of operations. A flag
is used to notify the poll handler that a UBR condition needs to be
handled. The macb_tx_restart handler has had some locking added for global
register access, since this could now potentially happen concurrently on
different queues.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

138badbc

net: macb: simplify/cleanup NAPI reschedule checking · 1900e30d

Robert Hancock authored May 09, 2022

Previously the macb_poll method was checking the RSR register after
completing its RX receive work to see if additional packets had been
received since IRQs were disabled, since this controller does not
maintain the pending IRQ status across IRQ disable. It also had to
double-check the register after re-enabling IRQs to detect if packets
were received after the first check but before IRQs were enabled.

Using the RSR register for this purpose is problematic since it reflects
the global device state rather than the per-queue state, so if packets
are being received on multiple queues it may end up retriggering receive
on a queue where the packets did not actually arrive and not on the one
where they did arrive. This will also cause problems with an upcoming
change to use NAPI for the TX path where use of multiple queues is more
likely.

Add a macb_rx_pending function to check the RX ring to see if more
packets have arrived in the queue, and use that to check if NAPI should
be rescheduled rather than the RSR register. By doing this, we can just
ignore the global RSR register entirely, and thus save some extra device
register accesses at the same time.

This also makes the previous first check for pending packets rather
redundant, since it would be checking the RX ring state which was just
checked in the receive work function. Therefore we can get rid of it and
just check after enabling interrupts whether packets are already
pending.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

1900e30d

net: dsa: ocelot: accept 1000base-X for VSC9959 and VSC9953 · 11ecf341

Vladimir Oltean authored May 10, 2022

Switches using the Lynx PCS driver support 1000base-X optical SFP
modules. Accept this interface type on a port.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220510164320.10313-1-vladimir.oltean@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

11ecf341

i40e: i40e_main: fix a missing check on list iterator · 3f95a747

Xiaomeng Tong authored May 10, 2022

The bug is here:
	ret = i40e_add_macvlan_filter(hw, ch->seid, vdev->dev_addr, &aq_err);

The list iterator 'ch' will point to a bogus position containing
HEAD if the list is empty or no element is found. This case must
be checked before any use of the iterator, otherwise it will
lead to a invalid memory access.

To fix this bug, use a new variable 'iter' as the list iterator,
while use the origin variable 'ch' as a dedicated pointer to
point to the found element.

Cc: stable@vger.kernel.org
Fixes: 1d8d80b4 ("i40e: Add macvlan support on i40e")
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220510204846.2166999-1-anthony.l.nguyen@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

3f95a747

selftests: forwarding: tc_actions: allow mirred egress test to run on non-offloaded h2 · b57c7e8b

Vladimir Oltean authored May 11, 2022

The host interfaces $h1 and $h2 don't have to be switchdev interfaces,
but due to the fact that we pass $tcflags which may have the value of
"skip_sw", we force $h2 to offload a drop rule for dst_ip, something
which it may not be able to do.

The selftest only wants to verify the hit count of this rule as a means
of figuring out whether the packet was received, so remove the $tcflags
for it and let it be done in software.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220510220904.284552-1-vladimir.oltean@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b57c7e8b

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · ddae9bc4

Jakub Kicinski authored May 11, 2022

Tony Nguyen says:

====================
1GbE Intel Wired LAN Driver Updates 2022-05-10

This series contains updates to igc driver only.

Sasha cleans up the code by removing an unused function and removing an
enum for PHY type as there is only one PHY. The return type for
igc_check_downshift() is changed to void as it always returns success.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  igc: Change type of the 'igc_check_downshift' method
  igc: Remove unused phy_type enum
  igc: Remove igc_set_spd_dplx method
====================

Link: https://lore.kernel.org/r/20220510210656.2168393-1-anthony.l.nguyen@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ddae9bc4

net/sched: act_pedit: really ensure the skb is writable · 8b796475

Paolo Abeni authored May 10, 2022

Currently pedit tries to ensure that the accessed skb offset
is writable via skb_unclone(). The action potentially allows
touching any skb bytes, so it may end-up modifying shared data.

The above causes some sporadic MPTCP self-test failures, due to
this code:

	tc -n $ns2 filter add dev ns2eth$i egress \
		protocol ip prio 1000 \
		handle 42 fw \
		action pedit munge offset 148 u8 invert \
		pipe csum tcp \
		index 100

The above modifies a data byte outside the skb head and the skb is
a cloned one, carrying a TCP output packet.

This change addresses the issue by keeping track of a rough
over-estimate highest skb offset accessed by the action and ensuring
such offset is really writable.

Note that this may cause performance regressions in some scenarios,
but hopefully pedit is not in the critical path.

Fixes: db2c2417 ("act_pedit: access skb->data safely")
Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Tested-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/1fcf78e6679d0a287dd61bb0f04730ce33b3255d.1652194627.git.pabeni@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

8b796475

eth: amd: remove NI6510 support (ni65) · 01f46857

Jakub Kicinski authored May 09, 2022

Looks like all the changes to this driver had been tree-wide
refactoring since git era begun. The driver is using virt_to_bus()
we should make it use more modern DMA APIs but since it's unlikely
to be getting any use these days delete it instead. We can always
revert to bring it back.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

01f46857

net: appletalk: remove Apple/Farallon LocalTalk PC support · 03dcb90d

Jakub Kicinski authored May 09, 2022

Looks like all the changes to this driver had been tree-wide
refactoring since git era begun. The driver is using virt_to_bus()
we should make it use more modern DMA APIs but since it's unlikely
to be getting any use these days delete it instead. We can always
revert to bring it back.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

03dcb90d

Merge branch 'debug-net' · e508af8a

David S. Miller authored May 11, 2022

Eric Dumazet says:

====================
net: CONFIG_DEBUG_NET and friends

This patch series is inspired by some syzbot reports
hinting that skb transport_header might be not set
in places we expect it being set.

Add a new CONFIG_DEBUG_NET option
and DEBUG_NET_WARN_ON_ONCE() helper, so that we can start
adding more sanity checks in the future.

Replace two BUG() in skb_checksum_help()
with less risky code.

v2: make first patch compile on more arches/compilers
    add the 5th patch to add more debugging in skb_checksum_help()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e508af8a

net: add more debug info in skb_checksum_help() · eeee4b77

Eric Dumazet authored May 09, 2022

This is a followup of previous patch.

Dumping the stack trace is a good start, but printing
basic skb information is probably better.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eeee4b77

net: remove two BUG() from skb_checksum_help() · d7ea0d9d

Eric Dumazet authored May 09, 2022

I have a syzbot report that managed to get a crash in skb_checksum_help()

If syzbot can trigger these BUG(), it makes sense to replace
them with more friendly WARN_ON_ONCE() since skb_checksum_help()
can instead return an error code.

Note that syzbot will still crash there, until real bug is fixed.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d7ea0d9d

net: warn if transport header was not set · 66e4c8d9

Eric Dumazet authored May 09, 2022

Make sure skb_transport_header() and skb_transport_offset() uses
are not fooled if the transport header has not been set.

This change will likely expose existing bugs in linux networking stacks.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

66e4c8d9

net: add CONFIG_DEBUG_NET · d268c1f5

Eric Dumazet authored May 09, 2022

This config option enables network debugging checks.

This patch adds DEBUG_NET_WARN_ON_ONCE(cond)
Note that this is not a replacement for WARN_ON_ONCE(cond)
as (cond) is not evaluated if CONFIG_DEBUG_NET is not set.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d268c1f5

net: add include/net/net_debug.h · 5b87be9e

Eric Dumazet authored May 09, 2022

Remove from include/linux/netdevice.h helpers
that send debug/info/warnings to syslog.

We plan adding more helpers in following patches.

v2: added two includes, and 'struct net_device' forward declaration
    to avoid compile errors (kernel bots)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5b87be9e

Merge branch 's390-net-fixes' · 3cc5c6a7

David S. Miller authored May 11, 2022

Alexandra Winter says:

====================
s390/net: Cleanup some code checker findings

clean up smatch findings in legacy code. I was not able to provoke
any real failures on my systems, but other hardware reactions,
timing conditions or compiler output, may cause failures.

There are still 2 smatch warnings left in s390/net:

drivers/s390/net/ctcm_main.c:1326 add_channel() warn: missing error code 'rc'
This one is a false positive.

drivers/s390/net/netiucv.c:1355 netiucv_check_user() warn: argument 3 to %02x specifier has type 'char'
Postponing this one, need to better understand string handling in iucv.

There are several sparse warnings left in ctcm, like:
drivers/s390/net/ctcm_fsms.c:573:9: warning: context imbalance in 'ctcm_chx_setmode' - different lock contexts for basic block
Those are mentioned in the source, no plan to rework.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

3cc5c6a7