Commits · 82e94d4144d7a29e6e955e4b2ea681ed3f16d689 · Kirill Smelkov / linux

17 Mar, 2022 40 commits

Merge branch 'net-bridge-multiple-spanning-trees' · 82e94d41

Jakub Kicinski authored Mar 17, 2022

Tobias Waldekranz says:

====================
net: bridge: Multiple Spanning Trees

The bridge has had per-VLAN STP support for a while now, since:

https://lore.kernel.org/netdev/20200124114022.10883-1-nikolay@cumulusnetworks.com/

The current implementation has some problems:

- The mapping from VLAN to STP state is fixed as 1:1, i.e. each VLAN
  is managed independently. This is awkward from an MSTP (802.1Q-2018,
  Clause 13.5) point of view, where the model is that multiple VLANs
  are grouped into MST instances.

  Because of the way that the standard is written, presumably, this is
  also reflected in hardware implementations. It is not uncommon for a
  switch to support the full 4k range of VIDs, but that the pool of
  MST instances is much smaller. Some examples:

  Marvell LinkStreet (mv88e6xxx): 4k VLANs, but only 64 MSTIs
  Marvell Prestera: 4k VLANs, but only 128 MSTIs
  Microchip SparX-5i: 4k VLANs, but only 128 MSTIs

- By default, the feature is enabled, and there is no way to disable
  it. This makes it hard to add offloading in a backwards compatible
  way, since any underlying switchdevs have no way to refuse the
  function if the hardware does not support it

- The port-global STP state has precedence over per-VLAN states. In
  MSTP, as far as I understand it, all VLANs will use the common
  spanning tree (CST) by default - through traffic engineering you can
  then optimize your network to group subsets of VLANs to use
  different trees (MSTI). To my understanding, the way this is
  typically managed in silicon is roughly:

  Incoming packet:
  .----.----.--------------.----.-------------
  | DA | SA | 802.1Q VID=X | ET | Payload ...
  '----'----'--------------'----'-------------
                        |
                        '->|\     .----------------------------.
                           | +--> | VID | Members | ... | MSTI |
                   PVID -->|/     |-----|---------|-----|------|
                                  |   1 | 0001001 | ... |    0 |
                                  |   2 | 0001010 | ... |   10 |
                                  |   3 | 0001100 | ... |   10 |
                                  '----------------------------'
                                                             |
                               .-----------------------------'
                               |  .------------------------.
                               '->| MSTI | Fwding | Lrning |
                                  |------|--------|--------|
                                  |    0 | 111110 | 111110 |
                                  |   10 | 110111 | 110111 |
                                  '------------------------'

  What this is trying to show is that the STP state (whether MSTP is
  used, or ye olde STP) is always accessed via the VLAN table. If STP
  is running, all MSTI pointers in that table will reference the same
  index in the STP table - if MSTP is running, some VLANs may point
  to other trees (like in this example).

  The fact that in the Linux bridge, the global state (think: index 0
  in most hardware implementations) is supposed to override the
  per-VLAN state, is very awkward to offload. In effect, this means
  that when the global state changes to blocking, drivers will have to
  iterate over all MSTIs in use, and alter them all to match. This
  also means that you have to cache whether the hardware state is
  currently tracking the global state or the per-VLAN state. In the
  first case, you also have to cache the per-VLAN state so that you
  can restore it if the global state transitions back to forwarding.
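
The two-stage lookup in the diagram can be sketched in plain C. This is
only an illustration of the model, not code from any driver: the table
sizes, the struct and function names, and the choice that the rightmost
digit of each bitmask is port 0 are all assumptions made here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define N_VIDS  4096	/* full 802.1Q VID space */
#define N_MSTIS 64	/* assumed size of the (much smaller) STP table */
#define N_PORTS 7	/* width of the member bitmasks in the diagram */

/* VLAN table entry: port membership plus an index into the STP table. */
struct vlan_entry {
	uint8_t members;
	uint8_t msti;
};

/* STP table entry: per-port forwarding and learning bitmasks. */
struct stp_entry {
	uint8_t fwding;
	uint8_t lrning;
};

static struct vlan_entry vlan_table[N_VIDS];
static struct stp_entry stp_table[N_MSTIS];

/* Load the values shown in the diagram (bit 0 = rightmost digit). */
static void load_diagram(void)
{
	vlan_table[1] = (struct vlan_entry){ .members = 0x09, .msti = 0 };
	vlan_table[2] = (struct vlan_entry){ .members = 0x0a, .msti = 10 };
	vlan_table[3] = (struct vlan_entry){ .members = 0x0c, .msti = 10 };
	stp_table[0]  = (struct stp_entry){ .fwding = 0x3e, .lrning = 0x3e };
	stp_table[10] = (struct stp_entry){ .fwding = 0x37, .lrning = 0x37 };
}

/*
 * The STP state is always reached through the VLAN table: resolve the
 * VID first, then follow its MSTI pointer into the STP table.
 */
static bool may_forward(uint16_t vid, unsigned int port)
{
	const struct vlan_entry *v;
	const struct stp_entry *s;

	assert(vid < N_VIDS && port < N_PORTS);
	v = &vlan_table[vid];
	assert(v->msti < N_MSTIS);
	s = &stp_table[v->msti];

	return ((v->members >> port) & 1) && ((s->fwding >> port) & 1);
}
```

With the diagram's values, VLANs 2 and 3 share MSTI 10, so a single
write to stp_table[10] changes the forwarding decision for both.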

This series adds a new mst_enable bridge setting (as suggested by Nik)
that can only be changed when no VLANs are configured on the
bridge. Enabling this mode has the following effect:

- The port-global STP state is used to represent the CST (Common
  Spanning Tree) (1/15)

- Ingress STP filtering is deferred until the frame's VLAN has been
  resolved (1/15)

- The preexisting per-VLAN states can no longer be controlled directly
  (1/15). They are instead placed under the MST module's control,
  which is managed using a new netlink interface (described in 3/15)

- VLANs can be mapped to MSTIs in an arbitrary M:N fashion, using a
  new global VLAN option (2/15)

Switchdev notifications are added so that a driver can track:
- MST enabled state
- VID to MSTI mappings
- MST port states

An offloading implementation is provided for mv88e6xxx.
====================

Link: https://lore.kernel.org/r/20220316150857.2442916-1-tobias@waldekranz.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

82e94d41

net: dsa: mv88e6xxx: MST Offloading · acaf4d2e

Tobias Waldekranz authored Mar 16, 2022

Allocate a SID in the STU for each MSTID in use by a bridge and handle
the mapping of MSTIDs to VLANs using the SID field of each VTU entry.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

acaf4d2e

net: dsa: mv88e6xxx: Export STU as devlink region · 7dc96039

Tobias Waldekranz authored Mar 16, 2022

Export the raw STU data in a devlink region so that it can be
inspected from userspace and compared to the current bridge
configuration.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

7dc96039

net: dsa: mv88e6xxx: Disentangle STU from VTU · 49c98c1d

Tobias Waldekranz authored Mar 16, 2022

In early LinkStreet silicon (e.g. 6095/6185), the per-VLAN STP states
were kept in the VTU - there was no concept of a SID. Later, the
information was split into two tables, where the VTU only tracked
memberships and deferred the STP state tracking to the STU via a
pointer (SID). This meant that a group of VLANs could share the same
STU entry. Most likely, this was done to align with MSTP (802.1Q-2018,
Clause 13), which is built on this principle.

While the VTU is still 4k lines on most devices, the STU is capped at
64 entries. This means that the current strategy, updating STU info
whenever a VTU entry is updated, cannot easily support MSTP because:

- The maximum number of VIDs would also be capped at 64, as we would
  have to allocate one SID for every VTU entry - even if many VLANs
  would effectively share the same MST.

- MSTP updates would be unnecessarily slow as you would have to
  iterate over all VLANs that share the same MST.

In order to support MSTP offloading in the future, manage the STU as a
separate entity from the VTU.

Only add support for newer hardware with separate VTU and
STU. VTU-only devices can also be supported, but essentially this
requires a software implementation of an STU (fanning out state
changes to all VLANs tied to the same MST).
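
What such a software STU might look like can be sketched as follows.
This is a hypothetical userspace model, not mv88e6xxx code: the
vtu_entry layout, the state enum and soft_stu_set_state() are invented
here purely to show the fan-out cost.

```c
#include <assert.h>
#include <stdint.h>

#define N_VIDS  4096
#define N_PORTS 8	/* assumed port count */

enum port_state { ST_DISABLED, ST_BLOCKING, ST_LEARNING, ST_FORWARDING };

/* VTU-only model: every VLAN entry carries its own per-port states. */
struct vtu_entry {
	uint8_t valid;
	uint16_t msti;			/* software-only bookkeeping */
	uint8_t state[N_PORTS];		/* enum port_state per port */
};

static struct vtu_entry vtu[N_VIDS];

/*
 * Emulate one STU write by rewriting every VLAN entry tied to the
 * given MSTI. Returns the number of VLAN entries that were touched.
 */
static unsigned int soft_stu_set_state(uint16_t msti, unsigned int port,
				       enum port_state st)
{
	unsigned int vid, touched = 0;

	assert(port < N_PORTS);
	for (vid = 0; vid < N_VIDS; vid++) {
		if (!vtu[vid].valid || vtu[vid].msti != msti)
			continue;
		vtu[vid].state[port] = (uint8_t)st;
		touched++;
	}
	return touched;
}
```

Each state change walks the whole VLAN table, which is the slow path
that a separate STU avoids.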
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

49c98c1d

net: dsa: Handle MST state changes · 7414af30

Tobias Waldekranz authored Mar 16, 2022

Add the usual trampoline functionality from the generic DSA layer down
to the drivers for MST state changes.

When a state changes to disabled/blocking/listening, make sure to fast
age any dynamic entries in the affected VLANs (those controlled by the
MSTI in question).
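
The fast-age decision can be sketched as a small predicate. This is a
simplified illustration rather than the DSA code: the enum and both
function names are made up here, and taking the previous state into
account is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

enum stp_state {
	STP_DISABLED,
	STP_LISTENING,
	STP_LEARNING,
	STP_FORWARDING,
	STP_BLOCKING,
};

/* Only learning and forwarding ports populate the FDB dynamically. */
static bool state_learns(enum stp_state s)
{
	assert((unsigned int)s <= STP_BLOCKING);
	return s == STP_LEARNING || s == STP_FORWARDING;
}

/*
 * Dynamic entries need flushing when a port leaves a learning-capable
 * state for disabled, blocking or listening.
 */
static bool needs_fast_age(enum stp_state prev, enum stp_state next)
{
	return state_learns(prev) && !state_learns(next);
}
```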
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

7414af30

net: dsa: Pass VLAN MSTI migration notifications to driver · 8e6598a7

Tobias Waldekranz authored Mar 16, 2022

Add the usual trampoline functionality from the generic DSA layer down
to the drivers for VLAN MSTI migrations.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

8e6598a7

net: dsa: Validate hardware support for MST · 332afc4c

Tobias Waldekranz authored Mar 16, 2022

When joining a bridge where MST is enabled, we validate that the
proper offloading support is in place, otherwise we fall back to
software bridging.

When the mode is changed on a bridge in which we are members, we
refuse the change if offloading is not supported.

At the moment we only check for configurable learning, but this will
be further restricted as we support more MST related switchdev events.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

332afc4c

net: bridge: mst: Add helper to query a port's MST state · f54fd0e1

Tobias Waldekranz authored Mar 16, 2022

This is useful for switchdev drivers that are offloading MST states
into hardware. As an example, a driver may wish to flush the FDB for a
port when it transitions from forwarding to blocking - which means
that the previous state must be discoverable.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

f54fd0e1

net: bridge: mst: Add helper to check if MST is enabled · 48d57b2e

Tobias Waldekranz authored Mar 16, 2022

This is useful for switchdev drivers that might want to refuse to join
a bridge where MST is enabled, if the hardware can't support it.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

48d57b2e

net: bridge: mst: Add helper to map an MSTI to a VID set · cceac97a

Tobias Waldekranz authored Mar 16, 2022

br_mst_get_info answers the question: "On this bridge, which VIDs are
mapped to the given MSTI?"

This is useful in switchdev drivers, which might have to fan-out
operations, relating to an MSTI, per VLAN.

An example: When a port's MST state changes from forwarding to
blocking, a driver may choose to flush the dynamic FDB entries on that
port to get faster reconvergence of the network, but this should only
be done in the VLANs that are managed by the MSTI in question.
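
The idea of answering "which VIDs belong to this MSTI?" with a bitmap
can be sketched in userspace C. Everything below is hypothetical and
only mirrors the concept: vlan_cfg, msti_to_vids() and vid_isset() are
names invented for this sketch, not the bridge's API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define N_VIDS        4096
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define VID_WORDS     ((N_VIDS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Per-VLAN configuration as a bridge might track it (assumed layout). */
struct vlan_cfg {
	bool in_use;
	uint16_t msti;
};

static struct vlan_cfg vlans[N_VIDS];

/* Test one VID in a bitmap of VID_WORDS words. */
static bool vid_isset(const unsigned long *vids, unsigned int vid)
{
	assert(vid < N_VIDS);
	return (vids[vid / BITS_PER_WORD] >> (vid % BITS_PER_WORD)) & 1UL;
}

/*
 * Fill vids with one bit per VLAN mapped to msti and return how many
 * VLANs matched. vids must hold at least VID_WORDS words.
 */
static unsigned int msti_to_vids(uint16_t msti, unsigned long *vids)
{
	unsigned int vid, count = 0;

	memset(vids, 0, VID_WORDS * sizeof(unsigned long));
	for (vid = 0; vid < N_VIDS; vid++) {
		if (!vlans[vid].in_use || vlans[vid].msti != msti)
			continue;
		vids[vid / BITS_PER_WORD] |= 1UL << (vid % BITS_PER_WORD);
		count++;
	}
	return count;
}
```

A driver-side flush would then iterate over the set bits and limit
itself to those VLANs.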
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

cceac97a

net: bridge: mst: Notify switchdev drivers of MST state changes · 7ae9147f

Tobias Waldekranz authored Mar 16, 2022

Generate a switchdev notification whenever an MST state changes. This
notification is keyed by the VLAN's MSTI rather than the VID, since
multiple VLANs may share the same MST instance.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

7ae9147f

net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrations · 6284c723

Tobias Waldekranz authored Mar 16, 2022

Whenever a VLAN moves to a new MSTI, send a switchdev notification so
that switchdevs can track a bridge's VID to MSTI mappings.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

6284c723

net: bridge: mst: Notify switchdev drivers of MST mode changes · 87c167bb

Tobias Waldekranz authored Mar 16, 2022

Trigger a switchdev event whenever the bridge's MST mode is
enabled/disabled. This allows constituent ports to either perform any
required hardware config, or refuse the change if it is not supported.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

87c167bb

net: bridge: mst: Support setting and reporting MST port states · 122c2948

Tobias Waldekranz authored Mar 16, 2022

Make it possible to change the port state in a given MSTI by extending
the bridge port netlink interface (RTM_SETLINK on PF_BRIDGE). The
proposed iproute2 interface would be:

    bridge mst set dev <PORT> msti <MSTI> state <STATE>

Current states in all applicable MSTIs can also be dumped via a
corresponding RTM_GETLINK. The proposed iproute2 interface looks like
this:

$ bridge mst
port              msti
vb1               0
		    state forwarding
		  100
		    state disabled
vb2               0
		    state forwarding
		  100
		    state forwarding

The preexisting per-VLAN states are still valid in the MST
mode (although they are read-only), and can be queried as usual if one
is interested in knowing a particular VLAN's state without having to
care about the VID to MSTI mapping (in this example VLAN 20 and 30 are
bound to MSTI 100):

$ bridge -d vlan
port              vlan-id
vb1               10
		    state forwarding mcast_router 1
		  20
		    state disabled mcast_router 1
		  30
		    state disabled mcast_router 1
		  40
		    state forwarding mcast_router 1
vb2               10
		    state forwarding mcast_router 1
		  20
		    state forwarding mcast_router 1
		  30
		    state forwarding mcast_router 1
		  40
		    state forwarding mcast_router 1
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

122c2948

net: bridge: mst: Allow changing a VLAN's MSTI · 8c678d60

Tobias Waldekranz authored Mar 16, 2022

Allow a VLAN to move out of the CST (MSTI 0), to an independent tree.

The user manages the VID to MSTI mappings via a global VLAN
setting. The proposed iproute2 interface would be:

    bridge vlan global set dev br0 vid <VID> msti <MSTI>

Changing the state in non-zero MSTIs is still not supported, but will
be addressed in upcoming changes.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

8c678d60

net: bridge: mst: Multiple Spanning Tree (MST) mode · ec7328b5

Tobias Waldekranz authored Mar 16, 2022

Allow the user to switch from the current per-VLAN STP mode to an MST
mode.

Up to this point, per-VLAN STP states were always isolated from each
other. This is in contrast to the MSTP standard (802.1Q-2018, Clause
13.5), where VLANs are grouped into MST instances (MSTIs), and the
state is managed on a per-MSTI level, rather than at the per-VLAN
level.

Perhaps due to the prevalence of the standard, many switching ASICs
are built after the same model. Therefore, add a corresponding MST
mode to the bridge, which we can later add offloading support for in a
straight-forward way.

For now, all VLANs are fixed to MSTI 0, also called the Common
Spanning Tree (CST). That is, all VLANs will follow the port-global
state.

Upcoming changes will make this actually useful by allowing VLANs to
be mapped to arbitrary MSTIs and allow individual MSTI states to be
changed.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ec7328b5

r8169: improve driver unload and system shutdown behavior on DASH-enabled systems · 54744510

Heiner Kallweit authored Mar 16, 2022

There's a number of systems supporting DASH remote management.
Driver unload and system shutdown can result in the PHY suspending,
thus making DASH unusable. Improve this by handling DASH being enabled
very similar to WoL being enabled.
Tested-by: Yanko Kaneti <yaneti@declera.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/1de3b176-c09c-1654-6f00-9785f7a4f954@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

54744510

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · fad6c1f1

Jakub Kicinski authored Mar 17, 2022

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2022-03-16

This series contains updates to gtp and ice driver.

Wojciech fixes smatch reported inconsistent indenting for gtp and ice.

Yang Yingliang fixes a couple of return value checks for GNSS to use
IS_ERR() instead of checking for NULL.

Jacob adds support for trace events on tx timestamps.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: add trace events for tx timestamps
  ice: fix return value check in ice_gnss.c
  ice: Fix inconsistent indenting in ice_switch
  gtp: Fix inconsistent indenting
====================

Link: https://lore.kernel.org/r/20220316204024.3201500-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

fad6c1f1

ethernet: sun: Fix spelling mistake "mis-matched" -> "mismatched" · 21c68644

Colin Ian King authored Mar 16, 2022

There is a spelling mistake in a dev_err message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20220316234620.55885-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

21c68644

net: ethernet: ti: Fix spelling mistake and clean up message · 30fb3598

Colin Ian King authored Mar 16, 2022

There is a spelling mistake in a dev_err message and the MAX_SKB_FRAGS
value does not need to be printed between parentheses. Fix this.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20220316233455.54541-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

30fb3598

vlan: use correct format characters · 8624a95e

Bill Wendling authored Mar 16, 2022

When compiling with -Wformat, clang emits the following warning:

net/8021q/vlanproc.c:284:22: warning: format specifies type 'unsigned
short' but the argument has type 'int' [-Wformat]
                                   mp->priority, ((mp->vlan_qos >> 13) & 0x7));
                                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~

The types of these arguments are unconditionally defined, so this patch
updates the format character to the correct ones for ints and unsigned
ints.
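
Why the argument has type int can be shown with a small userspace
sketch. It assumes a 32-bit unsigned priority and a 16-bit vlan_qos;
format_prio() is a made-up helper for illustration and is not the
code in vlanproc.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * vlan_qos is an unsigned short, but it is promoted to int before the
 * shift, so "(vlan_qos >> 13) & 0x7" has type int. The matching
 * conversion is therefore %d, not %hu.
 */
static int format_prio(char *buf, size_t len, unsigned int priority,
		       unsigned short vlan_qos)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "%u:%d", priority, (vlan_qos >> 13) & 0x7);
}
```

For example, priority 3 with vlan_qos 0xa000 formats as "3:5", since
the top three bits of 0xa000 are 101.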

Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Bill Wendling <morbo@google.com>
Link: https://lore.kernel.org/r/20220316213125.2353370-1-morbo@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

8624a95e

net/fsl: xgmac_mdio: use correct format characters · c011072c

Bill Wendling authored Mar 16, 2022

When compiling with -Wformat, clang emits the following warning:

drivers/net/ethernet/freescale/xgmac_mdio.c:243:22: warning: format
specifies type 'unsigned char' but the argument has type 'int'
[-Wformat]
                        phy_id, dev_addr, regnum);
                                          ^~~~~~
./include/linux/dev_printk.h:163:47: note: expanded from macro 'dev_dbg'
                dev_printk(KERN_DEBUG, dev, dev_fmt(fmt), ##__VA_ARGS__); \
                                                    ~~~     ^~~~~~~~~~~
./include/linux/dev_printk.h:129:34: note: expanded from macro 'dev_printk'
                _dev_printk(level, dev, fmt, ##__VA_ARGS__);            \
                                        ~~~    ^~~~~~~~~~~

The types of these arguments are unconditionally defined, so this patch
updates the format character to the correct ones for ints and unsigned
ints.

Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Bill Wendling <morbo@google.com>
Link: https://lore.kernel.org/r/20220316213114.2352352-1-morbo@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c011072c

bnx2x: use correct format characters · d65aea8e

Bill Wendling authored Mar 16, 2022

When compiling with -Wformat, clang emits the following warnings:

drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:6181:40: warning: format
specifies type 'unsigned short' but the argument has type 'u32'
(aka 'unsigned int') [-Wformat]
        ret = scnprintf(str, *len, "%hx.%hx", num >> 16, num);
                                    ~~~       ^~~~~~~~~
                                    %x
drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:6181:51: warning: format
specifies type 'unsigned short' but the argument has type 'u32'
(aka 'unsigned int') [-Wformat]
        ret = scnprintf(str, *len, "%hx.%hx", num >> 16, num);
                                        ~~~              ^~~
                                        %x
drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:6196:47: warning: format
specifies type 'unsigned char' but the argument has type 'u32'
(aka 'unsigned int') [-Wformat]
        ret = scnprintf(str, *len, "%hhx.%hhx.%hhx", num >> 16, num >> 8, num);
                                    ~~~~             ^~~~~~~~~
                                    %x
drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:6196:58: warning: format
specifies type 'unsigned char' but the argument has type 'u32'
(aka 'unsigned int') [-Wformat]
        ret = scnprintf(str, *len, "%hhx.%hhx.%hhx", num >> 16, num >> 8, num);
                                         ~~~~                   ^~~~~~~~
                                         %x
drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:6196:68: warning: format
specifies type 'unsigned char' but the argument has type 'u32'
(aka 'unsigned int') [-Wformat]
        ret = scnprintf(str, *len, "%hhx.%hhx.%hhx", num >> 16, num >> 8, num);
                                              ~~~~                        ^~~
                                              %x

The types of these arguments are unconditionally defined, so this patch
updates the format character to the correct ones for ints and unsigned
ints.
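
One point worth keeping in mind when widening %hx to %x is that the
narrow specifier may have been relied on to truncate the value. The
sketch below keeps the output unchanged by masking explicitly; it is an
illustration of that concern, with format_ver16() being a hypothetical
helper and not the bnx2x function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Print a 32-bit value as two 16-bit hex fields. The masks make the
 * truncation explicit, so %x can be used with unsigned int arguments.
 */
static int format_ver16(char *buf, size_t len, uint32_t num)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "%x.%x",
			(unsigned int)((num >> 16) & 0xffff),
			(unsigned int)(num & 0xffff));
}
```

Without the second mask, %x would print all 32 bits of num in the
low field.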

Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Bill Wendling <morbo@google.com>
Link: https://lore.kernel.org/r/20220316213104.2351651-1-morbo@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

d65aea8e

enetc: use correct format characters · df4d35e1

Bill Wendling authored Mar 16, 2022

When compiling with -Wformat, clang emits the following warning:

drivers/net/ethernet/freescale/enetc/enetc_mdio.c:151:22: warning:
format specifies type 'unsigned char' but the argument has type 'int'
[-Wformat]
                        phy_id, dev_addr, regnum);
                                          ^~~~~~
./include/linux/dev_printk.h:163:47: note: expanded from macro 'dev_dbg'
                dev_printk(KERN_DEBUG, dev, dev_fmt(fmt), ##__VA_ARGS__); \
                                                    ~~~     ^~~~~~~~~~~
./include/linux/dev_printk.h:129:34: note: expanded from macro 'dev_printk'
                _dev_printk(level, dev, fmt, ##__VA_ARGS__);            \
                                        ~~~    ^~~~~~~~~~~

The types of these arguments are unconditionally defined, so this patch
updates the format character to the correct ones for ints and unsigned
ints.

Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Bill Wendling <morbo@google.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20220316213109.2352015-1-morbo@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

df4d35e1

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · e243f396
Jakub Kicinski authored Mar 17, 2022
```
No conflicts.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
```
e243f396

Merge tag 'net-5.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 551acdc3

Linus Torvalds authored Mar 17, 2022

Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter, ipsec, and wireless.

  A few last minute revert / disable and fix patches came down from our
  sub-trees. We're not waiting for any fixes at this point.

  Current release - regressions:

   - Revert "netfilter: nat: force port remap to prevent shadowing
     well-known ports", restore working conntrack on asymmetric paths

   - Revert "ath10k: drop beacon and probe response which leak from
     other channel", restore working AP and mesh mode on QCA9984

   - eth: intel: fix hang during reboot/shutdown

  Current release - new code bugs:

   - netfilter: nf_tables: disable register tracking, it needs more work
     to cover all corner cases

  Previous releases - regressions:

   - ipv6: fix skb_over_panic in __ip6_append_data when (admin-only)
     extension headers get specified

   - esp6: fix ESP over TCP/UDP, interpret ipv6_skip_exthdr's return
     value more selectively

   - bnx2x: fix driver load failure when FW not present in initrd

  Previous releases - always broken:

   - vsock: stop destroying unrelated sockets in nested virtualization

   - packet: fix slab-out-of-bounds access in packet_recvmsg()

  Misc:

   - add Paolo Abeni to networking maintainers!"

* tag 'net-5.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (26 commits)
  iavf: Fix hang during reboot/shutdown
  net: mscc: ocelot: fix backwards compatibility with single-chain tc-flower offload
  net: bcmgenet: skip invalid partial checksums
  bnx2x: fix built-in kernel driver load failure
  net: phy: mscc: Add MODULE_FIRMWARE macros
  net: dsa: Add missing of_node_put() in dsa_port_parse_of
  net: handle ARPHRD_PIMREG in dev_is_mac_header_xmit()
  Revert "ath10k: drop beacon and probe response which leak from other channel"
  hv_netvsc: Add check for kvmalloc_array
  iavf: Fix double free in iavf_reset_task
  ice: destroy flow director filter mutex after releasing VSIs
  ice: fix NULL pointer dereference in ice_update_vsi_tx_ring_stats()
  Add Paolo Abeni to networking maintainers
  atm: eni: Add check for dma_map_single
  net/packet: fix slab-out-of-bounds access in packet_recvmsg()
  net: mdio: mscc-miim: fix duplicate debugfs entry
  net: phy: marvell: Fix invalid comparison in the resume and suspend functions
  esp6: fix check on ipv6_skip_exthdr's return value
  net: dsa: microchip: add spi_device_id tables
  netfilter: nf_tables: disable register tracking
  ...

551acdc3

Merge tag 'acpi-5.17-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · c81801eb

Linus Torvalds authored Mar 17, 2022

Pull ACPI fix from Rafael Wysocki:
 "Revert recent commit that caused multiple systems to misbehave due to
  firmware issues"

* tag 'acpi-5.17-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Revert "ACPI: scan: Do not add device IDs from _CID if _HID is not valid"

c81801eb

Merge branch 'akpm' (patches from Andrew) · 2ab99e54

Linus Torvalds authored Mar 17, 2022

Merge misc fixes from Andrew Morton:
 "Four patches.

  Subsystems affected by this patch series: mm/swap, kconfig, ocfs2, and
  selftests"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  selftests: vm: fix clang build error multiple output files
  ocfs2: fix crash when initialize filecheck kobj fails
  configs/debug: restore DEBUG_INFO=y for overriding
  mm: swap: get rid of livelock in swapin readahead

2ab99e54

selftests: vm: fix clang build error multiple output files · 1c4debc4

Yosry Ahmed authored Mar 16, 2022

When building the vm selftests using clang, some errors are seen due to
having headers in the compilation command:

clang -Wall -I ../../../../usr/include -no-pie gup_test.c ../../../../mm/gup_test.h -lrt -lpthread -o .../tools/testing/selftests/vm/gup_test
clang: error: cannot specify -o when generating multiple output files
make[1]: *** [../lib.mk:146: .../tools/testing/selftests/vm/gup_test] Error 1

Rework to add the header files to LOCAL_HDRS before including ../lib.mk,
since the dependency is evaluated in '$(OUTPUT)/%:%.c $(LOCAL_HDRS)' in
file lib.mk.

Link: https://lkml.kernel.org/r/20220304000645.1888133-1-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1c4debc4

ocfs2: fix crash when initialize filecheck kobj fails · 7b0b1332

Joseph Qi authored Mar 16, 2022

Once s_root is set, generic_shutdown_super() will be called if
fill_super() fails.  That means, we will call ocfs2_dismount_volume()
twice in such case, which can lead to kernel crash.

Fix this issue by initializing filecheck kobj before setting s_root.

Link: https://lkml.kernel.org/r/20220310081930.86305-1-joseph.qi@linux.alibaba.com
Fixes: 5f483c4a ("ocfs2: add kobject for online file check")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7b0b1332

configs/debug: restore DEBUG_INFO=y for overriding · 8208257d

Qian Cai authored Mar 16, 2022

Previously, I failed to realize that Kees' patch [1] has not been merged
into the mainline yet, and dropped DEBUG_INFO=y too eagerly from the
mainline. As a result, "make debug.config" won't be able to flip
DEBUG_INFO=n from the existing .config. This should close the gaps of a
few weeks before Kees' patch is there, and work regardless of their
merging status anyway.

Link: https://lore.kernel.org/all/20220125075126.891825-1-keescook@chromium.org/ [1]
Link: https://lkml.kernel.org/r/20220308153524.8618-1-quic_qiancai@quicinc.com
Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
Reported-by: Daniel Thompson <daniel.thompson@linaro.org>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8208257d

mm: swap: get rid of livelock in swapin readahead · 029c4628

Guo Ziliang authored Mar 16, 2022

In our testing, a livelocked task was found.  Through sysrq printing, the
same stack was found every time, as follows:

  __swap_duplicate+0x58/0x1a0
  swapcache_prepare+0x24/0x30
  __read_swap_cache_async+0xac/0x220
  read_swap_cache_async+0x58/0xa0
  swapin_readahead+0x24c/0x628
  do_swap_page+0x374/0x8a0
  __handle_mm_fault+0x598/0xd60
  handle_mm_fault+0x114/0x200
  do_page_fault+0x148/0x4d0
  do_translation_fault+0xb0/0xd4
  do_mem_abort+0x50/0xb0

The reason for the livelock is that swapcache_prepare() always returns
EEXIST, indicating that SWAP_HAS_CACHE has not been cleared, so that it
cannot jump out of the loop.  We suspect that the task that clears the
SWAP_HAS_CACHE flag never gets a chance to run.  We try to lower the
priority of the task stuck in a livelock so that the task that clears
the SWAP_HAS_CACHE flag will run.  The results show that the system
returns to normal after the priority is lowered.

In our testing, multiple real-time tasks are bound to the same core, and
the task in the livelock is the highest priority task of the core, so
the livelocked task cannot be preempted.

Although cond_resched() is used by __read_swap_cache_async, it is an
empty function in the preemptive system and cannot achieve the purpose
of releasing the CPU.  A high-priority task cannot release the CPU
unless preempted by a higher-priority task.  But when this task is
already the highest priority task on this core, other tasks will not be
able to be scheduled.  So we think we should replace cond_resched() with
schedule_timeout_uninterruptible(1).  schedule_timeout_uninterruptible()
will call set_current_state() first to set the task state, so the task
will be removed from the run queue, giving up the CPU and preventing it
from running in kernel mode for too long.

(akpm: ugly hack becomes uglier.  But it fixes the issue in a
backportable-to-stable fashion while we hopefully work on something
better)

Link: https://lkml.kernel.org/r/20220221111749.1928222-1-cgel.zte@gmail.com
Signed-off-by: Guo Ziliang <guo.ziliang@zte.com.cn>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Reviewed-by: Jiang Xuexin <jiang.xuexin@zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roger Quadros <rogerq@kernel.org>
Cc: Ziliang Guo <guo.ziliang@zte.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

iavf: Fix hang during reboot/shutdown · b04683ff

Ivan Vecera authored Mar 17, 2022

Recent commit 97457801 ("iavf: Add waiting so the port is
initialized in remove") adds a wait-loop at the beginning of
iavf_remove() to ensure that port initialization is finished
prior to unregistering the net device. This causes a regression
in the reboot/shutdown scenario, because in this case the
iavf_shutdown() callback is called first. It detaches the device,
brings it down if it is running and sets its state to __IAVF_REMOVE.
Later, the shutdown callback of the associated PF driver (e.g.
ice_shutdown) is called. Among other things, that callback calls
sriov_disable(), which indirectly calls iavf_remove() (see the stack
trace below). As the adapter state is already __IAVF_REMOVE, the
wait loop never ends and the shutdown process hangs.

The patch fixes this by checking the adapter's state at the beginning
of iavf_remove() and skipping the rest of the function if the adapter
is already in the remove state (shutdown is in progress).
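
A minimal userspace sketch of that early-return check, under stated assumptions: the `toy_` types and functions are hypothetical stand-ins rather than the driver's code, the set of states that ends the wait loop is assumed, and the wait is bounded by `max_polls` so the model can report a hang instead of hanging.

```c
#include <stdbool.h>

enum toy_state { TOY_INIT, TOY_DOWN, TOY_RUNNING, TOY_REMOVE };
enum toy_result { TOY_DONE, TOY_SKIPPED, TOY_HUNG };

struct toy_adapter {
	enum toy_state state;
};

/* Models the shutdown callback: marks the adapter as being removed. */
static void toy_shutdown(struct toy_adapter *a)
{
	a->state = TOY_REMOVE;
}

/* Models the remove callback, with or without the early state check. */
static enum toy_result toy_remove(struct toy_adapter *a, bool with_fix,
				  int max_polls)
{
	int polls;

	/* The fix: shutdown already ran, so there is nothing to wait for. */
	if (with_fix && a->state == TOY_REMOVE)
		return TOY_SKIPPED;

	/* Wait loop: leave only once initialization has settled. Nothing in
	 * this model changes the state while waiting, so a remove state set
	 * by shutdown keeps the loop going until the poll budget runs out. */
	for (polls = 0; polls < max_polls; polls++) {
		if (a->state == TOY_DOWN || a->state == TOY_RUNNING)
			break;
	}
	if (polls == max_polls)
		return TOY_HUNG;

	a->state = TOY_REMOVE;	/* normal teardown path */
	return TOY_DONE;
}
```

Calling `toy_shutdown()` and then `toy_remove()` without the check reports `TOY_HUNG`; with the check it reports `TOY_SKIPPED`, and a plain remove without a prior shutdown still completes.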

Reproducer:
1. Create VF on PF driven by ice or i40e driver
2. Ensure that the VF is bound to iavf driver
3. Reboot

[52625.981294] sysrq: SysRq : Show Blocked State
[52625.988377] task:reboot          state:D stack:    0 pid:17359 ppid:     1 f2
[52625.996732] Call Trace:
[52625.999187]  __schedule+0x2d1/0x830
[52626.007400]  schedule+0x35/0xa0
[52626.010545]  schedule_hrtimeout_range_clock+0x83/0x100
[52626.020046]  usleep_range+0x5b/0x80
[52626.023540]  iavf_remove+0x63/0x5b0 [iavf]
[52626.027645]  pci_device_remove+0x3b/0xc0
[52626.031572]  device_release_driver_internal+0x103/0x1f0
[52626.036805]  pci_stop_bus_device+0x72/0xa0
[52626.040904]  pci_stop_and_remove_bus_device+0xe/0x20
[52626.045870]  pci_iov_remove_virtfn+0xba/0x120
[52626.050232]  sriov_disable+0x2f/0xe0
[52626.053813]  ice_free_vfs+0x7c/0x340 [ice]
[52626.057946]  ice_remove+0x220/0x240 [ice]
[52626.061967]  ice_shutdown+0x16/0x50 [ice]
[52626.065987]  pci_device_shutdown+0x34/0x60
[52626.070086]  device_shutdown+0x165/0x1c5
[52626.074011]  kernel_restart+0xe/0x30
[52626.077593]  __do_sys_reboot+0x1d2/0x210
[52626.093815]  do_syscall_64+0x5b/0x1a0
[52626.097483]  entry_SYSCALL_64_after_hwframe+0x65/0xca

Fixes: 97457801 ("iavf: Add waiting so the port is initialized in remove")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://lore.kernel.org/r/20220317104524.2802848-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: fix backwards compatibility with single-chain tc-flower offload · 8e0341ae

Vladimir Oltean authored Mar 16, 2022

ACL rules can be offloaded to VCAP IS2 either through chain 0, or, since
the blamed commit, through a chain index whose number encodes a specific
PAG (Policy Action Group) and lookup number.

The chain number is translated through ocelot_chain_to_pag() into a PAG,
and through ocelot_chain_to_lookup() into a lookup number.

The problem with the blamed commit is that the above 2 functions don't
have special treatment for chain 0. So ocelot_chain_to_pag(0) returns
filter->pag = 224, which is in fact -32, but the "pag" field is a u8.

So we end up programming the hardware with VCAP IS2 entries having a PAG
of 224. But the way in which the PAG works is that it defines a subset
of VCAP IS2 filters which should match on a packet. The default PAG is
0, and previous VCAP IS1 rules (which we offload using 'goto') can
modify it. So basically, we are installing filters with a PAG on which
no packet will ever match. This is the hardware equivalent of adding
filters to a chain which has no 'goto' to it.

Restore the previous functionality by making ACL filters offloaded to
chain 0 go to PAG 0 and lookup number 0. The choice of PAG is clearly
correct, but the choice of lookup number isn't "as before" (which was to
leave the lookup a "don't care"). However, lookup 0 should be fine,
since even though there are ACL actions (policers) which have a
requirement to be used in a specific lookup, that lookup is 0.
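
The u8 truncation described above can be reproduced in plain C. This is a sketch only: the chain base of 20000 and the per-lookup stride of 1000 are assumptions made for illustration (chosen so that chain 0 reproduces the quoted value of 224), and all `toy_` names are hypothetical.

```c
#include <stdint.h>

/* Assumed chain numbering, for illustration only. */
#define TOY_IS2_CHAIN_BASE	20000
#define TOY_LOOKUP_STRIDE	1000

/* Lookup number encoded in the chain index. */
static int toy_chain_to_lookup(int chain)
{
	return (chain / TOY_LOOKUP_STRIDE) % 10;
}

/* No special case for chain 0: the PAG is the offset from the first chain
 * of the lookup, which is negative for chain 0. */
static int toy_chain_to_pag_old(int chain)
{
	int lookup = toy_chain_to_lookup(chain);

	return chain - (TOY_IS2_CHAIN_BASE + lookup * TOY_LOOKUP_STRIDE);
}

/* With the special case: chain 0 maps to PAG 0. */
static int toy_chain_to_pag_fixed(int chain)
{
	if (chain == 0)
		return 0;

	return toy_chain_to_pag_old(chain);
}

/* Storing the result in an 8-bit field keeps only the low 8 bits. */
static uint8_t toy_store_pag(int pag)
{
	return (uint8_t)pag;
}
```

Here `toy_store_pag(toy_chain_to_pag_old(0))` yields 224, the same bit pattern as -32 in 8 bits, while the fixed variant yields 0.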

Fixes: 226e9cd8 ("net: mscc: ocelot: only install TCAM entries into a specific lookup and PAG")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220316192117.2568261-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bcmgenet: skip invalid partial checksums · 0f643c88

Doug Berger authored Mar 16, 2022

The RXCHK block will return a partial checksum of 0 if it encounters
a problem while receiving a packet. Since a 1's complement sum can
only produce this result if no bits are set in the received data
stream it is fair to treat it as an invalid partial checksum and
not pass it up the stack.
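
The property relied on above can be checked with a small userspace sketch of a 16-bit 1's complement sum. The function names are hypothetical and this is not the driver's code; it only shows that folding carries back in can produce 0 solely for all-zero input.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit 1's complement sum with end-around carry (result not inverted).
 * The 32-bit accumulator is sufficient for packet-sized buffers. */
static uint16_t toy_ones_complement_sum16(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (i < len)			/* odd trailing byte, zero-padded */
		sum += (uint32_t)data[i] << 8;

	while (sum >> 16)		/* fold carries back into the low bits */
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}

/* A partial checksum of 0 is treated as "hardware reported a problem". */
static bool toy_partial_csum_usable(uint16_t csum)
{
	return csum != 0;
}
```

Any buffer with at least one bit set sums to a nonzero value; a sum that would wrap to zero folds to 0xffff or carries a 1 back in instead.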

Fixes: 81015539 ("net: bcmgenet: use CHECKSUM_COMPLETE for NETIF_F_RXCSUM")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220317012812.1313196-1-opendmb@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnx2x: fix built-in kernel driver load failure · 424e7834

Manish Chopra authored Mar 16, 2022

Commit b7a49f73 ("bnx2x: Utilize firmware 7.13.21.0")
added request_firmware() logic in probe(), which caused a
load failure when the firmware file is not present in the initrd
(see below), as the firmware file cannot be accessed during probe.

Direct firmware load for bnx2x/bnx2x-e2-7.13.15.0.fw failed with error -2
Direct firmware load for bnx2x/bnx2x-e2-7.13.21.0.fw failed with error -2

This patch fixes this issue by:

1. Removing the request_firmware() logic from probe()
so that .ndo_open() handles it, as it did earlier.

2. Since request_firmware() is removed from probe(), the
driver has to relax the FW version comparison against
the already loaded FW version (loaded by other PFs of the same
adapter) a bit, so that multiple PFs can run with different
compatible/close enough FWs (in different environments). The
PF in the probe flow no longer knows which firmware file
version it is going to initialize the device with in ndo_open().

Link: https://lore.kernel.org/all/46f2d9d9-ae7f-b332-ddeb-b59802be2bab@molgen.mpg.de/
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Fixes: b7a49f73 ("bnx2x: Utilize firmware 7.13.21.0")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Link: https://lore.kernel.org/r/20220316214613.6884-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: mscc: Add MODULE_FIRMWARE macros · f1858c27

Juerg Haefliger authored Mar 16, 2022

The driver requires firmware so define MODULE_FIRMWARE so that modinfo
provides the details.

Fixes: fa164e40 ("net: phy: mscc: split the driver into separate files")
Signed-off-by: Juerg Haefliger <juergh@canonical.com>
Link: https://lore.kernel.org/r/20220316151835.88765-1-juergh@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: fix array_size.cocci warning · 1abea24a

Guo Zhengkui authored Mar 16, 2022

Fix array_size.cocci warning in tools/testing/selftests/net.

Use `ARRAY_SIZE(arr)` instead of forms like `sizeof(arr)/sizeof(arr[0])`.
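
For illustration, a self-contained userspace sketch of the two forms. The macro definition here is a plain stand-in written for this example (the definition the selftests actually pick up may differ), and `toy_ports` is a hypothetical array.

```c
#include <stddef.h>

/* Stand-in definition for this example only. */
#ifndef ARRAY_SIZE
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
#endif

static const int toy_ports[] = { 8000, 8001, 8002, 8003 };

/* Open-coded form that the coccinelle script warns about. */
static size_t toy_count_open_coded(void)
{
	return sizeof(toy_ports) / sizeof(toy_ports[0]);
}

/* Preferred form: the intent is visible and the element type is not
 * repeated at every use. */
static int toy_sum_ports(void)
{
	int sum = 0;
	size_t i;

	for (i = 0; i < ARRAY_SIZE(toy_ports); i++)
		sum += toy_ports[i];

	return sum;
}
```

Both forms evaluate to the same element count; the macro only works on true arrays, not on pointers that an array has decayed to.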

It has been tested with gcc (Debian 8.3.0-6) 8.3.0.

Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Link: https://lore.kernel.org/r/20220316092858.9398-1-guozhengkui@vivo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: stmmac: clean up impossible condition · 58e06d05

Dan Carpenter authored Mar 16, 2022

This code works but it has a static checker warning:

drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:1687 init_dma_rx_desc_rings()
warn: always true condition '(queue >= 0) => (0-u32max >= 0)'

Obviously, it makes no sense to check if an unsigned int is >= 0. What
prevents this code from being an infinite loop is a separate check
later on, if (queue == 0).

The "queue" variable is less than MTL_MAX_RX_QUEUES (8) so it can easily
fit in an int type. Any larger value for "queue" would lead to an array
overflow when we assign "rx_q = &priv->rx_queue[queue]".
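
A userspace sketch of the two loop shapes, assuming a countdown over queue indices. The `toy_` functions are hypothetical and only count iterations. The unsigned variant is written as `while (1)` because its original condition is always true.

```c
#include <limits.h>

/* Unsigned index: "queue >= 0" would always hold, so termination depends
 * entirely on the explicit break at queue == 0. */
static int toy_unwind_unsigned(unsigned int queue)
{
	int visited = 0;

	while (1) {
		visited++;		/* stands in for cleaning up one queue */
		if (queue == 0)
			break;
		queue--;
	}
	return visited;
}

/* Signed index: the loop condition itself ends the loop after queue 0. */
static int toy_unwind_signed(int queue)
{
	int visited = 0;

	while (queue >= 0) {
		visited++;
		queue--;
	}
	return visited;
}

/* Decrementing an unsigned zero wraps to UINT_MAX instead of reaching -1. */
static unsigned int toy_dec_unsigned(unsigned int queue)
{
	return queue - 1;
}
```

Both variants visit the same number of queues; the signed one simply no longer needs a condition the checker flags as always true.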

Fixes: de0b90e5 ("net: stmmac: rearrange RX and TX desc init into per-queue basis")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20220316083744.GB30941@kili
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: Add missing of_node_put() in dsa_port_parse_of · cb0b430b

Miaoqian Lin authored Mar 16, 2022

The device_node pointer is returned by of_parse_phandle() with refcount
incremented. We should use of_node_put() on it when done.

Fixes: 6d4e5c57 ("net: dsa: get port type at parse time")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Link: https://lore.kernel.org/r/20220316082602.10785-1-linmq006@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
