1. 30 Nov, 2023 22 commits
  2. 29 Nov, 2023 14 commits
  3. 28 Nov, 2023 4 commits
    • Merge branch 'net-page_pool-add-netlink-based-introspection' · a3799729
      Paolo Abeni authored
      Jakub Kicinski says:
      
      ====================
      net: page_pool: add netlink-based introspection
      
      We recently started to deploy newer kernels / drivers at Meta,
      making significant use of page pools for the first time.
      We immediately ran into page pool leak warnings, both real and
      false positives. As Eric pointed out/predicted, there's no
      guarantee that applications will read / close their sockets,
      so a page pool page may be stuck in a socket (but not leaked)
      forever. This happens a lot in our fleet. Most of these are
      obviously due to application bugs, but we should not be printing
      kernel warnings for minor application resource leaks.
      
      Conversely, page pool memory may get leaked at runtime, and
      we have no way to detect / track that unless someone reconfigures
      the NIC and thereby destroys the page pools which leaked the pages.
      
      The solution presented here is to expose the memory use of page
      pools via netlink. This allows for continuous monitoring of memory
      used by page pools, regardless of whether they have been destroyed
      or not. The sample in patch 15 can print the memory use and
      recycling efficiency:
      
      $ ./page-pool
          eth0[2]	page pools: 10 (zombies: 0)
      		refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
      		recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)
      
      v4:
       - use dev_net(netdev)->loopback_dev
       - extend inflight doc
      v3: https://lore.kernel.org/all/20231122034420.1158898-1-kuba@kernel.org/
       - ID is still here, can't decide if it matters
       - rename destroyed -> detach-time, good enough?
       - fix build for netsec
      v2: https://lore.kernel.org/r/20231121000048.789613-1-kuba@kernel.org
       - hopefully fix build with PAGE_POOL=n
      v1: https://lore.kernel.org/all/20231024160220.3973311-1-kuba@kernel.org/
       - The main change compared to the RFC is that the API now exposes
         outstanding references and byte counts even for "live" page pools.
         The warning is no longer printed if the page pool is accessible
         via netlink.
      RFC: https://lore.kernel.org/all/20230816234303.3786178-1-kuba@kernel.org/
      ====================
      
      Link: https://lore.kernel.org/r/20231126230740.2148636-1-kuba@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
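      A minimal monitoring sketch for the interface described above, assuming
      the in-tree YNL Python helpers that cli.py is built on, and assuming the
      per-pool attribute names 'inflight' and 'inflight-mem' (the changelog
      above only confirms 'detach-time'); treat it as an illustration, not as
      the sample shipped in patch 15:

      # Sketch: sum outstanding page references / bytes per interface.
      # Assumes it is run from tools/net/ynl/ so "lib" and the spec resolve.
      from collections import defaultdict
      from lib import YnlFamily

      ynl = YnlFamily('../../../Documentation/netlink/specs/netdev.yaml')

      refs = defaultdict(int)
      mem = defaultdict(int)
      for pp in ynl.dump('page-pool-get', {}):
          ifindex = pp.get('ifindex', 0)              # group by interface
          refs[ifindex] += pp.get('inflight', 0)      # assumed attribute name
          mem[ifindex] += pp.get('inflight-mem', 0)   # assumed attribute name
      for ifindex in sorted(refs):
          print(f'ifindex {ifindex}: refs {refs[ifindex]} bytes {mem[ifindex]}')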
    • tools: ynl: add sample for getting page-pool information · 637567e4
      Jakub Kicinski authored
      Regenerate the tools/ code after netdev spec changes.
      
      Add sample to query page-pool info in a concise fashion:
      
      $ ./page-pool
          eth0[2]	page pools: 10 (zombies: 0)
      		refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
      		recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)
      Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
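      As a rough sketch of where the first line of that output comes from,
      the snippet below groups pools per interface and counts detached
      ("zombie") pools; it assumes the YNL Python helpers used by cli.py and
      treats the presence of a 'detach-time' attribute (see the cover letter
      changelog) as the zombie marker; the in-tree C sample may do this
      differently:

      # Sketch: count live vs. detached ("zombie") page pools per interface.
      from collections import Counter
      from lib import YnlFamily   # in-tree helper, tools/net/ynl/lib

      ynl = YnlFamily('../../../Documentation/netlink/specs/netdev.yaml')

      live, zombies = Counter(), Counter()
      for pp in ynl.dump('page-pool-get', {}):
          ifindex = pp.get('ifindex', 0)
          if 'detach-time' in pp:    # assumed: set once the pool is detached
              zombies[ifindex] += 1
          else:
              live[ifindex] += 1
      for ifindex in sorted(set(live) | set(zombies)):
          print(f'ifindex {ifindex}: page pools: {live[ifindex]} '
                f'(zombies: {zombies[ifindex]})')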
    • net: page_pool: mute the periodic warning for visible page pools · be009667
      Jakub Kicinski authored
      Mute the periodic "stalled pool shutdown" warning if the page pool
      is visible to user space. Rolling out a driver using page pools
      to just a few hundred hosts at Meta surfaces applications which
      fail to reap their broken sockets. Obviously it's best if the
      applications are fixed, but we don't generally print warnings
      for application resource leaks. Admins can now depend on the
      netlink interface for getting page pool info to detect buggy
      apps.
      
      While at it, include the ID of the pool in the message; in rare
      cases (pools from a destroyed netns) this will make finding the
      pool with a debugger easier.
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
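      An admin-side check along the lines this message suggests might look
      roughly like the sketch below: it flags detached pools that still hold
      pages, i.e. the situation the muted "stalled pool shutdown" warning
      used to report. The attribute names 'detach-time', 'inflight' and
      'inflight-mem' are assumptions based on this series' changelog, and the
      YNL helper usage mirrors cli.py:

      # Sketch: report page pools detached from their driver but still
      # holding pages (e.g. pages sitting in sockets of buggy applications).
      from lib import YnlFamily   # in-tree helper, tools/net/ynl/lib

      ynl = YnlFamily('../../../Documentation/netlink/specs/netdev.yaml')

      for pp in ynl.dump('page-pool-get', {}):
          if 'detach-time' in pp and pp.get('inflight', 0):
              print(f"pool id {pp['id']} (ifindex {pp.get('ifindex', 0)}): "
                    f"{pp['inflight']} pages / "
                    f"{pp.get('inflight-mem', 0)} bytes outstanding")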
    • net: page_pool: expose page pool stats via netlink · d49010ad
      Jakub Kicinski authored
      Dump the stats via netlink. More clever approaches, like dumping
      the stats for each CPU individually to see where the packets get
      consumed, can be implemented in the future.
      
      A trimmed example from a real (but recently booted) system:
      
      $ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \
                 --dump page-pool-stats-get
      [{'info': {'id': 19, 'ifindex': 2},
        'alloc-empty': 48,
        'alloc-fast': 3024,
        'alloc-refill': 0,
        'alloc-slow': 48,
        'alloc-slow-high-order': 0,
        'alloc-waive': 0,
        'recycle-cache-full': 0,
        'recycle-cached': 0,
        'recycle-released-refcnt': 0,
        'recycle-ring': 0,
        'recycle-ring-full': 0},
       {'info': {'id': 18, 'ifindex': 2},
        'alloc-empty': 66,
        'alloc-fast': 11811,
        'alloc-refill': 35,
        'alloc-slow': 66,
        'alloc-slow-high-order': 0,
        'alloc-waive': 0,
        'recycle-cache-full': 1145,
        'recycle-cached': 6541,
        'recycle-released-refcnt': 0,
        'recycle-ring': 1275,
        'recycle-ring-full': 0},
       {'info': {'id': 17, 'ifindex': 2},
        'alloc-empty': 73,
        'alloc-fast': 62099,
        'alloc-refill': 413,
      ...
      Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
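      As a rough illustration of how a recycling figure like the one in the
      earlier sample can be derived from this dump, the sketch below
      aggregates the per-pool stats in Python; the exact formula used by the
      in-tree sample may differ; this one simply treats cached plus ring
      recycles as "recycled" and fast plus slow allocations as "allocated":

      # Sketch: approximate recycling percentage per interface from the
      # page-pool-stats-get dump (same attribute names as printed above).
      from collections import defaultdict
      from lib import YnlFamily   # in-tree YNL helper used by cli.py

      ynl = YnlFamily('../../../Documentation/netlink/specs/netdev.yaml')

      alloc = defaultdict(int)
      recycle = defaultdict(int)
      for s in ynl.dump('page-pool-stats-get', {}):
          ifindex = s['info'].get('ifindex', 0)
          alloc[ifindex] += s.get('alloc-fast', 0) + s.get('alloc-slow', 0)
          recycle[ifindex] += (s.get('recycle-cached', 0) +
                               s.get('recycle-ring', 0))
      for ifindex in sorted(alloc):
          pct = 100.0 * recycle[ifindex] / alloc[ifindex] if alloc[ifindex] else 0.0
          print(f'ifindex {ifindex}: recycling {pct:.1f}%')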