Commits · 2b30f8291a30305eeedcd787b4d0e352410f7268 · Kirill Smelkov / linux

23 Jan, 2023 2 commits

net: ethtool: add support for MAC Merge layer · 2b30f829

Vladimir Oltean authored Jan 19, 2023

The MAC merge sublayer (IEEE 802.3-2018 clause 99) is one of 2
specifications (the other being Frame Preemption; IEEE 802.1Q-2018
clause 6.7.2), which work together to minimize latency caused by frame
interference at TX. The overall goal of TSN is for normal traffic and
traffic with a bounded deadline to be able to cohabitate on the same L2
network and not bother each other too much.

The standards achieve this (partly) by introducing the concept of
preemptible traffic, i.e. Ethernet frames that have a custom value for
the Start-of-Frame-Delimiter (SFD), and these frames can be fragmented
and reassembled at L2 on a link-local basis. The non-preemptible frames
are called express traffic, they are transmitted using a normal SFD, and
they can preempt preemptible frames, therefore having lower latency,
which can matter at lower (100 Mbps) link speeds, or at high MTUs (jumbo
frames around 9K). Preemption is not recursive, i.e. a P frame cannot
preempt another P frame. Preemption also does not depend upon priority,
or otherwise said, an E frame with prio 0 will still preempt a P frame
with prio 7.

In terms of implementation, the standards talk about the presence of an
express MAC (eMAC) which handles express traffic, and a preemptible MAC
(pMAC) which handles preemptible traffic, and these MACs are multiplexed
on the same MII by a MAC merge layer.

To support frame preemption, the definition of the SFD was generalized
to SMD (Start-of-mPacket-Delimiter), where an mPacket is essentially an
Ethernet frame fragment, or a complete frame. Stations unaware of an SMD
value different from the standard SFD will treat P frames as error
frames. To prevent that from happening, a negotiation process is
defined.

On RX, packets are dispatched to the eMAC or pMAC after being filtered
by their SMD. On TX, the eMAC/pMAC classification decision is taken by
the 802.1Q spec, based on packet priority (each of the 8 user priority
values may have an admin-status of preemptible or express).

The MAC Merge layer and the Frame Preemption parameters have some degree
of independence in terms of how software stacks are supposed to deal
with them. The activation of the MM layer is supposed to be controlled
by an LLDP daemon (after it has been communicated that the link partner
also supports it), after which a (hardware-based or not) verification
handshake takes place, before actually enabling the feature. So the
process is intended to be relatively plug-and-play. Whereas FP settings
are supposed to be coordinated across a network using something
approximating NETCONF.

The support contained here is exclusively for the 802.3 (MAC Merge)
portions and not for the 802.1Q (Frame Preemption) parts. This API is
sufficient for an LLDP daemon to do its job. The FP adminStatus variable
from 802.1Q is outside the scope of an LLDP daemon.

I have taken a few creative licenses and augmented the Linux kernel UAPI
compared to the standard managed objects recommended by IEEE 802.3.
These are:

- ETHTOOL_A_MM_PMAC_ENABLED: According to Figure 99-6: Receive
  Processing state diagram, a MAC Merge layer is always supposed to be
  able to receive P frames. However, this implies keeping the pMAC
  powered on, which will consume needless power in applications where FP
  will never be used. If LLDP is used, the reception of an Additional
  Ethernet Capabilities TLV from the link partner is sufficient
  indication that the pMAC should be enabled. So my proposal is that in
  Linux, we keep the pMAC turned off by default and that user space
  turns it on when needed.

- ETHTOOL_A_MM_VERIFY_ENABLED: The IEEE managed object is called
  aMACMergeVerifyDisableTx. I opted for consistency (positive logic) in
  the boolean netlink attributes offered, so this is also positive here.
  Other than the meaning being reversed, they correspond to the same
  thing.

- ETHTOOL_A_MM_MAX_VERIFY_TIME: I found it most reasonable for a LLDP
  daemon to maximize the verifyTime variable (delay between SMD-V
  transmissions), to maximize its chances that the LP replies. IEEE says
  that the verifyTime can range between 1 and 128 ms, but the NXP ENETC
  stupidly keeps this variable in a 7 bit register, so the maximum
  supported value is 127 ms. I could have chosen to hardcode this in the
  LLDP daemon to a lower value, but why not let the kernel expose its
  supported range directly.

- ETHTOOL_A_MM_TX_MIN_FRAG_SIZE: the standard managed object is called
  aMACMergeAddFragSize, and expresses the "additional" fragment size
  (on top of ETH_ZLEN), whereas this expresses the absolute value of the
  fragment size.

- ETHTOOL_A_MM_RX_MIN_FRAG_SIZE: there doesn't appear to exist a managed
  object mandated by the standard, but user space clearly needs to know
  what is the minimum supported fragment size of our local receiver,
  since LLDP must advertise a value no lower than that.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b30f829

net/sock: Introduce trace_sk_data_ready() · 40e0b090

Peilin Ye authored Jan 19, 2023

As suggested by Cong, introduce a tracepoint for all ->sk_data_ready()
callback implementations.  For example:

<...>
  iperf-609  [002] .....  70.660425: sk_data_ready: family=2 protocol=6 func=sock_def_readable
  iperf-609  [002] .....  70.660436: sk_data_ready: family=2 protocol=6 func=sock_def_readable
<...>
Suggested-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

40e0b090

21 Jan, 2023 17 commits

Merge branch 'mlxsw-add-support-of-latency-tlv' · a7b87d2a

Jakub Kicinski authored Jan 20, 2023

Petr Machata says:

====================
mlxsw: Add support of latency TLV

Amit Cohen writes:

Ethernet Management Datagrams (EMADs) are Ethernet packets sent between
the driver and device's firmware. They are used to pass various
configurations to the device, but also to get events (e.g., port up)
from it. After the Ethernet header, these packets are built in a TLV
format.

This is the structure of EMADs:
* Ethernet header
* Operation TLV
* String TLV (optional)
* Latency TLV (optional)
* Reg TLV
* End TLV

The latency of each EMAD is measured by firmware. The driver can get the
measurement via latency TLV which can be added to each EMAD. This TLV is
optional, when EMAD is sent with this TLV, the EMAD's response will include
the TLV and will contain the firmware measurement.

Add support for Latency TLV and use it by default for all EMADs (see
more information in commit messages). The latency measurements can be
processed using BPF program for example, to create a histogram and average
of the latency per register. In addition, it is possible to measure the
end-to-end latency, so then the latency of the software overhead can be
calculated. This information can be useful to improve the driver
performance.

See an example of output of BPF tool which presents these measurements:

$ ./emadlatency -f -a
    Tracing EMADs... Hit Ctrl-C to end.
    Register write = RALUE (0x8013)
    E2E Measurements:
    average = 23 usecs, total = 32052693 usecs, count = 1337061
         usecs               : count    distribution
             0 -> 1          : 0        |                                 |
             2 -> 3          : 0        |                                 |
             4 -> 7          : 0        |                                 |
             8 -> 15         : 0        |                                 |
            16 -> 31         : 1290814  |*********************************|
            32 -> 63         : 45339    |*                                |
            64 -> 127        : 532      |                                 |
           128 -> 255        : 247      |                                 |
           256 -> 511        : 57       |                                 |
           512 -> 1023       : 26       |                                 |
          1024 -> 2047       : 33       |                                 |
          2048 -> 4095       : 0        |                                 |
          4096 -> 8191       : 10       |                                 |
          8192 -> 16383      : 1        |                                 |
         16384 -> 32767      : 1        |                                 |
         32768 -> 65535      : 1        |                                 |

    Firmware Measurements:
    average = 10 usecs, total = 13884128 usecs, count = 1337061
         usecs               : count    distribution
             0 -> 1          : 0        |                                 |
             2 -> 3          : 0        |                                 |
             4 -> 7          : 0        |                                 |
             8 -> 15         : 1337035  |*********************************|
            16 -> 31         : 17       |                                 |
            32 -> 63         : 7        |                                 |
            64 -> 127        : 0        |                                 |
           128 -> 255        : 2        |                                 |

    Diff between measurements: 13 usecs

Patch set overview:
Patches #1-#3 add support for querying MGIR, to know if string TLV and
latency TLV are supported
Patches #4-#5 add some relevant fields to support latency TLV
Patch #6 adds support of latency TLV
====================

Link: https://lore.kernel.org/r/cover.1674123673.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

a7b87d2a

mlxsw: Add support of latency TLV · 49f5b769

Amit Cohen authored Jan 19, 2023

The latency of each EMAD can be measured by firmware. The driver can get
the measurement via latency TLV which can be added to each EMAD. This TLV
is optional, when EMAD is sent with this TLV, the EMAD's response will
include the TLV and the field 'latency_time' will contain the firmware
measurement.

This information can be processed using BPF program for example, to
create a histogram and average of the latency per register. In addition,
it is possible to measure the end-to-end latency, and then reduce firmware
measurement, which will result in the latency of the software overhead.
This information can be useful to improve the driver performance.

Add support for latency TLV by default for all EMADs. First we planned to
enable latency TLV per demand, using devlink-param. After some tests, we
know that the usage of latency TLV does not impact the end-to-end latency,
so it is OK to enable it by default.

Note that similar to string TLV, the latency TLV is not supported in all
firmware versions. Enable the usage of this TLV only after verifying it is
supported by the current firmware version by querying the Management
General Information Register (MGIR).
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

49f5b769

mlxsw: core: Define latency TLV fields · 6ee0d3a9

Amit Cohen authored Jan 19, 2023

The next patch will add support for latency TLV as part of EMAD (Ethernet
Management Datagrams) packets. As preparation, add the relevant fields.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

6ee0d3a9

mlxsw: emad: Add support for latency TLV · 695f7306

Amit Cohen authored Jan 19, 2023

The next patches will add support for latency TLV as part of EMAD (Ethernet
Management Datagrams) packets. As preparation, add the relevant values.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

695f7306

mlxsw: core: Do not worry about changing 'enable_string_tlv' while sending EMADs · 563bd3c4

Amit Cohen authored Jan 19, 2023

Till now, the field 'mlxsw_core->emad.enable_string_tlv' is set as part
of mlxsw_sp_init(), this means that it can be changed during
emad_reg_access(). To avoid such change, this field is read once in
emad_reg_access() and the value is used all the way.

The previous patch sets this value according to MGIR output, as part of
mlxsw_emad_init(), so now it cannot be changed while sending EMADs.

Do not save 'enable_string_tlv' and do not pass it to functions, just pass
'struct mlxsw_core' and use the value directly from it.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

563bd3c4

mlxsw: Enable string TLV usage according to MGIR output · d84e2359

Amit Cohen authored Jan 19, 2023

String TLV is not supported by old firmware versions, therefore
'struct mlxsw_core' stores the field 'emad.enable_string_tlv', which is
set to true only after firmware version check.

Instead of assuming that firmware version check is enough to enable
string TLV, a better solution is to query if this TLV is supported from
MGIR register. Add such query and initialize 'emad.enable_string_tlv'
accordingly.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

d84e2359

mlxsw: reg: Add TLV related fields to MGIR register · 42b4f757

Amit Cohen authored Jan 19, 2023

MGIR (Management General Information Register) allows software to query the
hardware and firmware general information. As part of firmware information,
the driver can query if string TLV and latency TLV are supported. These
TLVs are part of EMAD's header and are used to provide information per
EMAD packet to software.

Currently, string TLV is already used by the driver, but it does not
query if this TLV is supported from MGIR. The next patches will add support
of latency TLV. Add the relevant fields to MGIR, so then the driver will
query them to know if the TLVs are supported before using them.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

42b4f757

ptp_qoriq: fix latency in ptp_qoriq_adjtime() operation · 24a7fffb

Nikhil Gupta authored Jan 20, 2023

1588 driver loses about 1us in adjtime operation at PTP slave
This is because adjtime operation uses a slow non-atomic tmr_cnt_read()
followed by tmr_cnt_write() operation.

In the above sequence, since the timer counter operation keeps
incrementing, it leads to latency. The tmr_offset register
(which is added to TMR_CNT_H/L register giving the current time)
must be programmed with the delta nanoseconds.
Signed-off-by: Nikhil Gupta <nikhil.gupta@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230119204034.7969-1-nikhil.gupta@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

24a7fffb

net: microchip: vcap: use kmemdup() to allocate memory · 5e64f59a

Yang Yingliang authored Jan 19, 2023

Use kmemdup() helper instead of open-coding to simplify
the code when allocating newckf and newcaf.

Generated by: scripts/coccinelle/api/memdup.cocci
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Link: https://lore.kernel.org/r/20230119092210.3607634-1-yangyingliang@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

5e64f59a

Merge branch 'net-mdio-remove-support-for-building-c45-muxed-addresses' · bad5532e

Jakub Kicinski authored Jan 20, 2023

Michael Walle says:

====================
net: mdio: Remove support for building C45 muxed addresses

I've picked this older series from Andrew up and rebased it onto
the latest net-next.

With all drivers which support c45 now being converted to a seperate c22
and c45 access op, we can now remove the old MII_ADDR_C45 flag.
====================

Link: https://lore.kernel.org/r/20230119130700.440601-1-michael@walle.ccSigned-off-by: Jakub Kicinski <kuba@kernel.org>

bad5532e

net: mdio: Remove support for building C45 muxed addresses · 99d5fe9c

Andrew Lunn authored Jan 19, 2023

The old way of performing a C45 bus transfer created a special
register value and passed it to the MDIO bus driver, in the hope it
would see the MII_ADDR_C45 bit set, and perform a C45 transfer. Now
that there is a clear separation of C22 and C45, this scheme is no
longer used. Remove all the #defines and helpers, to prevent any code
being added which tries to use it.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

99d5fe9c

net: Remove C45 check in C22 only MDIO bus drivers · 660a5704

Andrew Lunn authored Jan 19, 2023

The MDIO core should not pass a C45 request via the C22 API call any
more. So remove the tests from the drivers.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

660a5704

net: ngbe: Drop mdiobus_c45_regad() · 45d564bf

Michael Walle authored Jan 19, 2023

With the new C45 MDIO access API, there is no encoding of the register
number anymore and thus the masking isn't necessary anymore. Remove it.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

45d564bf

net: phy: Remove fallback to old C45 method · db1a63ae

Andrew Lunn authored Jan 19, 2023

Now that all MDIO bus drivers which support C45 implement the c45
specific ops, remove the fallback to the old method.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

db1a63ae

Merge branch 'r8152-improve-the-code' · bc170f96

Jakub Kicinski authored Jan 20, 2023

Hayes Wang says:

====================
r8152: improve the code

These are some minor improvements depending on commit ec51fbd1 ("r8152:
add USB device driver for config selection").
====================

Link: https://lore.kernel.org/r/20230119074043.10021-397-nic_swsd@realtek.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

bc170f96

r8152: reduce the control transfer of rtl8152_get_version() · 02767440

Hayes Wang authored Jan 19, 2023

Reduce the control transfer by moving calling rtl8152_get_version() in
rtl8152_probe(). This could prevent from calling rtl8152_get_version()
for unnecessary situations. For example, after setting config #2 for the
device, there are two interfaces and rtl8152_probe() may be called
twice. However, we don't need to call rtl8152_get_version() for this
situation.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

02767440

r8152: remove rtl_vendor_mode function · 95a4c1d6

Hayes Wang authored Jan 19, 2023

After commit ec51fbd1 ("r8152: add USB device driver for
config selection"), the code about changing USB configuration
in rtl_vendor_mode() wouldn't be run anymore. Therefore, the
function could be removed.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

95a4c1d6

20 Jan, 2023 21 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · b3c588cd

Jakub Kicinski authored Jan 20, 2023

drivers/net/ipa/ipa_interrupt.c
drivers/net/ipa/ipa_interrupt.h
9ec9b2a3 ("net: ipa: disable ipa interrupt during suspend")
8e461e1f ("net: ipa: introduce ipa_interrupt_enable()")
d50ed355 ("net: ipa: enable IPA interrupt handlers separate from registration")
https://lore.kernel.org/all/20230119114125.5182c7ab@canb.auug.org.au/
https://lore.kernel.org/all/79e46152-8043-a512-79d9-c3b905462774@tessares.net/Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b3c588cd

Merge tag 'net-6.2-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 5deaa985

Linus Torvalds authored Jan 20, 2023

Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireless, bluetooth, bpf and netfilter.

  Current release - regressions:

   - Revert "net: team: use IFF_NO_ADDRCONF flag to prevent ipv6
     addrconf", fix nsna_ping mode of team

   - wifi: mt76: fix bugs in Rx queue handling and DMA mapping

   - eth: mlx5:
      - add missing mutex_unlock in error reporter
      - protect global IPsec ASO with a lock

  Current release - new code bugs:

   - rxrpc: fix wrong error return in rxrpc_connect_call()

  Previous releases - regressions:

   - bluetooth: hci_sync: fix use of HCI_OP_LE_READ_BUFFER_SIZE_V2

   - wifi:
      - mac80211: fix crashes on Rx due to incorrect initialization of
        rx->link and rx->link_sta
      - mac80211: fix bugs in iTXQ conversion - Tx stalls, incorrect
        aggregation handling, crashes
      - brcmfmac: fix regression for Broadcom PCIe wifi devices
      - rndis_wlan: prevent buffer overflow in rndis_query_oid

   - netfilter: conntrack: handle tcp challenge acks during connection
     reuse

   - sched: avoid grafting on htb_destroy_class_offload when destroying

   - virtio-net: correctly enable callback during start_xmit, fix stalls

   - tcp: avoid the lookup process failing to get sk in ehash table

   - ipa: disable ipa interrupt during suspend

   - eth: stmmac: enable all safety features by default

  Previous releases - always broken:

   - bpf:
      - fix pointer-leak due to insufficient speculative store bypass
        mitigation (Spectre v4)
      - skip task with pid=1 in send_signal_common() to avoid a splat
      - fix BPF program ID information in BPF_AUDIT_UNLOAD as well as
        PERF_BPF_EVENT_PROG_UNLOAD events
      - fix potential deadlock in htab_lock_bucket from same bucket
        index but different map_locked index

   - bluetooth:
      - fix a buffer overflow in mgmt_mesh_add()
      - hci_qca: fix driver shutdown on closed serdev
      - ISO: fix possible circular locking dependency
      - CIS: hci_event: fix invalid wait context

   - wifi: brcmfmac: fixes for survey dump handling

   - mptcp: explicitly specify sock family at subflow creation time

   - netfilter: nft_payload: incorrect arithmetics when fetching VLAN
     header bits

   - tcp: fix rate_app_limited to default to 1

   - l2tp: close all race conditions in l2tp_tunnel_register()

   - eth: mlx5: fixes for QoS config and eswitch configuration

   - eth: enetc: avoid deadlock in enetc_tx_onestep_tstamp()

   - eth: stmmac: fix invalid call to mdiobus_get_phy()

  Misc:

   - ethtool: add netlink attr in rss get reply only if the value is not
     empty"

* tag 'net-6.2-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
  Revert "Merge branch 'octeontx2-af-CPT'"
  tcp: fix rate_app_limited to default to 1
  bnxt: Do not read past the end of test names
  net: stmmac: enable all safety features by default
  octeontx2-af: add mbox to return CPT_AF_FLT_INT info
  octeontx2-af: update cpt lf alloc mailbox
  octeontx2-af: restore rxc conf after teardown sequence
  octeontx2-af: optimize cpt pf identification
  octeontx2-af: modify FLR sequence for CPT
  octeontx2-af: add mbox for CPT LF reset
  octeontx2-af: recover CPT engine when it gets fault
  net: dsa: microchip: ksz9477: port map correction in ALU table entry register
  selftests/net: toeplitz: fix race on tpacket_v3 block close
  net/ulp: use consistent error code when blocking ULP
  octeontx2-pf: Fix the use of GFP_KERNEL in atomic context on rt
  tcp: avoid the lookup process failing to get sk in ehash table
  Revert "net: team: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf"
  MAINTAINERS: add networking entries for Willem
  net: sched: gred: prevent races when adding offloads to stats
  l2tp: prevent lockdep issue in l2tp_tunnel_register()
  ...

5deaa985

Revert "Merge branch 'octeontx2-af-CPT'" · 45a919bb

Jakub Kicinski authored Jan 20, 2023

This reverts commit b4fbf0b2, reversing
changes made to 6c977c5c.

This seems like net-next material.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

45a919bb

Merge branch 'octeontx2-af-miscellaneous-changes-for-cpt' · 7a590bd6

Jakub Kicinski authored Jan 20, 2023

Srujana Challa says:

====================
octeontx2-af: Miscellaneous changes for CPT

This patchset consists of miscellaneous changes for CPT.
- Adds a new mailbox to reset the requested CPT LF.
- Modify FLR sequence as per HW team suggested.
- Adds support to recover CPT engines when they gets fault.
- Updates CPT inbound inline IPsec configuration mailbox,
  as per new generation of the OcteonTX2 chips.
- Adds a new mailbox to return CPT FLT Interrupt info.
====================

Link: https://lore.kernel.org/r/20230118120354.1017961-1-schalla@marvell.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

7a590bd6

octeontx2-af: add mbox to return CPT_AF_FLT_INT info · b814cc90

Srujana Challa authored Jan 18, 2023

CPT HW would trigger the CPT AF FLT interrupt when CPT engines
hits some uncorrectable errors and AF is the one which receives
the interrupt and recovers the engines.
This patch adds a mailbox for CPT VFs to request for CPT faulted
and recovered engines info.
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b814cc90

octeontx2-af: update cpt lf alloc mailbox · d1e1de10

Srujana Challa authored Jan 18, 2023

The CN10K CPT coprocessor contains a context processor
to accelerate updates to the IPsec security association
contexts. The context processor contains a context cache.
This patch updates CPT LF ALLOC mailbox to config ctx_ilen
requested by VFs. CPT_LF_ALLOC:ctx_ilen is the size of
initial context fetch.
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

d1e1de10

octeontx2-af: restore rxc conf after teardown sequence · e2784acb

Nithin Dabilpuram authored Jan 18, 2023

CN10K CPT coprocessor includes a component named RXC which
is responsible for reassembly of inner IP packets. RXC has
the feature to evict oldest entries based on age/threshold.
The age/threshold is being set to minimum values to evict
all entries at the time of teardown.
This patch adds code to restore timeout and threshold config
after teardown sequence is complete as it is global config.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

e2784acb

octeontx2-af: optimize cpt pf identification · 41b166e5

Srujana Challa authored Jan 18, 2023

Optimize CPT PF identification in mbox handling for faster
mbox response by doing it at AF driver probe instead of
every mbox message.
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

41b166e5

octeontx2-af: modify FLR sequence for CPT · 5c22fce6

Srujana Challa authored Jan 18, 2023

On OcteonTX2 platform CPT instruction enqueue is only
possible via LMTST operations.
The existing FLR sequence mentioned in HRM requires
a dummy LMTST to CPT but LMTST can't be submitted from
AF driver. So, HW team provided a new sequence to avoid
dummy LMTST. This patch adds code for the same.
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

5c22fce6

octeontx2-af: add mbox for CPT LF reset · b7e41527

Srujana Challa authored Jan 18, 2023

On OcteonTX2 SoC, the admin function (AF) is the only one with all
priviliges to configure HW and alloc resources, PFs and it's VFs
have to request AF via mailbox for all their needs.
This patch adds a new mailbox for CPT VFs to request for CPT LF
reset.
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b7e41527

octeontx2-af: recover CPT engine when it gets fault · e625dad8

Srujana Challa authored Jan 18, 2023

When CPT engine has uncorrectable errors, it will get halted and
must be disabled and re-enabled. This patch adds code for the same.
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

e625dad8

ice: use GNSS subsystem instead of TTY · c7ef8221

Arkadiusz Kubalewski authored Jan 18, 2023

Previously support for GNSS was implemented as a TTY driver, it allowed
to access GNSS receiver on /dev/ttyGNSS_<bus><func>.

Use generic GNSS subsystem API instead of implementing own TTY driver.
The receiver is accessible on /dev/gnss<id>. In case of multiple receivers
in the OS, correct device can be found by enumerating either:
- /sys/class/net/<eth port>/device/gnss/
- /sys/class/gnss/gnss<id>/device/

Using GNSS subsystem is superior to implementing own TTY driver, as the
GNSS subsystem was designed solely for this purpose. It also implements
TTY driver but in a common and defined way.

From user perspective, there is no difference in communicating with a
device, except new path to the device shall be used. The device will
provide same information to the userspace as the old one, and can be used
in the same way, i.e.:
old # gpsmon /dev/ttyGNSS_2100_0
new # gpsmon /dev/gnss0
There is no other impact on userspace tools.

User expecting onboard GNSS receiver support is required to enable
CONFIG_GNSS=y/m in kernel config.
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c7ef8221

net: hns: Switch to use acpi_evaluate_dsm_typed() · 498fe810

Andy Shevchenko authored Jan 19, 2023

The acpi_evaluate_dsm_typed() provides a way to check the type of the
object evaluated by _DSM call. Use it instead of open coded variant.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

498fe810

ACPI: utils: Add acpi_evaluate_dsm_typed() and acpi_check_dsm() stubs · 1b94ad7c

Andy Shevchenko authored Jan 19, 2023

When the ACPI part of a driver is optional the methods used in it
are expected to be available even if CONFIG_ACPI=n. This is not
the case for _DSM related methods. Add stubs for
acpi_evaluate_dsm_typed() and acpi_check_dsm() methods.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b94ad7c

tcp: fix rate_app_limited to default to 1 · 300b655d

David Morley authored Jan 19, 2023

The initial default value of 0 for tp->rate_app_limited was incorrect,
since a flow is indeed application-limited until it first sends
data. Fixing the default to be 1 is generally correct but also
specifically will help user-space applications avoid using the initial
tcpi_delivery_rate value of 0 that persists until the connection has
some non-zero bandwidth sample.

Fixes: eb8329e0 ("tcp: export data delivery rate")
Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David Morley <morleyd@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Tested-by: David Morley <morleyd@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

300b655d

bnxt: Do not read past the end of test names · d3e599c0

Kees Cook authored Jan 18, 2023

Test names were being concatenated based on a offset beyond the end of
the first name, which tripped the buffer overflow detection logic:

 detected buffer overflow in strnlen
 [...]
 Call Trace:
 bnxt_ethtool_init.cold+0x18/0x18

Refactor struct hwrm_selftest_qlist_output to use an actual array,
and adjust the concatenation to use snprintf() rather than a series of
strncat() calls.
Reported-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Link: https://lore.kernel.org/lkml/Y8F%2F1w1AZTvLglFX@x1-carbon/Tested-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Fixes: eb513658 ("bnxt_en: Add basic ethtool -t selftest support.")
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d3e599c0

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · ba197fde

David S. Miller authored Jan 20, 2023

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-01-19 (ice)

This series contains updates to ice driver only.

Tsotne and Anatolii implement new handling, and AdminQ command, for
firmware LLDP, adding a pending notification to allow for proper
cleanup between TC changes.

Amritha extends support for drop action outside of switchdev.

Siddaraju adjusts restriction for PTP HW clock adjustments.

Ani removes an unneeded non-null check and improves reporting of some link
modes to utilize more appropriate values.

Jesse adds checks to ensure PF VSI type.

Przemek combines duplicate checks of the same condition into one check.

Tony makes various cleanups to code: removes comments for cppcheck
suppressions, reduces scope of some variables, changes some return
statements to reflect an explicit 0 return, matches naming for function
declaration and definition, adds local variable for readability, and
fixes indenting.

Sergey separates DDP (Dynamic Device Personalization) code into its own
file.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ba197fde

Merge branch 'net-dcb-rewrite-table' · f5339209

David S. Miller authored Jan 20, 2023

Daniel Machon says:

====================
net: Introduce new DCB rewrite table

There is currently no support for per-port egress mapping of priority to PCP and
priority to DSCP. Some support for expressing egress mapping of PCP is supported
through ip link, with the 'egress-qos-map', however this command only maps
priority to PCP, and for vlan interfaces only. DCB APP already has support for
per-port ingress mapping of PCP/DEI, DSCP and a bunch of other stuff. So why not
take advantage of this fact, and add a new table that does the reverse.

This patch series introduces the new DCB rewrite table. Whereas the DCB
APP table deals with ingress mapping of PID (protocol identifier) to priority,
the rewrite table deals with egress mapping of priority to PID.

It is indeed possible to integrate rewrite in the existing APP table, by
introducing new dedicated rewrite selectors, and altering existing functions
to treat rewrite entries specially. However, I feel like this is not a good
solution, and will pollute the APP namespace. APP is well-defined in IEEE, and
some userspace relies of advertised entries - for this fact, separating APP and
rewrite into to completely separate objects, seems to me the best solution.

The new table shares much functionality with the APP table, and as such, much
existing code is reused, or slightly modified, to work for both.

================================================================================
DCB rewrite table in a nutshell
================================================================================
The table is implemented as a simple linked list, and uses the same lock as the
APP table. New functions for getting, setting and deleting entries have been
added, and these are exported, so they can be used by the stack or drivers.
Additionnaly, new dcbnl_setrewr and dcnl_delrewr hooks has been added, to
support hardware offload of the entries.

================================================================================
Sparx5 per-port PCP rewrite support
================================================================================
Sparx5 supports PCP egress mapping through two eight-entry switch tables.
One table maps QoS class 0-7 to PCP for DE0 (DP levels mapped to
drop-eligibility 0) and the other for DE1. DCB does currently not have support
for expressing DP/color, so instead, the tagged DEI bit will reflect the DP
levels, for any rewrite entries> 7 ('de').

The driver will take apptrust (contributed earlier) into consideration, so
that the mapping tables only be used, if PCP is trusted *and* the rewrite table
has active mappings, otherwise classified PCP (same as frame PCP) will be used
instead.

================================================================================
Sparx5 per-port DSCP rewrite support
================================================================================
Sparx5 support DSCP egress mapping through a single 32-entry table. This table
maps classified QoS class and DP level to classified DSCP, and is consulted by
the switch Analyzer Classifier at ingress. At egress, the frame DSCP can either
be rewritten to classified DSCP to frame DSCP.

The driver will take apptrust into consideration, so that the mapping tables
only be used, if DSCP is trusted *and* the rewrite table has active mappings,
otherwise frame DSCP will be used instead.

================================================================================
Patches
================================================================================
Patch #1 modifies dcb_app_add to work for both APP and rewrite

Patch #2 adds dcbnl_app_table_setdel() for setting and deleting both APP and
         rewrite entries.

Patch #3 adds the rewrite table and all required functions, offload hooks and
         bookkeeping for maintaining it.

Patch #4 adds two new helper functions for getting a priority to PCP bitmask
         map, and a priority to DSCP bitmask map.

Patch #5 adds support for PCP rewrite in the Sparx5 driver.
Patch #6 adds support for DSCP rewrite in the Sparx5 driver.

================================================================================
v2 -> v3:
  in dcbnl_ieee_fill() use nla_nest_start() instead of the _noflag() version.
  Also, cancel the rewrite nest in case of an error (Petr Machata).

v1 -> v2:
  In dcb_setrewr() change proto to u16 as it ought to be, and remove zero
  initialization of err. (Dan Carpenter).
  Change name of dcbnl_apprewr_setdel -> dcbnl_app_table_setdel and change the
  function signature to take a single function pointer. Update uses accordingly
  (Petr Machata).

====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f5339209

net: microchip: sparx5: add support for DSCP rewrite · 246c77f6

Daniel Machon authored Jan 18, 2023

Add support for DSCP rewrite in Sparx5 driver. On egress DSCP is
rewritten from either classified DSCP, or frame DSCP. Classified DSCP is
determined by the Analyzer Classifier on ingress, and is mapped from
classified QoS class and DP level. Classification of DSCP is by default
enabled for all ports.

It is required that DSCP is trusted for the egress port *and* rewrite
table is not empty, in order to rewrite DSCP based on classified DSCP,
otherwise DSCP is always rewritten from frame DSCP.

classified_dscp = qos_dscp_map[8 * dp_level + qos_class];
if (active_mappings && dscp_is_trusted)
	rewritten_dscp = classified_dscp
else
	rewritten_dscp = frame_dscp

To rewrite DSCP to 20 for any frames with priority 7:

$ dcb apptrust set dev eth0 order dscp
$ dcb rewr add dev eth0 7:20 <-- not in iproute2/dcb yet
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

246c77f6

net: microchip: sparx5: add support for PCP rewrite · 2234879f

Daniel Machon authored Jan 18, 2023

Add support for rewrite of PCP and DEI, based on classified Quality of
Service (QoS) class and Drop-Precedence (DP) level.

The DCB rewrite table is queried for mappings between priority and
PCP/DEI. The classified DP level is then encoded in the DEI bit, if a
mapping for DEI exists.

Sparx5 has four DP levels, where by default, 0 is mapped to DE0 and 1-3
are mapped to DE1. If a mapping exists where DEI=1, then all classified
DP levels mapped to DE1 will set the DEI bit. The other way around for
DEI=0. Effectively, this means that the tagged DEI bit will reflect the
DP level for any mappings where DEI=1.

Map priority=1 to PCP=1 and DEI=1:
$ dcb rewr add dev eth0 pcp-prio 1:1de

Map priority=7 to PCP=2 and DEI=0
$ dcb rewr add dev eth0 pcp-prio 7:2nd

Also, sparx5_dcb_ieee_dscp_setdel() has been refactored, to work for
both APP and rewrite entries.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2234879f

net: dcb: add helper functions to retrieve PCP and DSCP rewrite maps · 1df99338

Daniel Machon authored Jan 18, 2023

Add two new helper functions to retrieve a mapping of priority to PCP
and DSCP bitmasks, where each bitmap contains ones in positions that
match a rewrite entry.

dcb_ieee_getrewr_prio_dscp_mask_map() reuses the dcb_ieee_app_prio_map,
as this struct is already used for a similar mapping in the app table.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1df99338