Commits · 346497c78d15cdd5bdc3b642a895009359e5457f · Kirill Smelkov / linux

15 Mar, 2021 1 commit

i40e: optimize for XDP_REDIRECT in xsk path · 346497c7

Magnus Karlsson authored Dec 02, 2020

Optimize i40e_run_xdp_zc() for the case where the XDP program verdict is
XDP_REDIRECT in the xsk zero-copy path. This path is only used when
AF_XDP zero-copy is enabled, and in that case most packets will be
redirected to user space. This provides a little over 100k extra packets
per second of throughput on my server when running l2fwd in xdpsock.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

14 Mar, 2021 33 commits

Merge branch 'psample-Add-additional-metadata-attributes' · 2117fce8

David S. Miller authored Mar 14, 2021

Ido Schimmel says:

====================
psample: Add additional metadata attributes

This series extends the psample module to expose additional metadata to
user space for packets sampled via act_sample. The new metadata (e.g.,
transit delay) can then be consumed by applications such as hsflowd [1]
for better network observability.

netdevsim is extended with a dummy psample implementation that
periodically reports "sampled" packets to the psample module. In
addition to testing of the psample module, it enables the development
and demonstration of user space applications (e.g., hsflowd) that are
interested in the new metadata even without access to specialized
hardware (e.g., Spectrum ASIC) that can provide it.

mlxsw is also extended to provide the new metadata to psample.

A Wireshark dissector for psample netlink packets [2] will be submitted
upstream after the kernel patches are accepted. In addition, a libpcap
capture module for psample is currently in the works. Eventually, users
should be able to run:

 # tshark -i psample

in order to consume sampled packets along with their metadata.

Series overview:

Patch #1 makes it easier to extend the metadata provided to psample

Patch #2 adds the new metadata attributes to psample

Patch #3 extends netdevsim to periodically report "sampled" packets to
psample. Various debugfs knobs are added to control the reporting

Patch #4 adds a selftest over netdevsim

Patches #5-#10 gradually add support for the new metadata in mlxsw

Patch #11 adds a selftest over mlxsw

[1] https://sflow.org/draft4_sflow_transit.txt
[2] https://gitlab.com/amitcohen1/wireshark/-/commit/3d711143024e032aef1b056dd23f0266c54fab56
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: Add tc sample tests · bb24d592

Ido Schimmel authored Mar 14, 2021

Test that packets are sampled when tc-sample is used and that the reported
metadata is correct. Two sets of hosts (with and without LAG) are used,
since metadata extraction in mlxsw is a bit different when LAG is
involved.

 # ./tc_sample.sh
 TEST: tc sample rate (forward)                                      [ OK ]
 TEST: tc sample rate (local receive)                                [ OK ]
 TEST: tc sample maximum rate                                        [ OK ]
 TEST: tc sample group conflict test                                 [ OK ]
 TEST: tc sample iif                                                 [ OK ]
 TEST: tc sample lag iif                                             [ OK ]
 TEST: tc sample oif                                                 [ OK ]
 TEST: tc sample lag oif                                             [ OK ]
 TEST: tc sample out-tc                                              [ OK ]
 TEST: tc sample out-tc-occ                                          [ OK ]
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Report extra metadata to psample module · 2073c600

Ido Schimmel authored Mar 14, 2021

Make use of the previously added metadata and report it to the psample
module. The metadata is read from the skb's control block, which was
initialized by the bus driver (i.e., 'mlxsw_pci') after decoding the
packet's Completion Queue Element (CQE).
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Remove mlxsw_sp_sample_receive() · 48990bef

Ido Schimmel authored Mar 14, 2021

The function resolves the psample sampling group from the Rx port
because this is the only form of sampling the driver currently supports.
Subsequent patches are going to add support for Tx-based and
policy-based sampling, in which case the sampling group would not be
resolved from the Rx port.

Therefore, move this code to the Rx-specific sampling listener.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Remove unnecessary RCU read-side critical section · e1f78ecd

Ido Schimmel authored Mar 14, 2021

Since commit 7d8e8f34 ("mlxsw: core: Increase scope of RCU read-side
critical section"), all Rx handlers are called from an RCU read-side
critical section.

Remove the unnecessary rcu_read_lock() / rcu_read_unlock().
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: pci: Set extra metadata in skb control block · 5ab6dc9f

Ido Schimmel authored Mar 14, 2021

Packets that are mirrored / sampled to the CPU have extra metadata
encoded in their corresponding Completion Queue Element (CQE). Retrieve
this metadata from the CQE and set it in the skb control block so that
it can be accessed by the switch driver (i.e., 'mlxsw_spectrum').
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Create dedicated field for Rx metadata in skb control block · d4cabaad

Ido Schimmel authored Mar 14, 2021

The next patch will need to encode more Rx metadata in the skb control
block, so create a dedicated field for it and move the cookie index
there.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
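
The pattern of giving Rx metadata its own field inside the skb control block can be sketched in user space as below. Every struct and field name here is a hypothetical illustration, not the mlxsw layout; the only assumed kernel fact is that sk_buff::cb is a 48-byte, 8-byte-aligned scratch area. The compile-time check is what keeps a growing metadata struct from silently overflowing that area.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKB_CB_SIZE 48 /* size of sk_buff::cb, the per-layer scratch area */

/* Hypothetical Tx metadata. */
struct tx_info {
	uint16_t local_port;
	uint8_t is_emad;
};

/* Hypothetical dedicated Rx metadata field: the cookie index moves here,
 * leaving room for the extra fields a later patch wants to add. */
struct rx_md_info {
	uint32_t cookie_index;
	uint32_t latency;
	uint16_t tx_port;
	unsigned int latency_valid:1,
		     tx_port_valid:1;
};

/* In this sketch an skb carries either Tx or Rx metadata, so the two
 * share the driver's slice of the control block. */
struct drv_skb_cb {
	union {
		struct tx_info tx_info;
		struct rx_md_info rx_md_info;
	};
};

_Static_assert(sizeof(struct drv_skb_cb) <= SKB_CB_SIZE,
	       "driver metadata must fit in skb->cb");

/* Hypothetical stand-in for struct sk_buff. */
struct fake_skb {
	_Alignas(8) char cb[SKB_CB_SIZE];
};

/* Accessor in the style of the kernel's *_SKB_CB() helpers; the kernel
 * is built with -fno-strict-aliasing, which makes this cast conventional. */
static struct drv_skb_cb *drv_skb_cb(struct fake_skb *skb)
{
	assert(skb != NULL);
	return (struct drv_skb_cb *)skb->cb;
}

static void drv_skb_cb_reset(struct fake_skb *skb)
{
	memset(skb->cb, 0, sizeof(skb->cb));
}
```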

mlxsw: pci: Add more metadata fields to CQEv2 · e0eeede3

Ido Schimmel authored Mar 14, 2021

The Completion Queue Element version 2 (CQEv2) includes various metadata
fields for packets that are mirrored / sampled to the CPU.

Add these fields so that they can be used by a later patch.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: netdevsim: Test psample functionality · f26b3091

Ido Schimmel authored Mar 14, 2021

Test various aspects of psample functionality over netdevsim and in
particular test that the psample module correctly reports the provided
metadata.

Example:

 # ./psample.sh
 TEST: psample enable / disable                                      [ OK ]
 TEST: psample group number                                          [ OK ]
 TEST: psample metadata                                              [ OK ]
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevsim: Add dummy psample implementation · a8700c3d

Ido Schimmel authored Mar 14, 2021

Allow netdevsim to report "sampled" packets to the psample module by
periodically generating packets from a work queue. The behavior can be
enabled or disabled (disabled by default), and the various metadata
attributes can be controlled via debugfs knobs.

This implementation enables both testing of the psample module with all
the optional attributes as well as development of user space
applications on top of psample such as hsflowd and a Wireshark dissector
for psample generic netlink packets.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

psample: Add additional metadata attributes · 07e1a580

Ido Schimmel authored Mar 14, 2021

Extend psample to report the following attributes when available:

* Output traffic class as a 16-bit value
* Output traffic class occupancy in bytes as a 64-bit value
* End-to-end latency of the packet with nanosecond resolution
* Software timestamp with nanosecond resolution (always available)
* Packet's protocol, needed for packet dissection in user space (always
  available)
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

psample: Encapsulate packet metadata in a struct · a03e99d3

Ido Schimmel authored Mar 14, 2021

Currently, callers of psample_sample_packet() pass three metadata
attributes: ingress port, egress port and truncated size. Subsequent
patches are going to add more attributes (e.g., egress queue occupancy),
which also need an indication of whether they are valid or not.

Encapsulate packet metadata in a struct in order to keep the number of
arguments reasonable.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'skbuff-micro-optimize-flow-dissection' · c6baf7ee

David S. Miller authored Mar 14, 2021

Alexander Lobakin says:

====================
skbuff: micro-optimize flow dissection

This little number makes all of the flow dissection functions take
the raw input data pointer as const (patches 1-5) and shuffles the branches
in __skb_header_pointer() according to their hit probability.

The result is +20 Mbps per flow/core with one Flow Dissector pass
per packet. This affects RPS (with software hashing), drivers that
use eth_get_headlen() on their Rx path and so on.

From v2 [1]:
 - reword some commit messages as a potential fix for NIPA;
 - no functional changes.

From v1 [0]:
 - rebase on top of the latest net-next. This was super-weird, but
   I double-checked that the series applies with no conflicts, and
   then on Patchwork it didn't;
 - no other changes.

[0] https://lore.kernel.org/netdev/20210312194538.337504-1-alobakin@pm.me
[1] https://lore.kernel.org/netdev/20210313113645.5949-1-alobakin@pm.me
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

skbuff: micro-optimize {,__}skb_header_pointer() · d206121f

Alexander Lobakin authored Mar 14, 2021

{,__}skb_header_pointer() helpers exist mainly to prevent
accesses beyond the end of the linear data.
In the vast majority of cases, they bail out on the first condition;
all code after it is mostly a fallback.
Mark the most common branch as 'likely' to move it in-line.
Also, skb_copy_bits() can return negative values only when the input
arguments are invalid, e.g. offset is greater than skb->len. It can
be safely marked as an 'unlikely' branch, assuming that hotpath code
provides sane input to not fail here.

These two changes bump the throughput with a single Flow Dissector pass on
every packet (e.g. with RPS or a driver that uses eth_get_headlen())
by 20 Mbps per flow/core.
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
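
The branch layout described above can be modeled in user space. This is a sketch under stated assumptions: struct pkt, pkt_copy_bits() and pkt_header_pointer() are hypothetical stand-ins for the skb, skb_copy_bits() and __skb_header_pointer(). The fast path returns a pointer straight into the linear area; only a request that spills past it pays for a copy, and a copy failure means the caller passed bad arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical packet: a linear head plus a non-linear remainder. */
struct pkt {
	const unsigned char *head;
	size_t headlen;              /* bytes available in the linear area */
	const unsigned char *frag;
	size_t fraglen;
};

/* Hypothetical skb_copy_bits() stand-in: copy len bytes at offset,
 * spanning head and frag. Fails only on out-of-range input. */
static int pkt_copy_bits(const struct pkt *p, size_t offset, void *to, size_t len)
{
	unsigned char *dst = to;
	size_t total = p->headlen + p->fraglen;

	if (offset > total || len > total - offset)
		return -1;
	for (size_t i = 0; i < len; i++) {
		size_t pos = offset + i;
		dst[i] = pos < p->headlen ? p->head[pos] : p->frag[pos - p->headlen];
	}
	return 0;
}

/* Return a pointer to len bytes at offset. The common case (data already
 * linear) is marked likely; copy failure is marked unlikely. */
static void *pkt_header_pointer(const struct pkt *p, size_t offset,
				size_t len, void *buffer)
{
	assert(p != NULL);

	if (likely(p->headlen >= len && offset <= p->headlen - len))
		return (void *)(p->head + offset); /* casts away const */

	if (unlikely(pkt_copy_bits(p, offset, buffer, len) < 0))
		return NULL;

	return buffer;
}
```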

ethernet: constify eth_get_headlen()'s data argument · 59753ce8

Alexander Lobakin authored Mar 14, 2021

It's used only for flow dissection, which now takes constant data
pointers.
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>

linux/etherdevice.h: misc trailing whitespace cleanup · 805a25f3

Alexander Lobakin authored Mar 14, 2021

Caught by the text editor. Fix it separately from the actual changes.
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>

flow_dissector: constify raw input data argument · f96533cd

Alexander Lobakin authored Mar 14, 2021

Flow Dissector code never modifies the input buffer, neither the skb nor
the raw data.
Make the 'data' argument const for all of the Flow Dissector's functions.
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
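
What a const 'data' argument buys can be shown with a tiny hypothetical dissector that reads the EtherType from a raw Ethernet frame, skipping one 802.1Q tag. dissect_ethertype() is illustrative only and is not a kernel function. Because the buffer is taken as const, any accidental write to the packet data becomes a compile-time error instead of a latent bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mini dissector: read-only access to the raw frame. */
static int dissect_ethertype(const void *data, size_t len, uint16_t *proto)
{
	const uint8_t *p = data;
	size_t off = 12; /* EtherType follows the two 6-byte MAC addresses */

	assert(proto != NULL);
	if (len < off + 2)
		return -1;
	*proto = (uint16_t)(p[off] << 8 | p[off + 1]);

	/* Skip a single 802.1Q tag (TPID 0x8100): 4 more bytes. */
	if (*proto == 0x8100) {
		off += 4;
		if (len < off + 2)
			return -1;
		*proto = (uint16_t)(p[off] << 8 | p[off + 1]);
	}
	return 0;
}
```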

skbuff: make __skb_header_pointer()'s data argument const · e3305138

Alexander Lobakin authored Mar 14, 2021

The function never modifies the input buffer, so the 'data' argument
can be marked as const.
This requires one harmless cast that drops the const qualifier.
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>

flow_dissector: constify bpf_flow_dissector's data pointers · dac06b32

Alexander Lobakin authored Mar 14, 2021

BPF Flow Dissector programs are read-only and don't touch input
buffers.
Mark 'data' and 'data_end' in struct bpf_flow_dissector as const
in preparation for global input constification.
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'gro-micro-optimize-dev_gro_receive' · 3f79eb3c

David S. Miller authored Mar 14, 2021

Alexander Lobakin says:

====================
gro: micro-optimize dev_gro_receive()

This random series addresses some of the suboptimal constructions used
in the main GRO entry point.
The bulk of it is the simplification of gro_list_prepare() and pointer
usage optimization in dev_gro_receive() itself. Although mostly cosmetic,
it gives about +10 Mbps on my setup to both TCP and UDP (both single- and
multi-flow).

Since v1 [0]:
 - drop the replacement of bucket index calculation with
   reciprocal_scale() since it makes absolutely no sense (Eric);
 - improve stack usage in dev_gro_receive() (Eric);
 - reverse the order of patches so that later patches don't supersede
   earlier changes.

[0] https://lore.kernel.org/netdev/20210312162127.239795-1-alobakin@pm.me
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

gro: give 'hash' variable in dev_gro_receive() a less confusing name · d0eed5c3

Alexander Lobakin authored Mar 13, 2021

'hash' stores not the flow hash, but the index of the GRO bucket
corresponding to it.
Change its name to 'bucket' to avoid confusion while reading lines
like '__set_bit(hash, &napi->gro_bitmask)'.
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
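
Why 'bucket' is the better name can be seen in a small sketch: the flow hash is a 32-bit value, while the bucket is only its low bits, selected by a power-of-two mask. The sketch assumes GRO_HASH_BUCKETS is 8; gro_bucket() and mark_bucket_busy() are hypothetical helpers, and the plain OR stands in for __set_bit() on the NAPI bitmask.

```c
#include <assert.h>
#include <stdint.h>

#define GRO_HASH_BUCKETS 8 /* assumed; must be a power of two for the mask */

/* The bucket index is derived from, but is not, the flow hash. */
static unsigned int gro_bucket(uint32_t flow_hash)
{
	return flow_hash & (GRO_HASH_BUCKETS - 1);
}

/* Stand-in for __set_bit(bucket, &napi->gro_bitmask). */
static void mark_bucket_busy(unsigned long *gro_bitmask, unsigned int bucket)
{
	assert(bucket < GRO_HASH_BUCKETS);
	*gro_bitmask |= 1UL << bucket;
}
```

Read this way, a line such as '__set_bit(bucket, &napi->gro_bitmask)' plainly sets one bit per bucket, not per hash value.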

gro: consistentify napi->gro_hash[x] access in dev_gro_receive() · 9dc2c313

Alexander Lobakin authored Mar 13, 2021

The GRO bucket index doesn't change throughout the function.
Store a pointer to the corresponding bucket instead of its member
and use it consistently through the function.
It is performance-safe since &gro_list->list == gro_list.

Misc: remove superfluous braces around single-line branches.
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
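
The claim that &gro_list->list == gro_list holds because the list head is the first member of the bucket struct, and C guarantees that a struct's first member sits at offset zero. The sketch below assumes that layout (list first, then a count) with a local list_head definition; bucket_init() is hypothetical. The static assertion documents the invariant the pointer shortcut relies on.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Assumed bucket shape: the list head is the first member. */
struct gro_list {
	struct list_head list;
	int count;
};

/* With the list at offset 0, &bucket->list adds no offset at all. */
_Static_assert(offsetof(struct gro_list, list) == 0,
	       "list must stay the first member");

/* Hypothetical helper: start a bucket as an empty circular list. */
static void bucket_init(struct gro_list *bucket)
{
	assert(bucket != NULL);
	bucket->list.next = bucket->list.prev = &bucket->list;
	bucket->count = 0;
}
```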

gro: simplify gro_list_prepare() · 0ccf4d50

Alexander Lobakin authored Mar 13, 2021

gro_list_prepare() always returns &napi->gro_hash[bucket].list,
without any variation. Moreover, it uses the 'napi' argument only to
access this list, and computes the bucket index a second time to do
so (the first computation happens at the beginning of
dev_gro_receive()).
Given that dev_gro_receive() already has the index of the needed
list, just pass the list as the first argument to eliminate redundant
calculations, and make gro_list_prepare() return void.
Also, both arguments of gro_list_prepare() can be constified since
this function only modifies the skbs on the bucket list.
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
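
The new calling convention can be sketched with simplified, hypothetical types: the caller passes the bucket it already resolved, the function returns void, and both parameters are const because only the packets held on the bucket list are written. This sketch compares a single flow hash to decide same_flow, which is far less than the real comparison does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for an skb on a GRO list. */
struct pkt {
	struct pkt *next;   /* singly linked bucket list for brevity */
	uint32_t flow_hash;
	int same_flow;      /* per-packet GRO state written by prepare */
};

struct bucket {
	struct pkt *head;
};

/* The bucket pointer is const, but the packets it points to are not,
 * so marking them is allowed while the incoming packet stays read-only. */
static void list_prepare(const struct bucket *b, const struct pkt *skb)
{
	assert(b != NULL && skb != NULL);
	for (struct pkt *p = b->head; p; p = p->next)
		p->same_flow = (p->flow_hash == skb->flow_hash);
}
```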

net: dsa: bcm_sf2: Fill in BCM4908 CFP entries · f4e6d7cd

Florian Fainelli authored Mar 12, 2021

The BCM4908 switch has 256 CFP entries; update that setting so CFP can be
used.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hv_netvsc: Add a comment clarifying batching logic · bd49fea7

Shachar Raindel authored Mar 12, 2021

The batching logic in netvsc_send is non-trivial, due to
a combination of the Linux API and the underlying hypervisor
interface. Add a comment explaining why the code is written this
way.
Signed-off-by: Shachar Raindel <shacharr@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'pktgen-scripts-improvements' · 0f88e6f3

David S. Miller authored Mar 14, 2021

Igor Russkikh says:

====================
pktgen: scripts improvements

Please consider small improvements to the pktgen scripts we use in our
environment:

- add a delay parameter via the command line;
- add a new -a (append) parameter to allow more flexible runs.

v3: change us to ns in docs
v2: Review comments from Jesper

CC: Jesper Dangaard Brouer <brouer@redhat.com>
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

samples: pktgen: new append mode · c8fd4852

Igor Russkikh authored Mar 11, 2021

To configure various complex flows we can certainly create custom
pktgen init scripts, but sometimes that's not easy.

The new "-a" (append) option in all the existing sample scripts allows
appending more "devices" to pktgen threads.

The most straightforward use cases for that are:
- using multiple devices. We have to generate full line rate on
all physical functions (ports) of our multiport device.
- pushing multiple flows (with different packet options)
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

samples: pktgen: allow to specify delay parameter via new opt · ef700f2e

Igor Russkikh authored Mar 11, 2021

DELAY may now be explicitly specified via the common parameter -w.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

docs: net: add missing devlink health cmd - trigger · 6f162909

Jakub Kicinski authored Mar 12, 2021

Documentation is missing, and it's not very clear what
this callback is for (presumably testing the recovery?).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

docs: net: tweak devlink health documentation · 3cc9b29a

Jakub Kicinski authored Mar 12, 2021

Minor tweaks and improvement of wording about the diagnose callback.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Set FIFO sizes for ipq806x · e127906b

Jonathan McDowell authored Mar 13, 2021

Commit eaf4fac4 ("net: stmmac: Do not accept invalid MTU values")
started using the TX FIFO size to verify what counts as a valid MTU
request for the stmmac driver.  This is unset for the ipq806x variant.
Looking at older patches for this, it seems the RX and TX buffers can be
up to 8k, so set them accordingly.

(I sent this as an RFC patch in June last year, but received no replies.
I've been running with this on my hardware (a MikroTik RB3011) since
then with larger MTUs to support both the internal qca8k switch and
VLANs with no problems. Without the patch it's impossible to set the
larger MTU required to support this.)
Signed-off-by: Jonathan McDowell <noodles@earth.li>
Signed-off-by: David S. Miller <davem@davemloft.net>
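
A rough model of why an unset TX FIFO size blocks larger MTUs is sketched below. It assumes the MTU check divides the TX FIFO equally among the TX queues and requires the new MTU to fit in one slice; check_mtu() is a hypothetical function, not the stmmac code, and only the 8k figure comes from the commit message.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model: each TX queue gets an equal slice of the FIFO,
 * and a frame of new_mtu bytes must fit in that slice. */
static int check_mtu(int tx_fifo_size, int tx_queues, int new_mtu)
{
	int per_queue;

	assert(tx_queues > 0);
	per_queue = tx_fifo_size / tx_queues;
	return new_mtu <= per_queue ? 0 : -EINVAL;
}
```

Under this model, an 8k FIFO with a single queue accepts jumbo-ish MTUs, while the same FIFO split across four queues would not.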

drivers: net: vxlan.c: Fix declaration issue · 6fadbdd6

Sanjana Srinidhi authored Mar 13, 2021

Add a blank line after the structure declaration to maintain code
uniformity.
Signed-off-by: Sanjana Srinidhi <sanjanasrinidhi1810@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: marvell: Fixed typo in the file sky2.c · 65c7bc1b

Bhaskar Chowdhury authored Mar 13, 2021

s/calclation/calculation/
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

13 Mar, 2021 6 commits

Merge branch 'dsa-hewllcreek-dumps' · b8eccf2a

David S. Miller authored Mar 13, 2021

Kurt Kanzenbach says:

====================
net: dsa: hellcreek: Add support for dumping tables

Add support for dumping the VLAN and FDB tables via devlink. As the driver uses
internal VLANs and static FDB entries, this is a useful debugging feature.

Changes since v1:

 * Drop memory reporting as there are better APIs to expose this
 * Move comment to VLAN patch

Previous versions:

 * https://lkml.kernel.org/netdev/20210311175344.3084-1-kurt@kmk-computers.de/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: hellcreek: Add devlink FDB region · 292cd449

Kurt Kanzenbach authored Mar 13, 2021

Allow dumping the FDB table via devlink. This is a useful debugging feature.
Signed-off-by: Kurt Kanzenbach <kurt@kmk-computers.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: hellcreek: Move common code to helper · eb5f3d31

Kurt Kanzenbach authored Mar 13, 2021

Two functions need to populate FDB entries. Move that common code to a
helper function.
Signed-off-by: Kurt Kanzenbach <kurt@kmk-computers.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: hellcreek: Use boolean value · e81813fb

Kurt Kanzenbach authored Mar 13, 2021

hellcreek_select_vlan() takes a boolean instead of an integer,
so pass false accordingly.
Signed-off-by: Kurt Kanzenbach <kurt@kmk-computers.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: hellcreek: Add devlink VLAN region · ba2d1c28

Kurt Kanzenbach authored Mar 13, 2021

Allow dumping the VLAN table via devlink. This is especially useful because
the driver internally leverages VLANs for port separation, and these are not
visible via the bridge utility.
Signed-off-by: Kurt Kanzenbach <kurt@kmk-computers.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'batadv-next-pullrequest-20210312' of git://git.open-mesh.org/linux-merge · ebc71a38

David S. Miller authored Mar 13, 2021

Simon Wunderlich says:

====================
There is only a single patch this time:

 - Use netif_rx_any_context(), by Sebastian Andrzej Siewior
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
