- 15 Mar, 2021 1 commit
-
-
Magnus Karlsson authored
Optimize i40e_run_xdp_zc() for the XDP program verdict being XDP_REDIRECT in the xsk zero-copy path. This path is only used when having AF_XDP zero-copy on and in that case most packets will be directed to user space. This provides a little over 100k extra packets in throughput on my server when running l2fwd in xdpsock. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
- 14 Mar, 2021 33 commits
-
-
David S. Miller authored
Ido Schimmel says: ==================== psample: Add additional metadata attributes This series extends the psample module to expose additional metadata to user space for packets sampled via act_sample. The new metadata (e.g., transit delay) can then be consumed by applications such as hsflowd [1] for better network observability. netdevsim is extended with a dummy psample implementation that periodically reports "sampled" packets to the psample module. In addition to testing of the psample module, it enables the development and demonstration of user space applications (e.g., hsflowd) that are interested in the new metadata even without access to specialized hardware (e.g., Spectrum ASIC) that can provide it. mlxsw is also extended to provide the new metadata to psample. A Wireshark dissector for psample netlink packets [2] will be submitted upstream after the kernel patches are accepted. In addition, a libpcap capture module for psample is currently in the works. Eventually, users should be able to run: # tshark -i psample In order to consume sampled packets along with their metadata. Series overview: Patch #1 makes it easier to extend the metadata provided to psample Patch #2 adds the new metadata attributes to psample Patch #3 extends netdevsim to periodically report "sampled" packets to psample. Various debugfs knobs are added to control the reporting Patch #4 adds a selftest over netdevsim Patches #5-#10 gradually add support for the new metadata in mlxsw Patch #11 adds a selftest over mlxsw [1] https://sflow.org/draft4_sflow_transit.txt [2] https://gitlab.com/amitcohen1/wireshark/-/commit/3d711143024e032aef1b056dd23f0266c54fab56 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Test that packets are sampled when tc-sample is used and that reported metadata is correct. Two sets of hosts (with and without LAG) are used, since metadata extraction in mlxsw is a bit different when LAG is involved. # ./tc_sample.sh TEST: tc sample rate (forward) [ OK ] TEST: tc sample rate (local receive) [ OK ] TEST: tc sample maximum rate [ OK ] TEST: tc sample group conflict test [ OK ] TEST: tc sample iif [ OK ] TEST: tc sample lag iif [ OK ] TEST: tc sample oif [ OK ] TEST: tc sample lag oif [ OK ] TEST: tc sample out-tc [ OK ] TEST: tc sample out-tc-occ [ OK ] Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Make use of the previously added metadata and report it to the psample module. The metadata is read from the skb's control block, which was initialized by the bus driver (i.e., 'mlxsw_pci') after decoding the packet's Completion Queue Element (CQE). Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
The function resolves the psample sampling group from the Rx port because this is the only form of sampling the driver currently supports. Subsequent patches are going to add support for Tx-based and policy-based sampling, in which case the sampling group would not be resolved from the Rx port. Therefore, move this code to the Rx-specific sampling listener. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Since commit 7d8e8f34 ("mlxsw: core: Increase scope of RCU read-side critical section"), all Rx handlers are called from an RCU read-side critical section. Remove the unnecessary rcu_read_lock() / rcu_read_unlock(). Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Packets that are mirrored / sampled to the CPU have extra metadata encoded in their corresponding Completion Queue Element (CQE). Retrieve this metadata from the CQE and set it in the skb control block so that it could be accessed by the switch driver (i.e., 'mlxsw_spectrum'). Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Next patch will need to encode more Rx metadata in the skb control block, so create a dedicated field for it and move the cookie index there. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
The Completion Queue Element version 2 (CQEv2) includes various metadata fields for packets that are mirrored / sampled to the CPU. Add these fields so that they could be used by a later patch. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Test various aspects of psample functionality over netdevsim and in particular test that the psample module correctly reports the provided metadata. Example: # ./psample.sh TEST: psample enable / disable [ OK ] TEST: psample group number [ OK ] TEST: psample metadata [ OK ] Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Allow netdevsim to report "sampled" packets to the psample module by periodically generating packets from a work queue. The behavior can be enabled / disabled (default) and the various meta data attributes can be controlled via debugfs knobs. This implementation enables both testing of the psample module with all the optional attributes as well as development of user space applications on top of psample such as hsflowd and a Wireshark dissector for psample generic netlink packets. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Extend psample to report the following attributes when available: * Output traffic class as a 16-bit value * Output traffic class occupancy in bytes as a 64-bit value * End-to-end latency of the packet in nanoseconds resolution * Software timestamp in nanoseconds resolution (always available) * Packet's protocol. Needed for packet dissection in user space (always available) Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Currently, callers of psample_sample_packet() pass three metadata attributes: Ingress port, egress port and truncated size. Subsequent patches are going to add more attributes (e.g., egress queue occupancy), which also need an indication whether they are valid or not. Encapsulate packet metadata in a struct in order to keep the number of arguments reasonable. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Alexander Lobakin says: ==================== skbuff: micro-optimize flow dissection This little number makes all of the flow dissection functions take raw input data pointer as const (1-5) and shuffles the branches in __skb_header_pointer() according to their hit probability. The result is +20 Mbps per flow/core with one Flow Dissector pass per packet. This affects RPS (with software hashing), drivers that use eth_get_headlen() on their Rx path and so on. From v2 [1]: - reword some commit messages as a potential fix for NIPA; - no functional changes. From v1 [0]: - rebase on top of the latest net-next. This was super-weird, but I double-checked that the series applies with no conflicts, and then on Patchwork it didn't; - no other changes. [0] https://lore.kernel.org/netdev/20210312194538.337504-1-alobakin@pm.me [1] https://lore.kernel.org/netdev/20210313113645.5949-1-alobakin@pm.me ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Lobakin authored
{,__}skb_header_pointer() helpers exist mainly for preventing accesses-beyond-end of the linear data. In the vast majorify of cases, they bail out on the first condition. All code going after is mostly a fallback. Mark the most common branch as 'likely' one to move it in-line. Also, skb_copy_bits() can return negative values only when the input arguments are invalid, e.g. offset is greater than skb->len. It can be safely marked as 'unlikely' branch, assuming that hotpath code provides sane input to not fail here. These two bump the throughput with a single Flow Dissector pass on every packet (e.g. with RPS or driver that uses eth_get_headlen()) on 20 Mbps per flow/core. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Lobakin authored
It's used only for flow dissection, which now takes constant data pointers. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Lobakin authored
Caught by the text editor. Fix it separately from the actual changes. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Lobakin authored
Flow Dissector code never modifies the input buffer, neither skb nor raw data. Make 'data' argument const for all of the Flow dissector's functions. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Lobakin authored
The function never modifies the input buffer, so 'data' argument can be marked as const. This implies one harmless cast-away. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Lobakin authored
BPF Flow dissection programs are read-only and don't touch input buffers. Mark 'data' and 'data_end' in struct bpf_flow_dissector as const in preparation for global input constifying. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Alexander Lobakin says: ==================== gro: micro-optimize dev_gro_receive() This random series addresses some of suboptimal constructions used in the main GRO entry point. The main body is gro_list_prepare() simplification and pointer usage optimization in dev_gro_receive() itself. Being mostly cosmetic, it gives like +10 Mbps on my setup to both TCP and UDP (both single- and multi-flow). Since v1 [0]: - drop the replacement of bucket index calculation with reciprocal_scale() since it makes absolutely no sense (Eric); - improve stack usage in dev_gro_receive() (Eric); - reverse the order of patches to avoid changes superseding. [0] https://lore.kernel.org/netdev/20210312162127.239795-1-alobakin@pm.me ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Lobakin authored
'hash' stores not the flow hash, but the index of the GRO bucket corresponding to it. Change its name to 'bucket' to avoid confusion while reading lines like '__set_bit(hash, &napi->gro_bitmask)'. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Lobakin authored
GRO bucket index doesn't change through the entire function. Store a pointer to the corresponding bucket instead of its member and use it consistently through the function. It is performance-safe since &gro_list->list == gro_list. Misc: remove superfluous braces around single-line branches. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Lobakin authored
gro_list_prepare() always returns &napi->gro_hash[bucket].list, without any variations. Moreover, it uses 'napi' argument only to have access to this list, and calculates the bucket index for the second time (firstly it happens at the beginning of dev_gro_receive()) to do that. Given that dev_gro_receive() already has an index to the needed list, just pass it as the first argument to eliminate redundant calculations, and make gro_list_prepare() return void. Also, both arguments of gro_list_prepare() can be constified since this function can only modify the skbs from the bucket list. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
The BCM4908 switch has 256 CFP entrie, update that setting so CFP can be used. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shachar Raindel authored
The batching logic in netvsc_send is non-trivial, due to a combination of the Linux API and the underlying hypervisor interface. Add a comment explaining why the code is written this way. Signed-off-by: Shachar Raindel <shacharr@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Igor Russkikh says: ==================== pktgen: scripts improvements Please consider small improvements to pktgen scripts we use in our environment. Adding delay parameter through command line, Adding new -a (append) parameter to make flex runs v3: change us to ns in docs v2: Review comments from Jesper CC: Jesper Dangaard Brouer <brouer@redhat.com> ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Igor Russkikh authored
To configure various complex flows we for sure can create custom pktgen init scripts, but sometimes thats not that easy. New "-a" (append) option in all the existing sample scripts allows to append more "devices" into pktgen threads. The most straightforward usecases for that are: - using multiple devices. We have to generate full linerate on all physical functions (ports) of our multiport device. - pushing multiple flows (with different packet options) Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Igor Russkikh authored
DELAY may now be explicitly specified via common parameter -w Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Documentation is missing and it's not very clear what this callback is for - presumably testing the recovery? Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Minor tweaks and improvement of wording about the diagnose callback. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jonathan McDowell authored
Commit eaf4fac4 ("net: stmmac: Do not accept invalid MTU values") started using the TX FIFO size to verify what counts as a valid MTU request for the stmmac driver. This is unset for the ipq806x variant. Looking at older patches for this it seems the RX + TXs buffers can be up to 8k, so set appropriately. (I sent this as an RFC patch in June last year, but received no replies. I've been running with this on my hardware (a MikroTik RB3011) since then with larger MTUs to support both the internal qca8k switch and VLANs with no problems. Without the patch it's impossible to set the larger MTU required to support this.) Signed-off-by: Jonathan McDowell <noodles@earth.li> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sanjana Srinidhi authored
Added a blank line after structure declaration. This is done to maintain code uniformity. Signed-off-by: Sanjana Srinidhi <sanjanasrinidhi1810@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Bhaskar Chowdhury authored
s/calclation/calculation/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 13 Mar, 2021 6 commits
-
-
David S. Miller authored
Kurt Kanzenbach says: ==================== net: dsa: hellcreek: Add support for dumping tables add support for dumping the VLAN and FDB table via devlink. As the driver uses internal VLANs and static FDB entries, this is a useful debugging feature. Changes since v1: * Drop memory reporting as there are better APIs to expose this * Move comment to VLAN patch Previous versions: * https://lkml.kernel.org/netdev/20210311175344.3084-1-kurt@kmk-computers.de/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kurt Kanzenbach authored
Allow to dump the FDB table via devlink. This is a useful debugging feature. Signed-off-by: Kurt Kanzenbach <kurt@kmk-computers.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kurt Kanzenbach authored
There are two functions which need to populate fdb entries. Move that to a helper function. Signed-off-by: Kurt Kanzenbach <kurt@kmk-computers.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kurt Kanzenbach authored
hellcreek_select_vlan() takes a boolean instead of an integer. So, use false accordingly. Signed-off-by: Kurt Kanzenbach <kurt@kmk-computers.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kurt Kanzenbach authored
Allow to dump the VLAN table via devlink. This especially useful, because the driver internally leverages VLANs for the port separation. These are not visible via the bridge utility. Signed-off-by: Kurt Kanzenbach <kurt@kmk-computers.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.open-mesh.org/linux-mergeDavid S. Miller authored
Simon Wunderlich says: ==================== There is only a single patch this time: - Use netif_rx_any_context(), by Sebastian Andrzej Siewior ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-