- 02 Jul, 2018 8 commits
-
-
David S. Miller authored
Amritha Nambiar says: ==================== Symmetric queue selection using XPS for Rx queues This patch series implements support for Tx queue selection based on Rx queue(s) map. This is done by configuring Rx queue(s) map per Tx-queue using sysfs attribute. If the user configuration for Rx queues does not apply, then the Tx queue selection falls back to XPS using CPUs and finally to hashing. XPS is refactored to support Tx queue selection based on either the CPUs map or the Rx-queues map. The config option CONFIG_XPS needs to be enabled. By default no receive queues are configured for the Tx queue. - /sys/class/net/<dev>/queues/tx-*/xps_rxqs A set of receive queues can be mapped to a set of transmit queues (many:many), although the common use case is a 1:1 mapping. This will enable sending packets on the same Tx-Rx queue association as this is useful for busy polling multi-threaded workloads where it is not possible to pin the threads to a CPU. This is a rework of Sridhar's patch for symmetric queueing via socket option: https://www.spinics.net/lists/netdev/msg453106.html Testing Hints: Kernel: Linux 4.17.0-rc7+ Interface: driver: ixgbe version: 5.1.0-k firmware-version: 0x00015e0b Configuration: ethtool -L $iface combined 16 ethtool -C $iface rx-usecs 1000 sysctl net.core.busy_poll=1000 ATR disabled: ethtool -K $iface ntuple on Workload: Modified memcached that changes the thread selection policy to be based on the incoming rx-queue of a connection using SO_INCOMING_NAPI_ID socket option. The default is round-robin. Default: No rxqs_map configured Symmetric queues: Enable rxqs_map for all queues 1:1 mapped to Tx queue System: Architecture: x86_64 CPU(s): 72 Model name: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz 16 threads 400K requests/sec ============================= ------------------------------------------------------------------------------- Default Symmetric queues ------------------------------------------------------------------------------- RTT min/avg/max 4/51/2215 2/30/5163 (usec) intr/sec 26655 18606 contextswitch/sec 5145 4044 insn per cycle 0.43 0.72 cache-misses 6.919 4.310 (% of all cache refs) L1-dcache-load- 4.49 3.29 -misses (% of all L1-dcache hits) LLC-load-misses 13.26 8.96 (% of all LL-cache hits) ------------------------------------------------------------------------------- 32 threads 400K requests/sec ============================= ------------------------------------------------------------------------------- Default Symmetric queues ------------------------------------------------------------------------------- RTT min/avg/max 10/112/5562 9/46/4637 (usec) intr/sec 30456 27666 contextswitch/sec 7552 5133 insn per cycle 0.41 0.49 cache-misses 9.357 2.769 (% of all cache refs) L1-dcache-load- 4.09 3.98 -misses (% of all L1-dcache hits) LLC-load-misses 12.96 3.96 (% of all LL-cache hits) ------------------------------------------------------------------------------- 16 threads 800K requests/sec ============================= ------------------------------------------------------------------------------- Default Symmetric queues ------------------------------------------------------------------------------- RTT min/avg/max 5/151/4989 9/69/2611 (usec) intr/sec 35686 22907 contextswitch/sec 25522 12281 insn per cycle 0.67 0.74 cache-misses 8.652 6.38 (% of all cache refs) L1-dcache-load- 3.19 2.86 -misses (% of all L1-dcache hits) LLC-load-misses 16.53 11.99 (% of all LL-cache hits) ------------------------------------------------------------------------------- 32 threads 800K requests/sec ============================= ------------------------------------------------------------------------------- Default Symmetric queues ------------------------------------------------------------------------------- RTT min/avg/max 6/163/6152 8/88/4209 (usec) intr/sec 47079 26548 contextswitch/sec 42190 39168 insn per cycle 0.45 0.54 cache-misses 8.798 4.668 (% of all cache refs) L1-dcache-load- 6.55 6.29 -misses (% of all L1-dcache hits) LLC-load-misses 13.91 10.44 (% of all LL-cache hits) ------------------------------------------------------------------------------- v6: - Changed the names of some functions to begin with net_if. - Cleaned up sk_tx_queue_set/sk_rx_queue_set functions. - Added sk_rx_queue_clear to make it consistent with tx_queue_mapping initialization. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amritha Nambiar authored
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amritha Nambiar authored
Extend transmit queue sysfs attribute to configure Rx queue(s) map per Tx queue. By default no receive queues are configured for the Tx queue. - /sys/class/net/eth0/queues/tx-*/xps_rxqs Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amritha Nambiar authored
This patch adds support to pick Tx queue based on the Rx queue(s) map configuration set by the admin through the sysfs attribute for each Tx queue. If the user configuration for receive queue(s) map does not apply, then the Tx queue selection falls back to CPU(s) map based selection and finally to hashing. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amritha Nambiar authored
This patch adds a new field to sock_common 'skc_rx_queue_mapping' which holds the receive queue number for the connection. The Rx queue is marked in tcp_finish_connect() to allow a client app to do SO_INCOMING_NAPI_ID after a connect() call to get the right queue association for a socket. Rx queue is also marked in tcp_conn_request() to allow syn-ack to go on the right tx-queue associated with the queue on which syn is received. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amritha Nambiar authored
Change 'skc_tx_queue_mapping' field in sock_common structure from 'int' to 'unsigned short' type with ~0 indicating unset and other positive queue values being set. This will accommodate adding a new 'unsigned short' field in sock_common in the next patch for rx_queue_mapping. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amritha Nambiar authored
Use static_key for XPS maps to reduce the cost of extra map checks, similar to how it is used for RPS and RFS. This includes static_key 'xps_needed' for XPS and another for 'xps_rxqs_needed' for XPS using Rx queues map. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amritha Nambiar authored
Refactor XPS code to support Tx queue selection based on CPU(s) map or Rx queue(s) map. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 30 Jun, 2018 32 commits
-
-
David S. Miller authored
Petr Machata says: ==================== mlxsw: Add resource scale tests There are a number of tests that check features of the Linux networking stack. By running them on suitable interfaces, one can exercise the mlxsw offloading code. However none of these tests attempts to push mlxsw to the limits supported by the ASIC. As an additional wrinkle, the "limits supported by the ASIC" themselves may not be a set of fixed numbers, but rather depend on a profile that determines how the ASIC resources are allocated for different purposes. This patchset introduces several tests that verify capability of mlxsw to offload amounts of routes, flower rules, and mirroring sessions that match predicted ASIC capacity, at different configuration profiles. Additionally they verify that amounts exceeding the predicted capacity can *not* be offloaded. These are not generic tests, but ones that are tailored for mlxsw specifically. For that reason they are not added to net/forwarding selftests subdirectory, but rather to a newly-added drivers/net/mlxsw. Patches #1, #2 and #3 tweak the generic forwarding/lib.sh to support the new additions. In patches #4 and #5, new libraries for interfacing with devlink are introduced, first a generic one, then a Spectrum-specific one. In patch #6, a devlink resource test is introduced. Patches #7 and #8, #9 and #10, and #11 and #12 introduce three scale tests: router, flower and mirror-to-gretap. The first of each pair of patches introduces a generic portion of the test (mlxsw-specific), the second introduces a Spectrum-specific wrapper. Patch #13 then introduces a scale test driver that runs (possibly a subset of) the tests introduced by patches from previous paragraph. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuval Mintz authored
Add a scale test capable of validating that offloaded network functionality is indeed functional at scale when configured to the different KVD profiles available. Start by testing offloaded routes are functional at scale by passing traffic on each one of them in turn. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Add a wrapper around mlxsw/mirror_gre_scale.sh that parameterized number of offloadable mirrors on Spectrum machines. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Test that it's possible to offload a given number of mirrors. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Add a wrapper around mlxsw/tc_flower_scale.sh that parameterizes the generic tc flower scale test template with Spectrum-specific target values. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Add test of capacity to offload flower. This is a generic portion of the test that is meant to be called from a driver that supplies a particular number of rules to be tested with. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuval Mintz authored
IPv4 routes in Spectrum are based on the kvd single-hash, but as it's a hash we need to assume we cannot reach 100% of its capacity. Add a wrapper that provides us with good/bad target numbers for the Spectrum ASIC. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> [petrm@mellanox.com: Drop shebang.] Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arkadi Sharshevsky authored
This test aims for both stand alone and internal usage by the resource infra. The test receives the number routes to offload and checks: - The routes were offloaded correctly - Traffic for each route. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuval Mintz authored
Add a selftest that can be used to perform basic sanity of the devlink resource API as well as test the behavior of KVD manipulation in the driver. This is the first case of a HW-only test - in order to test the devlink resource a driver capable of exposing resources has to be provided first. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> [petrm@mellanox.com: Extracted two patches out of this patch. Tweaked commit message.] Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
This library builds on top of devlink_lib.sh and contains functionality specific to Spectrum ASICs, e.g., re-partitioning the various KVD sub-parts. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> [petrm@mellanox.com: Split this out from another patch. Fix line length in devlink_sp_read_kvd_defaults().] Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
This helper library contains wrappers to devlink functionality agnostic to the underlying device. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> [petrm@mellanox.com: Split this out from another patch.] Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
setup_wait() and tc_offload_check() both assume that all NUM_NETIFS interfaces are relevant for a given test. However, the scale test script acts as an umbrella for a number of sub-tests, some of which may not require all the interfaces. Thus it's suboptimal for tc_offload_check() to query all the interfaces. In case of setup_wait() it's incorrect, because the sub-test in question of course doesn't configure any interfaces beyond what it needs, and setup_wait() then ends up waiting indefinitely for the extraneous interfaces to come up. For that reason, give setup_wait() and tc_offload_check() an optional parameter with a number of interfaces to probe. Fall back to global NUM_NETIFS if the parameter is not given. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
In the scale testing scenarios, one usually has a condition that is expected to either fail, or pass, depending on which side of the scale is being tested. To capture this logic, add a function check_err_fail(), which dispatches either to check_err() or check_fail(), depending on the value of the first argument, should_fail. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuval Mintz authored
The devlink related scripts are mlxsw-specific. As a result, they'll reside in a different directory - but would still need the common logic implemented in lib.sh. So as a preliminary step, allow lib.sh to be sourced from other directories as well. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jakub Kicinski says: ==================== nfp: flower updates and netconsole This set contains assorted updates to driver base and flower. First patch is a follow up to a fix to calculating counters which went into net. For ethtool counters we should also make sure they are visible even after ring reconfiguration. Next patch is a safety measure in case we are running on a machine with a broken BIOS we should fail the probe when risk of incorrect operation is too high. The next two patches add netpoll support and make use of napi_consume_skb(). Last we populate bus info on all representors. Pieter adds support for offload of the checksum action in flower. John follows up to another fix he's done in net, we set TTL values on tunnels to stack's default, now Johns does a full route lookup to get a more precise information, he populates ToS field as well. Last but not least he follows up on Jiri's request to enable LAG offload in case the team driver is used and then hash function is unknown. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Hurley authored
Currently the NFP fw only supports L3/L4 hashing so rejects the offload of filters that output to LAG ports implementing other hash algorithms. Team, however, uses a BPF function for the hash that is not defined. To support Team offload, accept hashes that are defined as 'unknown' (only Team defines such hash types). In this case, use the NFP default of L3/L4 hashing for egress port selection. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Hurley authored
Extract the tos and the tunnel flags from the tunnel key and offload these action fields. Only the checksum and tunnel key flags are implemented in fw so reject offloads of other flags. The tunnel key flag is always considered set in the fw so enforce that it is set in the rule. Note that the compulsory setting of the tunnel key flag and optional setting of checksum is inline with how tc currently generates ipv4 udp tunnel actions. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Hurley authored
Previously the ttl for ipv4 udp tunnels was set to the namespace default. Modify this to attempt to extract the ttl from a full route lookup on the tunnel destination. If this is not possible then resort to the default. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pieter Jansen van Vuuren authored
Hardware will automatically update csum in headers when a set action has been performed. This means we could in the driver ignore the explicit checksum action when performing a set action. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
We used to leave bus-info in ethtool driver info empty for representors in case multi-PCIe-to-single-host cards make the association between PCIe device and NFP many to one. It seems these attempts are futile, we need to link the representors to one PCIe device in sysfs to get consistent naming, plus devlink uses one PCIe as a handle, anyway. The multi-PCIe-to-single-system support won't be clean, if it ever comes. Turns out some user space (RHEL tests) likes to read bus-info so just populate it. While at it remove unnecessary app NULL-check, representors are spawned by an app, so it must exist. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Use napi_consume_skb() in nfp_net_tx_complete() to get bulk free. Pass 0 as budget for ctrl queue completion since it runs out of a tasklet. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
NFP NAPI handling will only complete the TXed packets when called with budget of 0, implement ndo_poll_controller by scheduling NAPI on all TX queues. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
On some platforms with broken ACPI tables we may not have access to the Serial Number PCIe capability. This capability is crucial for us for switchdev operation as we use serial number as switch ID, and for communication with management FW where interface ID is used. If we can't determine the Serial Number we have to fail device probe. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
After user changes the ring count statistics for deactivated rings disappear from ethtool -S output. This causes loss of information to the user and means that ethtool stats may not add up to interface stats. Always expose counters from all the rings. Note that we allocate at most num_possible_cpus() rings so number of rings should be reasonable. The alternative of only listing stats for rings which were ever in use could be confusing. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vakul Garg authored
Calling skb_unclone() is expensive as it triggers a memcpy operation. Instead of calling skb_unclone() unconditionally, call it only when skb has a shared frag_list. This improves tls rx throughout significantly. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Suggested-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Keara Leibovitz authored
Create unittests for the tc tunnel_key action. v2: For the tests expecting failures, added non-zero exit codes in the teardowns. This prevents those tests from failing if the act_tunnel_key module is unloaded. Signed-off-by: Keara Leibovitz <kleib@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge tag 'mac80211-next-for-davem-2018-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Small merge conflict in net/mac80211/scan.c, I preserved the kcalloc() conversion. -DaveM Johannes Berg says: ==================== This round's updates: * finally some of the promised HE code, but it turns out to be small - but everything kept changing, so one part I did in the driver was >30 patches for what was ultimately <200 lines of code ... similar here for this code. * improved scan privacy support - can now specify scan flags for randomizing the sequence number as well as reducing the probe request element content * rfkill cleanups * a timekeeping cleanup from Arnd * various other cleanups ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
GhantaKrishnamurthy MohanKrishna authored
This commit extends the existing TIPC socket diagnostics framework for information related to TIPC group communication. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
GhantaKrishnamurthy MohanKrishna authored
A peer node is considered down if there are no active links (or) lost contact to the node. In current implementation, a peer node instance is deleted either if a) TIPC module is removed (or) b) Application can use a netlink/iproute2 interface to delete a specific down node. Thus, a down node instance lives in the system forever, unless the application explicitly removes it. We fix this by deleting the nodes which are down for a specified amount of time (5 minutes). Existing node supervision timer is used to achieve this. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
The very first version of RTL8169 from 2002 (and only this one) has support for a TBI 1000BaseX fiber interface. The TBI support in the driver makes switching to phylib tricky, so best would be to get rid of it. I found no report from anybody using a device with RTL8169 and fiber interface, also the vendor driver doesn't support this mode (any longer). So remove TBI support and bail out with a message if a card with activated TBI is detected. If there really should be any user of it out there, we could add a stripped-down version of the driver supporting chip version 01 and TBI only (and maybe move it to staging). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tung Nguyen authored
In single-link usage, the function tipc_node_timeout() still iterates over the whole link array to handle each link. Given that the maximum number of bearers are 3, there are 2 redundant iterations with lock grab/release. Since this function is executing very frequently it makes sense to optimize it. This commit adds conditional checking to exit from the loop if the known number of configured links has already been accessed. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tung Nguyen authored
The function tipc_msg_extract() is using skb_clone() to clone inner messages from a message bundle buffer. Although this method is safe, it has an undesired effect that each buffer clone inherits the true-size of the bundling buffer. As a result, the buffer clone almost always ends up with being copied anyway by the message validation function. This makes the cloning into a sub-optimization. In this commit we take the consequence of this realization, and copy each inner message to a separately allocated buffer up front in the extraction function. As a bonus we can now eliminate the two cases where we had to copy re-routed packets that may potentially go out on the wire again. Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-