- 16 Jun, 2017 8 commits
-
-
Tariq Toukan authored
Increase the default TX ring size (from 512 to 1024) to match the RX ring size. This gives the XDP TX ring a better chance to keep up with the rate of its RX ring in case of a high load of XDP_TX actions. Tested: Ethtool counter rx_xdp_tx_full used to increase, after applying this patch it stopped. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tariq Toukan authored
Instead of having their own NAPIs, XDP TX completion queues get polled within the corresponding RX NAPI. This prevents any possible race on TX ring prod/cons indices, between the context that issues the transmits (RX NAPI) and the context that handles the completions (was previously done in a separate NAPI). This also improves performance, as it decreases the number of NAPIs running on a CPU, saving the overhead of syncing and switching between the contexts. Performance tests: Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Single queue no-RSS optimization ON. XDP_TX packet rate: ------------------------------------- | Before | After | Gain | IPv4 | 12.0 Mpps | 13.8 Mpps | 15% | IPv6 | 12.0 Mpps | 13.8 Mpps | 15% | ------------------------------------- Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tariq Toukan authored
Several performance improvements in XDP TX datapath, including: - Ring a single doorbell for XDP TX ring per NAPI budget, instead of doing it per a lower threshold (was 8). This includes removing the flow of immediate doorbell ringing in case of a full TX ring. - Compiler branch predictor hints. - Calculate values in compile time rather than in runtime. Performance tests: Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Single queue no-RSS optimization ON. XDP_TX packet rate: ------------------------------------- | Before | After | Gain | IPv4 | 10.3 Mpps | 12.0 Mpps | 17% | IPv6 | 10.3 Mpps | 12.0 Mpps | 17% | ------------------------------------- Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tariq Toukan authored
Several small code and performance improvements in stack TX datapath, including: - Compiler branch predictor hints. - Minimize variables scope. - Move tx_info non-inline flow handling to a separate function. - Calculate data_offset in compile time rather than in runtime (for !lso_header_size branch). - Avoid trinary-operator ("?") when value can be preset in a matching branch. Performance tests: Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Gain is too small to be measurable, no degradation sensed. Results are similar for IPv4 and IPv6. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tariq Toukan authored
Several small performance improvements in TX CQ polling, including: - Compiler branch predictor hints. - Minimize variables scope. - More proper check of cq type. - Use boolean instead of int for a binary indication. Performance tests: Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Packet-rate tests for both regular stack and XDP use cases: No noticeable gain, no degradation. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tariq Toukan authored
Several small performance improvements in RX datapath, including: - Compiler branch predictor hints. - Replace a multiplication with a shift operation. - Minimize variables scope. - Write-prefetch for packet header. - Avoid trinary-operator ("?") when value can be preset in a matching branch. - Save a branch by updating RX ring doorbell within mlx4_en_refill_rx_buffers(), which now returns void. Performance tests: Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Single queue no-RSS optimization ON (enable by ethtool -L <interface> rx 1). XDP_DROP packet rate: Same (28.1 Mpps), lower CPU utilization (from ~100% to ~92%). Drop packets in TC: ------------------------------------- | Before | After | Gain | IPv4 | 4.14 Mpps | 4.18 Mpps | 1% | ------------------------------------- XDP_TX packet rate: ------------------------------------- | Before | After | Gain | IPv4 | 10.1 Mpps | 10.3 Mpps | 2% | IPv6 | 10.1 Mpps | 10.3 Mpps | 2% | ------------------------------------- Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Saeed Mahameed authored
Avoid touching RX QP RSS context when loading with only one RX ring, to allow optimized A0 RX steering. Enable by: - loading mlx4_core with module param: log_num_mgm_entry_size = -6. - then: ethtool -L <interface> rx 1 Performance tests: Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz XDP_DROP packet rate: ------------------------------------- | Before | After | Gain | IPv4 | 20.5 Mpps | 28.1 Mpps | 37% | IPv6 | 18.4 Mpps | 28.1 Mpps | 53% | ------------------------------------- Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tariq Toukan authored
Remove owner argument, as it is obsolete and unused. This also saves the overhead of calculating its value in data-path. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 15 Jun, 2017 32 commits
-
-
Gustavo A. R. Silva authored
Value assigned to variable _data32_ at lines 1254 and 1257 is overwritten at line 1260 before it can be used. This makes such variable assignments useless. Addresses-Coverity-ID: 1227049 Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
The current code only assigns the default cpu_dp to all user ports of the switch to which the CPU port belongs. The user ports of the other switches of the fabric thus don't have a default CPU port. This patch fixes this by assigning the cpu_dp of all user ports of all switches of the fabric when the tree is fully parsed. Fixes: a29342e7 ("net: dsa: Associate slave network device with CPU port") Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Hayes Wang says: ==================== r8152: support new chips These patches are used to support new chips. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
hayeswang authored
Add byte_enable for ocp_read_word() to replace reading 4 bytes data with reading the desired 2 bytes data. This is used to avoid the issue which is described in commit b4d99def ("r8152: remove sram_read"). The original method always reads 4 bytes data, and it may have problem when reading the PHY registers. The new method is supported since RTL8153B, but it doesn't influence the previous chips. The bits of the byte_enable for the previous chips are the reserved bits, and the hw would ignore them. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
hayeswang authored
This patch supports two new chips for RTL8153B. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
hayeswang authored
The settings of the new chip are the same with RTL8152, except that its product ID is 0x8050. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Thomas Falcon says: ==================== ibmvnic: LPM bug fixes This series of small patches is meant to resolve a number of bugs, mostly occurring during an ibmvnic driver reset when recovering from a logical partition migration (LPM). The first patch ensures that RX buffer pools are properly activated following an adapter reset by setting the proper flag in the pool data structure. The second patch uses netif_tx_disable to stop TX queues when closing the device during a reset. Third, fixup a typo that resulted in partial sanitization of TX/RX descriptor queues following a device reset. Fourth, remove an ambiguous conditional check that was resulting in a kernel panic as null RX/TX completion descriptors were being processed during napi polling while the device is closing. Finally, fix a condition where the napi polling routine exits before it has completed its work budget without notifying the upper network layers. This omission could result in the napi_disable function sleeping indefinitely under certain conditions. v2: Attempt to provide a proper cover letter ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Falcon authored
This patch fixes a bug where, in the case of a device reset, the polling routine will never complete, causing napi_disable to sleep indefinitely when attempting to close the device. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Falcon authored
Fix a kernel panic resulting from data access of a NULL pointer during device close. The pending_scrq routine is meant to determine whether there is a valid sub-CRQ message awaiting processing. When the device is closing, however, there is a possibility that NULL messages can be processed because pending_scrq will always return 1 even if there no valid message in the queue. It's not clear what this closing state check was originally meant to accomplish, so just remove it. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Falcon authored
Fixup a typo so that the entire SCRQ buffer is cleaned. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Falcon authored
Use netif_tx_disable to guarantee that TX queues are disabled when __ibmvnic_close is called by the device reset routine. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Falcon authored
RX buffer pools are disabled while awaiting a device reset if firmware indicates that the resource is closed. This patch fixes a bug where pools were not being subsequently enabled after the device reset, causing the device to become inoperable. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shannon Nelson authored
As much as we'd like to play well with others, we really aren't handling the checksums on non-IP protocol packets very well. This is easily seen when trying to do TCP over ipv6 - the checksums are garbage. Here we restrict the checksum feature flag to just IP traffic so that we aren't given work we can't yet do. Orabug: 26175391, 26259755 Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jiri Benc says: ==================== net: sched: act_tunnel_key: UDP checksums Currently, the tunnel_key tc action does not set TUNNEL_CSUM, thus transmitting packets with zero UDP checksum. This is inconsistent with how we treat non-lwt UDP tunnels where the default is to fill in the UDP checksum. Non-zero UDP checksum is the better default anyway for various reasons previously discussed. Make this configurable for the tunnel_key tc action with the default being non-zero checksum. Saves a lot of surprises especially with IPv6. Signed-off-by: Jiri Benc <jbenc@redhat.com> ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
Allow requesting of zero UDP checksum for encapsulated packets. The name and meaning of the attribute is "NO_CSUM" in order to have the same meaning of the attribute missing and being 0. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
There's currently no way to request (outer) UDP checksum with act_tunnel_key. This is problem especially for IPv6. Right now, tunnel_key action with IPv6 does not work without going through hassles: both sides have to have udp6zerocsumrx configured on the tunnel interface. This is obviously not a good solution universally. It makes more sense to compute the UDP checksum by default even for IPv4. Just set the default to request the checksum when using act_tunnel_key. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Gustavo A. R. Silva authored
Remove useless variable rxd_index and code related. Addresses-Coverity-ID: 1397691 Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Vivien Didelot says: ==================== net: dsa: prefix Global macros This patch series is the 2/3 step of the register definitions cleanup. It brings no functional changes. It prefixes and documents all Global (1) registers with MV88E6XXX_G1_ (or a specific model like MV88E6352_G1_STS_PPU_STATE), and prefers a 16-bit hexadecimal representation of the Marvell registers layout. The next and last patchset will prefix the Global 2 registers. ==================== Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
Prefix and document the remaining Global IP and IEEE Priority and Core Tag Type registers and give them a clear 16-bit register representation. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
Prefix and document the Global Stats Operation and Counter registers and give them a clear 16-bit registers representation. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
Prefix and document the Global Monitor Control Register macros (which became the Global Monitor & MGMT Control Register with 88E6390) and give a clear 16-bit registers representation. Use __bf_shf to get the shift value at compile time instead of adding new defined macros for it. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
Prefix and document the Global Control and Control 2 registers macros and give a clear 16-bit registers representation. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
Prefix and document the Global VTU registers macros and give a clear 16-bit registers representation. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
Prefix and document the Global ATU Registers macros and give clear 16-bit registers representation. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
Prefix and document the Global Switch MAC Address Register macros and give clear 16-bit register representation. At the same time, move mv88e6xxx_g1_set_switch_mac in global1.c, where it belongs. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
Prefix and document the Global Status Register macros and give clear 16-bit register representation. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
It's nicer to return void, since then there's no need to cast to any structures. Currently none of the users have a cast, but a number of future conversions do. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Dave Watson says: ==================== net: kernel TLS This series adds support for kernel TLS encryption over TCP sockets. A standard TCP socket is converted to a TLS socket using a setsockopt. Only symmetric crypto is done in the kernel, as well as TLS record framing. The handshake remains in userspace, and the negotiated cipher keys/iv are provided to the TCP socket. We implemented support for this API in OpenSSL 1.1.0, the code is available at https://github.com/Mellanox/tls-openssl/tree/master It should work with any TLS library with similar modifications, a test tool using gnutls is here: https://github.com/Mellanox/tls-af_ktls_tool RFC patch to openssl: https://mta.openssl.org/pipermail/openssl-dev/2017-June/009384.html Changes from V2: * EXPORT_SYMBOL_GPL in patch 1 * Ensure cleanup code always called before sk_stream_kill_queues to avoid warnings Changes from V1: * EXPORT_SYMBOL GPL in patch 2 * Add link to OpenSSL patch & gnutls example in documentation patch. * sk_write_pending check was rolled in to wait_for_memory path, avoids special case and fixes lock inbalance issue. * Unify flag handling for sendmsg/sendfile Changes from RFC V2: * Generic ULP (upper layer protocol) framework instead of TLS specific setsockopts * Dropped Mellanox hardware patches, will come as separate series. Framework will work for both. RFC V2: http://www.mail-archive.com/netdev@vger.kernel.org/msg160317.html Changes from RFC V1: * Socket based on changing TCP proto_ops instead of crypto framework * Merged code with Mellanox's hardware tls offload * Zerocopy sendmsg support added - sendpage/sendfile is no longer necessary for zerocopy optimization RFC V1: http://www.mail-archive.com/netdev@vger.kernel.org/msg88021.html * Socket based on crypto userspace API framework, required two sockets in userspace, one encrypted, one unencrypted. Paper: https://netdevconf.org/1.2/papers/ktls.pdf ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dave Watson authored
Add documentation for the tcp ULP tls interface. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dave Watson authored
Software implementation of transport layer security, implemented using ULP infrastructure. tcp proto_ops are replaced with tls equivalents of sendmsg and sendpage. Only symmetric crypto is done in the kernel, keys are passed by setsockopt after the handshake is complete. All control messages are supported via CMSG data - the actual symmetric encryption is the same, just the message type needs to be passed separately. For user API, please see Documentation patch. Pieces that can be shared between hw and sw implementation are in tls_main.c Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dave Watson authored
Export do_tcp_sendpages and tcp_rate_check_app_limited, since tls will need to sendpages while the socket is already locked. tcp_sendpage is exported, but requires the socket lock to not be held already. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dave Watson authored
Add the infrustructure for attaching Upper Layer Protocols (ULPs) over TCP sockets. Based on a similar infrastructure in tcp_cong. The idea is that any ULP can add its own logic by changing the TCP proto_ops structure to its own methods. Example usage: setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")); modules will call: tcp_register_ulp(&tcp_tls_ulp_ops); to register/unregister their ulp, with an init function and name. A list of registered ulps will be returned by tcp_get_available_ulp, which is hooked up to /proc. Example: $ cat /proc/sys/net/ipv4/tcp_available_ulp tls There is currently no functionality to remove or chain ULPs, but it should be possible to add these in the future if needed. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-