- 23 Apr, 2019 12 commits
-
-
Maxim Mikityanskiy authored
mlx5e_trigger_irq posts a NOP to the ICO SQ just to trigger an IRQ and enter the NAPI poll on the right CPU according to the affinity. Use it in mlx5e_activate_rq. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Maxim Mikityanskiy authored
mdev is unused in mlx5e_rx_is_linear_skb. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Maxim Mikityanskiy authored
mlx5e_mpwqe_get_log_rq_size calculates the number of WQEs (N) based on the requested number of frames in the RQ (F) and the number of packets per WQE (P). It ensures that N is not less than the minimum number of WQEs in an RQ (N_min). Arithmetically, it means that F / P >= N_min should be true. This function deals with logarithms, so it should check that log(F) - log(P) >= log(N_min). However, if F < P, this expression will cause an unsigned underflow. Check log(F) >= log(P) + log(N_min) instead. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Maxim Mikityanskiy authored
This commit moves the parameter calculation functions to a separate file for better modularity and code sharing with future features. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Maxim Mikityanskiy authored
If the channels fail to reopen after setting an XDP program, return the error code instead of 0. A proper fix is still needed, as now any error while reopening the channels brings the interface down. This patch only adds error reporting. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Maxim Mikityanskiy authored
params is unused in mlx5e_init_di_list. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Shay Agroskin authored
Upon high packet rate with multiple CPUs TX workloads, much of the HCA's resources are spent on prefetching TX descriptors, thus affecting transmission rates. This patch comes to mitigate this problem by moving some workload to the CPU and reducing the HW data prefetch overhead for small packets (<= 256B). When forwarding packets with XDP, a packet that is smaller than a certain size (set to ~256 bytes) would be sent inline within its WQE TX descrptor (mem-copied), when the hardware tx queue is congested beyond a pre-defined water-mark. This is added to better utilize the HW resources (which now makes one less packet data prefetch) and allow better scalability, on the account of CPU usage (which now 'memcpy's the packet into the WQE). To load balance between HW and CPU and get max packet rate, we use watermarks to detect how much the HW is congested and move the work loads back and forth between HW and CPU. Performance: Tested packet rate for UDP 64Byte multi-stream over two dual port ConnectX-5 100Gbps NICs. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz * Tested with hyper-threading disabled XDP_TX: | | before | after | | | 24 rings | 51Mpps | 116Mpps | +126% | | 1 ring | 12Mpps | 12Mpps | same | XDP_REDIRECT: ** Below is the transmit rate, not the redirection rate which might be larger, and is not affected by this patch. | | before | after | | | 32 rings | 64Mpps | 92Mpps | +43% | | 1 ring | 6.4Mpps | 6.4Mpps | same | As we can see, feature significantly improves scaling, without hurting single ring performance. Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Shay Agroskin authored
This counter tracks how many TX MPWQE sessions are started in XDP SQ in XDP TX/REDIRECT flow. It counts per-channel and global stats. Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Tariq Toukan authored
The XDP redirect flush indication belongs to the receive queue, not to its XDP send queue. For this, use a new bit on rq->flags. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Tariq Toukan authored
Values in enum mlx5e_rq_flag are used as bit indixes. Intention was to use them with no BIT(i) wrapping. No functional bug fix here, as the same (shifted)flag bit is used for all set, test, and clear operations. Fixes: 121e8927 ("net/mlx5e: Refactor RQ XDP_TX indication") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Tariq Toukan authored
The buffers mapping of the Multi-Packet WQEs (of Striding RQ) is done via UMR posts, one UMR WQE per an RX MPWQE. A single MPWQE is capable of serving many incoming packets, usually larger than the budget of a single napi cycle. Hence, posting a single UMR WQE per napi cycle (and handling its completion in the next cycle) works fine in many common cases, but not always. When an XDP program is loaded, every MPWQE is capable of serving less packets, to satisfy the packet-per-page requirement. Thus, for the same number of packets more MPWQEs (and UMR posts) are needed (twice as much for the default MTU), giving less latency room for the UMR completions. In this patch, we add support for multiple outstanding UMR posts, to allow faster gap closure between consuming MPWQEs and reposting them back into the WQ. For better SW and HW locality, we combine the UMR posts in bulks of (at least) two. This is expected to improve packet rate in high CPU scale. Performance test: As expected, huge improvement in large-scale (48 cores). xdp_redirect_map, 64B UDP multi-stream. Redirect from ConnectX-5 100Gbps to ConnectX-6 100Gbps. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz. Before: Unstable, 7 to 30 Mpps After: Stable, at 70.5 Mpps No degradation in other tested scenarios. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
-
- 22 Apr, 2019 2 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linuxSaeed Mahameed authored
Linux 5.1-rc1 We forgot to reset the branch last merge window thus mlx5-next is outdated and still based on 5.0-rc2. This merge commit is needed to sync mlx5-next branch with 5.1-rc1. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
David Ahern authored
The RTF_ADDRCONF flag filters out routes added by RA's in determining which routes can be appended to an existing one to create a multipath route. Restore the flag check and add a comment to document the RA piece. Fixes: 4e54507a ("ipv6: Simplify rt6_qualify_for_ecmp") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 21 Apr, 2019 7 commits
-
-
David Ahern authored
After commit c7a1ce39 ("ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create"), the gateway is no longer filled in for fib6_nh structs in a prefix route. Accordingly, the RTF_ADDRCONF flag check can be dropped from the 'rt6_qualify_for_ecmp'. Further, RTF_DYNAMIC is only set in rt6_info instances, so it can be removed from the check as well. This reduces rt6_qualify_for_ecmp and the mlxsw version to just checking if the nexthop has a gateway which is the real indication of whether entries can be coalesced into a multipath route. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Fuqian Huang authored
The pointer should be printed with %p or %px rather than cast to unsigned long type and printed with %08lx. Change %08lx to %p to print the pointer. Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Fuqian Huang authored
Pointers should be printed with %p or %px rather than cast to long type and printed with %8.8lx. Change %8.8lx to %p to print the pointer. Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Fuqian Huang authored
Pointers should be printed with %p or %px rather than cast to long type and printed with %x. Change %x to %p to print the pointers. Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Ido Schimmel says: ==================== mlxsw: Small routing improvements Patch #1 switches the driver to use a unique and stable ECMP/LAG seed. This allows for consistent behavior across reboots and avoids hash polarization at the same time. Patch #2 relaxes the FIB rule validation in the driver to allow the installation of rules that direct locally generated traffic (iif=lo). This does not result in a discrepancy between both data paths because packets received by the device would never match such rules. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Currently, mlxsw does not support policy-based routing (PBR) and therefore forbids the installation of non-default FIB rules except for the l3mdev rule which is used for VRFs. Relax the check to allow the installation of FIB rules that would never match packets received by the device. Specifically, if the iif is that of the loopback netdev. This is useful for users that need to redirect locally generated packets based on FIB rules. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Tested-by: Alexander Petrovskiy <alexpe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
In order to get a consistent behavior of traffic flows across reboots / module unload, we need to use the same ECMP/LAG seed. Calculate the seed by hashing the base MAC of the device. This results in a seed that is both unique (to avoid polarization) and consistent. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 20 Apr, 2019 14 commits
-
-
Pablo Cascón authored
By default VFs are not trusted. Add ndo_set_vf_trust support to toggle a new per-VF bit. Coupled with FW with this capability allows a trusted VF to change its MAC even after being administratively set by the PF. Also populate the trusted field on ndo_get_vf_config. Add the same ndo to the representors. Signed-off-by: Pablo Cascón <pablo.cascon@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Huazhong Tan says: ==================== net: hns3: add some DFX info for HNS3 driver This patch-set adds some DFX information to HNS3 driver, for easily debug some problems, and fixes some related bugs. [patch 1/12 - 4/12] adds debug info about reset & interrupt events [patch 5/12 - 7/12] adds debug info about TX time out & fixes related bugs [patch 8/12] adds support for setting netif message level [patch 9/12 - 10/12] adds debugfs command to dump NCL & MAC info [patch 11/12] adds VF's queue statistics info updating [patch 12/12] adds a check for debugfs help function to decide which commands are supportable ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yufeng Mo authored
PF supports all debugfs command, but VF only supports part of debugfs command. So VF should not show unsupported help information. This patch adds a check for PF and PF to show the supportable help information. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
liuzhongzhu authored
This patch updates VF's TQP statistic info in the service task, and adds a limitation to prevent update too frequently. Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Weihang Li authored
MAC tnl interruptions are different from other type of RAS and MSI-X errors, because some bits, such as OVF/LR/RF will occur during link up and down. The drivers should clear status of all MAC tnl interruption bits but shouldn't print any message that would mislead the users. In case that link down and re-up in a short time because of some reasons, we record when they occurred, and users can query them by debugfs. Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Weihang Li authored
This patch allow users to dump content of NCL_CONFIG by using debugfs command. Command format: echo dump ncl_config <offset> <length> > cmd It will print as follows: hns3 0000:7d:00.0: offset | data hns3 0000:7d:00.0: 0x0000 | 0x00000020 hns3 0000:7d:00.0: 0x0004 | 0x00000400 hns3 0000:7d:00.0: 0x0008 | 0x08020401 Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yonglong Liu authored
This patch adds support for network interface message level settings. The message level can be changed by module parameter or ethtool. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jian Shen authored
Currently we just print few information when tx timeout happens. In order to find out the cause of timeout, this patch prints more information about the packet statistics, tqp registers and napi state. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jian Shen authored
In function hns3_get_tx_timeo_queue_info(), it should use netdev->num_tx_queues, instead of netdve->real_num_tx_queues as the loop limitation. Fixes: 424eb834 ("net: hns3: Unified HNS3 {VF|PF} Ethernet Driver for hip08 SoC") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jian Shen authored
In current codes, tx_timeout_cnt is used before increased, then we can see the tx_timeout_count is still 0 from the print when tx timeout happens, e.g. "hns3 0000:7d:00.3 eth3: tx_timeout count: 0, queue id: 0, SW_NTU: 0xa6, SW_NTC: 0xa4, HW_HEAD: 0xa4, HW_TAIL: 0xa6, INT: 0x1" The tx_timeout_cnt should be updated before used. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huazhong Tan authored
When wait for response timeout or response code not match, there should be more information for debugging. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huazhong Tan authored
When received vector0 msix and other event interrupt, it should print the value of the register for debugging. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huazhong Tan authored
This patch adds some statistics for VF reset. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huazhong Tan authored
This patch adds statistics for PF's reset information, also, provides a debugfs command to dump these statistics. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 19 Apr, 2019 5 commits
-
-
Eric Dumazet authored
tcp sendmsg() and sendpage() normally advance skb->data_len and skb->truesize by the payload added to an skb. But sendmsg(fd, ..., MSG_ZEROCOPY) has to account for whole pages, even if a single byte of payload is used in the page. This means that we can not assume skb->truesize can be adjusted by skb->data_len. We must instead overwrite its value. Otherwise skb->truesize is too big and can hit socket sndbuf limit, especially if the skb is recycled multiple times :/ Fixes: 472c2e07 ("tcp: add one skb cache for tx") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Since commit f6b19b35 ("net: devlink: select NET_DEVLINK from drivers") adds implicit select of NET_DEVLINK for mlxsw, the code does not have to deal with the case when CONFIG_NET_DEVLINK is not enabled. So remove the ifcase and adjust Makefile. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tung Nguyen authored
When using TIPC_SOCK_RECVQ_DEPTH for getsockopt(), it returns the number of buffers in receive socket buffer which is not so helpful for user space applications. This commit introduces the new option TIPC_SOCK_RECVQ_USED which returns the current allocated bytes of the receive socket buffer. This helps user space applications dimension its buffer usage to avoid buffer overload issue. Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
phylink will call the mac_config() callback once per second when polling a PHY or a fixed link. The MAC driver is not supposed to reconfigure the MAC if nothing has changed. Make the mv88e6xxx driver look at the current configuration of the port, and return early if nothing has changed. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
The 'timeval' and 'timespec' data structures used for socket timestamps are going to be redefined in user space based on 64-bit time_t in future versions of the C library to deal with the y2038 overflow problem, which breaks the ABI definition. Unlike many modern ioctl commands, SIOCGSTAMP and SIOCGSTAMPNS do not use the _IOR() macro to encode the size of the transferred data, so it remains ambiguous whether the application uses the old or new layout. The best workaround I could find is rather ugly: we redefine the command code based on the size of the respective data structure with a ternary operator. This lets it get evaluated as late as possible, hopefully after that structure is visible to the caller. We cannot use an #ifdef here, because inux/sockios.h might have been included before any libc header that could determine the size of time_t. The ioctl implementation now interprets the new command codes as always referring to the 64-bit structure on all architectures, while the old architecture specific command code still refers to the old architecture specific layout. The new command number is only used when they are actually different. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-