- 19 Jun, 2024 20 commits
-
-
David S. Miller authored
Yan Zhai says: ==================== net: pass receive socket to drop tracepoint We set up our production packet drop monitoring around the kfree_skb tracepoint. While this tracepoint is extremely valuable for diagnosing critical problems, it also has some limitations with drops on the local receive path: this tracepoint can only inspect the dropped skb itself, but such skb might not carry enough information to: 1. determine in which netns/container this skb gets dropped 2. determine by which socket/service this skb ought to be received The 1st issue is because skb->dev is the only member field with valid netns reference. But skb->dev can get cleared or reused. For example, tcp_v4_rcv will clear skb->dev and in later processing it might be reused for OFO tree. The 2nd issue is because there is no reference on an skb that reliably points to a receiving socket. skb->sk usually points to the local sending socket, and it only points to a receive socket briefly after early demux stage, yet the socket can get stolen later. For certain drop reasons like TCP OFO_MERGE, Zerowindow, UDP at PROTO_MEM error, etc., it is hard to infer which receiving socket is impacted. This cannot be overcome by simply looking at the packet header, because of complications like sk lookup programs. In the past, single purpose tracepoints like trace_udp_fail_queue_rcv_skb, trace_sock_rcvqueue_full, etc. were added as needed to provide more visibility. This could be handled in a more generic way. In this change set we propose a new 'sk_skb_reason_drop' call as a drop-in replacement for kfree_skb_reason on various local input paths. It accepts an extra receiving socket argument. Both issues above can be resolved via this new argument. V4->V5: rename rx_skaddr to rx_sk to be more clear visually, suggested by Jesper Dangaard Brouer. V3->V4: adjusted the TP_STRUCT field order to align better, suggested by Steven Rostedt. 
V2->V3: fixed drop_monitor function signatures; fixed a few uninitialized sks; Added a few missing report tags from test bots (also noticed by Dan Carpenter and Simon Horman). V1->V2: instead of using skb->cb, directly add the needed argument to trace_kfree_skb tracepoint. Also renamed functions as Eric Dumazet suggested. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yan Zhai authored
Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving socket to the tracepoint. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202406011859.Aacus8GV-lkp@intel.com/ Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yan Zhai authored
Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving socket to the tracepoint. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202406011751.NpVN0sSk-lkp@intel.com/ Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yan Zhai authored
Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving socket to the tracepoint. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202406011539.jhwBd7DX-lkp@intel.com/ Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yan Zhai authored
Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving socket to the tracepoint. Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yan Zhai authored
Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving socket to the tracepoint. Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yan Zhai authored
The long-used destructors kfree_skb and kfree_skb_reason do not pass the receiving socket to the packet drop tracepoint trace_kfree_skb. This makes it hard to track packet drops of a certain netns (container) or a socket (user application). The naming of these destructors is also not consistent with most sk/skb operating functions, i.e. functions named "sk_xxx" or "skb_xxx". Introduce a new function sk_skb_reason_drop as a drop-in replacement for kfree_skb_reason on the local receiving path. Callers can now pass receiving sockets to the tracepoints. kfree_skb and kfree_skb_reason are still usable but they are now just inline helpers that call sk_skb_reason_drop. Note it is not feasible to do the same to consume_skb. Packets not dropped can flow through multiple receive handlers, and have multiple receiving sockets. Leave it untouched for now. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
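The relationship described in this commit can be sketched in userspace as follows. This is only an illustration, not the kernel code: `struct sock`, `struct sk_buff`, the `DROP_REASON_*` values and the `last_*`/`drop_count` recording variables are simplified stand-ins invented here. Only the function names `sk_skb_reason_drop` and `kfree_skb_reason` come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins; the real kernel structures are far larger. */
struct sock { int id; };
struct sk_buff { int len; };

/* Illustrative subset of drop reasons (hypothetical names). */
enum skb_drop_reason {
    DROP_REASON_NOT_SPECIFIED,
    DROP_REASON_SOCKET_RCVBUFF,
};

/* Stand-in for the tracepoint: records what the last drop reported. */
static const struct sock *last_rx_sk;
static enum skb_drop_reason last_reason;
static int drop_count;

/* New entry point: the caller names the receiving socket (may be NULL). */
static void sk_skb_reason_drop(struct sock *sk, struct sk_buff *skb,
                               enum skb_drop_reason reason)
{
    if (!skb)
        return;
    last_rx_sk = sk;        /* the extra argument reaches the "tracepoint" */
    last_reason = reason;
    drop_count++;
    free(skb);
}

/* Old entry point kept as a thin inline wrapper with no socket. */
static inline void kfree_skb_reason(struct sk_buff *skb,
                                    enum skb_drop_reason reason)
{
    sk_skb_reason_drop(NULL, skb, reason);
}
```

Existing callers of the old helper keep working unchanged, while receive-path callers that know the socket can switch to the new function one at a time.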
-
Yan Zhai authored
skb does not include enough information to find out receiving sockets/services and netns/containers on packet drops. In theory skb->dev tells about netns, but it can get cleared/reused, e.g. by TCP stack for OOO packet lookup. Similarly, skb->sk often identifies a local sender, and tells nothing about a receiver. Allow passing an extra receiving socket to the tracepoint to improve the visibility on receiving drops. Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Diogo Ivo says: ==================== Enable PTP timestamping/PPS for AM65x SR1.0 devices This patch series enables support for PTP in AM65x SR1.0 devices. This feature relies heavily on the Industrial Ethernet Peripheral (IEP) hardware module, which implements a hardware counter through which time is kept. This hardware block is the basis for exposing a PTP hardware clock to userspace and for issuing timestamps for incoming/outgoing packets, allowing for time synchronization. The IEP also has compare registers that fire an interrupt when the counter reaches the value stored in a compare register. This feature allows us to support PPS events in the kernel. The changes are separated into five patches: - PATCH 01/05: Register SR1.0 devices with the IEP infrastructure to expose a PHC clock to userspace, allowing time to be adjusted using standard PTP tools. The code for issuing/collecting packet timestamps is already present in the current state of the driver, so only this needs to be done. - PATCH 02/05: Remove unnecessary spinlock synchronization. - PATCH 03/05: Document IEP interrupt in DT binding. - PATCH 04/05: Add support for IEP compare event/interrupt handling to enable PPS events. - PATCH 05/05: Add the interrupts to the IOT2050 device tree. Currently every compare event generates two interrupts, the first corresponding to the actual event and the second being a spurious but otherwise harmless interrupt. The root cause of this has been identified and has been solved in the platform's SDK. A forward port of the SDK's patches also fixes the problem in upstream but is not included here since its upstreaming is out of the scope of this series. If someone from TI would be willing to chime in and help get the interrupt changes upstream that would be great! 
Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> --- Changes in v4: - Remove unused 'flags' variables in patch 02/05 - Add patch 03/05 describing IEP interrupt in DT binding - Link to v3: https://lore.kernel.org/r/20240607-iep-v3-0-4824224105bc@siemens.com Changes in v3: - Collect Reviewed-by tags - Add patch 02/04 removing spinlocks from IEP driver - Use mutex-based synchronization when accessing HW registers - Link to v2: https://lore.kernel.org/r/20240604-iep-v2-0-ea8e1c0a5686@siemens.com Changes in v2: - Collect Reviewed-by tags - PATCH 01/03: Limit line length to 80 characters - PATCH 02/03: Proceed with limited functionality if getting IRQ fails, limit line length to 80 characters - Link to v1: https://lore.kernel.org/r/20240529-iep-v1-0-7273c07592d3@siemens.com ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Diogo Ivo authored
Add the interrupts needed for PTP Hardware Clock support via IEP in SR1.0 devices. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Diogo Ivo authored
The IEP module supports compare events, in which a value is written to a hardware register and when the IEP counter reaches the written value an interrupt is generated. Add handling for this interrupt in order to support PPS events. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Diogo Ivo authored
The IEP interrupt is used in order to support both capture events, where an incoming external signal gets timestamped on arrival, and compare events, where an interrupt is generated internally when the IEP counter reaches a programmed value. Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Diogo Ivo authored
As all sources of concurrency in hardware register access occur in non-interrupt context, eliminate spinlock-based synchronization and rely on the mutex-based synchronization that is already present. Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Diogo Ivo authored
Enable PTP support for AM65x SR1.0 devices by registering with the IEP infrastructure in order to expose a PTP clock to userspace. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hongfu Li authored
Use the new KMEM_CACHE() macro instead of direct kmem_cache_create to simplify the creation of SLAB caches. Signed-off-by: Hongfu Li <lihongfu@kylinos.cn> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Haiyang Zhang authored
As defined by the MANA Hardware spec, the queue size for DMA must be at least 4KB and a power of 2. In addition, the HWC queue size has to be exactly 4KB. To support page sizes other than 4KB on ARM64, define the minimum queue size as a macro separate from PAGE_SIZE, which we always assumed to be 4KB before supporting ARM64. Also, add MANA specific macros and update code related to size alignment, DMA region calculations, etc. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1718655446-6576-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
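The size rules stated in this commit message can be sketched as below. The macro and function names (`HW_PAGE_SHIFT`, `HW_PAGE_SIZE`, `HW_MIN_QSIZE`, `dma_queue_size_valid`, and so on) are hypothetical and chosen for illustration; they are not claimed to match the names the driver actually uses. The point is only that the 4KB unit is a device constant, decoupled from the CPU's PAGE_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device-defined 4 KiB unit, independent of the CPU PAGE_SIZE. */
#define HW_PAGE_SHIFT 12
#define HW_PAGE_SIZE  ((size_t)1 << HW_PAGE_SHIFT)
#define HW_MIN_QSIZE  HW_PAGE_SIZE

static bool is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* DMA queues: at least 4 KiB and a power of two. */
static bool dma_queue_size_valid(size_t size)
{
    return size >= HW_MIN_QSIZE && is_power_of_two(size);
}

/* HWC queue: exactly 4 KiB. */
static bool hwc_queue_size_valid(size_t size)
{
    return size == HW_PAGE_SIZE;
}

/* Round a length up to the device page size. */
static size_t hw_page_align(size_t len)
{
    return (len + HW_PAGE_SIZE - 1) & ~(HW_PAGE_SIZE - 1);
}

/* Number of device pages needed to cover len bytes. */
static size_t hw_pages_for(size_t len)
{
    return hw_page_align(len) >> HW_PAGE_SHIFT;
}
```

On a kernel built with 64KB pages, a single CPU page covers `hw_pages_for(65536)` = 16 device pages, which is why DMA region calculations cannot simply reuse PAGE_SIZE.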
-
Jakub Kicinski authored
Kamal Heib says: ==================== net/mlx4_en: Use ethtool_puts/sprintf This patchset updates the mlx4_en driver to use the ethtool_puts and ethtool_sprintf helper functions. Signed-off-by: Kamal Heib <kheib@redhat.com> ==================== Link: https://lore.kernel.org/r/20240617172329.239819-1-kheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Kamal Heib authored
Use the ethtool_puts/ethtool_sprintf helpers to print the stats strings into the ethtool strings interface. Signed-off-by: Kamal Heib <kheib@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240617172329.239819-4-kheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
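For readers unfamiliar with the pattern, here is a userspace imitation of how such helpers fill a table of fixed-width name slots: each call writes one name and advances the cursor by one slot. `demo_puts`, `demo_sprintf`, `GSTRING_LEN` and `fill_stat_names` are stand-ins written for this sketch, not the kernel helpers themselves, and the stat names are made up.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Fixed slot width for each name (assumed value for this sketch). */
#define GSTRING_LEN 32

/* Format one name into the current slot, then move to the next slot. */
static void demo_sprintf(char **data, const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    vsnprintf(*data, GSTRING_LEN, fmt, args);
    va_end(args);
    *data += GSTRING_LEN;
}

/* Copy one constant name into the current slot, then move to the next. */
static void demo_puts(char **data, const char *str)
{
    snprintf(*data, GSTRING_LEN, "%s", str);
    *data += GSTRING_LEN;
}

/* Fill one fixed name plus one name per queue; returns slots written. */
static int fill_stat_names(char *table, int num_queues)
{
    char *p = table;
    int i;

    demo_puts(&p, "rx_packets");
    for (i = 0; i < num_queues; i++)
        demo_sprintf(&p, "tx%d_packets", i);

    return (int)((p - table) / GSTRING_LEN);
}
```

Because the helper advances the pointer itself, the caller no longer tracks an index or multiplies by the slot width by hand, which is the kind of bookkeeping these conversions remove.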
-
Kamal Heib authored
Use the ethtool_puts helper to print the selftest strings into the ethtool strings interface. Signed-off-by: Kamal Heib <kheib@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240617172329.239819-3-kheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Kamal Heib authored
Use the ethtool_puts helper to print the priv flags strings into the ethtool strings interface. Signed-off-by: Kamal Heib <kheib@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240617172329.239819-2-kheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 18 Jun, 2024 9 commits
-
-
Christophe JAILLET authored
"struct vcap_operations" are not modified in these drivers. Constifying this structure moves some data to a read-only section, so increase overall security. In order to do it, "struct vcap_control" also needs to be adjusted to this new const qualifier. As an example, on a x86_64, with allmodconfig: Before: ====== text data bss dec hex filename 15176 1094 16 16286 3f9e drivers/net/ethernet/microchip/lan966x/lan966x_vcap_impl.o After: ===== text data bss dec hex filename 15268 998 16 16282 3f9a drivers/net/ethernet/microchip/lan966x/lan966x_vcap_impl.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Link: https://lore.kernel.org/r/d8e76094d2e98ebb5bfc8205799b3a9db0b46220.1718524644.git.christophe.jaillet@wanadoo.frSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Paolo Abeni authored
Luo Jie says: ==================== Introduce PHY mode 10G-QXGMII This patch series adds 10G-QXGMII mode for PHY driver. The patch series is split from the QCA8084 PHY driver patch series below. https://lore.kernel.org/all/20231215074005.26976-1-quic_luoj@quicinc.com/ Per Andrew Lunn’s advice, submitting this patch series for acceptance as they already include the necessary 'Reviewed-by:' tags. This way, they need not wait for QCA8084 series patches to conclude review. Changes in v2: * remove PHY_INTERFACE_MODE_10G_QXGMII from workaround of validation in the phylink_validate_phy. 10G_QXGMII will be set into phy->possible_interfaces in its .config_init method of PHY driver that supports it. ==================== Link: https://lore.kernel.org/r/20240615120028.2384732-1-quic_luoj@quicinc.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Vladimir Oltean authored
Add the new interface mode 10g-qxgmii, which is similar to usxgmii but extended to 4 channels to support a maximum of 4 ports with the link speeds 10M/100M/1G/2.5G. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Luo Jie <quic_luoj@quicinc.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Vladimir Oltean authored
10G-QXGMII is a MAC-to-PHY interface defined by the USXGMII multiport specification. It uses the same signaling as USXGMII, but it multiplexes 4 ports over the link, resulting in a maximum speed of 2.5G per port. Some in-tree SoCs like the NXP LS1028A use "usxgmii" when they mean either the single-port USXGMII or the quad-port 10G-QXGMII variant, and they could get away just fine with that thus far. But there is a need to distinguish between the 2 as far as SerDes drivers are concerned. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Luo Jie <quic_luoj@quicinc.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Furong Xu authored
The TSO engine works well when the frames are not VLAN Tagged. But it will produce broken segments when frames are VLAN Tagged. The first segment is all good, while the second through the last segments are broken: they lack the required VLAN tag. An example here:
========
// 1st segment of a VLAN Tagged TSO frame, nothing wrong.
MacSrc > MacDst, ethertype 802.1Q (0x8100), length 1518: vlan 100, p 1, ethertype IPv4 (0x0800), HostA:42643 > HostB:5201: Flags [.], seq 1:1449
// 2nd to last segments of a VLAN Tagged TSO frame, VLAN tag is missing.
MacSrc > MacDst, ethertype IPv4 (0x0800), length 1514: HostA:42643 > HostB:5201: Flags [.], seq 1449:2897
MacSrc > MacDst, ethertype IPv4 (0x0800), length 1514: HostA:42643 > HostB:5201: Flags [.], seq 2897:4345
MacSrc > MacDst, ethertype IPv4 (0x0800), length 1514: HostA:42643 > HostB:5201: Flags [.], seq 4345:5793
MacSrc > MacDst, ethertype IPv4 (0x0800), length 1514: HostA:42643 > HostB:5201: Flags [P.], seq 5793:7241
// normal VLAN Tagged non-TSO frame, nothing wrong.
MacSrc > MacDst, ethertype 802.1Q (0x8100), length 1022: vlan 100, p 1, ethertype IPv4 (0x0800), HostA:42643 > HostB:5201: Flags [P.], seq 7241:8193
MacSrc > MacDst, ethertype 802.1Q (0x8100), length 70: vlan 100, p 1, ethertype IPv4 (0x0800), HostA:42643 > HostB:5201: Flags [F.], seq 8193
========
When transmitting VLAN Tagged TSO frames, never insert the VLAN tag by HW; always insert the VLAN tag into the SKB payload, so that TSO works well on VLANs for all MAC cores. Tested on DWMAC CORE 5.10a, DWMAC CORE 5.20a and DWXGMAC CORE 3.20a. Signed-off-by: Furong Xu <0x1207@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240615095611.517323-1-0x1207@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
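To make "insert the VLAN tag into the payload" concrete, the following userspace sketch inserts an 802.1Q tag into a raw Ethernet frame buffer. It is not the driver's code and does not touch any kernel skb API; `vlan_insert_tag` and the constant names are hypothetical helpers written for this illustration. Once the tag is part of the frame bytes, every segment cut from that frame's header carries it, which is the effect the fix relies on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDRS_LEN 12      /* destination MAC + source MAC */
#define VLAN_TAG_LEN  4       /* TPID (2 bytes) + TCI (2 bytes) */
#define TPID_8021Q    0x8100

/*
 * Insert an 802.1Q tag right after the MAC addresses.
 * frame holds len valid bytes in a buffer of cap bytes.
 * Returns the new frame length, or 0 if the frame is too short
 * or the buffer has no room for the extra 4 bytes.
 */
static size_t vlan_insert_tag(uint8_t *frame, size_t len, size_t cap,
                              uint16_t vid, uint8_t pcp)
{
    /* TCI layout: 3 bits priority, 1 bit DEI (left 0), 12 bits VLAN ID. */
    uint16_t tci = (uint16_t)(((pcp & 0x7u) << 13) | (vid & 0x0FFFu));

    if (len < ETH_ADDRS_LEN + 2 || cap < len + VLAN_TAG_LEN)
        return 0;

    /* Shift the original EtherType and payload back by 4 bytes. */
    memmove(frame + ETH_ADDRS_LEN + VLAN_TAG_LEN,
            frame + ETH_ADDRS_LEN, len - ETH_ADDRS_LEN);

    frame[12] = (uint8_t)(TPID_8021Q >> 8);
    frame[13] = (uint8_t)(TPID_8021Q & 0xFF);
    frame[14] = (uint8_t)(tci >> 8);
    frame[15] = (uint8_t)(tci & 0xFF);

    return len + VLAN_TAG_LEN;
}
```

With `vid = 100` and `pcp = 1`, as in the capture above, the TCI bytes come out as 0x20 0x64 and the original EtherType 0x0800 moves to offsets 16 and 17.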
-
Kory Maincent authored
This declaration was added to the header to be called from ethtool. ethtool is separated from core for code organization but it is not really a separate entity; it controls very core things. As ethtool is internal, it is not wise to have the declaration in netdevice.h. Move the declaration to net/core/dev.h instead. Remove the EXPORT_SYMBOL_GPL call as ethtool can not be built as a module. Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://lore.kernel.org/r/20240612-feature_ptp_netnext-v15-2-b2a086257b63@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jeff Johnson authored
With ARCH=hexagon, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/synopsys/dwc-xlgmac.o With most other ARCH settings the MODULE_DESCRIPTION() is provided by the macro invocation in dwc-xlgmac-pci.c. However, for hexagon, the PCI bus is not enabled, and hence CONFIG_DWC_XLGMAC_PCI is not set. As a result, dwc-xlgmac-pci.c is not compiled, and hence is not linked into dwc-xlgmac.o. To avoid this issue, relocate the MODULE_DESCRIPTION() and other related macros from dwc-xlgmac-pci.c to dwc-xlgmac-common.c, since that file already has an existing MODULE_LICENSE() and it is unconditionally linked into dwc-xlgmac.o. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20240616-md-hexagon-drivers-net-ethernet-synopsys-v1-1-55852b60aef8@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Shradha Gupta authored
To clean up rxqs in port context structures, instead of duplicating the code, use the existing function mana_cleanup_port_context(), which does the exact cleanup that's needed. Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Heng Qi <hengqi@linux.alibaba.com> Link: https://lore.kernel.org/r/1718349548-28697-1-git-send-email-shradhagupta@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Willem de Bruijn authored
Drop the WARN_ON_ONCE in gue_gro_receive if the encapsulated type is not known or does not have a GRO handler. Such a packet is easily constructed. Syzbot generates them and sets off this warning. Remove the warning as it is expected and not actionable. The warning was previously reduced from WARN_ON to WARN_ON_ONCE in commit 27013661 ("fou: Do WARN_ON_ONCE in gue_gro_receive for bad proto callbacks"). Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240614122552.1649044-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 17 Jun, 2024 8 commits
-
-
David S. Miller authored
D. Wythe says: ==================== Introduce IPPROTO_SMC This patch allows creating an smc socket via AF_INET, similar to the following code, /* create v4 smc sock */ v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC); /* create v6 smc sock */ v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC); There are several reasons why we believe it is appropriate here: 1. smc sockets actually use IPv4 (AF_INET) or IPv6 (AF_INET6) addresses. There is no AF_SMC address at all. 2. Creating the smc socket in the AF_INET(6) path allows us to reuse the infrastructure of the AF_INET(6) path, such as common ebpf hooks. Otherwise, smc would have to implement it again in the AF_SMC path. For example, such hooks can: 1. Replace IPPROTO_TCP with IPPROTO_SMC in the socket() syscall initiated by the user, without the use of LD_PRELOAD. 2. Select whether immediate fallback is required based on the peer's port/ip before connect(). A very significant result is that we can now use eBPF to implement smc_run instead of LD_PRELOAD, which is completely ineffective in scenarios of static linking. Another potential value is that we are attempting to optimize the performance of fallback socks, where merging socks is an important part, and it relies on the creation of SMC sockets under the AF_INET path. (More information: https://lore.kernel.org/netdev/1699442703-25015-1-git-send-email-alibuda@linux.alibaba.com/T/) v2 -> v1: - Code formatting, mainly including alignment and annotation repair. - move inet_smc proto ops to inet_smc.c, avoiding af_smc.c becoming too bulky. - Fix the issue where refactoring affects the initialization order. - Fix compile warning (unused out_inet_prot) while CONFIG_IPV6 was not set. 
v3 -> v2: - Add Alibaba's copyright information to the new file v4 -> v3: - Fix some spelling errors - Align function naming style with smc_sock_init() to smc_sk_init() - Reversing the order of the conditional checks on clcsock to make the code more intuitive v5 -> v4: - Fix some spelling errors - Added comment, "/* CONFIG_IPV6 */", after the final #endif directive. - Rename smc_inet.h and smc_inet.c to smc_inet.h and smc_inet.c - Encapsulate the initialization and destruction of inet_smc in inet_smc.c, rather than implementing it directly in af_smc.c. - Remove useless header files in smc_inet.h - Make smc_inet_prot_xxx and smc_inet_sock_init() static, since they are only used in smc_inet.c v6 -> v5: - Wrapping lines to not exceed 80 characters - Combine initialization and error handling of smc_inet6 into the same #if macro block. v7 -> v6: - Modify the value of IPPROTO_SMC to 256 so that it does not affect IPPROTO_MAX v8 -> v7: - Remove useless declarations. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
D. Wythe authored
This patch allows creating an smc socket via AF_INET, similar to the following code, /* create v4 smc sock */ v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC); /* create v6 smc sock */ v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC); There are several reasons why we believe it is appropriate here: 1. smc sockets actually use IPv4 (AF_INET) or IPv6 (AF_INET6) addresses. There is no AF_SMC address at all. 2. Creating the smc socket in the AF_INET(6) path allows us to reuse the infrastructure of the AF_INET(6) path, such as common ebpf hooks. Otherwise, smc would have to implement it again in the AF_SMC path. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
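A small userspace sketch of how an application might use the new protocol number follows. The value 256 for `IPPROTO_SMC` is taken from the series changelog ("Modify the value of IPPROTO_SMC to 256") and is only defined here when the system headers do not provide it. The helper name `open_stream_socket` and the fall-back-to-TCP policy are assumptions added for illustration; the patch itself does not prescribe them.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Older libc headers may not define this; 256 is the value from the series. */
#ifndef IPPROTO_SMC
#define IPPROTO_SMC 256
#endif

/*
 * Try to open an SMC stream socket in the given family (AF_INET or
 * AF_INET6) and fall back to plain TCP if the kernel refuses.
 * Returns the file descriptor, or -1 if both attempts fail.
 * *used_smc is set to 1 when the SMC socket was created, else 0.
 */
static int open_stream_socket(int family, int *used_smc)
{
    int fd = socket(family, SOCK_STREAM, IPPROTO_SMC);

    if (fd >= 0) {
        *used_smc = 1;
        return fd;
    }

    /* Kernel without SMC support (or module not loaded): use TCP. */
    *used_smc = 0;
    return socket(family, SOCK_STREAM, IPPROTO_TCP);
}
```

A caller would use it as `fd = open_stream_socket(AF_INET, &used_smc);` and then proceed with the usual `connect()`/`bind()` calls using ordinary `sockaddr_in` addresses, since no SMC-specific address family is involved.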
-
D. Wythe authored
Externalize smc proto operations (smc_xxx) to allow access from files other than af_smc.c This is in preparation for the subsequent implementation of the AF_INET version of SMC. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
D. Wythe authored
This patch aims to isolate the shared components of SMC socket allocation by introducing smc_sk_init() for sock initialization and __smc_create_clcsk() for the initialization of clcsock. This is in preparation for the subsequent implementation of the AF_INET version of SMC. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
I find the behavior of xa_for_each_start() slightly counter-intuitive. It doesn't end the iteration by making the index point after the last element. IOW calling xa_for_each_start() again after it "finished" will run the body of the loop for the last valid element, instead of doing nothing. This works fine for netlink dumps if they terminate correctly (i.e. coalesce or carefully handle NLM_DONE), but as we keep getting reminded legacy dumps are unlikely to go away. Fixing this generically at the xa_for_each_start() level seems hard - there is no index reserved for "end of iteration". ifindexes are 31b wide, tho, and iterator is ulong so for for_each_netdev_dump() it's safe to go to the next element. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
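The difference described here can be modelled in userspace with a sorted array standing in for the XArray. Everything in this sketch (`ifindexes`, `find_from`, `dump_all`, the `advance` flag) is hypothetical and written for illustration; it only mimics the cursor behaviour, not the kernel iterators themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical set of device indexes, kept sorted. */
static const unsigned long ifindexes[] = { 1, 2, 7 };
#define N_DEVS (sizeof(ifindexes) / sizeof(ifindexes[0]))

/* Position of the first entry with index >= start, or N_DEVS if none. */
static size_t find_from(unsigned long start)
{
    size_t i;

    for (i = 0; i < N_DEVS; i++)
        if (ifindexes[i] >= start)
            break;
    return i;
}

/*
 * Resumable dump: *ctx is the cursor kept between calls.
 * advance == 0: the cursor is left on the last visited entry,
 *               modelling the behaviour the message calls counter-intuitive.
 * advance != 0: the cursor is moved one past each visited entry,
 *               modelling the behaviour adopted for netdev dumps.
 * Returns the number of entries visited in this call.
 */
static int dump_all(unsigned long *ctx, int advance)
{
    int visited = 0;
    size_t i;

    for (i = find_from(*ctx); i < N_DEVS; i++) {
        *ctx = ifindexes[i];
        visited++;
        if (advance)
            *ctx = ifindexes[i] + 1;
    }
    return visited;
}
```

Calling `dump_all` a second time with the same cursor revisits index 7 when `advance` is 0, but visits nothing when `advance` is nonzero, because the cursor already sits at 8. Stepping to "last index + 1" is safe in the real case for the reason the message gives: ifindexes are 31 bits wide while the iterator is an unsigned long.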
-
David S. Miller authored
Joe Damato says: ==================== mlx5: Add netdev-genl queue stats Welcome to v5. Switched from RFC to just a v5, because I think this is pretty close. Minor changes from v4 summarized below in the changelog. Note that my NIC does not seem to support PTP and I couldn't get the mlnx-tools mlnx_qos script to work, so I was only able to test the following cases: - device up at boot - adjusting queue counts - device down (e.g. ip link set dev eth4 down) Please see the commit message of patch 2/2 for more details on output and test cases. rfcv4 thread: https://lore.kernel.org/linux-kernel/20240604004629.299699-1-jdamato@fastly.com/T/ rfcv4 -> v5: - Patch 1/2: change variable name 'mlx5e_qid' to 'txq_ix'. - Patch 2/2: - remove logic in mlx5e_get_queue_stats_rx for PTP. PTP RX are always reported in base. - report PTP TX in mlx5e_get_base_stats only if: - PTP has ever been opened, and - either PTP is NULL (closed) or the MLX5E_PTP_STATE_TX bit in its state is not set Otherwise, PTP TX will be reported when the txq_ix is passed into mlx5e_get_queue_stats_tx rfcv3 -> rfcv4: - Patch 1/2 now creates a mapping (priv->txq2sq_stats) which maps txq indices to sq_stats structures so stats can be accessed directly. This mapping is kept up to date along side txq2sq. - Patch 2/2: - All mutex_lock/unlock on state_lock has been dropped. - mlx5e_get_queue_stats_rx now uses ASSERT_RTNL() and has a special case for PTP. If PTP was ever opened, is currently opened, and the channel index matches, stats for PTP RX are output. - mlx5e_get_queue_stats_tx rewritten to use priv->txq2sq_stats. No corner cases are needed here because any txq idx (passed in as i) will have an up to date mapping in priv->txq2sq_stats. - mlx5e_get_base_stats: - in the RX case: - iterates from [params.num_channels, stats_nch) collecting stats. - if ptp was ever opened but is currently closed, add the PTP stats. 
- in the TX case: - handle 2 cases: - the channel is available, so sum only the unavailable TCs [mlx5e_get_dcb_num_tc, max_opened_tc). - the channel is unavailable, so sum all TCs [0, max_opened_tc). - if ptp was ever opened but is currently closed, add the PTP sq stats. v2 -> rfcv3: - Added patch 1/2 which creates some helpers for computing the txq_ix and ch_ix/tc_ix. - Patch 2/2 modified in several ways: - Fixed variable declarations in mlx5e_get_queue_stats_rx to be at the start of the function. - mlx5e_get_queue_stats_tx rewritten to access sq stats directly by using the helpers added in the previous patch. - mlx5e_get_base_stats modified in several ways: - Took the state_lock when accessing priv->channels. - For the base RX stats, code was simplified to call mlx5e_get_queue_stats_rx instead of repeating the same code. - For the base TX stats, I attempted to implement what I think Tariq suggested in the previous thread: - for available channels, only unavailable TC stats are summed - for unavailable channels, all stats for TCs up to max_opened_tc are summed. v1 - > v2: - Essentially a full rewrite after comments from Jakub, Tariq, and Zhu. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Damato authored
./cli.py --spec netlink/specs/netdev.yaml \
    --dump qstats-get --json '{"scope": "queue"}'
...snip
{'ifindex': 7, 'queue-id': 62, 'queue-type': 'rx', 'rx-alloc-fail': 0, 'rx-bytes': 105965251, 'rx-packets': 179790},
{'ifindex': 7, 'queue-id': 0, 'queue-type': 'tx', 'tx-bytes': 9402665, 'tx-packets': 17551},
...snip
Also tested with the script tools/testing/selftests/drivers/net/stats.py in several scenarios to ensure stats tallying was correct: - on boot (default queue counts) - adjusting queue count up or down (ethtool -L eth0 combined ...) The tools/testing/selftests/drivers/net/stats.py script brings the device up, so to test with the device down, I did the following:
$ ip link show eth4
7: eth4: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN [..snip..] [..snip..]
$ cat /proc/net/dev | grep eth4
eth4: 235710489 434811 [..snip rx..] 2878744 21227 [..snip tx..]
$ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \
    --dump qstats-get --json '{"ifindex": 7}'
[{'ifindex': 7, 'rx-alloc-fail': 0, 'rx-bytes': 235710489, 'rx-packets': 434811, 'tx-bytes': 2878744, 'tx-packets': 21227}]
Comparing the two shows that the values in /proc/net/dev match the output of cli for the same device, even while the device is down. Note that while the device is down, per queue stats output nothing (because the device is down there are no queues):
$ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \
    --dump qstats-get --json '{"scope": "queue", "ifindex": 7}'
[]
Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Damato authored
mlx5 currently maps txqs to an sq via priv->txq2sq. It is useful to map txqs to sq_stats, as well, for direct access to stats. Add priv->txq2sq_stats and insert mappings. The mappings will be used next to tabulate stats information. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 15 Jun, 2024 3 commits
-
-
Sagi Grimberg authored
We only use the mapping in a single context in a short and contained scope, so kmap_local_page is sufficient and cheaper. This will also allow skb_datagram_iter to be called from softirq context. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20240613113504.1079860-1-sagi@grimberg.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Petr Machata says: ==================== mlxsw: Handle MTU values Amit Cohen writes: The driver uses two values for maximum MTU, but neither is accurate. In addition, the value which is configured to hardware is not calculated correctly. Handle these issues and expose accurate values for minimum and maximum MTU per netdevice. Add test cases to check that the exposed values are really supported. Patch set overview: Patches #1-#3 set the driver to use accurate values for MTU Patch #4 aligns the driver to always use the same value for maximum MTU Patch #5 adds a test ==================== Link: https://lore.kernel.org/r/cover.1718275854.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Amit Cohen authored
Add cases to check minimum and maximum MTU which are exposed via "ip -d link show". Test configuration and traffic. Use VLAN devices as usually VLAN header (4 bytes) is not included in the MTU, and drivers should configure hardware correctly to send maximum MTU payload size in VLAN tagged packets.
$ ./min_max_mtu.sh
TEST: ping [ OK ]
TEST: ping6 [ OK ]
TEST: Test maximum MTU configuration [ OK ]
TEST: Test traffic, packet size is maximum MTU [ OK ]
TEST: Test minimum MTU configuration [ OK ]
TEST: Test traffic, packet size is minimum MTU [ OK ]
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/89de8be8989db7a97f3b39e3c9da695673e78d2e.1718275854.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-