- 26 Sep, 2024 13 commits
-
-
Florian Westphal authored
The netfilter race happens when two packets with the same tuple are DNATed and enqueued with nfqueue in the postrouting hook. Once one of the packet is reinjected it may be DNATed again to a different destination, but the conntrack entry remains the same and the return packet was dropped. Based on earlier patch from Antonio Ojea. Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1766Co-developed-by: Antonio Ojea <aojea@google.com> Signed-off-by: Antonio Ojea <aojea@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
For historical reasons there are two clash resolution spots in netfilter, one in nfnetlink_queue and one in conntrack core. nfnetlink_queue one was added first: If a colliding entry is found, NAT NAT transformation is reversed by calling nat engine again with altered tuple. See commit 368982cd ("netfilter: nfnetlink_queue: resolve clash for unconfirmed conntracks") for details. One problem is that nf_reroute() won't take an action if the queueing doesn't occur in the OUTPUT hook, i.e. when queueing in forward or postrouting, packet will be sent via the wrong path. Another problem is that the scenario addressed (2nd UDP packet sent with identical addresses while first packet is still being processed) can also occur without any nfqueue involvement due to threaded resolvers doing A and AAAA requests back-to-back. This lead us to add clash resolution logic to the conntrack core, see commit 6a757c07 ("netfilter: conntrack: allow insertion of clashing entries"). Instead of fixing the nfqueue based logic, lets remove it and let conntrack core handle this instead. Retain the ->update hook for sake of nfqueue based conntrack helpers. We could axe this hook completely but we'd have to split confirm and helper logic again, see commit ee04805f ("netfilter: conntrack: make conntrack userspace helpers work again"). This SHOULD NOT be backported to kernels earlier than v5.6; they lack adequate clash resolution handling. Patch was originally written by Pablo Neira Ayuso. Reported-by: Antonio Ojea <aojea@google.com> Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1766Signed-off-by: Florian Westphal <fw@strlen.de> Tested-by: Antonio Ojea <aojea@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Pablo Neira Ayuso authored
Several ruleset objects are still not using GFP_KERNEL_ACCOUNT for memory accounting, update them. This includes: - catchall elements - compat match large info area - log prefix - meta secctx - numgen counters - pipapo set backend datastructure - tunnel private objects Fixes: 33758c89 ("memcg: enable accounting for nft objects") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Pablo Neira Ayuso authored
Lockless iteration over hook list is possible from netlink dump path, use rcu variant to iterate over the hook list as is done with flowtable hooks. Fixes: b9703ed4 ("netfilter: nf_tables: support for adding new devices to an existing netdev chain") Reported-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Simon Horman authored
Only provide ctnetlink_label_size when it is used, which is when CONFIG_NF_CONNTRACK_EVENTS is configured. Flagged by clang-18 W=1 builds as: .../nf_conntrack_netlink.c:385:19: warning: unused function 'ctnetlink_label_size' [-Wunused-function] 385 | static inline int ctnetlink_label_size(const struct nf_conn *ct) | ^~~~~~~~~~~~~~~~~~~~ The condition on CONFIG_NF_CONNTRACK_LABELS being removed by this patch guards compilation of non-trivial implementations of ctnetlink_dump_labels() and ctnetlink_label_size(). However, this is not necessary as each of these functions will always return 0 if CONFIG_NF_CONNTRACK_LABELS is not defined as each function starts with the equivalent of: struct nf_conn_labels *labels = nf_ct_labels_find(ct); if (!labels) return 0; And nf_ct_labels_find always returns NULL if CONFIG_NF_CONNTRACK_LABELS is not enabled. So I believe that the compiler optimises the code away in such cases anyway. Found by inspection. Compile tested only. Originally splitted in two patches, Pablo Neira Ayuso collapsed them and added Fixes: tag. Fixes: 0ceabd83 ("netfilter: ctnetlink: deliver labels to userspace") Link: https://lore.kernel.org/netfilter-devel/20240909151712.GZ2097826@kernel.org/Signed-off-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Simon Horman authored
If CONFIG_BRIDGE_NETFILTER is not enabled, which is the case for x86_64 defconfig, then building nf_reject_ipv4.c and nf_reject_ipv6.c with W=1 using gcc-14 results in the following warnings, which are treated as errors: net/ipv4/netfilter/nf_reject_ipv4.c: In function 'nf_send_reset': net/ipv4/netfilter/nf_reject_ipv4.c:243:23: error: variable 'niph' set but not used [-Werror=unused-but-set-variable] 243 | struct iphdr *niph; | ^~~~ cc1: all warnings being treated as errors net/ipv6/netfilter/nf_reject_ipv6.c: In function 'nf_send_reset6': net/ipv6/netfilter/nf_reject_ipv6.c:286:25: error: variable 'ip6h' set but not used [-Werror=unused-but-set-variable] 286 | struct ipv6hdr *ip6h; | ^~~~ cc1: all warnings being treated as errors Address this by reducing the scope of these local variables to where they are used, which is code only compiled when CONFIG_BRIDGE_NETFILTER enabled. Compile tested and run through netfilter selftests. Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Closes: https://lore.kernel.org/netfilter-devel/20240906145513.567781-1-andriy.shevchenko@linux.intel.com/Signed-off-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Phil Sutter authored
Documentation of list_del_rcu() warns callers to not immediately free the deleted list item. While it seems not necessary to use the RCU-variant of list_del() here in the first place, doing so seems to require calling kfree_rcu() on the deleted item as well. Fixes: 3f0465a9 ("netfilter: nf_tables: dynamically allocate hooks per net_device in flowtables") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
谢致邦 (XIE Zhibang) authored
The iptables example was added in commit d2f26037 (netfilter: Add documentation for tproxy, 2008-10-08), but xt_socket 'transparent' option was added in commit a31e1ffd (netfilter: xt_socket: added new revision of the 'socket' match supporting flags, 2009-06-09). Now add the 'transparent' option to the iptables example to ignore non-transparent sockets, which is also consistent with the nft example. Signed-off-by: 谢致邦 (XIE Zhibang) <Yeking@Red54.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Andy Shevchenko authored
Some of the functions may be unused (CONFIG_NETFILTER_NETLINK_GLUE_CT=n and CONFIG_NF_CONNTRACK_EVENTS=n), it prevents kernel builds with clang, `make W=1` and CONFIG_WERROR=y: net/netfilter/nf_conntrack_netlink.c:657:22: error: unused function 'ctnetlink_acct_size' [-Werror,-Wunused-function] 657 | static inline size_t ctnetlink_acct_size(const struct nf_conn *ct) | ^~~~~~~~~~~~~~~~~~~ net/netfilter/nf_conntrack_netlink.c:667:19: error: unused function 'ctnetlink_secctx_size' [-Werror,-Wunused-function] 667 | static inline int ctnetlink_secctx_size(const struct nf_conn *ct) | ^~~~~~~~~~~~~~~~~~~~~ net/netfilter/nf_conntrack_netlink.c:683:22: error: unused function 'ctnetlink_timestamp_size' [-Werror,-Wunused-function] 683 | static inline size_t ctnetlink_timestamp_size(const struct nf_conn *ct) | ^~~~~~~~~~~~~~~~~~~~~~~~ Fix this by guarding possible unused functions with ifdeffery. See also commit 6863f564 ("kbuild: allow Clang to find unused static inline functions for W=1 build"). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Antonio Ojea authored
The TPROXY functionality is widely used, however, there are only mptcp selftests covering this feature. The selftests represent the most common scenarios and can also be used as selfdocumentation of the feature. UDP and TCP testcases are split in different files because of the different nature of the protocols, specially due to the challenges that present to reliable test UDP due to the connectionless nature of the protocol. UDP only covers the scenarios involving the prerouting hook. The UDP tests are signfinicantly slower than the TCP ones, hence they use a larger timeout, it takes 20 seconds to run the full UDP suite on a 48 vCPU Intel(R) Xeon(R) CPU @2.60GHz. Signed-off-by: Antonio Ojea <aojea@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
Add test program that is sending UDP packets in both directions and check that packets arrive without source port modification. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
Given existing entry: ORIGIN: a:b -> c:d REPLY: c:d -> a:b And colliding entry: ORIGIN: c:d -> a:b REPLY: a:b -> c:d The colliding ct (and the associated skb) get dropped on insert. Permit this by checking if the colliding entry matches the reply direction. Happens when both ends send packets at same time, both requests are picked up as NEW, rather than NEW for the 'first' and 'ESTABLISHED' for the second packet. This is an esoteric condition, as ruleset must permit NEW connections in either direction and both peers must already have a bidirectional traffic flow at the time conntrack gets enabled. Allow the 'reverse' skb to pass and assign the existing (clashing) entry. While at it, also drop the extra 'dying' check, this is already tested earlier by the calling function. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
A conntrack entry can be inserted to the connection tracking table if there is no existing entry with an identical tuple in either direction. Example: INITIATOR -> NAT/PAT -> RESPONDER Initiator passes through NAT/PAT ("us") and SNAT is done (saddr rewrite). Then, later, NAT/PAT machine itself also wants to connect to RESPONDER. This will not work if the SNAT done earlier has same IP:PORT source pair. Conntrack table has: ORIGINAL: $IP_INITATOR:$SPORT -> $IP_RESPONDER:$DPORT REPLY: $IP_RESPONDER:$DPORT -> $IP_NAT:$SPORT and new locally originating connection wants: ORIGINAL: $IP_NAT:$SPORT -> $IP_RESPONDER:$DPORT REPLY: $IP_RESPONDER:$DPORT -> $IP_NAT:$SPORT This is handled by the NAT engine which will do a source port reallocation for the locally originating connection that is colliding with an existing tuple by attempting a source port rewrite. This is done even if this new connection attempt did not go through a masquerade/snat rule. There is a rare race condition with connection-less protocols like UDP, where we do the port reallocation even though its not needed. This happens when new packets from the same, pre-existing flow are received in both directions at the exact same time on different CPUs after the conntrack table was flushed (or conntrack becomes active for first time). With strict ordering/single cpu, the first packet creates new ct entry and second packet is resolved as established reply packet. With parallel processing, both packets are picked up as new and both get their own ct entry. In this case, the 'reply' packet (picked up as ORIGINAL) can be mangled by NAT engine because a port collision is detected. This change isn't enough to prevent a packet drop later during nf_conntrack_confirm(), the existing clash resolution strategy will not detect such reverse clash case. This is resolved by a followup patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 16 Sep, 2024 1 commit
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds authored
Pull networking updates from Jakub Kicinski: "The zero-copy changes are relatively significant, but regression risk should be contained. The feature needs to be used to cause trouble. Also it feels like we got an order of magnitude more semi-automated "refactoring" chaff than usual, I wonder if it's just us. Core & protocols: - Support Device Memory TCP, ability to zero-copy receive TCP payloads to a DMABUF region of memory while packet headers land separately in normal kernel buffers, and TCP processes then as usual. - The ability to read the PTP PHC (Physical Hardware Clock) alongside MONOTONIC_RAW timestamps with PTP_SYS_OFFSET_EXTENDED. Previously only CLOCK_REALTIME was supported. - Allow matching on all bits of IP DSCP for routing decisions. Previously we only supported on matching TOS bits in IPv4 which is a narrower interpretation of the same header field. - Increase the range of weights used for multi-path routing from 8 bits to 16 bits. - Add support for IPv6 PIO p flag in the Prefix Information Option per draft-ietf-6man-pio-pflag. - IPv6 IOAM6 support for new tunsrc encap mode for better performance. - Detect destinations which blackhole MPTCP traffic and avoid initiating MPTCP connections to them for a certain period of time, 1h by default. - Improve IPsec control path performance by removing the inexact policies list. - AF_VSOCK: add support for SIOCOUTQ ioctl. - Add enum for reasons TCP reset was sent for easier tracing. - Add SMC ringbufs usage statistics. Drivers: - Handle netconsole setup failures more gracefully, don't fail loading, retain the specified target as disabled. - Extend bonding's IPsec offload pass thru capabilities (ESN, stats). Filtering: - Add TCP_BPF_SOCK_OPS_CB_FLAGS to bpf_*sockopt() to address the case when long-lived sockets miss a chance to set additional callbacks if a sockops program was not attached early in their lifetime. - Support using BPF skb helpers in tracepoints. - Conntrack Netlink: support CTA_FILTER for flush. - Improve SCTP support in nfnetlink_queue. - Improve performance of large nftables flush transactions. Things we sprinkled into general kernel code: - selftests: support setting an "interpreter" for script files; make it easy to run as separate cases tests where one "interpreter" is fed various test descriptions (in our case packet sequences). Driver API: - Extend core and ethtool APIs to support many PHYs connected to a single interface (PHY topologies). - Extend cable diagnostics to specify whether Time Domain Reflectometry (TDR) or Active Link Cable Diagnostic (ALCD) was used. - Add library for implementing MAC-PHY Ethernet drivers for SPI devices compatible with Open Alliance 10BASE-T1x MAC-PHY Serial Interface (TC6) standard. - Add helpers to the PHY framework, for PHYs following the Open Alliance standards: - 1000BaseT1 link settings - cable test and diagnostics - Support listing / dumping all allocated RSS contexts. - Add configuration for frequency Embedded SYNC in DPLL, which magically embeds sync pulses into Ethernet signaling. Device drivers: - Ethernet high-speed NICs: - Broadcom (bnxt): - use better FW APIs for queue reset - support QOS and TPID settings for the SR-IOV VLAN - support dynamic MSI-X allocation - Intel (100G, ice, idpf): - ice: support PCIe subfunctions - iavf: add support for TC U32 filters on VFs - ice: support Embedded SYNC in DPLL - nVidia/Mellanox (mlx5): - support HW managed steering tables - support PCIe PTM cross timestamping - AMD/Pensando: - ionic: use page_pool to increase Rx performance - Cisco (enic): - report per-queue statistics - Ethernet virtual: - Microsoft vNIC: - mana: support configuring ring length - netvsc: enable more channels on systems with many CPUs - IBM veth: - optimize polling to improve TCP_RR performance - optimize performance of Tx handling - VirtIO net: - synchronize the operstate with the admin state to allow a lower virtio-net to propagate the link status to an upper device like macvlan - Ethernet NICs consumer, and embedded: - Add driver for Realtek automotive PCIe devices (RTL9054, RTL9068, RTL9072, RTL9075, RTL9068, RTL9071) - Add driver for Microchip LAN8650/1 10BASE-T1S MAC-PHY. - Microchip: - lan743x: use phylink - support WOL, EEE, pause, link settings - add Wake-on-LAN support for KSZ87xx family - add KSZ8895/KSZ8864 switch support - factor out FDMA code and use it in sparx5 and lan966x (including DCB support in both) - Synopsys (stmmac): - support frame preemption (configured using TC and ethtool) - support Loongson DWMAC (GMAC v3.73) - support RockChips RK3576 DWMAC - TI: - am65-cpsw: add multi queue RX support - icssg-prueth: HSR offload support - Cadence (macb): - enable software (hrtimer based) IRQ coalescing by default - Xilinx (axinet): - expose HW statistics - improve multicast filtering - relax Rx checksum offload constraints - MediaTek: - mt7530: add EN7581 support - Aspeed (ftgmac100): - report link speed and duplex - Intel: - igc: add mqprio offload - igc: report EEE configuration - RealTek (r8169): - add support for RTL8126A rev.b - Vitesse (vsc73xx): - implement FDB add/del/dump operations - Freescale (fs_enet): - use phylink - Ethernet PHYs: - vitesse: implement downshift and MDI-X in vsc73xx PHYs - microchip: support LAN887x, supporting IEEE 802.3bw (100BASE-T1) and IEEE 802.3bp (1000BASE-T1) specifications - add Applied Micro QT2025 PHY driver (in Rust) - add Motorcomm yt8821 2.5G Ethernet PHY driver - CAN: - add driver for Rockchip RK3568 CAN-FD controller - flexcan: add wakeup support for imx95 - kvaser_usb: set hardware timestamp on transmitted packets - WiFi: - mac80211/cfg80211: - EHT rate support in AQL airtime fairness - handle DFS (radar detection) per link in Multi-Link Operation - RealTek (rtw89): - support RTL8852BT and 8852BE-VT (WiFi 6) - support hardware rfkill - support HW encryption in unicast management frames - support Wake-on-WLAN with supported network detection - RealTek (rtw89): - improve Rx performance by using USB frame aggregation - support USB 3 with RTL8822CU/RTL8822BU - Intel (iwlwifi/mvm): - offload RLC/SMPS functionality to firmware - Marvell (mwifiex): - add host based MLME to enable WPA3 - Bluetooth: - add support for Amlogic HCI UART protocol - add support for ISO data/packets to Intel and NXP drivers" * tag 'net-next-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1303 commits) net/mlx5: HWS, check the correct variable in hws_send_ring_alloc_sq() netfilter: nft_socket: Fix a NULL vs IS_ERR() bug in nft_socket_cgroup_subtree_level() ice: Fix a NULL vs IS_ERR() check in probe() ice: Fix a couple NULL vs IS_ERR() bugs net: ethernet: fs_enet: Make the per clock optional net: ti: icssg-prueth: Add multicast filtering support in HSR mode net: ti: icssg-prueth: Enable HSR Tx duplication, Tx Tag and Rx Tag offload net: ti: icssg-prueth: Add support for HSR frame forward offload net: ti: icssg-prueth: Stop hardcoding def_inc net: ti: icss-iep: Move icss_iep structure net: ibm: emac: get rid of wol_irq net: ibm: emac: remove all waiting code net: ibm: emac: replace of_get_property net: ibm: emac: use netdev's phydev directly net: ibm: emac: use devm for register_netdev net: ibm: emac: remove mii_bus with devm net: ibm: emac: use devm for of_iomap net: ibm: emac: manage emac_irq with devm net: ibm: emac: use devm for alloc_etherdev octeontx2-af: debugfs: Add Channel info to RPM map ...
-
- 15 Sep, 2024 9 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski authored
Merge in late fixes to prepare for the 6.12 net-next PR. No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Dan Carpenter authored
There is a copy and paste bug so this code checks "sq->dep_wqe" where "sq->wr_priv" was intended. It could result in a NULL pointer dereference. Fixes: 2ca62599 ("net/mlx5: HWS, added send engine and context handling") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/da822315-02b7-4f5b-9c86-0d5176c5069d@stanley.mountainSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Dan Carpenter authored
The cgroup_get_from_path() function never returns NULL, it returns error pointers. Update the error handling to match. Fixes: 7f3287db ("netfilter: nft_socket: make cgroupsv2 matching work with namespaces") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://patch.msgid.link/bbc0c4e0-05cc-4f44-8797-2f4b3920a820@stanley.mountainSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Dan Carpenter authored
The ice_allocate_sf() function returns error pointers on error. It doesn't return NULL. Update the check to match. Fixes: 177ef7f1 ("ice: base subfunction aux driver") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/6951d217-ac06-4482-a35d-15d757fd90a3@stanley.mountainSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Dan Carpenter authored
The ice_repr_create() function returns error pointers. It never returns NULL. Fix the callers to check for IS_ERR(). Fixes: 977514fb ("ice: create port representor for SF") Fixes: 415db839 ("ice: make representor code generic") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/7f7aeb91-8771-47b8-9275-9d9f64f947dd@stanley.mountainSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Maxime Chevallier authored
Some platforms that use fs_enet don't have the PER register clock. This optional dependency on the clock was incorrectly made mandatory when switching to devm_ accessors. Reported-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Closes: https://lore.kernel.org/netdev/4e4defa9-ef2f-4ff1-95ca-6627c24db20c@wanadoo.fr/ Fixes: c614acf6 ("net: ethernet: fs_enet: simplify clock handling with devm accessors") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://patch.msgid.link/20240914081821.209130-1-maxime.chevallier@bootlin.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds authored
Pull kvm fix from Paolo Bonzini: "Do not always honor guest PAT on CPUs that support self-snoop. This triggers an issue in the bochsdrm driver, which used ioremap() instead of ioremap_wc() to map the video RAM. The revert lets video RAM use the WB memory type instead of the slower UC memory type" * tag 'for-linus-6.11' of git://git.kernel.org/pub/scm/virt/kvm/kvm: Revert "KVM: VMX: Always honor guest PAT on CPUs that support self-snoop"
-
Paolo Bonzini authored
This reverts commit 377b2f35. This caused a regression with the bochsdrm driver, which used ioremap() instead of ioremap_wc() to map the video RAM. After the commit, the WB memory type is used without the IGNORE_PAT, resulting in the slower UC memory type. In fact, UC is slow enough to basically cause guests to not boot... but only on new processors such as Sapphire Rapids and Cascade Lake. Coffee Lake for example works properly, though that might also be an effect of being on a larger, more NUMA system. The driver has been fixed but that does not help older guests. Until we figure out whether Cascade Lake and newer processors are working as intended, revert the commit. Long term we might add a quirk, but the details depend on whether the processors are working as intended: for example if they are, the quirk might reference bochs-compatible devices, e.g. in the name and documentation, so that userspace can disable the quirk by default and only leave it enabled if such a device is being exposed to the guest. If instead this is actually a bug in CLX+, then the actions we need to take are different and depend on the actual cause of the bug. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 14 Sep, 2024 17 commits
-
-
Jakub Kicinski authored
MD Danish Anwar says: ==================== Introduce HSR offload support for ICSSG This series introduces HSR offload support for ICSSG driver. To support HSR offload to hardware, ICSSG HSR firmware is used. This series introduces, 1. HSR frame offload support for ICSSG driver. 2. HSR Tx Packet duplication offload 3. HSR Tx Tag and Rx Tag offload 4. Multicast filtering support in HSR offload mode. 5. Dependencies related to IEP. HSR Test Setup: -------------- ___________ ___________ ___________ | | Link AB | | Link BC | | __| AM64* |_________| AM64 |_________| AM64* |___ | | Station A | | Station B | | Station C | | | |___________| |___________| |___________| | | | |______________________________________________________________| Link CA *Could be any device that supports two ethernet interfaces. Steps to switch to HSR frame forward offload mode: ------------------------------------------------- Example assuming eth1, eth2 ports of ICSSG1 on AM64-EVM 1) Enable HSR offload for both interfaces ethtool -K eth1 hsr-fwd-offload on ethtool -K eth1 hsr-dup-offload on ethtool -K eth1 hsr-tag-ins-offload on ethtool -K eth1 hsr-tag-rm-offload on ethtool -K eth2 hsr-fwd-offload on ethtool -K eth2 hsr-dup-offload on ethtool -K eth2 hsr-tag-ins-offload on ethtool -K eth2 hsr-tag-rm-offload on 2) Create HSR interface and add slave interfaces to it ip link add name hsr0 type hsr slave1 eth1 slave2 eth2 \ supervision 45 version 1 3) Add IP address to the HSR interface ip addr add <IP_ADDR>/24 dev hsr0 4) Bring up the HSR interface ip link set hsr0 up Switching back to previous mode: -------------------------------- 1) Delete HSR interface ip link delete hsr0 2) Disable HSR port-to-port offloading mode, packet duplication ethtool -K eth1 hsr-fwd-offload off ethtool -K eth1 hsr-dup-offload off ethtool -K eth1 hsr-tag-ins-offload off ethtool -K eth1 hsr-tag-rm-offload off ethtool -K eth2 hsr-fwd-offload off ethtool -K eth2 hsr-dup-offload off ethtool -K eth2 hsr-tag-ins-offload off ethtool -K eth2 hsr-tag-rm-offload off Testing the port-to-port frame forward offload feature: ----------------------------------------------------- 1) Connect the LAN cables as shown in the test setup. 2) Configure Station A and Station C in HSR non-offload mode. 3) Configure Station B is HSR offload mode. 4) Since HSR is a redundancy protocol, disconnect cable "Link CA", to ensure frames from Station A reach Station C only through Station B. 5) Run iperf3 Server on Station C and client on station A. 7) Check the CPU usage on Station B. CPU usage report on Station B using mpstat when running UDP iperf3: ------------------------------------------------------------------- 1) Non-Offload case ------------------- CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle all 0.00 0.00 0.50 0.00 3.52 29.15 0.00 0.00 66.83 0 0.00 0.00 0.00 0.00 7.00 58.00 0.00 0.00 35.00 1 0.00 0.00 0.99 0.00 0.99 0.00 0.00 0.00 98.02 2) Offload case --------------- CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle all 0.00 0.00 0.00 0.00 0.50 0.00 0.00 0.00 99.50 0 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 99.01 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 Note: 1) At the very least, hsr-fwd-offload must be enabled. Without offloading the port-to-port offload, other HSR offloads cannot be enabled. 2) hsr-tag-ins-offload and hsr-dup-offload are tightly coupled in the firmware implementation. They both need to be enabled / disabled together. v1: https://lore.kernel.org/20240808110800.1281716-1-danishanwar@ti.com/ v2: https://lore.kernel.org/20240813074233.2473876-1-danishanwar@ti.com v3: https://lore.kernel.org/20240828091901.3120935-1-danishanwar@ti.com/ v4: https://lore.kernel.org/20240904100506.3665892-1-danishanwar@ti.com/ v5: https://lore.kernel.org/20240906111538.1259418-1-danishanwar@ti.com/ [0] https://lore.kernel.org/202409061658.vSwcFJiK-lkp@intel.com/ [1] https://lore.kernel.org/20240828091901.3120935-5-danishanwar@ti.com/ [2] https://lore.kernel.org/20240828091901.3120935-7-danishanwar@ti.com/ [3] https://lore.kernel.org/20240813074233.2473876-2-danishanwar@ti.com/ [4] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=e846be0fba85 ==================== Link: https://patch.msgid.link/20240911081603.2521729-1-danishanwar@ti.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
MD Danish Anwar authored
Add support for multicast filtering in HSR mode Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Link: https://patch.msgid.link/20240911081603.2521729-6-danishanwar@ti.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ravi Gunasekaran authored
The HSR stack allows to offload its Tx packet duplication functionality to the hardware. Enable this offloading feature for ICSSG driver. Add support to offload HSR Tx Tag Insertion and Rx Tag Removal and duplicate discard. hsr tag insertion offload and hsr dup offload are tightly coupled in firmware implementation. Both these features need to be enabled / disabled together. Duplicate discard is done as part of RX tag removal and it is done by the firmware. When driver sends the r30 command ICSSG_EMAC_HSR_RX_OFFLOAD_ENABLE, firmware does RX tag removal as well as duplicate discard. Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com> Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Link: https://patch.msgid.link/20240911081603.2521729-5-danishanwar@ti.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
MD Danish Anwar authored
Add support for offloading HSR port-to-port frame forward to hardware. When the slave interfaces are added to the HSR interface, the PRU cores will be stopped and ICSSG HSR firmwares will be loaded to them. Similarly, when HSR interface is deleted, the PRU cores will be restarted and the last used firmwares will be reloaded. PRUeth interfaces will be back to the last used mode. This commit also renames some APIs that are common between switch and hsr mode with '_fw_offload' suffix. Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Link: https://patch.msgid.link/20240911081603.2521729-4-danishanwar@ti.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
MD Danish Anwar authored
The def_inc is stored in icss_iep structure. Currently default increment (ns per clock tick) is hardcoded to 4 (Clock frequency being 250 MHz). Change this to use the iep->def_inc variable as the iep structure is now accessible to the driver files. Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Link: https://patch.msgid.link/20240911081603.2521729-3-danishanwar@ti.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
MD Danish Anwar authored
Move icss_iep structure definition and to icss_iep.h file so that the structure members can be used / accessed by all icssg driver files. Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Link: https://patch.msgid.link/20240911081603.2521729-2-danishanwar@ti.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrlLinus Torvalds authored
Pull pin control fixes from Linus Walleij: - One Intel patch that I mistakenly merged into for-next despite it belonging in fixes: add Arrow Lake-H/U ACPI ID so this Arrow Lake chip probes. - One fix making the CY895x0 reg cache work, which is good because it makes the device work too. * tag 'pinctrl-v6.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: pinctrl-cy8c95x0: Fix regcache pinctrl: meteorlake: Add Arrow Lake-H/U ACPI ID
-
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/soundLinus Torvalds authored
Pull sound fixes from Takashi Iwai: "A few last-minute ASoC fixes and MAINTAINERS update. All look small, obvious and nice-to-have fixes for 6.11-final" * tag 'sound-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ASoC: meson: axg-card: fix 'use-after-free' ASoC: codecs: avoid possible garbage value in peb2466_reg_read() MAINTAINERS: update Pierre Bossart's email and role ASoC: tas2781: fix to save the dsp bin file name into the correct array in case name_prefix is not NULL ASoC: Intel: soc-acpi-intel-mtl-match: add missing empty item ASoC: Intel: soc-acpi-intel-lnl-match: add missing empty item
-
git://git.samba.org/sfrench/cifs-2.6Linus Torvalds authored
Pull smb client fix from Steve French: "Fix for packet signing of write" * tag '6.11-rc7-SMB3-client-fix' of git://git.samba.org/sfrench/cifs-2.6: cifs: Fix signature miscalculation
-
Takashi Iwai authored
Merge tag 'asoc-fix-v6.11-rc7' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.11 A few last minute fixes, plus an update for Pierre's contact details and status. It'd be good to get these into v6.11 (especially the MAINTAINERS update) but it wouldn't be the end of the world if they waited for the merge window, none of them are super remarkable and it's just a question of timing that they're last minute.
-
Jakub Kicinski authored
Rosen Penev says: ==================== net: ibm: emac: modernize a bit ==================== Link: https://patch.msgid.link/20240912024903.6201-1-rosenp@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rosen Penev authored
This is completely unused. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240912024903.6201-10-rosenp@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rosen Penev authored
EPROBE_DEFER, which probably wasn't available when this driver was written, can be used instead of waiting manually. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20240912024903.6201-9-rosenp@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rosen Penev authored
of_property_read_u32 can be used. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240912024903.6201-8-rosenp@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rosen Penev authored
Avoids having to use own struct member. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240912024903.6201-7-rosenp@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rosen Penev authored
Cleans it up automatically. No need to handle manually. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240912024903.6201-6-rosenp@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rosen Penev authored
Switching to devm management of mii_bus allows to remove mdiobus_unregister calls and thus avoids needing a mii_bus global struct member. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240912024903.6201-5-rosenp@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-