- 03 May, 2023 4 commits
-
-
Pablo Neira Ayuso authored
Toggle deleted anonymous sets as inactive in the next generation, so users cannot perform any update on it. Clear the generation bitmask in case the transaction is aborted. The following KASAN splat shows a set element deletion for a bound anonymous set that has been already removed in the same transaction. [ 64.921510] ================================================================== [ 64.923123] BUG: KASAN: wild-memory-access in nf_tables_commit+0xa24/0x1490 [nf_tables] [ 64.924745] Write of size 8 at addr dead000000000122 by task test/890 [ 64.927903] CPU: 3 PID: 890 Comm: test Not tainted 6.3.0+ #253 [ 64.931120] Call Trace: [ 64.932699] <TASK> [ 64.934292] dump_stack_lvl+0x33/0x50 [ 64.935908] ? nf_tables_commit+0xa24/0x1490 [nf_tables] [ 64.937551] kasan_report+0xda/0x120 [ 64.939186] ? nf_tables_commit+0xa24/0x1490 [nf_tables] [ 64.940814] nf_tables_commit+0xa24/0x1490 [nf_tables] [ 64.942452] ? __kasan_slab_alloc+0x2d/0x60 [ 64.944070] ? nf_tables_setelem_notify+0x190/0x190 [nf_tables] [ 64.945710] ? kasan_set_track+0x21/0x30 [ 64.947323] nfnetlink_rcv_batch+0x709/0xd90 [nfnetlink] [ 64.948898] ? nfnetlink_rcv_msg+0x480/0x480 [nfnetlink] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jeremy Sowden authored
1. Don't hard-code pkg-config 2. Remove distro-specific default for CFLAGS 3. Use pkg-config for LDLIBS Fixes: a50a88f0 ("selftests: netfilter: fix a build error on openSUSE") Suggested-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Pablo Neira Ayuso authored
If user does not specify hook number and priority, then assume this is a chain/flowtable update. Therefore, report ENOENT which provides a better hint than EINVAL. Set on extended netlink error report to refer to the chain name. Fixes: 5b6743fb ("netfilter: nf_tables: skip flowtable hooknum and priority on device updates") Fixes: 5efe72698a97 ("netfilter: nf_tables: support for adding new devices to an existing netdev chain") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Felix Fietkau authored
Through testing I found out that hardware vlan rx offload support seems to have some hardware issues. At least when using multiple MACs and when receiving tagged packets on the secondary MAC, the hardware can sometimes start to emit wrong tags on the first MAC as well. In order to avoid such issues, drop the feature configuration and use the offload feature only for DSA hardware untagging on MT7621/MT7622 devices where this feature works properly. Fixes: 08666cbb ("net: ethernet: mtk_eth_soc: add support for configuring vlan rx offload") Tested-by: Frank Wunderlich <frank-w@public-files.de> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com> Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com> Link: https://lore.kernel.org/r/20230426172153.8352-1-linux@fw-web.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 01 May, 2023 11 commits
-
-
David S. Miller authored
David Howells says: ==================== rxrpc: Timeout handling fixes Here are three patches to fix timeouts handling in AF_RXRPC: (1) The hard call timeout should be interpreted in seconds, not milliseconds. (2) Allow a waiting call to be aborted (thereby cancelling the call) in the case a signal interrupts sendmsg() and leaves it hanging until it is granted a channel on a connection. (3) Kernel-generated calls get the timer started on them even if they're still waiting to be attached to a connection. If the timer expires before the wait is complete and a conn is attached, an oops will occur. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Howells authored
afs_make_call() calls rxrpc_kernel_begin_call() to begin a call (which may get stalled in the background waiting for a connection to become available); it then calls rxrpc_kernel_set_max_life() to set the timeouts - but that starts the call timer so the call timer might then expire before we get a connection assigned - leading to the following oops if the call stalled: BUG: kernel NULL pointer dereference, address: 0000000000000000 ... CPU: 1 PID: 5111 Comm: krxrpcio/0 Not tainted 6.3.0-rc7-build3+ #701 RIP: 0010:rxrpc_alloc_txbuf+0xc0/0x157 ... Call Trace: <TASK> rxrpc_send_ACK+0x50/0x13b rxrpc_input_call_event+0x16a/0x67d rxrpc_io_thread+0x1b6/0x45f ? _raw_spin_unlock_irqrestore+0x1f/0x35 ? rxrpc_input_packet+0x519/0x519 kthread+0xe7/0xef ? kthread_complete_and_exit+0x1b/0x1b ret_from_fork+0x22/0x30 Fix this by noting the timeouts in struct rxrpc_call when the call is created. The timer will be started when the first packet is transmitted. It shouldn't be possible to trigger this directly from userspace through AF_RXRPC as sendmsg() will return EBUSY if the call is in the waiting-for-conn state if it dropped out of the wait due to a signal. Fixes: 9d35d880 ("rxrpc: Move client call connection to the I/O thread") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Howells authored
When sendmsg() creates an rxrpc call, it queues it to wait for a connection and channel to be assigned and then waits before it can start shovelling data as the encrypted DATA packet content includes a summary of the connection parameters. However, sendmsg() may get interrupted before a connection gets assigned and further sendmsg() calls will fail with EBUSY until an assignment is made. Fix this so that the call can at least be aborted without failing on EBUSY. We have to be careful here as sendmsg() mustn't be allowed to start the call timer if the call doesn't yet have a connection assigned as an oops may follow shortly thereafter. Fixes: 540b1c48 ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Howells authored
The hard call timeout is specified in the RXRPC_SET_CALL_TIMEOUT cmsg in seconds, so fix the point at which sendmsg() applies it to the call to convert to jiffies from seconds, not milliseconds. Fixes: a158bdd3 ("rxrpc: Fix timeout of a call that hasn't yet been granted a channel") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Rix authored
For s390, gcc with W=1 reports drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c:458:32: error: 'aq_pm_ops' defined but not used [-Werror=unused-const-variable=] 458 | static const struct dev_pm_ops aq_pm_ops = { | ^~~~~~~~~ The only use of aq_pm_ops is conditional on CONFIG_PM. The definition of aq_pm_ops and its functions should also be conditional on CONFIG_PM. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andy Moreton authored
The sfc driver does not report QSFP module EEPROM contents correctly as only the first page is fetched from hardware. Commit 0e1a2a3e ("ethtool: Add SFF-8436 and SFF-8636 max EEPROM length definitions") added ETH_MODULE_SFF_8436_MAX_LEN for the overall size of the EEPROM info, so use that to report the full EEPROM contents. Fixes: 9b17010d ("sfc: Add ethtool -m support for QSFP modules") Signed-off-by: Andy Moreton <andy.moreton@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Hayes Wang says: ==================== r8152: fix 2.5G devices v3: For patch #2, modify the comment. v2: For patch #1, Remove inline for fc_pause_on_auto() and fc_pause_off_auto(), and update the commit message. For patch #2, define the magic value for OCP register 0xa424. v1: These patches are used to fix some issues of RTL8156. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hayes Wang authored
Move setting r8153b_rx_agg_chg_indicate() for 2.5G devices. The r8153b_rx_agg_chg_indicate() has to be called after enabling tx/rx. Otherwise, the coalescing settings are useless. Fixes: 195aae32 ("r8152: support new chips") Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hayes Wang authored
Fix the poor throughput for 2.5G devices, when changing the speed from auto mode to force mode. This patch is used to notify the MAC when the mode is changed. Fixes: 195aae32 ("r8152: support new chips") Signed-off-by: Hayes Wang <hayeswang@realtek.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hayes Wang authored
The feature of flow control becomes abnormal, if the device sends a pause frame and the tx/rx is disabled before sending a release frame. It causes the lost of packets. Set PLA_RX_FIFO_FULL and PLA_RX_FIFO_EMPTY to zeros before disabling the tx/rx. And, toggle FC_PATCH_TASK before enabling tx/rx to reset the flow control patch and timer. Then, the hardware could clear the state and the flow control becomes normal after enabling tx/rx. Besides, remove inline for fc_pause_on_auto() and fc_pause_off_auto(). Fixes: 195aae32 ("r8152: support new chips") Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Victor Nogueira authored
There are cases where the device is adminstratively UP, but operationally down. For example, we have a physical device (Nvidia ConnectX-6 Dx, 25Gbps) who's cable was pulled out, here is its ip link output: 5: ens2f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000 link/ether b8:ce:f6:4b:68:35 brd ff:ff:ff:ff:ff:ff altname enp179s0f1np1 As you can see, it's administratively UP but operationally down. In this case, sending a packet to this port caused a nasty kernel hang (so nasty that we were unable to capture it). Aborting a transmit based on operational status (in addition to administrative status) fixes the issue. Fixes: 1da177e4 ("Linux-2.6.12-rc2") Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> v1->v2: Add fixes tag v2->v3: Remove blank line between tags + add change log, suggested by Leon Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 28 Apr, 2023 7 commits
-
-
Angelo Dureghello authored
Add rsvd2cpu capability for mv88e6321 model, to allow proper bpdu processing. Signed-off-by: Angelo Dureghello <angelo.dureghello@timesys.com> Fixes: 51c901a7 ("net: dsa: mv88e6xxx: distinguish Global 2 Rsvd2CPU") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Antoine Tenart authored
The skb hash comes from sk->sk_txhash when using TCP, except for some IPv6 RST packets. This is because in tcp_v6_send_reset when not in TIME_WAIT the hash is taken from sk->sk_hash, while it should come from sk->sk_txhash as those two hashes are not computed the same way. Packetdrill script to test the above, 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) +0 > (flowlabel 0x1) S 0:0(0) <...> // Wrong ack seq, trigger a rst. +0 < S. 0:0(0) ack 0 win 4000 // Check the flowlabel matches prior one from SYN. +0 > (flowlabel 0x1) R 0:0(0) <...> Fixes: 9258b8b1 ("ipv6: tcp: send consistent autoflowlabel in RST packets") Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrea Mayer authored
On some distributions, the rp_filter is automatically set (=1) by default on a netdev basis (also on VRFs). In an SRv6 End.DT46 behavior, decapsulated IPv4 packets are routed using the table associated with the VRF bound to that tunnel. During lookup operations, the rp_filter can lead to packet loss when activated on the VRF. Therefore, we chose to make this selftest more robust by explicitly disabling the rp_filter during tests (as it is automatically set by some Linux distributions). Fixes: 03a0b567 ("selftests: seg6: add selftest for SRv6 End.DT46 Behavior") Reported-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Tested-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Cong Wang authored
When a tunnel device is bound with the underlying device, its dev->needed_headroom needs to be updated properly. IPv4 tunnels already do the same in ip_tunnel_bind_dev(). Otherwise we may not have enough header room for skb, especially after commit b17f709a ("gue: TX support for using remote checksum offload option"). Fixes: 32b8a8e5 ("sit: add IPv4 over IPv4 support") Reported-by: Palash Oswal <oswalpalash@gmail.com> Link: https://lore.kernel.org/netdev/CAGyP=7fDcSPKu6nttbGwt7RXzE3uyYxLjCSE97J64pRxJP8jPA@mail.gmail.com/ Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Buslov authored
Error handler of tcf_block_bind() frees the whole bo->cb_list on error. However, by that time the flow_block_cb instances are already in the driver list because driver ndo_setup_tc() callback is called before that up the call chain in tcf_block_offload_cmd(). This leaves dangling pointers to freed objects in the list and causes use-after-free[0]. Fix it by also removing flow_block_cb instances from driver_list before deallocating them. [0]: [ 279.868433] ================================================================== [ 279.869964] BUG: KASAN: slab-use-after-free in flow_block_cb_setup_simple+0x631/0x7c0 [ 279.871527] Read of size 8 at addr ffff888147e2bf20 by task tc/2963 [ 279.873151] CPU: 6 PID: 2963 Comm: tc Not tainted 6.3.0-rc6+ #4 [ 279.874273] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 279.876295] Call Trace: [ 279.876882] <TASK> [ 279.877413] dump_stack_lvl+0x33/0x50 [ 279.878198] print_report+0xc2/0x610 [ 279.878987] ? flow_block_cb_setup_simple+0x631/0x7c0 [ 279.879994] kasan_report+0xae/0xe0 [ 279.880750] ? flow_block_cb_setup_simple+0x631/0x7c0 [ 279.881744] ? mlx5e_tc_reoffload_flows_work+0x240/0x240 [mlx5_core] [ 279.883047] flow_block_cb_setup_simple+0x631/0x7c0 [ 279.884027] tcf_block_offload_cmd.isra.0+0x189/0x2d0 [ 279.885037] ? tcf_block_setup+0x6b0/0x6b0 [ 279.885901] ? mutex_lock+0x7d/0xd0 [ 279.886669] ? __mutex_unlock_slowpath.constprop.0+0x2d0/0x2d0 [ 279.887844] ? ingress_init+0x1c0/0x1c0 [sch_ingress] [ 279.888846] tcf_block_get_ext+0x61c/0x1200 [ 279.889711] ingress_init+0x112/0x1c0 [sch_ingress] [ 279.890682] ? clsact_init+0x2b0/0x2b0 [sch_ingress] [ 279.891701] qdisc_create+0x401/0xea0 [ 279.892485] ? qdisc_tree_reduce_backlog+0x470/0x470 [ 279.893473] tc_modify_qdisc+0x6f7/0x16d0 [ 279.894344] ? tc_get_qdisc+0xac0/0xac0 [ 279.895213] ? mutex_lock+0x7d/0xd0 [ 279.896005] ? __mutex_lock_slowpath+0x10/0x10 [ 279.896910] rtnetlink_rcv_msg+0x5fe/0x9d0 [ 279.897770] ? rtnl_calcit.isra.0+0x2b0/0x2b0 [ 279.898672] ? __sys_sendmsg+0xb5/0x140 [ 279.899494] ? do_syscall_64+0x3d/0x90 [ 279.900302] ? entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 279.901337] ? kasan_save_stack+0x2e/0x40 [ 279.902177] ? kasan_save_stack+0x1e/0x40 [ 279.903058] ? kasan_set_track+0x21/0x30 [ 279.903913] ? kasan_save_free_info+0x2a/0x40 [ 279.904836] ? ____kasan_slab_free+0x11a/0x1b0 [ 279.905741] ? kmem_cache_free+0x179/0x400 [ 279.906599] netlink_rcv_skb+0x12c/0x360 [ 279.907450] ? rtnl_calcit.isra.0+0x2b0/0x2b0 [ 279.908360] ? netlink_ack+0x1550/0x1550 [ 279.909192] ? rhashtable_walk_peek+0x170/0x170 [ 279.910135] ? kmem_cache_alloc_node+0x1af/0x390 [ 279.911086] ? _copy_from_iter+0x3d6/0xc70 [ 279.912031] netlink_unicast+0x553/0x790 [ 279.912864] ? netlink_attachskb+0x6a0/0x6a0 [ 279.913763] ? netlink_recvmsg+0x416/0xb50 [ 279.914627] netlink_sendmsg+0x7a1/0xcb0 [ 279.915473] ? netlink_unicast+0x790/0x790 [ 279.916334] ? iovec_from_user.part.0+0x4d/0x220 [ 279.917293] ? netlink_unicast+0x790/0x790 [ 279.918159] sock_sendmsg+0xc5/0x190 [ 279.918938] ____sys_sendmsg+0x535/0x6b0 [ 279.919813] ? import_iovec+0x7/0x10 [ 279.920601] ? kernel_sendmsg+0x30/0x30 [ 279.921423] ? __copy_msghdr+0x3c0/0x3c0 [ 279.922254] ? import_iovec+0x7/0x10 [ 279.923041] ___sys_sendmsg+0xeb/0x170 [ 279.923854] ? copy_msghdr_from_user+0x110/0x110 [ 279.924797] ? ___sys_recvmsg+0xd9/0x130 [ 279.925630] ? __perf_event_task_sched_in+0x183/0x470 [ 279.926656] ? ___sys_sendmsg+0x170/0x170 [ 279.927529] ? ctx_sched_in+0x530/0x530 [ 279.928369] ? update_curr+0x283/0x4f0 [ 279.929185] ? perf_event_update_userpage+0x570/0x570 [ 279.930201] ? __fget_light+0x57/0x520 [ 279.931023] ? __switch_to+0x53d/0xe70 [ 279.931846] ? sockfd_lookup_light+0x1a/0x140 [ 279.932761] __sys_sendmsg+0xb5/0x140 [ 279.933560] ? __sys_sendmsg_sock+0x20/0x20 [ 279.934436] ? fpregs_assert_state_consistent+0x1d/0xa0 [ 279.935490] do_syscall_64+0x3d/0x90 [ 279.936300] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 279.937311] RIP: 0033:0x7f21c814f887 [ 279.938085] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 279.941448] RSP: 002b:00007fff11efd478 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 279.942964] RAX: ffffffffffffffda RBX: 0000000064401979 RCX: 00007f21c814f887 [ 279.944337] RDX: 0000000000000000 RSI: 00007fff11efd4e0 RDI: 0000000000000003 [ 279.945660] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 [ 279.947003] R10: 00007f21c8008708 R11: 0000000000000246 R12: 0000000000000001 [ 279.948345] R13: 0000000000409980 R14: 000000000047e538 R15: 0000000000485400 [ 279.949690] </TASK> [ 279.950706] Allocated by task 2960: [ 279.951471] kasan_save_stack+0x1e/0x40 [ 279.952338] kasan_set_track+0x21/0x30 [ 279.953165] __kasan_kmalloc+0x77/0x90 [ 279.954006] flow_block_cb_setup_simple+0x3dd/0x7c0 [ 279.955001] tcf_block_offload_cmd.isra.0+0x189/0x2d0 [ 279.956020] tcf_block_get_ext+0x61c/0x1200 [ 279.956881] ingress_init+0x112/0x1c0 [sch_ingress] [ 279.957873] qdisc_create+0x401/0xea0 [ 279.958656] tc_modify_qdisc+0x6f7/0x16d0 [ 279.959506] rtnetlink_rcv_msg+0x5fe/0x9d0 [ 279.960392] netlink_rcv_skb+0x12c/0x360 [ 279.961216] netlink_unicast+0x553/0x790 [ 279.962044] netlink_sendmsg+0x7a1/0xcb0 [ 279.962906] sock_sendmsg+0xc5/0x190 [ 279.963702] ____sys_sendmsg+0x535/0x6b0 [ 279.964534] ___sys_sendmsg+0xeb/0x170 [ 279.965343] __sys_sendmsg+0xb5/0x140 [ 279.966132] do_syscall_64+0x3d/0x90 [ 279.966908] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 279.968407] Freed by task 2960: [ 279.969114] kasan_save_stack+0x1e/0x40 [ 279.969929] kasan_set_track+0x21/0x30 [ 279.970729] kasan_save_free_info+0x2a/0x40 [ 279.971603] ____kasan_slab_free+0x11a/0x1b0 [ 279.972483] __kmem_cache_free+0x14d/0x280 [ 279.973337] tcf_block_setup+0x29d/0x6b0 [ 279.974173] tcf_block_offload_cmd.isra.0+0x226/0x2d0 [ 279.975186] tcf_block_get_ext+0x61c/0x1200 [ 279.976080] ingress_init+0x112/0x1c0 [sch_ingress] [ 279.977065] qdisc_create+0x401/0xea0 [ 279.977857] tc_modify_qdisc+0x6f7/0x16d0 [ 279.978695] rtnetlink_rcv_msg+0x5fe/0x9d0 [ 279.979562] netlink_rcv_skb+0x12c/0x360 [ 279.980388] netlink_unicast+0x553/0x790 [ 279.981214] netlink_sendmsg+0x7a1/0xcb0 [ 279.982043] sock_sendmsg+0xc5/0x190 [ 279.982827] ____sys_sendmsg+0x535/0x6b0 [ 279.983703] ___sys_sendmsg+0xeb/0x170 [ 279.984510] __sys_sendmsg+0xb5/0x140 [ 279.985298] do_syscall_64+0x3d/0x90 [ 279.986076] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 279.987532] The buggy address belongs to the object at ffff888147e2bf00 which belongs to the cache kmalloc-192 of size 192 [ 279.989747] The buggy address is located 32 bytes inside of freed 192-byte region [ffff888147e2bf00, ffff888147e2bfc0) [ 279.992367] The buggy address belongs to the physical page: [ 279.993430] page:00000000550f405c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147e2a [ 279.995182] head:00000000550f405c order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 279.996713] anon flags: 0x200000000010200(slab|head|node=0|zone=2) [ 279.997878] raw: 0200000000010200 ffff888100042a00 0000000000000000 dead000000000001 [ 279.999384] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 [ 280.000894] page dumped because: kasan: bad access detected [ 280.002386] Memory state around the buggy address: [ 280.003338] ffff888147e2be00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 280.004781] ffff888147e2be80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 280.006224] >ffff888147e2bf00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 280.007700] ^ [ 280.008592] ffff888147e2bf80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 280.010035] ffff888147e2c000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 280.011564] ================================================================== Fixes: 59094b1e ("net: sched: use flow block API") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
David Ahern reported crashes in skb_copy_ubufs() caused by TCP tx zerocopy using hugepages, and skb length bigger than ~68 KB. skb_copy_ubufs() assumed it could copy all payload using up to MAX_SKB_FRAGS order-0 pages. This assumption broke when BIG TCP was able to put up to 512 KB per skb. We did not hit this bug at Google because we use CONFIG_MAX_SKB_FRAGS=45 and limit gso_max_size to 180000. A solution is to use higher order pages if needed. v2: add missing __GFP_COMP, or we leak memory. Fixes: 7c4e983c ("net: allow gso_max_size to exceed 65536") Reported-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/netdev/c70000f6-baa4-4a05-46d0-4b3e0dc1ccc8@gmail.com/T/Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Xin Long <lucien.xin@gmail.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Coco Li <lixiaoyan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Cosmo Chou authored
ncsi_channel_is_tx() determines whether a given channel should be used for Tx or not. However, when reconfiguring the channel by handling a Configuration Required AEN, there is a misjudgment that the channel Tx has already been enabled, which results in the Enable Channel Network Tx command not being sent. Clear the channel Tx enable flag before reconfiguring the channel to avoid the misjudgment. Fixes: 8d951a75 ("net/ncsi: Configure multi-package, multi-channel modes with failover") Signed-off-by: Cosmo Chou <chou.cosmo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 27 Apr, 2023 14 commits
-
-
Paolo Abeni authored
Geetha sowjanya says: ==================== Macsec fixes for CN10KB This patch set has fixes for the issues encountered while testing macsec on CN10KB silicon. Below is the description of patches: Patch 1: For each LMAC two MCSX_MCS_TOP_SLAVE_CHANNEL_CFG registers exist in CN10KB. Bypass has to be disabled in two registers. Patch 2: Add workaround for errata w.r.t accessing TCAM DATA and MASK registers. Patch 3: Fixes the parser configuration to allow PTP traffic. Patch 4: Addresses the IP vector and block level interrupt mask changes. Patch 5: Fix NULL pointer crashes when rebooting Patch 6: Since MCS is global block shared by all LMACS the TCAM match must include macsec DMAC also to distinguish each macsec interface Patch 7: Before freeing MCS hardware resource to AF clear the stats also. Patch 8: Stats which share single counter in hardware are tracked in software. This tracking was based on wrong secy mode params. Use correct secy mode params Patch 9: When updating secy mode params, PN number was also reset to initial values. Hence do not write to PN value register when updating secy. ==================== Link: https://lore.kernel.org/r/20230426062528.20575-1-gakula@marvell.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Subbaraya Sundeep authored
After creating SecYs, SCs and SAs a SecY can be modified to change attributes like validation mode, protect frames mode etc. During this SecY update, packet number is reset to initial user given value by mistake. Hence do not reset PN when updating SecY parameters. Fixes: c54ffc73 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Subbaraya Sundeep authored
Macsec stats like InPktsLate and InPktsDelayed share same counter in hardware. If SecY replay_protect is true then counter represents InPktsLate otherwise InPktsDelayed. This mode change was tracked based on protect_frames instead of replay_protect mistakenly. Similarly InPktsUnchecked and InPktsOk share same counter and mode change was tracked based on validate_check instead of validate_disabled. This patch fixes those problems. Fixes: c54ffc73 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Subbaraya Sundeep authored
When freeing MCS hardware resources like SecY, SC and SA the corresponding stats needs to be cleared. Otherwise previous stats are shown in newly created macsec interfaces. Fixes: c54ffc73 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Subbaraya Sundeep authored
On CN10KB silicon a single hardware macsec block is present and offloads macsec operations for all the ethernet LMACs. TCAM match with macsec ethertype 0x88e5 alone at RX side is not sufficient to distinguish all the macsec interfaces created on top of netdevs. Hence append the DMAC of the macsec interface too. Otherwise the first created macsec interface only receives all the macsec traffic. Fixes: c54ffc73 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Subbaraya Sundeep authored
When system is rebooted after creating macsec interface below NULL pointer dereference crashes occurred. This patch fixes those crashes by using correct order of teardown [ 3324.406942] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 3324.415726] Mem abort info: [ 3324.418510] ESR = 0x96000006 [ 3324.421557] EC = 0x25: DABT (current EL), IL = 32 bits [ 3324.426865] SET = 0, FnV = 0 [ 3324.429913] EA = 0, S1PTW = 0 [ 3324.433047] Data abort info: [ 3324.435921] ISV = 0, ISS = 0x00000006 [ 3324.439748] CM = 0, WnR = 0 .... [ 3324.575915] Call trace: [ 3324.578353] cn10k_mdo_del_secy+0x24/0x180 [ 3324.582440] macsec_common_dellink+0xec/0x120 [ 3324.586788] macsec_notify+0x17c/0x1c0 [ 3324.590529] raw_notifier_call_chain+0x50/0x70 [ 3324.594965] call_netdevice_notifiers_info+0x34/0x7c [ 3324.599921] rollback_registered_many+0x354/0x5bc [ 3324.604616] unregister_netdevice_queue+0x88/0x10c [ 3324.609399] unregister_netdev+0x20/0x30 [ 3324.613313] otx2_remove+0x8c/0x310 [ 3324.616794] pci_device_shutdown+0x30/0x70 [ 3324.620882] device_shutdown+0x11c/0x204 [ 966.664930] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 966.673712] Mem abort info: [ 966.676497] ESR = 0x96000006 [ 966.679543] EC = 0x25: DABT (current EL), IL = 32 bits [ 966.684848] SET = 0, FnV = 0 [ 966.687895] EA = 0, S1PTW = 0 [ 966.691028] Data abort info: [ 966.693900] ISV = 0, ISS = 0x00000006 [ 966.697729] CM = 0, WnR = 0 [ 966.833467] Call trace: [ 966.835904] cn10k_mdo_stop+0x20/0xa0 [ 966.839557] macsec_dev_stop+0xe8/0x11c [ 966.843384] __dev_close_many+0xbc/0x140 [ 966.847298] dev_close_many+0x84/0x120 [ 966.851039] rollback_registered_many+0x114/0x5bc [ 966.855735] unregister_netdevice_many.part.0+0x14/0xa0 [ 966.860952] unregister_netdevice_many+0x18/0x24 [ 966.865560] macsec_notify+0x1ac/0x1c0 [ 966.869303] raw_notifier_call_chain+0x50/0x70 [ 966.873738] call_netdevice_notifiers_info+0x34/0x7c [ 966.878694] rollback_registered_many+0x354/0x5bc [ 966.883390] unregister_netdevice_queue+0x88/0x10c [ 966.888173] unregister_netdev+0x20/0x30 [ 966.892090] otx2_remove+0x8c/0x310 [ 966.895571] pci_device_shutdown+0x30/0x70 [ 966.899660] device_shutdown+0x11c/0x204 [ 966.903574] __do_sys_reboot+0x208/0x290 [ 966.907487] __arm64_sys_reboot+0x20/0x30 [ 966.911489] el0_svc_handler+0x80/0x1c0 [ 966.915316] el0_svc+0x8/0x180 [ 966.918362] Code: f9400000 f9400a64 91220014 f94b3403 (f9400060) [ 966.924448] ---[ end trace 341778e799c3d8d7 ]--- Fixes: c54ffc73 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Geetha sowjanya authored
On CN10KB, MCS IP vector number, BBE and PAB interrupt mask got changed to support more block level interrupts. To address this changes, this patch fixes the bbe and pab interrupt handlers. Fixes: 6c635f78 ("octeontx2-af: cn10k: mcs: Handle MCS block interrupts") Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Geetha sowjanya authored
When ptp timestamp is enabled in RPM, RPM will append 8B timestamp header for all RX traffic. MCS need to skip these 8 bytes header while parsing the packet header, so that correct tcam key is created for lookup. This patch fixes the mcs parser configuration to skip this 8B header for ptp packets. Fixes: ca7f49ff ("octeontx2-af: cn10k: Introduce driver for macsec block.") Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Subbaraya Sundeep authored
As per hardware errata on CN10KB, all the four TCAM_DATA and TCAM_MASK registers has to be written at once otherwise write to individual registers will fail. Hence write to all TCAM_DATA registers and then to all TCAM_MASK registers. Fixes: cfc14181 ("octeontx2-af: cn10k: mcs: Manage the MCS block hardware resources") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Geetha sowjanya authored
For each lmac port, MCS has two MCS_TOP_SLAVE_CHANNEL_CONFIGX registers. For CN10KB both register need to be configured for the port level mcs bypass to work. This patch also sets bitmap of flowid/secy entry reserved for default bypass so that these entries can be shown in debugfs. Fixes: bd69476e ("octeontx2-af: cn10k: mcs: Install a default TCAM for normal traffic") Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
John Hickey authored
Commit 4fe81585 ("ixgbe: let the xdpdrv work with more than 64 cpus") adds support to allow XDP programs to run on systems with more than 64 CPUs by locking the XDP TX rings and indexing them using cpu % 64 (IXGBE_MAX_XDP_QS). Upon trying this out patch on a system with more than 64 cores, the kernel paniced with an array-index-out-of-bounds at the return in ixgbe_determine_xdp_ring in ixgbe.h, which means ixgbe_determine_xdp_q_idx was just returning the cpu instead of cpu % IXGBE_MAX_XDP_QS. An example splat: ========================================================================== UBSAN: array-index-out-of-bounds in /var/lib/dkms/ixgbe/5.18.6+focal-1/build/src/ixgbe.h:1147:26 index 65 is out of range for type 'ixgbe_ring *[64]' ========================================================================== BUG: kernel NULL pointer dereference, address: 0000000000000058 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 65 PID: 408 Comm: ksoftirqd/65 Tainted: G IOE 5.15.0-48-generic #54~20.04.1-Ubuntu Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 2.5.4 01/13/2020 RIP: 0010:ixgbe_xmit_xdp_ring+0x1b/0x1c0 [ixgbe] Code: 3b 52 d4 cf e9 42 f2 ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00 55 b9 00 00 00 00 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 08 <44> 0f b7 47 58 0f b7 47 5a 0f b7 57 54 44 0f b7 76 08 66 41 39 c0 RSP: 0018:ffffbc3fcd88fcb0 EFLAGS: 00010282 RAX: ffff92a253260980 RBX: ffffbc3fe68b00a0 RCX: 0000000000000000 RDX: ffff928b5f659000 RSI: ffff928b5f659000 RDI: 0000000000000000 RBP: ffffbc3fcd88fce0 R08: ffff92b9dfc20580 R09: 0000000000000001 R10: 3d3d3d3d3d3d3d3d R11: 3d3d3d3d3d3d3d3d R12: 0000000000000000 R13: ffff928b2f0fa8c0 R14: ffff928b9be20050 R15: 000000000000003c FS: 0000000000000000(0000) GS:ffff92b9dfc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000058 CR3: 000000011dd6a002 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ixgbe_poll+0x103e/0x1280 [ixgbe] ? sched_clock_cpu+0x12/0xe0 __napi_poll+0x30/0x160 net_rx_action+0x11c/0x270 __do_softirq+0xda/0x2ee run_ksoftirqd+0x2f/0x50 smpboot_thread_fn+0xb7/0x150 ? sort_range+0x30/0x30 kthread+0x127/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x1f/0x30 </TASK> I think this is how it happens: Upon loading the first XDP program on a system with more than 64 CPUs, ixgbe_xdp_locking_key is incremented in ixgbe_xdp_setup. However, immediately after this, the rings are reconfigured by ixgbe_setup_tc. ixgbe_setup_tc calls ixgbe_clear_interrupt_scheme which calls ixgbe_free_q_vectors which calls ixgbe_free_q_vector in a loop. ixgbe_free_q_vector decrements ixgbe_xdp_locking_key once per call if it is non-zero. Commenting out the decrement in ixgbe_free_q_vector stopped my system from panicing. I suspect to make the original patch work, I would need to load an XDP program and then replace it in order to get ixgbe_xdp_locking_key back above 0 since ixgbe_setup_tc is only called when transitioning between XDP and non-XDP ring configurations, while ixgbe_xdp_locking_key is incremented every time ixgbe_xdp_setup is called. Also, ixgbe_setup_tc can be called via ethtool --set-channels, so this becomes another path to decrement ixgbe_xdp_locking_key to 0 on systems with more than 64 CPUs. Since ixgbe_xdp_locking_key only protects the XDP_TX path and is tied to the number of CPUs present, there is no reason to disable it upon unloading an XDP program. To avoid confusion, I have moved enabling ixgbe_xdp_locking_key into ixgbe_sw_init, which is part of the probe path. Fixes: 4fe81585 ("ixgbe: let the xdpdrv work with more than 64 cpus") Signed-off-by: John Hickey <jjh@daedalian.us> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230425170308.2522429-1-anthony.l.nguyen@intel.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Pedro Tammela authored
Ido Schimmel reports a memleak on a syzkaller instance: BUG: memory leak unreferenced object 0xffff88803d45e400 (size 1024): comm "syz-executor292", pid 563, jiffies 4295025223 (age 51.781s) hex dump (first 32 bytes): 28 bd 70 00 fb db df 25 02 00 14 1f ff 02 00 02 (.p....%........ 00 32 00 00 1f 00 00 00 ac 14 14 3e 08 00 07 00 .2.........>.... backtrace: [<ffffffff81bd0f2c>] kmemleak_alloc_recursive include/linux/kmemleak.h:42 [inline] [<ffffffff81bd0f2c>] slab_post_alloc_hook mm/slab.h:772 [inline] [<ffffffff81bd0f2c>] slab_alloc_node mm/slub.c:3452 [inline] [<ffffffff81bd0f2c>] __kmem_cache_alloc_node+0x25c/0x320 mm/slub.c:3491 [<ffffffff81a865d9>] __do_kmalloc_node mm/slab_common.c:966 [inline] [<ffffffff81a865d9>] __kmalloc+0x59/0x1a0 mm/slab_common.c:980 [<ffffffff83aa85c3>] kmalloc include/linux/slab.h:584 [inline] [<ffffffff83aa85c3>] tcf_pedit_init+0x793/0x1ae0 net/sched/act_pedit.c:245 [<ffffffff83a90623>] tcf_action_init_1+0x453/0x6e0 net/sched/act_api.c:1394 [<ffffffff83a90e58>] tcf_action_init+0x5a8/0x950 net/sched/act_api.c:1459 [<ffffffff83a96258>] tcf_action_add+0x118/0x4e0 net/sched/act_api.c:1985 [<ffffffff83a96997>] tc_ctl_action+0x377/0x490 net/sched/act_api.c:2044 [<ffffffff83920a8d>] rtnetlink_rcv_msg+0x46d/0xd70 net/core/rtnetlink.c:6395 [<ffffffff83b24305>] netlink_rcv_skb+0x185/0x490 net/netlink/af_netlink.c:2575 [<ffffffff83901806>] rtnetlink_rcv+0x26/0x30 net/core/rtnetlink.c:6413 [<ffffffff83b21cae>] netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] [<ffffffff83b21cae>] netlink_unicast+0x5be/0x8a0 net/netlink/af_netlink.c:1365 [<ffffffff83b2293f>] netlink_sendmsg+0x9af/0xed0 net/netlink/af_netlink.c:1942 [<ffffffff8380c39f>] sock_sendmsg_nosec net/socket.c:724 [inline] [<ffffffff8380c39f>] sock_sendmsg net/socket.c:747 [inline] [<ffffffff8380c39f>] ____sys_sendmsg+0x3ef/0xaa0 net/socket.c:2503 [<ffffffff838156d2>] ___sys_sendmsg+0x122/0x1c0 net/socket.c:2557 [<ffffffff8381594f>] __sys_sendmsg+0x11f/0x200 net/socket.c:2586 [<ffffffff83815ab0>] __do_sys_sendmsg net/socket.c:2595 [inline] [<ffffffff83815ab0>] __se_sys_sendmsg net/socket.c:2593 [inline] [<ffffffff83815ab0>] __x64_sys_sendmsg+0x80/0xc0 net/socket.c:2593 The recently added static offset check missed a free to the key buffer when bailing out on error. Fixes: e1201bc7 ("net/sched: act_pedit: check static offsets a priori") Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20230425144725.669262-1-pctammela@mojatatu.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Ivan Vecera authored
Commit 08a0063d ("net/sched: flower: Move filter handle initialization earlier") moved filter handle initialization but an assignment of the handle to fnew->handle is done regardless of fold value. This is wrong because if fold != NULL (so fold->handle == handle) no new handle is allocated and passed handle is assigned to fnew->handle. Then if any subsequent action in fl_change() fails then the handle value is removed from IDR that is incorrect as we will have still valid old filter instance with handle that is not present in IDR. Fix this issue by moving the assignment so it is done only when passed fold == NULL. Prior the patch: [root@machine tc-testing]# ./tdc.py -d enp1s0f0np0 -e 14be Test 14be: Concurrently replace same range of 100k flower filters from 10 tc instances exit: 123 exit: 0 RTNETLINK answers: Invalid argument We have an error talking to the kernel Command failed tmp/replace_6:1885 All test results: 1..1 not ok 1 14be - Concurrently replace same range of 100k flower filters from 10 tc instances Command exited with 123, expected 0 RTNETLINK answers: Invalid argument We have an error talking to the kernel Command failed tmp/replace_6:1885 After the patch: [root@machine tc-testing]# ./tdc.py -d enp1s0f0np0 -e 14be Test 14be: Concurrently replace same range of 100k flower filters from 10 tc instances All test results: 1..1 ok 1 14be - Concurrently replace same range of 100k flower filters from 10 tc instances Fixes: 08a0063d ("net/sched: flower: Move filter handle initialization earlier") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230425140604.169881-1-ivecera@redhat.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
David Howells authored
Inside the loop in rxrpc_wait_to_be_connected() it checks call->error to see if it should exit the loop without first checking the call state. This is probably safe as if call->error is set, the call is dead anyway, but we should probably wait for the call state to have been set to completion first, lest it cause surprise on the way out. Fix this by only accessing call->error if the call is complete. We don't actually need to access the error inside the loop as we'll do that after. This caused the following report: BUG: KCSAN: data-race in rxrpc_send_data / rxrpc_set_call_completion write to 0xffff888159cf3c50 of 4 bytes by task 25673 on cpu 1: rxrpc_set_call_completion+0x71/0x1c0 net/rxrpc/call_state.c:22 rxrpc_send_data_packet+0xba9/0x1650 net/rxrpc/output.c:479 rxrpc_transmit_one+0x1e/0x130 net/rxrpc/output.c:714 rxrpc_decant_prepared_tx net/rxrpc/call_event.c:326 [inline] rxrpc_transmit_some_data+0x496/0x600 net/rxrpc/call_event.c:350 rxrpc_input_call_event+0x564/0x1220 net/rxrpc/call_event.c:464 rxrpc_io_thread+0x307/0x1d80 net/rxrpc/io_thread.c:461 kthread+0x1ac/0x1e0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 read to 0xffff888159cf3c50 of 4 bytes by task 25672 on cpu 0: rxrpc_send_data+0x29e/0x1950 net/rxrpc/sendmsg.c:296 rxrpc_do_sendmsg+0xb7a/0xc20 net/rxrpc/sendmsg.c:726 rxrpc_sendmsg+0x413/0x520 net/rxrpc/af_rxrpc.c:565 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0x375/0x4c0 net/socket.c:2501 ___sys_sendmsg net/socket.c:2555 [inline] __sys_sendmmsg+0x263/0x500 net/socket.c:2641 __do_sys_sendmmsg net/socket.c:2670 [inline] __se_sys_sendmmsg net/socket.c:2667 [inline] __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2667 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x00000000 -> 0xffffffea Fixes: 9d35d880 ("rxrpc: Move client call connection to the I/O thread") Reported-by: syzbot+ebc945fdb4acd72cba78@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/000000000000e7c6d205fa10a3cd@google.com/Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Dmitry Vyukov <dvyukov@google.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org cc: netdev@vger.kernel.org Link: https://lore.kernel.org/r/508133.1682427395@warthog.procyon.org.ukSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
- 26 Apr, 2023 4 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds authored
Pull networking updates from Paolo Abeni: "Core: - Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the default value allows for better BIG TCP performances - Reduce compound page head access for zero-copy data transfers - RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when possible - Threaded NAPI improvements, adding defer skb free support and unneeded softirq avoidance - Address dst_entry reference count scalability issues, via false sharing avoidance and optimize refcount tracking - Add lockless accesses annotation to sk_err[_soft] - Optimize again the skb struct layout - Extends the skb drop reasons to make it usable by multiple subsystems - Better const qualifier awareness for socket casts BPF: - Add skb and XDP typed dynptrs which allow BPF programs for more ergonomic and less brittle iteration through data and variable-sized accesses - Add a new BPF netfilter program type and minimal support to hook BPF programs to netfilter hooks such as prerouting or forward - Add more precise memory usage reporting for all BPF map types - Adds support for using {FOU,GUE} encap with an ipip device operating in collect_md mode and add a set of BPF kfuncs for controlling encap params - Allow BPF programs to detect at load time whether a particular kfunc exists or not, and also add support for this in light skeleton - Bigger batch of BPF verifier improvements to prepare for upcoming BPF open-coded iterators allowing for less restrictive looping capabilities - Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF programs to NULL-check before passing such pointers into kfunc - Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in local storage maps - Enable RCU semantics for task BPF kptrs and allow referenced kptr tasks to be stored in BPF maps - Add support for refcounted local kptrs to the verifier for allowing shared ownership, useful for adding a node to both the BPF list and rbtree - Add BPF verifier support for ST instructions in convert_ctx_access() which will help new -mcpu=v4 clang flag to start emitting them - Add ARM32 USDT support to libbpf - Improve bpftool's visual program dump which produces the control flow graph in a DOT format by adding C source inline annotations Protocols: - IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value indicates the provenance of the IP address - IPv6: optimize route lookup, dropping unneeded R/W lock acquisition - Add the handshake upcall mechanism, allowing the user-space to implement generic TLS handshake on kernel's behalf - Bridge: support per-{Port, VLAN} neighbor suppression, increasing resilience to nodes failures - SCTP: add support for Fair Capacity and Weighted Fair Queueing schedulers - MPTCP: delay first subflow allocation up to its first usage. This will allow for later better LSM interaction - xfrm: Remove inner/outer modes from input/output path. These are not needed anymore - WiFi: - reduced neighbor report (RNR) handling for AP mode - HW timestamping support - support for randomized auth/deauth TA for PASN privacy - per-link debugfs for multi-link - TC offload support for mac80211 drivers - mac80211 mesh fast-xmit and fast-rx support - enable Wi-Fi 7 (EHT) mesh support Netfilter: - Add nf_tables 'brouting' support, to force a packet to be routed instead of being bridged - Update bridge netfilter and ovs conntrack helpers to handle IPv6 Jumbo packets properly, i.e. fetch the packet length from hop-by-hop extension header. This is needed for BIT TCP support - The iptables 32bit compat interface isn't compiled in by default anymore - Move ip(6)tables builtin icmp matches to the udptcp one. This has the advantage that icmp/icmpv6 match doesn't load the iptables/ip6tables modules anymore when iptables-nft is used - Extended netlink error report for netdevice in flowtables and netdev/chains. Allow for incrementally add/delete devices to netdev basechain. Allow to create netdev chain without device Driver API: - Remove redundant Device Control Error Reporting Enable, as PCI core has already error reporting enabled at enumeration time - Move Multicast DB netlink handlers to core, allowing devices other then bridge to use them - Allow the page_pool to directly recycle the pages from safely localized NAPI - Implement lockless TX queue stop/wake combo macros, allowing for further code de-duplication and sanitization - Add YNL support for user headers and struct attrs - Add partial YNL specification for devlink - Add partial YNL specification for ethtool - Add tc-mqprio and tc-taprio support for preemptible traffic classes - Add tx push buf len param to ethtool, specifies the maximum number of bytes of a transmitted packet a driver can push directly to the underlying device - Add basic LED support for switch/phy - Add NAPI documentation, stop relaying on external links - Convert dsa_master_ioctl() to netdev notifier. This is a preparatory work to make the hardware timestamping layer selectable by user space - Add transceiver support and improve the error messages for CAN-FD controllers New hardware / drivers: - Ethernet: - AMD/Pensando core device support - MediaTek MT7981 SoC - MediaTek MT7988 SoC - Broadcom BCM53134 embedded switch - Texas Instruments CPSW9G ethernet switch - Qualcomm EMAC3 DWMAC ethernet - StarFive JH7110 SoC - NXP CBTX ethernet PHY - WiFi: - Apple M1 Pro/Max devices - RealTek rtl8710bu/rtl8188gu - RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset - Bluetooth: - Realtek RTL8821CS, RTL8851B, RTL8852BS - Mediatek MT7663, MT7922 - NXP w8997 - Actions Semi ATS2851 - QTI WCN6855 - Marvell 88W8997 - Can: - STMicroelectronics bxcan stm32f429 Drivers: - Ethernet NICs: - Intel (1G, icg): - add tracking and reporting of QBV config errors - add support for configuring max SDU for each Tx queue - Intel (100G, ice): - refactor mailbox overflow detection to support Scalable IOV - GNSS interface optimization - Intel (i40e): - support XDP multi-buffer - nVidia/Mellanox: - add the support for linux bridge multicast offload - enable TC offload for egress and engress MACVLAN over bond - add support for VxLAN GBP encap/decap flows offload - extend packet offload to fully support libreswan - support tunnel mode in mlx5 IPsec packet offload - extend XDP multi-buffer support - support MACsec VLAN offload - add support for dynamic msix vectors allocation - drop RX page_cache and fully use page_pool - implement thermal zone to report NIC temperature - Netronome/Corigine: - add support for multi-zone conntrack offload - Solarflare/Xilinx: - support offloading TC VLAN push/pop actions to the MAE - support TC decap rules - support unicast PTP - Other NICs: - Broadcom (bnxt): enforce software based freq adjustments only on shared PHC NIC - RealTek (r8169): refactor to addess ASPM issues during NAPI poll - Micrel (lan8841): add support for PTP_PF_PEROUT - Cadence (macb): enable PTP unicast - Engleder (tsnep): add XDP socket zero-copy support - virtio-net: implement exact header length guest feature - veth: add page_pool support for page recycling - vxlan: add MDB data path support - gve: add XDP support for GQI-QPL format - geneve: accept every ethertype - macvlan: allow some packets to bypass broadcast queue - mana: add support for jumbo frame - Ethernet high-speed switches: - Microchip (sparx5): Add support for TC flower templates - Ethernet embedded switches: - Broadcom (b54): - configure 6318 and 63268 RGMII ports - Marvell (mv88e6xxx): - faster C45 bus scan - Microchip: - lan966x: - add support for IS1 VCAP - better TX/RX from/to CPU performances - ksz9477: add ETS Qdisc support - ksz8: enhance static MAC table operations and error handling - sama7g5: add PTP capability - NXP (ocelot): - add support for external ports - add support for preemptible traffic classes - Texas Instruments: - add CPSWxG SGMII support for J7200 and J721E - Intel WiFi (iwlwifi): - preparation for Wi-Fi 7 EHT and multi-link support - EHT (Wi-Fi 7) sniffer support - hardware timestamping support for some devices/firwmares - TX beacon protection on newer hardware - Qualcomm 802.11ax WiFi (ath11k): - MU-MIMO parameters support - ack signal support for management packets - RealTek WiFi (rtw88): - SDIO bus support - better support for some SDIO devices (e.g. MAC address from efuse) - RealTek WiFi (rtw89): - HW scan support for 8852b - better support for 6 GHz scanning - support for various newer firmware APIs - framework firmware backwards compatibility - MediaTek WiFi (mt76): - P2P support - mesh A-MSDU support - EHT (Wi-Fi 7) support - coredump support" * tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits) net: phy: hide the PHYLIB_LEDS knob net: phy: marvell-88x2222: remove unnecessary (void*) conversions tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp. net: amd: Fix link leak when verifying config failed net: phy: marvell: Fix inconsistent indenting in led_blink_set lan966x: Don't use xdp_frame when action is XDP_TX tsnep: Add XDP socket zero-copy TX support tsnep: Add XDP socket zero-copy RX support tsnep: Move skb receive action to separate function tsnep: Add functions for queue enable/disable tsnep: Rework TX/RX queue initialization tsnep: Replace modulo operation with mask net: phy: dp83867: Add led_brightness_set support net: phy: Fix reading LED reg property drivers: nfc: nfcsim: remove return value check of `dev_dir` net: phy: dp83867: Remove unnecessary (void*) conversions net: ethtool: coalesce: try to make user settings stick twice net: mana: Check if netdev/napi_alloc_frag returns single page net: mana: Rename mana_refill_rxoob and remove some empty lines net: veth: add page_pool stats ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull SCSI updates from James Bottomley: "Updates to the usual drivers (megaraid_sas, scsi_debug, lpfc, target, mpi3mr, hisi_sas, arcmsr). The major core change is the constification of the host templates (which touches everything) along with other minor fixups and clean ups" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (207 commits) scsi: ufs: mcq: Use pointer arithmetic in ufshcd_send_command() scsi: ufs: mcq: Annotate ufshcd_inc_sq_tail() appropriately scsi: cxlflash: s/semahpore/semaphore/ scsi: lpfc: Silence an incorrect device output scsi: mpi3mr: Use IRQ save variants of spinlock to protect chain frame allocation scsi: scsi_debug: Fix missing error code in scsi_debug_init() scsi: hisi_sas: Work around build failure in suspend function scsi: lpfc: Fix ioremap issues in lpfc_sli4_pci_mem_setup() scsi: mpt3sas: Fix an issue when driver is being removed scsi: mpt3sas: Remove HBA BIOS version in the kernel log scsi: target: core: Fix invalid memory access scsi: scsi_debug: Drop sdebug_queue scsi: scsi_debug: Only allow sdebug_max_queue be modified when no shosts scsi: scsi_debug: Use scsi_host_busy() in delay_store() and ndelay_store() scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in stop_all_queued() scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in sdebug_blk_mq_poll() scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd scsi: scsi_debug: Use scsi_block_requests() to block queues scsi: scsi_debug: Protect block_unblock_all_queues() with mutex scsi: scsi_debug: Change shost list lock to a mutex ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libataLinus Torvalds authored
Pull ata updates from Damien Le Moal: - Many cleanups of the pata_parport driver and of its protocol modules (Ondrej) - Remove unused code (ata_id_xxx() functions) (Sergey) - Add Add UniPhier SATA controller DT bindings (Kunihiko) - Fix dependencies for the Freescale QorIQ AHCI SATA controller driver (Geert) - DT property handling improvements (Rob) * tag 'ata-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: (57 commits) ata: pata_parport-bpck6: Declare mode_map as static ata: pata_parport-bpck6: Remove dependency on 64BIT ata: pata_parport-bpck6: reduce indents in bpck6_open ata: pata_parport-bpck6: delete ppc6lnx.c ata: pata_parport-bpck6: move defines and mode_map to bpck6.c ata: pata_parport-bpck6: move ppc6_wr_data_byte to bpck6.c and rename ata: pata_parport-bpck6: move ppc6_rd_data_byte to bpck6.c and rename ata: pata_parport-bpck6: move ppc6_send_cmd to bpck6.c and rename ata: pata_parport-bpck6: move ppc6_deselect to bpck6.c and rename ata: pata_parport-bpck6: merge ppc6_select into bpck6_open ata: pata_parport-bpck6: move ppc6_open to bpck6.c and rename ata: pata_parport-bpck6: move ppc6_wr_extout to bpck6.c and rename ata: pata_parport-bpck6: move ppc6_wait_for_fifo to bpck6.c and rename ata: pata_parport-bpck6: merge ppc6_wr_data_blk into bpck6_write_block ata: pata_parport-bpck6: merge ppc6_rd_data_blk into bpck6_read_block ata: pata_parport-bpck6: merge ppc6_wr_port16_blk into bpck6_write_block ata: pata_parport-bpck6: merge ppc6_rd_port16_blk into bpck6_read_block ata: pata_parport-bpck6: merge ppc6_wr_port into bpck6_write_regr ata: pata_parport-bpck6: merge ppc6_rd_port into bpck6_read_regr ata: pata_parport-bpck6: remove ppc6_close ...
-
Linus Torvalds authored
Merge tag 'for-6.4/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Split dm-bufio's rw_semaphore and rbtree. Offers improvements to dm-bufio's locking to allow increased concurrent IO -- particularly for read access for buffers already in dm-bufio's cache. - Also split dm-bio-prison-v1's spinlock and rbtree with comparable aim at improving concurrent IO (for the DM thinp target). - Both the dm-bufio and dm-bio-prison-v1 scaling of the number of locks and rbtrees used are managed by dm_num_hash_locks(). And the hash function used by both is dm_hash_locks_index(). - Allow DM targets to require DISCARD, WRITE_ZEROES and SECURE_ERASE to be split at the target specified boundary (in terms of max_discard_sectors, max_write_zeroes_sectors and max_secure_erase_sectors respectively). - DM verity error handling fix for check_at_most_once on FEC. - Update DM verity target to emit audit events on verification failure and more. - DM core ->io_hints improvements needed in support of new discard support that is added to the DM "zero" and "error" targets. - Fix missing kmem_cache_destroy() call in initialization error path of both the DM integrity and DM clone targets. - A couple fixes for DM flakey, also add "error_reads" feature. - Fix DM core's resume to not lock FS when the DM map is NULL; otherwise initial table load can race with FS mount that takes superblock's ->s_umount rw_semaphore. - Various small improvements to both DM core and DM targets. * tag 'for-6.4/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (40 commits) dm: don't lock fs when the map is NULL in process of resume dm flakey: add an "error_reads" option dm flakey: remove trailing space in the table line dm flakey: fix a crash with invalid table line dm ioctl: fix nested locking in table_clear() to remove deadlock concern dm: unexport dm_get_queue_limits() dm: allow targets to require splitting WRITE_ZEROES and SECURE_ERASE dm: add helper macro for simple DM target module init and exit dm raid: remove unused d variable dm: remove unnecessary (void*) conversions dm mirror: add DMERR message if alloc_workqueue fails dm: push error reporting down to dm_register_target() dm integrity: call kmem_cache_destroy() in dm_integrity_init() error path dm clone: call kmem_cache_destroy() in dm_clone_init() error path dm error: add discard support dm zero: add discard support dm table: allow targets without devices to set ->io_hints dm verity: emit audit events on verification failure and more dm verity: fix error handling for check_at_most_once on FEC dm: improve hash_locks sizing and hash function ...
-