- 28 Dec, 2022 3 commits
-
-
Jakub Kicinski authored
The netdev-FAQ document has grown over the years to the point where finding information in it is somewhat challenging. The length of the questions prevents readers from locating content that's relevant at a glance. Convert to a more standard documentation format with sections and sub-sections rather than questions and answers. The content edits are limited to what's necessary to change the format, and very minor clarifications. Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Subsequent changes will reformat the doc away from FAQ. To make that more readable perform the pure section moves now. Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Howells authored
At the end of rxrpc_recvmsg(), if a call is found, the call is put and then a trace line is emitted referencing that call in a couple of places - but the call may have been deallocated by the time those traces happen. Fix this by stashing the call debug_id in a variable and passing that to the tracepoint rather than the call pointer. Fixes: 84997905 ("rxrpc: Add a tracepoint to follow what recvmsg does") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 26 Dec, 2022 5 commits
-
-
Anuradha Weeraman authored
Fix for uninitialized variable warning. Addresses-Coverity: ("Uninitialized scalar variable") Signed-off-by: Anuradha Weeraman <anuradha@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Miaoqian Lin authored
nfc_get_device() take reference for the device, add missing nfc_put_device() to release it when not need anymore. Also fix the style warnning by use error EOPNOTSUPP instead of ENOTSUPP. Fixes: 5ce3f32b ("NFC: netlink: SE API implementation") Fixes: 29e76924 ("nfc: netlink: Add capability to reply to vendor_cmd with data") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johnny S. Lee authored
PTP hardware timestamping related objects are not linked when PTP support for MV88E6xxx (NET_DSA_MV88E6XXX_PTP) is disabled, therefore NET_DSA_MV88E6XXX should not depend on PTP_1588_CLOCK_OPTIONAL regardless of NET_DSA_MV88E6XXX_PTP. Instead, condition more strictly on how NET_DSA_MV88E6XXX_PTP's dependencies are met, making sure that it cannot be enabled when NET_DSA_MV88E6XXX=y and PTP_1588_CLOCK=m. In other words, this commit allows NET_DSA_MV88E6XXX to be built-in while PTP_1588_CLOCK is a module, as long as NET_DSA_MV88E6XXX_PTP is prevented from being enabled. Fixes: e5f31552 ("ethernet: fix PTP_1588_CLOCK dependencies") Signed-off-by: Johnny S. Lee <foss@jsl.io> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniil Tatianin authored
adapter->dcb would get silently freed inside qlcnic_dcb_enable() in case qlcnic_dcb_attach() would return an error, which always happens under OOM conditions. This would lead to use-after-free because both of the existing callers invoke qlcnic_dcb_get_info() on the obtained pointer, which is potentially freed at that point. Propagate errors from qlcnic_dcb_enable(), and instead free the dcb pointer at callsite using qlcnic_dcb_free(). This also removes the now unused qlcnic_clear_dcb_ops() helper, which was a simple wrapper around kfree() also causing memory leaks for partially initialized dcb. Found by Linux Verification Center (linuxtesting.org) with the SVACE static analysis tool. Fixes: 3c44bba1 ("qlcnic: Disable DCB operations from SR-IOV VFs") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hawkins Jiawei authored
Syzkaller reports a memory leak as follows: ==================================== BUG: memory leak unreferenced object 0xffff88810c287f00 (size 256): comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046 [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline] [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline] [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline] [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline] [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342 [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553 [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147 [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082 [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540 [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345 [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921 [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline] [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734 [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482 [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536 [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622 [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline] [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline] [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648 [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd ==================================== Kernel uses tcindex_change() to change an existing filter properties. Yet the problem is that, during the process of changing, if `old_r` is retrieved from `p->perfect`, then kernel uses tcindex_alloc_perfect_hash() to newly allocate filter results, uses tcindex_filter_result_init() to clear the old filter result, without destroying its tcf_exts structure, which triggers the above memory leak. To be more specific, there are only two source for the `old_r`, according to the tcindex_lookup(). `old_r` is retrieved from `p->perfect`, or `old_r` is retrieved from `p->h`. * If `old_r` is retrieved from `p->perfect`, kernel uses tcindex_alloc_perfect_hash() to newly allocate the filter results. Then `r` is assigned with `cp->perfect + handle`, which is newly allocated. So condition `old_r && old_r != r` is true in this situation, and kernel uses tcindex_filter_result_init() to clear the old filter result, without destroying its tcf_exts structure * If `old_r` is retrieved from `p->h`, then `p->perfect` is NULL according to the tcindex_lookup(). Considering that `cp->h` is directly copied from `p->h` and `p->perfect` is NULL, `r` is assigned with `tcindex_lookup(cp, handle)`, whose value should be the same as `old_r`, so condition `old_r && old_r != r` is false in this situation, kernel ignores using tcindex_filter_result_init() to clear the old filter result. So only when `old_r` is retrieved from `p->perfect` does kernel use tcindex_filter_result_init() to clear the old filter result, which triggers the above memory leak. Considering that there already exists a tc_filter_wq workqueue to destroy the old tcindex_data by tcindex_partial_destroy_work() at the end of tcindex_set_parms(), this patch solves this memory leak bug by removing this old filter result clearing part and delegating it to the tc_filter_wq workqueue. Note that this patch doesn't introduce any other issues. If `old_r` is retrieved from `p->perfect`, this patch just delegates old filter result clearing part to the tc_filter_wq workqueue; If `old_r` is retrieved from `p->h`, kernel doesn't reach the old filter result clearing part, so removing this part has no effect. [Thanks to the suggestion from Jakub Kicinski, Cong Wang, Paolo Abeni and Dmitry Vyukov] Fixes: b9a24bb7 ("net_sched: properly handle failure case of tcf_exts_init()") Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/ Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com Cc: Cong Wang <cong.wang@bytedance.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 24 Dec, 2022 1 commit
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller authored
Daniel Borkmann says: ==================== The following pull-request contains BPF updates for your *net* tree. We've added 7 non-merge commits during the last 5 day(s) which contain a total of 11 files changed, 231 insertions(+), 3 deletions(-). The main changes are: 1) Fix a splat in bpf_skb_generic_pop() under CHECKSUM_PARTIAL due to misuse of skb_postpull_rcsum(), from Jakub Kicinski with test case from Martin Lau. 2) Fix BPF verifier's nullness propagation when registers are of type PTR_TO_BTF_ID, from Hao Sun. 3) Fix bpftool build for JIT disassembler under statically built libllvm, from Anton Protopopov. 4) Fix warnings reported by resolve_btfids when building vmlinux with CONFIG_SECURITY_NETWORK disabled, from Hou Tao. 5) Minor fix up for BPF selftest gitignore, from Stanislav Fomichev. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 23 Dec, 2022 9 commits
-
-
Stanislav Fomichev authored
Shows up when cross-compiling: HOST_SCRATCH_DIR := $(OUTPUT)/host-tools vs SCRATCH_DIR := $(OUTPUT)/tools HOST_SCRATCH_DIR := $(SCRATCH_DIR) Reported-by: John Sperbeck <jsperbeck@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20221222213958.2302320-1-sdf@google.com
-
Jakub Kicinski authored
Hao Lan says: ==================== net: hns3: fix some bug for hns3 There are some bugfixes for the HNS3 ethernet driver. patch#1 fix miss checking for rx packet. patch#2 fixes VF promisc mode not update when mac table full bug, and patch#3 fixes a nterrupts not initialization in VF FLR bug. ==================== Link: https://lore.kernel.org/r/20221222064343.61537-1-lanhao@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jian Shen authored
Currently, it missed set HCLGE_VPORT_STATE_PROMISC_CHANGE flag for VF when vport->overflow_promisc_flags changed. So the VF won't check whether to update promisc mode in this case. So add it. Fixes: 1e6e7610 ("net: hns3: configure promisc mode for VF asynchronously") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jian Shen authored
For device supports RXD advanced layout, the driver will return directly if the hardware finish the checksum calculate. It cause missing L3E checking for ip packets. Fixes it. Fixes: 1ddc028a ("net: hns3: refactor out RX completion checksum") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jie Wang authored
Currently keep alive message between PF and VF may be lost and the VF is unalive in PF. So the VF will not do reset during PF FLR reset process. This would make the allocated interrupt resources of VF invalid and VF would't receive or respond to PF any more. So this patch adds VF interrupts re-initialization during VF FLR for VF recovery in above cases. Fixes: 862d969a ("net: hns3: do VF's pci re-initialization while PF doing FLR") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rong Tao authored
Fix the typo of 'Unsuported' in atmbr2684.h Signed-off-by: Rong Tao <rongtao@cestc.cn> Link: https://lore.kernel.org/r/tencent_F1354BEC925C65EA357E741E91DF2044E805@qq.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Sean Anderson authored
There aren't enough resources to run these ports at 10G speeds. Disable 10G for these ports, reverting to the previous speed. Fixes: 36926a7d ("powerpc: dts: t208x: Mark MAC1 and MAC2 as 10G") Reported-by: Camelia Alexandra Groza <camelia.groza@nxp.com> Signed-off-by: Sean Anderson <sean.anderson@seco.com> Reviewed-by: Camelia Groza <camelia.groza@nxp.com> Tested-by: Camelia Groza <camelia.groza@nxp.com> Link: https://lore.kernel.org/r/20221216172937.2960054-1-sean.anderson@seco.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Hao Sun authored
Verify that nullness information is not porpagated in the branches of register to register JEQ and JNE operations if one of them is PTR_TO_BTF_ID. Implement this in C level so we can use CO-RE. Signed-off-by: Hao Sun <sunhao.th@gmail.com> Suggested-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20221222024414.29539-2-sunhao.th@gmail.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Hao Sun authored
After befae758, the verifier would propagate null information after JEQ/JNE, e.g., if two pointers, one is maybe_null and the other is not, the former would be marked as non-null in eq path. However, as comment "PTR_TO_BTF_ID points to a kernel struct that does not need to be null checked by the BPF program ... The verifier must keep this in mind and can make no assumptions about null or non-null when doing branch ...". If one pointer is maybe_null and the other is PTR_TO_BTF, the former is incorrectly marked non-null. The following BPF prog can trigger a null-ptr-deref, also see this report for more details[1]: 0: (18) r1 = map_fd ; R1_w=map_ptr(ks=4, vs=4) 2: (79) r6 = *(u64 *)(r1 +8) ; R6_w=bpf_map->inner_map_data ; R6 is PTR_TO_BTF_ID ; equals to null at runtime 3: (bf) r2 = r10 4: (07) r2 += -4 5: (62) *(u32 *)(r2 +0) = 0 6: (85) call bpf_map_lookup_elem#1 ; R0_w=map_value_or_null 7: (1d) if r6 == r0 goto pc+1 8: (95) exit ; from 7 to 9: R0=map_value R6=ptr_bpf_map 9: (61) r0 = *(u32 *)(r0 +0) ; null-ptr-deref 10: (95) exit So, make the verifier propagate nullness information for reg to reg comparisons only if neither reg is PTR_TO_BTF_ID. [1] https://lore.kernel.org/bpf/CACkBjsaFJwjC5oiw-1KXvcazywodwXo4zGYsRHwbr2gSG9WcSw@mail.gmail.com/T/#u Fixes: befae758 ("bpf: propagate nullness information for reg to reg comparisons") Signed-off-by: Hao Sun <sunhao.th@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221222024414.29539-1-sunhao.th@gmail.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
- 22 Dec, 2022 13 commits
-
-
Anton Protopopov authored
Since the commit eb9d1acf ("bpftool: Add LLVM as default library for disassembling JIT-ed programs") we might link the bpftool program with the libllvm library. This works fine when a shared libllvm library is available, but fails if we want to link bpftool with a statically built LLVM: [...] /usr/bin/ld: /usr/local/lib/libLLVMSupport.a(CrashRecoveryContext.cpp.o): in function `llvm::CrashRecoveryContextCleanup::~CrashRecoveryContextCleanup()': CrashRecoveryContext.cpp:(.text._ZN4llvm27CrashRecoveryContextCleanupD0Ev+0x17): undefined reference to `operator delete(void*, unsigned long)' /usr/bin/ld: /usr/local/lib/libLLVMSupport.a(CrashRecoveryContext.cpp.o): in function `llvm::CrashRecoveryContext::~CrashRecoveryContext()': CrashRecoveryContext.cpp:(.text._ZN4llvm20CrashRecoveryContextD2Ev+0xc8): undefined reference to `operator delete(void*, unsigned long)' [...] So in the case of static libllvm we need to explicitly link bpftool with required libraries, namely, libstdc++ and those provided by the `llvm-config --system-libs` command. We can distinguish between the shared and static cases by using the `llvm-config --shared-mode` command. Fixes: eb9d1acf ("bpftool: Add LLVM as default library for disassembling JIT-ed programs") Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20221222102627.1643709-1-aspsk@isovalent.com
-
Shawn Bohrer authored
When AF_XDP is used on on a veth interface the RX ring is updated in two steps. veth_xdp_rcv() removes packet descriptors from the FILL ring fills them and places them in the RX ring updating the cached_prod pointer. Later xdp_do_flush() syncs the RX ring prod pointer with the cached_prod pointer allowing user-space to see the recently filled in descriptors. The rings are intended to be SPSC, however the existing order in veth_poll allows the xdp_do_flush() to run concurrently with another CPU creating a race condition that allows user-space to see old or uninitialized descriptors in the RX ring. This bug has been observed in production systems. To summarize, we are expecting this ordering: CPU 0 __xsk_rcv_zc() CPU 0 __xsk_map_flush() CPU 2 __xsk_rcv_zc() CPU 2 __xsk_map_flush() But we are seeing this order: CPU 0 __xsk_rcv_zc() CPU 2 __xsk_rcv_zc() CPU 0 __xsk_map_flush() CPU 2 __xsk_map_flush() This occurs because we rely on NAPI to ensure that only one napi_poll handler is running at a time for the given veth receive queue. napi_schedule_prep() will prevent multiple instances from getting scheduled. However calling napi_complete_done() signals that this napi_poll is complete and allows subsequent calls to napi_schedule_prep() and __napi_schedule() to succeed in scheduling a concurrent napi_poll before the xdp_do_flush() has been called. For the veth driver a concurrent call to napi_schedule_prep() and __napi_schedule() can occur on a different CPU because the veth xmit path can additionally schedule a napi_poll creating the race. The fix as suggested by Magnus Karlsson, is to simply move the xdp_do_flush() call before napi_complete_done(). This syncs the producer ring pointers before another instance of napi_poll can be scheduled on another CPU. It will also slightly improve performance by moving the flush closer to when the descriptors were placed in the RX ring. Fixes: d1396004 ("veth: Add XDP TX and REDIRECT") Suggested-by: Magnus Karlsson <magnus.karlsson@gmail.com> Signed-off-by: Shawn Bohrer <sbohrer@cloudflare.com> Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Horatiu Vultur authored
When the PCS was taken out of reset, we were changing by mistake also the speed to 100 Mbit. But in case the link was going down, the link up routine was setting correctly the link speed. If the link was not getting down then the speed was forced to run at 100 even if the speed was something else. On lan966x, to set the speed link to 1G or 2.5G a value of 1 needs to be written in DEV_CLOCK_CFG_LINK_SPEED. This is similar to the procedure in lan966x_port_init. The issue was reproduced using 1000base-x sfp module using the commands: ip link set dev eth2 up ip link addr add 10.97.10.2/24 dev eth2 ethtool -s eth2 speed 1000 autoneg off Fixes: d28d6d2e ("net: lan966x: add port module support") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com> Link: https://lore.kernel.org/r/20221221093315.939133-1-horatiu.vultur@microchip.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Eric Dumazet authored
bond_miimon_commit() is run while RTNL is held, not RCU. WARNING: suspicious RCU usage 6.1.0-syzkaller-09671-g89529367 #0 Not tainted ----------------------------- drivers/net/bonding/bond_main.c:2704 suspicious rcu_dereference_check() usage! Fixes: e95cc447 ("bonding: do failover when high prio link up") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Hangbin Liu <liuhangbin@gmail.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Link: https://lore.kernel.org/r/20221220130831.1480888-1-edumazet@google.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Jakub Kicinski authored
Mat Martineau says: ==================== mptcp: Locking fixes Two separate locking fixes for the networking tree: Patch 1 addresses a MPTCP fastopen error-path deadlock that was found with syzkaller. Patch 2 works around a lockdep false-positive between MPTCP listening and non-listening sockets at socket destruct time. ==================== Link: https://lore.kernel.org/r/20221220195215.238353-1-mathew.j.martineau@linux.intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
MattB reported a lockdep splat in the mptcp listener code cleanup: WARNING: possible circular locking dependency detected packetdrill/14278 is trying to acquire lock: ffff888017d868f0 ((work_completion)(&msk->work)){+.+.}-{0:0}, at: __flush_work (kernel/workqueue.c:3069) but task is already holding lock: ffff888017d84130 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close (net/mptcp/protocol.c:2973) which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (sk_lock-AF_INET){+.+.}-{0:0}: __lock_acquire (kernel/locking/lockdep.c:5055) lock_acquire (kernel/locking/lockdep.c:466) lock_sock_nested (net/core/sock.c:3463) mptcp_worker (net/mptcp/protocol.c:2614) process_one_work (kernel/workqueue.c:2294) worker_thread (include/linux/list.h:292) kthread (kernel/kthread.c:376) ret_from_fork (arch/x86/entry/entry_64.S:312) -> #0 ((work_completion)(&msk->work)){+.+.}-{0:0}: check_prev_add (kernel/locking/lockdep.c:3098) validate_chain (kernel/locking/lockdep.c:3217) __lock_acquire (kernel/locking/lockdep.c:5055) lock_acquire (kernel/locking/lockdep.c:466) __flush_work (kernel/workqueue.c:3070) __cancel_work_timer (kernel/workqueue.c:3160) mptcp_cancel_work (net/mptcp/protocol.c:2758) mptcp_subflow_queue_clean (net/mptcp/subflow.c:1817) __mptcp_close_ssk (net/mptcp/protocol.c:2363) mptcp_destroy_common (net/mptcp/protocol.c:3170) mptcp_destroy (include/net/sock.h:1495) __mptcp_destroy_sock (net/mptcp/protocol.c:2886) __mptcp_close (net/mptcp/protocol.c:2959) mptcp_close (net/mptcp/protocol.c:2974) inet_release (net/ipv4/af_inet.c:432) __sock_release (net/socket.c:651) sock_close (net/socket.c:1367) __fput (fs/file_table.c:320) task_work_run (kernel/task_work.c:181 (discriminator 1)) exit_to_user_mode_prepare (include/linux/resume_user_mode.h:49) syscall_exit_to_user_mode (kernel/entry/common.c:130) do_syscall_64 (arch/x86/entry/common.c:87) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sk_lock-AF_INET); lock((work_completion)(&msk->work)); lock(sk_lock-AF_INET); lock((work_completion)(&msk->work)); *** DEADLOCK *** The report is actually a false positive, since the only existing lock nesting is the msk socket lock acquired by the mptcp work. cancel_work_sync() is invoked without the relevant socket lock being held, but under a different (the msk listener) socket lock. We could silence the splat adding a per workqueue dynamic lockdep key, but that looks overkill. Instead just tell lockdep the msk socket lock is not held around cancel_work_sync(). Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/322 Fixes: 30e51b92 ("mptcp: fix unreleased socket in accept queue") Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
MatM reported a deadlock at fastopening time: INFO: task syz-executor.0:11454 blocked for more than 143 seconds. Tainted: G S 6.1.0-rc5-03226-gdb0157db5153 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz-executor.0 state:D stack:25104 pid:11454 ppid:424 flags:0x00004006 Call Trace: <TASK> context_switch kernel/sched/core.c:5191 [inline] __schedule+0x5c2/0x1550 kernel/sched/core.c:6503 schedule+0xe8/0x1c0 kernel/sched/core.c:6579 __lock_sock+0x142/0x260 net/core/sock.c:2896 lock_sock_nested+0xdb/0x100 net/core/sock.c:3466 __mptcp_close_ssk+0x1a3/0x790 net/mptcp/protocol.c:2328 mptcp_destroy_common+0x16a/0x650 net/mptcp/protocol.c:3171 mptcp_disconnect+0xb8/0x450 net/mptcp/protocol.c:3019 __inet_stream_connect+0x897/0xa40 net/ipv4/af_inet.c:720 tcp_sendmsg_fastopen+0x3dd/0x740 net/ipv4/tcp.c:1200 mptcp_sendmsg_fastopen net/mptcp/protocol.c:1682 [inline] mptcp_sendmsg+0x128a/0x1a50 net/mptcp/protocol.c:1721 inet6_sendmsg+0x11f/0x150 net/ipv6/af_inet6.c:663 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xf7/0x190 net/socket.c:734 ____sys_sendmsg+0x336/0x970 net/socket.c:2476 ___sys_sendmsg+0x122/0x1c0 net/socket.c:2530 __sys_sendmmsg+0x18d/0x460 net/socket.c:2616 __do_sys_sendmmsg net/socket.c:2645 [inline] __se_sys_sendmmsg net/socket.c:2642 [inline] __x64_sys_sendmmsg+0x9d/0x110 net/socket.c:2642 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f5920a75e7d RSP: 002b:00007f59201e8028 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00007f5920bb4f80 RCX: 00007f5920a75e7d RDX: 0000000000000001 RSI: 0000000020002940 RDI: 0000000000000005 RBP: 00007f5920ae7593 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000020004050 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007f5920bb4f80 R15: 00007f59201c8000 </TASK> In the error path, tcp_sendmsg_fastopen() ends-up calling mptcp_disconnect(), and the latter tries to close each subflow, acquiring the socket lock on each of them. At fastopen time, we have a single subflow, and such subflow socket lock is already held by the called, causing the deadlock. We already track the 'fastopen in progress' status inside the msk socket. Use it to address the issue, making mptcp_disconnect() a no op when invoked from the fastopen (error) path and doing the relevant cleanup after releasing the subflow socket lock. While at the above, rename the fastopen status bit to something more meaningful. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/321 Fixes: fa9e5746 ("mptcp: fix abba deadlock on fastopen") Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yinjun Zhang authored
The callback `.ndo_set_rx_mode` is called in atomic context, sleep is not allowed in the implementation. Now use workqueue mechanism to avoid this issue. Fixes: de624864 ("nfp: add support for multicast filter") Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20221220152100.1042774-1-simon.horman@corigine.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ronak Doshi authored
Commit dacce2be ("vmxnet3: add geneve and vxlan tunnel offload support") added support for encapsulation offload. However, the pathc did not report correctly the csum_level for encapsulated packet. This patch fixes this issue by reporting correct csum level for the encapsulated packet. Fixes: dacce2be ("vmxnet3: add geneve and vxlan tunnel offload support") Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Peng Li <lpeng@vmware.com> Link: https://lore.kernel.org/r/20221220202556.24421-1-doshir@vmware.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Aaron Conole authored
A recent commit introducing upcall packet accounting failed to properly release the vport object when the per-cpu stats struct couldn't be allocated. This can cause dangling pointers to dp objects long after they've been released. Cc: wangchuanlei <wangchuanlei@inspur.com> Fixes: 1933ea36 ("net: openvswitch: Add support to count upcall packets") Reported-by: syzbot+8f4e2dcfcb3209ac35f9@syzkaller.appspotmail.com Signed-off-by: Aaron Conole <aconole@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://lore.kernel.org/r/20221220212717.526780-1-aconole@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Antoine Tenart authored
Multicast packets received on an interface bound to a VRF are marked as belonging to the VRF and the skb device is updated to point to the VRF device itself. This was fine even when a route was associated to a device as when performing a fib table lookup 'oif' in fib6_table_lookup (coming from 'skb->dev->ifindex' in ip6_route_input) was set to 0 when FLOWI_FLAG_SKIP_NH_OIF was set. With commit 40867d74 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices") this is not longer true and multicast traffic is not received on the original interface. Instead of adding back a similar check in fib6_table_lookup determine the dst using the original ifindex for multicast VRF traffic. To make things consistent across the function do the above for all strict packets, which was the logic before commit 6f12fa77 ("vrf: mark skb for multicast or link-local as enslaved to VRF"). Note that reverting to this behavior should be fine as the change was about marking packets belonging to the VRF, not about their dst. Fixes: 40867d74 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20221220171825.1172237-1-atenart@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Maciej Fijalkowski authored
Previously ice XDP xmit routine was changed in a way that it avoids xdp_buff->xdp_frame conversion as it is simply not needed for handling XDP_TX action and what is more it saves us CPU cycles. This routine is re-used on ZC driver to handle XDP_TX action. Although for XDP_TX on Rx ZC xdp_buff that comes from xsk_buff_pool is converted to xdp_frame, xdp_frame itself is not stored inside ice_tx_buf, we only store raw data pointer. Casting this pointer to xdp_frame and calling against it xdp_return_frame in ice_clean_xdp_tx_buf() results in undefined behavior. To fix this, simply call page_frag_free() on tx_buf->raw_buf. Later intention is to remove the buff->frame conversion in order to simplify the codebase and improve XDP_TX performance on ZC. Fixes: 126cdfe1 ("ice: xsk: Improve AF_XDP ZC Tx and use batching API") Reported-and-tested-by: Robin Cowley <robin.cowley@thehutgroup.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Piotr Raczynski <piotr.raczynski@.intel.com> Link: https://lore.kernel.org/r/20221220175448.693999-1-anthony.l.nguyen@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wirelessJakub Kicinski authored
Kalle Valo says: ==================== wireless fixes for v6.2 First set of fixes for v6.2. Fix for a link error in mt76, fix for an iwlwifi firmware crash and two cleanups. * tag 'wireless-2022-12-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: ath9k: use proper statements in conditionals wifi: mt76: mt7996: select CONFIG_RELAY wifi: iwlwifi: fw: skip PPAG for JF wifi: ti: remove obsolete lines in the Makefile ==================== Link: https://lore.kernel.org/r/20221221180808.96A8AC433EF@smtp.kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 21 Dec, 2022 5 commits
-
-
Martin KaFai Lau authored
When the bpf_skb_adjust_room() shrinks the skb such that its csum_start is invalid, the skb->ip_summed should be reset from CHECKSUM_PARTIAL to CHECKSUM_NONE. The commit 54c3f1a8 ("bpf: pull before calling skb_postpull_rcsum()") fixed it. This patch adds a test to ensure the skb->ip_summed changed from CHECKSUM_PARTIAL to CHECKSUM_NONE after bpf_skb_adjust_room(). Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20221221185653.1589961-1-martin.lau@linux.dev
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull networking fixes from Jakub Kicinski: "Including fixes from bpf, netfilter and can. Current release - regressions: - bpf: synchronize dispatcher update with bpf_dispatcher_xdp_func - rxrpc: - fix security setting propagation - fix null-deref in rxrpc_unuse_local() - fix switched parameters in peer tracing Current release - new code bugs: - rxrpc: - fix I/O thread startup getting skipped - fix locking issues in rxrpc_put_peer_locked() - fix I/O thread stop - fix uninitialised variable in rxperf server - fix the return value of rxrpc_new_incoming_call() - microchip: vcap: fix initialization of value and mask - nfp: fix unaligned io read of capabilities word Previous releases - regressions: - stop in-kernel socket users from corrupting socket's task_frag - stream: purge sk_error_queue in sk_stream_kill_queues() - openvswitch: fix flow lookup to use unmasked key - dsa: mv88e6xxx: avoid reg_lock deadlock in mv88e6xxx_setup_port() - devlink: - hold region lock when flushing snapshots - protect devlink dump by the instance lock Previous releases - always broken: - bpf: - prevent leak of lsm program after failed attach - resolve fext program type when checking map compatibility - skbuff: account for tail adjustment during pull operations - macsec: fix net device access prior to holding a lock - bonding: switch back when high prio link up - netfilter: flowtable: really fix NAT IPv6 offload - enetc: avoid buffer leaks on xdp_do_redirect() failure - unix: fix race in SOCK_SEQPACKET's unix_dgram_sendmsg() - dsa: microchip: remove IRQF_TRIGGER_FALLING in request_threaded_irq" * tag 'net-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits) net: fec: check the return value of build_skb() net: simplify sk_page_frag Treewide: Stop corrupting socket's task_frag net: Introduce sk_use_task_frag in struct sock. mctp: Remove device type check at unregister net: dsa: microchip: remove IRQF_TRIGGER_FALLING in request_threaded_irq can: kvaser_usb: hydra: help gcc-13 to figure out cmd_len can: flexcan: avoid unbalanced pm_runtime_enable warning Documentation: devlink: add missing toc entry for etas_es58x devlink doc mctp: serial: Fix starting value for frame check sequence nfp: fix unaligned io read of capabilities word net: stream: purge sk_error_queue in sk_stream_kill_queues() myri10ge: Fix an error handling path in myri10ge_probe() net: microchip: vcap: Fix initialization of value and mask rxrpc: Fix the return value of rxrpc_new_incoming_call() rxrpc: rxperf: Fix uninitialised variable rxrpc: Fix I/O thread stop rxrpc: Fix switched parameters in peer tracing rxrpc: Fix locking issues in rxrpc_put_peer_locked() rxrpc: Fix I/O thread startup getting skipped ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmappingLinus Torvalds authored
Pull vfsuid cleanup from Christian Brauner: "This moves the ima specific vfs{g,u}id_t comparison helpers out of the header and into the one file in ima where they are used. We shouldn't incentivize people to use them by placing them into the header. As discussed and suggested by Linus in [1] let's just define them locally in the one file in ima where they are used" Link: https://lore.kernel.org/lkml/CAHk-=wj4BpEwUd=OkTv1F9uykvSrsBNZJVHMp+p_+e2kiV71_A@mail.gmail.com [1] * tag 'fs.vfsuid.ima.v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: mnt_idmapping: move ima-only helpers to ima
-
git://git.kernel.org/pub/scm/linux/kernel/git/crng/randomLinus Torvalds authored
Pull more random number generator updates from Jason Donenfeld: "Two remaining changes that are now possible after you merged a few other trees: - #include <asm/archrandom.h> can be removed from random.h now, making the direct use of the arch_random_* API more of a private implementation detail between the archs and random.c, rather than something for general consumers. - Two additional uses of prandom_u32_max() snuck in during the initial phase of pulls, so these have been converted to get_random_u32_below(), and now the deprecated prandom_u32_max() alias -- which was just a wrapper around get_random_u32_below() -- can be removed. In addition, there is one fix: - Check efi_rt_services_supported() before attempting to use an EFI runtime function. This affected EFI systems that disable runtime services yet still boot via EFI (e.g. the reporter's Lenovo Thinkpad X13s laptop), as well systems where EFI runtime services have been forcibly disabled, such as on PREEMPT_RT. On those machines, a very early and hard to diagnose crash would happen, preventing boot" * tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: prandom: remove prandom_u32_max() efi: random: fix NULL-deref when refreshing seed random: do not include <asm/archrandom.h> from random.h
-
Linus Torvalds authored
Merge tag 'rcu-urgent.2022.12.17a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU fix from Paul McKenney: "This fixes a lockdep false positive in synchronize_rcu() that can otherwise occur during early boot. The fix simply avoids invoking lockdep if the scheduler has not yet been initialized, that is, during that portion of boot when interrupts are disabled" * tag 'rcu-urgent.2022.12.17a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: rcu: Don't assert interrupts enabled too early in boot
-
- 20 Dec, 2022 4 commits
-
-
Jakub Kicinski authored
Anand hit a BUG() when pulling off headers on egress to a SW tunnel. We get to skb_checksum_help() with an invalid checksum offset (commit d7ea0d9d ("net: remove two BUG() from skb_checksum_help()") converted those BUGs to WARN_ONs()). He points out oddness in how skb_postpull_rcsum() gets used. Indeed looks like we should pull before "postpull", otherwise the CHECKSUM_PARTIAL fixup from skb_postpull_rcsum() will not be able to do its job: if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_start_offset(skb) < 0) skb->ip_summed = CHECKSUM_NONE; Reported-by: Anand Parthasarathy <anpartha@meta.com> Fixes: 6578171a ("bpf: add bpf_skb_change_proto helper") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20221220004701.402165-1-kuba@kernel.orgSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Wei Fang authored
The build_skb might return a null pointer but there is no check on the return value in the fec_enet_rx_queue(). So a null pointer dereference might occur. To avoid this, we check the return value of build_skb. If the return value is a null pointer, the driver will recycle the page and update the statistic of ndev. Then jump to rx_processing_done to clear the status flags of the BD so that the hardware can recycle the BD. Fixes: 95698ff6 ("net: fec: using page pool to manage RX buffers") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Shenwei Wang <Shenwei.wang@nxp.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20221219022755.1047573-1-wei.fang@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommuLinus Torvalds authored
Pull m68knommu update from Greg Ungerer: "Only a single change to use the safer strscpy() instead of strncpy() when setting up the cmdline" * tag 'm68knommu-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68k: use strscpy() to instead of strncpy()
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdxLinus Torvalds authored
Pull SPDX/License additions from Greg KH: "Here are two small updates for LICENSES and some kernel files that add the Copyleft-next license and use it in a SPDX tag as a dual-license for some kernel files. These have been discussed thoroughly in public on the linux-spdx mailing list, and have the needed acks on them, as well as having been in linux-next with no reported issues for quite some time" * tag 'spdx-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: testing: use the copyleft-next-0.3.1 SPDX tag LICENSES: Add the copyleft-next-0.3.1 license
-