- 08 Nov, 2021 3 commits
-
-
John Fastabend authored
A socket in a sockmap may have different combinations of programs attached depending on configuration. There can be no programs in which case the socket acts as a sink only. There can be a TX program in this case a BPF program is attached to sending side, but no RX program is attached. There can be an RX program only where sends have no BPF program attached, but receives are hooked with BPF. And finally, both TX and RX programs may be attached. Giving us the permutations: None, Tx, Rx, and TxRx To date most of our use cases have been TX case being used as a fast datapath to directly copy between local application and a userspace proxy. Or Rx cases and TxRX applications that are operating an in kernel based proxy. The traffic in the first case where we hook applications into a userspace application looks like this: AppA redirect AppB Tx <-----------> Rx | | + + TCP <--> lo <--> TCP In this case all traffic from AppA (after 3whs) is copied into the AppB ingress queue and no traffic is ever on the TCP recieive_queue. In the second case the application never receives, except in some rare error cases, traffic on the actual user space socket. Instead the send happens in the kernel. AppProxy socket pool sk0 ------------->{sk1,sk2, skn} ^ | | | | v ingress lb egress TCP TCP Here because traffic is never read off the socket with userspace recv() APIs there is only ever one reader on the sk receive_queue. Namely the BPF programs. However, we've started to introduce a third configuration where the BPF program on receive should process the data, but then the normal case is to push the data into the receive queue of AppB. AppB recv() (userspace) ----------------------- tcp_bpf_recvmsg() (kernel) | | | | | | ingress_msgQ | | | RX_BPF | | | v v sk->receive_queue This is different from the App{A,B} redirect because traffic is first received on the sk->receive_queue. Now for the issue. The tcp_bpf_recvmsg() handler first checks the ingress_msg queue for any data handled by the BPF rx program and returned with PASS code so that it was enqueued on the ingress msg queue. Then if no data exists on that queue it checks the socket receive queue. Unfortunately, this is the same receive_queue the BPF program is reading data off of. So we get a race. Its possible for the recvmsg() hook to pull data off the receive_queue before the BPF hook has a chance to read it. It typically happens when an application is banging on recv() and getting EAGAINs. Until they manage to race with the RX BPF program. To fix this we note that before this patch at attach time when the socket is loaded into the map we check if it needs a TX program or just the base set of proto bpf hooks. Then it uses the above general RX hook regardless of if we have a BPF program attached at rx or not. This patch now extends this check to handle all cases enumerated above, TX, RX, TXRX, and none. And to fix above race when an RX program is attached we use a new hook that is nearly identical to the old one except now we do not let the recv() call skip the RX BPF program. Now only the BPF program pulls data from sk->receive_queue and recv() only pulls data from the ingress msgQ post BPF program handling. With this resolved our AppB from above has been up and running for many hours without detecting any errors. We do this by correlating counters in RX BPF events and the AppB to ensure data is never skipping the BPF program. Selftests, was not able to detect this because we only run them for a short period of time on well ordered send/recvs so we don't get any of the noise we see in real application environments. Fixes: 51199405 ("bpf: skb_verdict, support SK_PASS on RX BPF path") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jussi Maki <joamaki@gmail.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20211103204736.248403-4-john.fastabend@gmail.com
-
John Fastabend authored
We do not need to handle unhash from BPF side we can simply wait for the close to happen. The original concern was a socket could transition from ESTABLISHED state to a new state while the BPF hook was still attached. But, we convinced ourself this is no longer possible and we also improved BPF sockmap to handle listen sockets so this is no longer a problem. More importantly though there are cases where unhash is called when data is in the receive queue. The BPF unhash logic will flush this data which is wrong. To be correct it should keep the data in the receive queue and allow a receiving application to continue reading the data. This may happen when tcp_abort() is received for example. Instead of complicating the logic in unhash simply moving all this to tcp_close() hook solves this. Fixes: 51199405 ("bpf: skb_verdict, support SK_PASS on RX BPF path") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jussi Maki <joamaki@gmail.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20211103204736.248403-3-john.fastabend@gmail.com
-
John Fastabend authored
In order to fix an issue with sockets in TCP sockmap redirect cases we plan to allow CLOSE state sockets to exist in the sockmap. However, the check in bpf_sk_lookup_assign() currently only invalidates sockets in the TCP_ESTABLISHED case relying on the checks on sockmap insert to ensure we never SOCK_CLOSE state sockets in the map. To prepare for this change we flip the logic in bpf_sk_lookup_assign() to explicitly test for the accepted cases. Namely, a tcp socket in TCP_LISTEN or a udp socket in TCP_CLOSE state. This also makes the code more resilent to future changes. Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20211103204736.248403-2-john.fastabend@gmail.com
-
- 06 Nov, 2021 4 commits
-
-
Alexei Starovoitov authored
Martin KaFai says: ==================== This set fixes an out-of-bound access issue when jit-ing the bpf_pseudo_func insn (i.e. ld_imm64 with src_reg == BPF_PSEUDO_FUNC) ==================== Reported-by: Yonatan Komornik <yoniko@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Martin KaFai Lau authored
This patch adds a test to trigger the DCE to remove the whole subprog to ensure the verifier does not depend on a stable subprog index. The DCE is done by testing a global const. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211106014020.651638-1-kafai@fb.com
-
Martin KaFai Lau authored
This patch is to fix an out-of-bound access issue when jit-ing the bpf_pseudo_func insn (i.e. ld_imm64 with src_reg == BPF_PSEUDO_FUNC) In jit_subprog(), it currently reuses the subprog index cached in insn[1].imm. This subprog index is an index into a few array related to subprogs. For example, in jit_subprog(), it is an index to the newly allocated 'struct bpf_prog **func' array. The subprog index was cached in insn[1].imm after add_subprog(). However, this could become outdated (and too big in this case) if some subprogs are completely removed during dead code elimination (in adjust_subprog_starts_after_remove). The cached index in insn[1].imm is not updated accordingly and causing out-of-bound issue in the later jit_subprog(). Unlike bpf_pseudo_'func' insn, the current bpf_pseudo_'call' insn is handling the DCE properly by calling find_subprog(insn->imm) to figure out the index instead of caching the subprog index. The existing bpf_adj_branches() will adjust the insn->imm whenever insn is added or removed. Instead of having two ways handling subprog index, this patch is to make bpf_pseudo_func works more like bpf_pseudo_call. First change is to stop caching the subprog index result in insn[1].imm after add_subprog(). The verification process will use find_subprog(insn->imm) to figure out the subprog index. Second change is in bpf_adj_branches() and have it to adjust the insn->imm for the bpf_pseudo_func insn also whenever insn is added or removed. Third change is in jit_subprog(). Like the bpf_pseudo_call handling, bpf_pseudo_func temporarily stores the find_subprog() result in insn->off. It is fine because the prog's insn has been finalized at this point. insn->off will be reset back to 0 later to avoid confusing the userspace prog dump tool. Fixes: 69c087ba ("bpf: Add bpf_for_each_map_elem() helper") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211106014014.651018-1-kafai@fb.com
-
Nghia Le authored
The newinet value is initialized with inet_sk() in a block code to handle sockets for the ETH_P_IP protocol. Along this code path, newinet is never read. Thus, assignment to newinet is needless and can be removed. Signed-off-by: Nghia Le <nghialm78@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20211104143740.32446-1-nghialm78@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 05 Nov, 2021 25 commits
-
-
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski authored
Daniel Borkmann says: ==================== pull-request: bpf 2021-11-05 We've added 15 non-merge commits during the last 3 day(s) which contain a total of 14 files changed, 199 insertions(+), 90 deletions(-). The main changes are: 1) Fix regression from stack spill/fill of <8 byte scalars, from Martin KaFai Lau. 2) Fix perf's build of bpftool's bootstrap version due to missing libbpf headers, from Quentin Monnet. 3) Fix riscv{32,64} BPF exception tables build errors and warnings, from Björn Töpel. 4) Fix bpf fs to allow RENAME_EXCHANGE support for atomic upgrades on sk_lookup control planes, from Lorenz Bauer. 5) Fix libbpf's error reporting in bpf_map_lookup_and_delete_elem_flags() due to missing libbpf_err_errno(), from Mehrdad Arshad Rad. 6) Various fixes to make xdp_redirect_multi selftest more reliable, from Hangbin Liu. 7) Fix netcnt selftest to make it run serial and thus avoid conflicts with other cgroup/skb selftests run in parallel that could cause flakes, from Andrii Nakryiko. 8) Fix reuseport_bpf_numa networking selftest to skip unavailable NUMA nodes, from Kleber Sacilotto de Souza. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: riscv, bpf: Fix RV32 broken build, and silence RV64 warning selftests/bpf/xdp_redirect_multi: Limit the tests in netns selftests/bpf/xdp_redirect_multi: Give tcpdump a chance to terminate cleanly selftests/bpf/xdp_redirect_multi: Use arping to accurate the arp number selftests/bpf/xdp_redirect_multi: Put the logs to tmp folder libbpf: Fix lookup_and_delete_elem_flags error reporting bpftool: Install libbpf headers for the bootstrap version, too selftests/net: Fix reuseport_bpf_numa by skipping unavailable nodes selftests/bpf: Verifier test on refill from a smaller spill bpf: Do not reject when the stack read size is different from the tracked scalar size selftests/bpf: Make netcnt selftests serial to avoid spurious failures selftests/bpf: Test RENAME_EXCHANGE and RENAME_NOREPLACE on bpffs selftests/bpf: Convert test_bpffs to ASSERT macros libfs: Support RENAME_EXCHANGE in simple_rename() libfs: Move shmem_exchange to simple_rename_exchange ==================== Link: https://lore.kernel.org/r/20211105165803.29372-1-daniel@iogearbox.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Björn Töpel authored
Commit 252c765b ("riscv, bpf: Add BPF exception tables") only addressed RV64, and broke the RV32 build [1]. Fix by gating the exception tables code with CONFIG_ARCH_RV64I. Further, silence a "-Wmissing-prototypes" warning [2] in the RV64 BPF JIT. [1] https://lore.kernel.org/llvm/202111020610.9oy9Rr0G-lkp@intel.com/ [2] https://lore.kernel.org/llvm/202110290334.2zdMyRq4-lkp@intel.com/ Fixes: 252c765b ("riscv, bpf: Add BPF exception tables") Signed-off-by: Björn Töpel <bjorn@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Tong Tiangen <tongtiangen@huawei.com> Link: https://lore.kernel.org/bpf/20211103115453.397209-1-bjorn@kernel.org
-
Hangbin Liu authored
As I want to test both DEVMAP and DEVMAP_HASH in XDP multicast redirect, I limited DEVMAP max entries to a small value for performace. When the test runs after amount of interface creating/deleting tests. The interface index will exceed the map max entries and xdp_redirect_multi will error out with "Get interfacesInterface index to large". Fix this issue by limit the tests in netns and specify the ifindex when creating interfaces. Fixes: d2329247 ("selftests/bpf: Add xdp_redirect_multi test") Reported-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211027033553.962413-5-liuhangbin@gmail.com
-
Hangbin Liu authored
No need to kill tcpdump with -9. Fixes: d2329247 ("selftests/bpf: Add xdp_redirect_multi test") Suggested-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211027033553.962413-4-liuhangbin@gmail.com
-
Hangbin Liu authored
The arp request number triggered by ping none exist address is not accurate, which may lead the test false negative/positive. Change to use arping to accurate the arp number. Also do not use grep pattern match for dot. Fixes: d2329247 ("selftests/bpf: Add xdp_redirect_multi test") Suggested-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211027033553.962413-3-liuhangbin@gmail.com
-
Hangbin Liu authored
The xdp_redirect_multi test logs are created in selftest folder and not cleaned after test. Let's creat a tmp dir and remove the logs after testing. Fixes: d2329247 ("selftests/bpf: Add xdp_redirect_multi test") Suggested-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211027033553.962413-2-liuhangbin@gmail.com
-
Mehrdad Arshad Rad authored
Fix bpf_map_lookup_and_delete_elem_flags() to pass the return code through libbpf_err_errno() as we do similarly in bpf_map_lookup_and_delete_elem(). Fixes: f12b6543 ("libbpf: Streamline error reporting for low-level APIs") Signed-off-by: Mehrdad Arshad Rad <arshad.rad@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20211104171354.11072-1-arshad.rad@gmail.com
-
Quentin Monnet authored
We recently changed bpftool's Makefile to make it install libbpf's headers locally instead of pulling them from the source directory of the library. Although bpftool needs two versions of libbpf, a "regular" one and a "bootstrap" version, we would only install headers for the regular libbpf build. Given that this build always occurs before the bootstrap build when building bpftool, this is enough to ensure that the bootstrap bpftool will have access to the headers exported through the regular libbpf build. However, this did not account for the case when we only want the bootstrap version of bpftool, through the "bootstrap" target. For example, perf needs the bootstrap version only, to generate BPF skeletons. In that case, when are the headers installed? For some time, the issue has been masked, because we had a step (the installation of headers internal to libbpf) which would depend on the regular build of libbpf and hence trigger the export of the headers, just for the sake of creating a directory. But this changed with commit 8b6c4624 ("bpftool: Remove Makefile dep. on $(LIBBPF) for $(LIBBPF_INTERNAL_HDRS)"), where we cleaned up that stage and removed the dependency on the regular libbpf build. As a result, when we only want the bootstrap bpftool version, the regular libbpf is no longer built. The bootstrap libbpf version is built, but headers are not exported, and the bootstrap bpftool build fails because of the missing headers. To fix this, we also install the library headers for the bootstrap version of libbpf, to use them for the bootstrap bpftool and for generating the skeletons. Fixes: f012ade1 ("bpftool: Install libbpf headers instead of including the dir") Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/bpf/20211105015813.6171-1-quentin@isovalent.com
-
Volodymyr Mytnyk authored
fix the remaining build issues reported by patchwork in firmware v4.0 support commit which has been already merged. Fix patchwork issues: - source inline - checkpatch Fixes: bb5dbf2c ("net: marvell: prestera: add firmware v4.0 support") Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Zhang Mingyu authored
'net/protocol.h' included in 'drivers/net/amt.c' is duplicated. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Zhang Mingyu <zhang.mingyu@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
The mii ioctls are now handled by the ndo_eth_ioctl() callback, not the old ndo_do_ioctl(), but octeontx2-nicvf introduced the function for the old way. Move it over to ndo_eth_ioctl() to actually allow calling it from user space. Fixes: 43510ef4 ("octeontx2-nicvf: Add PTP hardware clock support to NIX VF") Fixes: a7605370 ("dev_ioctl: split out ndo_eth_ioctl") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
The timestamp ioctls are now handled by the ndo_eth_ioctl() callback, not the old ndo_do_ioctl(), but oax88796 introduced the function for the old way. Move it over to ndo_eth_ioctl() to actually allow calling it from user space. Fixes: a97c69ba ("net: ax88796c: ASIX AX88796C SPI Ethernet Adapter Driver") Fixes: a7605370 ("dev_ioctl: split out ndo_eth_ioctl") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Lukasz Stelmach <l.stelmach@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yang Li authored
Eliminate the following coccicheck warning: ./drivers/net/amt.c:2795:6-9: ERROR: amt is NULL but dereferenced. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Build bot says: >> drivers/net/ethernet/asix/ax88796c_main.c:1116:34: warning: unused variable 'ax88796c_dt_ids' [-Wunused-const-variable] static const struct of_device_id ax88796c_dt_ids[] = { ^ The only reference to this array is wrapped in of_match_ptr(). Reported-by: kernel test robot <lkp@intel.com> Fixes: a97c69ba ("net: ax88796c: ASIX AX88796C SPI Ethernet Adapter Driver") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Menglong Dong authored
udp_mem is a vector of 3 INTEGERs, which is used to limit the number of pages allowed for queueing by all UDP sockets. However, sk_has_memory_pressure() in __sk_mem_raise_allocated() always return false for udp, as memory pressure is not supported by udp, which means that __sk_mem_raise_allocated() will fail once pages allocated for udp socket exceeds udp_mem[0]. Therefor, udp_mem[0] is the only one that limit the number of pages. However, the document of udp_mem just express that udp_mem[2] is the limitation. So, just fix it. Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xu Wang authored
The print function dev_err() is redundant because platform_get_irq() already prints an error. Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Reviewed-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
The octeontx2 pf nic driver failsz to link when the devlink support is not reachable: aarch64-linux-ld: drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.o: in function `otx2_dl_mcam_count_get': otx2_devlink.c:(.text+0x10): undefined reference to `devlink_priv' aarch64-linux-ld: drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.o: in function `otx2_dl_mcam_count_validate': otx2_devlink.c:(.text+0x50): undefined reference to `devlink_priv' aarch64-linux-ld: drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.o: in function `otx2_dl_mcam_count_set': otx2_devlink.c:(.text+0xd0): undefined reference to `devlink_priv' aarch64-linux-ld: drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.o: in function `otx2_devlink_info_get': otx2_devlink.c:(.text+0x150): undefined reference to `devlink_priv' This is already selected by the admin function driver, but not the actual nic, which might be built-in when the af driver is not. Fixes: 2da48943 ("octeontx2-pf: devlink params support to set mcam entry count") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yang Guang authored
Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid opencoding it. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Yang Guang <yang.guang5@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yang Guang authored
Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid opencoding it. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Yang Guang <yang.guang5@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
-
luo penghao authored
The assignment of err will be overwritten next, so this statement should be deleted. The clang_analyzer complains as follows: drivers/net/ethernet/broadcom/tg3.c:5506:2: warning: Value stored to 'expected_sg_dig_ctrl' is never read Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tony Lu authored
This makes the output of smcr_link_down tracepoint easier to use and understand without additional translating function's pointer address. It prints the function name with offset: <idle>-0 [000] ..s. 69.087164: smcr_link_down: lnk=00000000dab41cdc lgr=000000007d5d8e24 state=0 rc=1 dev=mlx5_0 location=smc_wr_tx_tasklet_fn+0x5ef/0x6f0 [smc] Link: https://lore.kernel.org/netdev/11f17a34-fd35-f2ec-3f20-dd0c34e55fde@linux.ibm.com/Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huang Guobin authored
When I do fuzz test for bonding device interface, I got the following use-after-free Calltrace: ================================================================== BUG: KASAN: use-after-free in bond_enslave+0x1521/0x24f0 Read of size 8 at addr ffff88825bc11c00 by task ifenslave/7365 CPU: 5 PID: 7365 Comm: ifenslave Tainted: G E 5.15.0-rc1+ #13 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014 Call Trace: dump_stack_lvl+0x6c/0x8b print_address_description.constprop.0+0x48/0x70 kasan_report.cold+0x82/0xdb __asan_load8+0x69/0x90 bond_enslave+0x1521/0x24f0 bond_do_ioctl+0x3e0/0x450 dev_ifsioc+0x2ba/0x970 dev_ioctl+0x112/0x710 sock_do_ioctl+0x118/0x1b0 sock_ioctl+0x2e0/0x490 __x64_sys_ioctl+0x118/0x150 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f19159cf577 Code: b3 66 90 48 8b 05 11 89 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 78 RSP: 002b:00007ffeb3083c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffeb3084bca RCX: 00007f19159cf577 RDX: 00007ffeb3083ce0 RSI: 0000000000008990 RDI: 0000000000000003 RBP: 00007ffeb3084bc4 R08: 0000000000000040 R09: 0000000000000000 R10: 00007ffeb3084bc0 R11: 0000000000000246 R12: 00007ffeb3083ce0 R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffeb3083cb0 Allocated by task 7365: kasan_save_stack+0x23/0x50 __kasan_kmalloc+0x83/0xa0 kmem_cache_alloc_trace+0x22e/0x470 bond_enslave+0x2e1/0x24f0 bond_do_ioctl+0x3e0/0x450 dev_ifsioc+0x2ba/0x970 dev_ioctl+0x112/0x710 sock_do_ioctl+0x118/0x1b0 sock_ioctl+0x2e0/0x490 __x64_sys_ioctl+0x118/0x150 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 7365: kasan_save_stack+0x23/0x50 kasan_set_track+0x20/0x30 kasan_set_free_info+0x24/0x40 __kasan_slab_free+0xf2/0x130 kfree+0xd1/0x5c0 slave_kobj_release+0x61/0x90 kobject_put+0x102/0x180 bond_sysfs_slave_add+0x7a/0xa0 bond_enslave+0x11b6/0x24f0 bond_do_ioctl+0x3e0/0x450 dev_ifsioc+0x2ba/0x970 dev_ioctl+0x112/0x710 sock_do_ioctl+0x118/0x1b0 sock_ioctl+0x2e0/0x490 __x64_sys_ioctl+0x118/0x150 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae Last potentially related work creation: kasan_save_stack+0x23/0x50 kasan_record_aux_stack+0xb7/0xd0 insert_work+0x43/0x190 __queue_work+0x2e3/0x970 delayed_work_timer_fn+0x3e/0x50 call_timer_fn+0x148/0x470 run_timer_softirq+0x8a8/0xc50 __do_softirq+0x107/0x55f Second to last potentially related work creation: kasan_save_stack+0x23/0x50 kasan_record_aux_stack+0xb7/0xd0 insert_work+0x43/0x190 __queue_work+0x2e3/0x970 __queue_delayed_work+0x130/0x180 queue_delayed_work_on+0xa7/0xb0 bond_enslave+0xe25/0x24f0 bond_do_ioctl+0x3e0/0x450 dev_ifsioc+0x2ba/0x970 dev_ioctl+0x112/0x710 sock_do_ioctl+0x118/0x1b0 sock_ioctl+0x2e0/0x490 __x64_sys_ioctl+0x118/0x150 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff88825bc11c00 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 0 bytes inside of 1024-byte region [ffff88825bc11c00, ffff88825bc12000) The buggy address belongs to the page: page:ffffea00096f0400 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x25bc10 head:ffffea00096f0400 order:3 compound_mapcount:0 compound_pincount:0 flags: 0x57ff00000010200(slab|head|node=1|zone=2|lastcpupid=0x7ff) raw: 057ff00000010200 ffffea0009a71c08 ffff888240001968 ffff88810004dbc0 raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88825bc11b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88825bc11b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88825bc11c00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88825bc11c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88825bc11d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Put new_slave in bond_sysfs_slave_add() will cause use-after-free problems when new_slave is accessed in the subsequent error handling process. Since new_slave will be put in the subsequent error handling process, remove the unnecessary put to fix it. In addition, when sysfs_create_file() fails, if some files have been crea- ted successfully, we need to call sysfs_remove_file() to remove them. Since there are sysfs_create_files() & sysfs_remove_files() can be used, use these two functions instead. Fixes: 7afcaec4 (bonding: use kobject_put instead of _del after kobject_add) Signed-off-by: Huang Guobin <huangguobin4@huawei.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Eugene Syromiatnikov says: ==================== MCTP sockaddr padding check/initialisation fixup This pair of patches introduces checks for padding fields of struct sockaddr_mctp/sockaddr_mctp_ext to ease their re-use for possible extensions in the future; as well as zeroing of these fields in the respective sockaddr filling routines. While the first commit is definitely an ABI breakage, it is proposed in hopes that the change is made soon enough (the interface appeared only in Linux 5.15) to avoid affecting any existing user space. ==================== Link: https://lore.kernel.org/r/cover.1635965993.git.esyr@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eugene Syromiatnikov authored
struct sockaddr_mctp_ext.__smctp_paddin0 has to be checked for being set to zero, otherwise it cannot be utilised in the future. Fixes: 99ce45d5 ("mctp: Implement extended addressing") Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Acked-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eugene Syromiatnikov authored
In order to have the padding fields actually usable in the future, there have to be checks that user space doesn't supply non-zero garbage there. It is also worth setting these padding fields to zero, unless it is known that they have been already zeroed. Cc: stable@vger.kernel.org # v5.15 Fixes: 5a20dd46 ("mctp: Be explicit about struct sockaddr_mctp padding") Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Acked-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 04 Nov, 2021 7 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queueJakub Kicinski authored
Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2021-11-03 Brett fixes issues with promiscuous mode settings not being properly enabled and removes setting of VF antispoof along with promiscuous mode. He also ensures that VF Tx queues are always disabled and resolves a race between virtchnl handling and VF related ndo ops. Sylwester fixes an issue where a VF MAC could not be set to its primary MAC if the address is already present. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: Fix race conditions between virtchnl handling and VF ndo ops ice: Fix not stopping Tx queues for VFs ice: Fix replacing VF hardware MAC to existing MAC filter ice: Remove toggling of antispoof for VF trusted promiscuous mode ice: Fix VF true promiscuous mode ==================== Link: https://lore.kernel.org/r/20211103161935.2997369-1-anthony.l.nguyen@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Heiner Kallweit authored
As reported by Zhang there's a small issue if in forced mode the duplex mode changes with the link staying up [0]. In this case the MAC isn't notified about the change. The proposed patch relies on the phylib state machine and ignores the fact that there are drivers that uses phylib but not the phylib state machine. So let's don't change the behavior for such drivers and fix it w/o re-adding state PHY_FORCING for the case that phylib state machine is used. [0] https://lore.kernel.org/netdev/a5c26ffd-4ee4-a5e6-4103-873208ce0dc5@huawei.com/T/ Fixes: 2bd229df ("net: phy: remove state PHY_FORCING") Reported-by: Zhang Changzhong <zhangchangzhong@huawei.com> Tested-by: Zhang Changzhong <zhangchangzhong@huawei.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/7b8b9456-a93f-abbc-1dc5-a2c2542f932c@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Guo Zhengkui authored
Fix following coccicheck warning: ./net/core/devlink.c:69:6-10: WARNING use flexible-array member instead Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com> Link: https://lore.kernel.org/r/20211103121607.27490-1-guozhengkui@vivo.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Kleber Sacilotto de Souza authored
In some platforms the numa node numbers are not necessarily consecutive, meaning that not all nodes from 0 to the value returned by numa_max_node() are available on the system. Using node numbers which are not available results on errors from libnuma such as: ---- IPv4 UDP ---- send node 0, receive socket 0 libnuma: Warning: Cannot read node cpumask from sysfs ./reuseport_bpf_numa: failed to pin to node: No such file or directory Fix it by checking if the node number bit is set on numa_nodes_ptr, which is defined on libnuma as "Set with all nodes the kernel has exposed to userspace". Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211101145317.286118-1-kleber.souza@canonical.com
-
Eric Dumazet authored
Sanity check in sock_reserve_memory() was not enough to prevent malicious user to trigger a NULL deref. In this case, the isse is that sk_prot->memory_allocated is NULL. Use standard sk_has_account() helper to deal with this. BUG: KASAN: null-ptr-deref in instrument_atomic_read_write include/linux/instrumented.h:101 [inline] BUG: KASAN: null-ptr-deref in atomic_long_add_return include/linux/atomic/atomic-instrumented.h:1218 [inline] BUG: KASAN: null-ptr-deref in sk_memory_allocated_add include/net/sock.h:1371 [inline] BUG: KASAN: null-ptr-deref in sock_reserve_memory net/core/sock.c:994 [inline] BUG: KASAN: null-ptr-deref in sock_setsockopt+0x22ab/0x2b30 net/core/sock.c:1443 Write of size 8 at addr 0000000000000000 by task syz-executor.0/11270 CPU: 1 PID: 11270 Comm: syz-executor.0 Not tainted 5.15.0-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 __kasan_report mm/kasan/report.c:446 [inline] kasan_report.cold+0x66/0xdf mm/kasan/report.c:459 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189 instrument_atomic_read_write include/linux/instrumented.h:101 [inline] atomic_long_add_return include/linux/atomic/atomic-instrumented.h:1218 [inline] sk_memory_allocated_add include/net/sock.h:1371 [inline] sock_reserve_memory net/core/sock.c:994 [inline] sock_setsockopt+0x22ab/0x2b30 net/core/sock.c:1443 __sys_setsockopt+0x4f8/0x610 net/socket.c:2172 __do_sys_setsockopt net/socket.c:2187 [inline] __se_sys_setsockopt net/socket.c:2184 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2184 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f56076d5ae9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f5604c4b188 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 00007f56077e8f60 RCX: 00007f56076d5ae9 RDX: 0000000000000049 RSI: 0000000000000001 RDI: 0000000000000003 RBP: 00007f560772ff25 R08: 000000000000fec7 R09: 0000000000000000 R10: 0000000020000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fffb61a100f R14: 00007f5604c4b300 R15: 0000000000022000 </TASK> Fixes: 2bb2f5fb ("net: add new socket option SO_RESERVE_MEM") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Leonard Crestez authored
Extending these flags using the existing (1 << x) pattern triggers complaints from checkpatch. Instead of ignoring checkpatch modify the existing values to use BIT(x) style in a separate commit. Signed-off-by: Leonard Crestez <cdleonard@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrea Righi authored
Explicitly pass -6 to netcat when the test is using IPv6 to prevent failures. Also make sure to pass "-N" to netcat to close the socket after EOF on the client side, otherwise we would always hit the timeout and the test would fail. Without this fix applied: TEST: GREv6/v4 - copy file w/ TSO [FAIL] TEST: GREv6/v4 - copy file w/ GSO [FAIL] TEST: GREv6/v6 - copy file w/ TSO [FAIL] TEST: GREv6/v6 - copy file w/ GSO [FAIL] With this fix applied: TEST: GREv6/v4 - copy file w/ TSO [ OK ] TEST: GREv6/v4 - copy file w/ GSO [ OK ] TEST: GREv6/v6 - copy file w/ TSO [ OK ] TEST: GREv6/v6 - copy file w/ GSO [ OK ] Fixes: 025efa0a ("selftests: add simple GSO GRE test") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 03 Nov, 2021 1 commit
-
-
Brett Creeley authored
The VF can be configured via the PF's ndo ops at the same time the PF is receiving/handling virtchnl messages. This has many issues, with one of them being the ndo op could be actively resetting a VF (i.e. resetting it to the default state and deleting/re-adding the VF's VSI) while a virtchnl message is being handled. The following error was seen because a VF ndo op was used to change a VF's trust setting while the VIRTCHNL_OP_CONFIG_VSI_QUEUES was ongoing: [35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM [35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5 [35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6 Fix this by making sure the virtchnl handling and VF ndo ops that trigger VF resets cannot run concurrently. This is done by adding a struct mutex cfg_lock to each VF structure. For VF ndo ops, the mutex will be locked around the critical operations and VFR. Since the ndo ops will trigger a VFR, the virtchnl thread will use mutex_trylock(). This is done because if any other thread (i.e. VF ndo op) has the mutex, then that means the current VF message being handled is no longer valid, so just ignore it. This issue can be seen using the following commands: for i in {0..50}; do rmmod ice modprobe ice sleep 1 echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs ip link set ens785f1 vf 0 trust on ip link set ens785f0 vf 0 trust on sleep 2 echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs sleep 1 echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs ip link set ens785f1 vf 0 trust on ip link set ens785f0 vf 0 trust on done Fixes: 7c710869 ("ice: Add handlers for VF netdevice operations") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-