- 17 Oct, 2024 11 commits
-
-
Andrii Nakryiko authored
>From memfd_secret(2) manpage: The memory areas backing the file created with memfd_secret(2) are visible only to the processes that have access to the file descriptor. The memory region is removed from the kernel page tables and only the page tables of the processes holding the file descriptor map the corresponding physical memory. (Thus, the pages in the region can't be accessed by the kernel itself, so that, for example, pointers to the region can't be passed to system calls.) We need to handle this special case gracefully in build ID fetching code. Return -EFAULT whenever secretmem file is passed to build_id_parse() family of APIs. Original report and repro can be found in [0]. [0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/ Fixes: de3ec364 ("lib/buildid: add single folio-based file reader abstraction") Reported-by: Yi Lai <yi1.lai@intel.com> Suggested-by: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://lore.kernel.org/bpf/20241017175431.6183-A-hca@linux.ibm.com Link: https://lore.kernel.org/bpf/20241017174713.2157873-1-andrii@kernel.org
-
Daniel Borkmann authored
Add a small BPF verifier test case to ensure that alu32 additions to registers are not subject to linked scalar delta tracking. # ./vmtest.sh -- ./test_progs -t verifier_linked_scalars [...] ./test_progs -t verifier_linked_scalars [ 1.413138] tsc: Refined TSC clocksource calibration: 3407.993 MHz [ 1.413524] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fcd52370, max_idle_ns: 440795242006 ns [ 1.414223] clocksource: Switched to clocksource tsc [ 1.419640] bpf_testmod: loading out-of-tree module taints kernel. [ 1.420025] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel #500/1 verifier_linked_scalars/scalars: find linked scalars:OK #500 verifier_linked_scalars:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED [ 1.590858] ACPI: PM: Preparing to enter system sleep state S5 [ 1.591402] reboot: Power down [...] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20241016134913.32249-3-daniel@iogearbox.net
-
Daniel Borkmann authored
print_reg_state() should not consider adding reg->off to reg->var_off.value when dumping scalars. Scalars can be produced with reg->off != 0 through BPF_ADD_CONST, and thus as-is this can skew the register log dump. Fixes: 98d7ca37 ("bpf: Track delta between "linked" registers.") Reported-by: Nathaniel Theis <nathaniel.theis@nccgroup.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241016134913.32249-2-daniel@iogearbox.net
-
Daniel Borkmann authored
Nathaniel reported a bug in the linked scalar delta tracking, which can lead to accepting a program with OOB access. The specific code is related to the sync_linked_regs() function and the BPF_ADD_CONST flag, which signifies a constant offset between two scalar registers tracked by the same register id. The verifier attempts to track "similar" scalars in order to propagate bounds information learned about one scalar to others. For instance, if r1 and r2 are known to contain the same value, then upon encountering 'if (r1 != 0x1234) goto xyz', not only does it know that r1 is equal to 0x1234 on the path where that conditional jump is not taken, it also knows that r2 is. Additionally, with env->bpf_capable set, the verifier will track scalars which should be a constant delta apart (if r1 is known to be one greater than r2, then if r1 is known to be equal to 0x1234, r2 must be equal to 0x1233.) The code path for the latter in adjust_reg_min_max_vals() is reached when processing both 32 and 64-bit addition operations. While adjust_reg_min_max_vals() knows whether dst_reg was produced by a 32 or a 64-bit addition (based on the alu32 bool), the only information saved in dst_reg is the id of the source register (reg->id, or'ed by BPF_ADD_CONST) and the value of the constant offset (reg->off). Later, the function sync_linked_regs() will attempt to use this information to propagate bounds information from one register (known_reg) to others, meaning, for all R in linked_regs, it copies known_reg range (and possibly adjusting delta) into R for the case of R->id == known_reg->id. For the delta adjustment, meaning, matching reg->id with BPF_ADD_CONST, the verifier adjusts the register as reg = known_reg; reg += delta where delta is computed as (s32)reg->off - (s32)known_reg->off and placed as a scalar into a fake_reg to then simulate the addition of reg += fake_reg. This is only correct, however, if the value in reg was created by a 64-bit addition. When reg contains the result of a 32-bit addition operation, its upper 32 bits will always be zero. sync_linked_regs() on the other hand, may cause the verifier to believe that the addition between fake_reg and reg overflows into those upper bits. For example, if reg was generated by adding the constant 1 to known_reg using a 32-bit alu operation, then reg->off is 1 and known_reg->off is 0. If known_reg is known to be the constant 0xFFFFFFFF, sync_linked_regs() will tell the verifier that reg is equal to the constant 0x100000000. This is incorrect as the actual value of reg will be 0, as the 32-bit addition will wrap around. Example: 0: (b7) r0 = 0; R0_w=0 1: (18) r1 = 0x80000001; R1_w=0x80000001 3: (37) r1 /= 1; R1_w=scalar() 4: (bf) r2 = r1; R1_w=scalar(id=1) R2_w=scalar(id=1) 5: (bf) r4 = r1; R1_w=scalar(id=1) R4_w=scalar(id=1) 6: (04) w2 += 2147483647; R2_w=scalar(id=1+2147483647,smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) 7: (04) w4 += 0 ; R4_w=scalar(id=1+0,smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) 8: (15) if r2 == 0x0 goto pc+1 10: R0=0 R1=0xffffffff80000001 R2=0x7fffffff R4=0xffffffff80000001 R10=fp0 What can be seen here is that r1 is copied to r2 and r4, such that {r1,r2,r4}.id are all the same which later lets sync_linked_regs() to be invoked. Then, in a next step constants are added with alu32 to r2 and r4, setting their ->off, as well as id |= BPF_ADD_CONST. Next, the conditional will bind r2 and propagate ranges to its linked registers. The verifier now believes the upper 32 bits of r4 are r4=0xffffffff80000001, while actually r4=r1=0x80000001. One approach for a simple fix suitable also for stable is to limit the constant delta tracking to only 64-bit alu addition. If necessary at some later point, BPF_ADD_CONST could be split into BPF_ADD_CONST64 and BPF_ADD_CONST32 to avoid mixing the two under the tradeoff to further complicate sync_linked_regs(). However, none of the added tests from dedf56d7 ("selftests/bpf: Add tests for add_const") make this necessary at this point, meaning, BPF CI also passes with just limiting tracking to 64-bit alu addition. Fixes: 98d7ca37 ("bpf: Track delta between "linked" registers.") Reported-by: Nathaniel Theis <nathaniel.theis@nccgroup.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20241016134913.32249-1-daniel@iogearbox.net
-
Jordan Rome authored
Previously test_task_tid was setting `linfo.task.tid` to `getpid()` which is the same as `gettid()` for the parent process. Instead create a new child thread and set `linfo.task.tid` to `gettid()` to make sure the tid filtering logic is working as expected. Signed-off-by: Jordan Rome <linux@jordanrome.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241016210048.1213935-2-linux@jordanrome.com
-
Jordan Rome authored
In userspace, you can add a tid filter by setting the "task.tid" field for "bpf_iter_link_info". However, `get_pid_task` when called for the `BPF_TASK_ITER_TID` type should have been using `PIDTYPE_PID` (tid) instead of `PIDTYPE_TGID` (pid). Fixes: f0d74c4d ("bpf: Parameterize task iterators.") Signed-off-by: Jordan Rome <linux@jordanrome.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241016210048.1213935-1-linux@jordanrome.com
-
Andrea Parri authored
According to the prototype formal BPF memory consistency model discussed e.g. in [1] and following the ordering properties of the C/in-kernel macro atomic_cmpxchg(), a BPF atomic operation with the BPF_CMPXCHG modifier is fully ordered. However, the current RISC-V JIT lowerings fail to meet such memory ordering property. This is illustrated by the following litmus test: BPF BPF__MP+success_cmpxchg+fence { 0:r1=x; 0:r3=y; 0:r5=1; 1:r2=y; 1:r4=f; 1:r7=x; } P0 | P1 ; *(u64 *)(r1 + 0) = 1 | r1 = *(u64 *)(r2 + 0) ; r2 = cmpxchg_64 (r3 + 0, r4, r5) | r3 = atomic_fetch_add((u64 *)(r4 + 0), r5) ; | r6 = *(u64 *)(r7 + 0) ; exists (1:r1=1 /\ 1:r6=0) whose "exists" clause is not satisfiable according to the BPF memory model. Using the current RISC-V JIT lowerings, the test can be mapped to the following RISC-V litmus test: RISCV RISCV__MP+success_cmpxchg+fence { 0:x1=x; 0:x3=y; 0:x5=1; 1:x2=y; 1:x4=f; 1:x7=x; } P0 | P1 ; sd x5, 0(x1) | ld x1, 0(x2) ; L00: | amoadd.d.aqrl x3, x5, 0(x4) ; lr.d x2, 0(x3) | ld x6, 0(x7) ; bne x2, x4, L01 | ; sc.d x6, x5, 0(x3) | ; bne x6, x4, L00 | ; fence rw, rw | ; L01: | ; exists (1:x1=1 /\ 1:x6=0) where the two stores in P0 can be reordered. Update the RISC-V JIT lowerings/implementation of BPF_CMPXCHG to emit an SC with RELEASE ("rl") annotation in order to meet the expected memory ordering guarantees. The resulting RISC-V JIT lowerings of BPF_CMPXCHG match the RISC-V lowerings of the C atomic_cmpxchg(). Other lowerings were fixed via 20a759df ("riscv, bpf: make some atomic operations fully ordered"). Fixes: dd642ccb ("riscv, bpf: Implement more atomic operations for RV64") Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lpc.events/event/18/contributions/1949/attachments/1665/3441/bpfmemmodel.2024.09.19p.pdf [1] Link: https://lore.kernel.org/bpf/20241017143628.2673894-1-parri.andrea@gmail.com
-
Michal Luczaj authored
vsock_bpf_prot is set up at runtime. Remove the superfluous init. No functional change intended. Fixes: 634f1a71 ("vsock: support sockmap") Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-4-d6577bbfe742@rbox.co
-
Michal Luczaj authored
Dequeuing via vsock_transport::read_skb() left msg_count outdated, which then confused SOCK_SEQPACKET recv(). Decrease the counter. Fixes: 634f1a71 ("vsock: support sockmap") Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-3-d6577bbfe742@rbox.co
-
Michal Luczaj authored
Make sure virtio_transport_inc_rx_pkt() and virtio_transport_dec_rx_pkt() calls are balanced (i.e. virtio_vsock_sock::rx_bytes doesn't lie) after vsock_transport::read_skb(). While here, also inform the peer that we've freed up space and it has more credit. Failing to update rx_bytes after packet is dequeued leads to a warning on SOCK_STREAM recv(): [ 233.396654] rx_queue is empty, but rx_bytes is non-zero [ 233.396702] WARNING: CPU: 11 PID: 40601 at net/vmw_vsock/virtio_transport_common.c:589 Fixes: 634f1a71 ("vsock: support sockmap") Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-2-d6577bbfe742@rbox.co
-
Michal Luczaj authored
Don't mislead the callers of bpf_{sk,msg}_redirect_{map,hash}(): make sure to immediately and visibly fail the forwarding of unsupported af_vsock packets. Fixes: 634f1a71 ("vsock: support sockmap") Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-1-d6577bbfe742@rbox.co
-
- 16 Oct, 2024 2 commits
-
-
Tyrone Wu authored
Add assertions/tests to verify `bpf_link_info` fields for netfilter link are correctly populated. Signed-off-by: Tyrone Wu <wudevelops@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20241011193252.178997-2-wudevelops@gmail.com
-
Tyrone Wu authored
This fix correctly populates the `bpf_link_info.netfilter.flags` field when user passes the `BPF_F_NETFILTER_IP_DEFRAG` flag. Fixes: 91721c2d ("netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link") Signed-off-by: Tyrone Wu <wudevelops@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Florian Westphal <fw@strlen.de> Cc: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/bpf/20241011193252.178997-1-wudevelops@gmail.com
-
- 15 Oct, 2024 4 commits
-
-
Alexei Starovoitov authored
Dimitar Kanaliev says: ==================== Fix truncation bug in coerce_reg_to_size_sx and extend selftests. This patch series addresses a truncation bug in the eBPF verifier function coerce_reg_to_size_sx(). The issue was caused by the incorrect ordering of assignments between 32-bit and 64-bit min/max values, leading to improper truncation when updating the register state. This issue has been reported previously by Zac Ecob[1] , but was not followed up on. The first patch fixes the assignment order in coerce_reg_to_size_sx() to ensure correct truncation. The subsequent patches add selftests for coerce_{reg,subreg}_to_size_sx. Changelog: v1 -> v2: - Moved selftests inside the conditional check for cpuv4 [1] (https://lore.kernel.org/bpf/h3qKLDEO6m9nhif0eAQX4fVrqdO0D_OPb0y5HfMK9jBePEKK33wQ3K-bqSVnr0hiZdFZtSJOsbNkcEQGpv_yJk61PAAiO8fUkgMRSO-lB50=@protonmail.com/) ==================== Link: https://lore.kernel.org/r/20241014121155.92887-1-dimitar.kanaliev@siteground.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Dimitar Kanaliev authored
Add a test for unsigned ranges after signed extension instruction. This case isn't currently covered by existing tests in verifier_movsx.c. Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Signed-off-by: Dimitar Kanaliev <dimitar.kanaliev@siteground.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20241014121155.92887-4-dimitar.kanaliev@siteground.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Dimitar Kanaliev authored
Add test that checks whether unsigned ranges deduced by the verifier for sign extension instruction is correct. Without previous patch that fixes truncation in coerce_reg_to_size_sx() this test fails. Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Signed-off-by: Dimitar Kanaliev <dimitar.kanaliev@siteground.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20241014121155.92887-3-dimitar.kanaliev@siteground.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Dimitar Kanaliev authored
coerce_reg_to_size_sx() updates the register state after a sign-extension operation. However, there's a bug in the assignment order of the unsigned min/max values, leading to incorrect truncation: 0: (85) call bpf_get_prandom_u32#7 ; R0_w=scalar() 1: (57) r0 &= 1 ; R0_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0; 0x1)) 2: (07) r0 += 254 ; R0_w=scalar(smin=umin=smin32=umin32=254,smax=umax=smax32=umax32=255,var_off=(0xfe; 0x1)) 3: (bf) r0 = (s8)r0 ; R0_w=scalar(smin=smin32=-2,smax=smax32=-1,umin=umin32=0xfffffffe,umax=0xffffffff,var_off=(0xfffffffffffffffe; 0x1)) In the current implementation, the unsigned 32-bit min/max values (u32_min_value and u32_max_value) are assigned directly from the 64-bit signed min/max values (s64_min and s64_max): reg->umin_value = reg->u32_min_value = s64_min; reg->umax_value = reg->u32_max_value = s64_max; Due to the chain assigmnent, this is equivalent to: reg->u32_min_value = s64_min; // Unintended truncation reg->umin_value = reg->u32_min_value; reg->u32_max_value = s64_max; // Unintended truncation reg->umax_value = reg->u32_max_value; Fixes: 1f9a1ea8 ("bpf: Support new sign-extension load insns") Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Reported-by: Zac Ecob <zacecob@protonmail.com> Signed-off-by: Dimitar Kanaliev <dimitar.kanaliev@siteground.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Reviewed-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Link: https://lore.kernel.org/r/20241014121155.92887-2-dimitar.kanaliev@siteground.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 11 Oct, 2024 3 commits
-
-
Tyrone Wu authored
Add assertions in `bpf_link_info.uprobe_multi` test to verify that `count` and `path_size` fields are correctly populated when the fields are unset. This tests a previous bug where the `path_size` field was not populated when `path` and `path_size` were unset. Signed-off-by: Tyrone Wu <wudevelops@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241011000803.681190-2-wudevelops@gmail.com
-
Tyrone Wu authored
Previously when retrieving `bpf_link_info.uprobe_multi` with `path` and `path_size` fields unset, the `path_size` field is not populated (remains 0). This behavior was inconsistent with how other input/output string buffer fields work, as the field should be populated in cases when: - both buffer and length are set (currently works as expected) - both buffer and length are unset (not working as expected) This patch now fills the `path_size` field when `path` and `path_size` are unset. Fixes: e56fdbfb ("bpf: Add link_info support for uprobe multi link") Signed-off-by: Tyrone Wu <wudevelops@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241011000803.681190-1-wudevelops@gmail.com
-
Tony Ambardar authored
Linking of urandom_read and liburandom_read.so prefers LLVM's 'ld.lld' but falls back to using 'ld' if unsupported. However, this fallback discards any existing makefile macro for LD and can break cross-compilation. Fix by changing the fallback to use the target linker $(LD), passed via '-fuse-ld=' using an absolute path rather than a linker "flavour". Fixes: 08c79c9c ("selftests/bpf: Don't force lld on non-x86 architectures") Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241009040720.635260-1-tony.ambardar@gmail.com
-
- 10 Oct, 2024 9 commits
-
-
Alexei Starovoitov authored
Toke Høiland-Jørgensen says: ==================== Fix caching of BTF for kfuncs in the verifier When playing around with defining kfuncs in some custom modules, we noticed that if a BPF program calls two functions with the same signature in two different modules, the function from the wrong module may sometimes end up being called. Whether this happens depends on the order of the calls in the BPF program, which turns out to be due to the use of sort() inside __find_kfunc_desc_btf() in the verifier code. This series contains a fix for the issue (first patch), and a selftest to trigger it (last patch). The middle commit is a small refactor to expose the module loading helper functions in testing_helpers.c. See the individual patch descriptions for more details. Changes in v2: - Drop patch that refactors module building in selftests (Alexei) - Get rid of expect_val function argument in selftest (Jiri) - Collect ACKs - Link to v1: https://lore.kernel.org/r/20241008-fix-kfunc-btf-caching-for-modules-v1-0-dfefd9aa4318@redhat.com ==================== Link: https://lore.kernel.org/r/20241010-fix-kfunc-btf-caching-for-modules-v2-0-745af6c1af98@redhat.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Simon Sundberg authored
Add a test case for kfuncs from multiple external modules, checking that the correct kfuncs are called regardless of which order they're called in. Specifically, check that calling the kfuncs in an order different from the one the modules' BTF are loaded in works. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20241010-fix-kfunc-btf-caching-for-modules-v2-3-745af6c1af98@redhat.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Simon Sundberg authored
Generalize the previous [un]load_bpf_testmod() helpers (in testing_helpers.c) to the more generic [un]load_module(), which can load an arbitrary kernel module by name. This allows future selftests to more easily load custom kernel modules other than bpf_testmod.ko. Refactor [un]load_bpf_testmod() to wrap this new helper. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20241010-fix-kfunc-btf-caching-for-modules-v2-2-745af6c1af98@redhat.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Toke Høiland-Jørgensen authored
The verifier contains a cache for looking up module BTF objects when calling kfuncs defined in modules. This cache uses a 'struct bpf_kfunc_btf_tab', which contains a sorted list of BTF objects that were already seen in the current verifier run, and the BTF objects are looked up by the offset stored in the relocated call instruction using bsearch(). The first time a given offset is seen, the module BTF is loaded from the file descriptor passed in by libbpf, and stored into the cache. However, there's a bug in the code storing the new entry: it stores a pointer to the new cache entry, then calls sort() to keep the cache sorted for the next lookup using bsearch(), and then returns the entry that was just stored through the stored pointer. However, because sort() modifies the list of entries in place *by value*, the stored pointer may no longer point to the right entry, in which case the wrong BTF object will be returned. The end result of this is an intermittent bug where, if a BPF program calls two functions with the same signature in two different modules, the function from the wrong module may sometimes end up being called. Whether this happens depends on the order of the calls in the BPF program (as that affects whether sort() reorders the array of BTF objects), making it especially hard to track down. Simon, credited as reporter below, spent significant effort analysing and creating a reproducer for this issue. The reproducer is added as a selftest in a subsequent patch. The fix is straight forward: simply don't use the stored pointer after calling sort(). Since we already have an on-stack pointer to the BTF object itself at the point where the function return, just use that, and populate it from the cache entry in the branch where the lookup succeeds. Fixes: 2357672c ("bpf: Introduce BPF support for kernel module function calls") Reported-by: Simon Sundberg <simon.sundberg@kau.se> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20241010-fix-kfunc-btf-caching-for-modules-v2-1-745af6c1af98@redhat.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Tony Ambardar authored
Existing code calls connect() with a 'struct sockaddr_in6 *' argument where a 'struct sockaddr *' argument is declared, yielding compile errors when building for mips64el/musl-libc: In file included from cgroup_ancestor.c:3: cgroup_ancestor.c: In function 'send_datagram': cgroup_ancestor.c:38:38: error: passing argument 2 of 'connect' from incompatible pointer type [-Werror=incompatible-pointer-types] 38 | if (!ASSERT_OK(connect(sock, &addr, sizeof(addr)), "connect")) { | ^~~~~ | | | struct sockaddr_in6 * ./test_progs.h:343:29: note: in definition of macro 'ASSERT_OK' 343 | long long ___res = (res); \ | ^~~ In file included from .../netinet/in.h:10, from .../arpa/inet.h:9, from ./test_progs.h:17: .../sys/socket.h:386:19: note: expected 'const struct sockaddr *' but argument is of type 'struct sockaddr_in6 *' 386 | int connect (int, const struct sockaddr *, socklen_t); | ^~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors This only compiles because of a glibc extension allowing declaration of the argument as a "transparent union" which includes both types above. Explicitly cast the argument to allow compiling for both musl and glibc. Cc: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Fixes: f957c230 ("selftests/bpf: convert test_skb_cgroup_id_user to test_progs") Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com> Reviewed-by: Alexis Lothoré <alexis.lothore@bootlin.com> Link: https://lore.kernel.org/r/20241008231232.634047-1-tony.ambardar@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Pu Lehui authored
When CONFIG_CFI_CLANG is enabled, the number of prologue instructions skipped by tailcall needs to include the kcfi instruction, otherwise the TCC will be initialized every tailcall is called, which may result in infinite tailcalls. Fixes: e63985ec ("bpf, riscv64/cfi: Support kCFI + BPF on riscv64") Signed-off-by: Pu Lehui <pulehui@huawei.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/r/20241008124544.171161-1-pulehui@huaweicloud.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Tyrone Wu authored
Fix `name_len` field assertions in `bpf_link_info.perf_event` for kprobe/uprobe/tracepoint to validate correct name size instead of 0. Fixes: 23cf7aa5 ("selftests/bpf: Add selftest for fill_link_info") Signed-off-by: Tyrone Wu <wudevelops@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20241008164312.46269-2-wudevelops@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Tyrone Wu authored
Previously when retrieving `bpf_link_info.perf_event` for kprobe/uprobe/tracepoint, the `name_len` field was not populated by the kernel, leaving it to reflect the value initially set by the user. This behavior was inconsistent with how other input/output string buffer fields function (e.g. `raw_tracepoint.tp_name_len`). This patch fills `name_len` with the actual size of the string name. Fixes: 1b715e1b ("bpf: Support ->fill_link_info for perf_event") Signed-off-by: Tyrone Wu <wudevelops@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20241008164312.46269-1-wudevelops@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Rik van Riel authored
The kzmalloc call in bpf_check can fail when memory is very fragmented, which in turn can lead to an OOM kill. Use kvzmalloc to fall back to vmalloc when memory is too fragmented to allocate an order 3 sized bpf verifier environment. Admittedly this is not a very common case, and only happens on systems where memory has already been squeezed close to the limit, but this does not seem like much of a hot path, and it's a simple enough fix. Signed-off-by: Rik van Riel <riel@surriel.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://lore.kernel.org/r/20241008170735.16766766@imladris.surriel.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 09 Oct, 2024 3 commits
-
-
Alexei Starovoitov authored
Hou Tao says: ==================== Check the remaining info_cnt before repeating btf fields From: Hou Tao <houtao1@huawei.com> Hi, The patch set adds the missed check again info_cnt when flattening the array of nested struct. The problem was spotted when developing dynptr key support for hash map. Patch #1 adds the missed check and patch #2 adds three success test cases and one failure test case for the problem. Comments are always welcome. Change Log: v2: * patch #1: check info_cnt in btf_repeat_fields() * patch #2: use a hard-coded number instead of BTF_FIELDS_MAX, because BTF_FIELDS_MAX is not always available in vmlinux.h (e.g., for llvm 17/18) v1: https://lore.kernel.org/bpf/20240911110557.2759801-1-houtao@huaweicloud.com/T/#t ==================== Link: https://lore.kernel.org/r/20241008071114.3718177-1-houtao@huaweicloud.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Hou Tao authored
Add three success test cases to test the flattening of array of nested struct. For these three tests, the number of special fields in map is BTF_FIELDS_MAX, but the array is defined in structs with different nested level. Add one failure test case for the flattening as well. In the test case, the number of special fields in map is BTF_FIELDS_MAX + 1. It will make btf_parse_fields() in map_create() return -E2BIG, the creation of map will succeed, but the load of program will fail because the btf_record is invalid for the map. Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20241008071114.3718177-3-houtao@huaweicloud.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Hou Tao authored
When trying to repeat the btf fields for array of nested struct, it doesn't check the remaining info_cnt. The following splat will be reported when the value of ret * nelems is greater than BTF_FIELDS_MAX: ------------[ cut here ]------------ UBSAN: array-index-out-of-bounds in ../kernel/bpf/btf.c:3951:49 index 11 is out of range for type 'btf_field_info [11]' CPU: 6 UID: 0 PID: 411 Comm: test_progs ...... 6.11.0-rc4+ #1 Tainted: [O]=OOT_MODULE Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ... Call Trace: <TASK> dump_stack_lvl+0x57/0x70 dump_stack+0x10/0x20 ubsan_epilogue+0x9/0x40 __ubsan_handle_out_of_bounds+0x6f/0x80 ? kallsyms_lookup_name+0x48/0xb0 btf_parse_fields+0x992/0xce0 map_create+0x591/0x770 __sys_bpf+0x229/0x2410 __x64_sys_bpf+0x1f/0x30 x64_sys_call+0x199/0x9f0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7fea56f2cc5d ...... </TASK> ---[ end trace ]--- Fix it by checking the remaining info_cnt in btf_repeat_fields() before repeating the btf fields. Fixes: 64e8ee81 ("bpf: look into the types of the fields of a struct type recursively.") Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20241008071114.3718177-2-houtao@huaweicloud.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 08 Oct, 2024 2 commits
-
-
Thomas Weißschuh authored
The key_free LSM hook has been removed. Remove the corresponding BPF hook. Avoid warnings during the build: BTFIDS vmlinux WARN: resolve_btfids: unresolved symbol bpf_lsm_key_free Fixes: 5f8d28f6 ("lsm: infrastructure management of the key security blob") Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <song@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20241005-lsm-key_free-v1-1-42ea801dbd63@weissschuh.net
-
Jiri Olsa authored
We need to free specs properly. Fixes: 3d2786d6 ("bpf: correctly handle malformed BPF_CORE_TYPE_ID_LOCAL relos") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20241007160958.607434-1-jolsa@kernel.org
-
- 02 Oct, 2024 3 commits
-
-
Martin KaFai Lau authored
Florian Kauer says: ==================== rxq contains a pointer to the device from where the redirect happened. Currently, the BPF program that was executed after a redirect via BPF_MAP_TYPE_DEVMAP* does not have it set. Add bugfix and related selftest. --- Changes in v4: - return -> goto out_close, thanks Toke - Link to v3: https://lore.kernel.org/r/20240909-devel-koalo-fix-ingress-ifindex-v3-0-66218191ecca@linutronix.de Changes in v3: - initialize skel to NULL, thanks Stanislav - Link to v2: https://lore.kernel.org/r/20240906-devel-koalo-fix-ingress-ifindex-v2-0-4caa12c644b4@linutronix.de Changes in v2: - changed fixes tag - added selftest - Link to v1: https://lore.kernel.org/r/20240905-devel-koalo-fix-ingress-ifindex-v1-1-d12a0d74c29c@linutronix.de ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Florian Kauer authored
The current xdp_devmap_attach test attaches a program that redirects to another program via devmap. It is, however, never executed, so do that to catch any bugs that might occur during execution. Also, execute the same for a veth pair so that we also cover the non-generic path. Warning: Running this without the bugfix in this series will likely crash your system. Signed-off-by: Florian Kauer <florian.kauer@linutronix.de> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20240911-devel-koalo-fix-ingress-ifindex-v4-2-5c643ae10258@linutronix.deSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Florian Kauer authored
rxq contains a pointer to the device from where the redirect happened. Currently, the BPF program that was executed after a redirect via BPF_MAP_TYPE_DEVMAP* does not have it set. This is particularly bad since accessing ingress_ifindex, e.g. SEC("xdp") int prog(struct xdp_md *pkt) { return bpf_redirect_map(&dev_redirect_map, 0, 0); } SEC("xdp/devmap") int prog_after_redirect(struct xdp_md *pkt) { bpf_printk("ifindex %i", pkt->ingress_ifindex); return XDP_PASS; } depends on access to rxq, so a NULL pointer gets dereferenced: <1>[ 574.475170] BUG: kernel NULL pointer dereference, address: 0000000000000000 <1>[ 574.475188] #PF: supervisor read access in kernel mode <1>[ 574.475194] #PF: error_code(0x0000) - not-present page <6>[ 574.475199] PGD 0 P4D 0 <4>[ 574.475207] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI <4>[ 574.475217] CPU: 4 UID: 0 PID: 217 Comm: kworker/4:1 Not tainted 6.11.0-rc5-reduced-00859-g78080120 #23 <4>[ 574.475226] Hardware name: Intel(R) Client Systems NUC13ANHi7/NUC13ANBi7, BIOS ANRPL357.0026.2023.0314.1458 03/14/2023 <4>[ 574.475231] Workqueue: mld mld_ifc_work <4>[ 574.475247] RIP: 0010:bpf_prog_5e13354d9cf5018a_prog_after_redirect+0x17/0x3c <4>[ 574.475257] Code: cc cc cc cc cc cc cc 80 00 00 00 cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 66 90 55 48 89 e5 f3 0f 1e fa 48 8b 57 20 <48> 8b 52 00 8b 92 e0 00 00 00 48 bf f8 a6 d5 c4 5d a0 ff ff be 0b <4>[ 574.475263] RSP: 0018:ffffa62440280c98 EFLAGS: 00010206 <4>[ 574.475269] RAX: ffffa62440280cd8 RBX: 0000000000000001 RCX: 0000000000000000 <4>[ 574.475274] RDX: 0000000000000000 RSI: ffffa62440549048 RDI: ffffa62440280ce0 <4>[ 574.475278] RBP: ffffa62440280c98 R08: 0000000000000002 R09: 0000000000000001 <4>[ 574.475281] R10: ffffa05dc8b98000 R11: ffffa05f577fca40 R12: ffffa05dcab24000 <4>[ 574.475285] R13: ffffa62440280ce0 R14: ffffa62440549048 R15: ffffa62440549000 <4>[ 574.475289] FS: 0000000000000000(0000) GS:ffffa05f4f700000(0000) knlGS:0000000000000000 <4>[ 574.475294] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 574.475298] CR2: 0000000000000000 CR3: 000000025522e000 CR4: 0000000000f50ef0 <4>[ 574.475303] PKRU: 55555554 <4>[ 574.475306] Call Trace: <4>[ 574.475313] <IRQ> <4>[ 574.475318] ? __die+0x23/0x70 <4>[ 574.475329] ? page_fault_oops+0x180/0x4c0 <4>[ 574.475339] ? skb_pp_cow_data+0x34c/0x490 <4>[ 574.475346] ? kmem_cache_free+0x257/0x280 <4>[ 574.475357] ? exc_page_fault+0x67/0x150 <4>[ 574.475368] ? asm_exc_page_fault+0x26/0x30 <4>[ 574.475381] ? bpf_prog_5e13354d9cf5018a_prog_after_redirect+0x17/0x3c <4>[ 574.475386] bq_xmit_all+0x158/0x420 <4>[ 574.475397] __dev_flush+0x30/0x90 <4>[ 574.475407] veth_poll+0x216/0x250 [veth] <4>[ 574.475421] __napi_poll+0x28/0x1c0 <4>[ 574.475430] net_rx_action+0x32d/0x3a0 <4>[ 574.475441] handle_softirqs+0xcb/0x2c0 <4>[ 574.475451] do_softirq+0x40/0x60 <4>[ 574.475458] </IRQ> <4>[ 574.475461] <TASK> <4>[ 574.475464] __local_bh_enable_ip+0x66/0x70 <4>[ 574.475471] __dev_queue_xmit+0x268/0xe40 <4>[ 574.475480] ? selinux_ip_postroute+0x213/0x420 <4>[ 574.475491] ? alloc_skb_with_frags+0x4a/0x1d0 <4>[ 574.475502] ip6_finish_output2+0x2be/0x640 <4>[ 574.475512] ? nf_hook_slow+0x42/0xf0 <4>[ 574.475521] ip6_finish_output+0x194/0x300 <4>[ 574.475529] ? __pfx_ip6_finish_output+0x10/0x10 <4>[ 574.475538] mld_sendpack+0x17c/0x240 <4>[ 574.475548] mld_ifc_work+0x192/0x410 <4>[ 574.475557] process_one_work+0x15d/0x380 <4>[ 574.475566] worker_thread+0x29d/0x3a0 <4>[ 574.475573] ? __pfx_worker_thread+0x10/0x10 <4>[ 574.475580] ? __pfx_worker_thread+0x10/0x10 <4>[ 574.475587] kthread+0xcd/0x100 <4>[ 574.475597] ? __pfx_kthread+0x10/0x10 <4>[ 574.475606] ret_from_fork+0x31/0x50 <4>[ 574.475615] ? __pfx_kthread+0x10/0x10 <4>[ 574.475623] ret_from_fork_asm+0x1a/0x30 <4>[ 574.475635] </TASK> <4>[ 574.475637] Modules linked in: veth br_netfilter bridge stp llc iwlmvm x86_pkg_temp_thermal iwlwifi efivarfs nvme nvme_core <4>[ 574.475662] CR2: 0000000000000000 <4>[ 574.475668] ---[ end trace 0000000000000000 ]--- Therefore, provide it to the program by setting rxq properly. Fixes: cb261b59 ("bpf: Run devmap xdp_prog on flush instead of bulk enqueue") Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Florian Kauer <florian.kauer@linutronix.de> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20240911-devel-koalo-fix-ingress-ifindex-v4-1-5c643ae10258@linutronix.deSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
- 01 Oct, 2024 3 commits
-
-
Daniel Borkmann authored
There is a delta between kernel UAPI bpf.h and tools UAPI bpf.h, thus sync them again. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Toke Høiland-Jørgensen authored
The bpf_redirect_info is shared between the SKB and XDP redirect paths, and the two paths use the same numeric flag values in the ri->flags field (specifically, BPF_F_BROADCAST == BPF_F_NEXTHOP). This means that if skb bpf_redirect_neigh() is used with a non-NULL params argument and, subsequently, an XDP redirect is performed using the same bpf_redirect_info struct, the XDP path will get confused and end up crashing, which syzbot managed to trigger. With the stack-allocated bpf_redirect_info, the structure is no longer shared between the SKB and XDP paths, so the crash doesn't happen anymore. However, different code paths using identically-numbered flag values in the same struct field still seems like a bit of a mess, so this patch cleans that up by moving the flag definitions together and redefining the three flags in BPF_F_REDIRECT_INTERNAL to not overlap with the flags used for XDP. It also adds a BUILD_BUG_ON() check to make sure the overlap is not re-introduced by mistake. Fixes: e624d4ed ("xdp: Extend xdp_redirect_map with broadcast support") Reported-by: syzbot+cca39e6e84a367a7e6f6@syzkaller.appspotmail.com Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Closes: https://syzkaller.appspot.com/bug?extid=cca39e6e84a367a7e6f6 Link: https://lore.kernel.org/bpf/20240920125625.59465-1-toke@redhat.com
-
Eduard Zingerman authored
This test was added because of a bug in verifier.c:sync_linked_regs(), upon range propagation it destroyed subreg_def marks for registers. The test is written in a way to return an upper half of a register that is affected by range propagation and must have it's subreg_def preserved. This gives a return value of 0 and leads to undefined return value if subreg_def mark is not preserved. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240924210844.1758441-2-eddyz87@gmail.com
-