- 13 May, 2024 1 commit
-
-
Puranjay Mohan authored
The BPF atomic operations with the BPF_FETCH modifier, along with BPF_XCHG and BPF_CMPXCHG, are fully ordered, but the RISC-V JIT implements all atomic operations except BPF_CMPXCHG with relaxed ordering.

Section 8.1 of "The RISC-V Instruction Set Manual Volume I: Unprivileged ISA" [1], titled "Specifying Ordering of Atomic Instructions", says:

| To provide more efficient support for release consistency [5], each
| atomic instruction has two bits, aq and rl, used to specify additional
| memory ordering constraints as viewed by other RISC-V harts.

and:

| If only the aq bit is set, the atomic memory operation is treated as
| an acquire access.
| If only the rl bit is set, the atomic memory operation is treated as a
| release access.
|
| If both the aq and rl bits are set, the atomic memory operation is
| sequentially consistent.

Fix this by setting both the aq and rl bits to 1 for operations with BPF_FETCH and BPF_XCHG.

[1] https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf

Fixes: dd642ccb ("riscv, bpf: Implement more atomic operations for RV64")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240505201633.123115-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
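For reference, this is where the aq/rl bits sit in a RISC-V AMO encoding; a minimal sketch with an illustrative encoder helper, not the kernel JIT's actual API:

  #include <stdint.h>

  /* AMO*.D encoding: funct5 in bits 31:27, aq in bit 26, rl in bit 25,
   * funct3 = 011 (doubleword), opcode = 0x2f (AMO). */
  static uint32_t rv_amo_d(uint32_t funct5, uint32_t rd, uint32_t rs1,
                           uint32_t rs2, int aq, int rl)
  {
          uint32_t insn = (funct5 << 27) | (rs2 << 20) | (rs1 << 15) |
                          (0x3 << 12) | (rd << 7) | 0x2f;

          if (aq)
                  insn |= 1u << 26;       /* acquire */
          if (rl)
                  insn |= 1u << 25;       /* release */
          return insn;
  }

  /* BPF_ADD | BPF_FETCH must be fully ordered: set both aq and rl. */
  static uint32_t emit_amoadd_d_fetch(uint32_t rd, uint32_t rs1, uint32_t rs2)
  {
          return rv_amo_d(0x00 /* AMOADD */, rd, rs1, rs2, 1, 1);
  }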
-
- 12 May, 2024 8 commits
-
-
Xiao Wang authored
We can use either "instruction" or "insn" in the comment.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240507111618.437121-1-xiao.w.wang@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Ilya Leoshkevich authored
BPF_ATOMIC_OP() macro documentation states that "BPF_ADD | BPF_FETCH" should be the same as atomic_fetch_add(), which is currently not the case on s390x: the serialization instruction "bcr 14,0" is missing. This applies to "and", "or" and "xor" variants too.

s390x is allowed to reorder stores with subsequent fetches from different addresses, so code relying on BPF_FETCH acting as a barrier, for example:

  stw [%r0], 1
  afadd [%r1], %r2
  ldxw %r3, [%r4]

may be broken. Fix it by emitting "bcr 14,0".

Note that a separate serialization instruction is not needed for BPF_XCHG and BPF_CMPXCHG, because COMPARE AND SWAP performs serialization itself.

Fixes: ba3b86b9 ("s390/bpf: Implement new atomic ops")
Reported-by: Puranjay Mohan <puranjay12@gmail.com>
Closes: https://lore.kernel.org/bpf/mb61p34qvq3wf.fsf@kernel.org/
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20240507000557.12048-1-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
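A minimal sketch of where the serialization point lands, with stub emitters that just print the assembly (the real JIT writes machine code through its own emitter macros):

  #include <stdio.h>

  static void emit(const char *insn) { printf("\t%s\n", insn); }

  /* BPF_ADD | BPF_FETCH on s390x: atomic LOAD AND ADD, then serialize. */
  static void emit_atomic_fetch_add(void)
  {
          emit("laa %r0,%r2,0(%r1)"); /* old value -> %r0, add %r2 to memory */
          emit("bcr 14,0");           /* serialize: the fetch acts as a barrier */
  }

  int main(void) { emit_atomic_fetch_add(); return 0; }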
-
Alexei Starovoitov authored
Puranjay Mohan says:

====================
bpf: Inline helpers in arm64 and riscv JITs

Changes in v5 -> v6:
arm64 v5: https://lore.kernel.org/all/20240430234739.79185-1-puranjay@kernel.org/
riscv v2: https://lore.kernel.org/all/20240430175834.33152-1-puranjay@kernel.org/
- Combine riscv and arm64 changes in single series
- Some coding style fixes

Changes in v4 -> v5:
v4: https://lore.kernel.org/all/20240429131647.50165-1-puranjay@kernel.org/
- Implement the inlining of the bpf_get_smp_processor_id() in the JIT.

NOTE: This needs to be based on:
https://lore.kernel.org/all/20240430175834.33152-1-puranjay@kernel.org/
to be built.

Manual run of bpf-ci with this series rebased on above:
https://github.com/kernel-patches/bpf/pull/6929

Changes in v3 -> v4:
v3: https://lore.kernel.org/all/20240426121349.97651-1-puranjay@kernel.org/
- Fix coding style issue related to C89 standards.

Changes in v2 -> v3:
v2: https://lore.kernel.org/all/20240424173550.16359-1-puranjay@kernel.org/
- Fixed the xlated dump of percpu mov to "r0 = &(void __percpu *)(r0)"
- Made ARM64 and x86-64 use the same code for inlining. The only
  difference that remains is the per-cpu address of the cpu_number.

Changes in v1 -> v2:
v1: https://lore.kernel.org/all/20240405091707.66675-1-puranjay12@gmail.com/
- Add a patch to inline bpf_get_smp_processor_id()
- Fix an issue in MRS instruction encoding as pointed out by Will
- Remove CONFIG_SMP check because arm64 kernel always compiles with CONFIG_SMP

This series adds the support of internal only per-CPU instructions and inlines the bpf_get_smp_processor_id() helper call for ARM64 and RISC-V BPF JITs.

Here is an example of calls to bpf_get_smp_processor_id() and percpu_array_map_lookup_elem() before and after this series on ARM64.

BPF
=====

BEFORE:

  int cpu = bpf_get_smp_processor_id();
  (85) call bpf_get_smp_processor_id#229032

  p = bpf_map_lookup_elem(map, &zero);
  (18) r1 = map[id:78]
  (18) r2 = map[id:82][0]+65536
  (85) call percpu_array_map_lookup_elem#313512

AFTER:

  int cpu = bpf_get_smp_processor_id();
  (85) call bpf_get_smp_processor_id#8

  p = bpf_map_lookup_elem(map, &zero);
  (18) r1 = map[id:153]
  (18) r2 = map[id:157][0]+65536
  (07) r1 += 496
  (61) r0 = *(u32 *)(r2 +0)
  (35) if r0 >= 0x1 goto pc+5
  (67) r0 <<= 3
  (0f) r0 += r1
  (79) r0 = *(u64 *)(r0 +0)
  (bf) r0 = &(void __percpu *)(r0)
  (05) goto pc+1
  (b7) r0 = 0

ARM64 JIT
===========

BEFORE:

  int cpu = bpf_get_smp_processor_id();
  mov x10, #0xfffffffffffff4d0
  movk x10, #0x802b, lsl #16
  movk x10, #0x8000, lsl #32
  blr x10
  add x7, x0, #0x0

  p = bpf_map_lookup_elem(map, &zero);
  mov x0, #0xffff0003ffffffff
  movk x0, #0xce5c, lsl #16
  movk x0, #0xca00
  mov x1, #0xffff8000ffffffff
  movk x1, #0x8bdb, lsl #16
  movk x1, #0x6000
  mov x10, #0xffffffffffff3ed0
  movk x10, #0x802d, lsl #16
  movk x10, #0x8000, lsl #32
  blr x10
  add x7, x0, #0x0

AFTER:

  int cpu = bpf_get_smp_processor_id();
  mrs x10, sp_el0
  ldr w7, [x10, #24]

  p = bpf_map_lookup_elem(map, &zero);
  mov x0, #0xffff0003ffffffff
  movk x0, #0xe0f3, lsl #16
  movk x0, #0x7c00
  mov x1, #0xffff8000ffffffff
  movk x1, #0xb0c7, lsl #16
  movk x1, #0xe000
  add x0, x0, #0x1f0
  ldr w7, [x1]
  cmp x7, #0x1
  b.cs 0x0000000000000090
  lsl x7, x7, #3
  add x7, x7, x0
  ldr x7, [x7]
  mrs x10, tpidr_el1
  add x7, x7, x10
  b 0x0000000000000094
  mov x7, #0x0

Performance improvement found using benchmark[1]

./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

+---------------+-------------------+-------------------+--------------+
| Name          | Before            | After             | % change     |
|---------------+-------------------+-------------------+--------------|
| glob-arr-inc  | 23.380 ± 1.675M/s | 25.893 ± 0.026M/s | + 10.74%     |
| arr-inc       | 23.928 ± 0.034M/s | 25.213 ± 0.063M/s | + 5.37%      |
| hash-inc      | 12.352 ± 0.005M/s | 12.609 ± 0.013M/s | + 2.08%      |
+---------------+-------------------+-------------------+--------------+

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

RISCV64 JIT output for `call bpf_get_smp_processor_id`
=======================================================

BEFORE:

  auipc t1,0x848c
  jalr 604(t1)
  mv a5,a0

AFTER:

  ld a5,32(tp)

Benchmark using [1] on Qemu.

./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

+---------------+------------------+------------------+--------------+
| Name          | Before           | After            | % change     |
|---------------+------------------+------------------+--------------|
| glob-arr-inc  | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s | + 24.04%     |
| arr-inc       | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s | + 23.56%     |
| hash-inc      | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s | + 32.18%     |
+---------------+------------------+------------------+--------------+
====================

Link: https://lore.kernel.org/r/20240502151854.9810-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Puranjay Mohan authored
Inline calls to the bpf_get_smp_processor_id() helper in the JIT by emitting a read from struct thread_info. The SP_EL0 system register holds the pointer to the task_struct, and thread_info is the first member of this struct. We can read the cpu number from the thread_info.

Here is how the ARM64 JITed assembly for `int cpu = bpf_get_smp_processor_id();` changes after this commit:

BEFORE:

  mov x10, #0xfffffffffffff4d0
  movk x10, #0x802b, lsl #16
  movk x10, #0x8000, lsl #32
  blr x10
  add x7, x0, #0x0

AFTER:

  mrs x10, sp_el0
  ldr w7, [x10, #24]

Performance improvement using benchmark[1]

./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

+---------------+-------------------+-------------------+--------------+
| Name          | Before            | After             | % change     |
|---------------+-------------------+-------------------+--------------|
| glob-arr-inc  | 23.380 ± 1.675M/s | 25.893 ± 0.026M/s | + 10.74%     |
| arr-inc       | 23.928 ± 0.034M/s | 25.213 ± 0.063M/s | + 5.37%      |
| hash-inc      | 12.352 ± 0.005M/s | 12.609 ± 0.013M/s | + 2.08%      |
+---------------+-------------------+-------------------+--------------+

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-5-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
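The two emitted instructions compute the equivalent of the following C; a conceptual sketch, where the #24 offset is offsetof(struct thread_info, cpu) in this particular build, not a stable constant:

  static inline int inlined_smp_processor_id(void)
  {
          void *task;

          /* mrs x10, sp_el0 -- SP_EL0 holds the current task_struct */
          asm ("mrs %0, sp_el0" : "=r" (task));
          /* ldr w7, [x10, #24] -- thread_info is the first member of
           * task_struct, so this reads thread_info.cpu */
          return *(int *)((char *)task + 24);
  }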
-
Puranjay Mohan authored
Support an instruction for resolving absolute addresses of per-CPU data from their per-CPU offsets. This instruction is internal-only and users are not allowed to use it directly. It will only be used for internal inlining optimizations for now, between the BPF verifier and BPF JITs.

Since commit 71586276 ("arm64: percpu: implement optimised pcpu access using tpidr_el1"), the per-cpu offset for the CPU is stored in the tpidr_el1/2 register of that CPU.

To support this BPF instruction in the ARM64 JIT, the following ARM64 instructions are emitted:

  mov dst, src          // Move src to dst, if src != dst
  mrs tmp, tpidr_el1/2  // Move per-cpu offset of the current cpu into tmp.
  add dst, dst, tmp     // Add the per-cpu offset to the dst.

To measure the performance improvement provided by this change, the benchmark in [1] was used:

Before:
glob-arr-inc   :   23.597 ± 0.012M/s
arr-inc        :   23.173 ± 0.019M/s
hash-inc       :   12.186 ± 0.028M/s

After:
glob-arr-inc   :   23.819 ± 0.034M/s
arr-inc        :   23.285 ± 0.017M/s
hash-inc       :   12.419 ± 0.011M/s

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-4-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Puranjay Mohan authored
Inline the calls to bpf_get_smp_processor_id() in the riscv bpf jit.

RISCV saves the pointer to the CPU's task_struct in the TP (thread pointer) register. This makes it trivial to get the CPU's processor id. As thread_info is the first member of task_struct, we can read the processor id from TP + offsetof(struct thread_info, cpu).

RISCV64 JIT output for `call bpf_get_smp_processor_id`
======================================================

BEFORE:

  auipc t1,0x848c
  jalr 604(t1)
  mv a5,a0

AFTER:

  ld a5,32(tp)

Benchmark using [1] on Qemu.

./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

+---------------+------------------+------------------+--------------+
| Name          | Before           | After            | % change     |
|---------------+------------------+------------------+--------------|
| glob-arr-inc  | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s | + 24.04%     |
| arr-inc       | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s | + 23.56%     |
| hash-inc      | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s | + 32.18%     |
+---------------+------------------+------------------+--------------+

NOTE: This benchmark includes changes from this patch and the previous patch that implemented the per-cpu insn.

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Puranjay Mohan authored
Support an instruction for resolving absolute addresses of per-CPU data from their per-CPU offsets. This instruction is internal-only and users are not allowed to use it directly. It will only be used for internal inlining optimizations for now, between the BPF verifier and BPF JITs.

RISC-V uses the generic per-cpu implementation, where the offsets for the CPUs are kept in an array called __per_cpu_offset[cpu_number]. RISCV stores the address of the task_struct in the TP register. The first member of task_struct is struct thread_info, so we can get the cpu number by reading from TP + offsetof(struct thread_info, cpu).

Once we have the cpu number in a register, we read the offset for that cpu from the address &__per_cpu_offset + (cpu_number << 3). Then we add this offset to the destination register.

To measure the improvement from this change, the benchmark in [1] was used on Qemu:

Before:
glob-arr-inc   :    1.127 ± 0.013M/s
arr-inc        :    1.121 ± 0.004M/s
hash-inc       :    0.681 ± 0.052M/s

After:
glob-arr-inc   :    1.138 ± 0.011M/s
arr-inc        :    1.366 ± 0.006M/s
hash-inc       :    0.676 ± 0.001M/s

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
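Expressed as C, the address computation described above looks roughly like this (a conceptual sketch, not the literal emitted code; the 32-byte offset of thread_info.cpu is build-specific):

  extern unsigned long __per_cpu_offset[];

  static inline unsigned long percpu_abs_addr(unsigned long percpu_off)
  {
          unsigned long cpu;

          /* tp holds the task_struct pointer; thread_info is its first member */
          asm ("ld %0, 32(tp)" : "=r" (cpu)); /* thread_info.cpu in this build */
          /* &__per_cpu_offset + (cpu << 3), i.e. index the offset array */
          return percpu_off + __per_cpu_offset[cpu];
  }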
-
Shahab Vahedi authored
This will add eBPF JIT support to the 32-bit ARCv2 processors. The implementation is qualified by running the BPF tests on a Synopsys HSDK board with "ARC HS38 v2.1c at 500 MHz" as the 4-core CPU.

The test_bpf.ko reports 2-10 fold improvements in execution time of its tests. For instance:

  test_bpf: #33 tcpdump port 22 jited:0 704 1766 2104 PASS
  test_bpf: #33 tcpdump port 22 jited:1 120 224 260 PASS

  test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:0 238 PASS
  test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:1 23 PASS

  test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:0 2034681 PASS
  test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:1 1020022 PASS

Deployment and structure
------------------------
The related codes are added to "arch/arc/net":

- bpf_jit.h       -- The interface that a back-end translator must provide
- bpf_jit_core.c  -- Knows how to handle the input eBPF byte stream
- bpf_jit_arcv2.c -- The back-end code that knows the translation logic

The bpf_int_jit_compile() at the end of bpf_jit_core.c is the entrance to the whole process. Normally, the translation is done in one pass, namely the "normal pass". In case some relocations are not known during this pass, some data (arc_jit_data) is allocated for the next pass to come. This possible next (and last) pass is called the "extra pass".

1. Normal pass       # The necessary pass
 1a. Dry run         # Get the whole JIT length, epilogue offset, etc.
 1b. Emit phase      # Allocate memory and start emitting instructions
2. Extra pass        # Only needed if there are relocations to be fixed
 2a. Patch relocations

Support status
--------------
The JIT compiler supports BPF instructions up to "cpu=v4". However, it does not yet provide support for:

- Tail calls
- Atomic operations
- 64-bit division/remainder
- BPF_PROBE_MEM* (exception table)

The result of the "test_bpf" test suite on an HSDK board is:

  hsdk-lnx# insmod test_bpf.ko test_suite=test_bpf
  test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]

All the failing test cases are due to the ones that were not JIT'ed. Categorically, they can be represented as:

  .-----------.------------.-------------.
  | test type | opcodes    | # of cases  |
  |-----------+------------+-------------|
  | atomic    | 0xC3, 0xDB |         149 |
  | div64     | 0x37, 0x3F |          22 |
  | mod64     | 0x97, 0x9F |          15 |
  `-----------^------------+-------------|
              |    (total)          186  |
              `--------------------------'

Setup: build config
-------------------
The following configs must be set to have a working JIT test:

  CONFIG_BPF_JIT=y
  CONFIG_BPF_JIT_ALWAYS_ON=y
  CONFIG_TEST_BPF=m

The following options are not necessary for the tests module, but are good to have:

  CONFIG_DEBUG_INFO=y       # prerequisite for below
  CONFIG_DEBUG_INFO_BTF=y   # so bpftool can generate vmlinux.h

  CONFIG_FTRACE=y           #
  CONFIG_BPF_SYSCALL=y      # all these options lead to
  CONFIG_KPROBE_EVENTS=y    # having CONFIG_BPF_EVENTS=y
  CONFIG_PERF_EVENTS=y      #

Some BPF programs provide data through /sys/kernel/debug:

  CONFIG_DEBUG_FS=y

  arc# mount -t debugfs debugfs /sys/kernel/debug

Setup: elfutils
---------------
The libdw.{so,a} library that is used by pahole for processing the final binary must come from elfutils 0.189 or newer. The support for ARCv2 [1] has been added since that version.

[1] https://sourceware.org/git/?p=elfutils.git;a=commit;h=de3d46b3e7

Setup: pahole
-------------
The line below in linux/scripts/Makefile.btf must be commented out:

  pahole-flags-$(call test-ge, $(pahole-ver), 121) += --btf_gen_floats

Or else, the build will fail:

  $ make V=1
  ...
  BTF     .btf.vmlinux.bin.o
  pahole -J --btf_gen_floats \
         -j --lang_exclude=rust \
         --skip_encoding_btf_inconsistent_proto \
         --btf_gen_optimized .tmp_vmlinux.btf
  Complex, interval and imaginary float types are not supported
  Encountered error while encoding BTF.
  ...
  BTFIDS  vmlinux
  ./tools/bpf/resolve_btfids/resolve_btfids vmlinux
  libbpf: failed to find '.BTF' ELF section in vmlinux
  FAILED: load BTF from vmlinux: No data available

This is due to the fact that the ARC toolchains generate "complex float" DIE entries in libgcc, and at the moment, pahole can't handle such entries.

Running the tests
-----------------

  host$ scp /bld/linux/lib/test_bpf.ko arc:
  arc # sysctl net.core.bpf_jit_enable=1
  arc # insmod test_bpf.ko test_suite=test_bpf
  ...
  test_bpf: #1048 Staggered jumps: JMP32_JSLE_X jited:1 697811 PASS
  test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]

Acknowledgments
---------------
- Claudiu Zissulescu for his unwavering support
- Yuriy Kolerov for testing and troubleshooting
- Vladimir Isaev for the pahole workaround
- Sergey Matyukevich for paving the road by adding the interpreter support

Signed-off-by: Shahab Vahedi <shahab@synopsys.com>
Link: https://lore.kernel.org/r/20240430145604.38592-1-list+bpf@vahedi.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 09 May, 2024 19 commits
-
-
Alan Maguire authored
The btf_features list can be used for pahole v1.26 and later; it is useful because if a feature is not yet implemented, pahole will not exit with a failure message. This will allow us to add feature requests to the pahole options without having to check pahole versions in the future; if the version of pahole supports the feature, it will be added.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240507135514.490467-1-alan.maguire@oracle.com
-
Martin KaFai Lau authored
Geliang Tang says:

====================
From: Geliang Tang <tanggeliang@kylinos.cn>

This patchset adds a post_socket_cb pointer into struct network_helper_opts to make the start_server_addr() helper more flexible. With these modifications, a lot of duplicated code can be dropped.

Patches 1-3 address Martin's comments in the previous series.
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Geliang Tang authored
The arguments "addr" and "len" of run_test() have been dropped. This makes the function get_port() useless. Drop it from test_tcp_check_syncookie_user.c.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/a9b5c8064ab4cbf0f68886fe0e4706428b8d0d47.1714907662.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Geliang Tang authored
This patch uses the public helper connect_to_fd(), exported in network_helpers.h, instead of the locally defined function connect_to_server() in test_tcp_check_syncookie_user.c. This avoids duplicated code. The arguments "addr" and "len" of run_test() then become useless; drop them too.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/e0ae6b790ac0abc7193aadfb2660c8c9eb0fe1f0.1714907662.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Geliang Tang authored
This patch uses the public helper connect_to_fd(), exported in network_helpers.h, instead of the locally defined function connect_to_server() in prog_tests/sockopt_inherit.c. This avoids duplicated code.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/71db79127cc160b0643fd9a12c70ae019ae076a1.1714907662.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Geliang Tang authored
Include network_helpers.h in test_tcp_check_syncookie_user.c and use the public helper start_server_addr() in it instead of the locally defined function start_server(). This avoids duplicated code.

Add two helpers, v6only_true() and v6only_false(), to set the IPV6_V6ONLY sockopt to true or false, set them as the post_socket_cb pointer of struct network_helper_opts, and pass it to start_server_setsockopt().

In order to use functions defined in network_helpers.c, the Makefile needs to be updated too.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/e0c5324f5da84f453f47543536e70f126eaa8678.1714907662.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
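A plausible shape for one of these callbacks, assuming the post_socket_cb signature introduced by the first patch in this series (v6only_false() mirrors it with mode = 0); a sketch, not the selftest verbatim:

  #include <netinet/in.h>
  #include <sys/socket.h>

  struct post_socket_opts;      /* opaque; reserved for future extension */

  static int v6only_true(int fd, const struct post_socket_opts *cb_opts)
  {
          int mode = 1;

          return setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &mode, sizeof(mode));
  }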
-
Geliang Tang authored
Include network_helpers.h in prog_tests/sockopt_inherit.c and use the public helper start_server_addr() instead of the locally defined function start_server(). This avoids duplicated code.

Add a helper custom_cb() to set the SOL_CUSTOM sockopts in a loop, set it as the post_socket_cb pointer of struct network_helper_opts, and pass it to start_server_addr().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/687af66f743a0bf15cdba372c5f71fe64863219e.1714907662.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Geliang Tang authored
__start_server() sets SO_REUSEPORT through setsockopt() when the parameter 'reuseport' is set. This patch makes it more flexible by adding a function pointer, post_socket_cb, into struct network_helper_opts. The 'const struct post_socket_opts *cb_opts' argument of post_socket_cb is there for future extension. The 'reuseport' parameter can then be dropped.

Now the original start_reuseport_server() can be implemented by setting a newly defined reuseport_cb() function pointer in the post_socket_cb field of struct network_helper_opts.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/470cb82f209f055fc7fb39c66c6b090b5b7ed2b2.1714907662.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
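A minimal sketch of the extended struct and the reuseport callback it enables, based on the description above; the other fields shown are illustrative:

  #include <stdbool.h>
  #include <sys/socket.h>

  struct post_socket_opts;        /* opaque; reserved for future extension */

  struct network_helper_opts {
          int timeout_ms;         /* illustrative pre-existing fields */
          bool must_fail;
          /* called right after socket(), before bind()/listen() */
          int (*post_socket_cb)(int fd, const struct post_socket_opts *cb_opts);
  };

  static int reuseport_cb(int fd, const struct post_socket_opts *cb_opts)
  {
          int on = 1;

          return setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on));
  }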
-
Alexei Starovoitov authored
Martin KaFai Lau says:

====================
selftests/bpf: Retire bpf_tcp_helpers.h

From: Martin KaFai Lau <martin.lau@kernel.org>

The earlier commit 8e6d9ae2 ("selftests/bpf: Use bpf_tracing.h instead of bpf_tcp_helpers.h") removed the bpf_tcp_helpers.h usages from the non networking tests. This patch set is a continuation of this effort to retire the bpf_tcp_helpers.h from the networking tests (mostly tcp-cc related).

The main usage of the bpf_tcp_helpers.h is the partial kernel socket definitions (e.g. sock, tcp_sock). New fields keep being added back to those partial socket definitions while everything is available in the vmlinux.h. The recent bpf_cc_cubic.c test tried to extend bpf_tcp_helpers.h but eventually used the vmlinux.h instead.

To avoid this unnecessary detour for new tests and have one consistent way of using the kernel sockets, this patch set retires the bpf_tcp_helpers.h usages and consolidates the tests to use vmlinux.h instead.
====================

Link: https://lore.kernel.org/r/20240509175026.3423614-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Martin KaFai Lau authored
The previous patches have consolidated the tests to use bpf_tracing_net.h (i.e. vmlinux.h) instead of bpf_tcp_helpers.h. This patch can finally retire the bpf_tcp_helpers.h from the repository.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509175026.3423614-11-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Martin KaFai Lau authored
This patch removes the remaining bpf_tcp_helpers.h usages in the non tcp-cc networking tests. It either replaces them with bpf_tracing_net.h or just removes them because the test is not actually using any kernel sockets. For the latter, the missing macro (mainly SOL_TCP) is defined locally.

An exception is the test_sock_fields test, which is testing the "struct bpf_sock" type instead of the kernel sock type. Whenever "vmlinux.h" is used instead, it hits a verifier error on doing arithmetic on the sock_common pointer:

  ; return !a6[0] && !a6[1] && !a6[2] && a6[3] == bpf_htonl(1); @ test_sock_fields.c:54
  21: (61) r2 = *(u32 *)(r1 +28)    ; R1_w=sock_common() R2_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
  22: (56) if w2 != 0x0 goto pc-6   ; R2_w=0
  23: (b7) r3 = 28                  ; R3_w=28
  24: (bf) r2 = r1                  ; R1_w=sock_common() R2_w=sock_common()
  25: (0f) r2 += r3
  R2 pointer arithmetic on sock_common prohibited

Hence, instead of including bpf_tracing_net.h, the test_sock_fields test defines a tcp_sock with one lsndtime field in it.

Another highlight is that, in sockopt_qos_to_cc.c, tcp_cc_eq() is replaced by bpf_strncmp(). tcp_cc_eq() was a workaround in bpf_tcp_helpers.h from before bpf_strncmp was added.

The SOL_IPV6 addition to bpf_tracing_net.h is needed by the test_tcpbpf_kern test.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509175026.3423614-10-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Martin KaFai Lau authored
This patch removes the final few bpf_tcp_helpers.h usages in some misc bpf tcp-cc tests and replaces them with bpf_tracing_net.h (i.e. vmlinux.h).

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509175026.3423614-9-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Martin KaFai Lau authored
This patch uses bpf_tracing_net.h (i.e. vmlinux.h) in bpf_dctcp. This will allow retiring the bpf_tcp_helpers.h and consolidating the tcp-cc tests to vmlinux.h. It duplicates the min/max macros from bpf_cubic; this could be further refactored in the future.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509175026.3423614-8-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Martin KaFai Lau authored
This patch uses bpf_tracing_net.h (i.e. vmlinux.h) in bpf_cubic. This will allow retiring the bpf_tcp_helpers.h and consolidating the tcp-cc tests to vmlinux.h.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509175026.3423614-7-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Martin KaFai Lau authored
The "struct bictcp" and "struct dctcp" are private to the bpf prog and they are stored in the private buffer in inet_csk(sk)->icsk_ca_priv. Hence, there is no bpf CO-RE required. The same struct name exists in the vmlinux.h. To reuse vmlinux.h, they need to be renamed such that the bpf prog logic will be immuned from the kernel tcp-cc changes. This patch adds a "bpf_" prefix to them. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240509175026.3423614-6-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Martin KaFai Lau authored
The BPF_STRUCT_OPS usages need to be removed from the tcp-cc tests because the macro is defined in bpf_tcp_helpers.h, which is going to be retired. While at it, this patch consolidates all tcp-cc struct_ops programs to use SEC("struct_ops") + BPF_PROG(). It also removes the unnecessary __always_inline usages from the tcp-cc tests.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509175026.3423614-5-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
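For reference, the consolidated declaration style looks like this; a sketch using a cubic-like callback name and assuming the usual vmlinux.h + bpf_tracing.h includes, not a literal excerpt from the tests:

  SEC("struct_ops")
  void BPF_PROG(bpf_cubic_init, struct sock *sk)
  {
          /* callback body uses vmlinux.h types directly, no
           * BPF_STRUCT_OPS macro from bpf_tcp_helpers.h needed */
  }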
-
Martin KaFai Lau authored
This patch removes the individual tcp_sk implementations from the tcp-cc tests. The tcp_sk() implementation from the bpf_tracing_net.h is reused instead.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509175026.3423614-4-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Martin KaFai Lau authored
This patch adds a few tcp related helper functions to bpf_tracing_net.h. They will be useful for both tcp-cc and network tracing related bpf progs, and they have already been in the bpf_tcp_helpers.h. This change is needed to retire the bpf_tcp_helpers.h and consolidate all tests to vmlinux.h (i.e. bpf_tracing_net.h).

Some of the helpers (tcp_sk and inet_csk) are also defined in bpf_cc_cubic.c; they are removed. While at it, remove the vmlinux.h include from bpf_cc_cubic.c: bpf_tracing_net.h (which has vmlinux.h after this patch) is enough and will be consistent with the other tcp-cc tests in the later patches.

The other TCP_* macro additions will be needed for the bpf_dctcp changes in a later patch.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509175026.3423614-3-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Martin KaFai Lau authored
This patch removes the bpf_tracing_net.h usage from two networking tests, fib_lookup and test_lwt_redirect. Instead of using the (copied) macros TC_ACT_SHOT and ETH_HLEN from bpf_tracing_net.h, they can directly use the ones defined in the network header files under linux/.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509175026.3423614-2-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 08 May, 2024 5 commits
-
-
Jose E. Marchesi authored
[Changes from V1:
- Use a default branch in the switch statement to initialize `val'.]

GCC warns that `val' may be used uninitialized in the BPF_CORE_READ_BITFIELD macro, defined in bpf_core_read.h as:

  [...]
  unsigned long long val;                                      \
  [...]                                                        \
  switch (__CORE_RELO(s, field, BYTE_SIZE)) {                  \
  case 1: val = *(const unsigned char *)p; break;              \
  case 2: val = *(const unsigned short *)p; break;             \
  case 4: val = *(const unsigned int *)p; break;               \
  case 8: val = *(const unsigned long long *)p; break;         \
  }                                                            \
  [...]                                                        \
  val;                                                         \
  }                                                            \

This patch adds a default entry in the switch statement that sets `val' to zero in order to avoid the warning, and to avoid random values being used in case __builtin_preserve_field_info returns unexpected values for BPF_FIELD_BYTE_SIZE.

Tested in bpf-next master. No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240508101313.16662-1-jose.marchesi@oracle.com
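The shape of the fix is simply a zero-initializing default arm (a sketch of the relevant switch, not the full macro):

  switch (__CORE_RELO(s, field, BYTE_SIZE)) {
  case 1: val = *(const unsigned char *)p; break;
  case 2: val = *(const unsigned short *)p; break;
  case 4: val = *(const unsigned int *)p; break;
  case 8: val = *(const unsigned long long *)p; break;
  default: val = 0; break; /* silences -Wmaybe-uninitialized */
  }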
-
Jose E. Marchesi authored
This little patch is a follow-up to:
https://lore.kernel.org/bpf/20240507095011.15867-1-jose.marchesi@oracle.com/T/#u

The temporary workaround of passing -DBPF_NO_PRESERVE_ACCESS_INDEX when building with GCC triggers a redefinition preprocessor error when building progs/skb_pkt_end.c. This patch adds a guard to avoid redefinition.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Cc: david.faust@oracle.com
Cc: cupertino.miranda@oracle.com
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240508110332.17332-1-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
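Such a guard typically looks like this; a sketch consistent with the description, where the macro name comes from the -D flag above:

  #ifndef BPF_NO_PRESERVE_ACCESS_INDEX
  #define BPF_NO_PRESERVE_ACCESS_INDEX
  #endif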
-
Jose E. Marchesi authored
[Changes from V2:
- no-strict-aliasing is only applied when building with GCC.
- cpumask_failure.c is excluded, as it doesn't use __imm_insn.]

The __imm_insn macro is defined in bpf_misc.h as:

  #define __imm_insn(name, expr) [name]"i"(*(long *)&(expr))

This may lead to type-punning and strict-aliasing rules violations in its typical usage, where the address of a struct bpf_insn is passed as expr, like in:

  __imm_insn(st_mem,
             BPF_ST_MEM(BPF_W, BPF_REG_1, offsetof(struct __sk_buff, mark), 42))

Where:

  #define BPF_ST_MEM(SIZE, DST, OFF, IMM)                      \
          ((struct bpf_insn) {                                 \
                  .code  = BPF_ST | BPF_SIZE(SIZE) | BPF_MEM,  \
                  .dst_reg = DST,                              \
                  .src_reg = 0,                                \
                  .off   = OFF,                                \
                  .imm   = IMM })

In all the actual instances of this in the BPF selftests, the value is fed to a volatile asm statement as soon as it gets read from memory, and thus it is unlikely that strict-aliasing rule breakage may lead to misguided optimizations. However, GCC detects the potential problem (indirectly) by issuing a warning stating that a temporary <Uxxxxxx> is used uninitialized, where the temporary corresponds to the memory read by *(long *).

This patch adds -fno-strict-aliasing to the compilation flags of the particular selftests that do type punning via __imm_insn, only for GCC.

Tested in master bpf-next. No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Cc: david.faust@oracle.com
Cc: cupertino.miranda@oracle.com
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240508103551.14955-1-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Jose E. Marchesi authored
[Changes from V1:
- The warning to disable is -Wmaybe-uninitialized, not -Wuninitialized.
- This warning is only supported in GCC.]

The BPF selftest verifier_global_subprogs.c contains code that purposely performs out of bounds access to memory, to check whether the kernel verifier is able to catch them. For example:

  __noinline int global_unsupp(const int *mem)
  {
          if (!mem)
                  return 0;
          return mem[100]; /* BOOM */
  }

With -O1 and higher and no inlining, GCC notices this fact and emits a "maybe uninitialized" warning. This is by design. Note that the emission of these warnings is highly dependent on the precise optimizations that are performed.

This patch adds a compiler pragma to verifier_global_subprogs.c to ignore these warnings.

Tested in bpf-next master. No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Cc: david.faust@oracle.com
Cc: cupertino.miranda@oracle.com
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240507184756.1772-1-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
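The pragma in question is the standard GCC diagnostic one; a sketch of how it is typically scoped to GCC only (the selftest may guard it differently):

  #ifndef __clang__
  #pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
  #endif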
-
Puranjay Mohan authored
When LSE atomics are available, BPF atomic instructions are implemented as single ARM64 atomic instructions, therefore it is easy to enable these in bpf_arena using the currently available exception handling setup. LL_SC atomics use loops and therefore would need more work to enable in bpf_arena.

Enable LSE atomics based instructions in bpf_arena and use the bpf_jit_supports_insn() callback to reject atomics in bpf_arena if LSE atomics are not available.

All atomics and arena_atomics selftests are passing:

  [root@ip-172-31-2-216 bpf]# ./test_progs -a atomics,arena_atomics
  #3/1 arena_atomics/add:OK
  #3/2 arena_atomics/sub:OK
  #3/3 arena_atomics/and:OK
  #3/4 arena_atomics/or:OK
  #3/5 arena_atomics/xor:OK
  #3/6 arena_atomics/cmpxchg:OK
  #3/7 arena_atomics/xchg:OK
  #3 arena_atomics:OK
  #10/1 atomics/add:OK
  #10/2 atomics/sub:OK
  #10/3 atomics/and:OK
  #10/4 atomics/or:OK
  #10/5 atomics/xor:OK
  #10/6 atomics/cmpxchg:OK
  #10/7 atomics/xchg:OK
  #10 atomics:OK
  Summary: 2/14 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20240426161116.441-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
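A rough sketch of what such a rejection callback can look like; the real arm64 implementation differs in detail, and the capability check shown is an assumption:

  /* Sketch only: reject arena atomics when LSE is unavailable, since an
   * LL/SC loop cannot be covered by a single exception-table entry. */
  bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena)
  {
          if (!in_arena)
                  return true;

          switch (insn->code) {
          case BPF_STX | BPF_ATOMIC | BPF_W:
          case BPF_STX | BPF_ATOMIC | BPF_DW:
                  if (!cpus_have_cap(ARM64_HAS_LSE_ATOMICS))
                          return false;
          }
          return true;
  }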
-
- 07 May, 2024 7 commits
-
-
Martin KaFai Lau authored
Andrii Nakryiko says:

====================
Fix yet another case of mishandling SEC("struct_ops") programs that were nulled out programmatically through BPF skeleton by the user.

While at it, add some improvements around detecting and reporting errors, specifically a common case of declaring a SEC("struct_ops") program but forgetting to actually make use of it by setting it as a callback implementation in a SEC(".struct_ops") variable (i.e., map) declaration.

A bunch of new selftests are added as well.
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Andrii Nakryiko authored
Drive-by clean-up: we shouldn't use the meaningless "test_" prefix for subtest names.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240507001335.1445325-8-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Andrii Nakryiko authored
Add a simple test that validates that libbpf will reject an isolated struct_ops program early with a helpful warning message. Also validate that explicit use of such a BPF program through the BPF skeleton after the BPF object is open won't trigger any warnings.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240507001335.1445325-7-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Andrii Nakryiko authored
Extend libbpf's pre-load checks for BPF programs, detecting more typical conditions that are destined to cause BPF program failure. This is an opportunity to provide a more helpful and actionable error message to users, instead of a potentially very confusing BPF verifier log and/or error.

In this case, we detect a struct_ops BPF program that was not referenced anywhere, but was still attempted to be loaded (according to libbpf logic). Suggest that the program might need to be used in some struct_ops variable. The user will get a message of the following kind:

  libbpf: prog 'test_1_forgotten': SEC("struct_ops") program isn't referenced anywhere, did you forget to use it?

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240507001335.1445325-6-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
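The mistake being detected looks roughly like this in BPF C; a sketch with illustrative names, where the missing callback assignment in the map initializer is what was forgotten:

  SEC("struct_ops")
  int BPF_PROG(test_1_forgotten, struct bpf_dummy_ops_state *state)
  {
          return 0;
  }

  SEC(".struct_ops")
  struct bpf_dummy_ops dummy_ops = {
          /* .test_1 = (void *)test_1_forgotten,  <- missing, so the
           * program above is never referenced and triggers the warning */
  };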
-
Andrii Nakryiko authored
strerror_r(), used by the libbpf-specific libbpf_strerror_r() wrapper, is documented to return an error in two different ways, depending on the glibc version. Take that into account when handling strerror_r()'s own errors, which happens when we pass some non-standard (internal) kernel error to it.

Before this patch we'd have "ERROR: strerror_r(524)=22", which is quite confusing. Now for the same situation we'll see a bit less visually scary "unknown error (-524)". At least we won't confuse the user with an irrelevant EINVAL (22).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240507001335.1445325-5-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
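For context, the XSI flavor of strerror_r() signals failure through its return value, which is where such a fallback message comes from; a sketch assuming the XSI variant (glibc's GNU variant returns a char * instead and needs different handling):

  #include <stdio.h>
  #include <string.h>

  static void describe_err(int err, char *buf, size_t len)
  {
          /* XSI strerror_r() returns non-zero for codes it doesn't know,
           * e.g. kernel-internal ones like 524 (ENOTSUPP) */
          if (strerror_r(err < 0 ? -err : err, buf, len))
                  snprintf(buf, len, "unknown error (%d)", err < 0 ? err : -err);
  }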
-
Andrii Nakryiko authored
Add a test which tests the case that was just fixed: the kernel has full type information about the callback, but the user explicitly nulls out the reference to the declaratively set BPF program.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240507001335.1445325-4-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Andrii Nakryiko authored
There is yet another corner case where the user can set a STRUCT_OPS program reference in a STRUCT_OPS map to NULL, but libbpf will fail to disable autoload for such a BPF program. This time it's the case of a "new" kernel which has type information about the callback field, but the user explicitly nulled out the program reference from user space after opening the BPF object.

Fix, hopefully, the last remaining unhandled case.

Fixes: 0737df6d ("libbpf: better fix for handling nulled-out struct_ops program")
Fixes: f973fccd ("libbpf: handle nulled-out program in struct_ops correctly")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240507001335.1445325-3-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-