- 04 Apr, 2024 9 commits
-
-
Anton Protopopov authored
The struct bpf_fib_lookup is supposed to be of size 64. A recent commit 59b418c7 ("bpf: Add a check for struct bpf_fib_lookup size") added a static assertion to check this property so that future changes to the structure will not accidentally break this assumption. As it immediately turned out, on some 32-bit arm systems, when AEABI=n, the total size of the structure was equal to 68, see [1]. This happened because the bpf_fib_lookup structure contains a union of two 16-bit fields: union { __u16 tot_len; __u16 mtu_result; }; which was supposed to compile to a 16-bit-aligned 16-bit field. On the aforementioned setups it was instead both aligned and padded to 32-bits. Declare this inner union as __attribute__((packed, aligned(2))) such that it always is of size 2 and is aligned to 16 bits. [1] https://lore.kernel.org/all/CA+G9fYtsoP51f-oP_Sp5MOq-Ffv8La2RztNpwvE6+R1VtFiLrw@mail.gmail.com/#tReported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Fixes: e1850ea9 ("bpf: bpf_fib_lookup return MTU value as output when looked up") Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240403123303.1452184-1-aspsk@isovalent.com
-
Sahil Siddiq authored
When pinning programs/objects under PATH (eg: during "bpftool prog loadall") the bpffs is mounted on the parent dir of PATH in the following situations: - the given dir exists but it is not bpffs. - the given dir doesn't exist and the parent dir is not bpffs. Mounting on the parent dir can also have the unintentional side- effect of hiding other files located under the parent dir. If the given dir exists but is not bpffs, then the bpffs should be mounted on the given dir and not its parent dir. Similarly, if the given dir doesn't exist and its parent dir is not bpffs, then the given dir should be created and the bpffs should be mounted on this new dir. Fixes: 2a36c26f ("bpftool: Support bpffs mountpoint as pin path for prog loadall") Signed-off-by: Sahil Siddiq <icegambit91@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/2da44d24-74ae-a564-1764-afccf395eeec@isovalent.com/T/#t Link: https://lore.kernel.org/bpf/20240404192219.52373-1-icegambit91@gmail.com Closes: https://github.com/libbpf/bpftool/issues/100 Changes since v1: - Split "mount_bpffs_for_pin" into two functions. This is done to improve maintainability and readability. Changes since v2: - mount_bpffs_for_pin: rename to "create_and_mount_bpffs_dir". - mount_bpffs_given_file: rename to "mount_bpffs_given_file". - create_and_mount_bpffs_dir: - introduce "dir_exists" boolean. - remove new dir if "mnt_fs" fails. - improve error handling and error messages. Changes since v3: - Rectify function name. - Improve error messages and formatting. - mount_bpffs_for_file: - Check if dir exists before block_mount check. Changes since v4: - Use strdup instead of strcpy. - create_and_mount_bpffs_dir: - Use S_IRWXU instead of 0700. - Improve error handling and formatting.
-
Alexei Starovoitov authored
Andrii Nakryiko says: ==================== Inline bpf_get_branch_snapshot() BPF helper Implement inlining of bpf_get_branch_snapshot() BPF helper using generic BPF assembly approach. This allows to reduce LBR record usage right before LBR records are captured from inside BPF program. See v1 cover letter ([0]) for some visual examples. I dropped them from v2 because there are multiple independent changes landing and being reviewed, all of which remove different parts of LBR record waste, so presenting final state of LBR "waste" gets more complicated until all of the pieces land. [0] https://lore.kernel.org/bpf/20240321180501.734779-1-andrii@kernel.org/ v2->v3: - fix BPF_MUL instruction definition; v1->v2: - inlining of bpf_get_smp_processor_id() split out into a separate patch set implementing internal per-CPU BPF instruction; - add efficient divide-by-24 through multiplication logic, and leave comments to explain the idea behind it; this way inlined version of bpf_get_branch_snapshot() has no compromises compared to non-inlined version of the helper (Alexei). ==================== Link: https://lore.kernel.org/r/20240404002640.1774210-1-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Inline bpf_get_branch_snapshot() helper using architecture-agnostic inline BPF code which calls directly into underlying callback of perf_snapshot_branch_stack static call. This callback is set early during kernel initialization and is never updated or reset, so it's ok to fetch actual implementation using static_call_query() and call directly into it. This change eliminates a full function call and saves one LBR entry in PERF_SAMPLE_BRANCH_ANY LBR mode. Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240404002640.1774210-3-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
perf_snapshot_branch_stack is set up in an architecture-agnostic way, so there is no reason for BPF subsystem to keep track of which architectures do support LBR or not. E.g., it looks like ARM64 might soon get support for BRBE ([0]), which (with proper integration) should be possible to utilize using this BPF helper. perf_snapshot_branch_stack static call will point to __static_call_return0() by default, which just returns zero, which will lead to -ENOENT, as expected. So no need to guard anything here. [0] https://lore.kernel.org/linux-arm-kernel/20240125094119.2542332-1-anshuman.khandual@arm.com/Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240404002640.1774210-2-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Puranjay Mohan authored
LLVM generates bpf_addr_space_cast instruction while translating pointers between native (zero) address space and __attribute__((address_space(N))). The addr_space=0 is reserved as bpf_arena address space. rY = addr_space_cast(rX, 0, 1) is processed by the verifier and converted to normal 32-bit move: wX = wY rY = addr_space_cast(rX, 1, 0) has to be converted by JIT. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Pu Lehui <pulehui@huawei.com> Reviewed-by: Pu Lehui <pulehui@huawei.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20240404114203.105970-3-puranjay12@gmail.com
-
Puranjay Mohan authored
Add support for [LDX | STX | ST], PROBE_MEM32, [B | H | W | DW] instructions. They are similar to PROBE_MEM instructions with the following differences: - PROBE_MEM32 supports store. - PROBE_MEM32 relies on the verifier to clear upper 32-bit of the src/dst register - PROBE_MEM32 adds 64-bit kern_vm_start address (which is stored in S7 in the prologue). Due to bpf_arena constructions such S7 + reg + off16 access is guaranteed to be within arena virtual range, so no address check at run-time. - S11 is a free callee-saved register, so it is used to store kern_vm_start - PROBE_MEM32 allows STX and ST. If they fault the store is a nop. When LDX faults the destination register is zeroed. To support these on riscv, we do tmp = S7 + src/dst reg and then use tmp2 as the new src/dst register. This allows us to reuse most of the code for normal [LDX | STX | ST]. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Pu Lehui <pulehui@huawei.com> Reviewed-by: Pu Lehui <pulehui@huawei.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20240404114203.105970-2-puranjay12@gmail.com
-
Alexei Starovoitov authored
Turned out that bpf prog callback addresses, bpf prog addresses used in bpf_trampoline, and in other cases the 64-bit address can be represented as sign extended 32-bit value. According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82339 "Skylake has 0.64c throughput for mov r64, imm64, vs. 0.25 for mov r32, imm32." So use shorter encoding and faster instruction when possible. Special care is needed in jit_subprogs(), since bpf_pseudo_func() instruction cannot change its size during the last step of JIT. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/CAADnVQKFfpY-QZBrOU2CG8v2du8Lgyb7MNVmOZVK_yTyOdNbBA@mail.gmail.com Link: https://lore.kernel.org/bpf/20240401233800.42737-1-alexei.starovoitov@gmail.com
-
Andrii Nakryiko authored
On non-SMP systems, there is no "per-CPU" data, it's just global data. So in such case just don't do this_cpu_off-based per-CPU address adjustment. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202404040951.d4CUx5S6-lkp@intel.com/ Fixes: 7bdbf744 ("bpf: add special internal-only MOV instruction to resolve per-CPU addrs") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240404034726.2766740-1-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 03 Apr, 2024 16 commits
-
-
Alexei Starovoitov authored
Andrii Nakryiko says: ==================== Add internal-only BPF per-CPU instruction Add a new BPF instruction for resolving per-CPU memory addresses. New instruction is a special form of BPF_ALU64 | BPF_MOV | BPF_X, with insns->off set to BPF_ADDR_PERCPU (== -1). It resolves provided per-CPU offset to an absolute address where per-CPU data resides for "this" CPU. This patch set implements support for it in x86-64 BPF JIT only. Using the new instruction, we also implement inlining for three cases: - bpf_get_smp_processor_id(), which allows to avoid unnecessary trivial function call, saving a bit of performance and also not polluting LBR records with unnecessary function call/return records; - PERCPU_ARRAY's bpf_map_lookup_elem() is completely inlined, bringing its performance to implementing per-CPU data structures using global variables in BPF (which is an awesome improvement, see benchmarks below); - PERCPU_HASH's bpf_map_lookup_elem() is partially inlined, just like the same for non-PERCPU HASH map; this still saves a bit of overhead. To validate performance benefits, I hacked together a tiny benchmark doing only bpf_map_lookup_elem() and incrementing the value by 1 for PERCPU_ARRAY (arr-inc benchmark below) and PERCPU_HASH (hash-inc benchmark below) maps. To establish a baseline, I also implemented logic similar to PERCPU_ARRAY based on global variable array using bpf_get_smp_processor_id() to index array for current CPU (glob-arr-inc benchmark below). BEFORE ====== glob-arr-inc : 163.685 ± 0.092M/s arr-inc : 138.096 ± 0.160M/s hash-inc : 66.855 ± 0.123M/s AFTER ===== glob-arr-inc : 173.921 ± 0.039M/s (+6%) arr-inc : 170.729 ± 0.210M/s (+23.7%) hash-inc : 68.673 ± 0.070M/s (+2.7%) As can be seen, PERCPU_HASH gets a modest +2.7% improvement, while global array-based gets a nice +6% due to inlining of bpf_get_smp_processor_id(). But what's really important is that arr-inc benchmark basically catches up with glob-arr-inc, resulting in +23.7% improvement. This means that in practice it won't be necessary to avoid PERCPU_ARRAY anymore if performance is critical (e.g., high-frequent stats collection, which is often a practical use for PERCPU_ARRAY today). v1->v2: - use BPF_ALU64 | BPF_MOV instruction instead of LDX (Alexei); - dropped the direct per-CPU memory read instruction, it can always be added back, if necessary; - guarded bpf_get_smp_processor_id() behind x86-64 check (Alexei); - switched all per-cpu addr casts to (unsigned long) to avoid sparse warnings. ==================== Link: https://lore.kernel.org/r/20240402021307.1012571-1-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Using new per-CPU BPF instruction, partially inline bpf_map_lookup_elem() helper for per-CPU hashmap BPF map. Just like for normal HASH map, we still generate a call into __htab_map_lookup_elem(), but after that we resolve per-CPU element address using a new instruction, saving on extra functions calls. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20240402021307.1012571-5-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Using new per-CPU BPF instruction implement inlining for per-CPU ARRAY map lookup helper, if BPF JIT support is present. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20240402021307.1012571-4-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
If BPF JIT supports per-CPU MOV instruction, inline bpf_get_smp_processor_id() to eliminate unnecessary function calls. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20240402021307.1012571-3-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Add a new BPF instruction for resolving absolute addresses of per-CPU data from their per-CPU offsets. This instruction is internal-only and users are not allowed to use them directly. They will only be used for internal inlining optimizations for now between BPF verifier and BPF JITs. We use a special BPF_MOV | BPF_ALU64 | BPF_X form with insn->off field set to BPF_ADDR_PERCPU = -1. I used negative offset value to distinguish them from positive ones used by user-exposed instructions. Such instruction performs a resolution of a per-CPU offset stored in a register to a valid kernel address which can be dereferenced. It is useful in any use case where absolute address of a per-CPU data has to be resolved (e.g., in inlining bpf_map_lookup_elem()). BPF disassembler is also taught to recognize them to support dumping final BPF assembly code (non-JIT'ed version). Add arch-specific way for BPF JITs to mark support for this instructions. This patch also adds support for these instructions in x86-64 BPF JIT. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20240402021307.1012571-2-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Justin Stitt authored
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. bpf sym names get looked up and compared/cleaned with various string apis. This suggests they need to be NUL-terminated (strncpy() suggests this but does not guarantee it). | static int compare_symbol_name(const char *name, char *namebuf) | { | cleanup_symbol_name(namebuf); | return strcmp(name, namebuf); | } | static void cleanup_symbol_name(char *s) | { | ... | res = strstr(s, ".llvm."); | ... | } Use strscpy() as this method guarantees NUL-termination on the destination buffer. This patch also replaces two uses of strncpy() used in log.c. These are simple replacements as postfix has been zero-initialized on the stack and has source arguments with a size less than the destination's size. Note that this patch uses the new 2-argument version of strscpy introduced in commit e6584c39 ("string: Allow 2-argument strscpy()"). Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Link: https://lore.kernel.org/bpf/20240402-strncpy-kernel-bpf-core-c-v1-1-7cb07a426e78@google.com
-
Tushar Vyavahare authored
Introduce a test case to evaluate AF_XDP's robustness by pushing hardware and software ring sizes to their limits. This test ensures AF_XDP's reliability amidst potential producer/consumer throttling due to maximum ring utilization. The testing strategy includes: 1. Configuring rings to their maximum allowable sizes. 2. Executing a series of tests across diverse batch sizes to assess system's behavior under different configurations. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20240402114529.545475-8-tushar.vyavahare@intel.com
-
Tushar Vyavahare authored
Add a new test case that stresses AF_XDP and the driver by configuring small hardware and software ring sizes. This verifies that AF_XDP continues to function properly even with insufficient ring space that could lead to frequent producer/consumer throttling. The test procedure involves: 1. Set the minimum possible ring configuration(tx 64 and rx 128). 2. Run tests with various batch sizes(1 and 63) to validate the system's behavior under different configurations. Update Makefile to include network_helpers.o in the build process for xskxceiver. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20240402114529.545475-7-tushar.vyavahare@intel.com
-
Tushar Vyavahare authored
selftests/xsk: Introduce set_ring_size function with a retry mechanism for handling AF_XDP socket closures Introduce a new function, set_ring_size(), to manage asynchronous AF_XDP socket closure. Retry set_hw_ring_size up to SOCK_RECONF_CTR times if it fails due to an active AF_XDP socket. Return an error immediately for non-EBUSY errors. This enhances robustness against asynchronous AF_XDP socket closures during ring size changes. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20240402114529.545475-6-tushar.vyavahare@intel.com
-
Tushar Vyavahare authored
Introduce a new function called set_hw_ring_size that allows for the dynamic configuration of the ring size within the interface. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20240402114529.545475-5-tushar.vyavahare@intel.com
-
Tushar Vyavahare authored
Introduce a new function called get_hw_size that retrieves both the current and maximum size of the interface and stores this information in the 'ethtool_ringparam' structure. Remove ethtool_channels struct from xdp_hw_metadata.c due to redefinition error. Remove unused linux/if.h include from flow_dissector BPF test to address CI pipeline failure. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20240402114529.545475-4-tushar.vyavahare@intel.com
-
Tushar Vyavahare authored
Convert the constant BATCH_SIZE into a variable named batch_size to allow dynamic modification at runtime. This is required for the forthcoming changes to support testing different hardware ring sizes. While running these tests, a bug was identified when the batch size is roughly the same as the NIC ring size. This has now been addressed by Maciej's fix in commit 913eda2b ("i40e: xsk: remove count_mask"). Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20240402114529.545475-3-tushar.vyavahare@intel.com
-
Tushar Vyavahare authored
This commit duplicates the ethtool.h file from the include/uapi/linux directory in the kernel source to the tools/include/uapi/linux directory. This action ensures that the ethtool.h file used in the tools directory is in sync with the kernel's version, maintaining consistency across the codebase. There are some checkpatch warnings in this file that could be cleaned up, but I preferred to move it over as-is for now to avoid disrupting the code. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20240402114529.545475-2-tushar.vyavahare@intel.com
-
Alexei Starovoitov authored
Puranjay Mohan says: ==================== bpf,arm64: Add support for BPF Arena Changes in V4 V3: https://lore.kernel.org/bpf/20240323103057.26499-1-puranjay12@gmail.com/ - Use more descriptive variable names. - Use insn_is_cast_user() helper. Changes in V3 V2: https://lore.kernel.org/bpf/20240321153102.103832-1-puranjay12@gmail.com/ - Optimize bpf_addr_space_cast as suggested by Xu Kuohai Changes in V2 V1: https://lore.kernel.org/bpf/20240314150003.123020-1-puranjay12@gmail.com/ - Fix build warnings by using 5 in place of 32 as DONT_CLEAR marker. R5 is not mapped to any BPF register so it can safely be used here. This series adds the support for PROBE_MEM32 and bpf_addr_space_cast instructions to the ARM64 BPF JIT. These two instructions allow the enablement of BPF Arena. All arena related selftests are passing. [root@ip-172-31-6-62 bpf]# ./test_progs -a "*arena*" #3/1 arena_htab/arena_htab_llvm:OK #3/2 arena_htab/arena_htab_asm:OK #3 arena_htab:OK #4/1 arena_list/arena_list_1:OK #4/2 arena_list/arena_list_1000:OK #4 arena_list:OK #434/1 verifier_arena/basic_alloc1:OK #434/2 verifier_arena/basic_alloc2:OK #434/3 verifier_arena/basic_alloc3:OK #434/4 verifier_arena/iter_maps1:OK #434/5 verifier_arena/iter_maps2:OK #434/6 verifier_arena/iter_maps3:OK #434 verifier_arena:OK Summary: 3/10 PASSED, 0 SKIPPED, 0 FAILED This will need the patch [1] that introduced insn_is_cast_user() helper to build. The verifier_arena selftest could fail in the CI because the following commit[2] is missing from bpf-next: [1] https://lore.kernel.org/bpf/20240324183226.29674-1-puranjay12@gmail.com/ [2] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=fa3550dca8f02ec312727653a94115ef3ab68445 Here is a CI run with all dependencies added: https://github.com/kernel-patches/bpf/pull/6641 ==================== Link: https://lore.kernel.org/r/20240325150716.4387-1-puranjay12@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Puranjay Mohan authored
LLVM generates bpf_addr_space_cast instruction while translating pointers between native (zero) address space and __attribute__((address_space(N))). The addr_space=0 is reserved as bpf_arena address space. rY = addr_space_cast(rX, 0, 1) is processed by the verifier and converted to normal 32-bit move: wX = wY. rY = addr_space_cast(rX, 1, 0) : used to convert a bpf arena pointer to a pointer in the userspace vma. This has to be converted by the JIT. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Link: https://lore.kernel.org/r/20240325150716.4387-3-puranjay12@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Puranjay Mohan authored
Add support for [LDX | STX | ST], PROBE_MEM32, [B | H | W | DW] instructions. They are similar to PROBE_MEM instructions with the following differences: - PROBE_MEM32 supports store. - PROBE_MEM32 relies on the verifier to clear upper 32-bit of the src/dst register - PROBE_MEM32 adds 64-bit kern_vm_start address (which is stored in R28 in the prologue). Due to bpf_arena constructions such R28 + reg + off16 access is guaranteed to be within arena virtual range, so no address check at run-time. - PROBE_MEM32 allows STX and ST. If they fault the store is a nop. When LDX faults the destination register is zeroed. To support these on arm64, we do tmp2 = R28 + src/dst reg and then use tmp2 as the new src/dst register. This allows us to reuse most of the code for normal [LDX | STX | ST]. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Link: https://lore.kernel.org/r/20240325150716.4387-2-puranjay12@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 02 Apr, 2024 11 commits
-
-
Geliang Tang authored
In order to prevent mptcpify prog from affecting the running results of other BPF tests, a pid limit was added to restrict it from only modifying its own program. Suggested-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/8987e2938e15e8ec390b85b5dcbee704751359dc.1712054986.git.tanggeliang@kylinos.cn
-
Tobias Böhm authored
Commit 20d59ee5 ("libbpf: add bpf_core_cast() macro") added a bpf_helpers include in bpf_core_read.h as a system include. Usually, the includes are local, though, like in bpf_tracing.h. This commit adjusts the include to be local as well. Signed-off-by: Tobias Böhm <tobias@aibor.de> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/q5d5bgc6vty2fmaazd5e73efd6f5bhiru2le6fxn43vkw45bls@fhlw2s5ootdb
-
Jose Fernandez authored
This patch improves the run-time calculation for program stats by capturing the duration as soon as possible after the program returns. Previously, the duration included u64_stats_t operations. While the instrumentation overhead is part of the total time spent when stats are enabled, distinguishing between the program's native execution time and the time spent due to instrumentation is crucial for accurate performance analysis. By making this change, the patch facilitates more precise optimization of BPF programs, enabling users to understand their performance in environments without stats enabled. I used a virtualized environment to measure the run-time over one minute for a basic raw_tracepoint/sys_enter program, which just increments a local counter. Although the virtualization introduced some performance degradation that could affect the results, I observed approximately a 16% decrease in average run-time reported by stats with this change (310 -> 260 nsec). Signed-off-by: Jose Fernandez <josef@netflix.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240402034010.25060-1-josef@netflix.com
-
Pu Lehui authored
When testing send_signal and stacktrace_build_id_nmi using the riscv sbi pmu driver without the sscofpmf extension or the riscv legacy pmu driver, then failures as follows are encountered: test_send_signal_common:FAIL:perf_event_open unexpected perf_event_open: actual -1 < expected 0 #272/3 send_signal/send_signal_nmi:FAIL test_stacktrace_build_id_nmi:FAIL:perf_event_open err -1 errno 95 #304 stacktrace_build_id_nmi:FAIL The reason is that the above pmu driver or hardware does not support sampling events, that is, PERF_PMU_CAP_NO_INTERRUPT is set to pmu capabilities, and then perf_event_open returns EOPNOTSUPP. Since PERF_PMU_CAP_NO_INTERRUPT is not only set in the riscv-related pmu driver, it is better to skip testing when this capability is set. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240402073029.1299085-1-pulehui@huaweicloud.com
-
Andrii Nakryiko authored
When generated BPF skeleton header is included in C++ code base, some compiler setups will emit warning about using language extensions due to typeof() usage, resulting in something like: error: extension used [-Werror,-Wlanguage-extension-token] obj->struct_ops.empty_tcp_ca = (typeof(obj->struct_ops.empty_tcp_ca)) ^ It looks like __typeof__() is a preferred way to do typeof() with better C++ compatibility behavior, so switch to that. With __typeof__() we get no such warning. Fixes: c2a0257c ("bpftool: Cast pointers for shadow types explicitly.") Fixes: 00389c58 ("bpftool: Add support for subskeletons") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kui-Feng Lee <thinker.li@gmail.com> Acked-by: Quentin Monnet <qmo@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20240401170713.2081368-1-andrii@kernel.org
-
Yonghong Song authored
Currently, cond_break macro uses bytes to encode the may_goto insn. Patch [1] in llvm implemented may_goto insn in BPF backend. Replace byte-level encoding with llvm inline asm for better usability. Using llvm may_goto insn is controlled by macro __BPF_FEATURE_MAY_GOTO. [1] https://github.com/llvm/llvm-project/commit/0e0bfacff71859d1f9212205f8f873d47029d3fbSigned-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20240402025446.3215182-1-yonghong.song@linux.dev
-
Anton Protopopov authored
When more than 64 maps are used by a program and its subprograms the verifier returns -E2BIG. Add a verbose message which highlights the source of the error and also print the actual limit. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20240402073347.195920-1-aspsk@isovalent.com
-
David Lechner authored
In a few places in the bpf uapi headers, EOPNOTSUPP is missing a "P" in the doc comments. This adds the missing "P". Signed-off-by: David Lechner <dlechner@baylibre.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240329152900.398260-2-dlechner@baylibre.com
-
Rameez Rehman authored
Improve the formatting of the attach flags for cgroup programs in the relevant man page, and fix typos ("can be on of", "an userspace inet socket") when introducing that list. Also fix a couple of other trivial issues in docs. [ Quentin: Fixed trival issues in bpftool-gen.rst and bpftool-iter.rst ] Signed-off-by: Rameez Rehman <rameezrehman408@hotmail.com> Signed-off-by: Quentin Monnet <qmo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240331200346.29118-4-qmo@kernel.org
-
Rameez Rehman authored
As it turns out, the terms in definition lists in the rST file are already rendered with bold-ish formatting when generating the man pages; all double-star sequences we have in the commands for the command description are unnecessary, and can be removed to make the documentation easier to read. The rST files were automatically processed with: sed -i '/DESCRIPTION/,/OPTIONS/ { /^\*/ s/\*\*//g }' b*.rst Signed-off-by: Rameez Rehman <rameezrehman408@hotmail.com> Signed-off-by: Quentin Monnet <qmo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240331200346.29118-3-qmo@kernel.org
-
Rameez Rehman authored
The rST manual pages for bpftool would use a mix of tabs and spaces for indentation. While this is the norm in C code, this is rather unusual for rST documents, and over time we've seen many contributors use a wrong level of indentation for documentation update. Let's fix bpftool's indentation in docs once and for all: - Let's use spaces, that are more common in rST files. - Remove one level of indentation for the synopsis, the command description, and the "see also" section. As a result, all sections start with the same indentation level in the generated man page. - Rewrap the paragraphs after the changes. There is no content change in this patch, only indentation and rewrapping changes. The wrapping in the generated source files for the manual pages is changed, but the pages displayed with "man" remain the same, apart from the adjusted indentation level on relevant sections. [ Quentin: rebased on bpf-next, removed indent level for command description and options, updated synopsis, command summary, and "see also" sections. ] Signed-off-by: Rameez Rehman <rameezrehman408@hotmail.com> Signed-off-by: Quentin Monnet <qmo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240331200346.29118-2-qmo@kernel.org
-
- 30 Mar, 2024 1 commit
-
-
Andrii Nakryiko authored
When BPF selftests are built in RELEASE=1 mode with -O2 optimization level, uprobe_multi binary, called from multi-uprobe tests is optimized to the point that all the thousands of target uprobe_multi_func_XXX functions are eliminated, breaking tests. So ensure they are preserved by using weak attribute. But, actually, compiling uprobe_multi binary with -O2 takes a really long time, and is quite useless (it's not a benchmark). So in addition to ensuring that uprobe_multi_func_XXX functions are preserved, opt-out of -O2 explicitly in Makefile and stick to -O0. This saves a lot of compilation time. With -O2, just recompiling uprobe_multi: $ touch uprobe_multi.c $ time make RELEASE=1 -j90 make RELEASE=1 -j90 291.66s user 2.54s system 99% cpu 4:55.52 total With -O0: $ touch uprobe_multi.c $ time make RELEASE=1 -j90 make RELEASE=1 -j90 22.40s user 1.91s system 99% cpu 24.355 total 5 minutes vs (still slow, but...) 24 seconds. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240329190410.4191353-1-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 29 Mar, 2024 3 commits
-
-
Alexei Starovoitov authored
syzbot reported the following lock sequence: cpu 2: grabs timer_base lock spins on bpf_lpm lock cpu 1: grab rcu krcp lock spins on timer_base lock cpu 0: grab bpf_lpm lock spins on rcu krcp lock bpf_lpm lock can be the same. timer_base lock can also be the same due to timer migration. but rcu krcp lock is always per-cpu, so it cannot be the same lock. Hence it's a false positive. To avoid lockdep complaining move kfree_rcu() after spin_unlock. Reported-by: syzbot+1fa663a2100308ab6eab@syzkaller.appspotmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240329171439.37813-1-alexei.starovoitov@gmail.com
-
Martin KaFai Lau authored
Geliang Tang says: ==================== Simplify bpf_tcp_ca test by using connect_fd_to_fd and start_server helpers. v4: - Matt reminded me that I shouldn't send a square-to patch to BPF (thanks), so I update them into two patches in v4. v3: - split v2 as two patches as Daniel suggested. - The patch "selftests/bpf: Use start_server in bpf_tcp_ca" is merged by Daniel (thanks), but I forgot to drop 'settimeo(lfd, 0)' in it, so I send a squash-to patch to fix this. ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Geliang Tang authored
settimeo is invoked in start_server() and in connect_fd_to_fd() already, no need to invoke settimeo(lfd, 0) and settimeo(fd, 0) in do_test() anymore. This patch drops them. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/dbc3613bee3b1c78f95ac9ff468bf47c92f106ea.1711447102.git.tanggeliang@kylinos.cnSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-