- 25 Jan, 2024 4 commits
-
-
Andrii Nakryiko authored
Add a new kind of BPF kernel object, BPF token. BPF token is meant to allow delegating privileged BPF functionality, like loading a BPF program or creating a BPF map, from a privileged process to a *trusted* unprivileged process, all while having a good amount of control over which privileged operations can be performed using the provided BPF token. This is achieved through mounting a BPF FS instance with extra delegation mount options, which determine what operations are delegatable, and also constraining it to the owning user namespace (as mentioned in the previous patch). BPF token itself is just a derivative from BPF FS and can be created through a new bpf() syscall command, BPF_TOKEN_CREATE, which accepts a BPF FS FD, which can be obtained through the open() API by opening the BPF FS mount point. Currently, BPF token "inherits" delegated command, map types, prog type, and attach type bit sets from BPF FS as is. In the future, having a BPF token as a separate object with its own FD, we can allow further restricting BPF token's allowable set of things either at creation time or after the fact, allowing the process to guard itself further from unintentionally trying to load undesired kinds of BPF programs. But for now we keep things simple and just copy bit sets as is. When BPF token is created from a BPF FS mount, we take a reference to the BPF super block's owning user namespace, and then use that namespace for checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN} capabilities that are normally only checked against init userns (using capable()), but now we check them using ns_capable() instead (if BPF token is provided). See bpf_token_capable() for details. Such a setup means that BPF token in itself is not sufficient to grant BPF functionality. A user namespaced process has to *also* have the necessary combination of capabilities inside that user namespace.
So while previously CAP_BPF was useless when granted within a user namespace, now it gains a meaning and allows container managers and sys admins to have flexible control over which processes can and need to use BPF functionality within the user namespace (i.e., container in practice). And BPF FS delegation mount options and derived BPF tokens serve as a per-container "flag" to grant overall ability to use bpf() (plus further restrict which parts of bpf() syscalls are treated as namespaced). Note also, the BPF_TOKEN_CREATE command itself requires ns_capable(CAP_BPF) within the BPF FS owning user namespace, rounding out the ns_capable() story of BPF token. Also, creating BPF token in init user namespace is currently not supported, given BPF token doesn't have any effect in init user namespace anyway. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-4-andrii@kernel.org
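As a rough illustration of the check described in this commit message, the following is a userspace model only, not kernel code. Every type and function name here (`model_token`, `model_task`, `model_token_allow_cmd`, and so on) is hypothetical, and the CAP_SYS_ADMIN fallback follows the description of bpf_token_capable() given in the CAP_NET_ADMIN commit elsewhere in this log. The real ns_capable() also considers ancestor namespaces, which this sketch ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the capabilities named in the commit message. */
enum model_cap {
	MODEL_CAP_BPF,
	MODEL_CAP_PERFMON,
	MODEL_CAP_NET_ADMIN,
	MODEL_CAP_SYS_ADMIN,
};

/* Assumed command number; it mirrors BPF_PROG_LOAD in the UAPI enum. */
#define MODEL_CMD_PROG_LOAD 5

struct model_userns { int id; };

/* Token: remembers the owning userns and the delegated command bit set. */
struct model_token {
	const struct model_userns *userns;
	uint64_t allowed_cmds;
};

/* Task: lives in one userns and holds a set of namespaced capabilities. */
struct model_task {
	const struct model_userns *userns;
	uint32_t caps;
};

/* Simplified ns_capable(): same namespace and the capability bit is set. */
static bool model_ns_capable(const struct model_task *t,
			     const struct model_userns *ns, enum model_cap cap)
{
	return t->userns == ns && (t->caps & (1u << cap));
}

/* Requested capability, or CAP_SYS_ADMIN, checked in the token's userns. */
static bool model_token_capable(const struct model_task *t,
				const struct model_token *tok, enum model_cap cap)
{
	return model_ns_capable(t, tok->userns, cap) ||
	       model_ns_capable(t, tok->userns, MODEL_CAP_SYS_ADMIN);
}

/* The token alone is not enough: the command must be delegated AND the
 * task must hold the capability inside the token's user namespace.
 * cmd is expected to be below 64. */
static bool model_token_allow_cmd(const struct model_task *t,
				  const struct model_token *tok,
				  unsigned int cmd, enum model_cap cap)
{
	if (!(tok->allowed_cmds & (1ULL << cmd)))
		return false;
	return model_token_capable(t, tok, cap);
}
```

In this model a task holding CAP_BPF in a different user namespace, or a task in the right namespace with no capabilities, is rejected even when the command bit is delegated.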
-
Andrii Nakryiko authored
Add a few new mount options to BPF FS that specify that a given BPF FS instance allows creation of BPF token (added in the next patch), and what sort of operations are allowed under BPF token. As such, we get 4 new mount options, each of which is a bit mask: - `delegate_cmds` specifies which bpf() syscall commands are allowed with BPF token derived from this BPF FS instance; - if BPF_MAP_CREATE command is allowed, `delegate_maps` specifies a set of allowable BPF map types that could be created with BPF token; - if BPF_PROG_LOAD command is allowed, `delegate_progs` specifies a set of allowable BPF program types that could be loaded with BPF token; - if BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies a set of allowable BPF program attach types that could be loaded with BPF token; delegate_progs and delegate_attachs are meant to be used together, as full BPF program type is, in general, determined through both program type and program attach type. Currently, these mount options accept the following forms of values: - a special value "any", that enables all possible values of a given bit set; - numeric value (decimal or hexadecimal, determined by kernel automatically) that specifies a bit mask value directly; - all the values for a given mount option are combined, if specified multiple times. E.g., `mount -t bpf nodev /path/to/mount -o delegate_maps=0x1 -o delegate_maps=0x2` will result in a combined 0x3 mask. Ideally, a more convenient (for humans) symbolic form derived from corresponding UAPI enums would be accepted (e.g., `-o delegate_progs=kprobe|tracepoint`) and I intend to implement this, but it requires a bunch of UAPI header churn, so I postponed it until this feature lands upstream or at least there is a definite consensus that this feature is acceptable and is going to make it, just to minimize the amount of wasted effort and not increase the amount of non-essential code to be reviewed.
Attentive reader will notice that BPF FS is now marked as FS_USERNS_MOUNT, which theoretically makes it mountable inside non-init user namespace as long as the process has sufficient *namespaced* capabilities within that user namespace. But in reality we still restrict BPF FS to be mountable only by processes with CAP_SYS_ADMIN *in init userns* (extra check in bpf_fill_super()). FS_USERNS_MOUNT is added to allow creating BPF FS context object (i.e., fsopen("bpf")) from inside unprivileged process inside non-init userns, to capture that userns as the owning userns. It will still be required to pass this context object back to privileged process to instantiate and mount it. This manipulation is important, because capturing non-init userns as the owning userns of BPF FS instance (super block) allows using that userns to constrain BPF token to that userns later on (see next patch). So creating BPF FS with delegation inside unprivileged userns will restrict derived BPF token objects to only "work" inside that intended userns, making it scoped to an intended "container". Also, setting these delegation options requires capable(CAP_SYS_ADMIN), so unprivileged process cannot set this up without involvement of a privileged process. There is a set of selftests at the end of the patch set that simulates this sequence of steps and validates that everything works as intended. But careful review is requested to make sure there are no missed gaps in the implementation and testing. This somewhat subtle set of aspects is the result of previous discussions ([0]) about various user namespace implications and interactions with BPF token functionality and is necessary to contain BPF token inside intended user namespace.
[0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/ Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-3-andrii@kernel.org
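The value handling for the delegation mount options (the "any" keyword, a decimal or hexadecimal number, and OR-combining repeated options) can be approximated in userspace as below. This is a minimal sketch and not the kernel's parser: `delegate_mask_add` is a hypothetical helper, and `strtoull()` with base 0 is an assumption standing in for whatever the kernel uses (it additionally accepts octal).

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: fold one "-o delegate_xxx=<value>" occurrence into
 * the accumulated mask. Returns 0 on success, -EINVAL on a bad value. */
static int delegate_mask_add(uint64_t *mask, const char *value)
{
	unsigned long long v;
	char *end;

	/* "any" enables every bit of the given bit set */
	if (strcmp(value, "any") == 0) {
		*mask = ~0ULL;
		return 0;
	}

	/* require a leading digit so that signs and symbolic names are rejected */
	if (!isdigit((unsigned char)value[0]))
		return -EINVAL;

	errno = 0;
	v = strtoull(value, &end, 0); /* base 0: "16" is decimal, "0x10" is hex */
	if (errno || *end != '\0')
		return -EINVAL;

	/* repeated options are combined rather than overwritten */
	*mask |= v;
	return 0;
}
```

Feeding it "0x1" and then "0x2" for the same option leaves the mask at 0x3, matching the `delegate_maps` example in the commit message. A symbolic value such as "kprobe" is rejected, consistent with the note that symbolic forms are not implemented yet.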
-
Andrii Nakryiko authored
Within BPF syscall handling code CAP_NET_ADMIN checks stand out a bit compared to CAP_BPF and CAP_PERFMON checks. For the latter, CAP_BPF or CAP_PERFMON are checked first, but if they are not set, CAP_SYS_ADMIN takes over and grants whatever part of BPF syscall is required. Similar checks that involve CAP_NET_ADMIN are not so consistent. One out of four uses does follow the CAP_BPF/CAP_PERFMON model: during BPF_PROG_LOAD, if the type of BPF program is "network-related", either CAP_NET_ADMIN or CAP_SYS_ADMIN is required to proceed. But in three other cases CAP_NET_ADMIN is required even if CAP_SYS_ADMIN is set: - when creating DEVMAP/XSKMAP/CPUMAP maps; - when attaching CGROUP_SKB programs; - when handling BPF_PROG_QUERY command. This patch changes the latter three cases to follow the BPF_PROG_LOAD model, that is, allowing to proceed under either CAP_NET_ADMIN or CAP_SYS_ADMIN. This also makes it cleaner in subsequent BPF token patches to switch wholesale to a generic bpf_token_capable(int cap) check, that always falls back to CAP_SYS_ADMIN if the requested capability is missing. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/bpf/20240124022127.2379740-2-andrii@kernel.org
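The behavioral change can be summarized with two tiny predicates. This is an illustrative model with made-up names and bit values, not the kernel's capable() machinery.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits for the model. */
#define MODEL_CAP_NET_ADMIN (1u << 0)
#define MODEL_CAP_SYS_ADMIN (1u << 1)

/* Old behavior in the three inconsistent call sites:
 * CAP_NET_ADMIN was required and CAP_SYS_ADMIN did not help. */
static bool net_admin_strict(uint32_t caps)
{
	return caps & MODEL_CAP_NET_ADMIN;
}

/* New behavior, matching the BPF_PROG_LOAD model:
 * either CAP_NET_ADMIN or CAP_SYS_ADMIN is sufficient. */
static bool net_admin_or_sys_admin(uint32_t caps)
{
	return (caps & MODEL_CAP_NET_ADMIN) || (caps & MODEL_CAP_SYS_ADMIN);
}
```

The only input on which the two predicates disagree is a caller holding CAP_SYS_ADMIN without CAP_NET_ADMIN, which is exactly the case this patch relaxes.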
-
Martin KaFai Lau authored
The commit 9e926acd ("libbpf: Find correct module BTFs for struct_ops maps and progs.") sets a newly added field (value_type_btf_obj_fd) to -1 in libbpf when the caller of libbpf's bpf_map_create did not define this field, either by passing a NULL "opts" or by passing in an "opts" that does not cover this new field. OPTS_HAS(opts, field) is used to decide if the field is defined or not: ((opts) && opts->sz >= offsetofend(typeof(*(opts)), field)) Once OPTS_HAS decides the field is not defined, that field should be set to 0. For this particular new field (value_type_btf_obj_fd), its corresponding map_flags "BPF_F_VTYPE_BTF_OBJ_FD" is not set. Thus, the kernel does not treat it as an fd field. Fixes: 9e926acd ("libbpf: Find correct module BTFs for struct_ops maps and progs.") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240124224418.2905133-1-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
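The size-based check quoted above can be tried out in isolation. The sketch below re-creates the macro pattern on a made-up options struct; `demo_opts`, `new_fd` and `demo_get_new_fd` are hypothetical names, `OPTS_GET` follows the usual libbpf-style companion macro, and `__typeof__` is used in place of `typeof` so the snippet also builds in strict ISO modes of GCC and Clang.

```c
#include <stddef.h>

/* Offset of the first byte past FIELD inside TYPE. */
#define offsetofend(TYPE, FIELD) \
	(offsetof(TYPE, FIELD) + sizeof(((TYPE *)0)->FIELD))

/* A field counts as "defined" only if the caller's struct is big enough
 * to contain it; a NULL opts defines nothing. */
#define OPTS_HAS(opts, field) \
	((opts) && (opts)->sz >= offsetofend(__typeof__(*(opts)), field))

/* Read the field if defined, otherwise use the fallback value. */
#define OPTS_GET(opts, field, fallback) \
	(OPTS_HAS(opts, field) ? (opts)->field : (fallback))

/* Hypothetical options struct: new_fd was appended in a later version. */
struct demo_opts {
	size_t sz;   /* size of the struct as the caller compiled it */
	int flags;
	int new_fd;  /* newer field, unknown to old callers */
};

/* An undefined field falls back to 0, the value the fix above settles on. */
static int demo_get_new_fd(const struct demo_opts *opts)
{
	return OPTS_GET(opts, new_fd, 0);
}
```

An old caller that sets `sz` to cover only `flags` gets the fallback even if stale bytes happen to sit where `new_fd` would be; a caller passing the full `sizeof(struct demo_opts)` gets its own value back.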
-
- 24 Jan, 2024 30 commits
-
-
Martin KaFai Lau authored
After the previous patch that sped up the test (by avoiding neigh discovery in IPv6), the BPF CI occasionally hits this error: rcv tstamp unexpected pkt rcv tstamp: actual 0 == expected 0 The test complains that the cmsg returned from recvmsg() does not have the rcv timestamp. Setting skb->tstamp or not is controlled by a kernel static key "netstamp_needed_key". The static key is enabled whenever there is at least one sk with SOCK_TIMESTAMP set. The test_redirect_dtime does use setsockopt() to turn on SOCK_TIMESTAMP for the reading sk. In the kernel, net_enable_timestamp() has a delay in enabling the "netstamp_needed_key" when CONFIG_JUMP_LABEL is set. This potential delay is the likely reason for packets occasionally missing the rcv timestamp. This patch creates udp sockets with SOCK_TIMESTAMP set. It sends and receives some packets until the received packet has a rcv timestamp. It currently retries at most 5 times with 1s in between. This should be enough to wait for the "netstamp_needed_key". It then holds on to the socket and only closes it at the end of the test. This guarantees that the test has the "netstamp_needed_key" key turned on from the beginning. To simplify the udp sockets setup, they are sending/receiving packets in the same netns (ns_dst is used) and communicate over the "lo" dev. Hence, the patch enables the "lo" dev in the ns_dst. Fixes: c803475f ("bpf: selftests: test skb->tstamp in redirect_neigh") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240120060518.3604920-2-martin.lau@linux.dev
-
Martin KaFai Lau authored
BPF CI has been reporting the tc_redirect_dtime test failing from time to time: test_inet_dtime:PASS:setns src 0 nsec (network_helpers.c:253: errno: No route to host) Failed to connect to server close_netns:PASS:setns 0 nsec test_inet_dtime:FAIL:connect_to_fd unexpected connect_to_fd: actual -1 < expected 0 test_tcp_clear_dtime:PASS:tcp ip6 clear dtime ingress_fwdns_p100 0 nsec The connect_to_fd failure (EHOSTUNREACH) is from the test_tcp_clear_dtime() test and it is the very first IPv6 traffic after setting up all the links, addresses, and routes. The symptom is that this first connect() is always slow. In my setup, it could take ~3s. After some tracing and tcpdump, the time is mostly spent in the neighbor solicitation in the "ns_fwd" namespace while the "ns_src" and "ns_dst" are fine. I forced the kernel to drop the neighbor solicitation messages. I can then reproduce EHOSTUNREACH. What actually happens could be: - the neighbor advertisement came back a little slow. - the "ns_fwd" namespace concluded a neighbor discovery failure and triggered the ndisc_error_report() => ip6_link_failure() => icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_ADDR_UNREACH, 0) - the client's connect() reports EHOSTUNREACH after receiving the ICMPV6_DEST_UNREACH message. The neigh table of both "ns_src" and "ns_dst" namespace has already been manually populated but not the "ns_fwd" namespace. This patch fixes it by manually populating the neigh table also in the "ns_fwd" namespace. Although the namespace configuration part existed before the tc_redirect_dtime test, the Fixes tag points at the commit that added the tc_redirect_dtime test, since it is the only test hitting it so far. Fixes: c803475f ("bpf: selftests: test skb->tstamp in redirect_neigh") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240120060518.3604920-1-martin.lau@linux.dev
-
Dima Tisnek authored
Past commit ([0]) removed the last vestiges of struct bpf_field_reloc; it's called struct bpf_core_relo now. [0] 28b93c64 ("libbpf: Clean up and improve CO-RE reloc logging") Signed-off-by: Dima Tisnek <dimaqq@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20240121060126.15650-1-dimaqq@gmail.com
-
Andrii Nakryiko authored
Tiezhu Yang says: ==================== Skip callback tests if jit is disabled in test_verifier Thanks very much for the feedback from Eduard, John, Jiri, Daniel, Hou Tao, Song Liu and Andrii. v7: -- Add an explicit flag F_NEEDS_JIT_ENABLED for checking, thanks Andrii. v6: -- Copy insn_is_pseudo_func() into testing_helpers, thanks Andrii. v5: -- Reuse is_ldimm64_insn() and insn_is_pseudo_func(), thanks Song Liu. v4: -- Move the not-allowed-checking into "if (expected_ret ...)" block, thanks Hou Tao. -- Do some small changes to avoid checkpatch warning about "line length exceeds 100 columns". v3: -- Rebase on the latest bpf-next tree. -- Address the review comments by Hou Tao, remove the second argument "0" of open(), check only once whether jit is disabled, check fd_prog, saved_errno and jit_disabled to skip. ==================== Link: https://lore.kernel.org/r/20240123090351.2207-1-yangtiezhu@loongson.cn Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
-
Tiezhu Yang authored
If CONFIG_BPF_JIT_ALWAYS_ON is not set and bpf_jit_enable is 0, there exist 6 failed tests.

  [root@linux bpf]# echo 0 > /proc/sys/net/core/bpf_jit_enable
  [root@linux bpf]# echo 0 > /proc/sys/kernel/unprivileged_bpf_disabled
  [root@linux bpf]# ./test_verifier | grep FAIL
  #106/p inline simple bpf_loop call FAIL
  #107/p don't inline bpf_loop call, flags non-zero FAIL
  #108/p don't inline bpf_loop call, callback non-constant FAIL
  #109/p bpf_loop_inline and a dead func FAIL
  #110/p bpf_loop_inline stack locations for loop vars FAIL
  #111/p inline bpf_loop call in a big program FAIL
  Summary: 768 PASSED, 15 SKIPPED, 6 FAILED

The test log shows that callbacks are not allowed in non-JITed programs, interpreter doesn't support them yet, thus these tests should be skipped if jit is disabled. Add an explicit flag F_NEEDS_JIT_ENABLED to those tests to mark that they require JIT enabled in bpf_loop_inline.c, check the flag and jit_disabled at the beginning of do_test_single() to handle this case.

With this patch:

  [root@linux bpf]# echo 0 > /proc/sys/net/core/bpf_jit_enable
  [root@linux bpf]# echo 0 > /proc/sys/kernel/unprivileged_bpf_disabled
  [root@linux bpf]# ./test_verifier | grep FAIL
  Summary: 768 PASSED, 21 SKIPPED, 0 FAILED

Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240123090351.2207-3-yangtiezhu@loongson.cn
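The skip decision described here boils down to a flag test at the top of the per-test runner. The following is a simplified model, not the actual test_verifier code: the bit value chosen for `F_NEEDS_JIT_ENABLED`, the `demo_test` struct and `demo_should_run()` are assumptions made for illustration.

```c
#include <stdbool.h>

/* Flag name from the commit message; the bit value is an assumption. */
#define F_NEEDS_JIT_ENABLED (1u << 2)

enum demo_result { DEMO_RUN, DEMO_SKIP };

/* Hypothetical, trimmed-down test descriptor. */
struct demo_test {
	const char *descr;
	unsigned int flags;
};

/* Decide up front whether a test can run: tests that rely on callbacks
 * need the JIT, so they are skipped (not failed) when it is disabled. */
static enum demo_result demo_should_run(const struct demo_test *t,
					bool jit_disabled)
{
	if (jit_disabled && (t->flags & F_NEEDS_JIT_ENABLED))
		return DEMO_SKIP;
	return DEMO_RUN;
}
```

With this shape, the six bpf_loop tests move from the FAILED column to the SKIPPED column when the JIT is off, while tests without the flag are unaffected.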
-
Tiezhu Yang authored
Currently, is_jit_enabled() is only used in test_progs; move it into testing_helpers so that it can be used in test_verifier. While at it, remove the second argument "0" of open() as Hou Tao suggested. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Hou Tao <houtao1@huawei.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/bpf/20240123090351.2207-2-yangtiezhu@loongson.cn
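A helper of this kind can be sketched as follows. This is an approximation under stated assumptions rather than the selftests source: the sysctl path is the one shown in the test_verifier transcript elsewhere in this log, `jit_enabled_from()` and `is_jit_enabled_sketch()` are hypothetical names, and taking the path as a parameter is purely so the logic can be exercised against an ordinary file.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Read the first byte of a sysctl-style file; anything other than '0'
 * counts as enabled. An unreadable file is treated as "disabled". */
static bool jit_enabled_from(const char *sysctl_path)
{
	bool enabled = false;
	char c;
	int fd;

	/* O_RDONLY needs no mode argument: nothing is being created */
	fd = open(sysctl_path, O_RDONLY);
	if (fd == -1)
		return false;

	if (read(fd, &c, sizeof(c)) == 1)
		enabled = (c != '0');

	close(fd);
	return enabled;
}

/* Convenience wrapper using the path seen in the transcript. */
static bool is_jit_enabled_sketch(void)
{
	return jit_enabled_from("/proc/sys/net/core/bpf_jit_enable");
}
```

Because open() is called without O_CREAT, the mode argument is unnecessary, which is the cleanup the commit message mentions.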
-
Martin KaFai Lau authored
Kui-Feng Lee says: ==================== Given the current constraints of the current implementation, struct_ops cannot be registered dynamically. This presents a significant limitation for modules like coming fuse-bpf, which seeks to implement a new struct_ops type. To address this issue, a new API is introduced that allows the registration of new struct_ops types from modules. Previously, struct_ops types were defined in bpf_struct_ops_types.h and collected as a static array. The new API lets callers add new struct_ops types dynamically. The static array has been removed and replaced by the per-btf struct_ops_tab. The struct_ops subsystem relies on BTF to determine the layout of values in a struct_ops map and identify the subsystem that the struct_ops map registers to. However, the kernel BTF does not include the type information of struct_ops types defined by a module. The struct_ops subsystem requires knowledge of the corresponding module for a given struct_ops map and the utilization of BTF information from that module. We empower libbpf to determine the correct module for accessing the BTF information and pass an identity (FD) of the module btf to the kernel. The kernel looks up type information and registered struct_ops types directly from the given btf. If a module exits while one or more struct_ops maps still refer to a struct_ops type defined by the module, it can lead to unforeseen complications. Therefore, it is crucial to ensure that a module remains intact as long as any struct_ops map is still linked to a struct_ops type defined by the module. To achieve this, every struct_ops map holds a reference to the module while being registered. Changes from v16: - Fix unnecessary bpf_struct_ops_link_create() removing/adding. - Rename REGISTER_BPF_STRUCT_OPS() to register_bpf_struct_ops(). - Implement bpf_map_struct_ops_info_fill() for !CONFIG_BPF_JIT. Changes from v15: - Fix the misleading commit message of part 4. 
- Introduce BPF_F_VTYPE_BTF_OBJ_FD flag to struct bpf_attr to tell if value_type_btf_obj_fd is set or not. - Introduce links_cnt to struct bpf_struct_ops_map to avoid accessing struct bpf_struct_ops_desc in bpf_struct_ops_map_put_progs() after calling module_put() against the owner module of the struct_ops type. (Part 9) Changes from v14: - Rebase. Add cif_stub required by the commit 2cd3e377 ("x86/cfi,bpf: Fix bpf_struct_ops CFI") - Remove creating struct_ops map without bpf_testmod.ko from the test. - Check the name of btf returned by bpf_map_info by getting the name with bpf_btf_get_info_by_fd(). - Change value_type_btf_obj_fd to a signed type to allow the 0 fd. Changes from v13: - Change the test case to use bpf_map_create() to create a struct_ops map while testmod.ko is unloaded. - Move bpf_struct_ops_find*() to btf.c. - Use btf_is_module() to replace btf != btf_vmlinux. Changes from v12: - Rebase to for-next to fix conflicts. Changes from v11: - bpf_struct_ops_maps hold only the refcnt to the module, but not btf. (patch 1) - Fix warning messages. (patch 1, 9 and 10) - Remove unnecessary conditional compiling of CONFIG_BPF_JIT. (patch 4, 9 and 10) - Fix the commit log of the patch 7 to explain how a btf is passed from the user space and how the kernel handles it. - bpf_struct_ops_maps hold the module defining its type, but not btf. A map will hold the module through its life-span from being allocated to being freed. (patch 8) - Change selftests and tracing __bpf_struct_ops_map_free() to wait for the release of the bpf_testmod module. - Include btf_obj_id in bpf_map_info. (patch 14) Changes from v10: - Guard btf.c from CONFIG_BPF_JIT=n. This patchset has introduced symbols from bpf_struct_ops.c which is only built when CONFIG_BPF_JIT=y. - Fix the warning of unused errout_free label by moving code that is leaked to patch 8 to patch 7. Changes from v9: - Remove the call_rcu_tasks_trace() changes from kern_sync_rcu().
- Trace btf_put() in the test case to ensure the release of kmod's btf, or the subsequent tests may fail for using kmod's unloaded old btf instead of the new one created after loading again. The kmod's btf may live for a while after unloading the kmod, because a map being freed asynchronously is still holding the btf. - Split "add struct_ops_tab to btf" into two patches by adding "make struct_ops_map support btfs other than btf_vmlinux". - Flip the order of "pass attached BTF to the bpf_struct_ops subsystem" and "hold module for bpf_struct_ops_map" to make it more reasonable. - Fix the compile errors of a missing header file. Changes from v8: - Rename bpf_struct_ops_init_one() to bpf_struct_ops_desc_init(). - Move code that uses BTF_ID_LIST to the newly added patch 2. - Move code that looks up struct_ops types from a given module to the newly added patch 5. - Store the pointers of btf at st_maps. - Add test cases for the cases of modules being unloaded. - Call bpf_struct_ops_init() in btf_add_struct_ops() to fix an inconsistency issue. Changes from v7: - Fix check_struct_ops_btf_id() to use attach btf if there is one instead of btf_vmlinux. Changes from v6: - Change returned error code to -EINVAL for the case of bpf_try_get_module(). - Return an error code from bpf_struct_ops_init(). - Fix the dependency issue of testing_helpers.c and rcu_tasks_trace_gp.skel.h. Changes from v5: - As the 2nd patch, we introduce "bpf_struct_ops_desc". This change involves moving certain members of "bpf_struct_ops" to "bpf_struct_ops_desc", which becomes a part of "btf_struct_ops_tab". This ensures that these members remain accessible even when the owner module of a "bpf_struct_ops" is unloaded. - Correct the order of arguments when calling in the 3rd patch. - Remove the owner argument from bpf_struct_ops_init_one(). Instead, callers should fill in st_ops->owner. - Make sure to hold the owner module when calling bpf_struct_ops_find() and bpf_struct_ops_find_value() in the 6th patch.
- Merge the functions register_bpf_struct_ops_btf() and register_bpf_struct_ops() into a single function and relocate it to btf.c for better organization and clarity. - Undo the name modifications made to find_kernel_btf_id() and find_ksym_btf_id() in the 8th patch. Changes from v4: - Fix the dependency between testing_helpers.o and rcu_tasks_trace_gp.skel.h. Changes from v3: - Fix according to the feedback for v3. - Change the order of arguments to make btf the first argument. - Use btf_try_get_module() instead of try_get_module() since the module pointed to by st_ops->owner can be gone while someone is still holding its btf. - Move variables defined by BPF_STRUCT_OPS_COMMON_VALUE to struct bpf_struct_ops_common_value to make validation easier. - Register the struct_ops type defined by bpf_testmod in its init function. - Rename field name to 'value_type_btf_obj_fd' to make it explicit. - Fix leaking of btf objects on error. - st_maps hold their modules to keep modules alive and prevent them from unloading. - bpf_map of libbpf keeps mod_btf_fd instead of a pointer to module_btf. - Do call_rcu_tasks_trace() in kern_sync_rcu() to ensure the bpf_testmod is unloaded properly. It uses rcu_tasks_trace_gp to trigger call_rcu_tasks_trace() in the kernel. - Merge and reorder patches in a reasonable order. Changes from v2: - Remove struct_ops array, and add a per-btf (module) struct_ops_tab to collect registered struct_ops types. - Validate value_type by checking member names and types.
--- v16: https://lore.kernel.org/all/20240118014930.1992551-1-thinker.li@gmail.com/ v15: https://lore.kernel.org/all/20231220222654.1435895-1-thinker.li@gmail.com/ v14: https://lore.kernel.org/all/20231217081132.1025020-1-thinker.li@gmail.com/ v13: https://lore.kernel.org/all/20231209002709.535966-1-thinker.li@gmail.com/ v12: https://lore.kernel.org/all/20231207013950.1689269-1-thinker.li@gmail.com/ v11: https://lore.kernel.org/all/20231106201252.1568931-1-thinker.li@gmail.com/ v10: https://lore.kernel.org/all/20231103232202.3664407-1-thinker.li@gmail.com/ v9: https://lore.kernel.org/all/20231101204519.677870-1-thinker.li@gmail.com/ v8: https://lore.kernel.org/all/20231030192810.382942-1-thinker.li@gmail.com/ v7: https://lore.kernel.org/all/20231027211702.1374597-1-thinker.li@gmail.com/ v6: https://lore.kernel.org/all/20231022050335.2579051-11-thinker.li@gmail.com/ v5: https://lore.kernel.org/all/20231017162306.176586-1-thinker.li@gmail.com/ v4: https://lore.kernel.org/all/20231013224304.187218-1-thinker.li@gmail.com/ v3: https://lore.kernel.org/all/20230920155923.151136-1-thinker.li@gmail.com/ v2: https://lore.kernel.org/all/20230913061449.1918219-1-thinker.li@gmail.com/ ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Kui-Feng Lee authored
Create a new struct_ops type called bpf_testmod_ops within the bpf_testmod module. When a struct_ops object is registered, the bpf_testmod module will invoke test_2 from the module. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-15-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Kui-Feng Lee authored
The module requires the use of btf_ctx_access() to invoke bpf_tracing_btf_ctx_access() from a module. This function is valuable for implementing validation functions that ensure proper access to ctx. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-14-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Kui-Feng Lee authored
Locate the module BTFs for struct_ops maps and progs and pass them to the kernel. This ensures that the kernel correctly resolves type IDs from the appropriate module BTFs. For the map of a struct_ops object, the FD of the module BTF is set to bpf_map to keep a reference to the module BTF. The FD is passed to the kernel as value_type_btf_obj_fd when the struct_ops object is loaded. For a bpf_struct_ops prog, attach_btf_obj_fd of bpf_prog is the FD of a module BTF in the kernel. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240119225005.668602-13-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Kui-Feng Lee authored
Replace the static list of struct_ops types with per-btf struct_ops_tab to enable dynamic registration. Both bpf_dummy_ops and bpf_tcp_ca now utilize the registration function instead of being listed in bpf_struct_ops_types.h. Cc: netdev@vger.kernel.org Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-12-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Kui-Feng Lee authored
A value_type should consist of three components: refcnt, state, and data. refcnt and state have been moved to struct bpf_struct_ops_common_value to make it easier to check the value type. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-11-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Kui-Feng Lee authored
Ensure that a module remains accessible whenever a struct_ops object of a struct_ops type provided by the module is still in use. struct bpf_struct_ops_map doesn't hold a refcnt to btf anymore since a module will hold a refcnt to its btf already. But struct_ops programs are different. They hold their associated btf, not the module, since they need only btf to assure their types (signatures). However, the verifier holds the refcnt of the associated module of a struct_ops type temporarily when verifying a struct_ops prog. The verifier needs the help of the verifier operators (struct bpf_verifier_ops) provided by the owner module to verify data access of a prog, provide information, and generate code. This patch also adds a count of links (links_cnt) to bpf_struct_ops_map. It prevents bpf_struct_ops_map_put_progs() from accessing btf after calling module_put() in bpf_struct_ops_map_free(). Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-10-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Kui-Feng Lee authored
Pass the fd of a btf from the userspace to the bpf() syscall, and then convert the fd into a btf. The btf is generated from the module that defines the target BPF struct_ops type. In order to inform the kernel about the module that defines the target struct_ops type, the userspace program needs to provide a btf fd for the respective module's btf. This btf contains essential information on the types defined within the module, including the target struct_ops type. A btf fd must be provided to the kernel for struct_ops maps and for the bpf programs attached to those maps. In the case of the bpf programs, the attach_btf_obj_fd parameter is passed as part of the bpf_attr and is converted into a btf. This btf is then stored in the prog->aux->attach_btf field. Here, it just lets the verifier access attach_btf directly. In the case of struct_ops maps, a btf fd is passed as value_type_btf_obj_fd of bpf_attr. The bpf_struct_ops_map_alloc() function converts the fd to a btf and stores it as st_map->btf. A flag BPF_F_VTYPE_BTF_OBJ_FD is added for map_flags to indicate that the value of value_type_btf_obj_fd is set. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-9-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Kui-Feng Lee authored
This is a preparation for searching for struct_ops types from a specified module. BTF is always btf_vmlinux now. This patch passes a pointer of BTF to bpf_struct_ops_find_value() and bpf_struct_ops_find(). Once the new registration API of struct_ops types is used, other BTFs besides btf_vmlinux can also be passed to them. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-8-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Kui-Feng Lee authored
Include btf object id (btf_obj_id) in bpf_map_info so that tools (e.g., bpftool struct_ops dump) know the correct btf from the kernel to look up type information of struct_ops types. Since struct_ops types can be defined and registered in a module, the type information of a struct_ops type is defined in the btf of the module defining it. The userspace tools need to know which btf is for the module defining a struct_ops type. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-7-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Kui-Feng Lee authored
Once new struct_ops can be registered from modules, btf_vmlinux is no longer the only btf that struct_ops_map would face. st_map should remember what btf it should use to get type information. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-6-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Kui-Feng Lee authored
Maintain a registry of registered struct_ops types in the per-btf (module) struct_ops_tab. This registry allows for easy lookup of struct_ops types that are registered by a specific module. It is preparation work for supporting kernel module struct_ops in a later patch. Each struct_ops will be registered under its own kernel module btf and will be stored in the newly added btf->struct_ops_tab. The bpf verifier and bpf syscall (e.g. prog and map cmd) can find the struct_ops and its btf type/size/id... information from btf->struct_ops_tab. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-5-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Kui-Feng Lee authored
Move some members of bpf_struct_ops to bpf_struct_ops_desc. type_id is no longer available in bpf_struct_ops. Modules should get it from the btf received by kmod's init function. Cc: netdev@vger.kernel.org Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-4-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Kui-Feng Lee authored
Get ready to remove bpf_struct_ops_init() in the future. By using BTF_ID_LIST, it is possible to gather type information at build time instead of at runtime. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-3-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Kui-Feng Lee authored
Move the majority of the code to bpf_struct_ops_init_one(), which can then be utilized for the initialization of newly registered dynamically allocated struct_ops types in the following patches. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-2-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Alexei Starovoitov authored
Jiri Olsa says:

====================
bpf: Add cookies retrieval for perf/kprobe multi links

hi,
this patchset adds support for retrieving cookies from existing tracing links that did not yet support it, plus changes to bpftool to display them. It's a leftover from what we discussed some time ago [1].

thanks,
jirka

v2 changes:
  - added review/ack tags
  - fixed memory leak [Quentin]
  - align the uapi fields properly [Yafang Shao]

[1] https://lore.kernel.org/bpf/CALOAHbAZ6=A9j3VFCLoAC_WhgQKU7injMf06=cM2sU4Hi4Sx+Q@mail.gmail.com/

Reviewed-by: Quentin Monnet <quentin@isovalent.com>
---
====================

Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/r/20240119110505.400573-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Jiri Olsa authored
Displaying cookies for kprobe multi link, in plain mode:

  # bpftool link
  ...
  1397: kprobe_multi  prog 47532
          kretprobe.multi  func_cnt 3
          addr             cookie           func [module]
          ffffffff82b370c0                3 bpf_fentry_test1
          ffffffff82b39780                1 bpf_fentry_test2
          ffffffff82b397a0                2 bpf_fentry_test3

And in json mode:

  # bpftool link -j | jq
  ...
    {
      "id": 1397,
      "type": "kprobe_multi",
      "prog_id": 47532,
      "retprobe": true,
      "func_cnt": 3,
      "missed": 0,
      "funcs": [
        {
          "addr": 18446744071607382208,
          "func": "bpf_fentry_test1",
          "module": null,
          "cookie": 3
        },
        {
          "addr": 18446744071607392128,
          "func": "bpf_fentry_test2",
          "module": null,
          "cookie": 1
        },
        {
          "addr": 18446744071607392160,
          "func": "bpf_fentry_test3",
          "module": null,
          "cookie": 2
        }
      ]
    }

A cookie is attached to a specific address, and because we sort addresses before printing, we need to sort cookies the same way, hence adding the struct addr_cookie to keep and sort them together.

Also adding the missing dd.sym_count check to show_kprobe_multi_json.

Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240119110505.400573-9-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
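The idea of keeping each address and its cookie together while sorting can be sketched as follows. This is a minimal illustration under stated assumptions, not bpftool's actual code: the field layout of struct addr_cookie, the helper names cmp_addr_cookie() and sort_addrs_with_cookies(), and the use of plain uint64_t instead of kernel types are all made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pairing of a probe address with its cookie. */
struct addr_cookie {
	uint64_t addr;
	uint64_t cookie;
};

/* Order pairs by address; comparing instead of subtracting avoids overflow. */
static int cmp_addr_cookie(const void *a, const void *b)
{
	const struct addr_cookie *x = a, *y = b;

	return (x->addr > y->addr) - (x->addr < y->addr);
}

/*
 * Copy two parallel arrays into one array of pairs and sort it by address,
 * so every cookie stays next to the address it belongs to.
 * cookies may be NULL, in which case all cookies are reported as 0.
 * Returns a malloc'ed array the caller must free(), or NULL on failure.
 */
static struct addr_cookie *sort_addrs_with_cookies(const uint64_t *addrs,
						   const uint64_t *cookies,
						   size_t cnt)
{
	struct addr_cookie *data;
	size_t i;

	assert(addrs != NULL);
	if (cnt == 0)
		return NULL;

	data = calloc(cnt, sizeof(*data));
	if (!data)
		return NULL;

	for (i = 0; i < cnt; i++) {
		data[i].addr = addrs[i];
		data[i].cookie = cookies ? cookies[i] : 0;
	}
	qsort(data, cnt, sizeof(*data), cmp_addr_cookie);
	return data;
}
```

Sorting the two arrays independently would break the association between an address and its cookie; sorting a single array of pairs keeps it intact by construction.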
-
Jiri Olsa authored
Displaying cookie for perf event link probes, in plain mode:

  # bpftool link
  17: perf_event  prog 90
          kprobe ffffffff82b1c2b0 bpf_fentry_test1  cookie 3735928559
  18: perf_event  prog 90
          kretprobe ffffffff82b1c2b0 bpf_fentry_test1  cookie 3735928559
  20: perf_event  prog 92
          tracepoint sched_switch  cookie 3735928559
  21: perf_event  prog 93
          event software:page-faults  cookie 3735928559
  22: perf_event  prog 91
          uprobe /proc/self/exe+0xd703c  cookie 3735928559

And in json mode:

  # bpftool link -j | jq

  {
    "id": 30,
    "type": "perf_event",
    "prog_id": 160,
    "retprobe": false,
    "addr": 18446744071607272112,
    "func": "bpf_fentry_test1",
    "offset": 0,
    "missed": 0,
    "cookie": 3735928559
  }

  {
    "id": 33,
    "type": "perf_event",
    "prog_id": 162,
    "tracepoint": "sched_switch",
    "cookie": 3735928559
  }

  {
    "id": 34,
    "type": "perf_event",
    "prog_id": 163,
    "event_type": "software",
    "event_config": "page-faults",
    "cookie": 3735928559
  }

  {
    "id": 35,
    "type": "perf_event",
    "prog_id": 161,
    "retprobe": false,
    "file": "/proc/self/exe",
    "offset": 880700,
    "cookie": 3735928559
  }

Reviewed-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240119110505.400573-8-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Jiri Olsa authored
Adding fill_link_info test for perf event and testing we get its values back through the bpf_link_info interface. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240119110505.400573-7-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Jiri Olsa authored
Now that we get cookies for perf_event probes, adding cookie tests for kprobe/uprobe/tracepoint. The perf_event test needs to be added from scratch and is coming in the following change. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240119110505.400573-6-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Jiri Olsa authored
Adding cookies check for kprobe_multi fill_link_info test, plus tests for invalid values related to cookies. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240119110505.400573-5-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Jiri Olsa authored
The error path frees the wrong array; it should be ref_ctr_offsets. Acked-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Fixes: a7795698 ("bpftool: Add support to display uprobe_multi links") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240119110505.400573-4-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Jiri Olsa authored
Storing cookies in kprobe_multi bpf_link_info data. The cookies field is optional and if provided it needs to be an array of __u64 with kprobe_multi.count length. Acked-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240119110505.400573-3-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Jiri Olsa authored
At the moment we don't store the cookie for perf_event probes, while we do that for the rest of the probes.

Adding cookie fields to struct bpf_link_info perf event probe records:

  perf_event.uprobe
  perf_event.kprobe
  perf_event.tracepoint
  perf_event.perf_event

And the code to store that in the bpf_link_info struct.

Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <song@kernel.org> Acked-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20240119110505.400573-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 23 Jan, 2024 6 commits
-
-
Jose E. Marchesi authored
Some of the BPF selftests use the "p" constraint in inline assembly snippets, for input operands for MOV (rN = rM) instructions.

This is mainly done via the __imm_ptr macro defined in tools/testing/selftests/bpf/progs/bpf_misc.h:

  #define __imm_ptr(name) [name]"p"(&name)

Example:

  int consume_first_item_only(void *ctx)
  {
          struct bpf_iter_num iter;
          asm volatile (
                  /* create iterator */
                  "r1 = %[iter];"
                  [...]
                  :
                  : __imm_ptr(iter)
                  : CLOBBERS);
          [...]
  }

The "p" constraint is a tricky one. It is documented in the GCC manual section "Simple Constraints":

  An operand that is a valid memory address is allowed. This is for ``load address'' and ``push address'' instructions.

  p in the constraint must be accompanied by address_operand as the predicate in the match_operand. This predicate interprets the mode specified in the match_operand as the mode of the memory reference for which the address would be valid.

There are two problems:

1. It is questionable whether that constraint was ever intended to be used in inline assembly templates, because its behavior really depends on compiler internals. A "memory address" is not the same as a "memory operand" or a "memory reference" (constraint "m"), and in fact its usage in the template above results in an error in both x86_64-linux-gnu and bpf-unknown-none:

     foo.c: In function ‘bar’:
     foo.c:6:3: error: invalid 'asm': invalid expression as operand
         6 |   asm volatile ("r1 = %[jorl]" : : [jorl]"p"(&jorl));
           |   ^~~

   I would assume the same happens with aarch64, riscv, and most/all other targets in GCC that do not accept operands of the form A + B that are not wrapped either in a const or in a memory reference.

   To avoid that error, the usage of the "p" constraint in internal GCC instruction templates is supposed to be complemented by the 'a' modifier, like in:

     asm volatile ("r1 = %a[jorl]" : : [jorl]"p"(&jorl));

   Internally documented (in GCC's final.cc) as:

     %aN means expect operand N to be a memory address (not a memory reference!) and print a reference to that address.

   That works because when the modifier 'a' is found, GCC prints an "operand address", which is not the same as an "operand".

   But...

2. Even if we used the internal 'a' modifier (we shouldn't), the 'rN = rM' instruction really requires a register argument. In cases involving automatics, like in the examples above, we easily end up with:

     bar:
        #APP
            r1 = r10-4
        #NO_APP

   In other cases we could conceivably also end up with a 64-bit label that may overflow the 32-bit immediate operand of `rN = imm32' instructions:

        r1 = foo

   All of which is clearly wrong.

clang happens to do "the right thing" in the current usage of __imm_ptr in the BPF tests, because even with -O2 it seems to "reload" the fp-relative address of the automatic to a register like in:

  bar:
        r1 = r10
        r1 += -4
        #APP
        r1 = r1
        #NO_APP

Which is what GCC would generate with -O0. Whether this is by chance or by design, the compiler shouldn't be expected to do that reload driven by the "p" constraint.

This patch changes the usage of the "p" constraint in the BPF selftests macros to use the "r" constraint instead. If a register is what is required, we should let the compiler know.

Previous discussion in bpf@vger: https://lore.kernel.org/bpf/87h6p5ebpb.fsf@oracle.com/T/#ef0df83d6975c34dff20bf0dd52e078f5b8ca2767

Tested in bpf-next master. No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Cc: Yonghong Song <yonghong.song@linux.dev> Cc: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240123181309.19853-1-jose.marchesi@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
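The effect of asking for a register explicitly can be illustrated on a host compiler, without any BPF assembly. The sketch below is not taken from the selftests: the function name read_through_register() is hypothetical, and the empty asm template with a matching constraint ("0") is just a portable GCC/clang idiom chosen so the example builds on any target. It shows that, with a register constraint, the compiler itself materializes the address of an automatic variable in a register before the asm statement, which is the behavior the "r" constraint requests.

```c
/*
 * Pass the address of an automatic variable through a register operand.
 * The template is empty, so no instruction is emitted; the "0" matching
 * constraint forces &value into the same register as the "=r" output.
 * The "memory" clobber tells the compiler the asm may access value.
 */
static int read_through_register(void)
{
	int value = 42;
	int *seen;

	__asm__ volatile ("" : "=r"(seen) : "0"(&value) : "memory");

	/* seen now holds the address the compiler loaded into the register. */
	return *seen;
}
```

With a register constraint the compiler is responsible for any reload of a frame-relative address, so the template never sees an expression of the form "base + offset".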
-
Jose E. Marchesi authored
GCC emits a warning: progs/test_tcpbpf_kern.c:60:9: error: ‘op’ is used uninitialized [-Werror=uninitialized] when an uninitialized op is used with a "+r" constraint. The + modifier means a read-write operand, but that operand in the selftest is only written to. This patch changes the selftest to use a "=r" constraint. This pacifies GCC. Tested in bpf-next master. No regressions. Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Cc: Yonghong Song <yhs@meta.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: david.faust@oracle.com Cc: cupertino.miranda@oracle.com Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240123205624.14746-1-jose.marchesi@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Jose E. Marchesi authored
VLAs are not supported by the BPF port of either clang or GCC. The selftest test_xdp_dynptr.c contains the following code:

  const size_t tcphdr_sz = sizeof(struct tcphdr);
  const size_t udphdr_sz = sizeof(struct udphdr);
  const size_t ethhdr_sz = sizeof(struct ethhdr);
  const size_t iphdr_sz = sizeof(struct iphdr);
  const size_t ipv6hdr_sz = sizeof(struct ipv6hdr);

  [...]

  static __always_inline int handle_ipv4(struct xdp_md *xdp, struct bpf_dynptr *xdp_ptr)
  {
          __u8 eth_buffer[ethhdr_sz + iphdr_sz + ethhdr_sz];
          __u8 iph_buffer_tcp[iphdr_sz + tcphdr_sz];
          __u8 iph_buffer_udp[iphdr_sz + udphdr_sz];
          [...]
  }

The eth_buffer, iph_buffer_tcp and other automatics are fixed size only if the compiler optimizes away the constant global variables. clang does this, but GCC does not, turning these automatics into variable length arrays.

This patch removes the global variables and turns these values into preprocessor constants. This makes the selftest build properly with GCC.

Tested in bpf-next master. No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Cc: Yonghong Song <yhs@meta.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: david.faust@oracle.com Cc: cupertino.miranda@oracle.com Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240123201729.16173-1-jose.marchesi@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
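The preprocessor-constant approach can be sketched in plain C as follows. This is an illustration only, not the selftest's code: the header structs, the macro names ETH_HDR_SZ and IP_HDR_SZ, and the function fill_header_buffer() are hypothetical stand-ins. Because the macros expand to sizeof expressions, the array bound is an integer constant expression and the array is an ordinary fixed-size array, which the C11 _Static_assert checks at compile time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for real packet headers (byte arrays, no padding). */
struct eth_hdr_example { uint8_t dst[6]; uint8_t src[6]; uint8_t proto[2]; };
struct ip_hdr_example  { uint8_t bytes[20]; };

/*
 * Preprocessor constants: each use expands to a sizeof expression, which is
 * an integer constant expression. A "const size_t" global is not one in C.
 */
#define ETH_HDR_SZ sizeof(struct eth_hdr_example)
#define IP_HDR_SZ  sizeof(struct ip_hdr_example)

static size_t fill_header_buffer(uint8_t fill)
{
	/* Fixed-size array: the bound is known at compile time. */
	uint8_t buf[ETH_HDR_SZ + IP_HDR_SZ];

	/* Would not compile if buf were a VLA, since sizeof(buf) would then
	 * not be a constant expression. */
	_Static_assert(sizeof(buf) == 34, "buf must have a compile-time size");

	memset(buf, fill, sizeof(buf));
	return buf[sizeof(buf) - 1] == fill ? sizeof(buf) : 0;
}
```

The same function written with const size_t globals as array bounds would formally declare a variable length array in C, regardless of whether a given compiler happens to fold the value.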
-
Andrii Nakryiko authored
We've run into issues with using the dup2() API in a production setting, where libbpf is linked into a large production environment and ends up calling unintended custom implementations of dup2(). These custom implementations don't provide the atomic FD replacement guarantees of the dup2() syscall, leading to subtle and hard-to-debug issues. To prevent this in the future and guarantee that no libc implementation will substitute its own custom non-atomic dup2() implementation, call the dup2() syscall directly with syscall(SYS_dup2). Note that some architectures don't seem to provide dup2 and have dup3 instead. Try to detect and pick the best syscall. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <song@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240119210201.1295511-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
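A direct syscall wrapper of the kind described above might look like the sketch below. It is an assumption-laden illustration for Linux, not libbpf's actual helper: the name raw_dup2() is hypothetical, and the detection is done with a simple preprocessor check on SYS_dup2.

```c
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: invoke the kernel's dup2 directly, bypassing any
 * user-space dup2() override. On architectures without a dup2 syscall,
 * fall back to dup3 with flags == 0. Unlike dup2, dup3 fails with EINVAL
 * when oldfd == newfd, so callers should not rely on that case.
 */
static int raw_dup2(int oldfd, int newfd)
{
#ifdef SYS_dup2
	return (int)syscall(SYS_dup2, oldfd, newfd);
#else
	return (int)syscall(SYS_dup3, oldfd, newfd, 0);
#endif
}
```

Either syscall closes and replaces newfd as a single atomic step inside the kernel, which is the guarantee the custom user-space implementations lacked.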
-
Alexei Starovoitov authored
Hou Tao says:

====================
Enable the inline of kptr_xchg for arm64

From: Hou Tao <houtao1@huawei.com>

Hi,

The patch set is just a follow-up for "bpf: inline bpf_kptr_xchg()". It enables the inline of bpf_kptr_xchg() and kptr_xchg_inline test for arm64.

Please see individual patches for more details. And comments are always welcome.
====================

Link: https://lore.kernel.org/r/20240119102529.99581-1-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Hou Tao authored
Now that the arm64 bpf jit has enabled bpf_jit_supports_ptr_xchg(), enable the test for arm64 as well. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20240119102529.99581-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-