- 26 Jan, 2024 2 commits
-
-
Kui-Feng Lee authored
In bpf_struct_ops_map_alloc, it needs to check for NULL in the returned pointer of bpf_get_btf_vmlinux() when CONFIG_DEBUG_INFO_BTF is not set. ENOTSUPP is used to preserve the same behavior before the struct_ops kmod support. In the function check_struct_ops_btf_id(), instead of redoing the bpf_get_btf_vmlinux() that has already been done in syscall.c, the fix here is to check for prog->aux->attach_btf_id. BPF_PROG_TYPE_STRUCT_OPS must require attach_btf_id and syscall.c guarantees a valid attach_btf as long as attach_btf_id is set. When attach_btf_id is not set, this patch returns -ENOTSUPP because it is what the selftest in test_libbpf_probe_prog_types() and libbpf_probes.c are expecting for feature probing purpose. Changes from v1: - Remove an unnecessary NULL check in check_struct_ops_btf_id() Reported-by: syzbot+88f0aafe5f950d7489d7@syzkaller.appspotmail.com Closes: https://lore.kernel.org/bpf/00000000000040d68a060fc8db8c@google.com/ Reported-by: syzbot+1336f3d4b10bcda75b89@syzkaller.appspotmail.com Closes: https://lore.kernel.org/bpf/00000000000026353b060fc21c07@google.com/ Fixes: fcc2c1fb ("bpf: pass attached BTF to the bpf_struct_ops subsystem") Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240126023113.1379504-1-thinker.li@gmail.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Eduard Zingerman authored
I've been working on BPF verifier, BPF selftests and, to some extent, libbpf, for some time. As suggested by Andrii and Alexei, I humbly ask to add me to maintainers list: - As reviewer for BPF [GENERAL] - As maintainer for BPF [LIBRARY] - As maintainer for BPF [SELFTESTS] This patch adds dedicated entries to MAINTAINERS. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240126032554.9697-1-eddyz87@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 25 Jan, 2024 32 commits
-
-
Andrii Nakryiko authored
Andrii Nakryiko says: ==================== BPF token This patch set is a combination of three BPF token-related patch sets ([0], [1], [2]) with fixes ([3]) to kernel-side token_fd passing APIs incorporated into relevant patches, bpf_token_capable() changes requested by Christian Brauner, and necessary libbpf and BPF selftests side adjustments. This patch set introduces an ability to delegate a subset of BPF subsystem functionality from privileged system-wide daemon (e.g., systemd or any other container manager) through special mount options for userns-bound BPF FS to a *trusted* unprivileged application. Trust is the key here. This functionality is not about allowing unconditional unprivileged BPF usage. Establishing trust, though, is completely up to the discretion of respective privileged application that would create and mount a BPF FS instance with delegation enabled, as different production setups can and do achieve it through a combination of different means (signing, LSM, code reviews, etc), and it's undesirable and infeasible for kernel to enforce any particular way of validating trustworthiness of particular process. The main motivation for this work is a desire to enable containerized BPF applications to be used together with user namespaces. This is currently impossible, as CAP_BPF, required for BPF subsystem usage, cannot be namespaced or sandboxed, as a general rule. E.g., tracing BPF programs, thanks to BPF helpers like bpf_probe_read_kernel() and bpf_probe_read_user() can safely read arbitrary memory, and it's impossible to ensure that they only read memory of processes belonging to any given namespace. This means that it's impossible to have a mechanically verifiable namespace-aware CAP_BPF capability, and as such another mechanism to allow safe usage of BPF functionality is necessary. BPF FS delegation mount options and BPF token derived from such BPF FS instance is such a mechanism. Kernel makes no assumption about what "trusted" constitutes in any particular case, and it's up to specific privileged applications and their surrounding infrastructure to decide that. What kernel provides is a set of APIs to setup and mount special BPF FS instance and derive BPF tokens from it. BPF FS and BPF token are both bound to its owning userns and in such a way are constrained inside intended container. Users can then pass BPF token FD to privileged bpf() syscall commands, like BPF map creation and BPF program loading, to perform such operations without having init userns privileges. This version incorporates feedback and suggestions ([4]) received on earlier iterations of BPF token approach, and instead of allowing to create BPF tokens directly assuming capable(CAP_SYS_ADMIN), we instead enhance BPF FS to accept a few new delegation mount options. If these options are used and BPF FS itself is properly created, set up, and mounted inside the user namespaced container, user application is able to derive a BPF token object from BPF FS instance, and pass that token to bpf() syscall. As explained in patch #3, BPF token itself doesn't grant access to BPF functionality, but instead allows kernel to do namespaced capabilities checks (ns_capable() vs capable()) for CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, and CAP_SYS_ADMIN, as applicable. So it forms one half of a puzzle and allows container managers and sys admins to have safe and flexible configuration options: determining which containers get delegation of BPF functionality through BPF FS, and then which applications within such containers are allowed to perform bpf() commands, based on namespaces capabilities. Previous attempt at addressing this very same problem ([5]) attempted to utilize authoritative LSM approach, but was conclusively rejected by upstream LSM maintainers. BPF token concept is not changing anything about LSM approach, but can be combined with LSM hooks for very fine-grained security policy. Some ideas about making BPF token more convenient to use with LSM (in particular custom BPF LSM programs) was briefly described in recent LSF/MM/BPF 2023 presentation ([6]). E.g., an ability to specify user-provided data (context), which in combination with BPF LSM would allow implementing a very dynamic and fine-granular custom security policies on top of BPF token. In the interest of minimizing API surface area and discussions this was relegated to follow up patches, as it's not essential to the fundamental concept of delegatable BPF token. It should be noted that BPF token is conceptually quite similar to the idea of /dev/bpf device file, proposed by Song a while ago ([7]). The biggest difference is the idea of using virtual anon_inode file to hold BPF token and allowing multiple independent instances of them, each (potentially) with its own set of restrictions. And also, crucially, BPF token approach is not using any special stateful task-scoped flags. Instead, bpf() syscall accepts token_fd parameters explicitly for each relevant BPF command. This addresses main concerns brought up during the /dev/bpf discussion, and fits better with overall BPF subsystem design. Second part of this patch set adds full support for BPF token in libbpf's BPF object high-level API. Good chunk of the changes rework libbpf feature detection internals, which are the most affected by BPF token presence. Besides internal refactorings, libbpf allows to pass location of BPF FS from which BPF token should be created by libbpf. This can be done explicitly though a new bpf_object_open_opts.bpf_token_path field. But we also add implicit BPF token creation logic to BPF object load step, even without any explicit involvement of the user. If the environment is setup properly, BPF token will be created transparently and used implicitly. This allows for all existing application to gain BPF token support by just linking with latest version of libbpf library. No source code modifications are required. All that under assumption that privileged container management agent properly set up default BPF FS instance at /sys/bpf/fs to allow BPF token creation. libbpf adds support to override default BPF FS location for BPF token creation through LIBBPF_BPF_TOKEN_PATH envvar knowledge. This allows admins or container managers to mount BPF token-enabled BPF FS at non-standard location without the need to coordinate with applications. LIBBPF_BPF_TOKEN_PATH can also be used to disable BPF token implicit creation by setting it to an empty value. [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=805707&state=* [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810260&state=* [2] https://patchwork.kernel.org/project/netdevbpf/list/?series=809800&state=* [3] https://patchwork.kernel.org/project/netdevbpf/patch/20231219053150.336991-1-andrii@kernel.org/ [4] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/ [5] https://lore.kernel.org/bpf/20230412043300.360803-1-andrii@kernel.org/ [6] http://vger.kernel.org/bpfconf2023_material/Trusted_unprivileged_BPF_LSFMM2023.pdf [7] https://lore.kernel.org/bpf/20190627201923.2589391-2-songliubraving@fb.com/ v1->v2: - disable BPF token creation in init userns, and simplify bpf_token_capable() logic (Christian); - use kzalloc/kfree instead of kvzalloc/kvfree (Linus); - few more selftest cases to validate LSM and BPF token interations. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> ==================== Link: https://lore.kernel.org/r/20240124022127.2379740-1-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Add tests for LSM interactions (both bpf_token_capable and bpf_token_cmd LSM hooks) with BPF token in bpf() subsystem. Now child process passes back token FD for parent to be able to do tests with token originating in "wrong" userns. But we also create token in initns and check that token LSMs don't accidentally reject BPF operations when capable() checks pass without BPF token. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-31-andrii@kernel.org
-
Andrii Nakryiko authored
Add new subtest validating LIBBPF_BPF_TOKEN_PATH envvar semantics. Extend existing test to validate that LIBBPF_BPF_TOKEN_PATH allows to disable implicit BPF token creation by setting envvar to empty string. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-30-andrii@kernel.org
-
Andrii Nakryiko authored
To allow external admin authority to override default BPF FS location (/sys/fs/bpf) for implicit BPF token creation, teach libbpf to recognize LIBBPF_BPF_TOKEN_PATH envvar. If it is specified and user application didn't explicitly specify bpf_token_path option, it will be treated exactly like bpf_token_path option, overriding default /sys/fs/bpf location and making BPF token mandatory. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-29-andrii@kernel.org
-
Andrii Nakryiko authored
Add a test to validate libbpf's implicit BPF token creation from default BPF FS location (/sys/fs/bpf). Also validate that disabling this implicit BPF token creation works. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20240124022127.2379740-28-andrii@kernel.org
-
Andrii Nakryiko authored
Add a few tests that attempt to load BPF object containing privileged map, program, and the one requiring mandatory BTF uploading into the kernel (to validate token FD propagation to BPF_BTF_LOAD command). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20240124022127.2379740-27-andrii@kernel.org
-
Andrii Nakryiko authored
Add BPF token support to BPF object-level functionality. BPF token is supported by BPF object logic either as an explicitly provided BPF token from outside (through BPF FS path), or implicitly (unless prevented through bpf_object_open_opts). Implicit mode is assumed to be the most common one for user namespaced unprivileged workloads. The assumption is that privileged container manager sets up default BPF FS mount point at /sys/fs/bpf with BPF token delegation options (delegate_{cmds,maps,progs,attachs} mount options). BPF object during loading will attempt to create BPF token from /sys/fs/bpf location, and pass it for all relevant operations (currently, map creation, BTF load, and program load). In this implicit mode, if BPF token creation fails due to whatever reason (BPF FS is not mounted, or kernel doesn't support BPF token, etc), this is not considered an error. BPF object loading sequence will proceed with no BPF token. In explicit BPF token mode, user provides explicitly custom BPF FS mount point path. In such case, BPF object will attempt to create BPF token from provided BPF FS location. If BPF token creation fails, that is considered a critical error and BPF object load fails with an error. Libbpf provides a way to disable implicit BPF token creation, if it causes any troubles (BPF token is designed to be completely optional and shouldn't cause any problems even if provided, but in the world of BPF LSM, custom security logic can be installed that might change outcome depending on the presence of BPF token). To disable libbpf's default BPF token creation behavior user should provide either invalid BPF token FD (negative), or empty bpf_token_path option. BPF token presence can influence libbpf's feature probing, so if BPF object has associated BPF token, feature probing is instructed to use BPF object-specific feature detection cache and token FD. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-26-andrii@kernel.org
-
Andrii Nakryiko authored
Adjust feature probing callbacks to take into account optional token_fd. In unprivileged contexts, some feature detectors would fail to detect kernel support just because BPF program, BPF map, or BTF object can't be loaded due to privileged nature of those operations. So when BPF object is loaded with BPF token, this token should be used for feature probing. This patch is setting support for this scenario, but we don't yet pass non-zero token FD. This will be added in the next patch. We also switched BPF cookie detector from using kprobe program to tracepoint one, as tracepoint is somewhat less dangerous BPF program type and has higher likelihood of being allowed through BPF token in the future. This change has no effect on detection behavior. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20240124022127.2379740-25-andrii@kernel.org
-
Andrii Nakryiko authored
It's quite a lot of well isolated code, so it seems like a good candidate to move it out of libbpf.c to reduce its size. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20240124022127.2379740-24-andrii@kernel.org
-
Andrii Nakryiko authored
Add feat_supported() helper that accepts feature cache instead of bpf_object. This allows low-level code in bpf.c to not know or care about higher-level concept of bpf_object, yet it will be able to utilize custom feature checking in cases where BPF token might influence the outcome. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20240124022127.2379740-23-andrii@kernel.org
-
Andrii Nakryiko authored
Split a list of supported feature detectors with their corresponding callbacks from actual cached supported/missing values. This will allow to have more flexible per-token or per-object feature detectors in subsequent refactorings. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20240124022127.2379740-22-andrii@kernel.org
-
Andrii Nakryiko authored
Use both hex-based and string-based way to specify delegate mount options for BPF FS. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20240124022127.2379740-21-andrii@kernel.org
-
Andrii Nakryiko authored
Besides already supported special "any" value and hex bit mask, support string-based parsing of delegation masks based on exact enumerator names. Utilize BTF information of `enum bpf_cmd`, `enum bpf_map_type`, `enum bpf_prog_type`, and `enum bpf_attach_type` types to find supported symbolic names (ignoring __MAX_xxx guard values and stripping repetitive prefixes like BPF_ for cmd and attach types, BPF_MAP_TYPE_ for maps, and BPF_PROG_TYPE_ for prog types). The case doesn't matter, but it is normalized to lower case in mount option output. So "PROG_LOAD", "prog_load", and "MAP_create" are all valid values to specify for delegate_cmds options, "array" is among supported for map types, etc. Besides supporting string values, we also support multiple values specified at the same time, using colon (':') separator. There are corresponding changes on bpf_show_options side to use known values to print them in human-readable format, falling back to hex mask printing, if there are any unrecognized bits. This shouldn't be necessary when enum BTF information is present, but in general we should always be able to fall back to this even if kernel was built without BTF. As mentioned, emitted symbolic names are normalized to be all lower case. Example below shows various ways to specify delegate_cmds options through mount command and how mount options are printed back: 12/14 14:39:07.604 vmuser@archvm:~/local/linux/tools/testing/selftests/bpf $ mount | rg token $ sudo mkdir -p /sys/fs/bpf/token $ sudo mount -t bpf bpffs /sys/fs/bpf/token \ -o delegate_cmds=prog_load:MAP_CREATE \ -o delegate_progs=kprobe \ -o delegate_attachs=xdp $ mount | grep token bpffs on /sys/fs/bpf/token type bpf (rw,relatime,delegate_cmds=map_create:prog_load,delegate_progs=kprobe,delegate_attachs=xdp) Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20240124022127.2379740-20-andrii@kernel.org
-
Andrii Nakryiko authored
It's quite confusing in practice when it's possible to successfully create a BPF token from BPF FS that didn't have any of delegate_xxx mount options set up. While it's not wrong, it's actually more meaningful to reject BPF_TOKEN_CREATE with specific error code (-ENOENT) to let user-space know that no token delegation is setup up. So, instead of creating empty BPF token that will be always ignored because it doesn't have any of the allow_xxx bits set, reject it with -ENOENT. If we ever need empty BPF token to be possible, we can support that with extra flag passed into BPF_TOKEN_CREATE. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20240124022127.2379740-19-andrii@kernel.org
-
Andrii Nakryiko authored
Utilize newly added bpf_token_create/bpf_token_free LSM hooks to allocate struct bpf_security_struct for each BPF token object in SELinux. This just follows similar pattern for BPF prog and map. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-18-andrii@kernel.org
-
Andrii Nakryiko authored
Add a selftest that attempts to conceptually replicate intended BPF token use cases inside user namespaced container. Child process is forked. It is then put into its own userns and mountns. Child creates BPF FS context object. This ensures child userns is captured as the owning userns for this instance of BPF FS. Given setting delegation mount options is privileged operation, we ensure that child cannot set them. This context is passed back to privileged parent process through Unix socket, where parent sets up delegation options, creates, and mounts it as a detached mount. This mount FD is passed back to the child to be used for BPF token creation, which allows otherwise privileged BPF operations to succeed inside userns. We validate that all of token-enabled privileged commands (BPF_BTF_LOAD, BPF_MAP_CREATE, and BPF_PROG_LOAD) work as intended. They should only succeed inside the userns if a) BPF token is provided with proper allowed sets of commands and types; and b) namespaces CAP_BPF and other privileges are set. Lacking a) or b) should lead to -EPERM failures. Based on suggested workflow by Christian Brauner ([0]). [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-17-andrii@kernel.org
-
Andrii Nakryiko authored
Wire through token_fd into bpf_prog_load(). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-16-andrii@kernel.org
-
Andrii Nakryiko authored
Allow user to specify token_fd for bpf_btf_load() API that wraps kernel's BPF_BTF_LOAD command. This allows loading BTF from unprivileged process as long as it has BPF token allowing BPF_BTF_LOAD command, which can be created and delegated by privileged process. Wire through new btf_flags as well, so that user can provide BPF_F_TOKEN_FD flag, if necessary. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-15-andrii@kernel.org
-
Andrii Nakryiko authored
Add ability to provide token_fd for BPF_MAP_CREATE command through bpf_map_create() API. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-14-andrii@kernel.org
-
Andrii Nakryiko authored
Add low-level wrapper API for BPF_TOKEN_CREATE command in bpf() syscall. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-13-andrii@kernel.org
-
Andrii Nakryiko authored
Wire up bpf_token_create and bpf_token_free LSM hooks, which allow to allocate LSM security blob (we add `void *security` field to struct bpf_token for that), but also control who can instantiate BPF token. This follows existing pattern for BPF map and BPF prog. Also add security_bpf_token_allow_cmd() and security_bpf_token_capable() LSM hooks that allow LSM implementation to control and negate (if necessary) BPF token's delegation of a specific bpf_cmd and capability, respectively. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/bpf/20240124022127.2379740-12-andrii@kernel.org
-
Andrii Nakryiko authored
Similarly to bpf_prog_alloc LSM hook, rename and extend bpf_map_alloc hook into bpf_map_create, taking not just struct bpf_map, but also bpf_attr and bpf_token, to give a fuller context to LSMs. Unlike bpf_prog_alloc, there is no need to move the hook around, as it currently is firing right before allocating BPF map ID and FD, which seems to be a sweet spot. But like bpf_prog_alloc/bpf_prog_free combo, make sure that bpf_map_free LSM hook is called even if bpf_map_create hook returned error, as if few LSMs are combined together it could be that one LSM successfully allocated security blob for its needs, while subsequent LSM rejected BPF map creation. The former LSM would still need to free up LSM blob, so we need to ensure security_bpf_map_free() is called regardless of the outcome. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/bpf/20240124022127.2379740-11-andrii@kernel.org
-
Andrii Nakryiko authored
Based on upstream discussion ([0]), rework existing bpf_prog_alloc_security LSM hook. Rename it to bpf_prog_load and instead of passing bpf_prog_aux, pass proper bpf_prog pointer for a full BPF program struct. Also, we pass bpf_attr union with all the user-provided arguments for BPF_PROG_LOAD command. This will give LSMs as much information as we can basically provide. The hook is also BPF token-aware now, and optional bpf_token struct is passed as a third argument. bpf_prog_load LSM hook is called after a bunch of sanity checks were performed, bpf_prog and bpf_prog_aux were allocated and filled out, but right before performing full-fledged BPF verification step. bpf_prog_free LSM hook is now accepting struct bpf_prog argument, for consistency. SELinux code is adjusted to all new names, types, and signatures. Note, given that bpf_prog_load (previously bpf_prog_alloc) hook can be used by some LSMs to allocate extra security blob, but also by other LSMs to reject BPF program loading, we need to make sure that bpf_prog_free LSM hook is called after bpf_prog_load/bpf_prog_alloc one *even* if the hook itself returned error. If we don't do that, we run the risk of leaking memory. This seems to be possible today when combining SELinux and BPF LSM, as one example, depending on their relative ordering. Also, for BPF LSM setup, add bpf_prog_load and bpf_prog_free to sleepable LSM hooks list, as they are both executed in sleepable context. Also drop bpf_prog_load hook from untrusted, as there is no issue with refcount or anything else anymore, that originally forced us to add it to untrusted list in c0c852dd ("bpf: Do not mark certain LSM hook arguments as trusted"). We now trigger this hook much later and it should not be an issue anymore. [0] https://lore.kernel.org/bpf/9fe88aef7deabbe87d3fc38c4aea3c69.paul@paul-moore.com/Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/bpf/20240124022127.2379740-10-andrii@kernel.org
-
Andrii Nakryiko authored
Remove remaining direct queries to perfmon_capable() and bpf_capable() in BPF verifier logic and instead use BPF token (if available) to make decisions about privileges. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-9-andrii@kernel.org
-
Andrii Nakryiko authored
Instead of performing unconditional system-wide bpf_capable() and perfmon_capable() calls inside bpf_base_func_proto() function (and other similar ones) to determine eligibility of a given BPF helper for a given program, use previously recorded BPF token during BPF_PROG_LOAD command handling to inform the decision. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-8-andrii@kernel.org
-
Andrii Nakryiko authored
Add basic support of BPF token to BPF_PROG_LOAD. BPF_F_TOKEN_FD flag should be set in prog_flags field when providing prog_token_fd. Wire through a set of allowed BPF program types and attach types, derived from BPF FS at BPF token creation time. Then make sure we perform bpf_token_capable() checks everywhere where it's relevant. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-7-andrii@kernel.org
-
Andrii Nakryiko authored
Accept BPF token FD in BPF_BTF_LOAD command to allow BTF data loading through delegated BPF token. BPF_F_TOKEN_FD flag has to be specified when passing BPF token FD. Given BPF_BTF_LOAD command didn't have flags field before, we also add btf_flags field. BTF loading is a pretty straightforward operation, so as long as BPF token is created with allow_cmds granting BPF_BTF_LOAD command, kernel proceeds to parsing BTF data and creating BTF object. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-6-andrii@kernel.org
-
Andrii Nakryiko authored
Allow providing token_fd for BPF_MAP_CREATE command to allow controlled BPF map creation from unprivileged process through delegated BPF token. New BPF_F_TOKEN_FD flag is added to specify together with BPF token FD for BPF_MAP_CREATE command. Wire through a set of allowed BPF map types to BPF token, derived from BPF FS at BPF token creation time. This, in combination with allowed_cmds allows to create a narrowly-focused BPF token (controlled by privileged agent) with a restrictive set of BPF maps that application can attempt to create. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-5-andrii@kernel.org
-
Andrii Nakryiko authored
Add new kind of BPF kernel object, BPF token. BPF token is meant to allow delegating privileged BPF functionality, like loading a BPF program or creating a BPF map, from privileged process to a *trusted* unprivileged process, all while having a good amount of control over which privileged operations could be performed using provided BPF token. This is achieved through mounting BPF FS instance with extra delegation mount options, which determine what operations are delegatable, and also constraining it to the owning user namespace (as mentioned in the previous patch). BPF token itself is just a derivative from BPF FS and can be created through a new bpf() syscall command, BPF_TOKEN_CREATE, which accepts BPF FS FD, which can be attained through open() API by opening BPF FS mount point. Currently, BPF token "inherits" delegated command, map types, prog type, and attach type bit sets from BPF FS as is. In the future, having an BPF token as a separate object with its own FD, we can allow to further restrict BPF token's allowable set of things either at the creation time or after the fact, allowing the process to guard itself further from unintentionally trying to load undesired kind of BPF programs. But for now we keep things simple and just copy bit sets as is. When BPF token is created from BPF FS mount, we take reference to the BPF super block's owning user namespace, and then use that namespace for checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN} capabilities that are normally only checked against init userns (using capable()), but now we check them using ns_capable() instead (if BPF token is provided). See bpf_token_capable() for details. Such setup means that BPF token in itself is not sufficient to grant BPF functionality. User namespaced process has to *also* have necessary combination of capabilities inside that user namespace. So while previously CAP_BPF was useless when granted within user namespace, now it gains a meaning and allows container managers and sys admins to have a flexible control over which processes can and need to use BPF functionality within the user namespace (i.e., container in practice). And BPF FS delegation mount options and derived BPF tokens serve as a per-container "flag" to grant overall ability to use bpf() (plus further restrict on which parts of bpf() syscalls are treated as namespaced). Note also, BPF_TOKEN_CREATE command itself requires ns_capable(CAP_BPF) within the BPF FS owning user namespace, rounding up the ns_capable() story of BPF token. Also creating BPF token in init user namespace is currently not supported, given BPF token doesn't have any effect in init user namespace anyways. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-4-andrii@kernel.org
-
Andrii Nakryiko authored
Add few new mount options to BPF FS that allow to specify that a given BPF FS instance allows creation of BPF token (added in the next patch), and what sort of operations are allowed under BPF token. As such, we get 4 new mount options, each is a bit mask - `delegate_cmds` allow to specify which bpf() syscall commands are allowed with BPF token derived from this BPF FS instance; - if BPF_MAP_CREATE command is allowed, `delegate_maps` specifies a set of allowable BPF map types that could be created with BPF token; - if BPF_PROG_LOAD command is allowed, `delegate_progs` specifies a set of allowable BPF program types that could be loaded with BPF token; - if BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies a set of allowable BPF program attach types that could be loaded with BPF token; delegate_progs and delegate_attachs are meant to be used together, as full BPF program type is, in general, determined through both program type and program attach type. Currently, these mount options accept the following forms of values: - a special value "any", that enables all possible values of a given bit set; - numeric value (decimal or hexadecimal, determined by kernel automatically) that specifies a bit mask value directly; - all the values for a given mount option are combined, if specified multiple times. E.g., `mount -t bpf nodev /path/to/mount -o delegate_maps=0x1 -o delegate_maps=0x2` will result in a combined 0x3 mask. Ideally, more convenient (for humans) symbolic form derived from corresponding UAPI enums would be accepted (e.g., `-o delegate_progs=kprobe|tracepoint`) and I intend to implement this, but it requires a bunch of UAPI header churn, so I postponed it until this feature lands upstream or at least there is a definite consensus that this feature is acceptable and is going to make it, just to minimize amount of wasted effort and not increase amount of non-essential code to be reviewed. Attentive reader will notice that BPF FS is now marked as FS_USERNS_MOUNT, which theoretically makes it mountable inside non-init user namespace as long as the process has sufficient *namespaced* capabilities within that user namespace. But in reality we still restrict BPF FS to be mountable only by processes with CAP_SYS_ADMIN *in init userns* (extra check in bpf_fill_super()). FS_USERNS_MOUNT is added to allow creating BPF FS context object (i.e., fsopen("bpf")) from inside unprivileged process inside non-init userns, to capture that userns as the owning userns. It will still be required to pass this context object back to privileged process to instantiate and mount it. This manipulation is important, because capturing non-init userns as the owning userns of BPF FS instance (super block) allows to use that userns to constraint BPF token to that userns later on (see next patch). So creating BPF FS with delegation inside unprivileged userns will restrict derived BPF token objects to only "work" inside that intended userns, making it scoped to a intended "container". Also, setting these delegation options requires capable(CAP_SYS_ADMIN), so unprivileged process cannot set this up without involvement of a privileged process. There is a set of selftests at the end of the patch set that simulates this sequence of steps and validates that everything works as intended. But careful review is requested to make sure there are no missed gaps in the implementation and testing. This somewhat subtle set of aspects is the result of previous discussions ([0]) about various user namespace implications and interactions with BPF token functionality and is necessary to contain BPF token inside intended user namespace. [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-3-andrii@kernel.org
-
Andrii Nakryiko authored
Within BPF syscall handling code CAP_NET_ADMIN checks stand out a bit compared to CAP_BPF and CAP_PERFMON checks. For the latter, CAP_BPF or CAP_PERFMON are checked first, but if they are not set, CAP_SYS_ADMIN takes over and grants whatever part of BPF syscall is required. Similar kind of checks that involve CAP_NET_ADMIN are not so consistent. One out of four uses does follow CAP_BPF/CAP_PERFMON model: during BPF_PROG_LOAD, if the type of BPF program is "network-related" either CAP_NET_ADMIN or CAP_SYS_ADMIN is required to proceed. But in three other cases CAP_NET_ADMIN is required even if CAP_SYS_ADMIN is set: - when creating DEVMAP/XDKMAP/CPU_MAP maps; - when attaching CGROUP_SKB programs; - when handling BPF_PROG_QUERY command. This patch is changing the latter three cases to follow BPF_PROG_LOAD model, that is allowing to proceed under either CAP_NET_ADMIN or CAP_SYS_ADMIN. This also makes it cleaner in subsequent BPF token patches to switch wholesomely to a generic bpf_token_capable(int cap) check, that always falls back to CAP_SYS_ADMIN if requested capability is missing. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/bpf/20240124022127.2379740-2-andrii@kernel.org
-
Martin KaFai Lau authored
The commit 9e926acd ("libbpf: Find correct module BTFs for struct_ops maps and progs.") sets a newly added field (value_type_btf_obj_fd) to -1 in libbpf when the caller of the libbpf's bpf_map_create did not define this field by passing a NULL "opts" or passing in a "opts" that does not cover this new field. OPT_HAS(opts, field) is used to decide if the field is defined or not: ((opts) && opts->sz >= offsetofend(typeof(*(opts)), field)) Once OPTS_HAS decided the field is not defined, that field should be set to 0. For this particular new field (value_type_btf_obj_fd), its corresponding map_flags "BPF_F_VTYPE_BTF_OBJ_FD" is not set. Thus, the kernel does not treat it as an fd field. Fixes: 9e926acd ("libbpf: Find correct module BTFs for struct_ops maps and progs.") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240124224418.2905133-1-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 24 Jan, 2024 6 commits
-
-
Martin KaFai Lau authored
After the previous patch that speeded up the test (by avoiding neigh discovery in IPv6), the BPF CI occasionally hits this error: rcv tstamp unexpected pkt rcv tstamp: actual 0 == expected 0 The test complains about the cmsg returned from the recvmsg() does not have the rcv timestamp. Setting skb->tstamp or not is controlled by a kernel static key "netstamp_needed_key". The static key is enabled whenever this is at least one sk with the SOCK_TIMESTAMP set. The test_redirect_dtime does use setsockopt() to turn on the SOCK_TIMESTAMP for the reading sk. In the kernel net_enable_timestamp() has a delay to enable the "netstamp_needed_key" when CONFIG_JUMP_LABEL is set. This potential delay is the likely reason for packet missing rcv timestamp occasionally. This patch is to create udp sockets with SOCK_TIMESTAMP set. It sends and receives some packets until the received packet has a rcv timestamp. It currently retries at most 5 times with 1s in between. This should be enough to wait for the "netstamp_needed_key". It then holds on to the socket and only closes it at the end of the test. This guarantees that the test has the "netstamp_needed_key" key turned on from the beginning. To simplify the udp sockets setup, they are sending/receiving packets in the same netns (ns_dst is used) and communicate over the "lo" dev. Hence, the patch enables the "lo" dev in the ns_dst. Fixes: c803475f ("bpf: selftests: test skb->tstamp in redirect_neigh") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240120060518.3604920-2-martin.lau@linux.dev
-
Martin KaFai Lau authored
BPF CI has been reporting the tc_redirect_dtime test failing from time to time: test_inet_dtime:PASS:setns src 0 nsec (network_helpers.c:253: errno: No route to host) Failed to connect to server close_netns:PASS:setns 0 nsec test_inet_dtime:FAIL:connect_to_fd unexpected connect_to_fd: actual -1 < expected 0 test_tcp_clear_dtime:PASS:tcp ip6 clear dtime ingress_fwdns_p100 0 nsec The connect_to_fd failure (EHOSTUNREACH) is from the test_tcp_clear_dtime() test and it is the very first IPv6 traffic after setting up all the links, addresses, and routes. The symptom is this first connect() is always slow. In my setup, it could take ~3s. After some tracing and tcpdump, the slowness is mostly spent in the neighbor solicitation in the "ns_fwd" namespace while the "ns_src" and "ns_dst" are fine. I forced the kernel to drop the neighbor solicitation messages. I can then reproduce EHOSTUNREACH. What actually happen could be: - the neighbor advertisement came back a little slow. - the "ns_fwd" namespace concluded a neighbor discovery failure and triggered the ndisc_error_report() => ip6_link_failure() => icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_ADDR_UNREACH, 0) - the client's connect() reports EHOSTUNREACH after receiving the ICMPV6_DEST_UNREACH message. The neigh table of both "ns_src" and "ns_dst" namespace has already been manually populated but not the "ns_fwd" namespace. This patch fixes it by manually populating the neigh table also in the "ns_fwd" namespace. Although the namespace configuration part had been existed before the tc_redirect_dtime test, still Fixes-tagging the patch when the tc_redirect_dtime test was added since it is the only test hitting it so far. Fixes: c803475f ("bpf: selftests: test skb->tstamp in redirect_neigh") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240120060518.3604920-1-martin.lau@linux.dev
-
Dima Tisnek authored
Past commit ([0]) removed the last vestiges of struct bpf_field_reloc, it's called struct bpf_core_relo now. [0] 28b93c64 ("libbpf: Clean up and improve CO-RE reloc logging") Signed-off-by: Dima Tisnek <dimaqq@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20240121060126.15650-1-dimaqq@gmail.com
-
Andrii Nakryiko authored
Tiezhu Yang says: ==================== Skip callback tests if jit is disabled in test_verifier Thanks very much for the feedbacks from Eduard, John, Jiri, Daniel, Hou Tao, Song Liu and Andrii. v7: -- Add an explicit flag F_NEEDS_JIT_ENABLED for checking, thanks Andrii. v6: -- Copy insn_is_pseudo_func() into testing_helpers, thanks Andrii. v5: -- Reuse is_ldimm64_insn() and insn_is_pseudo_func(), thanks Song Liu. v4: -- Move the not-allowed-checking into "if (expected_ret ...)" block, thanks Hou Tao. -- Do some small changes to avoid checkpatch warning about "line length exceeds 100 columns". v3: -- Rebase on the latest bpf-next tree. -- Address the review comments by Hou Tao, remove the second argument "0" of open(), check only once whether jit is disabled, check fd_prog, saved_errno and jit_disabled to skip. ==================== Link: https://lore.kernel.org/r/20240123090351.2207-1-yangtiezhu@loongson.cnSigned-off-by: Andrii Nakryiko <andrii@kernel.org>
-
Tiezhu Yang authored
If CONFIG_BPF_JIT_ALWAYS_ON is not set and bpf_jit_enable is 0, there exist 6 failed tests. [root@linux bpf]# echo 0 > /proc/sys/net/core/bpf_jit_enable [root@linux bpf]# echo 0 > /proc/sys/kernel/unprivileged_bpf_disabled [root@linux bpf]# ./test_verifier | grep FAIL #106/p inline simple bpf_loop call FAIL #107/p don't inline bpf_loop call, flags non-zero FAIL #108/p don't inline bpf_loop call, callback non-constant FAIL #109/p bpf_loop_inline and a dead func FAIL #110/p bpf_loop_inline stack locations for loop vars FAIL #111/p inline bpf_loop call in a big program FAIL Summary: 768 PASSED, 15 SKIPPED, 6 FAILED The test log shows that callbacks are not allowed in non-JITed programs, interpreter doesn't support them yet, thus these tests should be skipped if jit is disabled. Add an explicit flag F_NEEDS_JIT_ENABLED to those tests to mark that they require JIT enabled in bpf_loop_inline.c, check the flag and jit_disabled at the beginning of do_test_single() to handle this case. With this patch: [root@linux bpf]# echo 0 > /proc/sys/net/core/bpf_jit_enable [root@linux bpf]# echo 0 > /proc/sys/kernel/unprivileged_bpf_disabled [root@linux bpf]# ./test_verifier | grep FAIL Summary: 768 PASSED, 21 SKIPPED, 0 FAILED Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240123090351.2207-3-yangtiezhu@loongson.cn
-
Tiezhu Yang authored
Currently, is_jit_enabled() is only used in test_progs, move it into testing_helpers so that it can be used in test_verifier. While at it, remove the second argument "0" of open() as Hou Tao suggested. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Hou Tao <houtao1@huawei.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/bpf/20240123090351.2207-2-yangtiezhu@loongson.cn
-