1. 25 Jan, 2024 17 commits
    • Andrii Nakryiko's avatar
      selftests/bpf: Add BPF token-enabled tests · fcb9597f
      Andrii Nakryiko authored
      Add a selftest that attempts to conceptually replicate intended BPF
      token use cases inside user namespaced container.
      
      Child process is forked. It is then put into its own userns and mountns.
      Child creates BPF FS context object. This ensures child userns is
      captured as the owning userns for this instance of BPF FS. Given setting
      delegation mount options is privileged operation, we ensure that child
      cannot set them.
      
      This context is passed back to privileged parent process through Unix
      socket, where parent sets up delegation options, creates, and mounts it
      as a detached mount. This mount FD is passed back to the child to be
      used for BPF token creation, which allows otherwise privileged BPF
      operations to succeed inside userns.
      
      We validate that all of token-enabled privileged commands (BPF_BTF_LOAD,
      BPF_MAP_CREATE, and BPF_PROG_LOAD) work as intended. They should only
      succeed inside the userns if a) BPF token is provided with proper
      allowed sets of commands and types; and b) namespaces CAP_BPF and other
      privileges are set. Lacking a) or b) should lead to -EPERM failures.
      
      Based on suggested workflow by Christian Brauner ([0]).
      
        [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20240124022127.2379740-17-andrii@kernel.org
      fcb9597f
    • Andrii Nakryiko's avatar
      404cbc14
    • Andrii Nakryiko's avatar
      libbpf: Add BPF token support to bpf_btf_load() API · a3d63e85
      Andrii Nakryiko authored
      Allow user to specify token_fd for bpf_btf_load() API that wraps
      kernel's BPF_BTF_LOAD command. This allows loading BTF from unprivileged
      process as long as it has BPF token allowing BPF_BTF_LOAD command, which
      can be created and delegated by privileged process.
      
      Wire through new btf_flags as well, so that user can provide
      BPF_F_TOKEN_FD flag, if necessary.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20240124022127.2379740-15-andrii@kernel.org
      a3d63e85
    • Andrii Nakryiko's avatar
      libbpf: Add BPF token support to bpf_map_create() API · 364f8483
      Andrii Nakryiko authored
      Add ability to provide token_fd for BPF_MAP_CREATE command through
      bpf_map_create() API.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20240124022127.2379740-14-andrii@kernel.org
      364f8483
    • Andrii Nakryiko's avatar
      libbpf: Add bpf_token_create() API · 639ecd7d
      Andrii Nakryiko authored
      Add low-level wrapper API for BPF_TOKEN_CREATE command in bpf() syscall.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20240124022127.2379740-13-andrii@kernel.org
      639ecd7d
    • Andrii Nakryiko's avatar
      bpf,lsm: Add BPF token LSM hooks · f568a3d4
      Andrii Nakryiko authored
      Wire up bpf_token_create and bpf_token_free LSM hooks, which allow to
      allocate LSM security blob (we add `void *security` field to struct
      bpf_token for that), but also control who can instantiate BPF token.
      This follows existing pattern for BPF map and BPF prog.
      
      Also add security_bpf_token_allow_cmd() and security_bpf_token_capable()
      LSM hooks that allow LSM implementation to control and negate (if
      necessary) BPF token's delegation of a specific bpf_cmd and capability,
      respectively.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarPaul Moore <paul@paul-moore.com>
      Link: https://lore.kernel.org/bpf/20240124022127.2379740-12-andrii@kernel.org
      f568a3d4
    • Andrii Nakryiko's avatar
      bpf,lsm: Refactor bpf_map_alloc/bpf_map_free LSM hooks · a2431c7e
      Andrii Nakryiko authored
      Similarly to bpf_prog_alloc LSM hook, rename and extend bpf_map_alloc
      hook into bpf_map_create, taking not just struct bpf_map, but also
      bpf_attr and bpf_token, to give a fuller context to LSMs.
      
      Unlike bpf_prog_alloc, there is no need to move the hook around, as it
      currently is firing right before allocating BPF map ID and FD, which
      seems to be a sweet spot.
      
      But like bpf_prog_alloc/bpf_prog_free combo, make sure that bpf_map_free
      LSM hook is called even if bpf_map_create hook returned error, as if few
      LSMs are combined together it could be that one LSM successfully
      allocated security blob for its needs, while subsequent LSM rejected BPF
      map creation. The former LSM would still need to free up LSM blob, so we
      need to ensure security_bpf_map_free() is called regardless of the
      outcome.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarPaul Moore <paul@paul-moore.com>
      Link: https://lore.kernel.org/bpf/20240124022127.2379740-11-andrii@kernel.org
      a2431c7e
    • Andrii Nakryiko's avatar
      bpf,lsm: Refactor bpf_prog_alloc/bpf_prog_free LSM hooks · 1b67772e
      Andrii Nakryiko authored
      Based on upstream discussion ([0]), rework existing
      bpf_prog_alloc_security LSM hook. Rename it to bpf_prog_load and instead
      of passing bpf_prog_aux, pass proper bpf_prog pointer for a full BPF
      program struct. Also, we pass bpf_attr union with all the user-provided
      arguments for BPF_PROG_LOAD command.  This will give LSMs as much
      information as we can basically provide.
      
      The hook is also BPF token-aware now, and optional bpf_token struct is
      passed as a third argument. bpf_prog_load LSM hook is called after
      a bunch of sanity checks were performed, bpf_prog and bpf_prog_aux were
      allocated and filled out, but right before performing full-fledged BPF
      verification step.
      
      bpf_prog_free LSM hook is now accepting struct bpf_prog argument, for
      consistency. SELinux code is adjusted to all new names, types, and
      signatures.
      
      Note, given that bpf_prog_load (previously bpf_prog_alloc) hook can be
      used by some LSMs to allocate extra security blob, but also by other
      LSMs to reject BPF program loading, we need to make sure that
      bpf_prog_free LSM hook is called after bpf_prog_load/bpf_prog_alloc one
      *even* if the hook itself returned error. If we don't do that, we run
      the risk of leaking memory. This seems to be possible today when
      combining SELinux and BPF LSM, as one example, depending on their
      relative ordering.
      
      Also, for BPF LSM setup, add bpf_prog_load and bpf_prog_free to
      sleepable LSM hooks list, as they are both executed in sleepable
      context. Also drop bpf_prog_load hook from untrusted, as there is no
      issue with refcount or anything else anymore, that originally forced us
      to add it to untrusted list in c0c852dd ("bpf: Do not mark certain LSM
      hook arguments as trusted"). We now trigger this hook much later and it
      should not be an issue anymore.
      
        [0] https://lore.kernel.org/bpf/9fe88aef7deabbe87d3fc38c4aea3c69.paul@paul-moore.com/Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarPaul Moore <paul@paul-moore.com>
      Link: https://lore.kernel.org/bpf/20240124022127.2379740-10-andrii@kernel.org
      1b67772e
    • Andrii Nakryiko's avatar
      bpf: Consistently use BPF token throughout BPF verifier logic · d79a3549
      Andrii Nakryiko authored
      Remove remaining direct queries to perfmon_capable() and bpf_capable()
      in BPF verifier logic and instead use BPF token (if available) to make
      decisions about privileges.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20240124022127.2379740-9-andrii@kernel.org
      d79a3549
    • Andrii Nakryiko's avatar
      bpf: Take into account BPF token when fetching helper protos · bbc1d247
      Andrii Nakryiko authored
      Instead of performing unconditional system-wide bpf_capable() and
      perfmon_capable() calls inside bpf_base_func_proto() function (and other
      similar ones) to determine eligibility of a given BPF helper for a given
      program, use previously recorded BPF token during BPF_PROG_LOAD command
      handling to inform the decision.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20240124022127.2379740-8-andrii@kernel.org
      bbc1d247
    • Andrii Nakryiko's avatar
      bpf: Add BPF token support to BPF_PROG_LOAD command · caf8f28e
      Andrii Nakryiko authored
      Add basic support of BPF token to BPF_PROG_LOAD. BPF_F_TOKEN_FD flag
      should be set in prog_flags field when providing prog_token_fd.
      
      Wire through a set of allowed BPF program types and attach types,
      derived from BPF FS at BPF token creation time. Then make sure we
      perform bpf_token_capable() checks everywhere where it's relevant.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20240124022127.2379740-7-andrii@kernel.org
      caf8f28e
    • Andrii Nakryiko's avatar
      bpf: Add BPF token support to BPF_BTF_LOAD command · 9ea7c4bf
      Andrii Nakryiko authored
      Accept BPF token FD in BPF_BTF_LOAD command to allow BTF data loading
      through delegated BPF token. BPF_F_TOKEN_FD flag has to be specified
      when passing BPF token FD. Given BPF_BTF_LOAD command didn't have flags
      field before, we also add btf_flags field.
      
      BTF loading is a pretty straightforward operation, so as long as BPF
      token is created with allow_cmds granting BPF_BTF_LOAD command, kernel
      proceeds to parsing BTF data and creating BTF object.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20240124022127.2379740-6-andrii@kernel.org
      9ea7c4bf
    • Andrii Nakryiko's avatar
      bpf: Add BPF token support to BPF_MAP_CREATE command · a177fc2b
      Andrii Nakryiko authored
      Allow providing token_fd for BPF_MAP_CREATE command to allow controlled
      BPF map creation from unprivileged process through delegated BPF token.
      New BPF_F_TOKEN_FD flag is added to specify together with BPF token FD
      for BPF_MAP_CREATE command.
      
      Wire through a set of allowed BPF map types to BPF token, derived from
      BPF FS at BPF token creation time. This, in combination with allowed_cmds
      allows to create a narrowly-focused BPF token (controlled by privileged
      agent) with a restrictive set of BPF maps that application can attempt
      to create.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20240124022127.2379740-5-andrii@kernel.org
      a177fc2b
    • Andrii Nakryiko's avatar
      bpf: Introduce BPF token object · 35f96de0
      Andrii Nakryiko authored
      Add new kind of BPF kernel object, BPF token. BPF token is meant to
      allow delegating privileged BPF functionality, like loading a BPF
      program or creating a BPF map, from privileged process to a *trusted*
      unprivileged process, all while having a good amount of control over which
      privileged operations could be performed using provided BPF token.
      
      This is achieved through mounting BPF FS instance with extra delegation
      mount options, which determine what operations are delegatable, and also
      constraining it to the owning user namespace (as mentioned in the
      previous patch).
      
      BPF token itself is just a derivative from BPF FS and can be created
      through a new bpf() syscall command, BPF_TOKEN_CREATE, which accepts BPF
      FS FD, which can be attained through open() API by opening BPF FS mount
      point. Currently, BPF token "inherits" delegated command, map types,
      prog type, and attach type bit sets from BPF FS as is. In the future,
      having an BPF token as a separate object with its own FD, we can allow
      to further restrict BPF token's allowable set of things either at the
      creation time or after the fact, allowing the process to guard itself
      further from unintentionally trying to load undesired kind of BPF
      programs. But for now we keep things simple and just copy bit sets as is.
      
      When BPF token is created from BPF FS mount, we take reference to the
      BPF super block's owning user namespace, and then use that namespace for
      checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
      capabilities that are normally only checked against init userns (using
      capable()), but now we check them using ns_capable() instead (if BPF
      token is provided). See bpf_token_capable() for details.
      
      Such setup means that BPF token in itself is not sufficient to grant BPF
      functionality. User namespaced process has to *also* have necessary
      combination of capabilities inside that user namespace. So while
      previously CAP_BPF was useless when granted within user namespace, now
      it gains a meaning and allows container managers and sys admins to have
      a flexible control over which processes can and need to use BPF
      functionality within the user namespace (i.e., container in practice).
      And BPF FS delegation mount options and derived BPF tokens serve as
      a per-container "flag" to grant overall ability to use bpf() (plus further
      restrict on which parts of bpf() syscalls are treated as namespaced).
      
      Note also, BPF_TOKEN_CREATE command itself requires ns_capable(CAP_BPF)
      within the BPF FS owning user namespace, rounding up the ns_capable()
      story of BPF token. Also creating BPF token in init user namespace is
      currently not supported, given BPF token doesn't have any effect in init
      user namespace anyways.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarChristian Brauner <brauner@kernel.org>
      Link: https://lore.kernel.org/bpf/20240124022127.2379740-4-andrii@kernel.org
      35f96de0
    • Andrii Nakryiko's avatar
      bpf: Add BPF token delegation mount options to BPF FS · 6fe01d3c
      Andrii Nakryiko authored
      Add few new mount options to BPF FS that allow to specify that a given
      BPF FS instance allows creation of BPF token (added in the next patch),
      and what sort of operations are allowed under BPF token. As such, we get
      4 new mount options, each is a bit mask
        - `delegate_cmds` allow to specify which bpf() syscall commands are
          allowed with BPF token derived from this BPF FS instance;
        - if BPF_MAP_CREATE command is allowed, `delegate_maps` specifies
          a set of allowable BPF map types that could be created with BPF token;
        - if BPF_PROG_LOAD command is allowed, `delegate_progs` specifies
          a set of allowable BPF program types that could be loaded with BPF token;
        - if BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies
          a set of allowable BPF program attach types that could be loaded with
          BPF token; delegate_progs and delegate_attachs are meant to be used
          together, as full BPF program type is, in general, determined
          through both program type and program attach type.
      
      Currently, these mount options accept the following forms of values:
        - a special value "any", that enables all possible values of a given
        bit set;
        - numeric value (decimal or hexadecimal, determined by kernel
        automatically) that specifies a bit mask value directly;
        - all the values for a given mount option are combined, if specified
        multiple times. E.g., `mount -t bpf nodev /path/to/mount -o
        delegate_maps=0x1 -o delegate_maps=0x2` will result in a combined 0x3
        mask.
      
      Ideally, more convenient (for humans) symbolic form derived from
      corresponding UAPI enums would be accepted (e.g., `-o
      delegate_progs=kprobe|tracepoint`) and I intend to implement this, but
      it requires a bunch of UAPI header churn, so I postponed it until this
      feature lands upstream or at least there is a definite consensus that
      this feature is acceptable and is going to make it, just to minimize
      amount of wasted effort and not increase amount of non-essential code to
      be reviewed.
      
      Attentive reader will notice that BPF FS is now marked as
      FS_USERNS_MOUNT, which theoretically makes it mountable inside non-init
      user namespace as long as the process has sufficient *namespaced*
      capabilities within that user namespace. But in reality we still
      restrict BPF FS to be mountable only by processes with CAP_SYS_ADMIN *in
      init userns* (extra check in bpf_fill_super()). FS_USERNS_MOUNT is added
      to allow creating BPF FS context object (i.e., fsopen("bpf")) from
      inside unprivileged process inside non-init userns, to capture that
      userns as the owning userns. It will still be required to pass this
      context object back to privileged process to instantiate and mount it.
      
      This manipulation is important, because capturing non-init userns as the
      owning userns of BPF FS instance (super block) allows to use that userns
      to constraint BPF token to that userns later on (see next patch). So
      creating BPF FS with delegation inside unprivileged userns will restrict
      derived BPF token objects to only "work" inside that intended userns,
      making it scoped to a intended "container". Also, setting these
      delegation options requires capable(CAP_SYS_ADMIN), so unprivileged
      process cannot set this up without involvement of a privileged process.
      
      There is a set of selftests at the end of the patch set that simulates
      this sequence of steps and validates that everything works as intended.
      But careful review is requested to make sure there are no missed gaps in
      the implementation and testing.
      
      This somewhat subtle set of aspects is the result of previous
      discussions ([0]) about various user namespace implications and
      interactions with BPF token functionality and is necessary to contain
      BPF token inside intended user namespace.
      
        [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarChristian Brauner <brauner@kernel.org>
      Link: https://lore.kernel.org/bpf/20240124022127.2379740-3-andrii@kernel.org
      6fe01d3c
    • Andrii Nakryiko's avatar
      bpf: Align CAP_NET_ADMIN checks with bpf_capable() approach · ed1ad5a7
      Andrii Nakryiko authored
      Within BPF syscall handling code CAP_NET_ADMIN checks stand out a bit
      compared to CAP_BPF and CAP_PERFMON checks. For the latter, CAP_BPF or
      CAP_PERFMON are checked first, but if they are not set, CAP_SYS_ADMIN
      takes over and grants whatever part of BPF syscall is required.
      
      Similar kind of checks that involve CAP_NET_ADMIN are not so consistent.
      One out of four uses does follow CAP_BPF/CAP_PERFMON model: during
      BPF_PROG_LOAD, if the type of BPF program is "network-related" either
      CAP_NET_ADMIN or CAP_SYS_ADMIN is required to proceed.
      
      But in three other cases CAP_NET_ADMIN is required even if CAP_SYS_ADMIN
      is set:
        - when creating DEVMAP/XDKMAP/CPU_MAP maps;
        - when attaching CGROUP_SKB programs;
        - when handling BPF_PROG_QUERY command.
      
      This patch is changing the latter three cases to follow BPF_PROG_LOAD
      model, that is allowing to proceed under either CAP_NET_ADMIN or
      CAP_SYS_ADMIN.
      
      This also makes it cleaner in subsequent BPF token patches to switch
      wholesomely to a generic bpf_token_capable(int cap) check, that always
      falls back to CAP_SYS_ADMIN if requested capability is missing.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarYafang Shao <laoar.shao@gmail.com>
      Link: https://lore.kernel.org/bpf/20240124022127.2379740-2-andrii@kernel.org
      ed1ad5a7
    • Martin KaFai Lau's avatar
      libbpf: Ensure undefined bpf_attr field stays 0 · c9f11556
      Martin KaFai Lau authored
      The commit 9e926acd ("libbpf: Find correct module BTFs for struct_ops maps and progs.")
      sets a newly added field (value_type_btf_obj_fd) to -1 in libbpf when
      the caller of the libbpf's bpf_map_create did not define this field by
      passing a NULL "opts" or passing in a "opts" that does not cover this
      new field. OPT_HAS(opts, field) is used to decide if the field is
      defined or not:
      
      	((opts) && opts->sz >= offsetofend(typeof(*(opts)), field))
      
      Once OPTS_HAS decided the field is not defined, that field should
      be set to 0. For this particular new field (value_type_btf_obj_fd),
      its corresponding map_flags "BPF_F_VTYPE_BTF_OBJ_FD" is not set.
      Thus, the kernel does not treat it as an fd field.
      
      Fixes: 9e926acd ("libbpf: Find correct module BTFs for struct_ops maps and progs.")
      Reported-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240124224418.2905133-1-martin.lau@linux.devSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      c9f11556
  2. 24 Jan, 2024 23 commits