    bpf: Add BPF token delegation mount options to BPF FS · 6fe01d3c
    Andrii Nakryiko authored
    Add a few new mount options to BPF FS that specify whether a given
    BPF FS instance allows creation of a BPF token (added in the next patch),
    and what sort of operations are allowed under that BPF token. As such, we
    get 4 new mount options, each of which is a bit mask:
      - `delegate_cmds` specifies which bpf() syscall commands are
        allowed with a BPF token derived from this BPF FS instance;
      - if BPF_MAP_CREATE command is allowed, `delegate_maps` specifies
        a set of allowable BPF map types that could be created with BPF token;
      - if BPF_PROG_LOAD command is allowed, `delegate_progs` specifies
        a set of allowable BPF program types that could be loaded with BPF token;
      - if BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies
        a set of allowable BPF program attach types that could be used when
        loading programs with BPF token; `delegate_progs` and
        `delegate_attachs` are meant to be used together, as the full BPF
        program type is, in general, determined through both program type
        and program attach type.
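    
    As a rough illustration of how such masks could be computed, the sketch
    below assumes that bit N of each mask corresponds to value N of the
    matching UAPI enum (enum bpf_cmd, enum bpf_map_type, enum bpf_prog_type).
    That mapping is an assumption of this example and is not stated above.
    The DEMO_* constants and both helper functions are hypothetical names;
    the constants copy the values from <linux/bpf.h> so the snippet builds
    without kernel headers.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical local copies of UAPI enum values from <linux/bpf.h>. */
enum { DEMO_BPF_MAP_CREATE = 0, DEMO_BPF_PROG_LOAD = 5 };
enum { DEMO_BPF_MAP_TYPE_HASH = 1, DEMO_BPF_MAP_TYPE_ARRAY = 2 };
enum { DEMO_BPF_PROG_TYPE_KPROBE = 2, DEMO_BPF_PROG_TYPE_TRACEPOINT = 5 };

/* Assumption: enum value N is represented by bit N of the mask. */
static inline uint64_t delegate_bit(unsigned int enum_value)
{
	return 1ULL << enum_value;
}

/* Format a comma-separated option string suitable for `mount -o`. */
static inline int format_delegate_opts(char *buf, size_t len)
{
	uint64_t cmds = delegate_bit(DEMO_BPF_MAP_CREATE) |
			delegate_bit(DEMO_BPF_PROG_LOAD);
	uint64_t maps = delegate_bit(DEMO_BPF_MAP_TYPE_HASH) |
			delegate_bit(DEMO_BPF_MAP_TYPE_ARRAY);
	uint64_t progs = delegate_bit(DEMO_BPF_PROG_TYPE_KPROBE) |
			 delegate_bit(DEMO_BPF_PROG_TYPE_TRACEPOINT);

	return snprintf(buf, len,
			"delegate_cmds=0x%" PRIx64 ",delegate_maps=0x%" PRIx64
			",delegate_progs=0x%" PRIx64,
			cmds, maps, progs);
}
```

    Under that assumption, the resulting string could be passed as
    `mount -t bpf nodev /path/to/mount -o <string>`.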
    
    Currently, these mount options accept the following forms of values:
      - a special value "any", which enables all possible values of a given
        bit set;
      - a numeric value (decimal or hexadecimal, determined by the kernel
        automatically) that specifies a bit mask value directly.
    
    If a given mount option is specified multiple times, all its values are
    combined. E.g., `mount -t bpf nodev /path/to/mount -o
    delegate_maps=0x1 -o delegate_maps=0x2` will result in a combined 0x3
    mask.
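    
    The accumulation rule above can be modeled in userspace roughly as
    follows. This is a sketch, not the kernel parser: the function name
    accumulate_delegate_value() is hypothetical, and strtoull() with base 0
    stands in for whatever the kernel uses to detect decimal versus
    hexadecimal input (note that base 0 also accepts octal).

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * OR one occurrence of a delegate_* option value into *mask.
 * Returns 0 on success, -EINVAL if the value is not "any" or a number.
 */
static inline int accumulate_delegate_value(const char *value, uint64_t *mask)
{
	unsigned long long v;
	char *end;

	if (strcmp(value, "any") == 0) {
		*mask |= ~0ULL;		/* enable every bit */
		return 0;
	}

	if (value[0] == '-')		/* strtoull() would silently wrap */
		return -EINVAL;

	errno = 0;
	v = strtoull(value, &end, 0);	/* base 0: "0x..." is hex, else decimal */
	if (errno || end == value || *end != '\0')
		return -EINVAL;

	*mask |= v;			/* repeated options are combined */
	return 0;
}
```

    Calling it with "0x1" and then "0x2" on the same mask leaves 0x3,
    matching the mount example above.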
    
    Ideally, a more convenient (for humans) symbolic form derived from the
    corresponding UAPI enums would be accepted (e.g., `-o
    delegate_progs=kprobe|tracepoint`), and I intend to implement this. But
    it requires a bunch of UAPI header churn, so I postponed it until this
    feature lands upstream, or at least until there is a definite consensus
    that this feature is acceptable and is going to make it. This minimizes
    the amount of wasted effort and avoids adding non-essential code to be
    reviewed.
    
    An attentive reader will notice that BPF FS is now marked as
    FS_USERNS_MOUNT, which theoretically makes it mountable inside a non-init
    user namespace as long as the process has sufficient *namespaced*
    capabilities within that user namespace. But in reality we still
    restrict BPF FS to be mountable only by processes with CAP_SYS_ADMIN *in
    init userns* (extra check in bpf_fill_super()). FS_USERNS_MOUNT is added
    to allow creating a BPF FS context object (i.e., fsopen("bpf")) from
    inside an unprivileged process inside a non-init userns, to capture that
    userns as the owning userns. It will still be required to pass this
    context object back to a privileged process to instantiate and mount it.
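    
    The two halves of that sequence might look roughly like the sketch
    below, using raw syscalls. The function names and DEMO_* macros are
    hypothetical; the DEMO_* values copy FSCONFIG_SET_STRING and
    FSCONFIG_CMD_CREATE from <linux/mount.h>, and the fallback syscall
    numbers are the common ones on most architectures. How the context fd
    travels between the two processes is not specified above; sending it
    over a Unix socket with SCM_RIGHTS is one option and is omitted here.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older libc headers (assumed values, see lead-in). */
#ifndef SYS_fsopen
#define SYS_fsopen 430
#endif
#ifndef SYS_fsconfig
#define SYS_fsconfig 431
#endif
#ifndef SYS_fsmount
#define SYS_fsmount 432
#endif

#define DEMO_FSCONFIG_SET_STRING 1	/* FSCONFIG_SET_STRING */
#define DEMO_FSCONFIG_CMD_CREATE 6	/* FSCONFIG_CMD_CREATE */

/*
 * Step 1: run inside the unprivileged process in the non-init userns.
 * The returned context fd captures that userns as the owning userns.
 */
static inline int bpffs_open_context(void)
{
	return (int)syscall(SYS_fsopen, "bpf", 0);
}

/*
 * Step 2: run in the privileged process after it receives fs_fd.
 * Sets one delegation option, instantiates the superblock and returns
 * a detached mount fd, or -1 with errno set on failure.
 */
static inline int bpffs_delegate_and_mount(int fs_fd, const char *cmds_mask)
{
	if (syscall(SYS_fsconfig, fs_fd, DEMO_FSCONFIG_SET_STRING,
		    "delegate_cmds", cmds_mask, 0) < 0)
		return -1;
	if (syscall(SYS_fsconfig, fs_fd, DEMO_FSCONFIG_CMD_CREATE,
		    NULL, NULL, 0) < 0)
		return -1;
	return (int)syscall(SYS_fsmount, fs_fd, 0, 0);
}
```

    Both calls are expected to fail with -1 when the caller lacks the
    capabilities described above.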
    
    This manipulation is important, because capturing a non-init userns as
    the owning userns of a BPF FS instance (super block) allows using that
    userns to constrain BPF token to that userns later on (see next patch).
    So creating BPF FS with delegation inside an unprivileged userns will
    restrict derived BPF token objects to only "work" inside that intended
    userns, making them scoped to an intended "container". Also, setting
    these delegation options requires capable(CAP_SYS_ADMIN), so an
    unprivileged process cannot set this up without involvement of a
    privileged process.
    
    There is a set of selftests at the end of the patch set that simulates
    this sequence of steps and validates that everything works as intended.
    But careful review is requested to make sure there are no missed gaps in
    the implementation and testing.
    
    This somewhat subtle set of aspects is the result of previous
    discussions ([0]) about various user namespace implications and
    interactions with BPF token functionality and is necessary to contain
    BPF token inside intended user namespace.
    
      [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Christian Brauner <brauner@kernel.org>
    Link: https://lore.kernel.org/bpf/20240124022127.2379740-3-andrii@kernel.org
inode.c 20 KB