    bpf: introduce BPF token object · 4527358b
    Andrii Nakryiko authored
    Add new kind of BPF kernel object, BPF token. BPF token is meant to
    allow delegating privileged BPF functionality, like loading a BPF
    program or creating a BPF map, from privileged process to a *trusted*
    unprivileged process, all while having a good amount of control over which
    privileged operations could be performed using provided BPF token.
    
    This is achieved through mounting BPF FS instance with extra delegation
    mount options, which determine what operations are delegatable, and also
    constraining it to the owning user namespace (as mentioned in the
    previous patch).
    
    BPF token itself is just a derivative from BPF FS and can be created
    through a new bpf() syscall command, BPF_TOKEN_CREATE, which accepts BPF
    FS FD, which can be attained through open() API by opening BPF FS mount
    point. Currently, BPF token "inherits" delegated command, map type,
    prog type, and attach type bit sets from BPF FS as is. In the future,
    having a BPF token as a separate object with its own FD, we can allow
    further restricting BPF token's allowable set of things either at
    creation time or after the fact, allowing the process to guard itself
    further from unintentionally trying to load undesired kinds of BPF
    programs. But for now we keep things simple and just copy bit sets as is.
    
    When BPF token is created from BPF FS mount, we take reference to the
    BPF super block's owning user namespace, and then use that namespace for
    checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
    capabilities that are normally only checked against init userns (using
    capable()), but now we check them using ns_capable() instead (if BPF
    token is provided). See bpf_token_capable() for details.
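
    The switch from capable() to ns_capable() can be pictured with a small
    user-space model. This is only a sketch of the idea stated above, not
    the kernel's bpf_token_capable(); all sk_* types, the capability bits,
    and the simplified namespace comparison are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability bits; not the kernel's CAP_* numbering. */
enum {
	SK_CAP_BPF       = 1u << 0,
	SK_CAP_PERFMON   = 1u << 1,
	SK_CAP_NET_ADMIN = 1u << 2,
	SK_CAP_SYS_ADMIN = 1u << 3,
};

struct sk_userns { int id; };

struct sk_task {
	const struct sk_userns *userns; /* namespace the task lives in */
	unsigned int ns_caps;           /* caps held inside that namespace */
	unsigned int init_caps;         /* caps held in the init namespace */
};

struct sk_token {
	const struct sk_userns *userns; /* owning userns of the BPF FS mount */
};

/* Models capable(): only init-namespace capabilities count. */
static bool sk_capable(const struct sk_task *t, unsigned int cap)
{
	return (t->init_caps & cap) != 0;
}

/* Models ns_capable(): init-namespace caps cover every namespace,
 * otherwise the task must hold the cap in the namespace being checked. */
static bool sk_ns_capable(const struct sk_task *t,
			  const struct sk_userns *ns, unsigned int cap)
{
	if (sk_capable(t, cap))
		return true;
	return t->userns == ns && (t->ns_caps & cap) != 0;
}

/* With a token, check against the token's userns; without one, fall
 * back to the init-namespace check. */
static bool sk_token_capable(const struct sk_task *t,
			     const struct sk_token *token, unsigned int cap)
{
	if (token)
		return sk_ns_capable(t, token->userns, cap);
	return sk_capable(t, cap);
}
```

    In this model, a task holding only SK_CAP_BPF inside its container's
    namespace fails the check without a token, passes it with a token whose
    userns matches, and still fails for capabilities it does not hold.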
    
    Such setup means that BPF token in itself is not sufficient to grant BPF
    functionality. User namespaced process has to *also* have necessary
    combination of capabilities inside that user namespace. So while
    previously CAP_BPF was useless when granted within user namespace, now
    it gains a meaning and allows container managers and sys admins to have
    a flexible control over which processes can and need to use BPF
    functionality within the user namespace (i.e., container in practice).
    And BPF FS delegation mount options and derived BPF tokens serve as
    a per-container "flag" to grant overall ability to use bpf() (plus further
    restrictions on which parts of the bpf() syscall are treated as namespaced).
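
    Put together, a delegated operation has to pass two gates: the token
    must carry the corresponding bit (copied as is from the BPF FS mount),
    and the process must hold the needed capability inside the user
    namespace. The model below is a sketch under those assumptions; the
    model_* names, the field names, and the command numbers are illustrative
    and not taken from the kernel sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative command numbers for this model only. */
enum { MODEL_CMD_MAP_CREATE = 0, MODEL_CMD_PROG_LOAD = 5 };

/* Bit sets configured through BPF FS delegation mount options. */
struct model_bpffs { uint64_t cmds, maps, progs, attachs; };

/* Bit sets carried by a token derived from that BPF FS instance. */
struct model_token { uint64_t cmds, maps, progs, attachs; };

/* Token creation copies the delegated bit sets without narrowing them. */
static struct model_token model_token_create(const struct model_bpffs *fs)
{
	struct model_token t = { fs->cmds, fs->maps, fs->progs, fs->attachs };
	return t;
}

/* Gate 1: is this command delegated through the token at all? */
static bool model_token_allows_cmd(const struct model_token *t,
				   unsigned int cmd)
{
	return t != NULL && cmd < 64 &&
	       (t->cmds & (UINT64_C(1) << cmd)) != 0;
}

/* Gate 2: the caller must also hold the capability in the userns.
 * The token alone is not sufficient, and neither is the capability. */
static bool model_may_use_cmd(const struct model_token *t, unsigned int cmd,
			      bool has_cap_in_userns)
{
	return model_token_allows_cmd(t, cmd) && has_cap_in_userns;
}
```

    With only the MODEL_CMD_MAP_CREATE bit delegated, map creation is
    allowed for a process holding the capability in the namespace, while
    program loading, a missing capability, or a missing token each cause
    the check to fail.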
    
    Note also, BPF_TOKEN_CREATE command itself requires ns_capable(CAP_BPF)
    within the BPF FS owning user namespace, rounding out the ns_capable()
    story of BPF token.
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20231130185229.2688956-4-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>