• Kees Cook's avatar
    seccomp: implement SECCOMP_FILTER_FLAG_TSYNC · c2e1f2e3
    Kees Cook authored
    Applying restrictive seccomp filter programs to large or diverse
    codebases often requires handling threads which may be started early in
    the process lifetime (e.g., by code that is linked in). While it is
    possible to apply permissive programs prior to process start up, it is
    difficult to further restrict the kernel ABI to those threads after that
    point.
    
    This change adds a new seccomp syscall flag to SECCOMP_SET_MODE_FILTER for
    synchronizing thread group seccomp filters at filter installation time.
    
    When calling seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
    filter) an attempt will be made to synchronize all threads in current's
    threadgroup to its new seccomp filter program. This is possible iff all
    threads are using a filter that is an ancestor to the filter current is
    attempting to synchronize to. NULL filters (where the task is running as
    SECCOMP_MODE_NONE) are also treated as ancestors allowing threads to be
    transitioned into SECCOMP_MODE_FILTER. If prctrl(PR_SET_NO_NEW_PRIVS,
    ...) has been set on the calling thread, no_new_privs will be set for
    all synchronized threads too. On success, 0 is returned. On failure,
    the pid of one of the failing threads will be returned and no filters
    will have been applied.
    
    The race conditions against another thread are:
    - requesting TSYNC (already handled by sighand lock)
    - performing a clone (already handled by sighand lock)
    - changing its filter (already handled by sighand lock)
    - calling exec (handled by cred_guard_mutex)
    The clone case is assisted by the fact that new threads will have their
    seccomp state duplicated from their parent before appearing on the tasklist.
    
    Holding cred_guard_mutex means that seccomp filters cannot be assigned
    while in the middle of another thread's exec (potentially bypassing
    no_new_privs or similar). The call to de_thread() may kill threads waiting
    for the mutex.
    
    Changes across threads to the filter pointer includes a barrier.
    
    Based on patches by Will Drewry.
    Suggested-by: default avatarJulien Tinnes <jln@chromium.org>
    Signed-off-by: default avatarKees Cook <keescook@chromium.org>
    Reviewed-by: default avatarOleg Nesterov <oleg@redhat.com>
    Reviewed-by: default avatarAndy Lutomirski <luto@amacapital.net>
    c2e1f2e3
seccomp.h 1.78 KB