• Aleksa Sarai's avatar
    fs: exec: apply CLOEXEC before changing dumpable task flags · 77e36d73
    Aleksa Sarai authored
    [ Upstream commit 613cc2b6 ]
    
    If you have a process that has set itself to be non-dumpable, and it
    then undergoes exec(2), any CLOEXEC file descriptors it has open are
    "exposed" during a race window between the dumpable flags of the process
    being reset for exec(2) and CLOEXEC being applied to the file
    descriptors. This can be exploited by a process by attempting to access
    /proc/<pid>/fd/... during this window, without requiring CAP_SYS_PTRACE.
    
    The race in question is after set_dumpable has been (for get_link,
    though the trace is basically the same for readlink):
    
    [vfs]
    -> proc_pid_link_inode_operations.get_link
       -> proc_pid_get_link
          -> proc_fd_access_allowed
             -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
    
    Which will return 0, during the race window and CLOEXEC file descriptors
    will still be open during this window because do_close_on_exec has not
    been called yet. As a result, the ordering of these calls should be
    reversed to avoid this race window.
    
    This is of particular concern to container runtimes, where joining a
    PID namespace with file descriptors referring to the host filesystem
    can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect
    against access of CLOEXEC file descriptors -- file descriptors which may
    reference filesystem objects the container shouldn't have access to).
    
    Cc: dev@opencontainers.org
    Cc: <stable@vger.kernel.org> # v3.2+
    Reported-by: default avatarMichael Crosby <crosbymichael@gmail.com>
    Signed-off-by: default avatarAleksa Sarai <asarai@suse.de>
    Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
    77e36d73
exec.c 38 KB