Commit fa7773de authored by Jens Axboe

Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into for-5.6/io_uring-vfs

Pull in Al's openat2 branch, since we'll need that for the openat2
support.

* 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Documentation: path-lookup: include new LOOKUP flags
  selftests: add openat2(2) selftests
  open: introduce openat2(2) syscall
  namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
  namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
  namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
  namei: LOOKUP_NO_XDEV: block mountpoint crossing
  namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
  namei: LOOKUP_NO_SYMLINKS: block symlink resolution
  namei: allow set_root() to produce errors
  namei: allow nd_jump_link() to produce errors
  nsfs: clean-up ns_get_path() signature to return int
  namei: only return -ECHILD from follow_dotdot_rcu()
parents def9d278 b55eef87
@@ -3302,7 +3302,9 @@ S: France
 N: Aleksa Sarai
 E: cyphar@cyphar.com
 W: https://www.cyphar.com/
-D: `pids` cgroup subsystem
+D: /sys/fs/cgroup/pids
+D: openat2(2)
+S: Sydney, Australia
 N: Dipankar Sarma
 E: dipankar@in.ibm.com
......
@@ -13,6 +13,7 @@ It has subsequently been updated to reflect changes in the kernel
 including:
 - per-directory parallel name lookup.
+- ``openat2()`` resolution restriction flags.
 Introduction to pathname lookup
 ===============================
@@ -235,6 +236,13 @@ renamed. If ``d_lookup`` finds that a rename happened while it
 unsuccessfully scanned a chain in the hash table, it simply tries
 again.
+``rename_lock`` is also used to detect and defend against potential attacks
+against ``LOOKUP_BENEATH`` and ``LOOKUP_IN_ROOT`` when resolving ".." (where
+the parent directory is moved outside the root, bypassing the ``path_equal()``
+check). If ``rename_lock`` is updated during the lookup and the path encounters
+a "..", a potential attack occurred and ``handle_dots()`` will bail out with
+``-EAGAIN``.
 inode->i_rwsem
 ~~~~~~~~~~~~~~
@@ -348,6 +356,13 @@ any changes to any mount points while stepping up. This locking is
 needed to stabilize the link to the mounted-on dentry, which the
 refcount on the mount itself doesn't ensure.
+``mount_lock`` is also used to detect and defend against potential attacks
+against ``LOOKUP_BENEATH`` and ``LOOKUP_IN_ROOT`` when resolving ".." (where
+the parent directory is moved outside the root, bypassing the ``path_equal()``
+check). If ``mount_lock`` is updated during the lookup and the path encounters
+a "..", a potential attack occurred and ``handle_dots()`` will bail out with
+``-EAGAIN``.
 RCU
 ~~~
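The two paragraphs above describe the same defence against ".." races, once for rename_lock and once for mount_lock. Below is a minimal single-threaded userspace sketch of that snapshot-and-recheck pattern. It assumes a plain counter can stand in for each lock's sequence number; the kernel uses real seqlocks. The names seq_model, lookup_model, lookup_begin() and handle_dotdot_model() are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for rename_lock and mount_lock: each counter is
 * bumped by a writer whenever it renames or (un)mounts something. */
struct seq_model {
	unsigned int rename_seq;
	unsigned int mount_seq;
};

/* Snapshot of both counters taken when the scoped lookup starts. */
struct lookup_model {
	unsigned int r_seq;
	unsigned int m_seq;
};

static void lookup_begin(const struct seq_model *s, struct lookup_model *l)
{
	l->r_seq = s->rename_seq;
	l->m_seq = s->mount_seq;
}

/*
 * Called when the walk meets "..". If either counter moved since the walk
 * began, a rename or mount change may have carried the parent outside the
 * root without the path_equal() check noticing, so give up with -EAGAIN.
 */
static int handle_dotdot_model(const struct seq_model *s,
			       const struct lookup_model *l)
{
	if (s->rename_seq != l->r_seq || s->mount_seq != l->m_seq)
		return -EAGAIN;
	return 0;
}
```

The openat2(2) man page documents EAGAIN for RESOLVE_BENEATH and RESOLVE_IN_ROOT in exactly this situation, and leaves any retry to the caller.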
@@ -405,6 +420,10 @@ is requested. Keeping a reference in the ``nameidata`` ensures that
 only one root is in effect for the entire path walk, even if it races
 with a ``chroot()`` system call.
+It should be noted that in the case of ``LOOKUP_IN_ROOT`` or
+``LOOKUP_BENEATH``, the effective root becomes the directory file descriptor
+passed to ``openat2()`` (which exposes these ``LOOKUP_`` flags).
 The root is needed when either of two conditions holds: (1) either the
 pathname or a symbolic link starts with a "'/'", or (2) a "``..``"
 component is being handled, since "``..``" from the root must always stay
@@ -1149,7 +1168,7 @@ so ``NULL`` is returned to indicate that the symlink can be released and
 the stack frame discarded.
 The other case involves things in ``/proc`` that look like symlinks but
-aren't really::
+aren't really (and are therefore commonly referred to as "magic-links")::
    $ ls -l /proc/self/fd/1
    lrwx------ 1 neilb neilb 64 Jun 13 10:19 /proc/self/fd/1 -> /dev/pts/4
@@ -1286,7 +1305,9 @@ A few flags
 A suitable way to wrap up this tour of pathname walking is to list
 the various flags that can be stored in the ``nameidata`` to guide the
 lookup process. Many of these are only meaningful on the final
-component, others reflect the current state of the pathname lookup.
+component, others reflect the current state of the pathname lookup, and some
+apply restrictions to all path components encountered in the path lookup.
 And then there is ``LOOKUP_EMPTY``, which doesn't fit conceptually with
 the others. If this is not set, an empty pathname causes an error
 very early on. If it is set, empty pathnames are not considered to be
@@ -1310,13 +1331,48 @@ longer needed.
 ``LOOKUP_JUMPED`` means that the current dentry was chosen not because
 it had the right name but for some other reason. This happens when
 following "``..``", following a symlink to ``/``, crossing a mount point
-or accessing a "``/proc/$PID/fd/$FD``" symlink. In this case the
-filesystem has not been asked to revalidate the name (with
-``d_revalidate()``). In such cases the inode may still need to be
-revalidated, so ``d_op->d_weak_revalidate()`` is called if
+or accessing a "``/proc/$PID/fd/$FD``" symlink (also known as a "magic
+link"). In this case the filesystem has not been asked to revalidate the
+name (with ``d_revalidate()``). In such cases the inode may still need
+to be revalidated, so ``d_op->d_weak_revalidate()`` is called if
 ``LOOKUP_JUMPED`` is set when the look completes - which may be at the
 final component or, when creating, unlinking, or renaming, at the penultimate component.
+Resolution-restriction flags
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In order to allow userspace to protect itself against certain race conditions
+and attack scenarios involving changing path components, a series of flags are
+available which apply restrictions to all path components encountered during
+path lookup. These flags are exposed through ``openat2()``'s ``resolve`` field.
+``LOOKUP_NO_SYMLINKS`` blocks all symlink traversals (including magic-links).
+This is distinctly different from ``LOOKUP_FOLLOW``, because the latter only
+relates to restricting the following of trailing symlinks.
+``LOOKUP_NO_MAGICLINKS`` blocks all magic-link traversals. Filesystems must
+ensure that they return errors from ``nd_jump_link()``, because that is how
+``LOOKUP_NO_MAGICLINKS`` and other magic-link restrictions are implemented.
+``LOOKUP_NO_XDEV`` blocks all ``vfsmount`` traversals (this includes both
+bind-mounts and ordinary mounts). Note that the ``vfsmount`` which contains the
+lookup is determined by the first mountpoint the path lookup reaches --
+absolute paths start with the ``vfsmount`` of ``/``, and relative paths start
+with the ``dfd``'s ``vfsmount``. Magic-links are only permitted if the
+``vfsmount`` of the path is unchanged.
+``LOOKUP_BENEATH`` blocks any path components which resolve outside the
+starting point of the resolution. This is done by blocking ``nd_jump_root()``
+as well as blocking ".." if it would jump outside the starting point.
+``rename_lock`` and ``mount_lock`` are used to detect attacks against the
+resolution of "..". Magic-links are also blocked.
+``LOOKUP_IN_ROOT`` resolves all path components as though the starting point
+were the filesystem root. ``nd_jump_root()`` brings the resolution back to to
+the starting point, and ".." at the starting point will act as a no-op. As with
+``LOOKUP_BENEATH``, ``rename_lock`` and ``mount_lock`` are used to detect
+attacks against ".." resolution. Magic-links are also blocked.
 Final-component flags
 ~~~~~~~~~~~~~~~~~~~~~
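The LOOKUP_BENEATH and LOOKUP_IN_ROOT descriptions above differ mainly in how they treat an absolute path and a ".." at the starting point. Below is a purely lexical userspace sketch of those two rules. It is not the kernel's algorithm: it ignores symlinks, mounts, existence checks and the races discussed earlier. The names scoped_resolve() and enum scope_model, and the size limits, are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical selector for which scoping rule the model applies. */
enum scope_model { SCOPE_BENEATH, SCOPE_IN_ROOT };

#define MODEL_MAX_DEPTH 32 /* arbitrary limits for this sketch */
#define MODEL_MAX_NAME  64

/*
 * Lexically resolve `path` against the starting directory. On success `out`
 * holds the result relative to that directory ("" is the directory itself).
 * Returns 0, -EXDEV for an escape under SCOPE_BENEATH, or -ENAMETOOLONG.
 */
static int scoped_resolve(const char *path, enum scope_model scope,
			  char *out, size_t outlen)
{
	char comps[MODEL_MAX_DEPTH][MODEL_MAX_NAME];
	int depth = 0;
	const char *p = path;
	size_t used = 0;

	if (outlen == 0)
		return -ENAMETOOLONG;
	/* An absolute path is a jump to the root: refused under BENEATH,
	 * restarted at the starting point (depth 0) under IN_ROOT. */
	if (*p == '/' && scope == SCOPE_BENEATH)
		return -EXDEV;

	while (*p) {
		size_t len = strcspn(p, "/");

		if (len == 0) { /* skip '/' separators */
			p++;
			continue;
		}
		if (len >= MODEL_MAX_NAME)
			return -ENAMETOOLONG;
		if (len == 2 && p[0] == '.' && p[1] == '.') {
			if (depth > 0)
				depth--;
			else if (scope == SCOPE_BENEATH)
				return -EXDEV; /* ".." would leave the start */
			/* IN_ROOT: ".." at the starting point is a no-op */
		} else if (!(len == 1 && p[0] == '.')) {
			if (depth == MODEL_MAX_DEPTH)
				return -ENAMETOOLONG;
			memcpy(comps[depth], p, len);
			comps[depth][len] = '\0';
			depth++;
		}
		p += len;
	}

	/* Join the surviving components with '/'. */
	out[0] = '\0';
	for (int i = 0; i < depth; i++) {
		size_t n = strlen(comps[i]);

		if (used + (i ? 1 : 0) + n + 1 > outlen)
			return -ENAMETOOLONG;
		if (i)
			out[used++] = '/';
		memcpy(out + used, comps[i], n + 1);
		used += n;
	}
	return 0;
}
```

In this model, "../../etc/passwd" under IN_ROOT resolves to "etc/passwd" inside the starting directory, while the same path under BENEATH fails with -EXDEV.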
......
@@ -6402,6 +6402,7 @@ F: fs/*
 F:	include/linux/fs.h
 F:	include/linux/fs_types.h
 F:	include/uapi/linux/fs.h
+F:	include/uapi/linux/openat2.h
 FINTEK F75375S HARDWARE MONITOR AND FAN CONTROLLER DRIVER
 M:	Riku Voipio <riku.voipio@iki.fi>
......
@@ -475,3 +475,4 @@
 543	common	fspick	sys_fspick
 544	common	pidfd_open	sys_pidfd_open
 # 545 reserved for clone3
+547	common	openat2	sys_openat2
@@ -449,3 +449,4 @@
 433	common	fspick	sys_fspick
 434	common	pidfd_open	sys_pidfd_open
 435	common	clone3	sys_clone3
+437	common	openat2	sys_openat2
@@ -38,7 +38,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
-#define __NR_compat_syscalls		436
+#define __NR_compat_syscalls		438
 #endif
 #define __ARCH_WANT_SYS_CLONE
......
@@ -879,6 +879,8 @@ __SYSCALL(__NR_fspick, sys_fspick)
 __SYSCALL(__NR_pidfd_open, sys_pidfd_open)
 #define __NR_clone3 435
 __SYSCALL(__NR_clone3, sys_clone3)
+#define __NR_openat2 437
+__SYSCALL(__NR_openat2, sys_openat2)
 /*
  * Please add new compat syscalls above this comment and update
......
@@ -356,3 +356,4 @@
 433	common	fspick	sys_fspick
 434	common	pidfd_open	sys_pidfd_open
 # 435 reserved for clone3
+437	common	openat2	sys_openat2
@@ -435,3 +435,4 @@
 433	common	fspick	sys_fspick
 434	common	pidfd_open	sys_pidfd_open
 # 435 reserved for clone3
+437	common	openat2	sys_openat2
@@ -441,3 +441,4 @@
 433	common	fspick	sys_fspick
 434	common	pidfd_open	sys_pidfd_open
 435	common	clone3	sys_clone3
+437	common	openat2	sys_openat2
@@ -374,3 +374,4 @@
 433	n32	fspick	sys_fspick
 434	n32	pidfd_open	sys_pidfd_open
 435	n32	clone3	__sys_clone3
+437	n32	openat2	sys_openat2
@@ -350,3 +350,4 @@
 433	n64	fspick	sys_fspick
 434	n64	pidfd_open	sys_pidfd_open
 435	n64	clone3	__sys_clone3
+437	n64	openat2	sys_openat2
@@ -423,3 +423,4 @@
 433	o32	fspick	sys_fspick
 434	o32	pidfd_open	sys_pidfd_open
 435	o32	clone3	__sys_clone3
+437	o32	openat2	sys_openat2
@@ -433,3 +433,4 @@
 433	common	fspick	sys_fspick
 434	common	pidfd_open	sys_pidfd_open
 435	common	clone3	sys_clone3_wrapper
+437	common	openat2	sys_openat2
@@ -517,3 +517,4 @@
 433	common	fspick	sys_fspick
 434	common	pidfd_open	sys_pidfd_open
 435	nospu	clone3	ppc_clone3
+437	common	openat2	sys_openat2
@@ -438,3 +438,4 @@
 433	common	fspick	sys_fspick	sys_fspick
 434	common	pidfd_open	sys_pidfd_open	sys_pidfd_open
 435	common	clone3	sys_clone3	sys_clone3
+437	common	openat2	sys_openat2	sys_openat2
@@ -438,3 +438,4 @@
 433	common	fspick	sys_fspick
 434	common	pidfd_open	sys_pidfd_open
 # 435 reserved for clone3
+437	common	openat2	sys_openat2
@@ -481,3 +481,4 @@
 433	common	fspick	sys_fspick
 434	common	pidfd_open	sys_pidfd_open
 # 435 reserved for clone3
+437	common	openat2	sys_openat2
@@ -440,3 +440,4 @@
 433	i386	fspick	sys_fspick	__ia32_sys_fspick
 434	i386	pidfd_open	sys_pidfd_open	__ia32_sys_pidfd_open
 435	i386	clone3	sys_clone3	__ia32_sys_clone3
+437	i386	openat2	sys_openat2	__ia32_sys_openat2
@@ -357,6 +357,7 @@
 433	common	fspick	__x64_sys_fspick
 434	common	pidfd_open	__x64_sys_pidfd_open
 435	common	clone3	__x64_sys_clone3/ptregs
+437	common	openat2	__x64_sys_openat2
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
......
@@ -406,3 +406,4 @@
 433	common	fspick	sys_fspick
 434	common	pidfd_open	sys_pidfd_open
 435	common	clone3	sys_clone3
+437	common	openat2	sys_openat2
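The tables above register openat2 as entry 437 on most architectures. Alpha uses 547, and the MIPS tables are numbered relative to a per-ABI base. Below is a minimal userspace sketch of calling it with RESOLVE_BENEATH.

Assumptions in this sketch:
- It calls the syscall directly rather than relying on a C-library wrapper.
- It falls back to 437 only if the headers lack SYS_openat2.
- struct open_how_local and the LOCAL_RESOLVE_* names are local copies of the published uapi layout and values, renamed so they cannot clash with a system header.
- open_beneath() is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local copy of struct open_how from include/uapi/linux/openat2.h:
 * three u64 fields, 24 bytes (OPEN_HOW_SIZE_VER0). */
struct open_how_local {
	uint64_t flags;
	uint64_t mode;
	uint64_t resolve;
};

#define LOCAL_RESOLVE_NO_XDEV       0x01
#define LOCAL_RESOLVE_NO_MAGICLINKS 0x02
#define LOCAL_RESOLVE_NO_SYMLINKS   0x04
#define LOCAL_RESOLVE_BENEATH       0x08
#define LOCAL_RESOLVE_IN_ROOT       0x10

/* Assumption: 437 matches x86_64, arm64 and most tables above; alpha and
 * MIPS differ, so prefer the C library's SYS_openat2 when it exists. */
#ifndef SYS_openat2
#define SYS_openat2 437
#endif

/* Returns an fd, or -1 with errno set. usize tells the kernel which
 * version of struct open_how this caller was built against. */
static int openat2_raw(int dfd, const char *path, struct open_how_local *how)
{
	return (int)syscall(SYS_openat2, dfd, path, how, sizeof(*how));
}

/* Hypothetical helper: open `path` without resolving outside `dirfd`. */
static int open_beneath(int dirfd, const char *path, int flags)
{
	struct open_how_local how = {
		.flags = (uint64_t)flags,
		.mode = 0, /* must be 0 unless O_CREAT or O_TMPFILE is set */
		.resolve = LOCAL_RESOLVE_BENEATH | LOCAL_RESOLVE_NO_MAGICLINKS,
	};
	return openat2_raw(dirfd, path, &how);
}
```

On a kernel with openat2, open_beneath(AT_FDCWD, "..", O_RDONLY | O_DIRECTORY) should fail with EXDEV. Older kernels and restrictive seccomp filters typically report ENOSYS or EPERM instead.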
......
@@ -55,7 +55,7 @@ static void nsfs_evict(struct inode *inode)
 	ns->ops->put(ns);
 }
-static void *__ns_get_path(struct path *path, struct ns_common *ns)
+static int __ns_get_path(struct path *path, struct ns_common *ns)
 {
 	struct vfsmount *mnt = nsfs_mnt;
 	struct dentry *dentry;
@@ -74,13 +74,13 @@ static void *__ns_get_path(struct path *path, struct ns_common *ns)
 got_it:
 	path->mnt = mntget(mnt);
 	path->dentry = dentry;
-	return NULL;
+	return 0;
 slow:
 	rcu_read_unlock();
 	inode = new_inode_pseudo(mnt->mnt_sb);
 	if (!inode) {
 		ns->ops->put(ns);
-		return ERR_PTR(-ENOMEM);
+		return -ENOMEM;
 	}
 	inode->i_ino = ns->inum;
 	inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
@@ -92,7 +92,7 @@ static void *__ns_get_path(struct path *path, struct ns_common *ns)
 	dentry = d_alloc_anon(mnt->mnt_sb);
 	if (!dentry) {
 		iput(inode);
-		return ERR_PTR(-ENOMEM);
+		return -ENOMEM;
 	}
 	d_instantiate(dentry, inode);
 	dentry->d_fsdata = (void *)ns->ops;
@@ -101,23 +101,22 @@ static void *__ns_get_path(struct path *path, struct ns_common *ns)
 		d_delete(dentry);	/* make sure ->d_prune() does nothing */
 		dput(dentry);
 		cpu_relax();
-		return ERR_PTR(-EAGAIN);
+		return -EAGAIN;
 	}
 	goto got_it;
 }
-void *ns_get_path_cb(struct path *path, ns_get_path_helper_t *ns_get_cb,
+int ns_get_path_cb(struct path *path, ns_get_path_helper_t *ns_get_cb,
 		     void *private_data)
 {
-	void *ret;
+	int ret;
 	do {
 		struct ns_common *ns = ns_get_cb(private_data);
 		if (!ns)
-			return ERR_PTR(-ENOENT);
+			return -ENOENT;
 		ret = __ns_get_path(path, ns);
-	} while (ret == ERR_PTR(-EAGAIN));
+	} while (ret == -EAGAIN);
 	return ret;
 }
@@ -134,7 +133,7 @@ static struct ns_common *ns_get_path_task(void *private_data)
 	return args->ns_ops->get(args->task);
 }
-void *ns_get_path(struct path *path, struct task_struct *task,
+int ns_get_path(struct path *path, struct task_struct *task,
 		  const struct proc_ns_operations *ns_ops)
 {
 	struct ns_get_path_task_args args = {
@@ -150,7 +149,7 @@ int open_related_ns(struct ns_common *ns,
 {
 	struct path path = {};
 	struct file *f;
-	void *err;
+	int err;
 	int fd;
 	fd = get_unused_fd_flags(O_CLOEXEC);
@@ -167,11 +166,11 @@ int open_related_ns(struct ns_common *ns,
 		}
 		err = __ns_get_path(&path, relative);
-	} while (err == ERR_PTR(-EAGAIN));
-	if (IS_ERR(err)) {
+	} while (err == -EAGAIN);
+	if (err) {
 		put_unused_fd(fd);
-		return PTR_ERR(err);
+		return err;
 	}
 	f = dentry_open(&path, O_RDONLY, current_cred());
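The nsfs hunks above move __ns_get_path() and its callers from ERR_PTR()-encoded void * results to plain int errors. The retry loop can then compare against -EAGAIN directly, and callers test `if (err)` instead of using IS_ERR()/PTR_ERR(). Below is a userspace sketch of that loop shape. struct ns_model, try_get_model() and get_path_model() are hypothetical stand-ins for the namespace getter and __ns_get_path().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a namespace and its getter's behaviour. */
struct ns_model {
	int present;        /* 0 models ns_get_cb() returning NULL */
	int transient_fail; /* attempts that should still hit the race */
	int attempts;       /* bookkeeping only */
};

/* Models __ns_get_path(): 0 on success, negative errno on failure. */
static int try_get_model(struct ns_model *ns)
{
	ns->attempts++;
	if (ns->transient_fail > 0) {
		ns->transient_fail--;
		return -EAGAIN; /* the transient race the real code retries */
	}
	return 0;
}

/* Models ns_get_path_cb(): retry only on -EAGAIN, pass other errors up. */
static int get_path_model(struct ns_model *ns)
{
	int ret;

	do {
		if (!ns->present)
			return -ENOENT;
		ret = try_get_model(ns);
	} while (ret == -EAGAIN);

	return ret;
}
```

With int results, 0 unambiguously means success. The old version relied on NULL meaning success and on every caller remembering to decode the pointer.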
......
@@ -955,48 +955,84 @@ struct file *open_with_fake_path(const struct path *path, int flags,
 }
 EXPORT_SYMBOL(open_with_fake_path);
-static inline int build_open_flags(int flags, umode_t mode, struct open_flags *op)
+#define WILL_CREATE(flags)	(flags & (O_CREAT | __O_TMPFILE))
+#define O_PATH_FLAGS		(O_DIRECTORY | O_NOFOLLOW | O_PATH | O_CLOEXEC)
+static inline struct open_how build_open_how(int flags, umode_t mode)
+{
+	struct open_how how = {
+		.flags = flags & VALID_OPEN_FLAGS,
+		.mode = mode & S_IALLUGO,
+	};
+	/* O_PATH beats everything else. */
+	if (how.flags & O_PATH)
+		how.flags &= O_PATH_FLAGS;
+	/* Modes should only be set for create-like flags. */
+	if (!WILL_CREATE(how.flags))
+		how.mode = 0;
+	return how;
+}
+static inline int build_open_flags(const struct open_how *how,
+				   struct open_flags *op)
 {
+	int flags = how->flags;
 	int lookup_flags = 0;
 	int acc_mode = ACC_MODE(flags);
+	/* Must never be set by userspace */
+	flags &= ~(FMODE_NONOTIFY | O_CLOEXEC);
 	/*
-	 * Clear out all open flags we don't know about so that we don't report
-	 * them in fcntl(F_GETFD) or similar interfaces.
+	 * Older syscalls implicitly clear all of the invalid flags or argument
+	 * values before calling build_open_flags(), but openat2(2) checks all
+	 * of its arguments.
 	 */
-	flags &= VALID_OPEN_FLAGS;
+	if (flags & ~VALID_OPEN_FLAGS)
+		return -EINVAL;
+	if (how->resolve & ~VALID_RESOLVE_FLAGS)
+		return -EINVAL;
-	if (flags & (O_CREAT | __O_TMPFILE))
-		op->mode = (mode & S_IALLUGO) | S_IFREG;
-	else
+	/* Deal with the mode. */
+	if (WILL_CREATE(flags)) {
+		if (how->mode & ~S_IALLUGO)
+			return -EINVAL;
+		op->mode = how->mode | S_IFREG;
+	} else {
+		if (how->mode != 0)
+			return -EINVAL;
 		op->mode = 0;
-	/* Must never be set by userspace */
-	flags &= ~FMODE_NONOTIFY & ~O_CLOEXEC;
+	}
 	/*
-	 * O_SYNC is implemented as __O_SYNC|O_DSYNC. As many places only
-	 * check for O_DSYNC if the need any syncing at all we enforce it's
-	 * always set instead of having to deal with possibly weird behaviour
-	 * for malicious applications setting only __O_SYNC.
+	 * In order to ensure programs get explicit errors when trying to use
+	 * O_TMPFILE on old kernels, O_TMPFILE is implemented such that it
+	 * looks like (O_DIRECTORY|O_RDWR & ~O_CREAT) to old kernels. But we
+	 * have to require userspace to explicitly set it.
 	 */
-	if (flags & __O_SYNC)
-		flags |= O_DSYNC;
 	if (flags & __O_TMPFILE) {
 		if ((flags & O_TMPFILE_MASK) != O_TMPFILE)
 			return -EINVAL;
 		if (!(acc_mode & MAY_WRITE))
 			return -EINVAL;
-	} else if (flags & O_PATH) {
-		/*
-		 * If we have O_PATH in the open flag. Then we
-		 * cannot have anything other than the below set of flags
-		 */
-		flags &= O_DIRECTORY | O_NOFOLLOW | O_PATH;
+	}
+	if (flags & O_PATH) {
+		/* O_PATH only permits certain other flags to be set. */
+		if (flags & ~O_PATH_FLAGS)
+			return -EINVAL;
 		acc_mode = 0;
 	}
+	/*
+	 * O_SYNC is implemented as __O_SYNC|O_DSYNC. As many places only
+	 * check for O_DSYNC if the need any syncing at all we enforce it's
+	 * always set instead of having to deal with possibly weird behaviour
+	 * for malicious applications setting only __O_SYNC.
+	 */
+	if (flags & __O_SYNC)
+		flags |= O_DSYNC;
 	op->open_flag = flags;
 	/* O_TRUNC implies we need access checks for write permissions */
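The hunk above splits flag handling into two policies. build_open_how() silently fixes up legacy open()/openat() arguments, while build_open_flags() now rejects the same combinations with -EINVAL so that openat2 callers see their mistakes. Below is a userspace sketch contrasting the two for the O_PATH and mode rules only.

Simplifications in this sketch:
- It treats only O_CREAT as create-like, whereas the kernel's WILL_CREATE() also includes __O_TMPFILE.
- It uses 07777 in the role of S_IALLUGO.
- It skips the VALID_OPEN_FLAGS check.

struct how_model, legacy_how() and strict_check() are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>

/* Same shape as struct open_how: three 64-bit fields. */
struct how_model {
	uint64_t flags;
	uint64_t mode;
	uint64_t resolve;
};

#define MODEL_S_IALLUGO  07777 /* assumption: stands in for S_IALLUGO */
#define MODEL_PATH_FLAGS (O_DIRECTORY | O_NOFOLLOW | O_PATH | O_CLOEXEC)
#define MODEL_CREATES(f) ((f) & O_CREAT) /* simplified WILL_CREATE() */

/* Legacy open()/openat(): drop whatever does not apply. */
static struct how_model legacy_how(int flags, unsigned int mode)
{
	struct how_model how = {
		.flags = (uint64_t)flags,
		.mode = mode & MODEL_S_IALLUGO,
	};

	if (how.flags & O_PATH)
		how.flags &= MODEL_PATH_FLAGS; /* O_PATH beats everything else */
	if (!MODEL_CREATES(how.flags))
		how.mode = 0; /* a mode only matters when creating */
	return how;
}

/* openat2(): reject the same combinations instead of fixing them up. */
static int strict_check(const struct how_model *how)
{
	if (MODEL_CREATES(how->flags)) {
		if (how->mode & ~(uint64_t)MODEL_S_IALLUGO)
			return -EINVAL;
	} else if (how->mode != 0) {
		return -EINVAL;
	}
	if ((how->flags & O_PATH) && (how->flags & ~(uint64_t)MODEL_PATH_FLAGS))
		return -EINVAL;
	return 0;
}
```

Because legacy_how() output always passes strict_check(), the old syscalls can share the strict path. That is why do_sys_open() can simply call build_open_how() and then do_sys_openat2().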
@@ -1022,6 +1058,18 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o
 		lookup_flags |= LOOKUP_DIRECTORY;
 	if (!(flags & O_NOFOLLOW))
 		lookup_flags |= LOOKUP_FOLLOW;
+	if (how->resolve & RESOLVE_NO_XDEV)
+		lookup_flags |= LOOKUP_NO_XDEV;
+	if (how->resolve & RESOLVE_NO_MAGICLINKS)
+		lookup_flags |= LOOKUP_NO_MAGICLINKS;
+	if (how->resolve & RESOLVE_NO_SYMLINKS)
+		lookup_flags |= LOOKUP_NO_SYMLINKS;
+	if (how->resolve & RESOLVE_BENEATH)
+		lookup_flags |= LOOKUP_BENEATH;
+	if (how->resolve & RESOLVE_IN_ROOT)
+		lookup_flags |= LOOKUP_IN_ROOT;
 	op->lookup_flags = lookup_flags;
 	return 0;
 }
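The hunk above translates the public RESOLVE_* bits from openat2's resolve field into the internal LOOKUP_* bits. Unknown bits have already been rejected earlier in build_open_flags() via VALID_RESOLVE_FLAGS.

Below is a standalone sketch of that translation, with two sources for its constants:
- The RESOLVE_* values are those published in include/uapi/linux/openat2.h.
- The LOOKUP_* values are copied from the include/linux/namei.h hunk in this commit.

The sketch uses a lookup table instead of the kernel's chain of if statements; that is a stylistic alternative, not what the patch does. resolve_to_lookup() and resolve_map are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Public bits, as published in include/uapi/linux/openat2.h. */
#define RESOLVE_NO_XDEV       0x01
#define RESOLVE_NO_MAGICLINKS 0x02
#define RESOLVE_NO_SYMLINKS   0x04
#define RESOLVE_BENEATH       0x08
#define RESOLVE_IN_ROOT       0x10
#define VALID_RESOLVE_FLAGS \
	(RESOLVE_NO_XDEV | RESOLVE_NO_MAGICLINKS | RESOLVE_NO_SYMLINKS | \
	 RESOLVE_BENEATH | RESOLVE_IN_ROOT)

/* Internal bits, copied from the include/linux/namei.h hunk. */
#define LOOKUP_NO_SYMLINKS   0x010000
#define LOOKUP_NO_MAGICLINKS 0x020000
#define LOOKUP_NO_XDEV       0x040000
#define LOOKUP_BENEATH       0x080000
#define LOOKUP_IN_ROOT       0x100000

static const struct {
	uint64_t resolve;
	int lookup;
} resolve_map[] = {
	{ RESOLVE_NO_XDEV,       LOOKUP_NO_XDEV },
	{ RESOLVE_NO_MAGICLINKS, LOOKUP_NO_MAGICLINKS },
	{ RESOLVE_NO_SYMLINKS,   LOOKUP_NO_SYMLINKS },
	{ RESOLVE_BENEATH,       LOOKUP_BENEATH },
	{ RESOLVE_IN_ROOT,       LOOKUP_IN_ROOT },
};

/* Returns the LOOKUP_* mask, or -EINVAL if unknown resolve bits are set. */
static int resolve_to_lookup(uint64_t resolve)
{
	int lookup = 0;

	if (resolve & ~(uint64_t)VALID_RESOLVE_FLAGS)
		return -EINVAL;
	for (size_t i = 0; i < sizeof(resolve_map) / sizeof(resolve_map[0]); i++)
		if (resolve & resolve_map[i].resolve)
			lookup |= resolve_map[i].lookup;
	return lookup;
}
```

Keeping the public and internal bit values separate lets the LOOKUP_* numbering change without breaking the userspace ABI.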
@@ -1040,8 +1088,11 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o
 struct file *file_open_name(struct filename *name, int flags, umode_t mode)
 {
 	struct open_flags op;
-	int err = build_open_flags(flags, mode, &op);
-	return err ? ERR_PTR(err) : do_filp_open(AT_FDCWD, name, &op);
+	struct open_how how = build_open_how(flags, mode);
+	int err = build_open_flags(&how, &op);
+	if (err)
+		return ERR_PTR(err);
+	return do_filp_open(AT_FDCWD, name, &op);
 }
 /**
@@ -1072,17 +1123,19 @@ struct file *file_open_root(struct dentry *dentry, struct vfsmount *mnt,
 			    const char *filename, int flags, umode_t mode)
 {
 	struct open_flags op;
-	int err = build_open_flags(flags, mode, &op);
+	struct open_how how = build_open_how(flags, mode);
+	int err = build_open_flags(&how, &op);
 	if (err)
 		return ERR_PTR(err);
 	return do_file_open_root(dentry, mnt, filename, &op);
 }
 EXPORT_SYMBOL(file_open_root);
-long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
+static long do_sys_openat2(int dfd, const char __user *filename,
+			   struct open_how *how)
 {
 	struct open_flags op;
-	int fd = build_open_flags(flags, mode, &op);
+	int fd = build_open_flags(how, &op);
 	struct filename *tmp;
 	if (fd)
@@ -1092,7 +1145,7 @@ long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
 	if (IS_ERR(tmp))
 		return PTR_ERR(tmp);
-	fd = get_unused_fd_flags(flags);
+	fd = get_unused_fd_flags(how->flags);
 	if (fd >= 0) {
 		struct file *f = do_filp_open(dfd, tmp, &op);
 		if (IS_ERR(f)) {
@@ -1107,12 +1160,16 @@ long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
 	return fd;
 }
-SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)
+long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
 {
-	if (force_o_largefile())
-		flags |= O_LARGEFILE;
-	return do_sys_open(AT_FDCWD, filename, flags, mode);
+	struct open_how how = build_open_how(flags, mode);
+	return do_sys_openat2(dfd, filename, &how);
+}
+SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)
+{
+	return ksys_open(filename, flags, mode);
 }
 SYSCALL_DEFINE4(openat, int, dfd, const char __user *, filename, int, flags,
@@ -1120,10 +1177,32 @@ SYSCALL_DEFINE4(openat, int, dfd, const char __user *, filename, int, flags,
 {
 	if (force_o_largefile())
 		flags |= O_LARGEFILE;
 	return do_sys_open(dfd, filename, flags, mode);
 }
+SYSCALL_DEFINE4(openat2, int, dfd, const char __user *, filename,
+		struct open_how __user *, how, size_t, usize)
+{
+	int err;
+	struct open_how tmp;
+	BUILD_BUG_ON(sizeof(struct open_how) < OPEN_HOW_SIZE_VER0);
+	BUILD_BUG_ON(sizeof(struct open_how) != OPEN_HOW_SIZE_LATEST);
+	if (unlikely(usize < OPEN_HOW_SIZE_VER0))
+		return -EINVAL;
+	err = copy_struct_from_user(&tmp, sizeof(tmp), how, usize);
+	if (err)
+		return err;
+	/* O_LARGEFILE is only allowed for non-O_PATH. */
+	if (!(tmp.flags & O_PATH) && force_o_largefile())
+		tmp.flags |= O_LARGEFILE;
+	return do_sys_openat2(dfd, filename, &tmp);
+}
 #ifdef CONFIG_COMPAT
 /*
  * Exactly like sys_open(), except that it doesn't set the
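The new openat2 entry point treats struct open_how as an extensible struct. Sizes below OPEN_HOW_SIZE_VER0 are rejected with -EINVAL, and the rest is left to copy_struct_from_user(). Below is a userspace model of the size rules that helper is documented to apply; user-pointer faults (-EFAULT) are not modeled.

- If the caller's struct is smaller than the kernel's, the missing tail is zero-filled.
- If the caller's struct is larger, the extra bytes must all be zero, otherwise the copy fails with -E2BIG.

copy_struct_model(), openat2_copy_model() and struct how_v0 are hypothetical names. The 32-byte "newer" struct in the usage below is an invented future layout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of copy_struct_from_user(dst, ksize, src, usize) size handling. */
static int copy_struct_model(void *dst, size_t ksize,
			     const void *src, size_t usize)
{
	size_t size = usize < ksize ? usize : ksize;
	const unsigned char *bytes = src;

	/* A newer caller may pass a bigger struct, but only if the fields
	 * this kernel does not know about are all zero. */
	if (usize > ksize) {
		for (size_t i = ksize; i < usize; i++)
			if (bytes[i] != 0)
				return -E2BIG;
	}
	memcpy(dst, src, size);
	/* An older caller's missing fields read as zero (their defaults). */
	if (size < ksize)
		memset((unsigned char *)dst + size, 0, ksize - size);
	return 0;
}

struct how_v0 { uint64_t flags, mode, resolve; }; /* 24 bytes */
#define HOW_SIZE_VER0 24

/* Models the size checks at the top of SYSCALL_DEFINE4(openat2, ...). */
static int openat2_copy_model(struct how_v0 *out, const void *uhow,
			      size_t usize)
{
	if (usize < HOW_SIZE_VER0)
		return -EINVAL;
	return copy_struct_model(out, sizeof(*out), uhow, usize);
}
```

This is why the syscall takes usize at all: old binaries keep working on new kernels, and new binaries get a clear E2BIG instead of having extra fields silently ignored.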
......
@@ -1626,8 +1626,7 @@ static const char *proc_pid_get_link(struct dentry *dentry,
 	if (error)
 		goto out;
-	nd_jump_link(&path);
-	return NULL;
+	error = nd_jump_link(&path);
 out:
 	return ERR_PTR(error);
 }
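With nd_jump_link() now returning an error, proc_pid_get_link() can fall through to `return ERR_PTR(error)`. On success error is 0, and ERR_PTR(0) is NULL, which is what the old code returned explicitly.

The body of nd_jump_link() is in the collapsed fs/namei.c diff. The sketch below is therefore reconstructed from the Documentation text above, which states three rules:
- LOOKUP_NO_MAGICLINKS blocks all magic-links.
- LOOKUP_NO_XDEV allows one only if the vfsmount is unchanged.
- LOOKUP_BENEATH and LOOKUP_IN_ROOT block them.

Mounts are reduced to integer ids, and the function name is hypothetical. The error codes are assumptions: ELOOP for the magic-link case and EXDEV otherwise, following the openat2(2) man page's general meanings.

```c
#include <assert.h>
#include <errno.h>

/* LOOKUP_* values copied from the include/linux/namei.h hunk. */
#define LOOKUP_NO_MAGICLINKS 0x020000
#define LOOKUP_NO_XDEV       0x040000
#define LOOKUP_BENEATH       0x080000
#define LOOKUP_IN_ROOT       0x100000
#define LOOKUP_IS_SCOPED     (LOOKUP_BENEATH | LOOKUP_IN_ROOT)

/*
 * Hypothetical model of the checks a magic-link jump must pass.
 * cur_mnt is the mount the walk is on; target_mnt is where the link points.
 * Returns 0 if the jump is allowed, or a negative errno (assumed codes).
 */
static int jump_link_model(unsigned int lookup_flags, int cur_mnt,
			   int target_mnt)
{
	if (lookup_flags & LOOKUP_NO_MAGICLINKS)
		return -ELOOP;  /* no magic-links at all */
	if ((lookup_flags & LOOKUP_NO_XDEV) && cur_mnt != target_mnt)
		return -EXDEV;  /* the jump would cross a vfsmount */
	if (lookup_flags & LOOKUP_IS_SCOPED)
		return -EXDEV;  /* magic-links are blocked when scoped */
	return 0;
}
```

The __must_check annotation added to nd_jump_link() in include/linux/namei.h makes the compiler warn about any ->get_link() implementation that drops this result.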
......
@@ -42,22 +42,26 @@ static const char *proc_ns_get_link(struct dentry *dentry,
 	const struct proc_ns_operations *ns_ops = PROC_I(inode)->ns_ops;
 	struct task_struct *task;
 	struct path ns_path;
-	void *error = ERR_PTR(-EACCES);
+	int error = -EACCES;
 	if (!dentry)
 		return ERR_PTR(-ECHILD);
 	task = get_proc_task(inode);
 	if (!task)
-		return error;
+		return ERR_PTR(-EACCES);
-	if (ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS)) {
-		error = ns_get_path(&ns_path, task, ns_ops);
-		if (!error)
-			nd_jump_link(&ns_path);
-	}
+	if (!ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS))
+		goto out;
+	error = ns_get_path(&ns_path, task, ns_ops);
+	if (error)
+		goto out;
+	error = nd_jump_link(&ns_path);
+out:
 	put_task_struct(task);
-	return error;
+	return ERR_PTR(error);
 }
 static int proc_ns_readlink(struct dentry *dentry, char __user *buffer, int buflen)
......
@@ -2,15 +2,29 @@
 #ifndef _LINUX_FCNTL_H
 #define _LINUX_FCNTL_H
+#include <linux/stat.h>
 #include <uapi/linux/fcntl.h>
-/* list of all valid flags for the open/openat flags argument: */
+/* List of all valid flags for the open/openat flags argument: */
 #define VALID_OPEN_FLAGS \
 	(O_RDONLY | O_WRONLY | O_RDWR | O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC | \
 	 O_APPEND | O_NDELAY | O_NONBLOCK | O_NDELAY | __O_SYNC | O_DSYNC | \
 	 FASYNC | O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \
 	 O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE)
+/* List of all valid flags for the how->upgrade_mask argument: */
+#define VALID_UPGRADE_FLAGS \
+	(UPGRADE_NOWRITE | UPGRADE_NOREAD)
+/* List of all valid flags for the how->resolve argument: */
+#define VALID_RESOLVE_FLAGS \
+	(RESOLVE_NO_XDEV | RESOLVE_NO_MAGICLINKS | RESOLVE_NO_SYMLINKS | \
+	 RESOLVE_BENEATH | RESOLVE_IN_ROOT)
+/* List of all open_how "versions". */
+#define OPEN_HOW_SIZE_VER0	24 /* sizeof first published struct */
+#define OPEN_HOW_SIZE_LATEST	OPEN_HOW_SIZE_VER0
 #ifndef force_o_largefile
 #define force_o_largefile() (!IS_ENABLED(CONFIG_ARCH_32BIT_OFF_T))
 #endif
......
@@ -2,6 +2,7 @@
 #ifndef _LINUX_NAMEI_H
 #define _LINUX_NAMEI_H
+#include <linux/fs.h>
 #include <linux/kernel.h>
 #include <linux/path.h>
 #include <linux/fcntl.h>
@@ -38,6 +39,15 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
 #define LOOKUP_ROOT		0x2000
 #define LOOKUP_ROOT_GRABBED	0x0008
+/* Scoping flags for lookup. */
+#define LOOKUP_NO_SYMLINKS	0x010000 /* No symlink crossing. */
+#define LOOKUP_NO_MAGICLINKS	0x020000 /* No nd_jump_link() crossing. */
+#define LOOKUP_NO_XDEV		0x040000 /* No mountpoint crossing. */
+#define LOOKUP_BENEATH		0x080000 /* No escaping from starting point. */
+#define LOOKUP_IN_ROOT		0x100000 /* Treat dirfd as fs root. */
+/* LOOKUP_* flags which do scope-related checks based on the dirfd. */
+#define LOOKUP_IS_SCOPED (LOOKUP_BENEATH | LOOKUP_IN_ROOT)
 extern int path_pts(struct path *path);
 extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty);
...@@ -68,7 +78,7 @@ extern int follow_up(struct path *); ...@@ -68,7 +78,7 @@ extern int follow_up(struct path *);
extern struct dentry *lock_rename(struct dentry *, struct dentry *); extern struct dentry *lock_rename(struct dentry *, struct dentry *);
extern void unlock_rename(struct dentry *, struct dentry *); extern void unlock_rename(struct dentry *, struct dentry *);
extern void nd_jump_link(struct path *path); extern int __must_check nd_jump_link(struct path *path);
static inline void nd_terminate_link(void *name, size_t len, size_t maxlen) static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
{ {
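The new LOOKUP_* scoping flags line up one-to-one with the RESOLVE_* bits exported to userspace in <linux/openat2.h>, as their matching comments suggest. The kernel-side translation is not shown in this section, so the following is only a hypothetical userspace model of that mapping, built from the constant values in the two headers of this diff; resolve_to_lookup() is an illustrative name, not a kernel function.

```c
#include <stdint.h>

/* RESOLVE_* values as published in <linux/openat2.h> by this series. */
#define RESOLVE_NO_XDEV       0x01
#define RESOLVE_NO_MAGICLINKS 0x02
#define RESOLVE_NO_SYMLINKS   0x04
#define RESOLVE_BENEATH       0x08
#define RESOLVE_IN_ROOT       0x10

/* LOOKUP_* values as added to <linux/namei.h> above. */
#define LOOKUP_NO_SYMLINKS    0x010000
#define LOOKUP_NO_MAGICLINKS  0x020000
#define LOOKUP_NO_XDEV        0x040000
#define LOOKUP_BENEATH        0x080000
#define LOOKUP_IN_ROOT        0x100000
#define LOOKUP_IS_SCOPED      (LOOKUP_BENEATH | LOOKUP_IN_ROOT)

/*
 * Hypothetical helper (not kernel code): translate a RESOLVE_* mask into
 * the matching LOOKUP_* bits. Unknown bits are simply dropped here; the
 * real syscall rejects them with -EINVAL before any lookup starts.
 */
static unsigned int resolve_to_lookup(uint64_t resolve)
{
    unsigned int lookup = 0;

    if (resolve & RESOLVE_NO_XDEV)
        lookup |= LOOKUP_NO_XDEV;
    if (resolve & RESOLVE_NO_MAGICLINKS)
        lookup |= LOOKUP_NO_MAGICLINKS;
    if (resolve & RESOLVE_NO_SYMLINKS)
        lookup |= LOOKUP_NO_SYMLINKS;
    if (resolve & RESOLVE_BENEATH)
        lookup |= LOOKUP_BENEATH;
    if (resolve & RESOLVE_IN_ROOT)
        lookup |= LOOKUP_IN_ROOT;
    return lookup;
}
```

In this model, `resolve_to_lookup(how.resolve) & LOOKUP_IS_SCOPED` tells whether a lookup is scoped to its dirfd at all.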
......
@@ -76,10 +76,10 @@ static inline int ns_alloc_inum(struct ns_common *ns)
 extern struct file *proc_ns_fget(int fd);
 #define get_proc_ns(inode) ((struct ns_common *)(inode)->i_private)
-extern void *ns_get_path(struct path *path, struct task_struct *task,
+extern int ns_get_path(struct path *path, struct task_struct *task,
 const struct proc_ns_operations *ns_ops);
 typedef struct ns_common *ns_get_path_helper_t(void *);
-extern void *ns_get_path_cb(struct path *path, ns_get_path_helper_t ns_get_cb,
+extern int ns_get_path_cb(struct path *path, ns_get_path_helper_t ns_get_cb,
 void *private_data);
 extern int ns_get_name(char *buf, size_t size, struct task_struct *task,
......
@@ -69,6 +69,7 @@ struct rseq;
 union bpf_attr;
 struct io_uring_params;
 struct clone_args;
+struct open_how;
 #include <linux/types.h>
 #include <linux/aio_abi.h>
@@ -439,6 +440,8 @@ asmlinkage long sys_fchownat(int dfd, const char __user *filename, uid_t user,
 asmlinkage long sys_fchown(unsigned int fd, uid_t user, gid_t group);
 asmlinkage long sys_openat(int dfd, const char __user *filename, int flags,
 umode_t mode);
+asmlinkage long sys_openat2(int dfd, const char __user *filename,
+struct open_how *how, size_t size);
 asmlinkage long sys_close(unsigned int fd);
 asmlinkage long sys_vhangup(void);
......
@@ -851,8 +851,11 @@ __SYSCALL(__NR_pidfd_open, sys_pidfd_open)
 __SYSCALL(__NR_clone3, sys_clone3)
 #endif
+#define __NR_openat2 437
+__SYSCALL(__NR_openat2, sys_openat2)
 #undef __NR_syscalls
-#define __NR_syscalls 436
+#define __NR_syscalls 438
 /*
 * 32 bit systems traditionally used different
......
@@ -3,6 +3,7 @@
 #define _UAPI_LINUX_FCNTL_H
 #include <asm/fcntl.h>
+#include <linux/openat2.h>
 #define F_SETLEASE (F_LINUX_SPECIFIC_BASE + 0)
 #define F_GETLEASE (F_LINUX_SPECIFIC_BASE + 1)
@@ -100,5 +101,4 @@
 #define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
 #endif /* _UAPI_LINUX_FCNTL_H */
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#ifndef _UAPI_LINUX_OPENAT2_H
#define _UAPI_LINUX_OPENAT2_H
#include <linux/types.h>
/*
* Arguments for how openat2(2) should open the target path. If only @flags and
* @mode are non-zero, then openat2(2) operates very similarly to openat(2).
*
* However, unlike openat(2), unknown or invalid bits in @flags result in
* -EINVAL rather than being silently ignored. @mode must be zero unless one of
* {O_CREAT, O_TMPFILE} is set.
*
* @flags: O_* flags.
* @mode: O_CREAT/O_TMPFILE file mode.
* @resolve: RESOLVE_* flags.
*/
struct open_how {
__u64 flags;
__u64 mode;
__u64 resolve;
};
/* how->resolve flags for openat2(2). */
#define RESOLVE_NO_XDEV 0x01 /* Block mount-point crossings
(includes bind-mounts). */
#define RESOLVE_NO_MAGICLINKS 0x02 /* Block traversal through procfs-style
"magic-links". */
#define RESOLVE_NO_SYMLINKS 0x04 /* Block traversal through all symlinks
(implies RESOLVE_NO_MAGICLINKS). */
#define RESOLVE_BENEATH 0x08 /* Block "lexical" trickery like
"..", symlinks, and absolute
paths which escape the dirfd. */
#define RESOLVE_IN_ROOT 0x10 /* Make all jumps to "/" and ".."
be scoped inside the dirfd
(similar to chroot(2)). */
#endif /* _UAPI_LINUX_OPENAT2_H */
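From userspace, the new syscall takes a pointer to struct open_how plus that struct's size. The following is a minimal sketch, assuming the C library provides no openat2() wrapper (so it goes through syscall(2)) and that the syscall number is 437, as in the generic unistd.h table above and the selftest helpers. The helper open_beneath() and the local struct copy open_how_v0 are hypothetical names. RESOLVE_BENEATH makes escapes via ".." and absolute paths fail with EXDEV.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_openat2
#define SYS_openat2 437 /* assumption: generic syscall number from this diff */
#endif

/* Local copy of the version-0 struct open_how layout (24 bytes). */
struct open_how_v0 {
    uint64_t flags;
    uint64_t mode;
    uint64_t resolve;
};

#define EXAMPLE_RESOLVE_BENEATH 0x08 /* RESOLVE_BENEATH from the header above */

/*
 * Hypothetical helper: open @path relative to @dirfd, refusing to resolve
 * outside @dirfd. Returns a file descriptor, or -errno on failure.
 */
static int open_beneath(int dirfd, const char *path, int flags)
{
    struct open_how_v0 how;
    long ret;

    /* mode must be 0 without O_CREAT/O_TMPFILE; unknown bits are rejected. */
    memset(&how, 0, sizeof(how));
    how.flags = (uint64_t)flags;
    how.resolve = EXAMPLE_RESOLVE_BENEATH;

    ret = syscall(SYS_openat2, dirfd, path, &how, sizeof(how));
    return ret >= 0 ? (int)ret : -errno;
}
```

On a kernel without openat2(2) the call fails with ENOSYS (some seccomp filters report EPERM instead), so a caller would need a fallback to openat(2); the selftests probe for exactly this with openat2_supported.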
@@ -302,14 +302,14 @@ int bpf_prog_offload_info_fill(struct bpf_prog_info *info,
 struct inode *ns_inode;
 struct path ns_path;
 char __user *uinsns;
-void *res;
+int res;
 u32 ulen;
 res = ns_get_path_cb(&ns_path, bpf_prog_offload_info_fill_ns, &args);
-if (IS_ERR(res)) {
+if (res) {
 if (!info->ifindex)
 return -ENODEV;
-return PTR_ERR(res);
+return res;
 }
 down_read(&bpf_devs_lock);
@@ -526,13 +526,13 @@ int bpf_map_offload_info_fill(struct bpf_map_info *info, struct bpf_map *map)
 };
 struct inode *ns_inode;
 struct path ns_path;
-void *res;
+int res;
 res = ns_get_path_cb(&ns_path, bpf_map_offload_info_fill_ns, &args);
-if (IS_ERR(res)) {
+if (res) {
 if (!info->ifindex)
 return -ENODEV;
-return PTR_ERR(res);
+return res;
 }
 ns_inode = ns_path.dentry->d_inode;
......
@@ -7495,7 +7495,7 @@ static void perf_fill_ns_link_info(struct perf_ns_link_info *ns_link_info,
 {
 struct path ns_path;
 struct inode *ns_inode;
-void *error;
+int error;
 error = ns_get_path(&ns_path, task, ns_ops);
 if (!error) {
......
@@ -2573,16 +2573,18 @@ static const char *policy_get_link(struct dentry *dentry,
 {
 struct aa_ns *ns;
 struct path path;
+int error;
 if (!dentry)
 return ERR_PTR(-ECHILD);
 ns = aa_get_current_ns();
 path.mnt = mntget(aafs_mnt);
 path.dentry = dget(ns_dir(ns));
-nd_jump_link(&path);
+error = nd_jump_link(&path);
 aa_put_ns(ns);
-return NULL;
+return ERR_PTR(error);
 }
 static int policy_readlink(struct dentry *dentry, char __user *buffer,
......
@@ -40,6 +40,7 @@ TARGETS += powerpc
 TARGETS += proc
 TARGETS += pstore
 TARGETS += ptrace
+TARGETS += openat2
 TARGETS += rseq
 TARGETS += rtc
 TARGETS += seccomp
......
# SPDX-License-Identifier: GPL-2.0-or-later
CFLAGS += -Wall -O2 -g -fsanitize=address -fsanitize=undefined
TEST_GEN_PROGS := openat2_test resolve_test rename_attack_test
include ../lib.mk
$(TEST_GEN_PROGS): helpers.c
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Author: Aleksa Sarai <cyphar@cyphar.com>
* Copyright (C) 2018-2019 SUSE LLC.
*/
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <syscall.h>
#include <limits.h>
#include "helpers.h"
bool needs_openat2(const struct open_how *how)
{
return how->resolve != 0;
}
int raw_openat2(int dfd, const char *path, void *how, size_t size)
{
int ret = syscall(__NR_openat2, dfd, path, how, size);
return ret >= 0 ? ret : -errno;
}
int sys_openat2(int dfd, const char *path, struct open_how *how)
{
return raw_openat2(dfd, path, how, sizeof(*how));
}
int sys_openat(int dfd, const char *path, struct open_how *how)
{
int ret = openat(dfd, path, how->flags, how->mode);
return ret >= 0 ? ret : -errno;
}
int sys_renameat2(int olddirfd, const char *oldpath,
int newdirfd, const char *newpath, unsigned int flags)
{
int ret = syscall(__NR_renameat2, olddirfd, oldpath,
newdirfd, newpath, flags);
return ret >= 0 ? ret : -errno;
}
int touchat(int dfd, const char *path)
{
int fd = openat(dfd, path, O_CREAT, 0700); /* O_CREAT requires a mode argument */
if (fd >= 0)
close(fd);
return fd;
}
char *fdreadlink(int fd)
{
char *target, *tmp;
E_asprintf(&tmp, "/proc/self/fd/%d", fd);
target = malloc(PATH_MAX);
if (!target)
ksft_exit_fail_msg("fdreadlink: malloc failed\n");
memset(target, 0, PATH_MAX);
E_readlink(tmp, target, PATH_MAX);
free(tmp);
return target;
}
bool fdequal(int fd, int dfd, const char *path)
{
char *fdpath, *dfdpath, *other;
bool cmp;
fdpath = fdreadlink(fd);
dfdpath = fdreadlink(dfd);
if (!path)
E_asprintf(&other, "%s", dfdpath);
else if (*path == '/')
E_asprintf(&other, "%s", path);
else
E_asprintf(&other, "%s/%s", dfdpath, path);
cmp = !strcmp(fdpath, other);
free(fdpath);
free(dfdpath);
free(other);
return cmp;
}
bool openat2_supported = false;
void __attribute__((constructor)) init(void)
{
struct open_how how = {};
int fd;
BUILD_BUG_ON(sizeof(struct open_how) != OPEN_HOW_SIZE_VER0);
/* Check openat2(2) support. */
fd = sys_openat2(AT_FDCWD, ".", &how);
openat2_supported = (fd >= 0);
if (fd >= 0)
close(fd);
}
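fdequal() decides whether two descriptors refer to the same place by comparing the strings that /proc/self/fd/N resolves to. A different technique, sketched here as a hypothetical alternative (same_inode() is not part of the selftests), compares the (st_dev, st_ino) pairs returned by fstat(2). It does not depend on procfs being mounted, but it also cannot tell apart two bind-mounted views of the same directory, which a path comparison can.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: true if @fd1 and @fd2 refer to the same inode.
 * fstat(2) also works on O_PATH descriptors (Linux 3.6 and later).
 * Any fstat() failure (e.g. a bad fd) is reported as "not equal".
 */
static bool same_inode(int fd1, int fd2)
{
    struct stat a, b;

    if (fstat(fd1, &a) < 0 || fstat(fd2, &b) < 0)
        return false;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```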
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Author: Aleksa Sarai <cyphar@cyphar.com>
* Copyright (C) 2018-2019 SUSE LLC.
*/
#ifndef __RESOLVEAT_H__
#define __RESOLVEAT_H__
#define _GNU_SOURCE
#include <stdint.h>
#include <errno.h>
#include <linux/types.h>
#include "../kselftest.h"
#define ARRAY_LEN(X) (sizeof (X) / sizeof (*(X)))
#define BUILD_BUG_ON(e) ((void)(sizeof(struct { int:(-!!(e)); })))
#ifndef SYS_openat2
#ifndef __NR_openat2
#define __NR_openat2 437
#endif /* __NR_openat2 */
#define SYS_openat2 __NR_openat2
#endif /* SYS_openat2 */
/*
* Arguments for how openat2(2) should open the target path. If @resolve is
* zero, then openat2(2) operates very similarly to openat(2).
*
* However, unlike openat(2), unknown bits in @flags result in -EINVAL rather
* than being silently ignored. @mode must be zero unless one of {O_CREAT,
* O_TMPFILE} is set.
*
* @flags: O_* flags.
* @mode: O_CREAT/O_TMPFILE file mode.
* @resolve: RESOLVE_* flags.
*/
struct open_how {
__u64 flags;
__u64 mode;
__u64 resolve;
};
#define OPEN_HOW_SIZE_VER0 24 /* sizeof first published struct */
#define OPEN_HOW_SIZE_LATEST OPEN_HOW_SIZE_VER0
bool needs_openat2(const struct open_how *how);
#ifndef RESOLVE_IN_ROOT
/* how->resolve flags for openat2(2). */
#define RESOLVE_NO_XDEV 0x01 /* Block mount-point crossings
(includes bind-mounts). */
#define RESOLVE_NO_MAGICLINKS 0x02 /* Block traversal through procfs-style
"magic-links". */
#define RESOLVE_NO_SYMLINKS 0x04 /* Block traversal through all symlinks
(implies RESOLVE_NO_MAGICLINKS). */
#define RESOLVE_BENEATH 0x08 /* Block "lexical" trickery like
"..", symlinks, and absolute
paths which escape the dirfd. */
#define RESOLVE_IN_ROOT 0x10 /* Make all jumps to "/" and ".."
be scoped inside the dirfd
(similar to chroot(2)). */
#endif /* RESOLVE_IN_ROOT */
#define E_func(func, ...) \
do { \
if (func(__VA_ARGS__) < 0) \
ksft_exit_fail_msg("%s:%d %s failed\n", \
__FILE__, __LINE__, #func);\
} while (0)
#define E_asprintf(...) E_func(asprintf, __VA_ARGS__)
#define E_chmod(...) E_func(chmod, __VA_ARGS__)
#define E_dup2(...) E_func(dup2, __VA_ARGS__)
#define E_fchdir(...) E_func(fchdir, __VA_ARGS__)
#define E_fstatat(...) E_func(fstatat, __VA_ARGS__)
#define E_kill(...) E_func(kill, __VA_ARGS__)
#define E_mkdirat(...) E_func(mkdirat, __VA_ARGS__)
#define E_mount(...) E_func(mount, __VA_ARGS__)
#define E_prctl(...) E_func(prctl, __VA_ARGS__)
#define E_readlink(...) E_func(readlink, __VA_ARGS__)
#define E_setresuid(...) E_func(setresuid, __VA_ARGS__)
#define E_symlinkat(...) E_func(symlinkat, __VA_ARGS__)
#define E_touchat(...) E_func(touchat, __VA_ARGS__)
#define E_unshare(...) E_func(unshare, __VA_ARGS__)
#define E_assert(expr, msg, ...) \
do { \
if (!(expr)) \
ksft_exit_fail_msg("ASSERT(%s:%d) failed (%s): " msg "\n", \
__FILE__, __LINE__, #expr, ##__VA_ARGS__); \
} while (0)
int raw_openat2(int dfd, const char *path, void *how, size_t size);
int sys_openat2(int dfd, const char *path, struct open_how *how);
int sys_openat(int dfd, const char *path, struct open_how *how);
int sys_renameat2(int olddirfd, const char *oldpath,
int newdirfd, const char *newpath, unsigned int flags);
int touchat(int dfd, const char *path);
char *fdreadlink(int fd);
bool fdequal(int fd, int dfd, const char *path);
extern bool openat2_supported;
#endif /* __RESOLVEAT_H__ */
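BUILD_BUG_ON() in helpers.h relies on an unnamed bit-field of negative width being a compile-time error. With a C11 compiler the same layout check can be spelled with _Static_assert, which also carries a readable message. A hypothetical sketch against a local copy of the 24-byte version-0 layout (struct open_how_copy is an illustrative name):

```c
#include <stddef.h>
#include <stdint.h>

/* Local copy of the version-0 struct open_how layout from this diff. */
struct open_how_copy {
    uint64_t flags;
    uint64_t mode;
    uint64_t resolve;
};

#define OPEN_HOW_SIZE_VER0 24 /* sizeof first published struct */

/* Both checks fail at compile time if the layout drifts. */
_Static_assert(sizeof(struct open_how_copy) == OPEN_HOW_SIZE_VER0,
               "struct open_how must stay 24 bytes for VER0");
_Static_assert(offsetof(struct open_how_copy, resolve) == 16,
               "resolve must be the third 64-bit field");
```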
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Author: Aleksa Sarai <cyphar@cyphar.com>
* Copyright (C) 2018-2019 SUSE LLC.
*/
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mount.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include "../kselftest.h"
#include "helpers.h"
/*
* O_LARGEFILE is set to 0 by glibc.
* XXX: This is wrong on {mips, parisc, powerpc, sparc}.
*/
#undef O_LARGEFILE
#define O_LARGEFILE 0x8000
struct open_how_ext {
struct open_how inner;
uint32_t extra1;
char pad1[128];
uint32_t extra2;
char pad2[128];
uint32_t extra3;
};
struct struct_test {
const char *name;
struct open_how_ext arg;
size_t size;
int err;
};
#define NUM_OPENAT2_STRUCT_TESTS 7
#define NUM_OPENAT2_STRUCT_VARIATIONS 13
void test_openat2_struct(void)
{
int misalignments[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 17, 87 };
struct struct_test tests[] = {
/* Normal struct. */
{ .name = "normal struct",
.arg.inner.flags = O_RDONLY,
.size = sizeof(struct open_how) },
/* Bigger struct, with zeroed out end. */
{ .name = "bigger struct (zeroed out)",
.arg.inner.flags = O_RDONLY,
.size = sizeof(struct open_how_ext) },
/* TODO: Once expanded, check zero-padding. */
/* Smaller than version-0 struct. */
{ .name = "zero-sized 'struct'",
.arg.inner.flags = O_RDONLY, .size = 0, .err = -EINVAL },
{ .name = "smaller-than-v0 struct",
.arg.inner.flags = O_RDONLY,
.size = OPEN_HOW_SIZE_VER0 - 1, .err = -EINVAL },
/* Bigger struct, with non-zero trailing bytes. */
{ .name = "bigger struct (non-zero data in first 'future field')",
.arg.inner.flags = O_RDONLY, .arg.extra1 = 0xdeadbeef,
.size = sizeof(struct open_how_ext), .err = -E2BIG },
{ .name = "bigger struct (non-zero data in middle of 'future fields')",
.arg.inner.flags = O_RDONLY, .arg.extra2 = 0xfeedcafe,
.size = sizeof(struct open_how_ext), .err = -E2BIG },
{ .name = "bigger struct (non-zero data at end of 'future fields')",
.arg.inner.flags = O_RDONLY, .arg.extra3 = 0xabad1dea,
.size = sizeof(struct open_how_ext), .err = -E2BIG },
};
BUILD_BUG_ON(ARRAY_LEN(misalignments) != NUM_OPENAT2_STRUCT_VARIATIONS);
BUILD_BUG_ON(ARRAY_LEN(tests) != NUM_OPENAT2_STRUCT_TESTS);
for (int i = 0; i < ARRAY_LEN(tests); i++) {
struct struct_test *test = &tests[i];
struct open_how_ext how_ext = test->arg;
for (int j = 0; j < ARRAY_LEN(misalignments); j++) {
int fd, misalign = misalignments[j];
char *fdpath = NULL;
bool failed;
void (*resultfn)(const char *msg, ...) = ksft_test_result_pass;
void *copy = NULL, *how_copy = &how_ext;
if (!openat2_supported) {
ksft_print_msg("openat2(2) unsupported\n");
resultfn = ksft_test_result_skip;
goto skip;
}
if (misalign) {
/*
* Explicitly misalign the structure by copying it at the given
* (mis)alignment offset. The leading padding bytes are set to
* non-zero values to make sure that non-zero bytes outside the
* struct aren't checked.
*
* This effectively checks that check_zeroed_user() works.
*/
copy = malloc(misalign + sizeof(how_ext));
if (!copy)
ksft_exit_fail_msg("test_openat2_struct: malloc failed\n");
how_copy = copy + misalign;
memset(copy, 0xff, misalign);
memcpy(how_copy, &how_ext, sizeof(how_ext));
}
fd = raw_openat2(AT_FDCWD, ".", how_copy, test->size);
if (test->err >= 0)
failed = (fd < 0);
else
failed = (fd != test->err);
if (fd >= 0) {
fdpath = fdreadlink(fd);
close(fd);
}
if (failed) {
resultfn = ksft_test_result_fail;
ksft_print_msg("openat2 unexpectedly returned ");
if (fdpath)
ksft_print_msg("%d['%s']\n", fd, fdpath);
else
ksft_print_msg("%d (%s)\n", fd, strerror(-fd));
}
skip:
if (test->err >= 0)
resultfn("openat2 with %s argument [misalign=%d] succeeds\n",
test->name, misalign);
else
resultfn("openat2 with %s argument [misalign=%d] fails with %d (%s)\n",
test->name, misalign, test->err,
strerror(-test->err));
free(copy);
free(fdpath);
fflush(stdout);
}
}
}
struct flag_test {
const char *name;
struct open_how how;
int err;
};
#define NUM_OPENAT2_FLAG_TESTS 23
void test_openat2_flags(void)
{
struct flag_test tests[] = {
/* O_TMPFILE is incompatible with O_PATH and O_CREAT. */
{ .name = "incompatible flags (O_TMPFILE | O_PATH)",
.how.flags = O_TMPFILE | O_PATH | O_RDWR, .err = -EINVAL },
{ .name = "incompatible flags (O_TMPFILE | O_CREAT)",
.how.flags = O_TMPFILE | O_CREAT | O_RDWR, .err = -EINVAL },
/* O_PATH only permits certain other flags to be set ... */
{ .name = "compatible flags (O_PATH | O_CLOEXEC)",
.how.flags = O_PATH | O_CLOEXEC },
{ .name = "compatible flags (O_PATH | O_DIRECTORY)",
.how.flags = O_PATH | O_DIRECTORY },
{ .name = "compatible flags (O_PATH | O_NOFOLLOW)",
.how.flags = O_PATH | O_NOFOLLOW },
/* ... and others are absolutely not permitted. */
{ .name = "incompatible flags (O_PATH | O_RDWR)",
.how.flags = O_PATH | O_RDWR, .err = -EINVAL },
{ .name = "incompatible flags (O_PATH | O_CREAT)",
.how.flags = O_PATH | O_CREAT, .err = -EINVAL },
{ .name = "incompatible flags (O_PATH | O_EXCL)",
.how.flags = O_PATH | O_EXCL, .err = -EINVAL },
{ .name = "incompatible flags (O_PATH | O_NOCTTY)",
.how.flags = O_PATH | O_NOCTTY, .err = -EINVAL },
{ .name = "incompatible flags (O_PATH | O_DIRECT)",
.how.flags = O_PATH | O_DIRECT, .err = -EINVAL },
{ .name = "incompatible flags (O_PATH | O_LARGEFILE)",
.how.flags = O_PATH | O_LARGEFILE, .err = -EINVAL },
/* ->mode must only be set with O_{CREAT,TMPFILE}. */
{ .name = "non-zero how.mode and O_RDONLY",
.how.flags = O_RDONLY, .how.mode = 0600, .err = -EINVAL },
{ .name = "non-zero how.mode and O_PATH",
.how.flags = O_PATH, .how.mode = 0600, .err = -EINVAL },
{ .name = "valid how.mode and O_CREAT",
.how.flags = O_CREAT, .how.mode = 0600 },
{ .name = "valid how.mode and O_TMPFILE",
.how.flags = O_TMPFILE | O_RDWR, .how.mode = 0600 },
/* ->mode must only contain permission bits (07777). */
{ .name = "invalid how.mode and O_CREAT",
.how.flags = O_CREAT,
.how.mode = 0xFFFF, .err = -EINVAL },
{ .name = "invalid (very large) how.mode and O_CREAT",
.how.flags = O_CREAT,
.how.mode = 0xC000000000000000ULL, .err = -EINVAL },
{ .name = "invalid how.mode and O_TMPFILE",
.how.flags = O_TMPFILE | O_RDWR,
.how.mode = 0x1337, .err = -EINVAL },
{ .name = "invalid (very large) how.mode and O_TMPFILE",
.how.flags = O_TMPFILE | O_RDWR,
.how.mode = 0x0000A00000000000ULL, .err = -EINVAL },
/* ->resolve must only contain RESOLVE_* flags. */
{ .name = "invalid how.resolve and O_RDONLY",
.how.flags = O_RDONLY,
.how.resolve = 0x1337, .err = -EINVAL },
{ .name = "invalid how.resolve and O_CREAT",
.how.flags = O_CREAT,
.how.resolve = 0x1337, .err = -EINVAL },
{ .name = "invalid how.resolve and O_TMPFILE",
.how.flags = O_TMPFILE | O_RDWR,
.how.resolve = 0x1337, .err = -EINVAL },
{ .name = "invalid how.resolve and O_PATH",
.how.flags = O_PATH,
.how.resolve = 0x1337, .err = -EINVAL },
};
BUILD_BUG_ON(ARRAY_LEN(tests) != NUM_OPENAT2_FLAG_TESTS);
for (int i = 0; i < ARRAY_LEN(tests); i++) {
int fd, fdflags = -1;
char *path, *fdpath = NULL;
bool failed = false;
struct flag_test *test = &tests[i];
void (*resultfn)(const char *msg, ...) = ksft_test_result_pass;
if (!openat2_supported) {
ksft_print_msg("openat2(2) unsupported\n");
resultfn = ksft_test_result_skip;
goto skip;
}
path = (test->how.flags & O_CREAT) ? "/tmp/ksft.openat2_tmpfile" : ".";
unlink(path);
fd = sys_openat2(AT_FDCWD, path, &test->how);
if (test->err >= 0)
failed = (fd < 0);
else
failed = (fd != test->err);
if (fd >= 0) {
int otherflags;
fdpath = fdreadlink(fd);
fdflags = fcntl(fd, F_GETFL);
otherflags = fcntl(fd, F_GETFD);
close(fd);
E_assert(fdflags >= 0, "fcntl F_GETFL of new fd");
E_assert(otherflags >= 0, "fcntl F_GETFD of new fd");
/* O_CLOEXEC isn't shown in F_GETFL. */
if (otherflags & FD_CLOEXEC)
fdflags |= O_CLOEXEC;
/* O_CREAT is hidden from F_GETFL. */
if (test->how.flags & O_CREAT)
fdflags |= O_CREAT;
if (!(test->how.flags & O_LARGEFILE))
fdflags &= ~O_LARGEFILE;
failed |= (fdflags != test->how.flags);
}
if (failed) {
resultfn = ksft_test_result_fail;
ksft_print_msg("openat2 unexpectedly returned ");
if (fdpath)
ksft_print_msg("%d['%s'] with %X (!= %X)\n",
fd, fdpath, fdflags,
test->how.flags);
else
ksft_print_msg("%d (%s)\n", fd, strerror(-fd));
}
skip:
if (test->err >= 0)
resultfn("openat2 with %s succeeds\n", test->name);
else
resultfn("openat2 with %s fails with %d (%s)\n",
test->name, test->err, strerror(-test->err));
free(fdpath);
fflush(stdout);
}
}
#define NUM_TESTS (NUM_OPENAT2_STRUCT_VARIATIONS * NUM_OPENAT2_STRUCT_TESTS + \
NUM_OPENAT2_FLAG_TESTS)
int main(int argc, char **argv)
{
ksft_print_header();
ksft_set_plan(NUM_TESTS);
test_openat2_struct();
test_openat2_flags();
if (ksft_get_fail_cnt() + ksft_get_error_cnt() > 0)
ksft_exit_fail();
else
ksft_exit_pass();
}
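test_openat2_flags() encodes which (flags, mode, resolve) combinations openat2(2) must refuse. Below is a hypothetical, simplified userspace model of those rules; model_check_open_how() is illustrative, not kernel code. Assumptions: the permitted mode mask is 07777, O_PATH may only be combined with O_DIRECTORY, O_NOFOLLOW and O_CLOEXEC, and O_TMPFILE may not be combined with O_CREAT. The real kernel also applies checks this sketch omits, such as requiring write access for O_TMPFILE.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_VALID_RESOLVE 0x1fULL /* RESOLVE_NO_XDEV .. RESOLVE_IN_ROOT */
#define MODEL_O_PATH_OK (O_PATH | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC)

/* Returns 0 if the combination would be accepted, -EINVAL otherwise. */
static int model_check_open_how(uint64_t flags, uint64_t mode, uint64_t resolve)
{
    /* O_TMPFILE includes the O_DIRECTORY bit, so require all of its bits. */
    bool tmpfile = (flags & O_TMPFILE) == O_TMPFILE;
    bool creates = (flags & O_CREAT) || tmpfile;

    if (resolve & ~MODEL_VALID_RESOLVE)
        return -EINVAL; /* unknown RESOLVE_* bits */
    if (creates ? (mode & ~07777ULL) != 0 : mode != 0)
        return -EINVAL; /* mode only with O_CREAT/O_TMPFILE, 07777 max */
    if (tmpfile && (flags & O_CREAT))
        return -EINVAL; /* O_TMPFILE | O_CREAT */
    if ((flags & O_PATH) && (flags & ~(uint64_t)MODEL_O_PATH_OK))
        return -EINVAL; /* O_PATH with a disallowed flag */
    return 0;
}
```

The selftest's O_PATH | O_LARGEFILE case depends on the redefined value 0x8000; with glibc's O_LARGEFILE of 0 on 64-bit targets, this model would accept that combination.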
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Author: Aleksa Sarai <cyphar@cyphar.com>
* Copyright (C) 2018-2019 SUSE LLC.
*/
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mount.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <syscall.h>
#include <limits.h>
#include <unistd.h>
#include "../kselftest.h"
#include "helpers.h"
/* Construct a test directory with the following structure:
*
* root/
* |-- a/
* | `-- c/
* `-- b/
*/
int setup_testdir(void)
{
int dfd;
char dirname[] = "/tmp/ksft-openat2-rename-attack.XXXXXX";
/* Make the top-level directory. */
if (!mkdtemp(dirname))
ksft_exit_fail_msg("setup_testdir: failed to create tmpdir\n");
dfd = open(dirname, O_PATH | O_DIRECTORY);
if (dfd < 0)
ksft_exit_fail_msg("setup_testdir: failed to open tmpdir\n");
E_mkdirat(dfd, "a", 0755);
E_mkdirat(dfd, "b", 0755);
E_mkdirat(dfd, "a/c", 0755);
return dfd;
}
/* Swap @dirfd/@a and @dirfd/@b constantly. Parent must kill this process. */
pid_t spawn_attack(int dirfd, char *a, char *b)
{
pid_t child = fork();
if (child != 0)
return child;
/* If the parent (the test process) dies, kill ourselves too. */
E_prctl(PR_SET_PDEATHSIG, SIGKILL);
/* Swap @a and @b. */
for (;;)
renameat2(dirfd, a, dirfd, b, RENAME_EXCHANGE);
exit(1);
}
#define NUM_RENAME_TESTS 2
#define ROUNDS 400000
const char *flagname(int resolve)
{
switch (resolve) {
case RESOLVE_IN_ROOT:
return "RESOLVE_IN_ROOT";
case RESOLVE_BENEATH:
return "RESOLVE_BENEATH";
}
return "(unknown)";
}
void test_rename_attack(int resolve)
{
int dfd, afd;
pid_t child;
void (*resultfn)(const char *msg, ...) = ksft_test_result_pass;
int escapes = 0, other_errs = 0, exdevs = 0, eagains = 0, successes = 0;
struct open_how how = {
.flags = O_PATH,
.resolve = resolve,
};
if (!openat2_supported) {
how.resolve = 0;
ksft_print_msg("openat2(2) unsupported -- using openat(2) instead\n");
}
dfd = setup_testdir();
afd = openat(dfd, "a", O_PATH);
if (afd < 0)
ksft_exit_fail_msg("test_rename_attack: failed to open 'a'\n");
child = spawn_attack(dfd, "a/c", "b");
for (int i = 0; i < ROUNDS; i++) {
int fd;
char *victim_path = "c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../..";
if (openat2_supported)
fd = sys_openat2(afd, victim_path, &how);
else
fd = sys_openat(afd, victim_path, &how);
if (fd < 0) {
if (fd == -EAGAIN)
eagains++;
else if (fd == -EXDEV)
exdevs++;
else if (fd == -ENOENT)
escapes++; /* escaped outside and got ENOENT... */
else
other_errs++; /* unexpected error */
} else {
if (fdequal(fd, afd, NULL))
successes++;
else
escapes++; /* we got an unexpected fd */
}
close(fd);
}
if (escapes > 0)
resultfn = ksft_test_result_fail;
ksft_print_msg("non-escapes: EAGAIN=%d EXDEV=%d E<other>=%d success=%d\n",
eagains, exdevs, other_errs, successes);
resultfn("rename attack with %s (%d runs, got %d escapes)\n",
flagname(resolve), ROUNDS, escapes);
/* Should be killed anyway, but might as well make sure. */
E_kill(child, SIGKILL);
}
#define NUM_TESTS NUM_RENAME_TESTS
int main(int argc, char **argv)
{
ksft_print_header();
ksft_set_plan(NUM_TESTS);
test_rename_attack(RESOLVE_BENEATH);
test_rename_attack(RESOLVE_IN_ROOT);
if (ksft_get_fail_cnt() + ksft_get_error_cnt() > 0)
ksft_exit_fail();
else
ksft_exit_pass();
}
This diff is collapsed.