Commit 6aee4bad authored by Linus Torvalds's avatar Linus Torvalds

Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull openat2 support from Al Viro:
 "This is the openat2() series from Aleksa Sarai.

  I'm afraid that the rest of namei stuff will have to wait - it got
  zero review the last time I'd posted #work.namei, and there had been a
  leak in the posted series I'd caught only last weekend. I was going to
  repost it on Monday, but the window opened and the odds of getting any
  review during that... Oh, well.

  Anyway, openat2 part should be ready; that _did_ get sane amount of
  review and public testing, so here it comes"

From Aleksa's description of the series:
 "For a very long time, extending openat(2) with new features has been
  incredibly frustrating. This stems from the fact that openat(2) is
  possibly the most famous counter-example to the mantra "don't silently
  accept garbage from userspace" -- it doesn't check whether unknown
  flags are present[1].

  This means that (generally) the addition of new flags to openat(2) has
  been fraught with backwards-compatibility issues (O_TMPFILE has to be
  defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
  kernels gave errors, since it's insecure to silently ignore the
  flag[2]). All new security-related flags therefore have a tough road
  to being added to openat(2).

  Furthermore, the need for some sort of control over VFS's path
  resolution (to avoid malicious paths resulting in inadvertent
  breakouts) has been a very long-standing desire of many userspace
  applications.

  This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset
  (which was a variant of David Drysdale's O_BENEATH patchset[4] which
  was a spin-off of the Capsicum project[5]) with a few additions and
  changes made based on the previous discussion within [6] as well as
  others I felt were useful.

  In line with the conclusions of the original discussion of
  AT_NO_JUMPS, the flag has been split up into separate flags. However,
  instead of being an openat(2) flag it is provided through a new
  syscall openat2(2) which provides several other improvements to the
  openat(2) interface (see the patch description for more details). The
  following new LOOKUP_* flags are added:

  LOOKUP_NO_XDEV:

     Blocks all mountpoint crossings (upwards, downwards, or through
     absolute links). Absolute pathnames alone in openat(2) do not
     trigger this. Magic-link traversal which implies a vfsmount jump is
     also blocked (though magic-link jumps on the same vfsmount are
     permitted).

  LOOKUP_NO_MAGICLINKS:

     Blocks resolution through /proc/$pid/fd-style links. This is done
     by blocking the usage of nd_jump_link() during resolution in a
     filesystem. The term "magic-links" is used to match with the only
     reference to these links in Documentation/, but I'm happy to change
     the name.

     It should be noted that this is different to the scope of
     ~LOOKUP_FOLLOW in that it applies to all path components. However,
     you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it
     will *not* fail (assuming that no parent component was a
     magic-link), and you will have an fd for the magic-link.

     In order to correctly detect magic-links, the introduction of a new
     LOOKUP_MAGICLINK_JUMPED state flag was required.

  LOOKUP_BENEATH:

     Disallows escapes to outside the starting dirfd's
     tree, using techniques such as ".." or absolute links. Absolute
     paths in openat(2) are also disallowed.

     Conceptually this flag is to ensure you "stay below" a certain
     point in the filesystem tree -- but this requires some additional
     to protect against various races that would allow escape using
     "..".

     Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it
     can trivially beam you around the filesystem (breaking the
     protection). In future, there might be similar safety checks done
     as in LOOKUP_IN_ROOT, but that requires more discussion.

  In addition, two new flags are added that expand on the above ideas:

  LOOKUP_NO_SYMLINKS:

     Does what it says on the tin. No symlink resolution is allowed at
     all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this
     can still be used with NOFOLLOW to open an fd for the symlink as
     long as no parent path had a symlink component.

  LOOKUP_IN_ROOT:

     This is an extension of LOOKUP_BENEATH that, rather than blocking
     attempts to move past the root, forces all such movements to be
     scoped to the starting point. This provides chroot(2)-like
     protection but without the cost of a chroot(2) for each filesystem
     operation, as well as being safe against race attacks that
     chroot(2) is not.

     If a race is detected (as with LOOKUP_BENEATH) then an error is
     generated, and similar to LOOKUP_BENEATH it is not permitted to
     cross magic-links with LOOKUP_IN_ROOT.

     The primary need for this is from container runtimes, which
     currently need to do symlink scoping in userspace[7] when opening
     paths in a potentially malicious container.

     There is a long list of CVEs that could have bene mitigated by
     having RESOLVE_THIS_ROOT (such as CVE-2017-1002101,
     CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a
     few).

  In order to make all of the above more usable, I'm working on
  libpathrs[8] which is a C-friendly library for safe path resolution.
  It features a userspace-emulated backend if the kernel doesn't support
  openat2(2). Hopefully we can get userspace to switch to using it, and
  thus get openat2(2) support for free once it's ready.

  Future work would include implementing things like
  RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow
  programs to be sure they don't hit DoSes though stale NFS handles)"

* 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Documentation: path-lookup: include new LOOKUP flags
  selftests: add openat2(2) selftests
  open: introduce openat2(2) syscall
  namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
  namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
  namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
  namei: LOOKUP_NO_XDEV: block mountpoint crossing
  namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
  namei: LOOKUP_NO_SYMLINKS: block symlink resolution
  namei: allow set_root() to produce errors
  namei: allow nd_jump_link() to produce errors
  nsfs: clean-up ns_get_path() signature to return int
  namei: only return -ECHILD from follow_dotdot_rcu()
parents 15d66324 b55eef87
......@@ -3302,7 +3302,9 @@ S: France
N: Aleksa Sarai
E: cyphar@cyphar.com
W: https://www.cyphar.com/
D: `pids` cgroup subsystem
D: /sys/fs/cgroup/pids
D: openat2(2)
S: Sydney, Australia
N: Dipankar Sarma
E: dipankar@in.ibm.com
......
......@@ -13,6 +13,7 @@ It has subsequently been updated to reflect changes in the kernel
including:
- per-directory parallel name lookup.
- ``openat2()`` resolution restriction flags.
Introduction to pathname lookup
===============================
......@@ -235,6 +236,13 @@ renamed. If ``d_lookup`` finds that a rename happened while it
unsuccessfully scanned a chain in the hash table, it simply tries
again.
``rename_lock`` is also used to detect and defend against potential attacks
against ``LOOKUP_BENEATH`` and ``LOOKUP_IN_ROOT`` when resolving ".." (where
the parent directory is moved outside the root, bypassing the ``path_equal()``
check). If ``rename_lock`` is updated during the lookup and the path encounters
a "..", a potential attack occurred and ``handle_dots()`` will bail out with
``-EAGAIN``.
inode->i_rwsem
~~~~~~~~~~~~~~
......@@ -348,6 +356,13 @@ any changes to any mount points while stepping up. This locking is
needed to stabilize the link to the mounted-on dentry, which the
refcount on the mount itself doesn't ensure.
``mount_lock`` is also used to detect and defend against potential attacks
against ``LOOKUP_BENEATH`` and ``LOOKUP_IN_ROOT`` when resolving ".." (where
the parent directory is moved outside the root, bypassing the ``path_equal()``
check). If ``mount_lock`` is updated during the lookup and the path encounters
a "..", a potential attack occurred and ``handle_dots()`` will bail out with
``-EAGAIN``.
RCU
~~~
......@@ -405,6 +420,10 @@ is requested. Keeping a reference in the ``nameidata`` ensures that
only one root is in effect for the entire path walk, even if it races
with a ``chroot()`` system call.
It should be noted that in the case of ``LOOKUP_IN_ROOT`` or
``LOOKUP_BENEATH``, the effective root becomes the directory file descriptor
passed to ``openat2()`` (which exposes these ``LOOKUP_`` flags).
The root is needed when either of two conditions holds: (1) either the
pathname or a symbolic link starts with a "'/'", or (2) a "``..``"
component is being handled, since "``..``" from the root must always stay
......@@ -1149,7 +1168,7 @@ so ``NULL`` is returned to indicate that the symlink can be released and
the stack frame discarded.
The other case involves things in ``/proc`` that look like symlinks but
aren't really::
aren't really (and are therefore commonly referred to as "magic-links")::
$ ls -l /proc/self/fd/1
lrwx------ 1 neilb neilb 64 Jun 13 10:19 /proc/self/fd/1 -> /dev/pts/4
......@@ -1286,7 +1305,9 @@ A few flags
A suitable way to wrap up this tour of pathname walking is to list
the various flags that can be stored in the ``nameidata`` to guide the
lookup process. Many of these are only meaningful on the final
component, others reflect the current state of the pathname lookup.
component, others reflect the current state of the pathname lookup, and some
apply restrictions to all path components encountered in the path lookup.
And then there is ``LOOKUP_EMPTY``, which doesn't fit conceptually with
the others. If this is not set, an empty pathname causes an error
very early on. If it is set, empty pathnames are not considered to be
......@@ -1310,13 +1331,48 @@ longer needed.
``LOOKUP_JUMPED`` means that the current dentry was chosen not because
it had the right name but for some other reason. This happens when
following "``..``", following a symlink to ``/``, crossing a mount point
or accessing a "``/proc/$PID/fd/$FD``" symlink. In this case the
filesystem has not been asked to revalidate the name (with
``d_revalidate()``). In such cases the inode may still need to be
revalidated, so ``d_op->d_weak_revalidate()`` is called if
or accessing a "``/proc/$PID/fd/$FD``" symlink (also known as a "magic
link"). In this case the filesystem has not been asked to revalidate the
name (with ``d_revalidate()``). In such cases the inode may still need
to be revalidated, so ``d_op->d_weak_revalidate()`` is called if
``LOOKUP_JUMPED`` is set when the look completes - which may be at the
final component or, when creating, unlinking, or renaming, at the penultimate component.
Resolution-restriction flags
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In order to allow userspace to protect itself against certain race conditions
and attack scenarios involving changing path components, a series of flags are
available which apply restrictions to all path components encountered during
path lookup. These flags are exposed through ``openat2()``'s ``resolve`` field.
``LOOKUP_NO_SYMLINKS`` blocks all symlink traversals (including magic-links).
This is distinctly different from ``LOOKUP_FOLLOW``, because the latter only
relates to restricting the following of trailing symlinks.
``LOOKUP_NO_MAGICLINKS`` blocks all magic-link traversals. Filesystems must
ensure that they return errors from ``nd_jump_link()``, because that is how
``LOOKUP_NO_MAGICLINKS`` and other magic-link restrictions are implemented.
``LOOKUP_NO_XDEV`` blocks all ``vfsmount`` traversals (this includes both
bind-mounts and ordinary mounts). Note that the ``vfsmount`` which contains the
lookup is determined by the first mountpoint the path lookup reaches --
absolute paths start with the ``vfsmount`` of ``/``, and relative paths start
with the ``dfd``'s ``vfsmount``. Magic-links are only permitted if the
``vfsmount`` of the path is unchanged.
``LOOKUP_BENEATH`` blocks any path components which resolve outside the
starting point of the resolution. This is done by blocking ``nd_jump_root()``
as well as blocking ".." if it would jump outside the starting point.
``rename_lock`` and ``mount_lock`` are used to detect attacks against the
resolution of "..". Magic-links are also blocked.
``LOOKUP_IN_ROOT`` resolves all path components as though the starting point
were the filesystem root. ``nd_jump_root()`` brings the resolution back to to
the starting point, and ".." at the starting point will act as a no-op. As with
``LOOKUP_BENEATH``, ``rename_lock`` and ``mount_lock`` are used to detect
attacks against ".." resolution. Magic-links are also blocked.
Final-component flags
~~~~~~~~~~~~~~~~~~~~~
......
......@@ -6458,6 +6458,7 @@ F: fs/*
F: include/linux/fs.h
F: include/linux/fs_types.h
F: include/uapi/linux/fs.h
F: include/uapi/linux/openat2.h
FINTEK F75375S HARDWARE MONITOR AND FAN CONTROLLER DRIVER
M: Riku Voipio <riku.voipio@iki.fi>
......
......@@ -475,3 +475,4 @@
543 common fspick sys_fspick
544 common pidfd_open sys_pidfd_open
# 545 reserved for clone3
547 common openat2 sys_openat2
......@@ -449,3 +449,4 @@
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
435 common clone3 sys_clone3
437 common openat2 sys_openat2
......@@ -38,7 +38,7 @@
#define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5)
#define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)
#define __NR_compat_syscalls 436
#define __NR_compat_syscalls 438
#endif
#define __ARCH_WANT_SYS_CLONE
......
......@@ -879,6 +879,8 @@ __SYSCALL(__NR_fspick, sys_fspick)
__SYSCALL(__NR_pidfd_open, sys_pidfd_open)
#define __NR_clone3 435
__SYSCALL(__NR_clone3, sys_clone3)
#define __NR_openat2 437
__SYSCALL(__NR_openat2, sys_openat2)
/*
* Please add new compat syscalls above this comment and update
......
......@@ -356,3 +356,4 @@
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
# 435 reserved for clone3
437 common openat2 sys_openat2
......@@ -435,3 +435,4 @@
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
435 common clone3 __sys_clone3
437 common openat2 sys_openat2
......@@ -441,3 +441,4 @@
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
435 common clone3 sys_clone3
437 common openat2 sys_openat2
......@@ -374,3 +374,4 @@
433 n32 fspick sys_fspick
434 n32 pidfd_open sys_pidfd_open
435 n32 clone3 __sys_clone3
437 n32 openat2 sys_openat2
......@@ -350,3 +350,4 @@
433 n64 fspick sys_fspick
434 n64 pidfd_open sys_pidfd_open
435 n64 clone3 __sys_clone3
437 n64 openat2 sys_openat2
......@@ -423,3 +423,4 @@
433 o32 fspick sys_fspick
434 o32 pidfd_open sys_pidfd_open
435 o32 clone3 __sys_clone3
437 o32 openat2 sys_openat2
......@@ -433,3 +433,4 @@
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
435 common clone3 sys_clone3_wrapper
437 common openat2 sys_openat2
......@@ -517,3 +517,4 @@
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
435 nospu clone3 ppc_clone3
437 common openat2 sys_openat2
......@@ -438,3 +438,4 @@
433 common fspick sys_fspick sys_fspick
434 common pidfd_open sys_pidfd_open sys_pidfd_open
435 common clone3 sys_clone3 sys_clone3
437 common openat2 sys_openat2 sys_openat2
......@@ -438,3 +438,4 @@
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
# 435 reserved for clone3
437 common openat2 sys_openat2
......@@ -481,3 +481,4 @@
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
# 435 reserved for clone3
437 common openat2 sys_openat2
......@@ -440,3 +440,4 @@
433 i386 fspick sys_fspick __ia32_sys_fspick
434 i386 pidfd_open sys_pidfd_open __ia32_sys_pidfd_open
435 i386 clone3 sys_clone3 __ia32_sys_clone3
437 i386 openat2 sys_openat2 __ia32_sys_openat2
......@@ -357,6 +357,7 @@
433 common fspick __x64_sys_fspick
434 common pidfd_open __x64_sys_pidfd_open
435 common clone3 __x64_sys_clone3/ptregs
437 common openat2 __x64_sys_openat2
#
# x32-specific system call numbers start at 512 to avoid cache impact
......
......@@ -406,3 +406,4 @@
433 common fspick sys_fspick
434 common pidfd_open sys_pidfd_open
435 common clone3 sys_clone3
437 common openat2 sys_openat2
This diff is collapsed.
......@@ -55,7 +55,7 @@ static void nsfs_evict(struct inode *inode)
ns->ops->put(ns);
}
static void *__ns_get_path(struct path *path, struct ns_common *ns)
static int __ns_get_path(struct path *path, struct ns_common *ns)
{
struct vfsmount *mnt = nsfs_mnt;
struct dentry *dentry;
......@@ -74,13 +74,13 @@ static void *__ns_get_path(struct path *path, struct ns_common *ns)
got_it:
path->mnt = mntget(mnt);
path->dentry = dentry;
return NULL;
return 0;
slow:
rcu_read_unlock();
inode = new_inode_pseudo(mnt->mnt_sb);
if (!inode) {
ns->ops->put(ns);
return ERR_PTR(-ENOMEM);
return -ENOMEM;
}
inode->i_ino = ns->inum;
inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
......@@ -92,7 +92,7 @@ static void *__ns_get_path(struct path *path, struct ns_common *ns)
dentry = d_alloc_anon(mnt->mnt_sb);
if (!dentry) {
iput(inode);
return ERR_PTR(-ENOMEM);
return -ENOMEM;
}
d_instantiate(dentry, inode);
dentry->d_fsdata = (void *)ns->ops;
......@@ -101,23 +101,22 @@ static void *__ns_get_path(struct path *path, struct ns_common *ns)
d_delete(dentry); /* make sure ->d_prune() does nothing */
dput(dentry);
cpu_relax();
return ERR_PTR(-EAGAIN);
return -EAGAIN;
}
goto got_it;
}
void *ns_get_path_cb(struct path *path, ns_get_path_helper_t *ns_get_cb,
int ns_get_path_cb(struct path *path, ns_get_path_helper_t *ns_get_cb,
void *private_data)
{
void *ret;
int ret;
do {
struct ns_common *ns = ns_get_cb(private_data);
if (!ns)
return ERR_PTR(-ENOENT);
return -ENOENT;
ret = __ns_get_path(path, ns);
} while (ret == ERR_PTR(-EAGAIN));
} while (ret == -EAGAIN);
return ret;
}
......@@ -134,7 +133,7 @@ static struct ns_common *ns_get_path_task(void *private_data)
return args->ns_ops->get(args->task);
}
void *ns_get_path(struct path *path, struct task_struct *task,
int ns_get_path(struct path *path, struct task_struct *task,
const struct proc_ns_operations *ns_ops)
{
struct ns_get_path_task_args args = {
......@@ -150,7 +149,7 @@ int open_related_ns(struct ns_common *ns,
{
struct path path = {};
struct file *f;
void *err;
int err;
int fd;
fd = get_unused_fd_flags(O_CLOEXEC);
......@@ -167,11 +166,11 @@ int open_related_ns(struct ns_common *ns,
}
err = __ns_get_path(&path, relative);
} while (err == ERR_PTR(-EAGAIN));
} while (err == -EAGAIN);
if (IS_ERR(err)) {
if (err) {
put_unused_fd(fd);
return PTR_ERR(err);
return err;
}
f = dentry_open(&path, O_RDONLY, current_cred());
......
......@@ -955,48 +955,84 @@ struct file *open_with_fake_path(const struct path *path, int flags,
}
EXPORT_SYMBOL(open_with_fake_path);
static inline int build_open_flags(int flags, umode_t mode, struct open_flags *op)
#define WILL_CREATE(flags) (flags & (O_CREAT | __O_TMPFILE))
#define O_PATH_FLAGS (O_DIRECTORY | O_NOFOLLOW | O_PATH | O_CLOEXEC)
static inline struct open_how build_open_how(int flags, umode_t mode)
{
struct open_how how = {
.flags = flags & VALID_OPEN_FLAGS,
.mode = mode & S_IALLUGO,
};
/* O_PATH beats everything else. */
if (how.flags & O_PATH)
how.flags &= O_PATH_FLAGS;
/* Modes should only be set for create-like flags. */
if (!WILL_CREATE(how.flags))
how.mode = 0;
return how;
}
static inline int build_open_flags(const struct open_how *how,
struct open_flags *op)
{
int flags = how->flags;
int lookup_flags = 0;
int acc_mode = ACC_MODE(flags);
/* Must never be set by userspace */
flags &= ~(FMODE_NONOTIFY | O_CLOEXEC);
/*
* Clear out all open flags we don't know about so that we don't report
* them in fcntl(F_GETFD) or similar interfaces.
* Older syscalls implicitly clear all of the invalid flags or argument
* values before calling build_open_flags(), but openat2(2) checks all
* of its arguments.
*/
flags &= VALID_OPEN_FLAGS;
if (flags & ~VALID_OPEN_FLAGS)
return -EINVAL;
if (how->resolve & ~VALID_RESOLVE_FLAGS)
return -EINVAL;
if (flags & (O_CREAT | __O_TMPFILE))
op->mode = (mode & S_IALLUGO) | S_IFREG;
else
/* Deal with the mode. */
if (WILL_CREATE(flags)) {
if (how->mode & ~S_IALLUGO)
return -EINVAL;
op->mode = how->mode | S_IFREG;
} else {
if (how->mode != 0)
return -EINVAL;
op->mode = 0;
/* Must never be set by userspace */
flags &= ~FMODE_NONOTIFY & ~O_CLOEXEC;
}
/*
* O_SYNC is implemented as __O_SYNC|O_DSYNC. As many places only
* check for O_DSYNC if the need any syncing at all we enforce it's
* always set instead of having to deal with possibly weird behaviour
* for malicious applications setting only __O_SYNC.
* In order to ensure programs get explicit errors when trying to use
* O_TMPFILE on old kernels, O_TMPFILE is implemented such that it
* looks like (O_DIRECTORY|O_RDWR & ~O_CREAT) to old kernels. But we
* have to require userspace to explicitly set it.
*/
if (flags & __O_SYNC)
flags |= O_DSYNC;
if (flags & __O_TMPFILE) {
if ((flags & O_TMPFILE_MASK) != O_TMPFILE)
return -EINVAL;
if (!(acc_mode & MAY_WRITE))
return -EINVAL;
} else if (flags & O_PATH) {
/*
* If we have O_PATH in the open flag. Then we
* cannot have anything other than the below set of flags
*/
flags &= O_DIRECTORY | O_NOFOLLOW | O_PATH;
}
if (flags & O_PATH) {
/* O_PATH only permits certain other flags to be set. */
if (flags & ~O_PATH_FLAGS)
return -EINVAL;
acc_mode = 0;
}
/*
* O_SYNC is implemented as __O_SYNC|O_DSYNC. As many places only
* check for O_DSYNC if the need any syncing at all we enforce it's
* always set instead of having to deal with possibly weird behaviour
* for malicious applications setting only __O_SYNC.
*/
if (flags & __O_SYNC)
flags |= O_DSYNC;
op->open_flag = flags;
/* O_TRUNC implies we need access checks for write permissions */
......@@ -1022,6 +1058,18 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o
lookup_flags |= LOOKUP_DIRECTORY;
if (!(flags & O_NOFOLLOW))
lookup_flags |= LOOKUP_FOLLOW;
if (how->resolve & RESOLVE_NO_XDEV)
lookup_flags |= LOOKUP_NO_XDEV;
if (how->resolve & RESOLVE_NO_MAGICLINKS)
lookup_flags |= LOOKUP_NO_MAGICLINKS;
if (how->resolve & RESOLVE_NO_SYMLINKS)
lookup_flags |= LOOKUP_NO_SYMLINKS;
if (how->resolve & RESOLVE_BENEATH)
lookup_flags |= LOOKUP_BENEATH;
if (how->resolve & RESOLVE_IN_ROOT)
lookup_flags |= LOOKUP_IN_ROOT;
op->lookup_flags = lookup_flags;
return 0;
}
......@@ -1040,8 +1088,11 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o
struct file *file_open_name(struct filename *name, int flags, umode_t mode)
{
struct open_flags op;
int err = build_open_flags(flags, mode, &op);
return err ? ERR_PTR(err) : do_filp_open(AT_FDCWD, name, &op);
struct open_how how = build_open_how(flags, mode);
int err = build_open_flags(&how, &op);
if (err)
return ERR_PTR(err);
return do_filp_open(AT_FDCWD, name, &op);
}
/**
......@@ -1072,17 +1123,19 @@ struct file *file_open_root(struct dentry *dentry, struct vfsmount *mnt,
const char *filename, int flags, umode_t mode)
{
struct open_flags op;
int err = build_open_flags(flags, mode, &op);
struct open_how how = build_open_how(flags, mode);
int err = build_open_flags(&how, &op);
if (err)
return ERR_PTR(err);
return do_file_open_root(dentry, mnt, filename, &op);
}
EXPORT_SYMBOL(file_open_root);
long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
static long do_sys_openat2(int dfd, const char __user *filename,
struct open_how *how)
{
struct open_flags op;
int fd = build_open_flags(flags, mode, &op);
int fd = build_open_flags(how, &op);
struct filename *tmp;
if (fd)
......@@ -1092,7 +1145,7 @@ long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
if (IS_ERR(tmp))
return PTR_ERR(tmp);
fd = get_unused_fd_flags(flags);
fd = get_unused_fd_flags(how->flags);
if (fd >= 0) {
struct file *f = do_filp_open(dfd, tmp, &op);
if (IS_ERR(f)) {
......@@ -1107,12 +1160,16 @@ long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
return fd;
}
SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)
long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
{
if (force_o_largefile())
flags |= O_LARGEFILE;
struct open_how how = build_open_how(flags, mode);
return do_sys_openat2(dfd, filename, &how);
}
return do_sys_open(AT_FDCWD, filename, flags, mode);
SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)
{
return ksys_open(filename, flags, mode);
}
SYSCALL_DEFINE4(openat, int, dfd, const char __user *, filename, int, flags,
......@@ -1120,10 +1177,32 @@ SYSCALL_DEFINE4(openat, int, dfd, const char __user *, filename, int, flags,
{
if (force_o_largefile())
flags |= O_LARGEFILE;
return do_sys_open(dfd, filename, flags, mode);
}
SYSCALL_DEFINE4(openat2, int, dfd, const char __user *, filename,
struct open_how __user *, how, size_t, usize)
{
int err;
struct open_how tmp;
BUILD_BUG_ON(sizeof(struct open_how) < OPEN_HOW_SIZE_VER0);
BUILD_BUG_ON(sizeof(struct open_how) != OPEN_HOW_SIZE_LATEST);
if (unlikely(usize < OPEN_HOW_SIZE_VER0))
return -EINVAL;
err = copy_struct_from_user(&tmp, sizeof(tmp), how, usize);
if (err)
return err;
/* O_LARGEFILE is only allowed for non-O_PATH. */
if (!(tmp.flags & O_PATH) && force_o_largefile())
tmp.flags |= O_LARGEFILE;
return do_sys_openat2(dfd, filename, &tmp);
}
#ifdef CONFIG_COMPAT
/*
* Exactly like sys_open(), except that it doesn't set the
......
......@@ -1718,8 +1718,7 @@ static const char *proc_pid_get_link(struct dentry *dentry,
if (error)
goto out;
nd_jump_link(&path);
return NULL;
error = nd_jump_link(&path);
out:
return ERR_PTR(error);
}
......
......@@ -46,22 +46,26 @@ static const char *proc_ns_get_link(struct dentry *dentry,
const struct proc_ns_operations *ns_ops = PROC_I(inode)->ns_ops;
struct task_struct *task;
struct path ns_path;
void *error = ERR_PTR(-EACCES);
int error = -EACCES;
if (!dentry)
return ERR_PTR(-ECHILD);
task = get_proc_task(inode);
if (!task)
return error;
return ERR_PTR(-EACCES);
if (ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS)) {
error = ns_get_path(&ns_path, task, ns_ops);
if (!error)
nd_jump_link(&ns_path);
}
if (!ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS))
goto out;
error = ns_get_path(&ns_path, task, ns_ops);
if (error)
goto out;
error = nd_jump_link(&ns_path);
out:
put_task_struct(task);
return error;
return ERR_PTR(error);
}
static int proc_ns_readlink(struct dentry *dentry, char __user *buffer, int buflen)
......
......@@ -2,15 +2,29 @@
#ifndef _LINUX_FCNTL_H
#define _LINUX_FCNTL_H
#include <linux/stat.h>
#include <uapi/linux/fcntl.h>
/* list of all valid flags for the open/openat flags argument: */
/* List of all valid flags for the open/openat flags argument: */
#define VALID_OPEN_FLAGS \
(O_RDONLY | O_WRONLY | O_RDWR | O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC | \
O_APPEND | O_NDELAY | O_NONBLOCK | O_NDELAY | __O_SYNC | O_DSYNC | \
FASYNC | O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \
O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE)
/* List of all valid flags for the how->upgrade_mask argument: */
#define VALID_UPGRADE_FLAGS \
(UPGRADE_NOWRITE | UPGRADE_NOREAD)
/* List of all valid flags for the how->resolve argument: */
#define VALID_RESOLVE_FLAGS \
(RESOLVE_NO_XDEV | RESOLVE_NO_MAGICLINKS | RESOLVE_NO_SYMLINKS | \
RESOLVE_BENEATH | RESOLVE_IN_ROOT)
/* List of all open_how "versions". */
#define OPEN_HOW_SIZE_VER0 24 /* sizeof first published struct */
#define OPEN_HOW_SIZE_LATEST OPEN_HOW_SIZE_VER0
#ifndef force_o_largefile
#define force_o_largefile() (!IS_ENABLED(CONFIG_ARCH_32BIT_OFF_T))
#endif
......
......@@ -2,6 +2,7 @@
#ifndef _LINUX_NAMEI_H
#define _LINUX_NAMEI_H
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/path.h>
#include <linux/fcntl.h>
......@@ -38,6 +39,15 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
#define LOOKUP_ROOT 0x2000
#define LOOKUP_ROOT_GRABBED 0x0008
/* Scoping flags for lookup. */
#define LOOKUP_NO_SYMLINKS 0x010000 /* No symlink crossing. */
#define LOOKUP_NO_MAGICLINKS 0x020000 /* No nd_jump_link() crossing. */
#define LOOKUP_NO_XDEV 0x040000 /* No mountpoint crossing. */
#define LOOKUP_BENEATH 0x080000 /* No escaping from starting point. */
#define LOOKUP_IN_ROOT 0x100000 /* Treat dirfd as fs root. */
/* LOOKUP_* flags which do scope-related checks based on the dirfd. */
#define LOOKUP_IS_SCOPED (LOOKUP_BENEATH | LOOKUP_IN_ROOT)
extern int path_pts(struct path *path);
extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty);
......@@ -68,7 +78,7 @@ extern int follow_up(struct path *);
extern struct dentry *lock_rename(struct dentry *, struct dentry *);
extern void unlock_rename(struct dentry *, struct dentry *);
extern void nd_jump_link(struct path *path);
extern int __must_check nd_jump_link(struct path *path);
static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
{
......
......@@ -79,10 +79,10 @@ static inline int ns_alloc_inum(struct ns_common *ns)
extern struct file *proc_ns_fget(int fd);
#define get_proc_ns(inode) ((struct ns_common *)(inode)->i_private)
extern void *ns_get_path(struct path *path, struct task_struct *task,
extern int ns_get_path(struct path *path, struct task_struct *task,
const struct proc_ns_operations *ns_ops);
typedef struct ns_common *ns_get_path_helper_t(void *);
extern void *ns_get_path_cb(struct path *path, ns_get_path_helper_t ns_get_cb,
extern int ns_get_path_cb(struct path *path, ns_get_path_helper_t ns_get_cb,
void *private_data);
extern int ns_get_name(char *buf, size_t size, struct task_struct *task,
......
......@@ -69,6 +69,7 @@ struct rseq;
union bpf_attr;
struct io_uring_params;
struct clone_args;
struct open_how;
#include <linux/types.h>
#include <linux/aio_abi.h>
......@@ -439,6 +440,8 @@ asmlinkage long sys_fchownat(int dfd, const char __user *filename, uid_t user,
asmlinkage long sys_fchown(unsigned int fd, uid_t user, gid_t group);
asmlinkage long sys_openat(int dfd, const char __user *filename, int flags,
umode_t mode);
asmlinkage long sys_openat2(int dfd, const char __user *filename,
struct open_how *how, size_t size);
asmlinkage long sys_close(unsigned int fd);
asmlinkage long sys_vhangup(void);
......
......@@ -851,8 +851,11 @@ __SYSCALL(__NR_pidfd_open, sys_pidfd_open)
__SYSCALL(__NR_clone3, sys_clone3)
#endif
#define __NR_openat2 437
__SYSCALL(__NR_openat2, sys_openat2)
#undef __NR_syscalls
#define __NR_syscalls 436
#define __NR_syscalls 438
/*
* 32 bit systems traditionally used different
......
......@@ -3,6 +3,7 @@
#define _UAPI_LINUX_FCNTL_H
#include <asm/fcntl.h>
#include <linux/openat2.h>
#define F_SETLEASE (F_LINUX_SPECIFIC_BASE + 0)
#define F_GETLEASE (F_LINUX_SPECIFIC_BASE + 1)
......@@ -100,5 +101,4 @@
#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
#endif /* _UAPI_LINUX_FCNTL_H */
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#ifndef _UAPI_LINUX_OPENAT2_H
#define _UAPI_LINUX_OPENAT2_H
#include <linux/types.h>
/*
* Arguments for how openat2(2) should open the target path. If only @flags and
* @mode are non-zero, then openat2(2) operates very similarly to openat(2).
*
* However, unlike openat(2), unknown or invalid bits in @flags result in
* -EINVAL rather than being silently ignored. @mode must be zero unless one of
* {O_CREAT, O_TMPFILE} are set.
*
* @flags: O_* flags.
* @mode: O_CREAT/O_TMPFILE file mode.
* @resolve: RESOLVE_* flags.
*/
struct open_how {
__u64 flags;
__u64 mode;
__u64 resolve;
};
/* how->resolve flags for openat2(2). */
#define RESOLVE_NO_XDEV 0x01 /* Block mount-point crossings
(includes bind-mounts). */
#define RESOLVE_NO_MAGICLINKS 0x02 /* Block traversal through procfs-style
"magic-links". */
#define RESOLVE_NO_SYMLINKS 0x04 /* Block traversal through all symlinks
(implies OEXT_NO_MAGICLINKS) */
#define RESOLVE_BENEATH 0x08 /* Block "lexical" trickery like
"..", symlinks, and absolute
paths which escape the dirfd. */
#define RESOLVE_IN_ROOT 0x10 /* Make all jumps to "/" and ".."
be scoped inside the dirfd
(similar to chroot(2)). */
#endif /* _UAPI_LINUX_OPENAT2_H */
......@@ -302,14 +302,14 @@ int bpf_prog_offload_info_fill(struct bpf_prog_info *info,
struct inode *ns_inode;
struct path ns_path;
char __user *uinsns;
void *res;
int res;
u32 ulen;
res = ns_get_path_cb(&ns_path, bpf_prog_offload_info_fill_ns, &args);
if (IS_ERR(res)) {
if (res) {
if (!info->ifindex)
return -ENODEV;
return PTR_ERR(res);
return res;
}
down_read(&bpf_devs_lock);
......@@ -526,13 +526,13 @@ int bpf_map_offload_info_fill(struct bpf_map_info *info, struct bpf_map *map)
};
struct inode *ns_inode;
struct path ns_path;
void *res;
int res;
res = ns_get_path_cb(&ns_path, bpf_map_offload_info_fill_ns, &args);
if (IS_ERR(res)) {
if (res) {
if (!info->ifindex)
return -ENODEV;
return PTR_ERR(res);
return res;
}
ns_inode = ns_path.dentry->d_inode;
......
......@@ -7495,7 +7495,7 @@ static void perf_fill_ns_link_info(struct perf_ns_link_info *ns_link_info,
{
struct path ns_path;
struct inode *ns_inode;
void *error;
int error;
error = ns_get_path(&ns_path, task, ns_ops);
if (!error) {
......
......@@ -2573,16 +2573,18 @@ static const char *policy_get_link(struct dentry *dentry,
{
struct aa_ns *ns;
struct path path;
int error;
if (!dentry)
return ERR_PTR(-ECHILD);
ns = aa_get_current_ns();
path.mnt = mntget(aafs_mnt);
path.dentry = dget(ns_dir(ns));
nd_jump_link(&path);
error = nd_jump_link(&path);
aa_put_ns(ns);
return NULL;
return ERR_PTR(error);
}
static int policy_readlink(struct dentry *dentry, char __user *buffer,
......
......@@ -41,6 +41,7 @@ TARGETS += powerpc
TARGETS += proc
TARGETS += pstore
TARGETS += ptrace
TARGETS += openat2
TARGETS += rseq
TARGETS += rtc
TARGETS += seccomp
......
# SPDX-License-Identifier: GPL-2.0-or-later
CFLAGS += -Wall -O2 -g -fsanitize=address -fsanitize=undefined
TEST_GEN_PROGS := openat2_test resolve_test rename_attack_test
include ../lib.mk
$(TEST_GEN_PROGS): helpers.c
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Author: Aleksa Sarai <cyphar@cyphar.com>
* Copyright (C) 2018-2019 SUSE LLC.
*/
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <syscall.h>
#include <limits.h>
#include "helpers.h"
bool needs_openat2(const struct open_how *how)
{
return how->resolve != 0;
}
int raw_openat2(int dfd, const char *path, void *how, size_t size)
{
int ret = syscall(__NR_openat2, dfd, path, how, size);
return ret >= 0 ? ret : -errno;
}
int sys_openat2(int dfd, const char *path, struct open_how *how)
{
return raw_openat2(dfd, path, how, sizeof(*how));
}
int sys_openat(int dfd, const char *path, struct open_how *how)
{
int ret = openat(dfd, path, how->flags, how->mode);
return ret >= 0 ? ret : -errno;
}
int sys_renameat2(int olddirfd, const char *oldpath,
int newdirfd, const char *newpath, unsigned int flags)
{
int ret = syscall(__NR_renameat2, olddirfd, oldpath,
newdirfd, newpath, flags);
return ret >= 0 ? ret : -errno;
}
int touchat(int dfd, const char *path)
{
int fd = openat(dfd, path, O_CREAT);
if (fd >= 0)
close(fd);
return fd;
}
char *fdreadlink(int fd)
{
char *target, *tmp;
E_asprintf(&tmp, "/proc/self/fd/%d", fd);
target = malloc(PATH_MAX);
if (!target)
ksft_exit_fail_msg("fdreadlink: malloc failed\n");
memset(target, 0, PATH_MAX);
E_readlink(tmp, target, PATH_MAX);
free(tmp);
return target;
}
bool fdequal(int fd, int dfd, const char *path)
{
char *fdpath, *dfdpath, *other;
bool cmp;
fdpath = fdreadlink(fd);
dfdpath = fdreadlink(dfd);
if (!path)
E_asprintf(&other, "%s", dfdpath);
else if (*path == '/')
E_asprintf(&other, "%s", path);
else
E_asprintf(&other, "%s/%s", dfdpath, path);
cmp = !strcmp(fdpath, other);
free(fdpath);
free(dfdpath);
free(other);
return cmp;
}
bool openat2_supported = false;
void __attribute__((constructor)) init(void)
{
struct open_how how = {};
int fd;
BUILD_BUG_ON(sizeof(struct open_how) != OPEN_HOW_SIZE_VER0);
/* Check openat2(2) support. */
fd = sys_openat2(AT_FDCWD, ".", &how);
openat2_supported = (fd >= 0);
if (fd >= 0)
close(fd);
}
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Author: Aleksa Sarai <cyphar@cyphar.com>
* Copyright (C) 2018-2019 SUSE LLC.
*/
#ifndef __RESOLVEAT_H__
#define __RESOLVEAT_H__
#define _GNU_SOURCE
#include <stdint.h>
#include <errno.h>
#include <linux/types.h>
#include "../kselftest.h"
#define ARRAY_LEN(X) (sizeof (X) / sizeof (*(X)))
#define BUILD_BUG_ON(e) ((void)(sizeof(struct { int:(-!!(e)); })))
#ifndef SYS_openat2
#ifndef __NR_openat2
#define __NR_openat2 437
#endif /* __NR_openat2 */
#define SYS_openat2 __NR_openat2
#endif /* SYS_openat2 */
/*
* Arguments for how openat2(2) should open the target path. If @resolve is
* zero, then openat2(2) operates very similarly to openat(2).
*
* However, unlike openat(2), unknown bits in @flags result in -EINVAL rather
* than being silently ignored. @mode must be zero unless one of {O_CREAT,
* O_TMPFILE} are set.
*
* @flags: O_* flags.
* @mode: O_CREAT/O_TMPFILE file mode.
* @resolve: RESOLVE_* flags.
*/
struct open_how {
__u64 flags;
__u64 mode;
__u64 resolve;
};
#define OPEN_HOW_SIZE_VER0 24 /* sizeof first published struct */
#define OPEN_HOW_SIZE_LATEST OPEN_HOW_SIZE_VER0
bool needs_openat2(const struct open_how *how);
#ifndef RESOLVE_IN_ROOT
/* how->resolve flags for openat2(2). */
#define RESOLVE_NO_XDEV 0x01 /* Block mount-point crossings
(includes bind-mounts). */
#define RESOLVE_NO_MAGICLINKS 0x02 /* Block traversal through procfs-style
"magic-links". */
#define RESOLVE_NO_SYMLINKS 0x04 /* Block traversal through all symlinks
(implies OEXT_NO_MAGICLINKS) */
#define RESOLVE_BENEATH 0x08 /* Block "lexical" trickery like
"..", symlinks, and absolute
paths which escape the dirfd. */
#define RESOLVE_IN_ROOT 0x10 /* Make all jumps to "/" and ".."
be scoped inside the dirfd
(similar to chroot(2)). */
#endif /* RESOLVE_IN_ROOT */
#define E_func(func, ...) \
do { \
if (func(__VA_ARGS__) < 0) \
ksft_exit_fail_msg("%s:%d %s failed\n", \
__FILE__, __LINE__, #func);\
} while (0)
#define E_asprintf(...) E_func(asprintf, __VA_ARGS__)
#define E_chmod(...) E_func(chmod, __VA_ARGS__)
#define E_dup2(...) E_func(dup2, __VA_ARGS__)
#define E_fchdir(...) E_func(fchdir, __VA_ARGS__)
#define E_fstatat(...) E_func(fstatat, __VA_ARGS__)
#define E_kill(...) E_func(kill, __VA_ARGS__)
#define E_mkdirat(...) E_func(mkdirat, __VA_ARGS__)
#define E_mount(...) E_func(mount, __VA_ARGS__)
#define E_prctl(...) E_func(prctl, __VA_ARGS__)
#define E_readlink(...) E_func(readlink, __VA_ARGS__)
#define E_setresuid(...) E_func(setresuid, __VA_ARGS__)
#define E_symlinkat(...) E_func(symlinkat, __VA_ARGS__)
#define E_touchat(...) E_func(touchat, __VA_ARGS__)
#define E_unshare(...) E_func(unshare, __VA_ARGS__)
#define E_assert(expr, msg, ...) \
do { \
if (!(expr)) \
ksft_exit_fail_msg("ASSERT(%s:%d) failed (%s): " msg "\n", \
__FILE__, __LINE__, #expr, ##__VA_ARGS__); \
} while (0)
int raw_openat2(int dfd, const char *path, void *how, size_t size);
int sys_openat2(int dfd, const char *path, struct open_how *how);
int sys_openat(int dfd, const char *path, struct open_how *how);
int sys_renameat2(int olddirfd, const char *oldpath,
int newdirfd, const char *newpath, unsigned int flags);
int touchat(int dfd, const char *path);
char *fdreadlink(int fd);
bool fdequal(int fd, int dfd, const char *path);
extern bool openat2_supported;
#endif /* __RESOLVEAT_H__ */
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Author: Aleksa Sarai <cyphar@cyphar.com>
* Copyright (C) 2018-2019 SUSE LLC.
*/
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mount.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include "../kselftest.h"
#include "helpers.h"
/*
* O_LARGEFILE is set to 0 by glibc.
* XXX: This is wrong on {mips, parisc, powerpc, sparc}.
*/
#undef O_LARGEFILE
#define O_LARGEFILE 0x8000
struct open_how_ext {
struct open_how inner;
uint32_t extra1;
char pad1[128];
uint32_t extra2;
char pad2[128];
uint32_t extra3;
};
struct struct_test {
const char *name;
struct open_how_ext arg;
size_t size;
int err;
};
#define NUM_OPENAT2_STRUCT_TESTS 7
#define NUM_OPENAT2_STRUCT_VARIATIONS 13
void test_openat2_struct(void)
{
int misalignments[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 17, 87 };
struct struct_test tests[] = {
/* Normal struct. */
{ .name = "normal struct",
.arg.inner.flags = O_RDONLY,
.size = sizeof(struct open_how) },
/* Bigger struct, with zeroed out end. */
{ .name = "bigger struct (zeroed out)",
.arg.inner.flags = O_RDONLY,
.size = sizeof(struct open_how_ext) },
/* TODO: Once expanded, check zero-padding. */
/* Smaller than version-0 struct. */
{ .name = "zero-sized 'struct'",
.arg.inner.flags = O_RDONLY, .size = 0, .err = -EINVAL },
{ .name = "smaller-than-v0 struct",
.arg.inner.flags = O_RDONLY,
.size = OPEN_HOW_SIZE_VER0 - 1, .err = -EINVAL },
/* Bigger struct, with non-zero trailing bytes. */
{ .name = "bigger struct (non-zero data in first 'future field')",
.arg.inner.flags = O_RDONLY, .arg.extra1 = 0xdeadbeef,
.size = sizeof(struct open_how_ext), .err = -E2BIG },
{ .name = "bigger struct (non-zero data in middle of 'future fields')",
.arg.inner.flags = O_RDONLY, .arg.extra2 = 0xfeedcafe,
.size = sizeof(struct open_how_ext), .err = -E2BIG },
{ .name = "bigger struct (non-zero data at end of 'future fields')",
.arg.inner.flags = O_RDONLY, .arg.extra3 = 0xabad1dea,
.size = sizeof(struct open_how_ext), .err = -E2BIG },
};
BUILD_BUG_ON(ARRAY_LEN(misalignments) != NUM_OPENAT2_STRUCT_VARIATIONS);
BUILD_BUG_ON(ARRAY_LEN(tests) != NUM_OPENAT2_STRUCT_TESTS);
for (int i = 0; i < ARRAY_LEN(tests); i++) {
struct struct_test *test = &tests[i];
struct open_how_ext how_ext = test->arg;
for (int j = 0; j < ARRAY_LEN(misalignments); j++) {
int fd, misalign = misalignments[j];
char *fdpath = NULL;
bool failed;
void (*resultfn)(const char *msg, ...) = ksft_test_result_pass;
void *copy = NULL, *how_copy = &how_ext;
if (!openat2_supported) {
ksft_print_msg("openat2(2) unsupported\n");
resultfn = ksft_test_result_skip;
goto skip;
}
if (misalign) {
/*
* Explicitly misalign the structure copying it with the given
* (mis)alignment offset. The other data is set to be non-zero to
* make sure that non-zero bytes outside the struct aren't checked
*
* This is effectively to check that is_zeroed_user() works.
*/
copy = malloc(misalign + sizeof(how_ext));
how_copy = copy + misalign;
memset(copy, 0xff, misalign);
memcpy(how_copy, &how_ext, sizeof(how_ext));
}
fd = raw_openat2(AT_FDCWD, ".", how_copy, test->size);
if (test->err >= 0)
failed = (fd < 0);
else
failed = (fd != test->err);
if (fd >= 0) {
fdpath = fdreadlink(fd);
close(fd);
}
if (failed) {
resultfn = ksft_test_result_fail;
ksft_print_msg("openat2 unexpectedly returned ");
if (fdpath)
ksft_print_msg("%d['%s']\n", fd, fdpath);
else
ksft_print_msg("%d (%s)\n", fd, strerror(-fd));
}
skip:
if (test->err >= 0)
resultfn("openat2 with %s argument [misalign=%d] succeeds\n",
test->name, misalign);
else
resultfn("openat2 with %s argument [misalign=%d] fails with %d (%s)\n",
test->name, misalign, test->err,
strerror(-test->err));
free(copy);
free(fdpath);
fflush(stdout);
}
}
}
struct flag_test {
const char *name;
struct open_how how;
int err;
};
#define NUM_OPENAT2_FLAG_TESTS 23
void test_openat2_flags(void)
{
struct flag_test tests[] = {
/* O_TMPFILE is incompatible with O_PATH and O_CREAT. */
{ .name = "incompatible flags (O_TMPFILE | O_PATH)",
.how.flags = O_TMPFILE | O_PATH | O_RDWR, .err = -EINVAL },
{ .name = "incompatible flags (O_TMPFILE | O_CREAT)",
.how.flags = O_TMPFILE | O_CREAT | O_RDWR, .err = -EINVAL },
/* O_PATH only permits certain other flags to be set ... */
{ .name = "compatible flags (O_PATH | O_CLOEXEC)",
.how.flags = O_PATH | O_CLOEXEC },
{ .name = "compatible flags (O_PATH | O_DIRECTORY)",
.how.flags = O_PATH | O_DIRECTORY },
{ .name = "compatible flags (O_PATH | O_NOFOLLOW)",
.how.flags = O_PATH | O_NOFOLLOW },
/* ... and others are absolutely not permitted. */
{ .name = "incompatible flags (O_PATH | O_RDWR)",
.how.flags = O_PATH | O_RDWR, .err = -EINVAL },
{ .name = "incompatible flags (O_PATH | O_CREAT)",
.how.flags = O_PATH | O_CREAT, .err = -EINVAL },
{ .name = "incompatible flags (O_PATH | O_EXCL)",
.how.flags = O_PATH | O_EXCL, .err = -EINVAL },
{ .name = "incompatible flags (O_PATH | O_NOCTTY)",
.how.flags = O_PATH | O_NOCTTY, .err = -EINVAL },
{ .name = "incompatible flags (O_PATH | O_DIRECT)",
.how.flags = O_PATH | O_DIRECT, .err = -EINVAL },
{ .name = "incompatible flags (O_PATH | O_LARGEFILE)",
.how.flags = O_PATH | O_LARGEFILE, .err = -EINVAL },
/* ->mode must only be set with O_{CREAT,TMPFILE}. */
{ .name = "non-zero how.mode and O_RDONLY",
.how.flags = O_RDONLY, .how.mode = 0600, .err = -EINVAL },
{ .name = "non-zero how.mode and O_PATH",
.how.flags = O_PATH, .how.mode = 0600, .err = -EINVAL },
{ .name = "valid how.mode and O_CREAT",
.how.flags = O_CREAT, .how.mode = 0600 },
{ .name = "valid how.mode and O_TMPFILE",
.how.flags = O_TMPFILE | O_RDWR, .how.mode = 0600 },
/* ->mode must only contain 0777 bits. */
{ .name = "invalid how.mode and O_CREAT",
.how.flags = O_CREAT,
.how.mode = 0xFFFF, .err = -EINVAL },
{ .name = "invalid (very large) how.mode and O_CREAT",
.how.flags = O_CREAT,
.how.mode = 0xC000000000000000ULL, .err = -EINVAL },
{ .name = "invalid how.mode and O_TMPFILE",
.how.flags = O_TMPFILE | O_RDWR,
.how.mode = 0x1337, .err = -EINVAL },
{ .name = "invalid (very large) how.mode and O_TMPFILE",
.how.flags = O_TMPFILE | O_RDWR,
.how.mode = 0x0000A00000000000ULL, .err = -EINVAL },
/* ->resolve must only contain RESOLVE_* flags. */
{ .name = "invalid how.resolve and O_RDONLY",
.how.flags = O_RDONLY,
.how.resolve = 0x1337, .err = -EINVAL },
{ .name = "invalid how.resolve and O_CREAT",
.how.flags = O_CREAT,
.how.resolve = 0x1337, .err = -EINVAL },
{ .name = "invalid how.resolve and O_TMPFILE",
.how.flags = O_TMPFILE | O_RDWR,
.how.resolve = 0x1337, .err = -EINVAL },
{ .name = "invalid how.resolve and O_PATH",
.how.flags = O_PATH,
.how.resolve = 0x1337, .err = -EINVAL },
};
BUILD_BUG_ON(ARRAY_LEN(tests) != NUM_OPENAT2_FLAG_TESTS);
for (int i = 0; i < ARRAY_LEN(tests); i++) {
int fd, fdflags = -1;
char *path, *fdpath = NULL;
bool failed = false;
struct flag_test *test = &tests[i];
void (*resultfn)(const char *msg, ...) = ksft_test_result_pass;
if (!openat2_supported) {
ksft_print_msg("openat2(2) unsupported\n");
resultfn = ksft_test_result_skip;
goto skip;
}
path = (test->how.flags & O_CREAT) ? "/tmp/ksft.openat2_tmpfile" : ".";
unlink(path);
fd = sys_openat2(AT_FDCWD, path, &test->how);
if (test->err >= 0)
failed = (fd < 0);
else
failed = (fd != test->err);
if (fd >= 0) {
int otherflags;
fdpath = fdreadlink(fd);
fdflags = fcntl(fd, F_GETFL);
otherflags = fcntl(fd, F_GETFD);
close(fd);
E_assert(fdflags >= 0, "fcntl F_GETFL of new fd");
E_assert(otherflags >= 0, "fcntl F_GETFD of new fd");
/* O_CLOEXEC isn't shown in F_GETFL. */
if (otherflags & FD_CLOEXEC)
fdflags |= O_CLOEXEC;
/* O_CREAT is hidden from F_GETFL. */
if (test->how.flags & O_CREAT)
fdflags |= O_CREAT;
if (!(test->how.flags & O_LARGEFILE))
fdflags &= ~O_LARGEFILE;
failed |= (fdflags != test->how.flags);
}
if (failed) {
resultfn = ksft_test_result_fail;
ksft_print_msg("openat2 unexpectedly returned ");
if (fdpath)
ksft_print_msg("%d['%s'] with %X (!= %X)\n",
fd, fdpath, fdflags,
test->how.flags);
else
ksft_print_msg("%d (%s)\n", fd, strerror(-fd));
}
skip:
if (test->err >= 0)
resultfn("openat2 with %s succeeds\n", test->name);
else
resultfn("openat2 with %s fails with %d (%s)\n",
test->name, test->err, strerror(-test->err));
free(fdpath);
fflush(stdout);
}
}
#define NUM_TESTS (NUM_OPENAT2_STRUCT_VARIATIONS * NUM_OPENAT2_STRUCT_TESTS + \
NUM_OPENAT2_FLAG_TESTS)
int main(int argc, char **argv)
{
ksft_print_header();
ksft_set_plan(NUM_TESTS);
test_openat2_struct();
test_openat2_flags();
if (ksft_get_fail_cnt() + ksft_get_error_cnt() > 0)
ksft_exit_fail();
else
ksft_exit_pass();
}
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Author: Aleksa Sarai <cyphar@cyphar.com>
* Copyright (C) 2018-2019 SUSE LLC.
*/
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mount.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <syscall.h>
#include <limits.h>
#include <unistd.h>
#include "../kselftest.h"
#include "helpers.h"
/* Construct a test directory with the following structure:
*
* root/
* |-- a/
* | `-- c/
* `-- b/
*/
int setup_testdir(void)
{
int dfd;
char dirname[] = "/tmp/ksft-openat2-rename-attack.XXXXXX";
/* Make the top-level directory. */
if (!mkdtemp(dirname))
ksft_exit_fail_msg("setup_testdir: failed to create tmpdir\n");
dfd = open(dirname, O_PATH | O_DIRECTORY);
if (dfd < 0)
ksft_exit_fail_msg("setup_testdir: failed to open tmpdir\n");
E_mkdirat(dfd, "a", 0755);
E_mkdirat(dfd, "b", 0755);
E_mkdirat(dfd, "a/c", 0755);
return dfd;
}
/* Swap @dirfd/@a and @dirfd/@b constantly. Parent must kill this process. */
pid_t spawn_attack(int dirfd, char *a, char *b)
{
pid_t child = fork();
if (child != 0)
return child;
/* If the parent (the test process) dies, kill ourselves too. */
E_prctl(PR_SET_PDEATHSIG, SIGKILL);
/* Swap @a and @b. */
for (;;)
renameat2(dirfd, a, dirfd, b, RENAME_EXCHANGE);
exit(1);
}
#define NUM_RENAME_TESTS 2
#define ROUNDS 400000
const char *flagname(int resolve)
{
switch (resolve) {
case RESOLVE_IN_ROOT:
return "RESOLVE_IN_ROOT";
case RESOLVE_BENEATH:
return "RESOLVE_BENEATH";
}
return "(unknown)";
}
void test_rename_attack(int resolve)
{
int dfd, afd;
pid_t child;
void (*resultfn)(const char *msg, ...) = ksft_test_result_pass;
int escapes = 0, other_errs = 0, exdevs = 0, eagains = 0, successes = 0;
struct open_how how = {
.flags = O_PATH,
.resolve = resolve,
};
if (!openat2_supported) {
how.resolve = 0;
ksft_print_msg("openat2(2) unsupported -- using openat(2) instead\n");
}
dfd = setup_testdir();
afd = openat(dfd, "a", O_PATH);
if (afd < 0)
ksft_exit_fail_msg("test_rename_attack: failed to open 'a'\n");
child = spawn_attack(dfd, "a/c", "b");
for (int i = 0; i < ROUNDS; i++) {
int fd;
char *victim_path = "c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../..";
if (openat2_supported)
fd = sys_openat2(afd, victim_path, &how);
else
fd = sys_openat(afd, victim_path, &how);
if (fd < 0) {
if (fd == -EAGAIN)
eagains++;
else if (fd == -EXDEV)
exdevs++;
else if (fd == -ENOENT)
escapes++; /* escaped outside and got ENOENT... */
else
other_errs++; /* unexpected error */
} else {
if (fdequal(fd, afd, NULL))
successes++;
else
escapes++; /* we got an unexpected fd */
}
close(fd);
}
if (escapes > 0)
resultfn = ksft_test_result_fail;
ksft_print_msg("non-escapes: EAGAIN=%d EXDEV=%d E<other>=%d success=%d\n",
eagains, exdevs, other_errs, successes);
resultfn("rename attack with %s (%d runs, got %d escapes)\n",
flagname(resolve), ROUNDS, escapes);
/* Should be killed anyway, but might as well make sure. */
E_kill(child, SIGKILL);
}
#define NUM_TESTS NUM_RENAME_TESTS
int main(int argc, char **argv)
{
ksft_print_header();
ksft_set_plan(NUM_TESTS);
test_rename_attack(RESOLVE_BENEATH);
test_rename_attack(RESOLVE_IN_ROOT);
if (ksft_get_fail_cnt() + ksft_get_error_cnt() > 0)
ksft_exit_fail();
else
ksft_exit_pass();
}
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment