    fs: add mount_setattr() · 2a186721
    Christian Brauner authored
    This implements the missing mount_setattr() syscall. While the new mount
    API allows changing the properties of a superblock, there is currently
    no way to change the properties of a mount or a mount tree using file
    descriptors, which the new mount API is based on. In addition, the old
    mount API has the restriction that mount options cannot be applied
    recursively. This hasn't changed since changing mount options on a
    per-mount basis was implemented in [1], and it has been a frequent
    request not just for convenience but also for security reasons. The
    legacy mount syscall is unable to accommodate this behavior without
    introducing a whole new set of flags because MS_REC | MS_REMOUNT |
    MS_BIND | MS_RDONLY | MS_NOEXEC | [...] only applies the mount option to
    the topmost mount. Changing MS_REC to apply to the whole mount tree
    would mean introducing a significant uapi change and would likely cause
    significant regressions.
    
    The new mount_setattr() syscall allows mount options to be cleared and
    set recursively in one shot. Repeated calls requesting the same changes
    are idempotent. The syscall has the following prototype:
    
    int mount_setattr(int dfd, const char *path, unsigned flags,
                      struct mount_attr *uattr, size_t usize);
    
    Flags to modify path resolution behavior are specified in the @flags
    argument. Currently, AT_EMPTY_PATH, AT_RECURSIVE, AT_SYMLINK_NOFOLLOW,
    and AT_NO_AUTOMOUNT are supported. If useful, additional lookup flags to
    restrict path resolution as introduced with openat2() might be supported
    in the future.
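
To make the calling convention concrete, here is a minimal sketch of a userspace wrapper that goes through syscall(2). The names sketch_mount_setattr() and struct sketch_mount_attr are hypothetical and exist only for illustration. The fallback syscall number (442) and the fallback AT_RECURSIVE value are assumptions taken from recent kernel headers for x86-64 and arm64; other ABIs use different numbers.

```c
#define _GNU_SOURCE
#include <fcntl.h>        /* AT_FDCWD and the AT_* lookup flags */
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>  /* SYS_* numbers, where the headers know them */
#include <unistd.h>       /* syscall() */

/* Assumption: 442 is the number on x86-64 and arm64; other ABIs differ. */
#ifndef SYS_mount_setattr
#define SYS_mount_setattr 442
#endif

/* Assumption: value as found in recent kernel headers. */
#ifndef AT_RECURSIVE
#define AT_RECURSIVE 0x8000
#endif

/* Local mirror of the uapi struct described in this commit message. */
struct sketch_mount_attr {
	uint64_t attr_set;
	uint64_t attr_clr;
	uint64_t propagation;
	uint64_t userns_fd;
};

/* Hypothetical wrapper: returns 0 on success, -1 with errno set on error. */
static int sketch_mount_setattr(int dfd, const char *path, unsigned int flags,
				struct sketch_mount_attr *attr, size_t usize)
{
	return (int)syscall(SYS_mount_setattr, dfd, path, flags, attr, usize);
}
```

A caller would typically pass AT_FDCWD plus a path, or a mount file descriptor plus AT_EMPTY_PATH, and add AT_RECURSIVE to cover the whole mount tree. The call needs sufficient privileges over the mount and a kernel that implements the syscall; otherwise it fails with errno set.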
    
    The mount_setattr() syscall can be expected to grow over time and is
    designed with extensibility in mind. It follows the extensible syscall
    pattern we have used with other syscalls such as openat2(), clone3(),
    sched_{set,get}attr(), and others.
    The set of mount options is passed in the uapi struct mount_attr which
    currently has the following layout:
    
    struct mount_attr {
    	__u64 attr_set;
    	__u64 attr_clr;
    	__u64 propagation;
    	__u64 userns_fd;
    };
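
The extensible-struct pattern referred to above relies on the caller passing the size of its struct (the usize argument), so that older and newer callers can talk to older and newer kernels. The following is a rough userspace model of that size negotiation, not the kernel's actual code; sketch_copy_struct() is a hypothetical name, and the -E2BIG result for non-zero unknown fields is an assumption based on how openat2() and clone3() behave.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the size negotiation used by extensible-struct
 * syscalls. src/usize describe the caller's struct, dst/ksize the version
 * the callee understands. Returns 0 on success or a negative errno value.
 */
static int sketch_copy_struct(void *dst, size_t ksize,
			      const void *src, size_t usize)
{
	size_t common = usize < ksize ? usize : ksize;
	const unsigned char *p = src;

	/* Newer caller: fields the callee does not know must all be zero. */
	for (size_t i = ksize; i < usize; i++)
		if (p[i] != 0)
			return -E2BIG;

	/* Older caller: fields the caller did not supply read as zero. */
	memset(dst, 0, ksize);
	memcpy(dst, src, common);
	return 0;
}
```

With the layout shown above, the first version of the struct is 32 bytes (four __u64 fields), so a caller built against it passes usize = 32 even if later kernels append fields.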
    
    The @attr_set and @attr_clr members are used to set and clear mount
    options, respectively. This way a user can, for example, request that
    one set of flags be raised, such as making mounts read-only by raising
    MOUNT_ATTR_RDONLY in @attr_set, while at the same time requesting that
    another set of flags be lowered, such as removing noexec from a mount
    tree by specifying MOUNT_ATTR_NOEXEC in @attr_clr.
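
As a simplified model of how a single request combines the two fields, and of why repeating the same request is idempotent, consider the sketch below. sketch_apply_attrs() is a hypothetical helper, the clear-then-set order is an assumption about the semantics rather than a quote of the implementation, and the fallback constant values are assumed to match recent <linux/mount.h>.

```c
#include <stdint.h>

/* Assumption: values as in recent <linux/mount.h>. */
#ifndef MOUNT_ATTR_RDONLY
#define MOUNT_ATTR_RDONLY 0x00000001
#endif
#ifndef MOUNT_ATTR_NOEXEC
#define MOUNT_ATTR_NOEXEC 0x00000008
#endif

/* Hypothetical model of one request: lower attr_clr, then raise attr_set. */
static uint64_t sketch_apply_attrs(uint64_t cur, uint64_t attr_set,
				   uint64_t attr_clr)
{
	return (cur & ~attr_clr) | attr_set;
}
```

Starting from a mount that is noexec, applying attr_set = MOUNT_ATTR_RDONLY and attr_clr = MOUNT_ATTR_NOEXEC yields a read-only mount without noexec, and applying the same request again leaves the result unchanged.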
    
    Note, since the MOUNT_ATTR_<atime> values are an enum starting from 0,
    not a bitmap, users wanting to transition to a different atime setting
    cannot simply specify the atime setting in @attr_set, but must also
    specify MOUNT_ATTR__ATIME in the @attr_clr field. So we ensure that
    MOUNT_ATTR__ATIME can't be partially set in @attr_clr and that @attr_set
    can't have any atime bits set if MOUNT_ATTR__ATIME isn't set in
    @attr_clr.
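
The two atime rules above can be written down as a small predicate. This is an illustrative sketch only: sketch_atime_request_ok() is a hypothetical name, and the fallback constant values are assumed to match recent <linux/mount.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: values as in recent <linux/mount.h>. */
#ifndef MOUNT_ATTR__ATIME
#define MOUNT_ATTR__ATIME      0x00000070
#endif
#ifndef MOUNT_ATTR_RELATIME
#define MOUNT_ATTR_RELATIME    0x00000000
#endif
#ifndef MOUNT_ATTR_NOATIME
#define MOUNT_ATTR_NOATIME     0x00000010
#endif
#ifndef MOUNT_ATTR_STRICTATIME
#define MOUNT_ATTR_STRICTATIME 0x00000020
#endif

/* Hypothetical check mirroring the two rules stated in the text. */
static bool sketch_atime_request_ok(uint64_t attr_set, uint64_t attr_clr)
{
	uint64_t clr_atime = attr_clr & MOUNT_ATTR__ATIME;

	/* Rule 1: MOUNT_ATTR__ATIME is cleared as a whole or not at all. */
	if (clr_atime != 0 && clr_atime != MOUNT_ATTR__ATIME)
		return false;

	/* Rule 2: atime bits in attr_set require the full mask in attr_clr. */
	if ((attr_set & MOUNT_ATTR__ATIME) != 0 && clr_atime == 0)
		return false;

	return true;
}
```

Because MOUNT_ATTR_RELATIME is 0, switching to relatime is expressed purely by placing MOUNT_ATTR__ATIME in @attr_clr, which is why the enum cannot be handled like an ordinary flag.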
    
    The @propagation field lets callers specify the propagation type of a
    mount tree. Propagation is a single property that has four different
    settings and as such is not really a flag argument but an enum.
    Specifically, it would be unclear what setting and clearing propagation
    settings in combination would amount to. The legacy mount() syscall thus
    forbids the combination of multiple propagation settings too. The goal
    is to keep the semantics of mount propagation somewhat simple as they
    are overly complex as it is.
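
Treating propagation as an enum means a request names at most one propagation type. The check below sketches that rule, assuming @propagation takes the same MS_SHARED, MS_SLAVE, MS_PRIVATE and MS_UNBINDABLE constants that the legacy mount() syscall uses; sketch_propagation_ok() is a hypothetical helper, not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/mount.h>  /* MS_SHARED, MS_SLAVE, MS_PRIVATE, MS_UNBINDABLE */

/* Hypothetical check: accept zero or exactly one known propagation type. */
static bool sketch_propagation_ok(uint64_t propagation)
{
	const uint64_t known = MS_SHARED | MS_SLAVE | MS_PRIVATE | MS_UNBINDABLE;

	/* Reject anything that is not a propagation type at all. */
	if (propagation & ~known)
		return false;

	/* x & (x - 1) clears the lowest set bit; 0 means at most one bit. */
	return (propagation & (propagation - 1)) == 0;
}
```

A value of 0 is accepted here on the assumption that it means "leave propagation unchanged".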
    
    The @userns_fd field lets users specify a user namespace whose idmapping
    becomes the idmapping of the mount. This is implemented and explained in
    detail in the next patch.
    
    [1]: commit 2e4b7fcd ("[PATCH] r/o bind mounts: honor mount writer counts at remount")
    
    Link: https://lore.kernel.org/r/20210121131959.646623-35-christian.brauner@ubuntu.com
    Cc: David Howells <dhowells@redhat.com>
    Cc: Aleksa Sarai <cyphar@cyphar.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: linux-fsdevel@vger.kernel.org
    Cc: linux-api@vger.kernel.org
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>