1. 26 Apr, 2016 19 commits
    • b19d572e
    • fuse: Restrict allow_other to the superblock's namespace or a descendant · ca63777e
      Seth Forshee authored
      Unprivileged users are normally restricted from mounting with the
      allow_other option by system policy, but this could be bypassed
      for a mount done with user namespace root permissions. In such
      cases allow_other should not allow users outside the userns
      to access the mount as doing so would give the unprivileged user
      the ability to manipulate processes it would otherwise be unable
      to manipulate. Restrict allow_other to apply to users in the same
      userns used at mount or a descendant of that namespace. Also
      export current_in_userns() for use by fuse when built as a
      module.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Acked-by: Miklos Szeredi <mszeredi@redhat.com>
      ca63777e
    • fuse: Support fuse filesystems outside of init_user_ns · 894a4abf
      Seth Forshee authored
      In order to support mounts from namespaces other than
      init_user_ns, fuse must translate uids and gids to/from the
      userns of the process servicing requests on /dev/fuse. This
      patch does that, with a couple of restrictions on the namespace:
      
       - The userns for the fuse connection is fixed to the namespace
         from which /dev/fuse is opened.
      
       - The namespace must be the same as s_user_ns.
      
      These restrictions simplify the implementation by avoiding the
      need to pass around userns references and by allowing fuse to
      rely on the checks in inode_change_ok for ownership changes.
      Either restriction could be relaxed in the future if needed.
      
      For cuse the namespace used for the connection is also simply
      current_user_ns() at the time /dev/cuse is opened.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      894a4abf
    • fuse: Add support for pid namespaces · 1ef72e7e
      Seth Forshee authored
      When the userspace process servicing fuse requests is running in
      a pid namespace then pids passed via the fuse fd are not being
      translated into that process' namespace. Translation is necessary
      for the pid to be useful to that process.
      
      Since no use case currently exists for changing namespaces all
      translations can be done relative to the pid namespace in use
      when fuse_conn_init() is called. For fuse this translates to
      mount time, and for cuse this is when /dev/cuse is opened. IO for
      this connection from another namespace will return errors.
      
      Requests from processes whose pid cannot be translated into the
      target namespace are not permitted, except for requests
      allocated via fuse_get_req_nofail_nopages. For no-fail requests
      in.h.pid will be 0 if the pid translation fails.
      
      File locking changes based on previous work done by Eric
      Biederman.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Miklos Szeredi <mszeredi@redhat.com>
      1ef72e7e
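The pid handling described in this commit can be sketched in userspace C. This is only a rough model, not the fuse code: `pid_ns_model`, `translate_pid()` and `fill_request_pid()` are hypothetical names, the namespace is reduced to a contiguous range of visible pids, and the `-1` return is a placeholder for whatever error the kernel actually reports.

```c
#include <stdbool.h>

/* Hypothetical model: a pid namespace that can see a contiguous range
 * of global pids. Real pid namespaces are not laid out like this. */
struct pid_ns_model {
    int visible_from;   /* first global pid visible in the namespace */
    int visible_count;  /* number of consecutive visible pids */
};

/* Translate a global pid into the namespace; 0 means "not translatable". */
static int translate_pid(const struct pid_ns_model *ns, int global_pid)
{
    if (global_pid < ns->visible_from ||
        global_pid >= ns->visible_from + ns->visible_count)
        return 0;
    return global_pid - ns->visible_from + 1;
}

/* Fill in the pid for a request. Ordinary requests are refused when the
 * pid cannot be translated; no-fail requests proceed with a pid of 0.
 * Returns 0 on success, -1 (placeholder error) on refusal. */
static int fill_request_pid(const struct pid_ns_model *ns, int global_pid,
                            bool nofail, int *out_pid)
{
    int pid = translate_pid(ns, global_pid);

    if (pid == 0 && !nofail)
        return -1;
    *out_pid = pid;
    return 0;
}
```

The no-fail branch mirrors the statement that in.h.pid will be 0 when translation fails for such requests.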
    • capabilities: Allow privileged user in s_user_ns to set security.* xattrs · 688c89d9
      Seth Forshee authored
      A privileged user in s_user_ns will generally have the ability to
      manipulate the backing store and insert security.* xattrs into
      the filesystem directly. Therefore the kernel must be prepared to
      handle these xattrs from unprivileged mounts, and it makes little
      sense for commoncap to prevent writing these xattrs to the
      filesystem. The capability and LSM code have already been updated
      to appropriately handle xattrs from unprivileged mounts, so it
      is safe to loosen this restriction on setting xattrs.
      
      The exception to this logic is that writing xattrs to a mounted
      filesystem may also cause the LSM inode_post_setxattr or
      inode_setsecurity callbacks to be invoked. SELinux will deny the
      xattr update by virtue of applying mountpoint labeling to
      unprivileged userns mounts, and Smack will deny the writes for
      any user without global CAP_MAC_ADMIN, so loosening the
      capability check in commoncap is safe in this respect as well.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      688c89d9
    • fs: Allow superblock owner to access do_remount_sb() · 6a42ef98
      Seth Forshee authored
      Superblock level remounts are currently restricted to global
      CAP_SYS_ADMIN, as is the path for changing the root mount to
      read only on umount. Loosen both of these permission checks to
      also allow CAP_SYS_ADMIN in any namespace which is privileged
      towards the userns which originally mounted the filesystem.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      6a42ef98
    • fs: Don't remove suid for CAP_FSETID in s_user_ns · 490fb1b5
      Seth Forshee authored
      Expand the check in should_remove_suid() to keep privileges for
      CAP_FSETID in s_user_ns rather than init_user_ns.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      490fb1b5
    • fs: Allow superblock owner to change ownership of inodes with unmappable ids · 80a9d85b
      Seth Forshee authored
      In a userns mount some on-disk inodes may have ids which do not
      map into s_user_ns, in which case the in-kernel inodes are owned
      by invalid users. The superblock owner should be able to change
      attributes of these inodes but cannot. However it is unsafe to
      grant the superblock owner privileged access to all inodes in the
      superblock since proc, sysfs, etc. use DAC to protect files which
      may not belong to s_user_ns. The problem is restricted to only
      inodes where the owner or group is an invalid user.
      
      We can work around this by allowing users with CAP_CHOWN in
      s_user_ns to change an invalid owner or group id, so long as the
      other id is either invalid or mappable in s_user_ns. After
      changing ownership the user will be privileged towards the inode
      and thus able to change other attributes.
      
      As a precaution, checks for invalid ids are added to the proc
      and kernfs setattr interfaces. These filesystems are not expected
      to have inodes with invalid ids, but if it does happen any
      setattr operations will return -EPERM.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      80a9d85b
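The ownership rule in this commit message can be written out as a small predicate. The sketch below is a userspace model under stated assumptions: `ns_map`, `id_maps()` and `owner_may_chown_unmapped()` are hypothetical names, `INVALID_ID` stands in for INVALID_UID/INVALID_GID, and s_user_ns is reduced to one contiguous range of kernel ids.

```c
#include <stdbool.h>
#include <stdint.h>

#define INVALID_ID UINT32_MAX  /* stand-in for INVALID_UID / INVALID_GID */

/* Hypothetical model: kernel ids [first, first + count) map into s_user_ns. */
struct ns_map {
    uint32_t first;
    uint32_t count;
};

static bool id_valid(uint32_t id)
{
    return id != INVALID_ID;
}

static bool id_maps(const struct ns_map *m, uint32_t id)
{
    return id_valid(id) && id >= m->first && id - m->first < m->count;
}

/* A holder of CAP_CHOWN in s_user_ns may change an invalid owner or group
 * id, as long as the other id is either invalid or mappable in s_user_ns. */
static bool owner_may_chown_unmapped(const struct ns_map *m, bool has_cap_chown,
                                     uint32_t i_uid, uint32_t i_gid)
{
    if (!has_cap_chown)
        return false;
    if (!id_valid(i_uid) && (!id_valid(i_gid) || id_maps(m, i_gid)))
        return true;
    if (!id_valid(i_gid) && (!id_valid(i_uid) || id_maps(m, i_uid)))
        return true;
    return false;
}
```

An inode whose other id is valid but outside the namespace (for example a proc or sysfs file owned by a foreign id) is refused, which matches the concern about DAC-protected files not belonging to s_user_ns.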
    • fs: Update posix_acl support to handle user namespace mounts · a6dfa2eb
      Seth Forshee authored
      ids in on-disk ACLs should be converted to s_user_ns instead of
      init_user_ns as is done now. This introduces the possibility for
      id mappings to fail, and when this happens syscalls will return
      EOVERFLOW.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      a6dfa2eb
    • fs: Refuse uid/gid changes which don't map into s_user_ns · 09b503ec
      Seth Forshee authored
      Add checks to inode_change_ok to verify that uid and gid changes
      will map into the superblock's user namespace. If they do not,
      fail with -EOVERFLOW. This cannot be overridden with ATTR_FORCE.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      09b503ec
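A minimal userspace sketch of the mapping check described above, assuming a simplified id map made of extents. `id_extent`, `kid_has_mapping()` and `check_owner_change()` are hypothetical names and do not reproduce the kernel's inode_change_ok; the model only shows that an unmappable new uid or gid yields -EOVERFLOW with no override path.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical extent: kernel ids [lower_first, lower_first + count)
 * appear inside the namespace starting at id `first`. */
struct id_extent {
    uint32_t first;
    uint32_t lower_first;
    uint32_t count;
};

/* Does this kernel id have a mapping in the namespace? */
static bool kid_has_mapping(const struct id_extent *map, int n, uint32_t kid)
{
    for (int i = 0; i < n; i++) {
        if (kid >= map[i].lower_first &&
            kid - map[i].lower_first < map[i].count)
            return true;
    }
    return false;
}

/* Refuse uid/gid changes that do not map into the superblock's namespace.
 * There is deliberately no "force" parameter in this model. */
static int check_owner_change(const struct id_extent *map, int n,
                              bool change_uid, uint32_t new_uid,
                              bool change_gid, uint32_t new_gid)
{
    if (change_uid && !kid_has_mapping(map, n, new_uid))
        return -EOVERFLOW;
    if (change_gid && !kid_has_mapping(map, n, new_gid))
        return -EOVERFLOW;
    return 0;
}
```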
    • cred: Reject inodes with invalid ids in set_create_file_as() · 71e6906f
      Seth Forshee authored
      Using INVALID_[UG]ID for the LSM file creation context doesn't
      make sense, so return an error if the inode passed to
      set_create_file_as() has an invalid id.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      71e6906f
    • fs: Check for invalid i_uid in may_follow_link() · 0e418868
      Seth Forshee authored
      Filesystem uids which don't map into a user namespace may result
      in inode->i_uid being INVALID_UID. A symlink and its parent
      which have different owners in the filesystem can both get
      mapped to INVALID_UID, which may result in following a symlink
      when this would not otherwise have been permitted with protected
      symlinks enabled.
      
      Add a new helper function, uid_valid_eq(), and use this to
      validate that the ids in may_follow_link() are both equal and
      valid. Also add an equivalent helper for gids, which is
      currently unused.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      0e418868
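The helper described above is small enough to model directly. The sketch below uses hypothetical `_model` names and a plain 32-bit wrapper rather than the kernel's kuid_t, and it shows why a plain equality test is not enough once both ids can be invalid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's kuid_t. */
typedef struct {
    uint32_t val;
} kuid_model_t;

#define INVALID_UID_MODEL ((kuid_model_t){ UINT32_MAX })

static bool uid_valid_model(kuid_model_t uid)
{
    return uid.val != UINT32_MAX;
}

static bool uid_eq_model(kuid_model_t a, kuid_model_t b)
{
    return a.val == b.val;
}

/* Equal AND valid: two invalid ids compare equal under uid_eq_model(),
 * but must not be treated as "same owner". */
static bool uid_valid_eq_model(kuid_model_t a, kuid_model_t b)
{
    return uid_valid_model(a) && uid_eq_model(a, b);
}
```

With only the equality test, a symlink and its parent that both collapsed to the invalid id would look like they share an owner; the combined check rejects that case.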
    • Smack: Handle labels consistently in untrusted mounts · 7724e1d6
      Seth Forshee authored
      The SMACK64, SMACK64EXEC, and SMACK64MMAP labels are all handled
      differently in untrusted mounts. This is confusing and
      potentially problematic. Change this to handle them all the same
      way that SMACK64 is currently handled; that is, read the label
      from disk and check it at use time. For SMACK64 and SMACK64MMAP
      access is denied if the label does not match smk_root. To be
      consistent with suid, a SMACK64EXEC label which does not match
      smk_root will still allow execution of the file but will not run
      with the label supplied in the xattr.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Casey Schaufler <casey@schaufler-ca.com>
      7724e1d6
    • userns: Replace in_userns with current_in_userns · 031163c2
      Seth Forshee authored
      All current callers of in_userns pass current_user_ns as the
      first argument. Simplify by replacing in_userns with
      current_in_userns which checks whether current_user_ns is in the
      namespace supplied as an argument.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: James Morris <james.l.morris@oracle.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      031163c2
    • selinux: Add support for unprivileged mounts from user namespaces · 55e62c38
      Seth Forshee authored
      Security labels from unprivileged mounts in user namespaces must
      be ignored. Force superblocks from user namespaces whose labeling
      behavior is to use xattrs to use mountpoint labeling instead.
      For the mountpoint label, default to converting the current task
      context into a form suitable for file objects, but also allow the
      policy writer to specify a different label through policy
      transition rules.
      
      Pieced together from code snippets provided by Stephen Smalley.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
      Acked-by: James Morris <james.l.morris@oracle.com>
      55e62c38
    • fs: Treat foreign mounts as nosuid · 0f36f56d
      Andy Lutomirski authored
      If a process gets access to a mount from a different user
      namespace, that process should not be able to take advantage of
      setuid files or selinux entrypoints from that filesystem.  Prevent
      this by treating mounts from other mount namespaces and those not
      owned by current_user_ns() or an ancestor as nosuid.
      
      This will make it safer to allow more complex filesystems to be
      mounted in non-root user namespaces.
      
      This does not remove the need for MNT_LOCK_NOSUID.  The setuid,
      setgid, and file capability bits can no longer be abused if code in
      a user namespace were to clear nosuid on an untrusted filesystem,
      but this patch, by itself, is insufficient to protect the system
      from abuse of files that, when execed, would increase MAC privilege.
      
      As a more concrete explanation, any task that can manipulate a
      vfsmount associated with a given user namespace already has
      capabilities in that namespace and all of its descendants.  If they
      can cause a malicious setuid, setgid, or file-caps executable to
      appear in that mount, then that executable will only allow them to
      elevate privileges in exactly the set of namespaces in which they
      are already privileged.
      
      On the other hand, if they can cause a malicious executable to
      appear with a dangerous MAC label, running it could change the
      caller's security context in a way that should not have been
      possible, even inside the namespace in which the task is confined.
      
      As a hardening measure, this would have made CVE-2014-5207 much
      more difficult to exploit.
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: James Morris <james.l.morris@oracle.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      0f36f56d
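The "foreign mount" test in this commit combines a mount-namespace check with a user-namespace ancestry walk. The following userspace sketch illustrates that combination under simplifying assumptions: `userns`, `mount_model`, `in_userns_model()` and `mount_may_suid()` are hypothetical names, and mount namespaces are reduced to integer ids.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user namespace: only the parent link matters here. */
struct userns {
    const struct userns *parent;
};

/* True if ns is target itself or one of target's descendants. */
static bool in_userns_model(const struct userns *ns, const struct userns *target)
{
    for (; ns != NULL; ns = ns->parent) {
        if (ns == target)
            return true;
    }
    return false;
}

/* Hypothetical mount: explicit nosuid flag, owning mount namespace id,
 * and the user namespace that owns the superblock. */
struct mount_model {
    bool nosuid_flag;
    int mnt_ns_id;
    const struct userns *s_user_ns;
};

/* setuid/setgid/file caps are honoured only if the mount is not nosuid,
 * lives in the caller's mount namespace, and is owned by the caller's
 * user namespace or one of its ancestors. */
static bool mount_may_suid(const struct mount_model *m, int current_mnt_ns_id,
                           const struct userns *current_ns)
{
    return !m->nosuid_flag &&
           m->mnt_ns_id == current_mnt_ns_id &&
           in_userns_model(current_ns, m->s_user_ns);
}
```

A task in the initial user namespace that reaches a mount owned by a child namespace fails the ancestry test, so the mount behaves as nosuid for that task.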
    • block_dev: Check permissions towards block device inode when mounting · 87d1699a
      Seth Forshee authored
      Unprivileged users should not be able to mount block devices when
      they lack sufficient privileges towards the block device inode.
      Update blkdev_get_by_path() to validate that the user has the
      required access to the inode at the specified path. The check
      will be skipped for CAP_SYS_ADMIN, so privileged mounts will
      continue working as before.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      87d1699a
    • block_dev: Support checking inode permissions in lookup_bdev() · 470ebd6c
      Seth Forshee authored
      When looking up a block device by path no permission check is
      done to verify that the user has access to the block device inode
      at the specified path. In some cases it may be necessary to
      check permissions towards the inode, such as allowing
      unprivileged users to mount block devices in user namespaces.
      
      Add an argument to lookup_bdev() to optionally perform this
      permission check. A value of 0 skips the permission check and
      behaves the same as before. A non-zero value specifies the mask
      of access rights required towards the inode at the specified
      path. The check is always skipped if the user has CAP_SYS_ADMIN.
      
      All callers of lookup_bdev() currently pass a mask of 0, so this
      patch results in no functional change. Subsequent patches will
      add permission checks where appropriate.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      470ebd6c
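The mask semantics described for lookup_bdev() can be illustrated with a small userspace model. `bdev_permission_model()` and the `MODEL_MAY_*` constants are hypothetical, and returning -EACCES on failure is an assumption, since the commit message does not name the error code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical access bits, standing in for the kernel's MAY_* masks. */
#define MODEL_MAY_WRITE 0x2
#define MODEL_MAY_READ  0x4

/* mask:    access rights required towards the block device inode
 * granted: access rights the caller actually has on that inode
 * A mask of 0 skips the check, and so does CAP_SYS_ADMIN. */
static int bdev_permission_model(int mask, int granted, bool has_cap_sys_admin)
{
    if (mask == 0 || has_cap_sys_admin)
        return 0;
    /* Assumed error code; every requested bit must be granted. */
    return (granted & mask) == mask ? 0 : -EACCES;
}
```

Passing a mask of 0 reproduces the pre-patch behaviour, which is why existing callers see no functional change.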
    • fs: Allow sysfs and cgroupfs to share super blocks between user namespaces · 810bf840
      Seth Forshee authored
      Both of these filesystems already have use cases for mounting the
      same super block from multiple user namespaces. For sysfs this
      happens when using criu for snapshotting a container, where sysfs
      is mounted in the container's network ns but the host's user ns.
      The cgroup filesystem shares the same super block for all mounts
      of the same hierarchy regardless of the namespace.
      
      As a result, the restriction on mounting a super block from a
      single user namespace creates regressions for existing uses of
      these filesystems. For these specific filesystems this
      restriction isn't really necessary since the backing store is
      objects in kernel memory and thus the ids assigned from inodes
      are not subject to translation relative to s_user_ns.
      
      Add a new filesystem flag, FS_USERNS_SHARE_SB, which when set
      causes sget_userns() to skip the check of s_user_ns. Set this
      flag for the sysfs and cgroup filesystems to fix the
      regressions.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
      810bf840
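The effect of the new flag on superblock lookup can be sketched as follows. This is a userspace illustration only: `userns_id` and `sget_ns_check()` are hypothetical names, and the real sget_userns() does considerably more than this one comparison.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user namespace handle; identity is by address. */
struct userns_id {
    int id;
};

/* Decide whether an existing superblock may be reused by a mounter.
 * Without the sharing flag, a superblock already mounted from another
 * user namespace is refused with -EBUSY; with it, the s_user_ns
 * comparison is skipped. */
static int sget_ns_check(const struct userns_id *sb_ns,
                         const struct userns_id *mounter_ns,
                         bool share_sb_flag)
{
    if (share_sb_flag)
        return 0;
    return sb_ns == mounter_ns ? 0 : -EBUSY;
}
```

In this model sysfs and cgroup would pass `share_sb_flag` as true, so a second mount from a different user namespace reuses the superblock instead of failing.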
  2. 25 Apr, 2016 6 commits
    • fs: Remove check of s_user_ns for existing mounts in fs_fully_visible() · f7be69dd
      Seth Forshee authored
      fs_fully_visible() ignores MNT_LOCK_NODEV when FS_USERNS_DEV_MOUNT
      is not set for the filesystem, but there is a bug in the logic
      that may cause mounting to fail. It is doing this only when the
      existing mount is not in init_user_ns but should check the new
      mount instead. But the new mount is always in a non-init
      namespace when fs_fully_visible() is called, so that condition
      can simply be removed.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      f7be69dd
    • fs: fix a possible leak of allocated superblock · 1b86ddd3
      Pavel Tikhomirov authored
      We probably need to fix a superblock leak in the patch (v4 "fs: Add user
      namespace member to struct super_block"):
      
      Imagine a possible code path in sget_userns: we iterate through
      type->fs_supers and do not find a suitable sb, so we drop sb_lock to
      allocate s and go to retry. After we drop sb_lock, some other
      task from a different userns takes sb_lock; it is already in the retry
      stage and has s allocated, so it puts its s in the type->fs_supers
      list. So in retry we will find this sb in the list, see that it has
      a different userns, and finally return without freeing s.
      Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Acked-by: Seth Forshee <seth.forshee@canonical.com>
      1b86ddd3
    • Smack: Add support for unprivileged mounts from user namespaces · 9c71bd0f
      Seth Forshee authored
      Security labels from unprivileged mounts cannot be trusted.
      Ideally for these mounts we would assign the objects in the
      filesystem the same label as the inode for the backing device
      passed to mount. Unfortunately it's currently impossible to
      determine which inode this is from the LSM mount hooks, so we
      settle for the label of the process doing the mount.
      
      This label is assigned to s_root, and also to smk_default to
      ensure that new inodes receive this label. The transmute property
      is also set on s_root to make this behavior more explicit, even
      though it is technically not necessary.
      
      If a filesystem has existing security labels, access to inodes is
      permitted if the label is the same as smk_root, otherwise access
      is denied. The SMACK64EXEC xattr is completely ignored.
      
      Explicit setting of security labels continues to require
      CAP_MAC_ADMIN in init_user_ns.
      
      Altogether, this ensures that filesystem objects are not
      accessible to subjects which cannot already access the backing
      store, that MAC is not violated for any objects in the filesystem
      which are already labeled, and that a user cannot use an
      unprivileged mount to gain elevated MAC privileges.
      
      sysfs, tmpfs, and ramfs are already mountable from user
      namespaces and support security labels. We can't rule out the
      possibility that these filesystems may already be used in mounts
      from user namespaces with security labels set from the init
      namespace, so failing to trust labels in these filesystems may
      introduce regressions. It is safe to trust labels from these
      filesystems, since the unprivileged user does not control the
      backing store and thus cannot supply security labels, so an
      explicit exception is made to trust labels from these
      filesystems.
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: Casey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      9c71bd0f
    • fs: Limit file caps to the user namespace of the super block · 65922e0a
      Seth Forshee authored
      Capability sets attached to files must be ignored except in the
      user namespaces where the mounter is privileged, i.e. s_user_ns
      and its descendants. Otherwise a vector exists for gaining
      privileges in namespaces where a user is not already privileged.
      
      Add a new helper function, in_user_ns(), to test whether a user
      namespace is the same as or a descendant of another namespace.
      Use this helper to determine whether a file's capability set
      should be applied to the caps constructed during exec.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      65922e0a
    • userns: Simplify MNT_NODEV handling. · 03aedadc
      Eric W. Biederman authored
      - Consolidate the test of whether a device node may be opened into a
        new function, may_open_dev.
      
      - Move the check for allowing access to device nodes on filesystems
        not mounted in the initial user namespace from mount time to open
        time and include it in may_open_dev.
      
      This set of changes removes the implicit adding of MNT_NODEV which
      simplifies the logic in fs/namespace.c and removes a potentially
      problematic difference in how normal and unprivileged mount
      namespaces work.  This is a user-visible change in behavior for
      remount in unprivileged mount namespaces but is unlikely to cause
      problems for existing software.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      03aedadc
    • fs: Add user namespace member to struct super_block · 97c8c9ae
      Seth Forshee authored
      Initially this will be used to eliminate the implicit MNT_NODEV
      flag for mounts from user namespaces. In the future it will also
      be used for translating ids and checking capabilities for
      filesystems mounted from user namespaces.
      
      s_user_ns is initialized in alloc_super() and is generally set to
      current_user_ns(). To avoid security and corruption issues, two
      additional mount checks are also added:
      
       - do_new_mount() gains a check that the user has CAP_SYS_ADMIN
         in current_user_ns().
      
       - sget() will fail with EBUSY when the filesystem it's looking
         for is already mounted from another user namespace.
      
      proc requires some special handling. The user namespace of
      current isn't appropriate when forking as a result of clone(2)
      with CLONE_NEWPID|CLONE_NEWUSER, as it will set s_user_ns to the
      namespace of the parent and make proc unmountable in the new user
      namespace. Instead, the user namespace which owns the new pid
      namespace is used. sget_userns() is added to allow passing in
      a namespace other than that of current, and sget becomes a
      wrapper around sget_userns() which passes current_user_ns().
      
      Changes to original version of this patch
        * Documented @user_ns in sget_userns, alloc_super and fs.h
        * Kept a blank line in fs.h
        * Removed unnecessary include of user_namespace.h from fs.h
        * Tweaked the location of get_user_ns and put_user_ns so
          the security modules can (if they wish) depend on it.
        -- EWB
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      97c8c9ae
  3. 24 Apr, 2016 2 commits
  4. 23 Apr, 2016 10 commits
  5. 22 Apr, 2016 3 commits