1. 22 Apr, 2016 6 commits
    • Seth Forshee's avatar
      fs: Remove check of s_user_ns for existing mounts in fs_fully_visible() · 5c9f53a7
      Seth Forshee authored
      fs_fully_visible() ignores MNT_LOCK_NODEV when FS_USERS_DEV_MOUNT
      is not set for the filesystem, but there is a bug in the logic
      that may cause mounting to fail. It is doing this only when the
      existing mount is not in init_user_ns but should check the new
      mount instead. But the new mount is always in a non-init
      namespace when fs_fully_visible() is called, so that condition
      can simply be removed.
      Signed-off-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      5c9f53a7
    • Pavel Tikhomirov's avatar
      fs: fix a posible leak of allocated superblock · 0e2c763c
      Pavel Tikhomirov authored
      We probably need to fix superblock leak in patch (v4 "fs: Add user
      namesapace member to struct super_block"):
      
      Imagine posible code path in sget_userns: we iterate through
      type->fs_supers and do not find suitable sb, we drop sb_lock to
      allocate s and go to retry. After we dropped sb_lock some other
      task from different userns takes sb_lock, it is already in retry
      stage and has s allocated, so it puts its s in type->fs_supers
      list. So in retry we will find these sb in list and check it has
      a different userns, and finally we will return without freeing s.
      Signed-off-by: default avatarPavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Acked-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      0e2c763c
    • Seth Forshee's avatar
      Smack: Add support for unprivileged mounts from user namespaces · e823b4e3
      Seth Forshee authored
      Security labels from unprivileged mounts cannot be trusted.
      Ideally for these mounts we would assign the objects in the
      filesystem the same label as the inode for the backing device
      passed to mount. Unfortunately it's currently impossible to
      determine which inode this is from the LSM mount hooks, so we
      settle for the label of the process doing the mount.
      
      This label is assigned to s_root, and also to smk_default to
      ensure that new inodes receive this label. The transmute property
      is also set on s_root to make this behavior more explicit, even
      though it is technically not necessary.
      
      If a filesystem has existing security labels, access to inodes is
      permitted if the label is the same as smk_root, otherwise access
      is denied. The SMACK64EXEC xattr is completely ignored.
      
      Explicit setting of security labels continues to require
      CAP_MAC_ADMIN in init_user_ns.
      
      Altogether, this ensures that filesystem objects are not
      accessible to subjects which cannot already access the backing
      store, that MAC is not violated for any objects in the fileystem
      which are already labeled, and that a user cannot use an
      unprivileged mount to gain elevated MAC privileges.
      
      sysfs, tmpfs, and ramfs are already mountable from user
      namespaces and support security labels. We can't rule out the
      possibility that these filesystems may already be used in mounts
      from user namespaces with security lables set from the init
      namespace, so failing to trust lables in these filesystems may
      introduce regressions. It is safe to trust labels from these
      filesystems, since the unprivileged user does not control the
      backing store and thus cannot supply security labels, so an
      explicit exception is made to trust labels from these
      filesystems.
      Signed-off-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      Acked-by: default avatarCasey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      e823b4e3
    • Seth Forshee's avatar
      fs: Limit file caps to the user namespace of the super block · c34ae98b
      Seth Forshee authored
      Capability sets attached to files must be ignored except in the
      user namespaces where the mounter is privileged, i.e. s_user_ns
      and its descendants. Otherwise a vector exists for gaining
      privileges in namespaces where a user is not already privileged.
      
      Add a new helper function, in_user_ns(), to test whether a user
      namespace is the same as or a descendant of another namespace.
      Use this helper to determine whether a file's capability set
      should be applied to the caps constructed during exec.
      Acked-by: default avatarSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      c34ae98b
    • Eric W. Biederman's avatar
      userns: Simpilify MNT_NODEV handling. · 57a3711a
      Eric W. Biederman authored
      - Consolidate the testing if a device node may be opened in a new
        function may_open_dev.
      
      - Move the check for allowing access to device nodes on filesystems
        not mounted in the initial user namespace from mount time to open
        time and include it in may_open_dev.
      
      This set of changes removes the implicit adding of MNT_NODEV which
      simplifies the logic in fs/namespace.c and removes a potentially
      problematic difference in how normal and unprivileged mount
      namespaces work.  This is a user visible change in behavior for
      remount in unpriviliged mount namespaces but is unlikely to cause
      problems for existing software.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      57a3711a
    • Seth Forshee's avatar
      fs: Add user namesapace member to struct super_block · dafbc5d7
      Seth Forshee authored
      Initially this will be used to eliminate the implicit MNT_NODEV
      flag for mounts from user namespaces. In the future it will also
      be used for translating ids and checking capabilities for
      filesystems mounted from user namespaces.
      
      s_user_ns is initialized in alloc_super() and is generally set to
      current_user_ns(). To avoid security and corruption issues, two
      additional mount checks are also added:
      
       - do_new_mount() gains a check that the user has CAP_SYS_ADMIN
         in current_user_ns().
      
       - sget() will fail with EBUSY when the filesystem it's looking
         for is already mounted from another user namespace.
      
      proc requires some special handling. The user namespace of
      current isn't appropriate when forking as a result of clone (2)
      with CLONE_NEWPID|CLONE_NEWUSER, as it will set s_user_ns to the
      namespace of the parent and make proc unmountable in the new user
      namespace. Instead, the user namespace which owns the new pid
      namespace is used. sget_userns() is allowed to allow passing in
      a namespace other than that of current, and sget becomes a
      wrapper around sget_userns() which passes current_user_ns().
      
      Changes to original version of this patch
        * Documented @user_ns in sget_userns, alloc_super and fs.h
        * Kept an blank line in fs.h
        * Removed unncessary include of user_namespace.h from fs.h
        * Tweaked the location of get_user_ns and put_user_ns so
          the security modules can (if they wish) depend on it.
        -- EWB
      Signed-off-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      dafbc5d7
  2. 18 Apr, 2016 1 commit
  3. 17 Apr, 2016 5 commits
  4. 16 Apr, 2016 7 commits
  5. 15 Apr, 2016 17 commits
  6. 14 Apr, 2016 4 commits
    • Mike Snitzer's avatar
      dm cache metadata: fix READ_LOCK macros and cleanup WRITE_LOCK macros · 9567366f
      Mike Snitzer authored
      The READ_LOCK macro was incorrectly returning -EINVAL if
      dm_bm_is_read_only() was true -- it will always be true once the cache
      metadata transitions to read-only by dm_cache_metadata_set_read_only().
      
      Wrap READ_LOCK and WRITE_LOCK multi-statement macros in do {} while(0).
      Also, all accesses of the 'cmd' argument passed to these related macros
      are now encapsulated in parenthesis.
      
      A follow-up patch can be developed to eliminate the use of macros in
      favor of pure C code.  Avoiding that now given that this needs to apply
      to stable@.
      Reported-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Fixes: d14fcf3d ("dm cache: make sure every metadata function checks fail_io")
      Cc: stable@vger.kernel.org
      9567366f
    • Keith Busch's avatar
      NVMe: Always use MSI/MSI-x interrupts · a5229050
      Keith Busch authored
      Multiple users have reported device initialization failure due the driver
      not receiving legacy PCI interrupts. This is not unique to any particular
      controller, but has been observed on multiple platforms.
      
      There have been no issues reported or observed when with message signaled
      interrupts, so this patch attempts to use MSI-x during initialization,
      falling back to MSI. If that fails, legacy would become the default.
      
      The setup_io_queues error handling had to change as a result: the admin
      queue's msix_entry used to be initialized to the legacy IRQ. The case
      where nr_io_queues is 0 would fail request_irq when setting up the admin
      queue's interrupt since re-enabling MSI-x fails with 0 vectors, leaving
      the admin queue's msix_entry invalid. Instead, return success immediately.
      Reported-by: default avatarTim Muhlemmer <muhlemmer@gmail.com>
      Reported-by: default avatarJon Derrick <jonathan.derrick@intel.com>
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      a5229050
    • Linus Torvalds's avatar
      /proc/iomem: only expose physical resource addresses to privileged users · 51d7b120
      Linus Torvalds authored
      In commit c4004b02 ("x86: remove the kernel code/data/bss resources
      from /proc/iomem") I was hoping to remove the phyiscal kernel address
      data from /proc/iomem entirely, but that had to be reverted because some
      system programs actually use it.
      
      This limits all the detailed resource information to properly
      credentialed users instead.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      51d7b120
    • Linus Torvalds's avatar
      pci-sysfs: use proper file capability helper function · ab0fa82b
      Linus Torvalds authored
      The PCI config access checked the file capabilities correctly, but used
      the itnernal security capability check rather than the helper function
      that is actually meant for that.
      
      The security_capable() has unusual return values and is not meant to be
      used elsewhere (the only other use is in the capability checking
      functions that we actually intend people to use, and this odd PCI usage
      really stood out when looking around the capability code.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ab0fa82b