• Linus Torvalds's avatar
    Merge tag 'fs.ovl.setgid.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping · cf619f89
    Linus Torvalds authored
    Pull setgid inheritance updates from Christian Brauner:
     "This contains the work to make setgid inheritance consistent between
      modifying a file and when changing ownership or mode as this has been
      a repeated source of very subtle bugs. The gist is that we perform the
      same permission checks in the write path as we do in the ownership and
      mode changing paths after this series where we're currently doing
      different things.
    
      We've already made setgid inheritance a lot more consistent and
      reliable in the last releases by moving setgid stripping from the
      individual filesystems up into the vfs. This aims to make the logic
      even more consistent and easier to understand and also to fix
      long-standing overlayfs setgid inheritance bugs. Miklos was nice
      enough to just let me carry the trivial overlayfs patches from Amir
      too.
    
      Below is a more detailed explanation how the current difference in
      setgid handling lead to very subtle bugs exemplified via overlayfs
      which is a victim of the current rules. I hope this explains why I
      think taking the regression risk here is worth it.
    
      A long while ago I found a few setgid inheritance bugs in overlayfs in
      the write path in certain conditions. Amir recently picked this back
      up in [1] and I jumped on board to fix this more generally.
    
      On the surface all that overlayfs would need to fix setgid inheritance
      would be to call file_remove_privs() or file_modified() but actually
      that isn't enough because the setgid inheritance api is wildly
      inconsistent in that area.
    
      Before this pr setgid stripping in file_remove_privs()'s old
      should_remove_suid() helper was inconsistent with other parts of the
      vfs. Specifically, it only raises ATTR_KILL_SGID if the inode is
      S_ISGID and S_IXGRP but not if the inode isn't in the caller's groups
      and the caller isn't privileged over the inode although we require
      this already in setattr_prepare() and setattr_copy() and so all
      filesystem implement this requirement implicitly because they have to
      use setattr_{prepare,copy}() anyway.
    
      But the inconsistency shows up in setgid stripping bugs for overlayfs
      in xfstests (e.g., generic/673, generic/683, generic/685, generic/686,
      generic/687). For example, we test whether suid and setgid stripping
      works correctly when performing various write-like operations as an
      unprivileged user (fallocate, reflink, write, etc.):
    
          echo "Test 1 - qa_user, non-exec file $verb"
          setup_testfile
          chmod a+rws $junk_file
          commit_and_check "$qa_user" "$verb" 64k 64k
    
      The test basically creates a file with 6666 permissions. While the
      file has the S_ISUID and S_ISGID bits set it does not have the S_IXGRP
      set.
    
      On a regular filesystem like xfs what will happen is:
    
          sys_fallocate()
          -> vfs_fallocate()
             -> xfs_file_fallocate()
                -> file_modified()
                   -> __file_remove_privs()
                      -> dentry_needs_remove_privs()
                         -> should_remove_suid()
                      -> __remove_privs()
                         newattrs.ia_valid = ATTR_FORCE | kill;
                         -> notify_change()
                            -> setattr_copy()
    
      In should_remove_suid() we can see that ATTR_KILL_SUID is raised
      unconditionally because the file in the test has S_ISUID set.
    
      But we also see that ATTR_KILL_SGID won't be set because while the
      file is S_ISGID it is not S_IXGRP (see above) which is a condition for
      ATTR_KILL_SGID being raised.
    
      So by the time we call notify_change() we have attr->ia_valid set to
      ATTR_KILL_SUID | ATTR_FORCE.
    
      Now notify_change() sees that ATTR_KILL_SUID is set and does:
    
          ia_valid      = attr->ia_valid |= ATTR_MODE
          attr->ia_mode = (inode->i_mode & ~S_ISUID);
    
      which means that when we call setattr_copy() later we will definitely
      update inode->i_mode. Note that attr->ia_mode still contains S_ISGID.
    
      Now we call into the filesystem's ->setattr() inode operation which
      will end up calling setattr_copy(). Since ATTR_MODE is set we will
      hit:
    
          if (ia_valid & ATTR_MODE) {
                  umode_t mode = attr->ia_mode;
                  vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
                  if (!vfsgid_in_group_p(vfsgid) &&
                      !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
                          mode &= ~S_ISGID;
                  inode->i_mode = mode;
          }
    
      and since the caller in the test is neither capable nor in the group
      of the inode the S_ISGID bit is stripped.
    
      But assume the file isn't suid then ATTR_KILL_SUID won't be raised
      which has the consequence that neither the setgid nor the suid bits
      are stripped even though it should be stripped because the inode isn't
      in the caller's groups and the caller isn't privileged over the inode.
    
      If overlayfs is in the mix things become a bit more complicated and
      the bug shows up more clearly.
    
      When e.g., ovl_setattr() is hit from ovl_fallocate()'s call to
      file_remove_privs() then ATTR_KILL_SUID and ATTR_KILL_SGID might be
      raised but because the check in notify_change() is questioning the
      ATTR_KILL_SGID flag again by requiring S_IXGRP for it to be stripped
      the S_ISGID bit isn't removed even though it should be stripped:
    
          sys_fallocate()
          -> vfs_fallocate()
             -> ovl_fallocate()
                -> file_remove_privs()
                   -> dentry_needs_remove_privs()
                      -> should_remove_suid()
                   -> __remove_privs()
                      newattrs.ia_valid = ATTR_FORCE | kill;
                      -> notify_change()
                         -> ovl_setattr()
                            /* TAKE ON MOUNTER'S CREDS */
                            -> ovl_do_notify_change()
                               -> notify_change()
                            /* GIVE UP MOUNTER'S CREDS */
               /* TAKE ON MOUNTER'S CREDS */
               -> vfs_fallocate()
                  -> xfs_file_fallocate()
                     -> file_modified()
                        -> __file_remove_privs()
                           -> dentry_needs_remove_privs()
                              -> should_remove_suid()
                           -> __remove_privs()
                              newattrs.ia_valid = attr_force | kill;
                              -> notify_change()
    
      The fix for all of this is to make file_remove_privs()'s
      should_remove_suid() helper perform the same checks as we already
      require in setattr_prepare() and setattr_copy() and have
      notify_change() not pointlessly requiring S_IXGRP again. It doesn't
      make any sense in the first place because the caller must calculate
      the flags via should_remove_suid() anyway which would raise
      ATTR_KILL_SGID
    
      Note that some xfstests will now fail as these patches will cause the
      setgid bit to be lost in certain conditions for unprivileged users
      modifying a setgid file when they would've been kept otherwise. I
      think this risk is worth taking and I explained and mentioned this
      multiple times on the list [2].
    
      Enforcing the rules consistently across write operations and
      chmod/chown will lead to losing the setgid bit in cases were it
      might've been retained before.
    
      While I've mentioned this a few times but it's worth repeating just to
      make sure that this is understood. For the sake of maintainability,
      consistency, and security this is a risk worth taking.
    
      If we really see regressions for workloads the fix is to have special
      setgid handling in the write path again with different semantics from
      chmod/chown and possibly additional duct tape for overlayfs. I'll
      update the relevant xfstests with if you should decide to merge this
      second setgid cleanup.
    
      Before that people should be aware that there might be failures for
      fstests where unprivileged users modify a setgid file"
    
    Link: https://lore.kernel.org/linux-fsdevel/20221003123040.900827-1-amir73il@gmail.com [1]
    Link: https://lore.kernel.org/linux-fsdevel/20221122142010.zchf2jz2oymx55qi@wittgenstein [2]
    
    * tag 'fs.ovl.setgid.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
      fs: use consistent setgid checks in is_sxid()
      ovl: remove privs in ovl_fallocate()
      ovl: remove privs in ovl_copyfile()
      attr: use consistent sgid stripping checks
      attr: add setattr_should_drop_sgid()
      fs: move should_remove_suid()
      attr: add in_group_or_capable()
    cf619f89
file.c 79 KB