1. 21 Jul, 2022 1 commit
    • Yang Xu's avatar
      fs: move S_ISGID stripping into the vfs_*() helpers · 1639a49c
      Yang Xu authored
      Move setgid handling out of individual filesystems and into the VFS
      itself to stop the proliferation of setgid inheritance bugs.
      
      Creating files that have both the S_IXGRP and S_ISGID bit raised in
      directories that themselves have the S_ISGID bit set requires additional
      privileges to avoid security issues.
      
      When a filesystem creates a new inode it needs to take care that the
      caller is either in the group of the newly created inode or they have
      CAP_FSETID in their current user namespace and are privileged over the
      parent directory of the new inode. If any of these two conditions is
      true then the S_ISGID bit can be raised for an S_IXGRP file and if not
      it needs to be stripped.
      
      However, there are several key issues with the current implementation:
      
      * S_ISGID stripping logic is entangled with umask stripping.
      
        If a filesystem doesn't support or enable POSIX ACLs then umask
        stripping is done directly in the vfs before calling into the
        filesystem.
        If the filesystem does support POSIX ACLs then unmask stripping may be
        done in the filesystem itself when calling posix_acl_create().
      
        Since umask stripping has an effect on S_ISGID inheritance, e.g., by
        stripping the S_IXGRP bit from the file to be created and all relevant
        filesystems have to call posix_acl_create() before inode_init_owner()
        where we currently take care of S_ISGID handling S_ISGID handling is
        order dependent. IOW, whether or not you get a setgid bit depends on
        POSIX ACLs and umask and in what order they are called.
      
        Note that technically filesystems are free to impose their own
        ordering between posix_acl_create() and inode_init_owner() meaning
        that there's additional ordering issues that influence S_SIGID
        inheritance.
      
      * Filesystems that don't rely on inode_init_owner() don't get S_ISGID
        stripping logic.
      
        While that may be intentional (e.g. network filesystems might just
        defer setgid stripping to a server) it is often just a security issue.
      
      This is not just ugly it's unsustainably messy especially since we do
      still have bugs in this area years after the initial round of setgid
      bugfixes.
      
      So the current state is quite messy and while we won't be able to make
      it completely clean as posix_acl_create() is still a filesystem specific
      call we can improve the S_SIGD stripping situation quite a bit by
      hoisting it out of inode_init_owner() and into the vfs creation
      operations. This means we alleviate the burden for filesystems to handle
      S_ISGID stripping correctly and can standardize the ordering between
      S_ISGID and umask stripping in the vfs.
      
      We add a new helper vfs_prepare_mode() so S_ISGID handling is now done
      in the VFS before umask handling. This has S_ISGID handling is
      unaffected unaffected by whether umask stripping is done by the VFS
      itself (if no POSIX ACLs are supported or enabled) or in the filesystem
      in posix_acl_create() (if POSIX ACLs are supported).
      
      The vfs_prepare_mode() helper is called directly in vfs_*() helpers that
      create new filesystem objects. We need to move them into there to make
      sure that filesystems like overlayfs hat have callchains like:
      
      sys_mknod()
      -> do_mknodat(mode)
         -> .mknod = ovl_mknod(mode)
            -> ovl_create(mode)
               -> vfs_mknod(mode)
      
      get S_ISGID stripping done when calling into lower filesystems via
      vfs_*() creation helpers. Moving vfs_prepare_mode() into e.g.
      vfs_mknod() takes care of that. This is in any case semantically cleaner
      because S_ISGID stripping is VFS security requirement.
      
      Security hooks so far have seen the mode with the umask applied but
      without S_ISGID handling done. The relevant hooks are called outside of
      vfs_*() creation helpers so by calling vfs_prepare_mode() from vfs_*()
      helpers the security hooks would now see the mode without umask
      stripping applied. For now we fix this by passing the mode with umask
      settings applied to not risk any regressions for LSM hooks. IOW, nothing
      changes for LSM hooks. It is worth pointing out that security hooks
      never saw the mode that is seen by the filesystem when actually creating
      the file. They have always been completely misplaced for that to work.
      
      The following filesystems use inode_init_owner() and thus relied on
      S_ISGID stripping: spufs, 9p, bfs, btrfs, ext2, ext4, f2fs, hfsplus,
      hugetlbfs, jfs, minix, nilfs2, ntfs3, ocfs2, omfs, overlayfs, ramfs,
      reiserfs, sysv, ubifs, udf, ufs, xfs, zonefs, bpf, tmpfs.
      
      All of the above filesystems end up calling inode_init_owner() when new
      filesystem objects are created through the ->mkdir(), ->mknod(),
      ->create(), ->tmpfile(), ->rename() inode operations.
      
      Since directories always inherit the S_ISGID bit with the exception of
      xfs when irix_sgid_inherit mode is turned on S_ISGID stripping doesn't
      apply. The ->symlink() and ->link() inode operations trivially inherit
      the mode from the target and the ->rename() inode operation inherits the
      mode from the source inode. All other creation inode operations will get
      S_ISGID handling via vfs_prepare_mode() when called from their relevant
      vfs_*() helpers.
      
      In addition to this there are filesystems which allow the creation of
      filesystem objects through ioctl()s or - in the case of spufs -
      circumventing the vfs in other ways. If filesystem objects are created
      through ioctl()s the vfs doesn't know about it and can't apply regular
      permission checking including S_ISGID logic. Therfore, a filesystem
      relying on S_ISGID stripping in inode_init_owner() in their ioctl()
      callpath will be affected by moving this logic into the vfs. We audited
      those filesystems:
      
      * btrfs allows the creation of filesystem objects through various
        ioctls(). Snapshot creation literally takes a snapshot and so the mode
        is fully preserved and S_ISGID stripping doesn't apply.
      
        Creating a new subvolum relies on inode_init_owner() in
        btrfs_new_subvol_inode() but only creates directories and doesn't
        raise S_ISGID.
      
      * ocfs2 has a peculiar implementation of reflinks. In contrast to e.g.
        xfs and btrfs FICLONE/FICLONERANGE ioctl() that is only concerned with
        the actual extents ocfs2 uses a separate ioctl() that also creates the
        target file.
      
        Iow, ocfs2 circumvents the vfs entirely here and did indeed rely on
        inode_init_owner() to strip the S_ISGID bit. This is the only place
        where a filesystem needs to call mode_strip_sgid() directly but this
        is self-inflicted pain.
      
      * spufs doesn't go through the vfs at all and doesn't use ioctl()s
        either. Instead it has a dedicated system call spufs_create() which
        allows the creation of filesystem objects. But spufs only creates
        directories and doesn't allo S_SIGID bits, i.e. it specifically only
        allows 0777 bits.
      
      * bpf uses vfs_mkobj() but also doesn't allow S_ISGID bits to be created.
      
      The patch will have an effect on ext2 when the EXT2_MOUNT_GRPID mount
      option is used, on ext4 when the EXT4_MOUNT_GRPID mount option is used,
      and on xfs when the XFS_FEAT_GRPID mount option is used. When any of
      these filesystems are mounted with their respective GRPID option then
      newly created files inherit the parent directories group
      unconditionally. In these cases non of the filesystems call
      inode_init_owner() and thus did never strip the S_ISGID bit for newly
      created files. Moving this logic into the VFS means that they now get
      the S_ISGID bit stripped. This is a user visible change. If this leads
      to regressions we will either need to figure out a better way or we need
      to revert. However, given the various setgid bugs that we found just in
      the last two years this is a regression risk we should take.
      
      Associated with this change is a new set of fstests to enforce the
      semantics for all new filesystems.
      
      Link: https://lore.kernel.org/ceph-devel/20220427092201.wvsdjbnc7b4dttaw@wittgenstein [1]
      Link: e014f37d ("xfs: use setattr_copy to set vfs inode attributes") [2]
      Link: 01ea173e ("xfs: fix up non-directory creation in SGID directories") [3]
      Link: fd84bfdd ("ceph: fix up non-directory creation in SGID directories") [4]
      Link: https://lore.kernel.org/r/1657779088-2242-3-git-send-email-xuyang2018.jy@fujitsu.comSuggested-by: default avatarDave Chinner <david@fromorbit.com>
      Suggested-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      Reviewed-by: default avatarDarrick J. Wong <djwong@kernel.org>
      Reviewed-and-Tested-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarYang Xu <xuyang2018.jy@fujitsu.com>
      [<brauner@kernel.org>: rewrote commit message]
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      1639a49c
  2. 19 Jul, 2022 2 commits
  3. 17 Jul, 2022 15 commits
  4. 16 Jul, 2022 12 commits
  5. 15 Jul, 2022 10 commits
    • Linus Torvalds's avatar
      Merge tag 'riscv-for-linus-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 9b59ec8d
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
      
       - A fix to avoid printing a warning when modules do not exercise any
         errata-dependent behavior and the SiFive errata are enabled.
      
       - A fix to the Microchip PFSOC to attach the L2 cache to the CPU nodes.
      
      * tag 'riscv-for-linus-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: don't warn for sifive erratas in modules
        riscv: dts: microchip: hook up the mpfs' l2cache
      9b59ec8d
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · a8ebfcd3
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "RISC-V:
         - Fix missing PAGE_PFN_MASK
      
         - Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests()
      
        x86:
         - Fix for nested virtualization when TSC scaling is active
      
         - Estimate the size of fastcc subroutines conservatively, avoiding
           disastrous underestimation when return thunks are enabled
      
         - Avoid possible use of uninitialized fields of 'struct
           kvm_lapic_irq'
      
        Generic:
         - Mark as such the boolean values available from the statistics file
           descriptors
      
         - Clarify statistics documentation"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: emulate: do not adjust size of fastop and setcc subroutines
        KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op()
        Documentation: kvm: clarify histogram units
        kvm: stats: tell userspace which values are boolean
        x86/kvm: fix FASTOP_SIZE when return thunks are enabled
        KVM: nVMX: Always enable TSC scaling for L2 when it was enabled for L1
        RISC-V: KVM: Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests()
        riscv: Fix missing PAGE_PFN_MASK
      a8ebfcd3
    • Linus Torvalds's avatar
      Merge tag 'ceph-for-5.19-rc7' of https://github.com/ceph/ceph-client · 1ce9d792
      Linus Torvalds authored
      Pull ceph fix from Ilya Dryomov:
       "A folio locking fixup that Xiubo and David cooperated on, marked for
        stable. Most of it is in netfs but I picked it up into ceph tree on
        agreement with David"
      
      * tag 'ceph-for-5.19-rc7' of https://github.com/ceph/ceph-client:
        netfs: do not unlock and put the folio twice
      1ce9d792
    • Linus Torvalds's avatar
      Merge tag 'spi-fix-v5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · 8006112d
      Linus Torvalds authored
      Pull spi fixes from Mark Brown:
       "A few driver specific fixes, none especially remarkable, plus a
        MAINTAINERS file update due to the previous maintainer for the NXP
        FSPI driver having left the company"
      
      * tag 'spi-fix-v5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi: cadence-quadspi: Remove spi_master_put() in probe failure path
        MAINTAINERS: change the NXP FSPI driver maintainer.
        spi: amd: Limit max transfer and message size
        spi: aspeed: Fix division by zero
        spi: aspeed: Add dev_dbg() to dump the spi-mem direct mapping descriptor
      8006112d
    • Linus Torvalds's avatar
      Merge tag 'soc-fixes-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 1c49f281
      Linus Torvalds authored
      Pull ARM SoC fixes from Arnd Bergmann:
       "Most of the contents are bugfixes for the devicetree files:
      
         - A Qualcomm MSM8974 pin controller regression, caused by a cleanup
           patch that gets partially reverted here.
      
         - Missing properties for Broadcom BCM49xx to fix timer detection and
           SMP boot.
      
         - Fix touchscreen pinctrl for imx6ull-colibri board
      
         - Multiple fixes for Rockchip rk3399 based machines including the vdu
           clock-rate fix, otg port fix on Quartz64-A and ethernet on
           Quartz64-B
      
         - Fixes for misspelled DT contents causing minor problems on
           imx6qdl-ts7970m, orangepi-zero, sama5d2, kontron-kswitch-d10, and
           ls1028a
      
        And a couple of changes elsewhere:
      
         - Fix binding for Allwinner D1 display pipeline
      
         - Trivial code fixes to the TEE and reset controller driver
           subsystems and the rockchip platform code.
      
         - Multiple updates to the MAINTAINERS files, marking the Palm Treo
           support as orphaned, and fixing some entries for added or changed
           file names"
      
      * tag 'soc-fixes-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (21 commits)
        arm64: dts: broadcom: bcm4908: Fix cpu node for smp boot
        arm64: dts: broadcom: bcm4908: Fix timer node for BCM4906 SoC
        ARM: dts: sunxi: Fix SPI NOR campatible on Orange Pi Zero
        ARM: dts: at91: sama5d2: Fix typo in i2s1 node
        tee: tee_get_drvdata(): fix description of return value
        optee: Remove duplicate 'of' in two places.
        ARM: dts: kswitch-d10: use open drain mode for coma-mode pins
        ARM: dts: colibri-imx6ull: fix snvs pinmux group
        optee: smc_abi.c: fix wrong pointer passed to IS_ERR/PTR_ERR()
        MAINTAINERS: add polarfire rng, pci and clock drivers
        MAINTAINERS: mark ARM/PALM TREO SUPPORT orphan
        ARM: dts: imx6qdl-ts7970: Fix ngpio typo and count
        arm64: dts: ls1028a: Update SFP node to include clock
        dt-bindings: display: sun4i: Fix D1 pipeline count
        ARM: dts: qcom: msm8974: re-add missing pinctrl
        reset: Fix devm bulk optional exclusive control getter
        MAINTAINERS: rectify entry for SYNOPSYS AXS10x RESET CONTROLLER DRIVER
        ARM: rockchip: Add missing of_node_put() in rockchip_suspend_init()
        arm64: dts: rockchip: Assign RK3399 VDU clock rate
        arm64: dts: rockchip: Fix Quartz64-A dwc3 otg port behavior
        ...
      1c49f281
    • David Sterba's avatar
      Revert "btrfs: turn delayed_nodes_tree into an XArray" · 088aea3b
      David Sterba authored
      This reverts commit 253bf575.
      
      Revert the xarray conversion, there's a problem with potential
      sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS
      allocation. The radix tree used the preloading mechanism to avoid
      sleeping but this is not available in xarray.
      
      Conversion from spin lock to mutex is possible but at time of rc6 is
      riskier than a clean revert.
      
      [1] https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/Reported-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      088aea3b
    • David Sterba's avatar
      Revert "btrfs: turn name_cache radix tree into XArray in send_ctx" · 5b8418b8
      David Sterba authored
      This reverts commit 40769420.
      
      Revert the xarray conversion, there's a problem with potential
      sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS
      allocation. The radix tree used the preloading mechanism to avoid
      sleeping but this is not available in xarray.
      
      Conversion from spin lock to mutex is possible but at time of rc6 is
      riskier than a clean revert.
      
      [1] https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/Reported-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      5b8418b8
    • David Sterba's avatar
      Revert "btrfs: turn fs_info member buffer_radix into XArray" · 01cd3909
      David Sterba authored
      This reverts commit 8ee92268.
      
      Revert the xarray conversion, there's a problem with potential
      sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS
      allocation. The radix tree used the preloading mechanism to avoid
      sleeping but this is not available in xarray.
      
      Conversion from spin lock to mutex is possible but at time of rc6 is
      riskier than a clean revert.
      
      [1] https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/Reported-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      01cd3909
    • David Sterba's avatar
      Revert "btrfs: turn fs_roots_radix in btrfs_fs_info into an XArray" · fc7cbcd4
      David Sterba authored
      This reverts commit 48b36a60.
      
      Revert the xarray conversion, there's a problem with potential
      sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS
      allocation. The radix tree used the preloading mechanism to avoid
      sleeping but this is not available in xarray.
      
      Conversion from spin lock to mutex is possible but at time of rc6 is
      riskier than a clean revert.
      
      [1] https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/Reported-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      fc7cbcd4
    • Linus Torvalds's avatar
      Merge tag 'platform-drivers-x86-v5.19-4' of... · 2a347a06
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v5.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
      
      Pull x86 platform driver fixes from Hans de Goede:
       "Highlights:
      
         - Fix brightness key events getting reported twice on some Dells.
           Regression caused by recent Panasonic hotkey fixes
      
         - Fix poweroff no longer working on some devices regression caused
           by recent poweroff handler rework
      
         - Mark new (in 5.19) Intel IFS driver as broken, because of some
           issues surrounding the userspace (sysfs) API which need to be
           cleared up
      
         - Some hardware-id / quirk additions"
      
      * tag 'platform-drivers-x86-v5.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
        ACPI: video: Fix acpi_video_handles_brightness_key_presses()
        platform/x86: intel_atomisp2_led: Also turn off the always-on camera LED on the Asus T100TAF
        platform/x86/intel/ifs: Mark as BROKEN
        platform/x86: asus-wmi: Add key mappings
        efi: Fix efi_power_off() not being run before acpi_power_off() when necessary
        platform/x86: x86-android-tablets: Fix Lenovo Yoga Tablet 2 830/1050 poweroff again
        platform/x86: gigabyte-wmi: add support for B660I AORUS PRO DDR4
        platform/x86/amd/pmc: Add new platform support
        platform/x86/amd/pmc: Add new acpi id for PMC controller
      2a347a06