1. 08 Aug, 2023 2 commits
  2. 07 Aug, 2023 10 commits
  3. 06 Aug, 2023 8 commits
    • Linux 6.5-rc5 · 52a93d39
      Linus Torvalds authored
      52a93d39
    • Merge tag 'v6.5-rc5.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 0108963f
      Linus Torvalds authored
      Pull vfs fixes from Christian Brauner:
      
       - Fix a wrong check for O_TMPFILE during RESOLVE_CACHED lookup
      
       - Clean up directory iterators and clarify file_needs_f_pos_lock()
      
      * tag 'v6.5-rc5.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        fs: rely on ->iterate_shared to determine f_pos locking
        vfs: get rid of old '->iterate' directory operation
        proc: fix missing conversion to 'iterate_shared'
        open: make RESOLVE_CACHED correctly test for O_TMPFILE
      0108963f
    • fs: rely on ->iterate_shared to determine f_pos locking · 7d84d1b9
      Christian Brauner authored
      Now that we removed ->iterate we don't need to check for either
      ->iterate or ->iterate_shared in file_needs_f_pos_lock(). Simply check
      for ->iterate_shared instead. This will tell us whether we need to
      unconditionally take the lock. Not only does this let us avoid
      checking f_inode's mode, it also clearly shows that we're locking
      because of readdir.
      Signed-off-by: Christian Brauner <brauner@kernel.org>
      7d84d1b9
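The check described in this commit can be sketched in userspace as follows. This is a simplified model, not the kernel source: `model_file`, `model_file_ops`, `MODEL_FMODE_ATOMIC_POS`, and every other `model_` name are hypothetical stand-ins, and the predicate only reflects what the commit message states (lock when the file is shared, or when it has `->iterate_shared`, i.e. supports readdir).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct file machinery. */
#define MODEL_FMODE_ATOMIC_POS 0x1u

struct model_file_ops {
	int (*iterate_shared)(void);   /* non-NULL only for directories */
};

struct model_file {
	unsigned int f_mode;
	long f_count;                  /* number of references to this file */
	const struct model_file_ops *f_op;
};

/* Lock f_pos only if the file is shared or supports readdir. */
static bool model_needs_f_pos_lock(const struct model_file *file)
{
	return (file->f_mode & MODEL_FMODE_ATOMIC_POS) &&
	       (file->f_count > 1 || file->f_op->iterate_shared != NULL);
}

/* Dummy readdir implementation so that a directory has a non-NULL hook. */
static int model_readdir(void)
{
	return 0;
}

static const struct model_file_ops model_dir_ops = { .iterate_shared = model_readdir };
static const struct model_file_ops model_reg_ops = { .iterate_shared = NULL };
```

In this model an unshared regular file skips the lock, while a directory always takes it, which mirrors the "locking because of readdir" reasoning above.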
    • vfs: get rid of old '->iterate' directory operation · 3e327154
      Linus Torvalds authored
      All users now just use '->iterate_shared()', which only takes the
      directory inode lock for reading.
      
      Filesystems that never got converted to shared mode now instead use a
      wrapper that drops the lock, re-takes it in write mode, calls the old
      function, and then downgrades the lock back to read mode.
      
      This way the VFS layer and other callers no longer need to care about
      filesystems that never got converted to the modern era.
      
      The filesystems that use the new wrapper are ceph, coda, exfat, jfs,
      ntfs, ocfs2, overlayfs, and vboxsf.
      
      Honestly, several of them look like they really could just iterate their
      directories in shared mode and skip the wrapper entirely, but the point
      of this change is to not change semantics or fix filesystems that
      haven't been fixed in the last 7+ years, but to finally get rid of the
      dual iterators.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
      3e327154
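The wrapper sequence described in this commit (drop the read lock, re-take it in write mode, call the old function, downgrade back to read mode) can be modelled as a small state machine. This is a hedged sketch only: the lock is a plain enum rather than a real rwsem, and all `model_` names are hypothetical. A real implementation would presumably also need to cope with the directory changing while the lock is briefly dropped, which this model ignores.

```c
/* Hypothetical lock model: records the state transitions only. */
enum model_lock_state { MODEL_UNLOCKED, MODEL_READ, MODEL_WRITE };

struct model_dir {
	enum model_lock_state lock;
	int legacy_calls;              /* how often the old iterator ran */
	int saw_write_lock;            /* 1 if the old iterator ran exclusively */
};

/* Old-style iterator: expects the lock to be held for writing. */
static int model_legacy_iterate(struct model_dir *dir)
{
	dir->legacy_calls++;
	dir->saw_write_lock = (dir->lock == MODEL_WRITE);
	return 0;
}

/*
 * Wrapper with shared-iterator calling conventions: entered and left
 * with the lock held for reading.
 */
static int model_wrap_iterate(struct model_dir *dir,
			      int (*legacy)(struct model_dir *))
{
	int ret;

	if (dir->lock != MODEL_READ)
		return -1;             /* caller broke the contract */

	dir->lock = MODEL_UNLOCKED;    /* drop the read lock */
	dir->lock = MODEL_WRITE;       /* re-take it in write mode */
	ret = legacy(dir);             /* call the old function */
	dir->lock = MODEL_READ;        /* downgrade back to read mode */
	return ret;
}
```

The point of the design is that callers only ever see the shared-mode contract; the exclusive locking becomes an internal detail of the filesystems that still need it.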
    • proc: fix missing conversion to 'iterate_shared' · 0a2c2baa
      Linus Torvalds authored
      I'm looking at the directory handling due to the discussion about f_pos
      locking (see commit 79796425: "file: reinstate f_pos locking
      optimization for regular files"), and wanting to clean that up.
      
      And one source of ugliness is how we were supposed to move filesystems
      over to the '->iterate_shared()' function that only takes the inode lock
      for reading many many years ago, but several filesystems still use the
      bad old '->iterate()' that takes the inode lock for exclusive access.
      
      See commit 61922694 ("introduce a parallel variant of ->iterate()")
      that also added some documentation stating
      
            Old method is only used if the new one is absent; eventually it will
            be removed.  Switch while you still can; the old one won't stay.
      
      and that was back in April 2016.  Here we are, many years later, and the
      old version is still clearly sadly alive and well.
      
      Now, some of those old style iterators are probably just because the
      filesystem may end up having per-inode mutable data that it uses for
      iterating a directory, but at least one case is just a mistake.
      
      Al switched over most filesystems to use '->iterate_shared()' back when
      it was introduced.  In particular, the /proc filesystem was converted as
      one of the first ones in commit f50752ea ("switch all procfs
      directories ->iterate_shared()").
      
      But a new user of '->iterate()' was later re-introduced by
      commit 6d9c939d ("procfs: add smack subdir to attrs").
      
      And that's clearly not what we wanted, since that new case just uses the
      same 'proc_pident_readdir()' and 'proc_pident_lookup()' helper functions
      that other /proc pident directories use, and they are most definitely
      safe to use with the inode lock held shared.
      
      So just fix it.
      
      This still leaves a fair number of oddball filesystems using the
      old-style directory iterator (ceph, coda, exfat, jfs, ntfs, ocfs2,
      overlayfs, and vboxsf), but at least we don't have any remaining in the
      core filesystems.
      
      I'm going to add a wrapper function that just drops the read-lock and
      takes it as a write lock, so that we can clean up the core vfs layer and
      make all the ugly 'this filesystem needs exclusive inode locking' be
      just filesystem-internal warts.
      
      I just didn't want to make that conversion when we still had a core user
      left.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
      0a2c2baa
    • open: make RESOLVE_CACHED correctly test for O_TMPFILE · a0fc452a
      Aleksa Sarai authored
      O_TMPFILE is actually __O_TMPFILE|O_DIRECTORY. This means that the old
      fast-path check for RESOLVE_CACHED would reject all users passing
      O_DIRECTORY with -EAGAIN, when in fact the intended test was to check
      for __O_TMPFILE.
      
      Cc: stable@vger.kernel.org # v5.12+
      Fixes: 99668f61 ("fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED")
      Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
      Message-Id: <20230806-resolve_cached-o_tmpfile-v1-1-7ba16308465e@cyphar.com>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
      a0fc452a
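The flag mix-up described in this commit is easy to reproduce in isolation. The sketch below uses illustrative constants (`MODEL_O_DIRECTORY`, `MODEL___O_TMPFILE`) rather than the real, architecture-specific values, and the two helper functions are hypothetical; only the relationship "O_TMPFILE is __O_TMPFILE|O_DIRECTORY" is taken from the commit message.

```c
#include <stdbool.h>

/* Illustrative flag values; the real ones are architecture-specific. */
#define MODEL_O_DIRECTORY  0x010000u
#define MODEL___O_TMPFILE  0x400000u
#define MODEL_O_TMPFILE    (MODEL___O_TMPFILE | MODEL_O_DIRECTORY)

/* Buggy test: true whenever ANY bit of the composite mask is set. */
static bool model_rejects_old(unsigned int flags)
{
	return (flags & MODEL_O_TMPFILE) != 0;
}

/* Intended test: only the dedicated __O_TMPFILE bit matters. */
static bool model_rejects_new(unsigned int flags)
{
	return (flags & MODEL___O_TMPFILE) != 0;
}
```

With the composite mask, a plain `MODEL_O_DIRECTORY` open is rejected even though no temporary file was requested; testing the single bit avoids that.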
    • Merge tag 'rust-fixes-6.5-rc5' of https://github.com/Rust-for-Linux/linux · f0ab9f34
      Linus Torvalds authored
      Pull rust fixes from Miguel Ojeda:
      
       - Allocator: prevent mis-aligned allocation
      
       - Types: delete 'ForeignOwnable::borrow_mut'. A sound replacement is
         planned for the merge window
      
       - Build: fix bindgen error with UBSAN_BOUNDS_STRICT
      
      * tag 'rust-fixes-6.5-rc5' of https://github.com/Rust-for-Linux/linux:
        rust: fix bindgen build error with UBSAN_BOUNDS_STRICT
        rust: delete `ForeignOwnable::borrow_mut`
        rust: allocator: Prevent mis-aligned allocation
      f0ab9f34
    • Merge tag 'ata-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · fb0d9199
      Linus Torvalds authored
      Pull ata fix from Damien Le Moal:
      
       - Prevent the scsi disk driver from issuing a START STOP UNIT command
         for ATA devices during system resume as this causes various issues
         reported by multiple users.
      
      * tag 'ata-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
        ata,scsi: do not issue START STOP UNIT on resume
      fb0d9199
  4. 05 Aug, 2023 5 commits
  5. 04 Aug, 2023 15 commits
    • Merge tag 'riscv-for-linus-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · e661f98c
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
      
       - A pair of fixes for build-related failures in the selftests
      
       - A fix for a sparse warning in acpi_os_ioremap()
      
       - A fix to restore the kernel PA offset in vmcoreinfo, to fix crash
         handling
      
      * tag 'riscv-for-linus-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        Documentation: kdump: Add va_kernel_pa_offset for RISCV64
        riscv: Export va_kernel_pa_offset in vmcoreinfo
        RISC-V: ACPI: Fix acpi_os_ioremap to return iomem address
        selftests: riscv: Fix compilation error with vstate_exec_nolibc.c
        selftests/riscv: fix potential build failure during the "emit_tests" step
      e661f98c
    • Merge tag 'pm-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · ea4f142f
      Linus Torvalds authored
      Pull power management fix from Rafael Wysocki:
       "Fix a sparse warning triggered by the TPMI interface recently added to
        the Intel RAPL power capping driver (Zhang Rui)"
      
      * tag 'pm-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        powercap: intel_rapl: Fix a sparse warning in TPMI interface
      ea4f142f
    • selftests/rseq: Fix build with undefined __weak · d5ad9aae
      Mark Brown authored
      Commit 3bcbc209 ("selftests/rseq: Play nice with binaries statically
      linked against glibc 2.35+") which is now in Linus' tree introduced uses
      of __weak but did nothing to ensure that a definition is provided for it
      resulting in build failures for the rseq tests:
      
      rseq.c:41:1: error: unknown type name '__weak'
      __weak ptrdiff_t __rseq_offset;
      ^
      rseq.c:41:17: error: expected ';' after top level declarator
      __weak ptrdiff_t __rseq_offset;
                      ^
                      ;
      rseq.c:42:1: error: unknown type name '__weak'
      __weak unsigned int __rseq_size;
      ^
      rseq.c:43:1: error: unknown type name '__weak'
      __weak unsigned int __rseq_flags;
      
      Fix this by using the definition from tools/include compiler.h.
      
      Fixes: 3bcbc209 ("selftests/rseq: Play nice with binaries statically linked against glibc 2.35+")
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Message-Id: <20230804-kselftest-rseq-build-v1-1-015830b66aa9@kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d5ad9aae
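The build errors quoted in this commit come from `__weak` being used without any definition in scope. A minimal sketch of the usual pattern follows, assuming a GCC or Clang toolchain; the guarded `#define` is an assumption about what a compiler header typically provides, and the `demo_` symbols are hypothetical stand-ins for the variables named in the error output.

```c
#include <stddef.h>

/* Provide __weak only if no header has defined it already. */
#ifndef __weak
# define __weak __attribute__((weak))
#endif

/* Hypothetical weak definitions, shaped like the ones in the error log. */
__weak ptrdiff_t demo_rseq_offset;
__weak unsigned int demo_rseq_size;

/*
 * Non-zero only if another object file supplied a strong definition
 * of demo_rseq_size with a non-zero value.
 */
static int demo_strong_definition_present(void)
{
	return demo_rseq_size != 0;
}
```

When nothing else defines these symbols, the weak definitions are used and are zero-initialized, so the program still links.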
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · e6fda526
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas:
       "More SVE/SME fixes for ptrace() and for the (potentially future) case
        where SME is implemented in hardware without SVE support"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64/fpsimd: Sync and zero pad FPSIMD state for streaming SVE
        arm64/fpsimd: Sync FPSIMD state with SVE for SME only systems
        arm64/ptrace: Don't enable SVE when setting streaming SVE
        arm64/ptrace: Flush FP state when setting ZT0
        arm64/fpsimd: Clear SME state in the target task when setting the VL
      e6fda526
    • Merge tag 'mtd/fixes-for-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux · c8273a25
      Linus Torvalds authored
      Pull mtd fixes from Miquel Raynal:
       "Raw NAND fixes:
         - fsl_upm: Fix an off-by one test in fun_exec_op()
         - Rockchip:
             - Align hwecc vs. raw page helper layouts
             - Fix oobfree offset and description
         - Meson: Fix OOB available bytes for ECC
         - Omap ELM: Fix incorrect type in assignment
      
        SPI-NOR fix:
         - Avoid holes in struct spi_mem_op
      
        Hyperbus fix:
         - Add Tudor as reviewer in MAINTAINERS
      
        SPI-NAND fixes:
         - Winbond and Toshiba: Fix ecc_get_status"
      
      * tag 'mtd/fixes-for-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
        mtd: rawnand: fsl_upm: Fix an off-by one test in fun_exec_op()
        mtd: spi-nor: avoid holes in struct spi_mem_op
        MAINTAINERS: Add myself as reviewer for HYPERBUS
        mtd: rawnand: rockchip: Align hwecc vs. raw page helper layouts
        mtd: rawnand: rockchip: fix oobfree offset and description
        mtd: rawnand: meson: fix OOB available bytes for ECC
        mtd: rawnand: omap_elm: Fix incorrect type in assignment
        mtd: spinand: winbond: Fix ecc_get_status
        mtd: spinand: toshiba: Fix ecc_get_status
      c8273a25
    • Merge tag 'drm-fixes-2023-08-04' of git://anongit.freedesktop.org/drm/drm · 4142fc67
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Small set of fixes this week, i915 and a few misc ones. I didn't see
        an amd pull so maybe next week it'll have a few more on that driver.
      
        ttm:
         - NULL ptr deref fix
      
        panel:
         - add missing MODULE_DEVICE_TABLE
      
        imx/ipuv3:
         - timing fix
      
        i915:
         - Fix bug in getting msg length in AUX CH registers handler
         - Gen12 AUX invalidation fixes
         - Fix premature release of request's reusable memory"
      
      * tag 'drm-fixes-2023-08-04' of git://anongit.freedesktop.org/drm/drm:
        drm/panel: samsung-s6d7aa0: Add MODULE_DEVICE_TABLE
        drm/i915: Fix premature release of request's reusable memory
        drm/i915/gt: Support aux invalidation on all engines
        drm/i915/gt: Poll aux invalidation register bit on invalidation
        drm/i915/gt: Enable the CCS_FLUSH bit in the pipe control and in the CS
        drm/i915/gt: Rename flags with bit_group_X according to the datasheet
        drm/i915/gt: Ensure memory quiesced before invalidation
        drm/i915: Add the gen12_needs_ccs_aux_inv helper
        drm/i915/gt: Cleanup aux invalidation registers
        drm/i915/gvt: Fix bug in getting msg length in AUX CH registers handler
        drm/imx/ipuv3: Fix front porch adjustment upon hactive aligning
        drm/ttm: check null pointer before accessing when swapping
      4142fc67
    • Merge tag 'ceph-for-6.5-rc5' of https://github.com/ceph/ceph-client · 4593f3c2
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "Two patches to improve RBD exclusive lock interaction with
        osd_request_timeout option and another fix to reduce the potential for
        erroneous blocklisting -- this time in CephFS. All going to stable"
      
      * tag 'ceph-for-6.5-rc5' of https://github.com/ceph/ceph-client:
        libceph: fix potential hang in ceph_osdc_notify()
        rbd: prevent busy loop when requesting exclusive lock
        ceph: defer stopping mdsc delayed_work
      4593f3c2
    • file: reinstate f_pos locking optimization for regular files · 79796425
      Linus Torvalds authored
      In commit 20ea1e7d ("file: always lock position for
      FMODE_ATOMIC_POS") we ended up always taking the file pos lock, because
      pidfd_getfd() could get a reference to the file even when it didn't have
      an elevated file count due to threading or other sharing cases.
      
      But Mateusz Guzik reports that the extra locking is actually measurable,
      so let's re-introduce the optimization, and only force the locking for
      directory traversal.
      
      Directories need the lock for correctness reasons, while regular files
      only need it for "POSIX semantics".  Since pidfd_getfd() is about
      debuggers etc special things that are _way_ outside of POSIX, we can
      relax the rules for that case.
      Reported-by: Mateusz Guzik <mjguzik@gmail.com>
      Cc: Christian Brauner <brauner@kernel.org>
      Link: https://lore.kernel.org/linux-fsdevel/20230803095311.ijpvhx3fyrbkasul@f/
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      79796425
    • Merge tag 'kvmarm-fixes-6.5-2' of... · 251199f4
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/arm64 fixes for 6.5, part #2
      
       - Fixes for the configuration of SVE/SME traps when hVHE mode is in use
      
       - Allow use of pKVM on systems with FF-A implementations that are v1.0
         compatible
      
       - Request/release percpu IRQs (arch timer, vGIC maintenance) correctly
         when pKVM is in use
      
       - Fix function prototype after __kvm_host_psci_cpu_entry() rename
      
       - Skip to the next instruction when emulating writes to TCR_EL1 on
         AmpereOne systems
      251199f4
    • KVM: SEV: remove ghcb variable declarations · 63dbc67c
      Paolo Bonzini authored
      To avoid possible time-of-check/time-of-use issues, the GHCB should
      almost never be accessed outside dump_ghcb, sev_es_sync_to_ghcb
      and sev_es_sync_from_ghcb.  The only legitimate uses are to set the
      exitinfo fields and to find the address of the scratch area embedded
      in the ghcb.  Accessing ghcb_usage also goes through svm->sev_es.ghcb
      in sev_es_validate_vmgexit(), but only because the value is not used
      anyway.
      
      Removing a shortcut variable that contains the value of svm->sev_es.ghcb
      makes these cases a bit more verbose, but it limits the chance of someone
      reading the ghcb by mistake.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      63dbc67c
    • KVM: SEV: only access GHCB fields once · 7588dbce
      Paolo Bonzini authored
      A KVM guest using SEV-ES or SEV-SNP with multiple vCPUs can trigger
      a double fetch race condition vulnerability and invoke the VMGEXIT
      handler recursively.
      
      sev_handle_vmgexit() maps the GHCB page using kvm_vcpu_map() and then
      fetches the exit code using ghcb_get_sw_exit_code().  Soon after,
      sev_es_validate_vmgexit() fetches the exit code again. Since the GHCB
      page is shared with the guest, the guest is able to quickly swap the
      values with another vCPU and hence bypass the validation. One vmexit code
      that can be rejected by sev_es_validate_vmgexit() is SVM_EXIT_VMGEXIT;
      if sev_handle_vmgexit() observes it in the second fetch, the call
      to svm_invoke_exit_handler() will invoke sev_handle_vmgexit() again
      recursively.
      
      To avoid the race, always fetch the GHCB data from the places where
      sev_es_sync_from_ghcb stores it.
      
      Exploiting recursions on the Linux kernel has been proven feasible
      in the past, but the impact is mitigated by stack guard pages
      (CONFIG_VMAP_STACK).  Still, if an attacker manages to call the handler
      multiple times, they can theoretically trigger a stack overflow and
      cause a denial-of-service, or potentially guest-to-host escape in kernel
      configurations without stack guard pages.
      
      Note that winning the race reliably in every iteration is very tricky
      due to the very tight window of the fetches; depending on the compiler
      settings, they are often consecutive because of optimization and inlining.
      
      Tested by booting an SEV-ES RHEL9 guest.
      
      Fixes: CVE-2023-4155
      Fixes: 291bd20d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
      Cc: stable@vger.kernel.org
      Reported-by: Andy Nguyen <theflow@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7588dbce
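The double-fetch pattern described in this commit can be illustrated with a deterministic userspace model. Nothing here is KVM code: the exit-code values, the `model_fetch_fn` callback, and the simulated guest are all hypothetical, chosen so that the race can be replayed without threads. The racy handler validates one read of shared memory and then acts on a second read; the fixed handler reads once and works on its private copy.

```c
#include <stdint.h>

#define MODEL_EXIT_FORBIDDEN 0x403u   /* hypothetical code the validator rejects */
#define MODEL_EXIT_ALLOWED   0x072u   /* hypothetical code the validator accepts */

/* Memory the guest can rewrite at any time; reads go through a callback. */
typedef uint64_t (*model_fetch_fn)(void *shared);

static int model_validate(uint64_t code)
{
	return code == MODEL_EXIT_FORBIDDEN ? -1 : 0;
}

/* Racy: validates one fetch, then dispatches on a second fetch. */
static int64_t model_handle_double_fetch(model_fetch_fn fetch, void *shared)
{
	if (model_validate(fetch(shared)) != 0)
		return -1;
	return (int64_t)fetch(shared);      /* may differ from what was checked */
}

/* Fixed: fetch once, then validate and dispatch on the private copy. */
static int64_t model_handle_snapshot(model_fetch_fn fetch, void *shared)
{
	uint64_t code = fetch(shared);

	if (model_validate(code) != 0)
		return -1;
	return (int64_t)code;
}

/* Simulated guest: returns ALLOWED on the first read, FORBIDDEN afterwards. */
static uint64_t model_racing_guest(void *shared)
{
	unsigned int *reads = shared;

	return (*reads)++ == 0 ? MODEL_EXIT_ALLOWED : MODEL_EXIT_FORBIDDEN;
}
```

Against the simulated guest, the racy handler ends up dispatching the forbidden code that validation was meant to block, while the snapshot variant only ever dispatches the value it checked.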
    • KVM: SEV: snapshot the GHCB before accessing it · 4e15a0dd
      Paolo Bonzini authored
      Validation of the GHCB is susceptible to time-of-check/time-of-use vulnerabilities.
      To avoid them, we would like to always snapshot the fields that are read in
      sev_es_validate_vmgexit(), and not use the GHCB anymore after it returns.
      
      This means:
      
      - invoking sev_es_sync_from_ghcb() before any GHCB access, including before
        sev_es_validate_vmgexit()
      
      - snapshotting all fields including the valid bitmap and the sw_scratch field,
        which are currently not cached anywhere.
      
      The valid bitmap is the first thing to be copied out of the GHCB; then,
      further accesses will use the copy in svm->sev_es.
      
      Fixes: 291bd20d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4e15a0dd
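The ordering described in this commit (copy the valid bitmap first, then let every further access use the copy) can be sketched as below. The structures are hypothetical and far smaller than a real GHCB; the one-byte bitmap, the four fields, and all `model_` names are assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

#define MODEL_NFIELDS 4

/* Hypothetical shared page: a valid bitmap plus a few fields. */
struct model_ghcb {
	uint8_t  valid_bitmap;              /* bit i set => fields[i] is valid */
	uint64_t fields[MODEL_NFIELDS];
};

/* Host-private snapshot; all later code reads only this copy. */
struct model_snapshot {
	uint8_t  valid_bitmap;
	uint64_t fields[MODEL_NFIELDS];
};

static void model_sync_from_ghcb(struct model_snapshot *snap,
				 const volatile struct model_ghcb *ghcb)
{
	int i;

	memset(snap, 0, sizeof(*snap));

	/* Copy the bitmap first; the decisions below use the copy. */
	snap->valid_bitmap = ghcb->valid_bitmap;

	for (i = 0; i < MODEL_NFIELDS; i++)
		if (snap->valid_bitmap & (1u << i))
			snap->fields[i] = ghcb->fields[i];
}

/* Validity checks consult the snapshot, never the shared page. */
static int model_field_is_valid(const struct model_snapshot *snap, int i)
{
	return (snap->valid_bitmap >> i) & 1;
}
```

Once the snapshot is taken, later changes to the shared page no longer influence validation, which is the property the commit is after.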
    • arm64/fpsimd: Sync and zero pad FPSIMD state for streaming SVE · 69af56ae
      Mark Brown authored
      We have a function sve_sync_from_fpsimd_zeropad() which is used by the
      ptrace code to update the SVE state when the user writes to the
      FPSIMD register set.  Currently this checks that the task has SVE
      enabled, but this will miss updates for tasks which have streaming SVE
      enabled if SVE has not been enabled for the thread.  Also do the
      conversion if the task has streaming SVE enabled.
      
      Fixes: e12310a0 ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20230803-arm64-fix-ptrace-ssve-no-sve-v1-3-49df214bfb3e@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      69af56ae
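The condition change and the zero padding described in this commit can be sketched with a simplified model. This is not the kernel implementation: registers are plain byte arrays, the task flags are two booleans, and the `model_` names are hypothetical. It relies only on the architectural fact that a 128-bit V register occupies the low 128 bits of the corresponding Z register.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_V_BYTES 16   /* one 128-bit FPSIMD V register */

/* Convert if either regular SVE or streaming SVE is enabled for the task. */
static bool model_needs_sync(bool sve_enabled, bool streaming_enabled)
{
	return sve_enabled || streaming_enabled;
}

/*
 * Copy one V register into the low 128 bits of a wider Z register of
 * vl_bytes bytes and zero the remaining upper bytes.
 * The caller must ensure vl_bytes >= MODEL_V_BYTES.
 */
static void model_sync_zeropad(uint8_t *z, size_t vl_bytes,
			       const uint8_t v[MODEL_V_BYTES])
{
	memset(z, 0, vl_bytes);
	memcpy(z, v, MODEL_V_BYTES);
}
```

Checking only `sve_enabled` would skip the conversion for a task that has streaming SVE but not SVE, which is the missed update the commit describes.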
    • arm64/fpsimd: Sync FPSIMD state with SVE for SME only systems · 507ea5dd
      Mark Brown authored
      Currently we guard FPSIMD/SVE state conversions with a check for the system
      supporting SVE but SME only systems may need to sync streaming mode SVE
      state so add a check for SME support too.  These functions are only used
      by the ptrace code.
      
      Fixes: e12310a0 ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20230803-arm64-fix-ptrace-ssve-no-sve-v1-2-49df214bfb3e@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      507ea5dd
    • arm64/ptrace: Don't enable SVE when setting streaming SVE · 045aecdf
      Mark Brown authored
      Systems which implement SME without also implementing SVE are
      architecturally valid but were not initially supported by the kernel.
      Unfortunately we missed one issue in the ptrace code.
      
      The SVE register setting code is shared between SVE and streaming mode
      SVE. When we set full SVE register state we currently enable TIF_SVE
      unconditionally. In the case where streaming SVE is being configured on a
      system that supports vanilla SVE this is not an issue, since we always
      initialise enough state for both vector lengths, but on a system which only
      supports SME it will result in us attempting to restore the SVE vector
      length after having set streaming SVE registers.
      
      Fix this by making the enabling of SVE conditional on setting SVE vector
      state. If we set streaming SVE state and SVE was not already enabled this
      will result in an SVE access trap on next use of normal SVE. This will cause
      us to flush our register state, but this is fine since the only way to
      trigger an SVE access trap would be to exit streaming mode, which will cause
      the in-register state to be flushed anyway.
      
      Fixes: e12310a0 ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20230803-arm64-fix-ptrace-ssve-no-sve-v1-1-49df214bfb3e@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      045aecdf