1. 12 Mar, 2024 30 commits
    • Merge tag 'for-6.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 43a7548e
      Linus Torvalds authored
      Pull btrfs updates from David Sterba:
       "Mostly stabilization, refactoring and cleanup changes. The rest are
        minor performance optimizations due to caching or lock contention
        reduction and a few notable fixes.
      
        Performance improvements:
      
         - minor speedup in logging when repeatedly allocated structure is
           preallocated only once, improves latency and decreases lock
           contention
      
         - minor throughput increase (+6%), reduced lock contention after
           clearing delayed allocation bits, applies to several common
           workload types
      
         - skip full quota rescan if a new relation is added in the same
           transaction
      
        Fixes:
      
         - zstd fix for inline compressed file in subpage mode, updated
           version of the fix from the 6.8 cycle
      
         - proper qgroup inheritance ioctl parameter validation
      
         - more fiemap followup fixes after reduced locking done in 6.8:
            - fix race when detecting delalloc ranges
      
        Core changes:
      
         - more debugging code:
            - added assertions for a very rare crash in raid56 calculation
            - tree-checker dumps page state to give more insights into
              possible reference counting issues
      
         - add checksum calculation offloading sysfs knob, for now enabled
           under DEBUG only, to help determine a good heuristic for choosing
           between offloaded and synchronous checksumming; the choice depends
           on various factors (block group profile, device speed) and is not
           as clear as initially thought (checksum type)
      
         - error handling improvements, added assertions
      
         - more page to folio conversion (defrag, truncate), cached size and
           shift
      
         - preparation for more fine grained locking of sectors in subpage
           mode
      
         - cleanups and refactoring:
            - include cleanups, forward declarations
            - pointer-to-structure helpers
            - redundant argument removals
            - removed unused code
            - slab cache updates, last use of SLAB_MEM_SPREAD removed"
      
      * tag 'for-6.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (114 commits)
        btrfs: reuse cloned extent buffer during fiemap to avoid re-allocations
        btrfs: fix race when detecting delalloc ranges during fiemap
        btrfs: fix off-by-one chunk length calculation at contains_pending_extent()
        btrfs: qgroup: allow quick inherit if snapshot is created and added to the same parent
        btrfs: qgroup: validate btrfs_qgroup_inherit parameter
        btrfs: include device major and minor numbers in the device scan notice
        btrfs: mark btrfs_put_caching_control() static
        btrfs: remove SLAB_MEM_SPREAD flag use
        btrfs: qgroup: always free reserved space for extent records
        btrfs: tree-checker: dump the page status if hit something wrong
        btrfs: compression: remove dead comments in btrfs_compress_heuristic()
        btrfs: subpage: make writer lock utilize bitmap
        btrfs: subpage: make reader lock utilize bitmap
        btrfs: unexport btrfs_subpage_start_writer() and btrfs_subpage_end_and_test_writer()
        btrfs: pass a valid extent map cache pointer to __get_extent_map()
        btrfs: merge btrfs_del_delalloc_inode() helpers
        btrfs: pass btrfs_device to btrfs_scratch_superblocks()
        btrfs: handle transaction commit errors in flush_reservations()
        btrfs: use KMEM_CACHE() to create btrfs_free_space cache
        btrfs: use KMEM_CACHE() to create delayed ref caches
        ...
    • Merge tag 'zonefs-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs · 35d4aeea
      Linus Torvalds authored
      Pull zonefs update from Damien Le Moal:
      
       - A single change for this cycle to convert zonefs to use the new
         mount API
      
      * tag 'zonefs-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
        zonefs: convert zonefs to use the new mount api
    • Merge tag 'asm-generic-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic · 65d287c7
      Linus Torvalds authored
      Pull asm-generic updates from Arnd Bergmann:
       "Just two small updates this time:
      
         - A series I did to unify the definition of PAGE_SIZE through
           Kconfig, intended to help with a vdso rework that needs the
           constant but cannot include the normal kernel headers when building
           the compat VDSO on arm64 and potentially others
      
         - a patch from Yan Zhao to remove the pfn_to_virt() definitions from
           a couple of architectures after finding they were both incorrect
           and entirely unused"
      
      * tag 'asm-generic-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
        arch: define CONFIG_PAGE_SIZE_*KB on all architectures
        arch: simplify architecture specific page size configuration
        arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
        mm: Remove broken pfn_to_virt() on arch csky/hexagon/openrisc
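The PAGE_SIZE unification in this pull can be pictured with a small sketch. It assumes the build configuration hands the compiler one page-shift constant, here called CONFIG_PAGE_SHIFT with a stand-in default of 12. That name, the default, and the helpers page_align_down() and page_offset() are illustrative assumptions, not text quoted from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a value the configuration system would generate (assumption). */
#ifndef CONFIG_PAGE_SHIFT
#define CONFIG_PAGE_SHIFT 12
#endif

/* Everything else derives from that one constant, with no arch headers. */
#define PAGE_SHIFT CONFIG_PAGE_SHIFT
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Compile-time sanity check: a page size must be a power of two. */
static_assert((PAGE_SIZE & (PAGE_SIZE - 1)) == 0,
              "page size must be a power of two");

/* Hypothetical helper: round an address down to the start of its page. */
static inline uint64_t page_align_down(uint64_t addr)
{
    return addr & ~((uint64_t)PAGE_SIZE - 1);
}

/* Hypothetical helper: byte offset of an address within its page. */
static inline uint64_t page_offset(uint64_t addr)
{
    return addr & ((uint64_t)PAGE_SIZE - 1);
}
```

Because the constant arrives as a plain preprocessor value, code such as a compat vDSO could in principle use it without pulling in the normal kernel headers, which is the motivation the pull message gives.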
    • Merge tag 'soc-defconfig-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 3efa10eb
      Linus Torvalds authored
      Pull ARM defconfig updates from Arnd Bergmann:
       "This has the usual updates to enable platform specific driver modules
        as new hardware gets supported, as well as an update to the
        virt.config fragment so we disable all newly added platforms again"
      
      * tag 'soc-defconfig-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (24 commits)
        arm64: defconfig: Enable support for cbmem entries in the coreboot table
        ARM: defconfig: enable STMicroelectronics accelerometer and gyro for Exynos
        arm64: defconfig: drop ext2 filesystem and redundant ext3
        arm64: defconfig: Enable Rockchip HDMI/eDP Combo PHY
        arm64: defconfig: Enable Wave5 Video Encoder/Decoder
        arm64: config: disable new platforms in virt.config
        arm64: defconfig: Enable QCOM PBS
        arm64: deconfig: enable Goodix Berlin SPI touchscreen driver as module
        arm64: defconfig: Enable X1E80100 multimedia clock controllers configs
        arm64: defconfig: Enable GCC and interconnect for QDU1000/QRU1000
        arm64: defconfig: enable i.MX8MP ldb bridge
        arm64: defconfig: enable the vf610 gpio driver
        ARM: imx_v6_v7_defconfig: enable the vf610 gpio driver
        ARM: multi_v7_defconfig: Add more TI Keystone support
        arm64: defconfig: enable WCD939x USBSS driver as module
        arm64: defconfig: enable audio drivers for SM8650 QRD board
        arm64: defconfig: Enable Qualcomm interconnect providers
        ARM: multi_v7_defconfig: Enable BACKLIGHT_CLASS_DEVICE
        arm64: defconfig: Enable i.MX8QXP device drivers
        ARM: multi_v7_defconfig: Add more TI Keystone support
        ...
    • Merge tag 'soc-arm-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · a6081672
      Linus Torvalds authored
      Pull ARM SoC code updates from Arnd Bergmann:
       "These are mostly minor updates, including a number of kerneldoc fixes
        from Randy Dunlap across multiple platforms. OMAP gets a few bugfixes,
        and the MAINTAINERS file gets updated for AMD Zynq and NXP S32G"
      
      * tag 'soc-arm-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (23 commits)
        ARM: s32c: update MAINTAINERS entry
        ARM: AM33xx: PRM: Implement REBOOT_COLD
        ARM: AM33xx: PRM: Remove redundand defines
        ARM: omap1: remove duplicated 'select ARCH_OMAP'
        ARM: s3c64xx: make bus_type const
        ARM: imx: Remove usage of the deprecated ida_simple_xx() API
        ARM: OMAP2+: fix kernel-doc warnings
        ARM: OMAP2+: fix kernel-doc warnings
        ARM: OMAP2+: fix a kernel-doc warning
        ARM: OMAP2+: PRM: fix kernel-doc warnings
        ARM: OMAP2+: prm44xx: fix a kernel-doc warning
        ARM: OMAP2+: pmic-cpcap: fix kernel-doc warnings
        ARM: OMAP2+: hwmod: fix kernel-doc warnings
        ARM: OMAP2+: hwmod: remove misuse of kernel-doc
        ARM: OMAP2+: CMINST: use matching function name in kernel-doc
        ARM: OMAP2+: cm33xx: use matching function name in kernel-doc
        ARM: OMAP2+: clock: fix a function name in kernel-doc
        ARM: OMAP2+: clockdomain: fix kernel-doc warnings
        ARM: OMAP2+: am33xx-restart: fix function name in kernel-doc
        soc: xilinx: update maintainer of event manager driver
        ...
    • Merge tag 'soc-drivers-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 2184dbcd
      Linus Torvalds authored
      Pull ARM SoC driver updates from Arnd Bergmann:
       "This is the usual mix of updates for drivers that are used on (mostly
        ARM) SoCs with no other top-level subsystem tree, including:
      
         - The SCMI firmware subsystem gains support for version 3.2 of the
           specification and updates to the notification code
      
         - Feature updates for Tegra and Qualcomm platforms for added hardware
           support
      
         - A number of platforms get soc_device additions for identifying
           newly added chips from Renesas, Qualcomm, Mediatek and Google
      
         - Trivial improvements for firmware and memory drivers amongst
           others, in particular 'const' annotations throughout multiple
           subsystems"
      
      * tag 'soc-drivers-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (96 commits)
        tee: make tee_bus_type const
        soc: qcom: aoss: add missing kerneldoc for qmp members
        soc: qcom: geni-se: drop unused kerneldoc struct geni_wrapper param
        soc: qcom: spm: fix building with CONFIG_REGULATOR=n
        bus: ti-sysc: constify the struct device_type usage
        memory: stm32-fmc2-ebi: keep power domain on
        memory: stm32-fmc2-ebi: add MP25 RIF support
        memory: stm32-fmc2-ebi: add MP25 support
        memory: stm32-fmc2-ebi: check regmap_read return value
        dt-bindings: memory-controller: st,stm32: add MP25 support
        dt-bindings: bus: imx-weim: convert to YAML
        watchdog: s3c2410_wdt: use exynos_get_pmu_regmap_by_phandle() for PMU regs
        soc: samsung: exynos-pmu: Add regmap support for SoCs that protect PMU regs
        MAINTAINERS: Update SCMI entry with HWMON driver
        MAINTAINERS: samsung: gs101: match patches touching Google Tensor SoC
        memory: tegra: Fix indentation
        memory: tegra: Add BPMP and ICC info for DLA clients
        memory: tegra: Correct DLA client names
        dt-bindings: memory: renesas,rpc-if: Document R-Car V4M support
        firmware: arm_scmi: Update the supported clock protocol version
        ...
    • Merge tag 'soc-dt-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 306bee64
      Linus Torvalds authored
      Pull SoC device tree updates from Arnd Bergmann:
       "There is very little going on with new SoC support this time, all the
        new chips are variations of others that we already support, and they
        are all based on ARMv8 cores:
      
         - Mediatek MT7981B (Filogic 820) and MT7988A (Filogic 880) are
           networking SoCs designed to be used in wireless routers, similar to
           the already supported MT7986A (Filogic 830).
      
         - NXP i.MX8DXP is a variant of i.MX8QXP, with two fewer CPU cores.
           These are used in many embedded and industrial applications.
      
         - Renesas R8A779G2 (R-Car V4H ES2.0) and R8A779H0 (R-Car V4M) are
           automotive SoCs.
      
         - TI J722S is another automotive variant of its K3 family, related to
           the AM62 series.
      
        There are a total of 7 new arm32 machines and 45 arm64 ones, including
      
         - Two Android phones based on the old Tegra30 chip
      
         - Two machines using Cortex-A53 SoCs from Allwinner, a mini PC and a
           SoM development board
      
         - A set-top box using Amlogic Meson G12A S905X2
      
         - Eight embedded boards using NXP i.MX6/8/9
      
         - Three machines using Mediatek network router chips
      
         - Ten Chromebooks, all based on Mediatek MT8186
      
         - One development board based on Mediatek MT8395 (Genio 1200)
      
         - Seven tablets and phones based on Qualcomm SoCs, most of them from
           Samsung.
      
         - A third development board for Qualcomm SM8550 (Snapdragon 8 Gen 2)
      
         - Three variants of the "White Hawk" board for Renesas automotive
           SoCs
      
         - Ten Rockchip RK35xx based machines, including NAS, Tablet, Game
           console and industrial form factors.
      
         - Three evaluation boards for TI K3 based SoCs
      
        The other changes are mainly the usual feature additions for existing
        hardware, cleanups, and dtc compile time fixes. One notable change is
        the inclusion of PowerVR SGX GPU nodes on TI SoCs"
      
      * tag 'soc-dt-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (824 commits)
        riscv: dts: Move BUILTIN_DTB_SOURCE to common Kconfig
        riscv: dts: starfive: jh7100: fix root clock names
        ARM: dts: samsung: exynos4412: decrease memory to account for unusable region
        arm64: dts: qcom: sm8250-xiaomi-elish: set rotation
        arm64: dts: qcom: sm8650: Fix SPMI channels size
        arm64: dts: qcom: sm8550: Fix SPMI channels size
        arm64: dts: rockchip: Fix name for UART pin header on qnap-ts433
        arm: dts: marvell: clearfog-gtr-l8: align port numbers with enclosure
        arm: dts: marvell: clearfog-gtr-l8: add support for second sfp connector
        dt-bindings: soc: renesas: renesas-soc: Add pattern for gray-hawk
        dtc: Enable dtc interrupt_provider check
        arm64: dts: st: add video encoder support to stm32mp255
        arm64: dts: st: add video decoder support to stm32mp255
        ARM: dts: stm32: enable crypto accelerator on stm32mp135f-dk
        ARM: dts: stm32: enable CRC on stm32mp135f-dk
        ARM: dts: stm32: add CRC on stm32mp131
        ARM: dts: add stm32f769-disco-mb1166-reva09
        ARM: dts: stm32: add display support on stm32f769-disco
        ARM: dts: stm32: rename mmc_vcard to vcc-3v3 on stm32f769-disco
        ARM: dts: stm32: add DSI support on stm32f769
        ...
    • Merge tag 'm68k-for-v6.9-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k · 508f34f2
      Linus Torvalds authored
      Pull m68k updates from Geert Uytterhoeven:
      
       - Make the Zorro bus type constant
      
       - defconfig updates
      
      * tag 'm68k-for-v6.9-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
        m68k: defconfig: Update defconfigs for v6.8-rc1
        zorro: Make zorro_bus_type const
    • Merge tag 's390-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 691632f0
      Linus Torvalds authored
      Pull s390 updates from Heiko Carstens:
      
       - Various virtual vs physical address usage fixes
      
       - Fix error handling in Processor Activity Instrumentation device
         driver, and export number of counters with a sysfs file
      
       - Allow for multiple events when Processor Activity Instrumentation
         counters are monitored in system wide sampling
      
       - Change multiplier and shift values of the Time-of-Day clock source to
         improve steering precision
      
       - Remove a couple of unneeded GFP_DMA flags from allocations
      
       - Disable mmap alignment if randomize_va_space is also disabled, to
         avoid a too small heap
      
       - Various changes to allow s390 to be compiled with LLVM=1, since
         ld.lld and llvm-objcopy will have proper s390 support with clang 19
      
       - Add __uninitialized macro to Compiler Attributes. This is helpful
         with s390's FPU code where some users have up to 520 byte stack
         frames. Clearing such stack frames (if INIT_STACK_ALL_PATTERN or
         INIT_STACK_ALL_ZERO is enabled) before they are used contradicts the
         intention (performance improvement) of such code sections.
      
       - Convert switch_to() to an out-of-line function, and use the generic
         switch_to header file
      
       - Replace the usage of s390's debug feature with pr_debug() calls
         within the zcrypt device driver
      
       - Improve hotplug support of the Adjunct Processor device driver
      
       - Improve retry handling in the zcrypt device driver
      
       - Various changes to the in-kernel FPU code:
      
           - Make in-kernel FPU sections preemptible
      
           - Convert various larger inline assemblies and assembler files to
             C, mainly by using single instruction inline assemblies. This
             increases readability, but also makes it easier to add
             proper instrumentation hooks
      
           - Cleanup of the header files
      
       - Provide fast variants of csum_partial() and
         csum_partial_copy_nocheck() based on vector instructions
      
       - Introduce and use a lock to synchronize accesses to zpci device data
         structures to avoid inconsistent states caused by concurrent accesses
      
       - Compile the kernel without -fPIE. This addresses the following
         problems if the kernel is compiled with -fPIE:
      
           - It uses dynamic symbols (.dynsym), for which the linker refuses
             to allow more than 64k sections. This can break features which
             use '-ffunction-sections' and '-fdata-sections', including
             kpatch-build and function granular KASLR
      
           - It unnecessarily uses GOT relocations, adding an extra layer of
             indirection for many memory accesses
      
       - Fix shared_cpu_list for CPU private L2 caches, which incorrectly were
         reported as globally shared
      
      * tag 's390-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (117 commits)
        s390/tools: handle rela R_390_GOTPCDBL/R_390_GOTOFF64
        s390/cache: prevent rebuild of shared_cpu_list
        s390/crypto: remove retry loop with sleep from PAES pkey invocation
        s390/pkey: improve pkey retry behavior
        s390/zcrypt: improve zcrypt retry behavior
        s390/zcrypt: introduce retries on in-kernel send CPRB functions
        s390/ap: introduce mutex to lock the AP bus scan
        s390/ap: rework ap_scan_bus() to return true on config change
        s390/ap: clarify AP scan bus related functions and variables
        s390/ap: rearm APQNs bindings complete completion
        s390/configs: increase number of LOCKDEP_BITS
        s390/vfio-ap: handle hardware checkstop state on queue reset operation
        s390/pai: change sampling event assignment for PMU device driver
        s390/boot: fix minor comment style damages
        s390/boot: do not check for zero-termination relocation entry
        s390/boot: make type of __vmlinux_relocs_64_start|end consistent
        s390/boot: sanitize kaslr_adjust_relocs() function prototype
        s390/boot: simplify GOT handling
        s390: vmlinux.lds.S: fix .got.plt assertion
        s390/boot: workaround current 'llvm-objdump -t -j ...' behavior
        ...
    • Merge tag 'x86-boot-2024-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b29f3771
      Linus Torvalds authored
      Pull x86 boot updates from Ingo Molnar:
      
       - Continuing work by Ard Biesheuvel to improve the x86 early startup
         code, with the long-term goal to make it position independent:
      
            - Get rid of early accesses to global objects, either by moving
              them to the stack, deferring the access until later, or dropping
              the globals entirely
      
            - Move all code that runs early via the 1:1 mapping into
              .head.text, and move code that does not out of it, so that build
              time checks can be added later to ensure that no inadvertent
              absolute references were emitted into code that does not
              tolerate them
      
            - Remove fixup_pointer() and occurrences of __pa_symbol(), which
              rely on the compiler emitting absolute references, which is not
              guaranteed
      
       - Improve the early console code
      
       - Add early console message about ignored NMIs, so that users are at
         least warned about their existence - even if we cannot do anything
         about them
      
       - Improve the kexec code's kernel load address handling
      
       - Enable more X86S (simplified x86) bits
      
       - Simplify early boot GDT handling
      
       - Micro-optimize the boot code a bit
      
       - Misc cleanups
      
      * tag 'x86-boot-2024-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
        x86/sev: Move early startup code into .head.text section
        x86/sme: Move early SME kernel encryption handling into .head.text
        x86/boot: Move mem_encrypt= parsing to the decompressor
        efi/libstub: Add generic support for parsing mem_encrypt=
        x86/startup_64: Simplify virtual switch on primary boot
        x86/startup_64: Simplify calculation of initial page table address
        x86/startup_64: Defer assignment of 5-level paging global variables
        x86/startup_64: Simplify CR4 handling in startup code
        x86/boot: Use 32-bit XOR to clear registers
        efi/x86: Set the PE/COFF header's NX compat flag unconditionally
        x86/boot/64: Load the final kernel GDT during early boot directly, remove startup_gdt[]
        x86/boot/64: Use RIP_REL_REF() to access early_top_pgt[]
        x86/boot/64: Use RIP_REL_REF() to access early page tables
        x86/boot/64: Use RIP_REL_REF() to access '__supported_pte_mask'
        x86/boot/64: Use RIP_REL_REF() to access early_dynamic_pgts[]
        x86/boot/64: Use RIP_REL_REF() to assign 'phys_base'
        x86/boot/64: Simplify global variable accesses in GDT/IDT programming
        x86/trampoline: Bypass compat mode in trampoline_start64() if not needed
        kexec: Allocate kernel above bzImage's pref_address
        x86/boot: Add a message about ignored early NMIs
        ...
    • Merge tag 'x86-apic-2024-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e66c58f7
      Linus Torvalds authored
      Pull x86 APIC fixup from Dave Hansen:
       "Revert VERW fixed addressing patch.
      
        The reverted commit is not x86/apic material and was cruft left over
        from a merge.
      
        I believe the sequence of events went something like this:
      
         - The commit in question was added to x86/urgent
      
         - x86/urgent was merged into x86/apic to resolve a conflict
      
         - The commit was zapped from x86/urgent, but *not* from x86/apic
      
         - x86/apic got pulled (yesterday)
      
        I think we need to be a bit more vigilant when zapping things to make
        sure none of the other branches are depending on the zapped material"
      
      * tag 'x86-apic-2024-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        Revert "x86/bugs: Use fixed addressing for VERW operand"
    • Merge tag 'rfds-for-linus-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0e33cf95
      Linus Torvalds authored
      Pull x86 RFDS mitigation from Dave Hansen:
       "RFDS is a CPU vulnerability that may allow a malicious userspace to
        infer stale register values from kernel space. Kernel registers can
        have all kinds of secrets in them so the mitigation is basically to
        wait until the kernel is about to return to userspace and has user
        values in the registers. At that point there is little chance of
        kernel secrets ending up in the registers and the microarchitectural
        state can be cleared.
      
        This leverages some recent robustness fixes for the existing MDS
        vulnerability. Both MDS and RFDS use the VERW instruction for
        mitigation"
      
      * tag 'rfds-for-linus-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests
        x86/rfds: Mitigate Register File Data Sampling (RFDS)
        Documentation/hw-vuln: Add documentation for RFDS
        x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
    • Revert "x86/bugs: Use fixed addressing for VERW operand" · 532a0c57
      Dave Hansen authored
      This reverts commit 8009479e.
      
      It was originally in x86/urgent, but was deemed wrong so got zapped.
      But in the meantime, x86/urgent had been merged into x86/apic to
      resolve a conflict.  I didn't notice the merge so didn't zap it
      from x86/apic and it managed to make it up with the x86/apic
      material.
      
      The reverted commit is known to cause some KASAN problems.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    • Merge branch 'linus' into x86/boot, to resolve conflict · 2e2bc42c
      Ingo Molnar authored
      There's a new conflict with Linus's upstream tree, because
      in the following merge conflict resolution in <asm/coco.h>:
      
        38b334fc Merge tag 'x86_sev_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
      
      Linus has resolved the conflicting placement of 'cc_mask' better
      than the original commit:
      
        1c811d40 x86/sev: Fix position dependent variable references in startup code
      
      ... which was also done by an internal merge resolution:
      
        2e5fc478 Merge branch 'x86/sev' into x86/boot, to resolve conflicts and to pick up dependent tree
      
      But Linus is right in 38b334fc, the 'cc_mask' declaration is sufficient
      within the #ifdef CONFIG_ARCH_HAS_CC_PLATFORM block.
      
      So instead of forcing Linus to do the same resolution again, merge in Linus's
      tree and follow his conflict resolution.
      
       Conflicts:
      	arch/x86/include/asm/coco.h
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Merge tag 'x86_tdx_for_6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 855684c7
      Linus Torvalds authored
      Pull x86 tdx update from Dave Hansen:
      
       - Fix sparse warning from TDX use of movdir64b()
      
      * tag 'x86_tdx_for_6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/asm: Remove the __iomem annotation of movdir64b()'s dst argument
    • Merge tag 'x86_mm_for_6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 555b6841
      Linus Torvalds authored
      Pull x86 mm updates from Dave Hansen:
      
       - Add a warning when memory encryption conversions fail. These
         operations require VMM cooperation, even in CoCo environments where
         the VMM is untrusted. While it's _possible_ that memory pressure
         could trigger the new warning, the odds are that a guest would only
         see this from an attacking VMM.
      
       - Simplify page fault code by re-enabling interrupts unconditionally
      
       - Avoid truncation issues when pfns are passed in to pfn_to_kaddr()
         with small (<64-bit) types.
      
      * tag 'x86_mm_for_6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mm/cpa: Warn for set_memory_XXcrypted() VMM fails
        x86/mm: Get rid of conditional IF flag handling in page fault path
        x86/mm: Ensure input to pfn_to_kaddr() is treated as a 64-bit type
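The pfn_to_kaddr() truncation issue mentioned in this pull comes down to integer width: if a page frame number is held in a type narrower than 64 bits, shifting it left by the page shift can discard the upper bits before the result is widened. The sketch below shows that effect in user-space C. It assumes 4 KiB pages and a 32-bit `unsigned int`; the function names pfn_to_phys_narrow() and pfn_to_phys_wide() are hypothetical and are not the kernel's macros.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumed 4 KiB pages */

/* The sketch relies on uint32_t arithmetic staying 32 bits wide. */
static_assert(sizeof(unsigned int) == 4, "sketch assumes a 32-bit int");

/* Problem pattern: the shift is evaluated in the 32-bit argument type,
 * so any bits that move past bit 31 are lost before the widening. */
static inline uint64_t pfn_to_phys_narrow(uint32_t pfn)
{
    return pfn << PAGE_SHIFT;
}

/* Safer pattern: widen to 64 bits first, then shift. */
static inline uint64_t pfn_to_phys_wide(uint32_t pfn)
{
    return (uint64_t)pfn << PAGE_SHIFT;
}
```

With a page frame number of 0x100000 (the frame at the 4 GiB boundary), the narrow variant yields 0 while the wide variant yields 0x100000000. For frames below that boundary the two agree, which is why this kind of bug can go unnoticed on machines with little memory.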
    • Merge tag 'x86-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 685d9821
      Linus Torvalds authored
      Pull core x86 updates from Ingo Molnar:
      
       - The biggest change is the rework of the percpu code, to support the
         'Named Address Spaces' GCC feature, by Uros Bizjak:
      
            - This allows C code to access GS and FS segment relative memory
              via variables declared with such attributes, which allows the
              compiler to better optimize those accesses than the previous
              inline assembly code.
      
            - The series also includes a number of micro-optimizations for
              various percpu access methods, plus a number of cleanups of %gs
              accesses in assembly code.
      
            - These changes have been exposed to linux-next testing for the
              last ~5 months, with no known regressions in this area.
      
       - Fix/clean up __switch_to()'s broken but accidentally working handling
         of FPU switching - which also generates better code
      
       - Propagate more RIP-relative addressing in assembly code, to generate
         slightly better code
      
       - Rework the CPU mitigations Kconfig space to be less idiosyncratic, to
         make it easier for distros to follow & maintain these options
      
       - Rework the x86 idle code to cure RCU violations and to clean up the
         logic
      
       - Clean up the vDSO Makefile logic
      
       - Misc cleanups and fixes
      
      * tag 'x86-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
        x86/idle: Select idle routine only once
        x86/idle: Let prefer_mwait_c1_over_halt() return bool
        x86/idle: Cleanup idle_setup()
        x86/idle: Clean up idle selection
        x86/idle: Sanitize X86_BUG_AMD_E400 handling
        sched/idle: Conditionally handle tick broadcast in default_idle_call()
        x86: Increase brk randomness entropy for 64-bit systems
        x86/vdso: Move vDSO to mmap region
        x86/vdso/kbuild: Group non-standard build attributes and primary object file rules together
        x86/vdso: Fix rethunk patching for vdso-image-{32,64}.o
        x86/retpoline: Ensure default return thunk isn't used at runtime
        x86/vdso: Use CONFIG_COMPAT_32 to specify vdso32
        x86/vdso: Use $(addprefix ) instead of $(foreach )
        x86/vdso: Simplify obj-y addition
        x86/vdso: Consolidate targets and clean-files
        x86/bugs: Rename CONFIG_RETHUNK              => CONFIG_MITIGATION_RETHUNK
        x86/bugs: Rename CONFIG_CPU_SRSO             => CONFIG_MITIGATION_SRSO
        x86/bugs: Rename CONFIG_CPU_IBRS_ENTRY       => CONFIG_MITIGATION_IBRS_ENTRY
        x86/bugs: Rename CONFIG_CPU_UNRET_ENTRY      => CONFIG_MITIGATION_UNRET_ENTRY
        x86/bugs: Rename CONFIG_SLS                  => CONFIG_MITIGATION_SLS
        ...
    • Merge tag 'x86-cleanups-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · fcc19657
      Linus Torvalds authored
      Pull x86 cleanups from Ingo Molnar:
       "Misc cleanups, including a large series from Thomas Gleixner to cure
        sparse warnings"
      
      * tag 'x86-cleanups-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/nmi: Drop unused declaration of proc_nmi_enabled()
        x86/callthunks: Use EXPORT_PER_CPU_SYMBOL_GPL() for per CPU variables
        x86/cpu: Provide a declaration for itlb_multihit_kvm_mitigation
        x86/cpu: Use EXPORT_PER_CPU_SYMBOL_GPL() for x86_spec_ctrl_current
        x86/uaccess: Add missing __force to casts in __access_ok() and valid_user_address()
        x86/percpu: Cure per CPU madness on UP
        smp: Consolidate smp_prepare_boot_cpu()
        x86/msr: Add missing __percpu annotations
        x86/msr: Prepare for including <linux/percpu.h> into <asm/msr.h>
        perf/x86/amd/uncore: Fix __percpu annotation
        x86/nmi: Remove an unnecessary IS_ENABLED(CONFIG_SMP)
        x86/apm_32: Remove dead function apm_get_battery_status()
        x86/insn-eval: Fix function param name in get_eff_addr_sib()
      fcc19657
    • Merge tag 'x86-build-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d69ad12c
      Linus Torvalds authored
      Pull x86 build updates from Ingo Molnar:
      
       - Reduce <asm/bootparam.h> dependencies
      
       - Simplify <asm/efi.h>
      
       - Unify *_setup_data definitions into <asm/setup_data.h>
      
       - Reduce the size of <asm/bootparam.h>
      
      * tag 'x86-build-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Do not include <asm/bootparam.h> in several files
        x86/efi: Implement arch_ima_efi_boot_mode() in source file
        x86/setup: Move internal setup_data structures into setup_data.h
        x86/setup: Move UAPI setup structures into setup_data.h
      d69ad12c
    • Merge tag 'x86-asm-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 73f0d1d7
      Linus Torvalds authored
      Pull x86 asm updates from Ingo Molnar:
       "Two changes to simplify the x86 decoder logic a bit"
      
      * tag 'x86-asm-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/insn: Directly assign x86_64 state in insn_init()
        x86/insn: Remove superfluous checks from instruction decoding routines
      73f0d1d7
    • Merge tag 'sched-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 89c572e2
      Linus Torvalds authored
      Pull scheduler updates from Ingo Molnar:
      
       - Fix inconsistency in misfit task load-balancing
      
       - Fix CPU isolation bugs in the task-wakeup logic
      
       - Rework and unify the sched_use_asym_prio() and sched_asym_prefer()
         logic
      
       - Clean up and simplify ->avg_* accesses
      
       - Misc cleanups and fixes
      
      * tag 'sched-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/topology: Rename SD_SHARE_PKG_RESOURCES to SD_SHARE_LLC
        sched/fair: Check the SD_ASYM_PACKING flag in sched_use_asym_prio()
        sched/fair: Rework sched_use_asym_prio() and sched_asym_prefer()
        sched/fair: Remove unused parameter from sched_asym()
        sched/topology: Remove duplicate descriptions from TOPOLOGY_SD_FLAGS
        sched/fair: Simplify the update_sd_pick_busiest() logic
        sched/fair: Do strict inequality check for busiest misfit task group
        sched/fair: Remove unnecessary goto in update_sd_lb_stats()
        sched/fair: Take the scheduling domain into account in select_idle_core()
        sched/fair: Take the scheduling domain into account in select_idle_smt()
        sched/fair: Add READ_ONCE() and use existing helper function to access ->avg_irq
        sched/fair: Use existing helper functions to access ->avg_rt and ->avg_dl
        sched/core: Simplify code by removing duplicate #ifdefs
      89c572e2
    • Merge tag 'locking-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a5b1a017
      Linus Torvalds authored
      Pull locking updates from Ingo Molnar:
      
       - Micro-optimize local_xchg() and the rtmutex code on x86
      
       - Fix percpu-rwsem contention tracepoints
      
       - Simplify debugging Kconfig dependencies
      
       - Update/clarify the documentation of atomic primitives
      
       - Misc cleanups
      
      * tag 'locking-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/rtmutex: Use try_cmpxchg_relaxed() in mark_rt_mutex_waiters()
        locking/x86: Implement local_xchg() using CMPXCHG without the LOCK prefix
        locking/percpu-rwsem: Trigger contention tracepoints only if contended
        locking/rwsem: Make DEBUG_RWSEMS and PREEMPT_RT mutually exclusive
        locking/rwsem: Clarify that RWSEM_READER_OWNED is just a hint
        locking/mutex: Simplify <linux/mutex.h>
        locking/qspinlock: Fix 'wait_early' set but not used warning
        locking/atomic: scripts: Clarify ordering of conditional atomics
      a5b1a017
    • Merge tag 'edac_updates_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · b0402403
      Linus Torvalds authored
      Pull EDAC updates from Borislav Petkov:
      
       - Add a FRU (Field Replaceable Unit) memory poison manager which
         collects and manages previously encountered hw errors in order to
         save them to persistent storage across reboots. Previously recorded
         errors are "replayed" upon reboot in order to poison memory which has
         caused said errors in the past.
      
         The main use case is stacked, on-chip memory which cannot simply be
         replaced so poisoning faulty areas of it and thus making them
         inaccessible is the only strategy to prolong its lifetime.
      
       - Add an AMD address translation library glue which converts the
         reported addresses of hw errors into system physical addresses in
         order to be used by other subsystems like memory failure, for
         example. Add support for MI300 accelerators to that library.
      
       - igen6: Add support for Alder Lake-N SoC
      
       - i10nm: Add Grand Ridge support
      
       - The usual fixlets and cleanups
      
      * tag 'edac_updates_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
        EDAC/versal: Convert to platform remove callback returning void
        RAS/AMD/FMPM: Fix off by one when unwinding on error
        RAS/AMD/FMPM: Add debugfs interface to print record entries
        RAS/AMD/FMPM: Save SPA values
        RAS: Export helper to get ras_debugfs_dir
        RAS/AMD/ATL: Fix bit overflow in denorm_addr_df4_np2()
        RAS: Introduce a FRU memory poison manager
        RAS/AMD/ATL: Add MI300 row retirement support
        Documentation: Move RAS section to admin-guide
        EDAC/versal: Make the bit position of injected errors configurable
        EDAC/i10nm: Add Intel Grand Ridge micro-server support
        EDAC/igen6: Add one more Intel Alder Lake-N SoC support
        RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support
        RAS/AMD/ATL: Fix array overflow in get_logical_coh_st_fabric_id_mi300()
        RAS/AMD/ATL: Add MI300 support
        Documentation: RAS: Add index and address translation section
        EDAC/amd64: Use new AMD Address Translation Library
        RAS: Introduce AMD Address Translation Library
        EDAC/synopsys: Convert to devm_platform_ioremap_resource()
      b0402403
    • Merge tag 'x86_misc_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1f75619a
      Linus Torvalds authored
      Pull misc x86 fixes from Borislav Petkov:
      
       - Fix a wrong check in the function reporting whether a CPU executes
         (or not) a NMI handler
      
       - Ratelimit unknown NMI messages in order to not potentially slow down
         the machine
      
       - Other fixlets
      
      * tag 'x86_misc_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/nmi: Fix the inverse "in NMI handler" check
        Documentation/maintainer-tip: Add C++ tail comments exception
        Documentation/maintainer-tip: Add Closes tag
        x86/nmi: Rate limit unknown NMI messages
        Documentation/kernel-parameters: Add spec_rstack_overflow to mitigations=off
      1f75619a
    • Merge tag 'x86_sev_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 38b334fc
      Linus Torvalds authored
      Pull x86 SEV updates from Borislav Petkov:
      
       - Add the x86 part of the SEV-SNP host support.
      
         This will allow the kernel to be used as a KVM hypervisor capable of
         running SNP (Secure Nested Paging) guests. Roughly speaking, SEV-SNP
         is the ultimate goal of the AMD confidential computing side,
         providing the most comprehensive confidential computing environment
         to date.
      
         This is the x86 part and there is a KVM part which did not get ready
         in time for the merge window so the latter will be forthcoming in the
         next cycle.
      
       - Rework the early code's position-dependent SEV variable references in
         order to allow building the kernel with clang and -fPIE/-fPIC and
         -mcmodel=kernel
      
       - The usual set of fixes, cleanups and improvements all over the place
      
      * tag 'x86_sev_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
        x86/sev: Disable KMSAN for memory encryption TUs
        x86/sev: Dump SEV_STATUS
        crypto: ccp - Have it depend on AMD_IOMMU
        iommu/amd: Fix failure return from snp_lookup_rmpentry()
        x86/sev: Fix position dependent variable references in startup code
        crypto: ccp: Make snp_range_list static
        x86/Kconfig: Remove CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
        Documentation: virt: Fix up pre-formatted text block for SEV ioctls
        crypto: ccp: Add the SNP_SET_CONFIG command
        crypto: ccp: Add the SNP_COMMIT command
        crypto: ccp: Add the SNP_PLATFORM_STATUS command
        x86/cpufeatures: Enable/unmask SEV-SNP CPU feature
        KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
        crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump
        iommu/amd: Clean up RMP entries for IOMMU pages during SNP shutdown
        crypto: ccp: Handle legacy SEV commands when SNP is enabled
        crypto: ccp: Handle non-volatile INIT_EX data when SNP is enabled
        crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
        x86/sev: Introduce an SNP leaked pages list
        crypto: ccp: Provide an API to issue SEV and SNP commands
        ...
      38b334fc
    • Merge tag 'x86_cache_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2edfd104
      Linus Torvalds authored
      Pull resource control updates from Borislav Petkov:
      
       - Rework different aspects of the resctrl code like adding
         arch-specific accessors and splitting the locking, in order to
         accommodate ARM's MPAM implementation of hw resource control and be
         able to use the same filesystem control interface as on x86. Work
         by James Morse
      
       - Improve the memory bandwidth throttling heuristic to handle workloads
         with irregular load levels which end up penalized unnecessarily
      
       - Use CPUID to detect the memory bandwidth enforcement limit on AMD
      
       - The usual set of fixes
      
      * tag 'x86_cache_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
        x86/resctrl: Remove lockdep annotation that triggers false positive
        x86/resctrl: Separate arch and fs resctrl locks
        x86/resctrl: Move domain helper migration into resctrl_offline_cpu()
        x86/resctrl: Add CPU offline callback for resctrl work
        x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but CPU
        x86/resctrl: Add CPU online callback for resctrl work
        x86/resctrl: Add helpers for system wide mon/alloc capable
        x86/resctrl: Make rdt_enable_key the arch's decision to switch
        x86/resctrl: Move alloc/mon static keys into helpers
        x86/resctrl: Make resctrl_mounted checks explicit
        x86/resctrl: Allow arch to allocate memory needed in resctrl_arch_rmid_read()
        x86/resctrl: Allow resctrl_arch_rmid_read() to sleep
        x86/resctrl: Queue mon_event_read() instead of sending an IPI
        x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow
        x86/resctrl: Move CLOSID/RMID matching and setting to use helpers
        x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid
        x86/resctrl: Use __set_bit()/__clear_bit() instead of open coding
        x86/resctrl: Track the number of dirty RMID a CLOSID has
        x86/resctrl: Allow RMID allocation to be scoped by CLOSID
        x86/resctrl: Access per-rmid structures by index
        ...
      2edfd104
    • Merge tag 'x86_mtrr_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · bfdb395a
      Linus Torvalds authored
      Pull x86 MTRR update from Borislav Petkov:
      
       - Relax the PAT MSR programming which was unnecessarily using the MTRR
         programming protocol of disabling the cache around the changes. The
         reason behind this is the current algorithm triggering a #VE
         exception for TDX guests and unnecessarily complicating things
      
      * tag 'x86_mtrr_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/pat: Simplify the PAT programming protocol
      bfdb395a
    • Merge tag 'x86_cpu_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 742582ac
      Linus Torvalds authored
      Pull x86 cpu update from Borislav Petkov:
      
       - Have AMD Zen common init code run on all families from Zen1 onwards
         in order to save some future enablement effort
      
      * tag 'x86_cpu_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/CPU/AMD: Do the common init on future Zens too
      742582ac
    • Merge tag 'ras_core_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d8941ce5
      Linus Torvalds authored
      Pull RAS fixlet from Borislav Petkov:
      
       - Constify yet another static struct bus_type instance now that the
         driver core can handle that
      
      * tag 'ras_core_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mce: Make mce_subsys const
      d8941ce5
    • Revert "dm: use queue_limits_set" · bff4b746
      Linus Torvalds authored
      This reverts commit 8e0ef412.
      
      It's broken, and causes the boot to fail on encrypted volumes.
      Reported-and-bisected-by: Johannes Weiner <hannes@cmpxchg.org>
      Link: https://lore.kernel.org/all/20240311235023.GA1205@cmpxchg.org/
      Acked-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bff4b746
  2. 11 Mar, 2024 10 commits
    • Merge tag 'x86-entry-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 86833aec
      Linus Torvalds authored
      Pull x86 entry update from Thomas Gleixner:
       "A single update for the x86 entry code:
      
        The current CR3 handling for kernel page table isolation in the
        paranoid return paths which are relevant for #NMI, #MCE, #VC, #DB and
        #DF is unconditionally writing CR3 with the value retrieved on
        exception entry.
      
        In the vast majority of cases when returning to the kernel this is a
        pointless exercise because CR3 was not modified on exception entry.
        The only situation where this is necessary is when the exception
        interrupts an entry from user before switching to kernel CR3 or
        interrupts an exit to user after switching back to user CR3.
      
        As CR3 writes can be expensive on some systems this becomes measurable
        overhead with high frequency #NMIs such as perf.
      
        Avoid this overhead by checking the CR3 value, which was saved on
        entry, and write it back to CR3 only when it is a user CR3"
      
      * tag 'x86-entry-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/entry: Avoid redundant CR3 write on paranoid returns
      86833aec
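
The check described in the x86 entry pull above can be modelled in a few lines. This is a minimal Python sketch, not the kernel's assembly: the `Cpu` class, every function name, and the use of bit 12 as the marker that distinguishes a user CR3 value from a kernel one are assumptions made purely for illustration.

```python
# Toy model of the paranoid entry/return CR3 handling.
# ASSUMPTION: a single bit marks a "user" CR3 value; bit 12 is chosen here
# only for illustration and is not taken from the text above.
USER_PGTABLE_MASK = 1 << 12


class Cpu:
    """Hypothetical CPU model that counts CR3 writes."""

    def __init__(self, cr3):
        self.cr3 = cr3
        self.cr3_writes = 0

    def write_cr3(self, value):
        self.cr3 = value
        self.cr3_writes += 1


def is_user_cr3(value):
    return bool(value & USER_PGTABLE_MASK)


def paranoid_entry(cpu):
    """Save CR3 and switch to the kernel CR3 only if a user CR3 is active."""
    saved = cpu.cr3
    if is_user_cr3(saved):
        cpu.write_cr3(saved & ~USER_PGTABLE_MASK)
    return saved


def paranoid_exit_unconditional(cpu, saved):
    """Old behaviour: always write the saved value back."""
    cpu.write_cr3(saved)


def paranoid_exit_checked(cpu, saved):
    """New behaviour: write back only when the saved value is a user CR3."""
    if is_user_cr3(saved):
        cpu.write_cr3(saved)


def count_cr3_writes(initial_cr3, exit_fn):
    """Run one entry/exit pair and report how many CR3 writes happened."""
    cpu = Cpu(initial_cr3)
    saved = paranoid_entry(cpu)
    exit_fn(cpu, saved)
    return cpu.cr3_writes
```

In this model an exception that interrupts kernel code costs one CR3 write with the unconditional return path and none with the checked one, which is the saving the pull message attributes to high-frequency #NMIs.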
    • Merge tag 'x86-fred-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 720c8579
      Linus Torvalds authored
      Pull x86 FRED support from Thomas Gleixner:
       "Support for x86 Fast Return and Event Delivery (FRED).
      
        FRED is a replacement for IDT event delivery on x86 and addresses most
        of the technical nightmares which IDT exposes:
      
         1) Exception cause registers like CR2 need to be manually preserved
            in nested exception scenarios.
      
         2) Hardware interrupt stack switching is suboptimal for nested
            exceptions as the interrupt stack mechanism rewinds the stack on
            each entry which requires a massive effort in the low level entry
            of #NMI code to handle this.
      
         3) No hardware distinction between entry from kernel or from user
            which makes establishing kernel context more complex than it needs
            to be especially for unconditionally nestable exceptions like NMI.
      
         4) NMI nesting caused by IRET unconditionally reenabling NMIs, which
            is a problem when the perf NMI takes a fault when collecting a
            stack trace.
      
         5) Partial restore of ESP when returning to a 16-bit segment
      
         6) Limitation of the vector space which can cause vector exhaustion
            on large systems.
      
         7) Inability to differentiate NMI sources
      
        FRED addresses these shortcomings by:
      
         1) An extended exception stack frame which the CPU uses to save
            exception cause registers. This ensures that the meta information
            for each exception is preserved on stack and avoids the extra
            complexity of preserving it in software.
      
         2) Hardware interrupt stack switching is non-rewinding if a nested
            exception uses the current interrupt stack.
      
         3) The entry points for kernel and user context are separate and GS
            BASE handling which is required to establish kernel context for
            per CPU variable access is done in hardware.
      
         4) NMIs are now nesting protected. They are only reenabled on the
            return from NMI.
      
         5) FRED guarantees full restore of ESP
      
         6) FRED does not put a limitation on the vector space by design
            because it uses central entry points for kernel and user space
            and the CPU stores the entry type (exception, trap, interrupt,
            syscall) on the entry stack along with the vector number. The
            entry code has to demultiplex this information, but this removes
            the vector space restriction.
      
            The first hardware implementations will still have the current
            restricted vector space because lifting this limitation requires
            further changes to the local APIC.
      
         7) FRED stores the vector number and meta information on stack which
            allows having more than one NMI vector in future hardware when the
            required local APIC changes are in place.
      
        The series implements the initial FRED support by:
      
         - Reworking the existing entry and IDT handling infrastructure to
           accommodate the alternative entry mechanism.

         - Expanding the stack frame to accommodate the extra 16 bytes FRED
           requires to store context and meta information
      
         - Providing FRED specific C entry points for events which have
           information pushed to the extended stack frame, e.g. #PF and #DB.
      
         - Providing FRED specific C entry points for #NMI and #MCE
      
         - Implementing the FRED specific ASM entry points and the C code to
           demultiplex the events
      
         - Providing detection and initialization mechanisms and the necessary
           tweaks in context switching, GS BASE handling etc.
      
        The FRED integration aims for maximum code reuse vs the existing IDT
        implementation to the extent possible and the deviations in hot paths
        like context switching are handled with alternatives to minimize the
        impact. The low level entry and exit paths are separate due to the
        extended stack frame and the hardware based GS BASE switching and
        therefore have no impact on IDT based systems.
      
        It has been extensively tested on existing systems and on the FRED
        simulation and as of now there are no outstanding problems"
      
      * tag 'x86-fred-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
        x86/fred: Fix init_task thread stack pointer initialization
        MAINTAINERS: Add a maintainer entry for FRED
        x86/fred: Fix a build warning with allmodconfig due to 'inline' failing to inline properly
        x86/fred: Invoke FRED initialization code to enable FRED
        x86/fred: Add FRED initialization functions
        x86/syscall: Split IDT syscall setup code into idt_syscall_init()
        KVM: VMX: Call fred_entry_from_kvm() for IRQ/NMI handling
        x86/entry: Add fred_entry_from_kvm() for VMX to handle IRQ/NMI
        x86/entry/calling: Allow PUSH_AND_CLEAR_REGS being used beyond actual entry code
        x86/fred: Fixup fault on ERETU by jumping to fred_entrypoint_user
        x86/fred: Let ret_from_fork_asm() jmp to asm_fred_exit_user when FRED is enabled
        x86/traps: Add sysvec_install() to install a system interrupt handler
        x86/fred: FRED entry/exit and dispatch code
        x86/fred: Add a machine check entry stub for FRED
        x86/fred: Add a NMI entry stub for FRED
        x86/fred: Add a debug fault entry stub for FRED
        x86/idtentry: Incorporate definitions/declarations of the FRED entries
        x86/fred: Make exc_page_fault() work for FRED
        x86/fred: Allow single-step trap and NMI when starting a new task
        x86/fred: No ESPFIX needed when FRED is enabled
        ...
      720c8579
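
The demultiplexing step mentioned in the FRED pull above (the CPU records an entry type and a vector number, and the entry code dispatches on them) can be sketched as a two-level lookup. This is an illustrative Python model only: `EventType`, `EventInfo`, `dispatch`, and the handler table are hypothetical names, and the real entry code is written in assembly and C. Vector 14 is the architectural page-fault vector; vector 1000 is an invented value used to show that the lookup itself is not bound to a 256-entry table.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    # The four entry types named in the pull message.
    EXCEPTION = "exception"
    TRAP = "trap"
    INTERRUPT = "interrupt"
    SYSCALL = "syscall"


@dataclass(frozen=True)
class EventInfo:
    """Hypothetical stand-in for the type/vector data saved on the entry stack."""
    event_type: EventType
    vector: int


def unhandled(info):
    return f"unhandled {info.event_type.value} vector {info.vector}"


def dispatch(info, handlers, default=unhandled):
    """Pick a handler by entry type first, then by vector number."""
    per_type = handlers.get(info.event_type, {})
    handler = per_type.get(info.vector, default)
    return handler(info)


# Example handler table (illustrative only).
HANDLERS = {
    EventType.EXCEPTION: {14: lambda info: "page_fault"},
    EventType.INTERRUPT: {1000: lambda info: "device_irq_1000"},
}
```

Because the type and vector arrive as data rather than being implied by which IDT slot was taken, a single entry point per privilege level is sufficient in this model, matching the design point made in item 6 of the message.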
    • Merge tag 'x86-apic-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ca7e9177
      Linus Torvalds authored
      Pull x86 APIC updates from Thomas Gleixner:
       "Rework of APIC enumeration and topology evaluation.
      
        The current implementation has a couple of shortcomings:
      
         - It fails to handle hybrid systems correctly.
      
         - The APIC registration code which handles CPU number assignments is
           in the middle of the APIC code and detached from the topology
           evaluation.
      
         - The various mechanisms which enumerate APICs, ACPI, MPPARSE and
           guest specific ones, tweak global variables as they see fit or in
           case of XENPV just hack around the generic mechanisms completely.
      
         - The CPUID topology evaluation code is sprinkled all over the vendor
           code and reevaluates global variables on every hotplug operation.
      
         - There is no way to analyze topology on the boot CPU before bringing
           up the APs. This causes problems for infrastructure like PERF which
           needs to size certain aspects upfront or could be simplified if
           that would be possible.
      
         - The APIC admission and CPU number association logic is
           incomprehensible and overly complex and needs to be kept around
           after boot instead of completing this right after the APIC
           enumeration.
      
        This update addresses these shortcomings with the following changes:
      
         - Rework the CPUID evaluation code so it is common for all vendors
           and provides information about the APIC ID segments in a uniform
           way independent of the number of segments (Thread, Core, Module,
           ..., Die, Package) so that this information can be computed instead
           of rewriting global variables of dubious value over and over.
      
         - A few cleanups and simplifications of the APIC, IO/APIC and related
           interfaces to prepare for the topology evaluation changes.
      
         - Separation of the parser stages so the early evaluation which tries
           to find the APIC address can be separately overridden from the late
           evaluation which enumerates and registers the local APIC as further
           preparation for sanitizing the topology evaluation.
      
         - A new registration and admission logic which
      
             - encapsulates the inner workings so that parsers and guest logic
               can no longer fiddle with it
      
             - uses the APIC ID segments to build topology bitmaps at
               registration time
      
             - provides a sane admission logic
      
             - allows detecting the crash kernel case, where CPU0 does not run
               on the real BSP, automatically. This is required to prevent
               sending INIT/SIPI sequences to the real BSP which would reset
               the whole machine. This was so far handled by a tedious command
               line parameter, which does not even work in nested crash
               scenarios.
      
             - Associates CPU number after the enumeration completed and
               prevents the late registration of APICs, which was somehow
               tolerated before.
      
         - Converting all parsers and guest enumeration mechanisms over to the
           new interfaces.
      
           This allows getting rid of all global variable tweaking from the
           parsers and enumeration mechanisms and sanitizes the XEN[PV]
           handling so it can use CPUID evaluation for the first time.
      
         - Mopping up existing sins by taking the information from the APIC ID
           segment bitmaps.
      
           This evaluates hybrid systems correctly on the boot CPU and allows
           for cleanups and fixes in the related drivers, e.g. PERF.
      
        The series has been extensively tested and the minimal late fallout
        due to a broken ACPI/MADT table has been addressed by tightening the
        admission logic further"
      
      * tag 'x86-apic-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (76 commits)
        x86/topology: Ignore non-present APIC IDs in a present package
        x86/apic: Build the x86 topology enumeration functions on UP APIC builds too
        smp: Provide 'setup_max_cpus' definition on UP too
        smp: Avoid 'setup_max_cpus' namespace collision/shadowing
        x86/bugs: Use fixed addressing for VERW operand
        x86/cpu/topology: Get rid of cpuinfo::x86_max_cores
        x86/cpu/topology: Provide __num_[cores|threads]_per_package
        x86/cpu/topology: Rename topology_max_die_per_package()
        x86/cpu/topology: Rename smp_num_siblings
        x86/cpu/topology: Retrieve cores per package from topology bitmaps
        x86/cpu/topology: Use topology logical mapping mechanism
        x86/cpu/topology: Provide logical pkg/die mapping
        x86/cpu/topology: Simplify cpu_mark_primary_thread()
        x86/cpu/topology: Mop up primary thread mask handling
        x86/cpu/topology: Use topology bitmaps for sizing
        x86/cpu/topology: Let XEN/PV use topology from CPUID/MADT
        x86/xen/smp_pv: Count number of vCPUs early
        x86/cpu/topology: Assign hotpluggable CPUIDs during init
        x86/cpu/topology: Reject unknown APIC IDs on ACPI hotplug
        x86/topology: Add a mechanism to track topology via APIC IDs
        ...
      ca7e9177
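
The APIC pull above describes splitting an APIC ID into per-level segments and building topology bitmaps from them at registration time, so that counts such as cores per package can be computed rather than stored in global variables. The following Python sketch illustrates that idea under stated assumptions: the level names, the `widths` table, and all function names are hypothetical, and the bitmap layout is one plausible choice rather than the kernel's actual data structure.

```python
# Levels from least to most significant APIC ID segment (illustrative subset).
LEVELS = ("thread", "core", "die", "package")


def split_apic_id(apic_id, widths):
    """Decompose an APIC ID into per-level fields.

    `widths` gives the bit width of every level except the topmost one,
    which takes all remaining high bits.
    """
    fields = {}
    shift = 0
    for level in LEVELS[:-1]:
        width = widths[level]
        fields[level] = (apic_id >> shift) & ((1 << width) - 1)
        shift += width
    fields[LEVELS[-1]] = apic_id >> shift
    return fields


def register_apic(bitmaps, apic_id, widths):
    """Record one APIC ID in a bitmap per level.

    At each level the bits of all lower levels are cleared first, so every
    unit (core, die, package) is represented by exactly one bit.
    """
    shift = 0
    for level in LEVELS:
        unit_id = (apic_id >> shift) << shift
        bitmaps[level] |= 1 << unit_id
        if level != LEVELS[-1]:
            shift += widths[level]


def count_units(bitmaps, level):
    """Number of distinct units seen at a level (population count)."""
    return bin(bitmaps[level]).count("1")


def build_bitmaps(apic_ids, widths):
    bitmaps = {level: 0 for level in LEVELS}
    for apic_id in apic_ids:
        register_apic(bitmaps, apic_id, widths)
    return bitmaps
```

For a made-up machine with 1 thread bit, 3 core bits and 1 die bit, registering APIC IDs 0..7 and 32..39 yields two packages with four two-thread cores each, and `cores // packages` gives the cores-per-package figure without any per-CPU global state.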
    • Merge tag 'timers-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d08c407f
      Linus Torvalds authored
      Pull timer updates from Thomas Gleixner:
       "A large set of updates and features for timers and timekeeping:
      
         - The hierarchical timer pull model
      
           When timer wheel timers are armed they are placed into the timer
           wheel of a CPU which is likely to be busy at the time of expiry.
           This is done to avoid wakeups on potentially idle CPUs.
      
           This is wrong in several aspects:
      
             1) The heuristics to select the target CPU are wrong by
                definition as the chance to get the prediction right is
                close to zero.
      
             2) Due to #1 it is possible that timers are accumulated on
                a single target CPU
      
             3) The required computation in the enqueue path is just overhead
                for dubious value especially under the consideration that the
                vast majority of timer wheel timers are either canceled or
                rearmed before they expire.
      
           The timer pull model avoids the above by removing the target
           computation on enqueue and queueing timers always on the CPU on
           which they get armed.
      
           This is achieved by having separate wheels for CPU pinned timers
           and global timers which do not care about where they expire.
      
           As long as a CPU is busy it handles both the pinned and the global
           timers which are queued on the CPU local timer wheels.
      
           When a CPU goes idle it evaluates its own timer wheels:
      
             - If the first expiring timer is a pinned timer, then the global
               timers can be ignored as the CPU will wake up before they
               expire.
      
             - If the first expiring timer is a global timer, then the expiry
               time is propagated into the timer pull hierarchy and the CPU
               makes sure to wake up for the first pinned timer.
      
           The timer pull hierarchy organizes CPUs in groups of eight at the
           lowest level and at the next levels groups of eight groups up to
           the point where no further aggregation of groups is required, i.e.
           the number of levels is log8(NR_CPUS). The magic number of eight
           has been established by experimentation, but can be adjusted if
           needed.
      
           In each group one busy CPU acts as the migrator. It's only one CPU
           to avoid lock contention on remote timer wheels.
      
           The migrator CPU checks in its own timer wheel handling whether
           there are other CPUs in the group which have gone idle and have
           global timers to expire. If there are global timers to expire, the
           migrator locks the remote CPU timer wheel and handles the expiry.
      
           Depending on the group level in the hierarchy, this handling can
           require walking the hierarchy downwards to the CPU level.
      
           Special care is taken when the last CPU goes idle. At this point
           the CPU is the systemwide migrator at the top of the hierarchy and
           it therefore cannot delegate to the hierarchy. It needs to arm its
           own timer device to expire either at the first expiring timer in
           the hierarchy or at the first CPU local timer, whichever expires
           first.
      
           This completely removes the overhead from the enqueue path, which
           is a true hotpath for e.g. networking, and trades it for a slightly
           more complex idle path.
      
           This has been in development for a couple of years and the final
           series has been extensively tested by various teams from silicon
           vendors and run through extensive CI.
      
           There have been slight performance improvements observed on network
           centric workloads and an Intel team confirmed that this allows them
           to power down a die completely on a multi-die socket for the first
           time in a mostly idle scenario.
      
           There is only one outstanding ~1.5% regression on a specific
           overloaded netperf test which is currently being investigated, but the
           rest is either positive or neutral performance wise and positive on
           the power management side.
      
         - Fixes for the timekeeping interpolation code for cross-timestamps:
      
           Cross-timestamps are used for PTP to get snapshots from hardware
           timers and interpolate them back to clock MONOTONIC. The changes
           address a few corner cases in the interpolation code which got the
           math and logic wrong.
      
         - Simplification of the clocksource watchdog retry logic to
           automatically adjust to handle larger systems correctly instead of
           having more incomprehensible command line parameters.
      
         - Treewide consolidation of the VDSO data structures.
      
         - The usual small improvements and cleanups all over the place"
      
      * tag 'timers-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
        timer/migration: Fix quick check reporting late expiry
        tick/sched: Fix build failure for CONFIG_NO_HZ_COMMON=n
        vdso/datapage: Quick fix - use asm/page-def.h for ARM64
        timers: Assert no next dyntick timer look-up while CPU is offline
        tick: Assume timekeeping is correctly handed over upon last offline idle call
        tick: Shut down low-res tick from dying CPU
        tick: Split nohz and highres features from nohz_mode
        tick: Move individual bit features to debuggable mask accesses
        tick: Move got_idle_tick away from common flags
        tick: Assume the tick can't be stopped in NOHZ_MODE_INACTIVE mode
        tick: Move broadcast cancellation up to CPUHP_AP_TICK_DYING
        tick: Move tick cancellation up to CPUHP_AP_TICK_DYING
        tick: Start centralizing tick related CPU hotplug operations
        tick/sched: Don't clear ts::next_tick again in can_stop_idle_tick()
        tick/sched: Rename tick_nohz_stop_sched_tick() to tick_nohz_full_stop_tick()
        tick: Use IS_ENABLED() whenever possible
        tick/sched: Remove useless oneshot ifdeffery
        tick/nohz: Remove duplicate between lowres and highres handlers
        tick/nohz: Remove duplicate between tick_nohz_switch_to_nohz() and tick_setup_sched_timer()
        hrtimer: Select housekeeping CPU during migration
        ...
      d08c407f
    • Merge tag 'timers-ptp-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 80a76c60
      Linus Torvalds authored
      Pull clocksource updates from Thomas Gleixner:
       "Updates for timekeeping and PTP core.
      
        The cross-timestamp mechanism which allows correlating hardware
        clocks uses clocksource pointers for describing the correlation.
      
        That's suboptimal as drivers need to obtain the pointer, which
        requires needless exports and exposing internals. This can all be
        completely avoided by assigning clocksource IDs and using them for
        describing the correlated clock source.
      
        So this adds clocksource IDs to all clocksources in the tree which can
        be exposed to this mechanism and removes the pointer and now needless
        exports.
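The idea of matching by ID rather than by pointer can be sketched as follows. This is a simplified illustration with hypothetical names (`demo_clocksource_id`, `demo_counterval`, `demo_counterval_matches`); it does not reproduce the kernel's actual enum values or structure layout, and treating the generic ID as "never correlated" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical clocksource IDs for illustration only. */
enum demo_clocksource_id {
    DEMO_CSID_GENERIC = 0, /* clocksource without a dedicated ID */
    DEMO_CSID_TSC,
    DEMO_CSID_KVM_CLOCK,
};

/* A counter snapshot tagged with an ID instead of a clocksource pointer. */
struct demo_counterval {
    uint64_t cycles;
    enum demo_clocksource_id cs_id;
};

/*
 * The core only compares IDs, so a driver filling in the snapshot does
 * not need access to the clocksource object or to an exported symbol.
 */
static bool demo_counterval_matches(const struct demo_counterval *snap,
                                    enum demo_clocksource_id current_id)
{
    /* Assumption: the generic ID never identifies a correlated clock. */
    return snap->cs_id != DEMO_CSID_GENERIC && snap->cs_id == current_id;
}
```

A driver in this sketch would fill in `cycles` and a constant `cs_id`, and the core would reject the snapshot whenever the ID does not match the clocksource currently in use.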
      
        A related improvement for the core and the correlation handling did
        not make it this time, but is expected to be ready for the next
        round"
      
      * tag 'timers-ptp-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        kvmclock: Unexport kvmclock clocksource
        treewide: Remove system_counterval_t.cs, which is never read
        timekeeping: Evaluate system_counterval_t.cs_id instead of .cs
        ptp/kvm, arm_arch_timer: Set system_counterval_t.cs_id to constant
        x86/kvm, ptp/kvm: Add clocksource ID, set system_counterval_t.cs_id
        x86/tsc: Add clocksource ID, set system_counterval_t.cs_id
        timekeeping: Add clocksource ID to struct system_counterval_t
        x86/tsc: Correct kernel-doc notation
      80a76c60
    • Merge tag 'smp-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 397935e3
      Linus Torvalds authored
      Pull cpu core updates from Thomas Gleixner:
       "A small boring set of cleanups for the SMP and CPU hotplug code"
      
      * tag 'smp-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        cpu: Remove stray semicolon
        smp: Make __smp_processor_id() 0-argument macro
        cpu: Mark cpu_possible_mask as __ro_after_init
        kernel/cpu: Convert snprintf() to sysfs_emit()
        cpu/hotplug: Delete an extraneous kernel-doc description
      397935e3
    • Merge tag 'irq-msi-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4527e837
      Linus Torvalds authored
      Pull MSI updates from Thomas Gleixner:
       "Updates for the MSI interrupt subsystem and initial RISC-V MSI
        support.
      
        The core changes have been adopted from previous work which converted
        ARM[64] to the new per device MSI domain model, which was merged to
        support multiple MSI domains per device. The ARM[64] changes are being
        worked on too, but are not ready yet. The core and platform-MSI
        changes have been split out to not hold up RISC-V and to avoid
        RISC-V building on interfaces that are scheduled for removal.
      
        The core support provides new interfaces to handle wire to MSI bridges
        in a straightforward way and introduces new platform-MSI interfaces
        which are built on top of the per device MSI domain model.
      
        Once ARM[64] is converted over, the old platform-MSI interfaces and the
        related ugliness in the MSI core code will be removed.
      
        The actual MSI parts for RISC-V were finalized late and have been
        postponed to the next merge window.
      
        Drivers:
      
         - Add a new driver for the Andes hart-level interrupt controller
      
         - Rework the SiFive PLIC driver to prepare for MSI support
      
         - Expand the RISC-V INTC driver to support the new RISC-V AIA
           controller which provides the basis for MSI on RISC-V
      
         - A few fixups for the fallout of the core changes"
      
      * tag 'irq-msi-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
        irqchip/riscv-intc: Fix low-level interrupt handler setup for AIA
        x86/apic/msi: Use DOMAIN_BUS_GENERIC_MSI for HPET/IO-APIC domain search
        genirq/matrix: Dynamic bitmap allocation
        irqchip/riscv-intc: Add support for RISC-V AIA
        irqchip/sifive-plic: Improve locking safety by using irqsave/irqrestore
        irqchip/sifive-plic: Parse number of interrupts and contexts early in plic_probe()
        irqchip/sifive-plic: Cleanup PLIC contexts upon irqdomain creation failure
        irqchip/sifive-plic: Use riscv_get_intc_hwnode() to get parent fwnode
        irqchip/sifive-plic: Use devm_xyz() for managed allocation
        irqchip/sifive-plic: Use dev_xyz() in-place of pr_xyz()
        irqchip/sifive-plic: Convert PLIC driver into a platform driver
        irqchip/riscv-intc: Introduce Andes hart-level interrupt controller
        irqchip/riscv-intc: Allow large non-standard interrupt number
        genirq/irqdomain: Don't call ops->select for DOMAIN_BUS_ANY tokens
        irqchip/imx-intmux: Handle pure domain searches correctly
        genirq/msi: Provide MSI_FLAG_PARENT_PM_DEV
        genirq/irqdomain: Reroute device MSI create_mapping
        genirq/msi: Provide allocation/free functions for "wired" MSI interrupts
        genirq/msi: Optionally use dev->fwnode for device domain
        genirq/msi: Provide DOMAIN_BUS_WIRED_TO_MSI
        ...
      4527e837
    • Merge tag 'irq-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 02d4df78
      Linus Torvalds authored
      Pull irq updates from Thomas Gleixner:
       "Core:
      
         - Make affinity changes take effect immediately for interrupt
           threads. This reduces the impact on isolated CPUs as it pulls over
           the thread right away instead of doing it after the next hardware
           interrupt arrives.
      
         - Cleanup and improvements for the interrupt chip simulator
      
         - Deduplication of the interrupt descriptor initialization code so
           the sparse and non-sparse mode share more code.
      
        Drivers:
      
         - A set of conversions to platform_driver::remove_new() which gets
           rid of the pointless return value.
      
         - A new driver for the Starfive JH8100 SoC
      
         - Support for Amlogic-T7 SoCs
      
         - Improvement for the interrupt handling and EOI management for the
           loongson interrupt controller.
      
         - The usual fixes and improvements all over the place"
      
      * tag 'irq-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
        irqchip/ts4800: Convert to platform_driver::remove_new() callback
        irqchip/stm32-exti: Convert to platform_driver::remove_new() callback
        irqchip/renesas-rza1: Convert to platform_driver::remove_new() callback
        irqchip/renesas-irqc: Convert to platform_driver::remove_new() callback
        irqchip/renesas-intc-irqpin: Convert to platform_driver::remove_new() callback
        irqchip/pruss-intc: Convert to platform_driver::remove_new() callback
        irqchip/mvebu-pic: Convert to platform_driver::remove_new() callback
        irqchip/madera: Convert to platform_driver::remove_new() callback
        irqchip/ls-scfg-msi: Convert to platform_driver::remove_new() callback
        irqchip/keystone: Convert to platform_driver::remove_new() callback
        irqchip/imx-irqsteer: Convert to platform_driver::remove_new() callback
        irqchip/imx-intmux: Convert to platform_driver::remove_new() callback
        irqchip/imgpdc: Convert to platform_driver::remove_new() callback
        irqchip: Add StarFive external interrupt controller
        dt-bindings: interrupt-controller: Add starfive,jh8100-intc
        arm64: dts: Add gpio_intc node for Amlogic-T7 SoCs
        irqchip/meson-gpio: Add support for Amlogic-T7 SoCs
        dt-bindings: interrupt-controller: Add support for Amlogic-T7 SoCs
        irqchip/vic: Fix a kernel-doc warning
        genirq: Wake interrupt threads immediately when changing affinity
        ...
      02d4df78
    • KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests · 2a018012
      Pawan Gupta authored
      Mitigation for RFDS requires RFDS_CLEAR capability which is enumerated
      by MSR_IA32_ARCH_CAPABILITIES bit 27. If the host has it set, export it
      to guests so that they can deploy the mitigation.
      
      RFDS_NO indicates that the system is not vulnerable to RFDS; export it
      to guests so that they don't deploy the mitigation unnecessarily. When
      the host is not affected by X86_BUG_RFDS, but has RFDS_NO=0, synthesize
      RFDS_NO to the guest.
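The export rule described above can be sketched as a small function. This is only an illustration under stated assumptions: `demo_guest_rfds_caps` is a hypothetical helper, and the `DEMO_CAP_*` constants use placeholder bit positions rather than the real MSR_IA32_ARCH_CAPABILITIES layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions for illustration; not the real MSR layout. */
#define DEMO_CAP_RFDS_NO    (UINT64_C(1) << 0)
#define DEMO_CAP_RFDS_CLEAR (UINT64_C(1) << 1)

/* Compute the RFDS-related capability bits that a guest would see. */
static uint64_t demo_guest_rfds_caps(uint64_t host_caps, bool host_has_rfds_bug)
{
    /* Pass through whatever the host itself enumerates. */
    uint64_t guest_caps = host_caps & (DEMO_CAP_RFDS_NO | DEMO_CAP_RFDS_CLEAR);

    /*
     * If the host is not affected but does not enumerate RFDS_NO,
     * synthesize RFDS_NO so the guest skips the mitigation.
     */
    if (!host_has_rfds_bug)
        guest_caps |= DEMO_CAP_RFDS_NO;

    return guest_caps;
}
```

In this sketch an affected host with RFDS_CLEAR passes only RFDS_CLEAR through, while an unaffected host always presents RFDS_NO to the guest.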
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
      2a018012
    • x86/rfds: Mitigate Register File Data Sampling (RFDS) · 8076fcde
      Pawan Gupta authored
      RFDS is a CPU vulnerability that may allow userspace to infer stale
      kernel data previously used in floating point registers, vector registers
      and integer registers. RFDS only affects certain Intel Atom processors.
      
      Intel released a microcode update that uses VERW instruction to clear
      the affected CPU buffers. Unlike MDS, none of the affected cores support
      SMT.
      
      Add RFDS bug infrastructure and enable the VERW based mitigation by
      default, which clears the affected buffers just before exiting to
      userspace. Also add sysfs reporting and cmdline parameter
      "reg_file_data_sampling" to control the mitigation.
      
      For details see:
      Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
      8076fcde