1. 31 Oct, 2023 19 commits
    • Merge tag 'hardening-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · befaa609
      Linus Torvalds authored
      Pull hardening updates from Kees Cook:
       "One of the more voluminous sets of changes is for adding the new
        __counted_by annotation[1] to gain run-time bounds checking of
        dynamically sized arrays with UBSan.
      
         - Add LKDTM test for stuck CPUs (Mark Rutland)
      
         - Improve LKDTM selftest behavior under UBSan (Ricardo Cañuelo)
      
         - Refactor more 1-element arrays into flexible arrays (Gustavo A. R.
           Silva)
      
         - Analyze and replace strlcpy and strncpy uses (Justin Stitt, Azeem
           Shaikh)
      
         - Convert group_info.usage to refcount_t (Elena Reshetova)
      
         - Add __counted_by annotations (Kees Cook, Gustavo A. R. Silva)
      
         - Add Kconfig fragment for basic hardening options (Kees Cook, Lukas
           Bulwahn)
      
         - Fix randstruct GCC plugin performance mode to stay in groups (Kees
           Cook)
      
         - Fix strtomem() compile-time check for small sources (Kees Cook)"
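
      As an illustration of the __counted_by idea described above, here is a
      minimal user-space sketch. The names COUNTED_BY, struct sample_buf and
      sample_buf_alloc() are hypothetical and not taken from the kernel; the
      sketch assumes a compiler that either supports the counted_by attribute
      or ignores it through the fallback macro. Run-time checking additionally
      requires building with a bounds sanitizer.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Expand to the attribute only when the compiler supports it. */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(member) __attribute__((counted_by(member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

/* Hypothetical structure: 'count' records how many 'items' were allocated. */
struct sample_buf {
	size_t count;
	int items[] COUNTED_BY(count);
};

/* Allocate a zeroed buffer with room for n items; returns NULL on failure. */
struct sample_buf *sample_buf_alloc(size_t n)
{
	struct sample_buf *buf;

	/* Reject sizes whose byte count would overflow size_t. */
	if (n > (SIZE_MAX - sizeof(*buf)) / sizeof(buf->items[0]))
		return NULL;

	buf = calloc(1, sizeof(*buf) + n * sizeof(buf->items[0]));
	if (buf)
		buf->count = n;	/* set the counter before touching items[] */
	return buf;
}
```

      The annotation ties the flexible array to its length field, so a
      sanitizer-enabled build can flag an access such as items[count] instead
      of silently reading past the allocation.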
      
      * tag 'hardening-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (56 commits)
        hwmon: (acpi_power_meter) replace open-coded kmemdup_nul
        reset: Annotate struct reset_control_array with __counted_by
        kexec: Annotate struct crash_mem with __counted_by
        virtio_console: Annotate struct port_buffer with __counted_by
        ima: Add __counted_by for struct modsig and use struct_size()
        MAINTAINERS: Include stackleak paths in hardening entry
        string: Adjust strtomem() logic to allow for smaller sources
        hardening: x86: drop reference to removed config AMD_IOMMU_V2
        randstruct: Fix gcc-plugin performance mode to stay in group
        mailbox: zynqmp: Annotate struct zynqmp_ipi_pdata with __counted_by
        drivers: thermal: tsens: Annotate struct tsens_priv with __counted_by
        irqchip/imx-intmux: Annotate struct intmux_data with __counted_by
        KVM: Annotate struct kvm_irq_routing_table with __counted_by
        virt: acrn: Annotate struct vm_memory_region_batch with __counted_by
        hwmon: Annotate struct gsc_hwmon_platform_data with __counted_by
        sparc: Annotate struct cpuinfo_tree with __counted_by
        isdn: kcapi: replace deprecated strncpy with strscpy_pad
        isdn: replace deprecated strncpy with strscpy
        NFS/flexfiles: Annotate struct nfs4_ff_layout_segment with __counted_by
        nfs41: Annotate struct nfs4_file_layout_dsaddr with __counted_by
        ...
    • Merge tag 'slab-for-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab · fdce8bd3
      Linus Torvalds authored
      Pull slab updates from Vlastimil Babka:
      
       - SLUB: slab order calculation refactoring (Vlastimil Babka, Feng Tang)
      
         Recent proposals to tune the slab order calculations have prompted us
         to look at the current code and refactor it to make it easier to
         follow and eliminate some odd corner cases.
      
         The refactoring is mostly non-functional changes, but should make the
         actual tuning easier to implement and review.
      
      * tag 'slab-for-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
        mm/slub: refactor calculate_order() and calc_slab_order()
        mm/slub: attempt to find layouts up to 1/2 waste in calculate_order()
        mm/slub: remove min_objects loop from calculate_order()
        mm/slub: simplify the last resort slab order calculation
        mm/slub: add sanity check for slub_min/max_order cmdline setup
    • Merge tag 'rcu-next-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks · 2656821f
      Linus Torvalds authored
      Pull RCU updates from Frederic Weisbecker:
      
       - RCU torture, locktorture and generic torture infrastructure updates
         that include various fixes, cleanups and consolidations.
      
         Among the user-visible changes, ftrace dumps can now be found in
         their own file, and module parameters are better documented and
         reported in dumps.
      
       - Generic and misc fixes all over the place. Some highlights:
      
           * Hotplug handling has seen some light cleanups and comments
      
           * An RCU barrier can now be triggered through sysfs to serialize
             memory stress testing and avoid OOM
      
           * Object information is now dumped in case of invalid callback
             invocation
      
           * Also various SRCU issues, too hard to trigger to deserve urgent
             pull requests, have been fixed
      
       - RCU documentation updates
      
       - RCU reference scalability test minor fixes and doc improvements.
      
       - RCU tasks minor fixes
      
       - Stall detection updates. Introduce RCU CPU stall notifiers that
         allow a subsystem to provide information to help debugging. Also
         cure some false-positive stalls.
      
      * tag 'rcu-next-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks: (56 commits)
        srcu: Only accelerate on enqueue time
        locktorture: Check the correct variable for allocation failure
        srcu: Fix callbacks acceleration mishandling
        rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP
        rcu: Standardize explicit CPU-hotplug calls
        rcu: Conditionally build CPU-hotplug teardown callbacks
        rcu: Remove references to rcu_migrate_callbacks() from diagrams
        rcu: Assume rcu_report_dead() is always called locally
        rcu: Assume IRQS disabled from rcu_report_dead()
        rcu: Use rcu_segcblist_segempty() instead of open coding it
        rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects
        srcu: Fix srcu_struct node grpmask overflow on 64-bit systems
        torture: Convert parse-console.sh to mktemp
        rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle()
        rcutorture: Replace schedule_timeout*() 1-jiffy waits with HZ/20
        torture: Add kvm.sh --debug-info argument
        locktorture: Rename readers_bind/writers_bind to bind_readers/bind_writers
        doc: Catch-up update for locktorture module parameters
        locktorture: Add call_rcu_chains module parameter
        locktorture: Add new module parameters to lock_torture_print_module_parms()
        ...
    • Merge tag 'csd-lock.2023.10.23a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · 9a0f53e0
      Linus Torvalds authored
      Pull CSD lock update from Paul McKenney:
       "This adds a kernel boot parameter that causes the kernel to panic if
        one of the smp_call_function*() APIs is stalled for more than the
        specified duration.
      
        This is useful in deployments in which a clean panic is preferable to
        an indefinite stall"
      
      * tag 'csd-lock.2023.10.23a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
        smp,csd: Throw an error if a CSD lock is stuck for too long
    • Merge tag 'lkmm.2023.10.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · 6750f0de
      Linus Torvalds authored
      Pull Linux Kernel Memory Model updates from Paul McKenney:
       "This update adds paragraphs to the portions of memory-barriers.txt
        that have been marked historical due to changes in the way that the
        Linux kernel handles DEC Alpha. These paragraphs include information
        on where to find the corresponding up-to-date information"
      
      * tag 'lkmm.2023.10.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
        docs: memory-barriers: Add note on compiler transformation and address deps
    • Merge tag 'nolibc.2023.10.23a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · c9049984
      Linus Torvalds authored
      Pull nolibc updates from Paul McKenney:
      
       - Add stdarg.h header and a few additional system-call upgrades
      
       - Add support for constructors and destructors
      
       - Add tests to verify the ability to link multiple .o files against
         nolibc
      
       - Numerous string-function optimizations and improvements
      
       - Prevent redundant kernel relinks by avoiding embedding of initramfs
         into the kernel image
      
       - Allow building i386 with multiarch compiler and make ppc64le use
         qemu-system-ppc64
      
       - Miscellaneous fixups, including addition of -nostdinc for
         nolibc-test, avoiding -Wstringop-overflow warnings, and avoiding
         unused parameter warnings for ENOSYS fallbacks
      
      * tag 'nolibc.2023.10.23a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
        selftests/nolibc: add tests for multi-object linkage
        selftests/nolibc: use qemu-system-ppc64 for ppc64le
        tools/nolibc: add support for constructors and destructors
        tools/nolibc: drop test for getauxval(AT_PAGESZ)
        tools/nolibc: automatically detect necessity to use pselect6
        tools/nolibc: don't define new syscall number
        tools/nolibc: avoid unused parameter warnings for ENOSYS fallbacks
        selftests/nolibc: allow building i386 with multiarch compiler
        selftests/nolibc: don't embed initramfs into kernel image
        selftests/nolibc: libc-test: avoid -Wstringop-overflow warnings
        tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function
        tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function
        tools/nolibc: x86-64: Use `rep stosb` for `memset()`
        tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
        selftests/nolibc: use -nostdinc for nolibc-test
        tools/nolibc: add stdarg.h header
    • Merge tag 'x86-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · eb55307e
      Linus Torvalds authored
      Pull x86 core updates from Thomas Gleixner:
      
       - Limit the hardcoded topology quirk for Hygon CPUs to those which have
         a model ID less than 4.
      
         The newer models have the topology CPUID leaf 0xB correctly
         implemented and are not affected.
      
       - Make SMT control more robust against enumeration failures
      
         SMT control was added to allow controlling SMT at boottime or
         runtime. The primary purpose was to provide a simple mechanism to
         disable SMT in the light of speculation attack vectors.
      
         It turned out that the code is sensitive to enumeration failures and
         worked only by chance for XEN/PV. XEN/PV has no real APIC enumeration
         which means the primary thread mask is not set up correctly. By
         chance a XEN/PV boot ends up with smp_num_siblings == 2, which makes
         the hotplug control stay at its default value "enabled". So the mask
         is never evaluated.
      
         The ongoing rework of the topology evaluation caused XEN/PV to end up
         with smp_num_siblings == 1, which sets the SMT control to "not
         supported" and the empty primary thread mask causes the hotplug core
         to deny the bringup of the APs.
      
         Make the decision logic more robust and take 'not supported' and 'not
         implemented' into account for the decision whether a CPU should be
         booted or not.
      
       - Fake primary thread mask for XEN/PV
      
         Pretend that all XEN/PV vCPUs are primary threads, which makes the
         usage of the primary thread mask valid on XEN/PV. That is consistent
         because all of the topology information on XEN/PV is fake or even
         non-existent.
      
       - Encapsulate topology information in cpuinfo_x86
      
         Move the randomly scattered topology data into a separate data
         structure for readability and as a preparatory step for the topology
         evaluation overhaul.
      
       - Consolidate APIC ID data type to u32
      
         It's fixed width hardware data and not randomly u16, int, unsigned
         long or whatever developers decided to use.
      
       - Cure the abuse of cpuinfo for persisting logical IDs.
      
         Per CPU cpuinfo is used to persist the logical package and die IDs.
         That's really not the right place simply because cpuinfo is subject
         to be reinitialized when a CPU goes through an offline/online cycle.
      
         Use separate per CPU data for the persisting to enable the further
         topology management rework. It will be removed once the new topology
         management is in place.
      
       - Provide a debug interface for inspecting topology information
      
         Useful in general and extremely helpful for validating the topology
         management rework in terms of correctness or "bug" compatibility.
      
      * tag 'x86-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
        x86/apic, x86/hyperv: Use u32 in hv_snp_boot_ap() too
        x86/cpu: Provide debug interface
        x86/cpu/topology: Cure the abuse of cpuinfo for persisting logical ids
        x86/apic: Use u32 for wakeup_secondary_cpu[_64]()
        x86/apic: Use u32 for [gs]et_apic_id()
        x86/apic: Use u32 for phys_pkg_id()
        x86/apic: Use u32 for cpu_present_to_apicid()
        x86/apic: Use u32 for check_apicid_used()
        x86/apic: Use u32 for APIC IDs in global data
        x86/apic: Use BAD_APICID consistently
        x86/cpu: Move cpu_l[l2]c_id into topology info
        x86/cpu: Move logical package and die IDs into topology info
        x86/cpu: Remove pointless evaluation of x86_coreid_bits
        x86/cpu: Move cu_id into topology info
        x86/cpu: Move cpu_core_id into topology info
        hwmon: (fam15h_power) Use topology_core_id()
        scsi: lpfc: Use topology_core_id()
        x86/cpu: Move cpu_die_id into topology info
        x86/cpu: Move phys_proc_id into topology info
        x86/cpu: Encapsulate topology information in cpuinfo_x86
        ...
    • Merge tag 'x86-apic-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 943af0e7
      Linus Torvalds authored
      Pull x86 APIC updates from Thomas Gleixner:
      
       - Make the quirk for non-maskable MSI interrupts in the affinity setter
         functional again.
      
         It was broken by an MSI core code update, which restructured the code
         in a way that the quirk flag was no longer set correctly.
      
         Trying to restore the core logic caused a deeper inspection and it
         turned out that the extra quirk flag is not required at all because
         it's the inverse of the reservation mode bit, which can only be set
         when the MSI interrupt is maskable.
      
         So the trivial fix is to use the reservation mode check in the
         affinity setter function and remove almost 40 lines of code related
         to the no-mask quirk flag.
      
       - Cure a Kconfig dependency issue which causes compile failures by
         correcting the conditionals in the affected header files.
      
       - Clean up coding style in the UV APIC driver.
      
      * tag 'x86-apic-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/apic/msi: Fix misconfigured non-maskable MSI quirk
        x86/msi: Fix compile error caused by CONFIG_GENERIC_MSI_IRQ=y && !CONFIG_X86_LOCAL_APIC
        x86/platform/uv/apic: Clean up inconsistent indenting
    • Merge tag 'timers-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 63a3f119
      Linus Torvalds authored
      Pull timer updates from Thomas Gleixner:
       "Updates for time, timekeeping and timers:
      
        Core:
      
         - Avoid superfluous deactivation of the tick in the low resolution
           tick NOHZ interrupt handler as the deactivation is handled already
           in the idle loop and on interrupt exit.
      
         - Update stale comments in the tick NOHZ code and rename the tick
           handler functions to be self-explanatory.
      
         - Remove an unused function in the tick NOHZ code, which was
           forgotten when the last user went away.
      
         - Handle RTC alarms which exceed the maximum alarm time of the
           underlying RTC hardware gracefully.
      
           Setting RTC alarms which exceed the maximum alarm time of the RTC
           hardware failed so far and caused suspend operations to abort.
      
           Cure this by limiting the alarm to the maximum alarm time of the
           RTC hardware, which is provided by the driver. This causes early
           resume wakeups, but that's way better than not suspending at all.
      
        Drivers:
      
         - Add a proper clocksource/event driver for the ancient Cirrus Logic
           EP93xx SoC family, which is one of the last non device-tree
           holdouts in arch/arm.
      
         - The usual boring device tree bindings updates and small fixes and
           enhancements all over the place"
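
      The RTC alarm handling described under "Core" above amounts to clamping
      the requested alarm to what the hardware can represent. A minimal
      sketch, assuming a hypothetical helper bound_alarm_ns() and nanosecond
      offsets (neither is taken from the kernel sources):

```c
#include <stdint.h>

/*
 * Hypothetical helper: bound a requested alarm offset by the largest offset
 * the RTC hardware can represent. hw_max_ns <= 0 means "no known limit".
 */
int64_t bound_alarm_ns(int64_t requested_ns, int64_t hw_max_ns)
{
	if (hw_max_ns > 0 && requested_ns > hw_max_ns)
		return hw_max_ns;	/* early wakeup instead of a failed suspend */
	return requested_ns;
}
```

      A request beyond the hardware range is shortened rather than rejected,
      which is the trade-off the message describes: the system may resume
      early, but the suspend itself no longer aborts.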
      
      * tag 'timers-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource: ep93xx: Add driver for Cirrus Logic EP93xx
        dt-bindings: timers: Add Cirrus EP93xx
        clocksource/drivers/timer-atmel-tcb: Fix initialization on SAM9 hardware
        clocksource/timer-riscv: ACPI: Add timer_cannot_wakeup_cpu
        clocksource/drivers/sun5i: Remove surplus dev_err() when using platform_get_irq()
        drivers/clocksource/timer-ti-dm: Don't call clk_get_rate() in stop function
        clocksource/drivers/timer-imx-gpt: Fix potential memory leak
        dt-bindings: timer: renesas,rz-mtu3: Document RZ/{G2UL,Five} SoCs
        dt-bindings: timer: renesas,rz-mtu3: Improve documentation
        dt-bindings: timer: renesas,rz-mtu3: Fix overflow/underflow interrupt names
        alarmtimer: Use maximum alarm time for suspend
        rtc: Add API function to return alarm time bound by hardware limit
        tick/nohz: Update comments some more
        tick/nohz: Remove unused tick_nohz_idle_stop_tick_protected()
        tick/nohz: Don't shutdown the lowres tick from itself
        tick/nohz: Update obsolete comments
        tick/nohz: Rename the tick handlers to more self-explanatory names
    • Merge tag 'smp-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c891e98a
      Linus Torvalds authored
      Pull SMP and CPU hotplug updates from Thomas Gleixner:
      
       - Switch the smp_call_function*() @csd argument to call_single_data_t
         type, which is a cache-line aligned typedef of the underlying struct
         __call_single_data.
      
         This ensures that the call data does not cross a cache line, which
         avoids bouncing an extra cache line for the SMP function call.
      
       - Prevent offlining of the last housekeeping CPU when CPU isolation is
         active.
      
         Offlining the last housekeeping CPU makes no sense in general, but
         also caused the scheduler to panic due to the empty CPU mask when
         rebuilding the scheduler domains.
      
       - Remove an unused CPU hotplug state
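
      The first item above relies on a typedef that aligns a structure to its
      own size. A user-space sketch of that technique follows; struct
      call_data, call_data_t and spans_two_lines() are hypothetical stand-ins,
      and the sketch assumes GCC or Clang attribute syntax and a structure
      size that is a power of two no larger than the cache line.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the call data: four pointer-sized fields. */
struct call_data {
	void *next;
	uintptr_t flags;
	void (*func)(void *info);
	void *info;
};

/* The alignment trick below needs a power-of-two size. */
_Static_assert((sizeof(struct call_data) & (sizeof(struct call_data) - 1)) == 0,
	       "size must be a power of two");

/* Same layout, but every object is aligned to the structure's own size. */
typedef struct call_data call_data_t
	__attribute__((aligned(sizeof(struct call_data))));

/* True if [addr, addr + size) touches more than one line of line_size bytes. */
bool spans_two_lines(uintptr_t addr, size_t size, size_t line_size)
{
	return addr / line_size != (addr + size - 1) / line_size;
}
```

      Because the object starts on a multiple of its own size, it always fits
      inside a single line of any larger power-of-two size, so only one cache
      line has to travel between CPUs.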
      
      * tag 'smp-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        cpu/hotplug: Don't offline the last non-isolated CPU
        cpu/hotplug: Remove unused cpuhp_state CPUHP_AP_X86_VDSO_VMA_ONLINE
        smp: Change function signatures to use call_single_data_t
    • Merge tag 'irq-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b08eccef
      Linus Torvalds authored
      Pull irq updates from Thomas Gleixner:
       "Core:
      
         - Exclude managed interrupts in the calculation of interrupts which
           are targeted to a CPU which is about to be offlined to ensure that
           there are enough free vectors on the still online CPUs to migrate
           them over.
      
           Managed interrupts do not need to be accounted because they are
           either shut down on offline or migrated to an already reserved and
           guaranteed slot on a still online CPU in the interrupts affinity
           mask.
      
           Including managed interrupts is overaccounting and can result in
           needlessly aborting hibernation on large server machines.
      
         - The usual set of small improvements
      
        Drivers:
      
         - Make the generic interrupt chip implementation handle interrupt
           domains correctly and initialize the name pointers correctly
      
         - Add interrupt affinity setting support to the Renesas RZG2L chip
           driver.
      
         - Prevent registering syscore operations multiple times in the SiFive
           PLIC chip driver.
      
         - Update device tree handling in the NXP Layerscape MSI chip driver"
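
      The accounting change described under "Core" above can be sketched as
      follows. The structure and function names are hypothetical, and the
      sketch assumes every CPU other than the one going offline stays online;
      it only illustrates why leaving managed interrupts out of the count
      avoids needless failures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU vector bookkeeping. */
struct cpu_vectors {
	unsigned int allocated;		/* all vectors in use, managed included */
	unsigned int managed_allocated;	/* managed subset of 'allocated' */
	unsigned int available;		/* free vectors on this CPU */
};

/* Vectors that really have to move when this CPU goes offline. */
unsigned int vectors_to_migrate(const struct cpu_vectors *cpu)
{
	return cpu->allocated - cpu->managed_allocated;
}

/* Can CPU 'dying' go offline given the free vectors on the other CPUs? */
bool can_offline(const struct cpu_vectors *cpus, size_t ncpus, size_t dying)
{
	unsigned int free_elsewhere = 0;

	for (size_t i = 0; i < ncpus; i++)
		if (i != dying)
			free_elsewhere += cpus[i].available;

	return free_elsewhere >= vectors_to_migrate(&cpus[dying]);
}
```

      With ten vectors in use of which eight are managed, only two need a new
      home; counting all ten would reject an offline request that three free
      vectors elsewhere can easily satisfy.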
      
      * tag 'irq-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/sifive-plic: Fix syscore registration for multi-socket systems
        irqchip/ls-scfg-msi: Use device_get_match_data()
        genirq/generic_chip: Make irq_remove_generic_chip() irqdomain aware
        genirq/matrix: Exclude managed interrupts in irq_matrix_allocated()
        PCI/MSI: Provide stubs for IMS functions
        irqchip/renesas-rzg2l: Enhance driver to support interrupt affinity setting
        genirq/generic-chip: Fix the irq_chip name for /proc/interrupts
        irqdomain: Annotate struct irq_domain with __counted_by
    • Merge tag 'core-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9cc6fea1
      Linus Torvalds authored
      Pull core updates from Thomas Gleixner:
       "Two small updates to ptrace_stop():
      
         - Add a comment to explain that the preempt_disable() before
           unlocking tasklist lock is not a correctness problem and just
           prevents the tracer from preempting the tracee before the tracee
           schedules out.
      
         - Make that preempt_disable() conditional on PREEMPT_RT=n.
      
           RT enabled kernels cannot disable preemption at this point because
           cgroup_enter_frozen() and sched_submit_work() acquire spinlocks or
           rwlocks which are substituted by sleeping locks on RT. Acquiring a
           sleeping lock in a preemption-disabled region is obviously not
           possible.
      
           This obviously brings back the potential slowdown of ptrace() for
           RT enabled kernels, but that's a price to be paid for latency
           guarantees"
      
      * tag 'core-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        signal: Don't disable preemption in ptrace_stop() on PREEMPT_RT
        signal: Add a proper comment about preempt_disable() in ptrace_stop()
    • Merge tag 'x86-build-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ecb8cd2a
      Linus Torvalds authored
      Pull x86 build update from Ingo Molnar:
       "Enable CONFIG_DEBUG_ENTRY=y in the x86 defconfigs"
      
      * tag 'x86-build-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/defconfig: Enable CONFIG_DEBUG_ENTRY=y
    • Merge tag 'x86-mm-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f0d25b5d
      Linus Torvalds authored
      Pull x86 mm handling updates from Ingo Molnar:
      
       - Add new NX-stack self-test
      
       - Improve NUMA partial-CFMWS handling
      
       - Fix #VC handler bugs resulting in SEV-SNP boot failures
      
       - Drop the 4MB memory size restriction on minimal NUMA nodes
      
       - Reorganize headers a bit, in preparation for header dependency
         reduction efforts
      
       - Misc cleanups & fixes
      
      * tag 'x86-mm-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mm: Drop the 4 MB restriction on minimal NUMA node memory size
        selftests/x86/lam: Zero out buffer for readlink()
        x86/sev: Drop unneeded #include
        x86/sev: Move sev_setup_arch() to mem_encrypt.c
        x86/tdx: Replace deprecated strncpy() with strtomem_pad()
        selftests/x86/mm: Add new test that userspace stack is in fact NX
        x86/sev: Make boot_ghcb_page[] static
        x86/boot: Move x86_cache_alignment initialization to correct spot
        x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach
        x86/sev-es: Allow copy_from_kernel_nofault() in earlier boot
        x86_64: Show CR4.PSE on auxiliaries like on BSP
        x86/iommu/docs: Update AMD IOMMU specification document URL
        x86/sev/docs: Update document URL in amd-memory-encryption.rst
        x86/mm: Move arch_memory_failure() and arch_is_platform_page() definitions from <asm/processor.h> to <asm/pgtable.h>
        ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window
        x86/numa: Introduce numa_fill_memblks()
    • Merge tag 'x86-irq-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1641b9b0
      Linus Torvalds authored
      Pull x86 irq fix from Ingo Molnar:
       "Fix out-of-order NMI nesting checks resulting in false positive
        warnings"
      
      * tag 'x86-irq-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/nmi: Fix out-of-order NMI nesting checks & false positive warning
    • Merge tag 'x86-entry-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ed766c26
      Linus Torvalds authored
      Pull x86 entry updates from Ingo Molnar:
      
       - Make IA32_EMULATION boot time configurable with
         the new ia32_emulation=<bool> boot option
      
       - Clean up fast syscall return validation code: convert
         it to C and refactor the code
      
       - As part of this, optimize the canonical RIP test code
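
      One way to picture the canonical RIP optimization mentioned above: a
      generic canonical-address check sign-extends the address, whereas a
      return address that must lie in user space can be validated with a
      single comparison against an upper limit. The sketch below assumes
      48-bit virtual addresses and 4 KiB pages; the constants and function
      names are hypothetical, not the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions: 48-bit virtual addresses and 4 KiB pages. */
#define VADDR_BITS	48
#define PAGE_SIZE_BYTES	4096ULL
#define USER_LIMIT	((1ULL << (VADDR_BITS - 1)) - PAGE_SIZE_BYTES)

/*
 * Generic check: bits 63..47 must all equal bit 47 (sign extension).
 * Relies on arithmetic right shift of signed values, as GCC and Clang do.
 */
bool is_canonical(uint64_t addr)
{
	unsigned int shift = 64 - VADDR_BITS;

	return (uint64_t)((int64_t)(addr << shift) >> shift) == addr;
}

/* Cheaper check that is sufficient for user-space return addresses. */
bool is_user_address(uint64_t addr)
{
	return addr < USER_LIMIT;
}
```

      Every address below the user limit is canonical by construction, so the
      single comparison covers the case that matters on the return path and
      rejects kernel-half addresses as well.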
      
      * tag 'x86-entry-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/entry/32: Clean up syscall fast exit tests
        x86/entry/64: Use TASK_SIZE_MAX for canonical RIP test
        x86/entry/64: Convert SYSRET validation tests to C
        x86/entry/32: Remove SEP test for SYSEXIT
        x86/entry/32: Convert do_fast_syscall_32() to bool return type
        x86/entry/compat: Combine return value test from syscall handler
        x86/entry/64: Remove obsolete comment on tracing vs. SYSRET
        x86: Make IA32_EMULATION boot time configurable
        x86/entry: Make IA32 syscalls' availability depend on ia32_enabled()
        x86/elf: Make loading of 32bit processes depend on ia32_enabled()
        x86/entry: Compile entry_SYSCALL32_ignore() unconditionally
        x86/entry: Rename ignore_sysret()
        x86: Introduce ia32_enabled()
    • Merge tag 'x86-asm-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5780e39e
      Linus Torvalds authored
      Pull x86 assembly code updates from Ingo Molnar:
      
       - Micro-optimize the x86 bitops code
      
       - Define target-specific {raw,this}_cpu_try_cmpxchg{64,128}() to
         improve code generation
      
       - Define raw_cpu_try_cmpxchg() and use it in preempt_count_set()
      
       - Do not clobber %rsi in percpu_{try_,}cmpxchg{64,128}_op
      
       - Remove the unused __sw_hweight64() implementation on x86-32
      
       - Misc fixes and cleanups
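
      The try_cmpxchg pattern named in the list above reports success as a
      boolean and refreshes the expected value on failure, so a retry loop
      needs no separate compare or reload. C11 atomics have the same contract,
      which the following user-space sketch uses as an analogy; add_capped()
      is a hypothetical example, not kernel code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Add 'delta' to '*counter' unless the result would exceed 'cap'.
 * Returns true if the addition was applied.
 */
bool add_capped(atomic_int *counter, int delta, int cap)
{
	int old = atomic_load(counter);

	do {
		if (old > cap - delta)
			return false;
		/* On failure, 'old' is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak(counter, &old, old + delta));

	return true;
}
```

      Since the exchange itself yields the success flag, the compiler can
      branch on the result of the instruction directly instead of comparing
      the returned value with the expected one.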
      
      * tag 'x86-asm-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/lib: Address kernel-doc warnings
        x86/entry: Fix typos in comments
        x86/entry: Remove unused argument %rsi passed to exc_nmi()
        x86/bitops: Remove unused __sw_hweight64() assembly implementation on x86-32
        x86/percpu: Do not clobber %rsi in percpu_{try_,}cmpxchg{64,128}_op
        x86/percpu: Use raw_cpu_try_cmpxchg() in preempt_count_set()
        x86/percpu: Define raw_cpu_try_cmpxchg and this_cpu_try_cmpxchg()
        x86/percpu: Define {raw,this}_cpu_try_cmpxchg{64,128}
        x86/asm/bitops: Use __builtin_clz{l|ll} to evaluate constant expressions
    • Merge tag 'x86-boot-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2b95bb05
      Linus Torvalds authored
      Pull x86 boot updates from Ingo Molnar:
      
       - Rework PE header generation, primarily to generate a modern, 4k
         aligned kernel image view with narrower W^X permissions.
      
       - Further refine init-lifetime annotations
      
       - Misc cleanups & fixes
      
      * tag 'x86-boot-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
        x86/boot: efistub: Assign global boot_params variable
        x86/boot: Rename conflicting 'boot_params' pointer to 'boot_params_ptr'
        x86/head/64: Move the __head definition to <asm/init.h>
        x86/head/64: Add missing __head annotation to startup_64_load_idt()
        x86/head/64: Mark 'startup_gdt[]' and 'startup_gdt_descr' as __initdata
        x86/boot: Harmonize the style of array-type parameter for fixup_pointer() calls
        x86/boot: Fix incorrect startup_gdt_descr.size
        x86/boot: Compile boot code with -std=gnu11 too
        x86/boot: Increase section and file alignment to 4k/512
        x86/boot: Split off PE/COFF .data section
        x86/boot: Drop PE/COFF .reloc section
        x86/boot: Construct PE/COFF .text section from assembler
        x86/boot: Derive file size from _edata symbol
        x86/boot: Define setup size in linker script
        x86/boot: Set EFI handover offset directly in header asm
        x86/boot: Grab kernel_info offset from zoffset header directly
        x86/boot: Drop references to startup_64
        x86/boot: Drop redundant code setting the root device
        x86/boot: Omit compression buffer from PE/COFF image memory footprint
        x86/boot: Remove the 'bugger off' message
        ...
    • Merge tag 'x86-headers-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3b8b4b4f
      Linus Torvalds authored
      Pull x86 header file cleanup from Ingo Molnar:
       "Replace <asm/export.h> uses with <linux/export.h> and then remove
        <asm/export.h>"
      
      * tag 'x86-headers-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/headers: Remove <asm/export.h>
        x86/headers: Replace #include <asm/export.h> with #include <linux/export.h>
        x86/headers: Remove unnecessary #include <asm/export.h>
  2. 30 Oct, 2023 21 commits
    • Merge tag 'perf-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · bceb7acc
      Linus Torvalds authored
      Pull performance event updates from Ingo Molnar:
       - Add AMD Unified Memory Controller (UMC) events introduced with Zen 4
       - Simplify & clean up the uncore management code
       - Fall back from RDPMC to RDMSR on certain uncore PMUs
       - Improve per-package and cstate event reading
       - Extend the Intel ref-cycles event to GP counters
       - Fix Intel MTL event constraints
       - Improve the Intel hybrid CPU handling code
       - Micro-optimize the RAPL code
       - Optimize perf_cgroup_switch()
       - Improve large AUX area error handling
       - Misc fixes and cleanups
      
      * tag 'perf-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
        perf/x86/amd/uncore: Pass through error code for initialization failures, instead of -ENODEV
        perf/x86/amd/uncore: Fix uninitialized return value in amd_uncore_init()
        x86/cpu: Fix the AMD Fam 17h, Fam 19h, Zen2 and Zen4 MSR enumerations
        perf: Optimize perf_cgroup_switch()
        perf/x86/amd/uncore: Add memory controller support
        perf/x86/amd/uncore: Add group exclusivity
        perf/x86/amd/uncore: Use rdmsr if rdpmc is unavailable
        perf/x86/amd/uncore: Move discovery and registration
        perf/x86/amd/uncore: Refactor uncore management
        perf/core: Allow reading package events from perf_event_read_local
        perf/x86/cstate: Allow reading the package statistics from local CPU
        perf/x86/intel/pt: Fix kernel-doc comments
        perf/x86/rapl: Annotate 'struct rapl_pmus' with __counted_by
        perf/core: Rename perf_proc_update_handler() -> perf_event_max_sample_rate_handler(), for readability
        perf/x86/rapl: Fix "Using plain integer as NULL pointer" Sparse warning
        perf/x86/rapl: Use local64_try_cmpxchg in rapl_event_update()
        perf/x86/rapl: Stop doing cpu_relax() in the local64_cmpxchg() loop in rapl_event_update()
        perf/core: Bail out early if the request AUX area is out of bound
        perf/x86/intel: Extend the ref-cycles event to GP counters
        perf/x86/intel: Fix broken fixed event constraints extension
        ...
      bceb7acc
    • Merge tag 'objtool-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · cd063c8b
      Linus Torvalds authored
      Pull objtool updates from Ingo Molnar:
       "Misc fixes and cleanups:
      
         - Fix potential MAX_NAME_LEN limit related build failures
      
         - Fix scripts/faddr2line symbol filtering bug
      
         - Fix scripts/faddr2line on LLVM=1
      
         - Fix scripts/faddr2line to accept readelf output with mapping
           symbols
      
         - Minor cleanups"
      
      * tag 'objtool-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        scripts/faddr2line: Skip over mapping symbols in output from readelf
        scripts/faddr2line: Use LLVM addr2line and readelf if LLVM=1
        scripts/faddr2line: Don't filter out non-function symbols from readelf
        objtool: Remove max symbol name length limitation
        objtool: Propagate early errors
        objtool: Use 'the fallthrough' pseudo-keyword
        x86/speculation, objtool: Use absolute relocations for annotations
        x86/unwind/orc: Remove redundant initialization of 'mid' pointer in __orc_find()
      cd063c8b
    • Merge tag 'sched-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 63ce50ff
      Linus Torvalds authored
      Pull scheduler updates from Ingo Molnar:
       "Fair scheduler (SCHED_OTHER) improvements:
         - Remove the old and now unused SIS_PROP code & option
         - Scan cluster before LLC in the wake-up path
         - Use candidate prev/recent_used CPU if scanning failed for cluster
           wakeup
      
        NUMA scheduling improvements:
         - Improve the VMA access-PID code to better skip/scan VMAs
         - Extend tracing to cover VMA-skipping decisions
         - Improve/fix the recently introduced sched_numa_find_nth_cpu() code
         - Generalize numa_map_to_online_node()
      
        Energy scheduling improvements:
         - Remove the EM_MAX_COMPLEXITY limit
         - Add tracepoints to track energy computation
         - Make the behavior of the 'sched_energy_aware' sysctl more
           consistent
         - Consolidate and clean up access to a CPU's max compute capacity
         - Fix uclamp code corner cases
      
        RT scheduling improvements:
         - Drive dl_rq->overloaded with dl_rq->pushable_dl_tasks updates
         - Drive the ->rto_mask with rt_rq->pushable_tasks updates
      
        Scheduler scalability improvements:
         - Rate-limit updates to tg->load_avg
         - On x86 disable IBRS when CPU is offline to improve single-threaded
           performance
         - Micro-optimize in_task() and in_interrupt()
         - Micro-optimize the PSI code
         - Avoid updating PSI triggers and ->rtpoll_total when there are no
           state changes
      
        Core scheduler infrastructure improvements:
         - Use saved_state to reduce some spurious freezer wakeups
         - Bring in a handful of fast-headers improvements to scheduler
           headers
         - Make the scheduler UAPI headers more widely usable by user-space
         - Simplify the control flow of scheduler syscalls by using lock
           guards
         - Fix sched_setaffinity() vs. CPU hotplug race
      
        Scheduler debuggability improvements:
         - Disallow writing invalid values to sched_rt_period_us
         - Fix a race in the rq-clock debugging code triggering warnings
         - Fix a warning in the bandwidth distribution code
         - Micro-optimize in_atomic_preempt_off() checks
         - Enforce that the tasklist_lock is held in for_each_thread()
         - Print the TGID in sched_show_task()
         - Remove the /proc/sys/kernel/sched_child_runs_first sysctl
      
        ... and misc cleanups & fixes"
      
      * tag 'sched-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits)
        sched/fair: Remove SIS_PROP
        sched/fair: Use candidate prev/recent_used CPU if scanning failed for cluster wakeup
        sched/fair: Scan cluster before scanning LLC in wake-up path
        sched: Add cpus_share_resources API
        sched/core: Fix RQCF_ACT_SKIP leak
        sched/fair: Remove unused 'curr' argument from pick_next_entity()
        sched/nohz: Update comments about NEWILB_KICK
        sched/fair: Remove duplicate #include
        sched/psi: Update poll => rtpoll in relevant comments
        sched: Make PELT acronym definition searchable
        sched: Fix stop_one_cpu_nowait() vs hotplug
        sched/psi: Bail out early from irq time accounting
        sched/topology: Rename 'DIE' domain to 'PKG'
        sched/psi: Delete the 'update_total' function parameter from update_triggers()
        sched/psi: Avoid updating PSI triggers and ->rtpoll_total when there are no state changes
        sched/headers: Remove comment referring to rq::cpu_load, since this has been removed
        sched/numa: Complete scanning of inactive VMAs when there is no alternative
        sched/numa: Complete scanning of partial VMAs regardless of PID activity
        sched/numa: Move up the access pid reset logic
        sched/numa: Trace decisions related to skipping VMAs
        ...
      63ce50ff
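      The "lock guards" item above refers to scope-based locking, where a
      lock is dropped automatically when a guard variable goes out of scope,
      so early returns need no explicit unlock. A minimal user-space sketch
      of that idea, assuming a GCC or Clang compiler (it relies on the
      `cleanup` variable attribute). `toy_lock`, `TOY_GUARD` and
      `counter_add_bounded` are hypothetical names for illustration, not the
      kernel's API:

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy spinlock built on a C11 atomic flag (illustration only). */
typedef struct {
    atomic_flag held;
} toy_lock;

static void toy_lock_init(toy_lock *l)
{
    atomic_flag_clear(&l->held);
}

static void toy_lock_acquire(toy_lock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void toy_lock_release(toy_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Invoked by the compiler when the guard variable leaves its scope. */
static void toy_guard_exit(toy_lock **lp)
{
    toy_lock_release(*lp);
}

/* Hypothetical guard macro: take the lock now, release it at scope exit. */
#define TOY_GUARD(l) \
    toy_lock *toy_guard_var __attribute__((cleanup(toy_guard_exit), unused)) = \
        (toy_lock_acquire(l), (l))

struct guarded_counter {
    toy_lock lock;
    int value;
};

static void guarded_counter_init(struct guarded_counter *c)
{
    toy_lock_init(&c->lock);
    c->value = 0;
}

/* Adds delta unless the result would exceed limit; returns 0 or -1. */
static int counter_add_bounded(struct guarded_counter *c, int limit, int delta)
{
    TOY_GUARD(&c->lock);

    assert(delta >= 0);
    if (c->value + delta > limit)
        return -1;      /* early return: the guard still releases the lock */

    c->value += delta;
    return 0;           /* normal return: same automatic release */
}
```

      Both return paths of `counter_add_bounded()` leave the lock released,
      which is the control-flow simplification such guards aim for.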
    • Merge tag 'locking-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3cf3fabc
      Linus Torvalds authored
      Pull locking updates from Ingo Molnar:
       "Futex improvements:
      
         - Add the 'futex2' syscall ABI, which is an attempt to get away from
           the multiplex syscall and adds a little room for extensions, while
           lifting some limitations.
      
         - Fix futex PI recursive rt_mutex waiter state bug
      
         - Fix inter-process shared futexes on no-MMU systems
      
         - Use folios instead of pages
      
        Micro-optimizations of locking primitives:
      
         - Improve arch_spin_value_unlocked() on asm-generic ticket spinlock
           architectures, to improve lockref code generation
      
         - Improve the x86-32 lockref_get_not_zero() main loop by adding
           build-time CMPXCHG8B support detection for the relevant lockref
           code, and by better interfacing the CMPXCHG8B assembly code with
           the compiler
      
         - Introduce arch_sync_try_cmpxchg() on x86 to improve
           sync_try_cmpxchg() code generation. Convert some sync_cmpxchg()
           users to sync_try_cmpxchg().
      
         - Micro-optimize rcuref_put_slowpath()
      
        Locking debuggability improvements:
      
         - Improve CONFIG_DEBUG_RT_MUTEXES=y to have a fast-path as well
      
         - Enforce atomicity of sched_submit_work(), which is de-facto atomic
           but was un-enforced previously.
      
         - Extend <linux/cleanup.h>'s no_free_ptr() with __must_check
           semantics
      
         - Fix ww_mutex self-tests
      
         - Clean up const-propagation in <linux/seqlock.h> and simplify the
           API-instantiation macros a bit
      
        RT locking improvements:
      
         - Provide the rt_mutex_*_schedule() primitives/helpers and use them
           in the rtmutex code to avoid recursion vs. rtlock on the PI state.
      
         - Add nested blocking lockdep asserts to rt_mutex_lock(),
           rtlock_lock() and rwbase_read_lock()
      
        ... plus misc fixes & cleanups"
      
      * tag 'locking-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
        futex: Don't include process MM in futex key on no-MMU
        locking/seqlock: Fix grammar in comment
        alpha: Fix up new futex syscall numbers
        locking/seqlock: Propagate 'const' pointers within read-only methods, remove forced type casts
        locking/lockdep: Fix string sizing bug that triggers a format-truncation compiler-warning
        locking/seqlock: Change __seqprop() to return the function pointer
        locking/seqlock: Simplify SEQCOUNT_LOCKNAME()
        locking/atomics: Use atomic_try_cmpxchg_release() to micro-optimize rcuref_put_slowpath()
        locking/atomic, xen: Use sync_try_cmpxchg() instead of sync_cmpxchg()
        locking/atomic/x86: Introduce arch_sync_try_cmpxchg()
        locking/atomic: Add generic support for sync_try_cmpxchg() and its fallback
        locking/seqlock: Fix typo in comment
        futex/requeue: Remove unnecessary ‘NULL’ initialization from futex_proxy_trylock_atomic()
        locking/local, arch: Rewrite local_add_unless() as a static inline function
        locking/debug: Fix debugfs API return value checks to use IS_ERR()
        locking/ww_mutex/test: Make sure we bail out instead of livelock
        locking/ww_mutex/test: Fix potential workqueue corruption
        locking/ww_mutex/test: Use prng instead of rng to avoid hangs at bootup
        futex: Add sys_futex_requeue()
        futex: Add flags2 argument to futex_requeue()
        ...
      3cf3fabc
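      The sync_cmpxchg() to sync_try_cmpxchg() conversions above follow a
      general pattern: a "try" style compare-and-exchange returns a success
      flag and writes the observed value back through a pointer, so a retry
      loop needs no separate comparison or reload. A rough user-space sketch
      using C11 atomics rather than the kernel primitives; every function
      name below is hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Classic style: returns the value that was found in *p. */
static int demo_cmpxchg(atomic_int *p, int old, int newval)
{
    /* On failure, C11 stores the observed value into 'old'. */
    atomic_compare_exchange_strong(p, &old, newval);
    return old;
}

/* "try" style: returns success and updates *old with the observed value. */
static bool demo_try_cmpxchg(atomic_int *p, int *old, int newval)
{
    return atomic_compare_exchange_strong(p, old, newval);
}

static int clamp_add(int v, int delta, int max)
{
    return (v + delta > max) ? max : v + delta;
}

/* Retry loop with the classic primitive: the caller compares by hand. */
static int add_clamped_classic(atomic_int *p, int delta, int max)
{
    int old = atomic_load(p);

    assert(delta >= 0);
    for (;;) {
        int newval = clamp_add(old, delta, max);
        int seen = demo_cmpxchg(p, old, newval);

        if (seen == old)
            return newval;
        old = seen;     /* lost a race: retry with the value we saw */
    }
}

/* Same loop with the "try" primitive: comparison and reload are implicit. */
static int add_clamped_try(atomic_int *p, int delta, int max)
{
    int old = atomic_load(p);
    int newval;

    assert(delta >= 0);
    do {
        newval = clamp_add(old, delta, max);
    } while (!demo_try_cmpxchg(p, &old, newval));

    return newval;
}
```

      The second loop is shorter and hands the compiler the success flag
      directly, which is the kind of code-generation benefit the pull
      request describes.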
    • Merge tag 'x86_fpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9cda4eb0
      Linus Torvalds authored
      Pull x86 fpu fixlet from Borislav Petkov:
      
       - kernel-doc fix
      
      * tag 'x86_fpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/fpu/xstate: Address kernel-doc warning
      9cda4eb0
    • Merge tag 'x86_platform_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f155f3b3
      Linus Torvalds authored
      Pull x86 platform updates from Borislav Petkov:
      
       - Make sure PCI function 4 IDs of AMD family 0x19, models 0x60-0x7f are
         actually used in the amd_nb.c enumeration
      
       - Add support for extracting NUMA information from devicetree for
         Hyper-V usages
      
       - Add PCI device IDs for the new AMD MI300 AI accelerators
      
       - Annotate an array in struct uv_rtc_timer_head with the new
         __counted_by attribute
      
       - Rework UV's NMI action parameter handling
      
      * tag 'x86_platform_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/amd_nb: Use Family 19h Models 60h-7Fh Function 4 IDs
        x86/numa: Add Devicetree support
        x86/of: Move the x86_flattree_get_config() call out of x86_dtb_init()
        x86/amd_nb: Add AMD Family MI300 PCI IDs
        x86/platform/uv: Annotate struct uv_rtc_timer_head with __counted_by
        x86/platform/uv: Rework NMI "action" modparam handling
      f155f3b3
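      The __counted_by annotation mentioned above ties a flexible array
      member to the struct field that holds its element count, so that
      bounds checkers can use it. A minimal user-space sketch of the
      pattern, assuming a compiler that may or may not support the
      `counted_by` attribute (the macro expands to nothing when it does
      not). `struct demo_timer_head`, `DEMO_COUNTED_BY` and the helper
      functions are hypothetical and do not reproduce the real
      uv_rtc_timer_head layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Use the attribute only where the compiler knows it. */
#ifdef __has_attribute
# if __has_attribute(counted_by)
#  define DEMO_COUNTED_BY(member) __attribute__((counted_by(member)))
# endif
#endif
#ifndef DEMO_COUNTED_BY
# define DEMO_COUNTED_BY(member) /* unsupported: expands to nothing */
#endif

/* Hypothetical structure: 'ncpus' is the element count of 'expires'. */
struct demo_timer_head {
    size_t ncpus;
    unsigned long expires[] DEMO_COUNTED_BY(ncpus);
};

/* Allocates a head with room for ncpus entries (no overflow check here). */
static struct demo_timer_head *demo_head_alloc(size_t ncpus)
{
    struct demo_timer_head *head;

    head = calloc(1, sizeof(*head) + ncpus * sizeof(head->expires[0]));
    if (!head)
        return NULL;

    head->ncpus = ncpus;    /* set the count before touching the array */
    return head;
}

/* Explicit bounds check mirroring what the annotation lets tools verify. */
static int demo_head_set(struct demo_timer_head *head, size_t cpu,
                         unsigned long when)
{
    assert(head != NULL);
    if (cpu >= head->ncpus)
        return -1;

    head->expires[cpu] = when;
    return 0;
}
```

      Setting the count field before the first array access matters, since
      a checker that trusts the annotation reads the count at access time.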
    • Merge tag 'x86_cpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ca2e9c3b
      Linus Torvalds authored
      Pull x86 cpuid updates from Borislav Petkov:
      
       - Make sure the "svm" feature flag is cleared from /proc/cpuinfo when
         virtualization support is disabled in the BIOS on AMD and Hygon
         platforms
      
       - A minor cleanup
      
      * tag 'x86_cpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/cpu/amd: Remove redundant 'break' statement
        x86/cpu: Clear SVM feature if disabled by BIOS
      ca2e9c3b
    • Merge tag 'x86_cache_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9ab021a1
      Linus Torvalds authored
      Pull x86 resource control updates from Borislav Petkov:
      
       - Add support for non-contiguous capacity bitmasks being added to
         Intel's CAT implementation
      
       - Other improvements to resctrl code: better configuration,
         simplifications, debugging support, fixes
      
      * tag 'x86_cache_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/resctrl: Display RMID of resource group
        x86/resctrl: Add support for the files of MON groups only
        x86/resctrl: Display CLOSID for resource group
        x86/resctrl: Introduce "-o debug" mount option
        x86/resctrl: Move default group file creation to mount
        x86/resctrl: Unwind properly from rdt_enable_ctx()
        x86/resctrl: Rename rftype flags for consistency
        x86/resctrl: Simplify rftype flag definitions
        x86/resctrl: Add multiple tasks to the resctrl group at once
        Documentation/x86: Document resctrl's new sparse_masks
        x86/resctrl: Add sparse_masks file in info
        x86/resctrl: Enable non-contiguous CBMs in Intel CAT
        x86/resctrl: Rename arch_has_sparse_bitmaps
        x86/resctrl: Fix remaining kernel-doc warnings
      9ab021a1
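      To illustrate what "non-contiguous capacity bitmasks" means, here is a
      small sketch of a validity check that either insists on a single
      contiguous run of set bits or accepts sparse masks. This is not the
      resctrl code. The function names, the 32-bit mask width and the rule
      that an empty mask is rejected are assumptions made for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the set bits of cbm form exactly one contiguous run. */
static bool cbm_is_contiguous(uint32_t cbm)
{
    uint32_t lowest;

    if (cbm == 0)
        return false;

    /* Isolate the lowest set bit. */
    lowest = cbm & (uint32_t)(0u - cbm);

    /*
     * Adding the lowest set bit carries through the lowest run of ones.
     * If any original bit survives the AND, there was a second run.
     */
    return (cbm & (uint32_t)(cbm + lowest)) == 0;
}

/*
 * Hypothetical validity check: cbm_len is the number of usable mask bits,
 * sparse_ok says whether non-contiguous masks are acceptable.
 */
static bool cbm_is_valid(uint32_t cbm, unsigned int cbm_len, bool sparse_ok)
{
    uint32_t usable;

    assert(cbm_len >= 1 && cbm_len <= 32);
    usable = (cbm_len == 32) ? UINT32_MAX : ((UINT32_C(1) << cbm_len) - 1);

    if (cbm == 0 || (cbm & ~usable) != 0)
        return false;   /* empty, or bits outside the usable range */

    return sparse_ok || cbm_is_contiguous(cbm);
}
```

      With `sparse_ok` false, a mask such as 0x505 is rejected while 0x0f0
      passes; with `sparse_ok` true, both are accepted as long as they fit
      within `cbm_len` bits.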
    • Merge tag 'x86_bugs_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f84a52ee
      Linus Torvalds authored
      Pull x86 hw mitigation updates from Borislav Petkov:
      
       - A bunch of improvements, cleanups and fixlets to the SRSO mitigation
         machinery and other, general cleanups to the hw mitigations code, by
         Josh Poimboeuf
      
       - Improve the return thunk detection by objtool as it is absolutely
         important that the default return thunk is not used after returns
         have been patched. Future work to detect and report this better is
         pending
      
       - Other misc cleanups and fixes
      
      * tag 'x86_bugs_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
        x86/retpoline: Document some thunk handling aspects
        x86/retpoline: Make sure there are no unconverted return thunks due to KCSAN
        x86/callthunks: Delete unused "struct thunk_desc"
        x86/vdso: Run objtool on vdso32-setup.o
        objtool: Fix return thunk patching in retpolines
        x86/srso: Remove unnecessary semicolon
        x86/pti: Fix kernel warnings for pti= and nopti cmdline options
        x86/calldepth: Rename __x86_return_skl() to call_depth_return_thunk()
        x86/nospec: Refactor UNTRAIN_RET[_*]
        x86/rethunk: Use SYM_CODE_START[_LOCAL]_NOALIGN macros
        x86/srso: Disentangle rethunk-dependent options
        x86/srso: Move retbleed IBPB check into existing 'has_microcode' code block
        x86/bugs: Remove default case for fully switched enums
        x86/srso: Remove 'pred_cmd' label
        x86/srso: Unexport untraining functions
        x86/srso: Improve i-cache locality for alias mitigation
        x86/srso: Fix unret validation dependencies
        x86/srso: Fix vulnerability reporting for missing microcode
        x86/srso: Print mitigation for retbleed IBPB case
        x86/srso: Print actual mitigation if requested mitigation isn't possible
        ...
      f84a52ee
    • Merge tag 'ras_core_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 01ae815c
      Linus Torvalds authored
      Pull x86 RAS updates from Borislav Petkov:
      
       - Specify what error addresses reported on AMD are actually usable
         memory error addresses for further decoding
      
      * tag 'ras_core_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mce: Cleanup mce_usable_address()
        x86/mce: Define amd_mce_usable_address()
        x86/MCE/AMD: Split amd_mce_is_memory_error()
      01ae815c
    • Merge tag 'edac_updates_for_v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · 66cc8838
      Linus Torvalds authored
      Pull EDAC updates from Borislav Petkov:
      
       - A new EDAC driver for Xilinx's Versal integrated memory controller
      
      * tag 'edac_updates_for_v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
        EDAC/versal: Add a Xilinx Versal memory controller driver
        dt-bindings: memory-controllers: Add support for Xilinx Versal EDAC for DDRMC
      66cc8838
    • Merge tag 'bcachefs-2023-10-30' of https://evilpiepirate.org/git/bcachefs · 9e877052
      Linus Torvalds authored
      Pull initial bcachefs updates from Kent Overstreet:
       "Here's the bcachefs filesystem pull request.
      
        One new patch since last week: the exportfs constants ended up
        conflicting with other filesystems that are also getting added to the
        global enum, so switched to new constants picked by Amir.
      
        The only new non fs/bcachefs/ patch is the objtool patch that adds
        bcachefs functions to the list of noreturns. The patch that exports
        osq_lock() has been dropped for now, per Ingo"
      
      * tag 'bcachefs-2023-10-30' of https://evilpiepirate.org/git/bcachefs: (2781 commits)
        exportfs: Change bcachefs fid_type enum to avoid conflicts
        bcachefs: Refactor memcpy into direct assignment
        bcachefs: Fix drop_alloc_keys()
        bcachefs: snapshot_create_lock
        bcachefs: Fix snapshot skiplists during snapshot deletion
        bcachefs: bch2_sb_field_get() refactoring
        bcachefs: KEY_TYPE_error now counts towards i_sectors
        bcachefs: Fix handling of unknown bkey types
        bcachefs: Switch to unsafe_memcpy() in a few places
        bcachefs: Use struct_size()
        bcachefs: Correctly initialize new buckets on device resize
        bcachefs: Fix another smatch complaint
        bcachefs: Use strsep() in split_devs()
        bcachefs: Add iops fields to bch_member
        bcachefs: Rename bch_sb_field_members -> bch_sb_field_members_v1
        bcachefs: New superblock section members_v2
        bcachefs: Add new helper to retrieve bch_member from sb
        bcachefs: bucket_lock() is now a sleepable lock
        bcachefs: fix crc32c checksum merge byte order problem
        bcachefs: Fix bch2_inode_delete_keys()
        ...
      9e877052
    • Merge tag 'for-6.7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · d5acbc60
      Linus Torvalds authored
      Pull btrfs updates from David Sterba:
       "New features:
      
         - raid-stripe-tree
      
           New tree for logical file extent mapping where the physical mapping
           may not match on multiple devices. This is now used in zoned mode
           to implement RAID0/RAID1* profiles, but can be used in non-zoned
           mode as well. The support for RAID56 is in development and will
           eventually fix the problems with the current implementation. This
           is a backward incompatible feature and has to be enabled at mkfs
           time.
      
         - simple quota accounting (squota)
      
           A simplified mode of qgroup that accounts all space on the initial
           extent owners (a subvolume), so that snapshots are then cheap to create
           and delete. The deletion of snapshots in fully accounting qgroups
           is a known CPU/IO performance bottleneck.
      
           The squota is not suitable for the general use case but works well
           for containers where the original subvolume exists for the whole
           time. This is a backward incompatible feature as it needs extending
           some structures, but can be enabled on an existing filesystem.
      
         - temporary filesystem fsid (temp_fsid)
      
           The fsid identifies a filesystem and is hard coded in the
           structures, which disallows mounting the same fsid found on
           different devices.
      
           For a single device filesystem this is not strictly necessary: a
           new temporary fsid can be generated on mount, e.g. after a device is
           cloned. This will be used by Steam Deck for root partition A/B
           testing, or can be used for VM root images.
      
        Other user visible changes:
      
         - filesystems with partially finished metadata_uuid conversion cannot
           be mounted anymore and the uuid fixup has to be done by btrfs-progs
           (btrfstune).
      
        Performance improvements:
      
         - reduce reservations for checksum deletions (by a factor of 4 when
           the free space tree is enabled); on a sample workload on a file
           with many extents, the deletion time decreased by 12%
      
         - make extent state merges more efficient during insertions, reduce
           rb-tree iterations (run time of critical functions reduced by 5%)
      
        Core changes:
      
         - the integrity check functionality has been removed, this was a
           debugging feature and removal does not affect other integrity
           checks like checksums or tree-checker
      
         - space reservation changes:
      
            - more efficient delayed ref reservations, this avoids building up
              too much work or overusing or exhausting the global block
              reserve in some situations
      
            - move delayed refs reservation to the transaction start time,
              this prevents some ENOSPC corner cases related to exhaustion of
              global reserve
      
            - improvements in reducing excessive reservations for block group
              items
      
            - adjust overcommit logic in near full situations, account for one
              more chunk to eventually allocate metadata chunk, this is mostly
              relevant for small filesystems (<10GiB)
      
         - single device filesystems are scanned but not registered (except
           seed devices), this allows temp_fsid to work
      
         - qgroup iterations do not need GFP_ATOMIC allocations anymore
      
         - cleanups, refactoring, reduced data structure size, function
           parameter simplifications, error handling fixes"
      
      * tag 'for-6.7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (156 commits)
        btrfs: open code timespec64 in struct btrfs_inode
        btrfs: remove redundant log root tree index assignment during log sync
        btrfs: remove redundant initialization of variable dirty in btrfs_update_time()
        btrfs: sysfs: show temp_fsid feature
        btrfs: disable the device add feature for temp-fsid
        btrfs: disable the seed feature for temp-fsid
        btrfs: update comment for temp-fsid, fsid, and metadata_uuid
        btrfs: remove pointless empty log context list check when syncing log
        btrfs: update comment for struct btrfs_inode::lock
        btrfs: remove pointless barrier from btrfs_sync_file()
        btrfs: add and use helpers for reading and writing last_trans_committed
        btrfs: add and use helpers for reading and writing fs_info->generation
        btrfs: add and use helpers for reading and writing log_transid
        btrfs: add and use helpers for reading and writing last_log_commit
        btrfs: support cloned-device mount capability
        btrfs: add helper function find_fsid_by_disk
        btrfs: stop reserving excessive space for block group item insertions
        btrfs: stop reserving excessive space for block group item updates
        btrfs: reorder btrfs_inode to fill gaps
        btrfs: open code btrfs_ordered_inode_tree in btrfs_inode
        ...
      d5acbc60
    • Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux · 8829687a
      Linus Torvalds authored
      Pull fscrypt updates from Eric Biggers:
       "This update adds support for configuring the crypto data unit size
        (i.e. the granularity of file contents encryption) to be less than the
        filesystem block size. This can allow users to use inline encryption
        hardware in some cases when it wouldn't otherwise be possible.
      
        In addition, there are two commits that are prerequisites for the
        extent-based encryption support that the btrfs folks are working on"
      
      * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
        fscrypt: track master key presence separately from secret
        fscrypt: rename fscrypt_info => fscrypt_inode_info
        fscrypt: support crypto data unit size less than filesystem block size
        fscrypt: replace get_ino_and_lblk_bits with just has_32bit_inodes
        fscrypt: compute max_lblk_bits from s_maxbytes and block size
        fscrypt: make the bounce page pool opt-in instead of opt-out
        fscrypt: make it clearer that key_prefix is deprecated
      8829687a
    • Merge tag 'nfsd-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · 8b16da68
      Linus Torvalds authored
      Pull nfsd updates from Chuck Lever:
       "This release completes the SunRPC thread scheduler work that was begun
        in v6.6. The scheduler can now find an svc thread to wake in constant
        time and without a list walk. Thanks again to Neil Brown for this
        overhaul.
      
        Lorenzo Bianconi contributed infrastructure for a netlink-based NFSD
        control plane. The long-term plan is to provide the same functionality
        as found in /proc/fs/nfsd, plus some interesting additions, and then
        migrate the NFSD user space utilities to netlink.
      
        A long series to overhaul NFSD's NFSv4 operation encoding was applied
        in this release. The goals are to bring this family of encoding
        functions in line with the matching NFSv4 decoding functions and with
        the NFSv2 and NFSv3 XDR functions, preparing the way for better memory
        safety and maintainability.
      
        A further improvement to NFSD's write delegation support was
        contributed by Dai Ngo. This adds a CB_GETATTR callback, enabling the
        server to retrieve cached size and mtime data from clients holding
        write delegations. If the server can retrieve this information, it
        does not have to recall the delegation in some cases.
      
        The usual panoply of bug fixes and minor improvements round out this
        release. As always I am grateful to all contributors, reviewers, and
        testers"
      
      * tag 'nfsd-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (127 commits)
        svcrdma: Fix tracepoint printk format
        svcrdma: Drop connection after an RDMA Read error
        NFSD: clean up alloc_init_deleg()
        NFSD: Fix frame size warning in svc_export_parse()
        NFSD: Rewrite synopsis of nfsd_percpu_counters_init()
        nfsd: Clean up errors in nfs3proc.c
        nfsd: Clean up errors in nfs4state.c
        NFSD: Clean up errors in stats.c
        NFSD: simplify error paths in nfsd_svc()
        NFSD: Clean up nfsd4_encode_seek()
        NFSD: Clean up nfsd4_encode_offset_status()
        NFSD: Clean up nfsd4_encode_copy_notify()
        NFSD: Clean up nfsd4_encode_copy()
        NFSD: Clean up nfsd4_encode_test_stateid()
        NFSD: Clean up nfsd4_encode_exchange_id()
        NFSD: Clean up nfsd4_do_encode_secinfo()
        NFSD: Clean up nfsd4_encode_access()
        NFSD: Clean up nfsd4_encode_readdir()
        NFSD: Clean up nfsd4_encode_entry4()
        NFSD: Add an nfsd4_encode_nfs_cookie4() helper
        ...
      8b16da68
    • Merge tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs · 14ab6d42
      Linus Torvalds authored
      Pull vfs inode time accessor updates from Christian Brauner:
       "This finishes the conversion of all inode time fields to accessor
        functions as discussed on list. Changing timestamps manually as we
        used to do before is error prone. Using accessor functions makes this
        robust.
      
        It does not contain the switch of the time fields to discrete 64 bit
        integers to replace struct timespec and free up space in struct inode.
        But after this, the switch can be trivially made and the patch should
        only affect the vfs if we decide to do it"
      
      * tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (86 commits)
        fs: rename inode i_atime and i_mtime fields
        security: convert to new timestamp accessors
        selinux: convert to new timestamp accessors
        apparmor: convert to new timestamp accessors
        sunrpc: convert to new timestamp accessors
        mm: convert to new timestamp accessors
        bpf: convert to new timestamp accessors
        ipc: convert to new timestamp accessors
        linux: convert to new timestamp accessors
        zonefs: convert to new timestamp accessors
        xfs: convert to new timestamp accessors
        vboxsf: convert to new timestamp accessors
        ufs: convert to new timestamp accessors
        udf: convert to new timestamp accessors
        ubifs: convert to new timestamp accessors
        tracefs: convert to new timestamp accessors
        sysv: convert to new timestamp accessors
        squashfs: convert to new timestamp accessors
        server: convert to new timestamp accessors
        client: convert to new timestamp accessors
        ...
      14ab6d42
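      The point of the accessor conversion described above is that callers
      stop touching the timestamp fields directly, so the storage format can
      change later without editing every filesystem. A simplified
      user-space sketch of that pattern follows. `struct demo_inode`, its
      field names and the `demo_inode_*` helpers are hypothetical stand-ins,
      not the kernel definitions:

```c
#include <assert.h>
#include <time.h>

/* Hypothetical inode: callers are expected not to touch priv_* directly. */
struct demo_inode {
    struct timespec priv_atime;
    struct timespec priv_mtime;
};

/* Read accessor: the only place that knows how mtime is stored. */
static inline struct timespec demo_inode_get_mtime(const struct demo_inode *inode)
{
    return inode->priv_mtime;
}

/* Write accessor taking a full timespec; returns the value that was set. */
static inline struct timespec
demo_inode_set_mtime_to_ts(struct demo_inode *inode, struct timespec ts)
{
    assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
    inode->priv_mtime = ts;
    return ts;
}

/* Convenience wrapper for callers that hold seconds and nanoseconds. */
static inline struct timespec
demo_inode_set_mtime(struct demo_inode *inode, time_t sec, long nsec)
{
    struct timespec ts = { .tv_sec = sec, .tv_nsec = nsec };

    return demo_inode_set_mtime_to_ts(inode, ts);
}
```

      If the fields were later replaced by discrete integers, only the three
      helpers would need to change, which matches the motivation given in
      the pull request.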
    • Merge tag 'vfs-6.7.xattr' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs · 7352a676
      Linus Torvalds authored
      Pull vfs xattr updates from Christian Brauner:
       "The 's_xattr' field of 'struct super_block' currently requires a
        mutable table of 'struct xattr_handler' entries (although each handler
        itself is const). However, no code in vfs actually modifies the
        tables.
      
        This changes the type of 's_xattr' to allow const tables, and modifies
        existing file systems to move their tables to .rodata. This is
        desirable because these tables contain entries with function pointers
        in them; moving them to .rodata makes it considerably less likely to
        be modified accidentally or maliciously at runtime"
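
      A rough userspace sketch of the idea, not the kernel code: the
      struct layouts, handler names, and find_handler() helper below are
      all hypothetical stand-ins. It shows a NULL-terminated handler
      table that is const at both levels, so a typical toolchain is free
      to place it in a read-only section, and a lookup that only ever
      reads it.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct xattr_handler. */
struct xattr_handler {
    const char *prefix;
    int (*get)(const char *name, char *buf, size_t size);
};

/* Hypothetical handlers; they only report which one was called. */
static int user_get(const char *name, char *buf, size_t size)
{
    (void)name; (void)buf; (void)size;
    return 1;
}

static int trusted_get(const char *name, char *buf, size_t size)
{
    (void)name; (void)buf; (void)size;
    return 2;
}

static const struct xattr_handler user_handler = {
    .prefix = "user.",
    .get = user_get,
};

static const struct xattr_handler trusted_handler = {
    .prefix = "trusted.",
    .get = trusted_get,
};

/*
 * Both the handlers and the array of pointers are const, so nothing
 * in this table needs to be writable at runtime.
 */
static const struct xattr_handler * const demo_xattr_handlers[] = {
    &user_handler,
    &trusted_handler,
    NULL,
};

/* Stand-in for a superblock whose s_xattr accepts a const table. */
struct super_block_model {
    const struct xattr_handler * const *s_xattr;
};

/* Walk the table read-only and return the handler matching the prefix. */
static const struct xattr_handler *
find_handler(const struct super_block_model *sb, const char *name)
{
    const struct xattr_handler * const *h;

    for (h = sb->s_xattr; *h; h++) {
        if (strncmp(name, (*h)->prefix, strlen((*h)->prefix)) == 0)
            return *h;
    }
    return NULL;
}
```

      With the older, non-const field type, assigning demo_xattr_handlers
      to s_xattr would have discarded a qualifier, which is why the field
      type had to change before the tables could move.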
      
      * tag 'vfs-6.7.xattr' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (30 commits)
        const_structs.checkpatch: add xattr_handler
        net: move sockfs_xattr_handlers to .rodata
        shmem: move shmem_xattr_handlers to .rodata
        overlayfs: move xattr tables to .rodata
        xfs: move xfs_xattr_handlers to .rodata
        ubifs: move ubifs_xattr_handlers to .rodata
        squashfs: move squashfs_xattr_handlers to .rodata
        smb: move cifs_xattr_handlers to .rodata
        reiserfs: move reiserfs_xattr_handlers to .rodata
        orangefs: move orangefs_xattr_handlers to .rodata
        ocfs2: move ocfs2_xattr_handlers and ocfs2_xattr_handler_map to .rodata
        ntfs3: move ntfs_xattr_handlers to .rodata
        nfs: move nfs4_xattr_handlers to .rodata
        kernfs: move kernfs_xattr_handlers to .rodata
        jfs: move jfs_xattr_handlers to .rodata
        jffs2: move jffs2_xattr_handlers to .rodata
        hfsplus: move hfsplus_xattr_handlers to .rodata
        hfs: move hfs_xattr_handlers to .rodata
        gfs2: move gfs2_xattr_handlers_max to .rodata
        fuse: move fuse_xattr_handlers to .rodata
        ...
    • Merge tag 'vfs-6.7.iov_iter' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs · df9c65b5
      Linus Torvalds authored
      Pull iov_iter updates from Christian Brauner:
       "This contains David's iov_iter cleanup work to convert the iov_iter
        iteration macros to inline functions:
      
         - Remove last_offset from iov_iter as it was only used by ITER_PIPE
      
         - Add a __user tag on copy_mc_to_user()'s dst argument on x86 to
           match that on powerpc and get rid of a sparse warning
      
         - Convert iter->user_backed to user_backed_iter() in the sound PCM
           driver
      
         - Convert iter->user_backed to user_backed_iter() in a couple of
           infiniband drivers
      
         - Renumber the type enum so that the ITER_* constants match the order
           in iterate_and_advance*()
      
         - Since the preceding patch puts UBUF and IOVEC at 0 and 1, change
           user_backed_iter() to just use the type value and get rid of the
           extra flag
      
         - Convert the iov_iter iteration macros to always-inline functions to
           make the code easier to follow. It uses function pointers, but they
           get optimised away
      
         - Move the check for ->copy_mc to _copy_from_iter() and
           copy_page_from_iter_atomic() rather than in memcpy_from_iter_mc()
           where it gets repeated for every segment. Instead, we check once
           and invoke a side function that can use iterate_bvec() rather than
           iterate_and_advance() and supply a different step function
      
         - Move the copy-and-csum code to net/ where it can be in proximity
           with the code that uses it
      
         - Fold memcpy_and_csum() in to its two users
      
         - Move csum_and_copy_from_iter_full() out of line and merge in
           csum_and_copy_from_iter() since the former is the only caller of
           the latter
      
         - Move hash_and_copy_to_iter() to net/ where it can be with its only
           caller"
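
      Two of the points above can be sketched in userspace C. This is an
      illustrative model under stated assumptions, not the lib/iov_iter.c
      code: the enum, the structs, and every model_* name are
      hypothetical, and __attribute__((always_inline)) is the GCC/Clang
      spelling. It shows (a) deriving user-backedness from the type value
      once the user-backed kinds are numbered 0 and 1, and (b) an
      always-inline iteration helper that takes a step function pointer,
      which an optimising compiler can resolve at the call site.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced set of iterator kinds; user-backed ones first. */
enum iter_kind {
    KIND_UBUF = 0,
    KIND_IOVEC = 1,
    KIND_KVEC = 2,
};

struct segment {
    const void *base;
    size_t len;
};

struct model_iter {
    enum iter_kind kind;
    const struct segment *segs;
    size_t nr_segs;
};

/* With the user-backed kinds at 0 and 1, no separate flag is needed. */
static inline bool model_user_backed(const struct model_iter *it)
{
    return it->kind <= KIND_IOVEC;
}

/* A step handles one segment and returns the bytes it did NOT process. */
typedef size_t (*step_fn)(const void *base, size_t progress, size_t len,
                          void *priv);

/* Generic walk over the segments; the step is supplied by the caller. */
static inline __attribute__((always_inline))
size_t model_iterate(const struct model_iter *it, size_t len, void *priv,
                     step_fn step)
{
    size_t progress = 0;

    for (size_t i = 0; i < it->nr_segs && progress < len; i++) {
        size_t part = it->segs[i].len;
        size_t remain;

        if (part > len - progress)
            part = len - progress;
        remain = step(it->segs[i].base, progress, part, priv);
        progress += part - remain;
        if (remain)
            break;
    }
    return progress;
}

/* Step function: copy one segment into the destination buffer. */
static size_t copy_step(const void *base, size_t progress, size_t len,
                        void *priv)
{
    memcpy((char *)priv + progress, base, len);
    return 0;
}

/* Because model_iterate() is inlined here, 'step' is a known constant. */
static size_t model_copy_from_iter(void *dst, size_t len,
                                   const struct model_iter *it)
{
    return model_iterate(it, len, dst, copy_step);
}
```

      Compared with macros, the step functions have ordinary types and
      show up by name in a debugger, while the inlining keeps the
      indirect call from surviving into the generated code in the common
      case.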
      
      * tag 'vfs-6.7.iov_iter' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
        iov_iter, net: Move hash_and_copy_to_iter() to net/
        iov_iter, net: Merge csum_and_copy_from_iter{,_full}() together
        iov_iter, net: Fold in csum_and_memcpy()
        iov_iter, net: Move csum_and_copy_to/from_iter() to net/
        iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
        iov_iter: Convert iterate*() to inline funcs
        iov_iter: Derive user-backedness from the iterator type
        iov_iter: Renumber ITER_* constants
        infiniband: Use user_backed_iter() to see if iterator is UBUF/IOVEC
        sound: Fix snd_pcm_readv()/writev() to use iov access functions
        iov_iter, x86: Be consistent about the __user tag on copy_mc_to_user()
        iov_iter: Remove last_offset from iov_iter as it was for ITER_PIPE
    • Merge tag 'vfs-6.7.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs · 3b3f874c
      Linus Torvalds authored
      Pull misc vfs updates from Christian Brauner:
       "This contains the usual miscellaneous features, cleanups, and fixes
        for vfs and individual fses.
      
        Features:
      
         - Rename and export helpers that get write access to a mount. They
           are used in overlayfs to get write access to the upper mount.
      
         - Print the pretty name of the root device on boot failure. This
           helps in scenarios where we would usually only print
           "unknown-block(1,2)".
      
         - Add an internal SB_I_NOUMASK flag. This is another part in the
           endless POSIX ACL saga in a way.
      
           When POSIX ACLs are enabled via SB_POSIXACL the vfs cannot strip
           the umask because if the relevant inode has POSIX ACLs set it might
           take the umask from there. But if the inode doesn't have any POSIX
           ACLs set then we apply the umask in the filesystem itself. So we end
           up with:
      
            (1) no SB_POSIXACL -> strip umask in vfs
            (2) SB_POSIXACL    -> strip umask in filesystem
      
           The umask semantics associated with SB_POSIXACL allowed filesystems
           that don't even support POSIX ACLs at all to raise SB_POSIXACL
           purely to avoid umask stripping. That specifically means NFS v4 and
           Overlayfs. NFS v4 does it because it delegates this to the server
           and Overlayfs because it needs to delegate umask stripping to the
           upper filesystem, i.e., the filesystem used as the writable layer.
      
           This went so far that SB_POSIXACL is raised even on kernels that
           don't even have POSIX ACL support at all.
      
           Stop this blatant abuse and add SB_I_NOUMASK which is an internal
           superblock flag that filesystems can raise to opt out of umask
           handling. That should really only be the two mentioned above. It's
           not that we want any filesystems to do this. Ideally we have all
           umask handling always in the vfs.
      
         - Make overlayfs use SB_I_NOUMASK too.
      
         - Now that we have SB_I_NOUMASK, stop checking for SB_POSIXACL in
           IS_POSIXACL() if the kernel doesn't have support for it. This is a
           very old patch but it's only possible to do this now with the wider
           cleanup that was done.
      
         - Follow-up work on fake path handling from last cycle. Citing mostly
           from Amir:
      
           When overlayfs was first merged, overlayfs files of regular files
           and directories, the ones that are installed in file table, had a
           "fake" path, namely, f_path is the overlayfs path and f_inode is
           the "real" inode on the underlying filesystem.
      
           In v6.5, we took another small step by introducing the
           backing_file container and the file_real_path() helper. This change
           allowed vfs and filesystem code to get the "real" path of an
           overlayfs backing file. With this change, we were able to make
           fsnotify work correctly and report events on the "real" filesystem
           objects that were accessed via overlayfs.
      
           This method works fine, but it still leaves the vfs vulnerable to
           new code that is not aware of files with fake path. A recent
           example is commit db1d1e8b ("IMA: use vfs_getattr_nosec to get
           the i_version"). This commit uses direct referencing to f_path in
           IMA code that otherwise uses file_inode() and file_dentry() to
           reference the filesystem objects that it is measuring.
      
           This contains work to switch things around: instead of having
           filesystem code opt-in to get the "real" path, have generic code
           opt-in for the "fake" path in the few places that it is needed.
      
           It is far more likely that new filesystem code that does not use
           the file_dentry() and file_real_path() helpers will end up causing
           crashes or averting LSM/audit rules if we keep the "fake" path
           exposed by default.
      
           This change already makes file_dentry() moot, but for now we did
           not change this helper and just added a WARN_ON() in ovl_d_real()
           to catch if we have made any wrong assumptions.
      
           After the dust settles on this change, we can make file_dentry() a
           plain accessor and we can drop the inode argument to ->d_real().
      
         - Switch struct file to SLAB_TYPESAFE_BY_RCU. This looks like a small
           change but it really isn't and I would like to see everyone on
           their tippie toes for any possible bugs from this work.
      
           Essentially we've been doing most of what SLAB_TYPESAFE_BY_RCU
           does for files for a very long time because of the nasty
           interactions with the SCM_RIGHTS file descriptor garbage
           collection. So extending it makes a lot of sense but it is a
           subtle change. There are almost no places that fiddle with file
           rcu semantics directly and the ones that did mess around with
           struct file internals under rcu have been made to stop doing that
           because it really was always dodgy.
      
           I forgot to put in the link tag for this change and the discussion
           in the commit so adding it into the merge message:
      
             https://lore.kernel.org/r/20230926162228.68666-1-mjguzik@gmail.com
      
        Cleanups:
      
         - Various smaller pipe cleanups including the removal of a spin lock
           that was only used to protect against writes without pipe_lock()
           from O_NOTIFICATION_PIPE aka watch queues. As that was never
           implemented remove the additional locking from pipe_write().
      
         - Annotate struct watch_filter with the new __counted_by attribute.
      
         - Clarify do_unlinkat() cleanup so that it doesn't look like an extra
           iput() is done that would cause issues.
      
         - Simplify file cleanup when the file has never been opened.
      
         - Use module helper instead of open-coding it.
      
         - Predict error unlikely for stale retry.
      
         - Use WRITE_ONCE() for mount expiry field instead of just commenting
           that one hopes the compiler doesn't get smart.
      
        Fixes:
      
         - Fix readahead on block devices.
      
         - Fix writeback when lazytime is enabled and inodes whose timestamp
           is the only thing that changed reside on wb->b_dirty_time. This
           caused excessively large zombie memory cgroups when lazytime was
           enabled as such inodes weren't handled fast enough.
      
         - Convert BUG_ON() to WARN_ON_ONCE() in open_last_lookups()"
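
      The umask rules described under SB_I_NOUMASK can be modelled as a
      small decision function. This is a userspace sketch under stated
      assumptions: struct sb_model, its boolean fields, and the model_*
      helpers are hypothetical stand-ins for the superblock flags, and
      the acl_support parameter stands in for whether the kernel was
      built with POSIX ACL support.

```c
#include <stdbool.h>

/* Hypothetical model of the superblock state that matters here. */
struct sb_model {
    bool posixacl; /* stands in for SB_POSIXACL */
    bool noumask;  /* stands in for the internal SB_I_NOUMASK flag */
};

/*
 * Models the IS_POSIXACL() change: without ACL support built in,
 * the SB_POSIXACL-like flag is ignored entirely.
 */
static bool model_is_posixacl(const struct sb_model *sb, bool acl_support)
{
    return acl_support && sb->posixacl;
}

/*
 * Generic code strips the umask unless the filesystem either handles
 * it as part of POSIX ACL processing or has explicitly opted out.
 */
static unsigned int model_strip_umask(const struct sb_model *sb,
                                      bool acl_support,
                                      unsigned int mode,
                                      unsigned int umask)
{
    if (model_is_posixacl(sb, acl_support) || sb->noumask)
        return mode; /* left to the filesystem or the server */
    return mode & ~umask;
}
```

      In this model a filesystem that wants to skip stripping no longer
      needs to raise the ACL flag; it sets noumask instead, and the ACL
      flag has no effect when ACL support is absent.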
      
      * tag 'vfs-6.7.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (26 commits)
        file, i915: fix file reference for mmap_singleton()
        vfs: Convert BUG_ON to WARN_ON_ONCE in open_last_lookups
        writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs
        chardev: Simplify usage of try_module_get()
        ovl: rely on SB_I_NOUMASK
        fs: fix umask on NFS with CONFIG_FS_POSIX_ACL=n
        fs: store real path instead of fake path in backing file f_path
        fs: create helper file_user_path() for user displayed mapped file path
        fs: get mnt_writers count for an open backing file's real path
        vfs: stop counting on gcc not messing with mnt_expiry_mark if not asked
        vfs: predict the error in retry_estale as unlikely
        backing file: free directly
        vfs: fix readahead(2) on block devices
        io_uring: use files_lookup_fd_locked()
        file: convert to SLAB_TYPESAFE_BY_RCU
        vfs: shave work on failed file open
        fs: simplify misleading code to remove ambiguity regarding ihold()/iput()
        watch_queue: Annotate struct watch_filter with __counted_by
        fs/pipe: use spinlock in pipe_read() only if there is a watch_queue
        fs/pipe: remove unnecessary spinlock from pipe_write()
        ...
    • Merge tag 'vfs-6.7.autofs' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs · 0d63d8b2
      Linus Torvalds authored
      Pull autofs mount api updates from Christian Brauner:
       "This ports autofs to the new mount api. The patchset has existed for
        quite a while but never made it upstream. Ian picked it back up.
      
        This also fixes a bug where fs_param_is_fd() was passed a garbage
        param->dirfd, although it expects param->dirfd to be set to the fd
        that was used to set param->file; otherwise result->uint_32 contains
        nonsense. So make sure it's set.
      
        One less filesystem using the old mount api. We're getting there,
        albeit rather slowly. The last remaining major filesystem that hasn't
        converted is btrfs. Patches exist - I even wrote them - but so far
        they haven't made it upstream"
      
      * tag 'vfs-6.7.autofs' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
        autofs: fix add autofs_parse_fd()
        fsconfig: ensure that dirfd is set to aux
        autofs: fix protocol sub version setting
        autofs: convert autofs to use the new mount api
        autofs: validate protocol version
        autofs: refactor parse_options()
        autofs: reformat 0pt enum declaration
        autofs: refactor super block info init
        autofs: add autofs_parse_fd()
        autofs: refactor autofs_prepare_pipe()
    • Merge tag 'vfs-6.7.super' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs · d4e175f2
      Linus Torvalds authored
      Pull vfs superblock updates from Christian Brauner:
       "This contains the work to make block device opening functions return a
        struct bdev_handle instead of just a struct block_device. The same
        struct bdev_handle is then also passed to block device closing
        functions.
      
        This allows us to propagate context from opening to closing a block
        device without having to modify all users every time.
      
        Sidenote, in the future we might even want to try and have block
        device opening functions return a struct file directly but that's a
        series on top of this.
      
        These are further preparatory changes to be able to count writable
        opens and blocking writes to mounted block devices. That's a separate
        piece of work for next cycle and for that we absolutely need the
        changes to btrfs that have been quietly dropped somehow.
      
        Originally the series contained a patch that removed the old
        blkdev_*() helpers. But since this would've caused needless churn in
        -next for bcachefs we ended up delaying it.
      
        The second piece of work addresses one of the major annoyances about
        the work last cycle, namely that we required dropping s_umount
        whenever we used the superblock and fs_holder_ops for a block device.
      
        The reason for that requirement had been that in some codepaths
        s_umount could've been taken under disk->open_mutex (that's always
        been the case, at least theoretically). For example, on surprise block
        device removal or media change. And opening and closing block devices
        required grabbing disk->open_mutex as well.
      
        So we did the work and went through the block layer and fixed all
        those places so that s_umount is never taken under disk->open_mutex.
        This means no more brittle games where we yield and reacquire s_umount
        during block device opening and closing and no more requirements where
        block devices need to be closed. Filesystems don't need to care about
        this.
      
        There's a bunch of other follow-up work such as moving block device
        freezing and thawing to holder operations which makes it work for all
        block devices and not just the main block device just as we did for
        surprise removal. But that is for next cycle.
      
        Tested with fstests for all major fses, blktests, LTP"
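
      The handle idea from the first part of this message can be
      sketched in userspace C. Everything here is a hypothetical model
      (struct model_bdev, struct model_bdev_handle, model_open(),
      model_release(), MODEL_MODE_WRITE), not the block layer API. It
      shows why returning a handle helps: whatever was decided at open
      time travels with the handle, so release can undo exactly that
      without the caller passing the same arguments again.

```c
#include <stdlib.h>

#define MODEL_MODE_WRITE 0x1u

/* Hypothetical device with simple open accounting. */
struct model_bdev {
    int openers;
    int writers;
};

/* Context captured at open time and handed back at release time. */
struct model_bdev_handle {
    struct model_bdev *bdev;
    void *holder;
    unsigned int mode;
};

/* Open returns a handle rather than the bare device pointer. */
static struct model_bdev_handle *
model_open(struct model_bdev *bdev, unsigned int mode, void *holder)
{
    struct model_bdev_handle *handle = malloc(sizeof(*handle));

    if (!handle)
        return NULL;
    handle->bdev = bdev;
    handle->holder = holder;
    handle->mode = mode;
    bdev->openers++;
    if (mode & MODEL_MODE_WRITE)
        bdev->writers++;
    return handle;
}

/* Release needs only the handle; the mode is read back from it. */
static void model_release(struct model_bdev_handle *handle)
{
    if (handle->mode & MODEL_MODE_WRITE)
        handle->bdev->writers--;
    handle->bdev->openers--;
    free(handle);
}
```

      Adding another piece of per-open state in this model means adding a
      field to the handle, with no change to the callers of
      model_release().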
      
      * tag 'vfs-6.7.super' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (37 commits)
        porting: update locking requirements
        fs: assert that open_mutex isn't held over holder ops
        block: assert that we're not holding open_mutex over blk_report_disk_dead
        block: move bdev_mark_dead out of disk_check_media_change
        block: WARN_ON_ONCE() when we remove active partitions
        block: simplify bdev_del_partition()
        fs: Avoid grabbing sb->s_umount under bdev->bd_holder_lock
        jfs: fix log->bdev_handle null ptr deref in lbmStartIO
        bcache: Fixup error handling in register_cache()
        xfs: Convert to bdev_open_by_path()
        reiserfs: Convert to bdev_open_by_dev/path()
        ocfs2: Convert to use bdev_open_by_dev()
        nfs/blocklayout: Convert to use bdev_open_by_dev/path()
        jfs: Convert to bdev_open_by_dev()
        f2fs: Convert to bdev_open_by_dev/path()
        ext4: Convert to bdev_open_by_dev()
        erofs: Convert to use bdev_open_by_path()
        btrfs: Convert to bdev_open_by_path()
        fs: Convert to bdev_open_by_dev()
        mm/swap: Convert to use bdev_open_by_dev()
        ...