1. 05 Apr, 2024 28 commits
    • Merge tag 'mm-hotfixes-stable-2024-04-05-11-30' of... · af709adf
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2024-04-05-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "8 hotfixes, 3 are cc:stable
      
        There are a couple of fixups for this cycle's vmalloc changes and one
        for the stackdepot changes. And a fix for a very old x86 PAT issue
        which can cause a warning splat"
      
      * tag 'mm-hotfixes-stable-2024-04-05-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        stackdepot: rename pool_index to pool_index_plus_1
        x86/mm/pat: fix VM_PAT handling in COW mappings
        MAINTAINERS: change vmware.com addresses to broadcom.com
        selftests/mm: include strings.h for ffsl
        mm: vmalloc: fix lockdep warning
        mm: vmalloc: bail out early in find_vmap_area() if vmap is not init
        init: open output files from cpio unpacking with O_LARGEFILE
        mm/secretmem: fix GUP-fast succeeding on secretmem folios
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · c7830236
      Linus Torvalds authored
      Pull arm64 fix from Catalin Marinas:
       "arm64/ptrace fix to use the correct SVE layout based on the saved
        floating point state rather than the TIF_SVE flag. The latter may be
        left on during syscalls even if the SVE state is discarded"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64/ptrace: Use saved floating point state type to determine SVE layout
    • Merge tag 'riscv-for-linus-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 261b8e89
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
      
       - A fix for an __{get,put}_kernel_nofault to avoid an uninitialized
         value causing spurious failures
      
       - compat_vdso.so.dbg is now installed to the standard install location
      
       - A fix to avoid initializing PERF_SAMPLE_BRANCH_*-related events, as
         they aren't supported and will just later fail
      
       - A fix to make AT_VECTOR_SIZE_ARCH correct now that we're providing
         AT_MINSIGSTKSZ
      
       - pgprot_nx() is now implemented, which fixes vmap W^X protection
      
       - A fix for the vector save/restore code, which at least manifests as
         corrupted vector state when a signal is taken
      
       - A fix for a race condition in instruction patching
      
       - A fix to avoid leaking the kernel-mode GP to userspace, which is a
         kernel pointer leak that can be used to defeat KASLR in various ways
      
       - A handful of smaller fixes to build warnings, an overzealous printk,
         and some missing tracing annotations
      
      * tag 'riscv-for-linus-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: process: Fix kernel gp leakage
        riscv: Disable preemption when using patch_map()
        riscv: Fix warning by declaring arch_cpu_idle() as noinstr
        riscv: use KERN_INFO in do_trap
        riscv: Fix vector state restore in rt_sigreturn()
        riscv: mm: implement pgprot_nx
        riscv: compat_vdso: align VDSOAS build log
        RISC-V: Update AT_VECTOR_SIZE_ARCH for new AT_MINSIGSTKSZ
        riscv: Mark __se_sys_* functions __used
        drivers/perf: riscv: Disable PERF_SAMPLE_BRANCH_* while not supported
        riscv: compat_vdso: install compat_vdso.so.dbg to /lib/modules/*/vdso/
        riscv: hwprobe: do not produce frtace relocation
        riscv: Fix spurious errors from __get/put_kernel_nofault
        riscv: mm: Fix prototype to avoid discarding const
    • Merge tag 's390-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 50094473
      Linus Torvalds authored
      Pull s390 fixes from Alexander Gordeev:
      
       - Fix missing NULL pointer check when determining guest/host fault
      
       - Mark all functions in asm/atomic_ops.h, asm/atomic.h and
         asm/preempt.h as __always_inline to avoid unwanted instrumentation
      
       - Fix removal of a Processor Activity Instrumentation (PAI) sampling
         event in PMU device driver
      
       - Align system call table on 8 bytes
      
      * tag 's390-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/entry: align system call table on 8 bytes
        s390/pai: fix sampling event removal for PMU device driver
        s390/preempt: mark all functions __always_inline
        s390/atomic: mark all functions __always_inline
        s390/mm: fix NULL pointer dereference
    • Merge tag 'pm-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 2f9fd9e4
      Linus Torvalds authored
      Pull power management fix from Rafael Wysocki:
       "Fix a recent Energy Model change that went against a recent scheduler
        change made independently (Vincent Guittot)"
      
      * tag 'pm-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM: EM: fix wrong utilization estimation in em_cpu_energy()
    • Merge tag 'thermal-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · b21defcb
      Linus Torvalds authored
      Pull thermal control fixes from Rafael Wysocki:
       "These fix two power allocator thermal governor issues and an ACPI
        thermal driver regression that all were introduced during the 6.8
        development cycle.
      
        Specifics:
      
         - Allow the power allocator thermal governor to bind to a thermal
           zone without cooling devices and/or without trip points (Nikita
           Travkin)
      
         - Make the ACPI thermal driver register a tripless thermal zone when
           it cannot find any usable trip points instead of returning an error
           from acpi_thermal_add() (Stephen Horvath)"
      
      * tag 'thermal-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        thermal: gov_power_allocator: Allow binding without trip points
        thermal: gov_power_allocator: Allow binding without cooling devices
        ACPI: thermal: Register thermal zones without valid trip points
    • Merge tag 'gpio-fixes-for-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 2e69af16
      Linus Torvalds authored
      Pull gpio fixes from Bartosz Golaszewski:
      
       - make sure GPIO devices are registered with the subsystem before
         trying to return them to a caller of gpio_device_find()
      
       - fix two issues with incorrect sanitization of the interrupt labels
      
      * tag 'gpio-fixes-for-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpio: cdev: fix missed label sanitizing in debounce_setup()
        gpio: cdev: check for NULL labels when sanitizing them for irqs
        gpiolib: Fix triggering "kobject: 'gpiochipX' is not initialized, yet" kobject_get() errors
    • Merge tag 'ata-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux · 4c3fc345
      Linus Torvalds authored
      Pull ata fixes from Damien Le Moal:
      
       - Compilation warning fixes from Arnd: one in the sata_sx4 driver due
         to an incorrect calculation of the parameters passed to memcpy() and
         another one in the sata_mv driver when CONFIG_PCI is not set
      
       - Drop the owner driver field assignment in the pata_macio driver. That
         is not needed as the PCI core code does that already (Krzysztof)
      
       - Remove an unused field in struct st_ahci_drv_data of the ahci_st
         driver (Christophe)
      
       - Add a missing clock probe error check in the sata_gemini driver
         (Chen)
      
      * tag 'ata-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
        ata: sata_gemini: Check clk_enable() result
        ata: sata_mv: Fix PCI device ID table declaration compilation warning
        ata: ahci_st: Remove an unused field in struct st_ahci_drv_data
        ata: pata_macio: drop driver owner assignment
        ata: sata_sx4: fix pdc20621_get_from_dimm() on 64-bit
    • Merge tag 'sound-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · c42881d4
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "This became a bit bigger collection of patches, but almost all are
        about device-specific fixes, and should be safe for 6.9:
      
         - Lots of ASoC Intel SOF-related fixes/updates
      
         - Locking fixes in SoundWire drivers
      
         - ASoC AMD ACP/SOF updates
      
         - ASoC ES8326 codec fixes
      
         - HD-audio codec fixes and quirks
      
         - A regression fix in emu10k1 synth code"
      
      * tag 'sound-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (49 commits)
        ASoC: SOF: Core: Add remove_late() to sof_init_environment failure path
        ASoC: SOF: amd: fix for false dsp interrupts
        ASoC: SOF: Intel: lnl: Disable DMIC/SSP offload on remove
        ASoC: Intel: avs: boards: Add modules description
        ASoC: codecs: ES8326: Removing the control of ADC_SCALE
        ASoC: codecs: ES8326: Solve a headphone detection issue after suspend and resume
        ASoC: codecs: ES8326: modify clock table
        ASoC: codecs: ES8326: Solve error interruption issue
        ALSA: line6: Zero-initialize message buffers
        ALSA: hda/realtek: cs35l41: Support ASUS ROG G634JYR
        ALSA: hda/realtek: Update Panasonic CF-SZ6 quirk to support headset with microphone
        ALSA: hda/realtek: Add sound quirks for Lenovo Legion slim 7 16ARHA7 models
        Revert "ALSA: emu10k1: fix synthesizer sample playback position and caching"
        OSS: dmasound/paula: Mark driver struct with __refdata to prevent section mismatch
        ALSA: hda/realtek: Add quirks for ASUS Laptops using CS35L56
        ASoC: amd: acp: fix for acp_init function error handling
        ASoC: tas2781: mark dvc_tlv with __maybe_unused
        ASoC: ops: Fix wraparound for mask in snd_soc_get_volsw
        ASoC: rt-sdw*: add __func__ to all error logs
        ASoC: rt722-sdca-sdw: fix locking sequence
        ...
    • Merge tag 'drm-fixes-2024-04-05' of https://gitlab.freedesktop.org/drm/kernel · 89103a16
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Weekly fixes, mostly xe and i915, amdgpu on a week off, otherwise a
        nouveau fix for a crash with new vulkan cts tests, and a couple of
        cleanups and misc fixes.
      
        display:
         - fix typos in kerneldoc
      
        prime:
         - unbreak dma-buf export for virt-gpu
      
        nouveau:
         - uvmm: fix remap address calculation
         - minor cleanups
      
        panfrost:
         - fix power-transition timeouts
      
        xe:
         - Stop using system_unbound_wq for preempt fences
         - Fix saving unordered rebinding fences by attaching them as kernel
           fences to the vm's resv
         - Fix TLB invalidation fences completing out of order
         - Move rebind TLB invalidation to the ring ops to reduce the latency
      
        i915:
         - A few DisplayPort related fixes
         - eDP PSR fixes
         - Remove some VM space restrictions on older platforms
         - Disable automatic CCS load balancing"
      
      * tag 'drm-fixes-2024-04-05' of https://gitlab.freedesktop.org/drm/kernel: (22 commits)
        drm/xe: Use ordered wq for preempt fence waiting
        drm/xe: Move vma rebinding to the drm_exec locking loop
        drm/xe: Make TLB invalidation fences unordered
        drm/xe: Rework rebinding
        drm/xe: Use ring ops TLB invalidation for rebinds
        drm/i915/mst: Reject FEC+MST on ICL
        drm/i915/mst: Limit MST+DSC to TGL+
        drm/i915/dp: Fix the computation for compressed_bpp for DISPLAY < 13
        drm/i915/gt: Enable only one CCS for compute workload
        drm/i915/gt: Do not generate the command streamer for all the CCS
        drm/i915/gt: Disable HW load balancing for CCS
        drm/i915/gt: Limit the reserved VM space to only the platforms that need it
        drm/i915/psr: Fix intel_psr2_sel_fetch_et_alignment usage
        drm/i915/psr: Move writing early transport pipe src
        drm/i915/psr: Calculate PIPE_SRCSZ_ERLY_TPT value
        drm/i915/dp: Remove support for UHBR13.5
        drm/i915/dp: Fix DSC state HW readout for SST connectors
        drm/display: fix typo
        drm/prime: Unbreak virtgpu dma-buf export
        nouveau/uvmm: fix addr/range calcs for remap operations
        ...
    • stackdepot: rename pool_index to pool_index_plus_1 · a6c1d9cb
      Peter Collingbourne authored
      Commit 3ee34eab ("lib/stackdepot: fix first entry having a 0-handle")
      changed the meaning of the pool_index field to mean "the pool index plus
      1".  This made the code accessing this field less self-documenting, as
      well as causing debuggers such as drgn to not be able to easily remain
      compatible with both old and new kernels, because they typically do that
      by testing for presence of the new field.  Because stackdepot is a
      debugging tool, we should make sure that it is debugger friendly. 
      Therefore, give the field a different name to improve readability as well
      as enabling debugger backwards compatibility.
      
      This is needed in 6.9, which would otherwise become an odd release with
      the new semantics and old name so debuggers wouldn't recognize the new
      semantics there.
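
      The "pool index plus 1" convention described above can be illustrated
      with a small userspace sketch.  The bit widths and the names
      model_encode() and model_pool_index() are assumptions made up for this
      illustration and are not the kernel's definitions; the sketch only
      shows why storing the index plus 1 keeps an all-zero handle free to
      mean "no entry".

```c
#include <stdint.h>

/* Hypothetical width of the index field, chosen only for this sketch. */
#define MODEL_INDEX_BITS 16u
#define MODEL_INDEX_MASK ((1u << MODEL_INDEX_BITS) - 1u)

/*
 * Pack a pool index and an offset into one handle.  The index is stored
 * as "index + 1", so the very first entry (pool 0, offset 0) still gets
 * a non-zero handle.
 */
static inline uint32_t model_encode(uint32_t pool_index, uint32_t offset)
{
	uint32_t pool_index_plus_1 = pool_index + 1u;

	return (offset << MODEL_INDEX_BITS) |
	       (pool_index_plus_1 & MODEL_INDEX_MASK);
}

/*
 * Recover the real pool index from a handle.  Returns -1 for the
 * reserved all-zero index field, which never names a valid pool.
 */
static inline int64_t model_pool_index(uint32_t handle)
{
	uint32_t pool_index_plus_1 = handle & MODEL_INDEX_MASK;

	if (pool_index_plus_1 == 0)
		return -1;
	return (int64_t)pool_index_plus_1 - 1;
}
```

      A reader that sees a field named pool_index_plus_1 knows to subtract
      1, whereas a field still named pool_index gives no hint about which
      semantics apply.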
      
      Fixes: 3ee34eab ("lib/stackdepot: fix first entry having a 0-handle")
      Link: https://lkml.kernel.org/r/20240402001500.53533-1-pcc@google.com
      Link: https://linux-review.googlesource.com/id/Ib3e70c36c1d230dd0a118dc22649b33e768b9f88
      Signed-off-by: Peter Collingbourne <pcc@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Alexander Potapenko <glider@google.com>
      Acked-by: Marco Elver <elver@google.com>
      Acked-by: Oscar Salvador <osalvador@suse.de>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • x86/mm/pat: fix VM_PAT handling in COW mappings · 04c35ab3
      David Hildenbrand authored
      PAT handling won't do the right thing in COW mappings: the first PTE (or,
      in fact, all PTEs) can be replaced during write faults to point at anon
      folios.  Reliably recovering the correct PFN and cachemode using
      follow_phys() from PTEs will not work in COW mappings.
      
      Using follow_phys(), we might just get the address+protection of the anon
      folio (which is very wrong), or fail on swap/nonswap entries, failing
      follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and
      track_pfn_copy(), not properly calling free_pfn_range().
      
      In free_pfn_range(), we either wouldn't call memtype_free() or would call
      it with the wrong range, possibly leaking memory.
      
      To fix that, let's update follow_phys() to refuse returning anon folios,
      and fall back to using the stored PFN inside vma->vm_pgoff for COW
      mappings if we run into that.
      
      We will now properly handle untrack_pfn() with COW mappings, where we
      don't need the cachemode.  We'll have to fail fork()->track_pfn_copy() if
      the first page was replaced by an anon folio, though: we'd have to store
      the cachemode in the VMA to make this work, likely growing the VMA size.
      
      For now, let's keep it simple and let track_pfn_copy() just fail in that
      case: it would have failed in the past with swap/nonswap entries already,
      and it would have done the wrong thing with anon folios.
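
      The fallback described above can be sketched as a small userspace
      model.  struct model_vma, model_get_pat_info() and MODEL_PAGE_SHIFT are
      hypothetical stand-ins invented for illustration, not the kernel's
      types or helpers; the sketch only captures the decision: use the page
      table when it is usable, fall back to vm_pgoff for COW mappings when
      only the address is needed, and fail when the cachemode is also
      required.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12

/* Hypothetical, simplified stand-in for a VMA; not the kernel's struct. */
struct model_vma {
	bool is_cow;            /* private mapping whose PTEs may be replaced */
	unsigned long vm_pgoff; /* first PFN, stored when the range was mapped */
	bool first_pte_usable;  /* false for anon folios and swap/nonswap entries */
	unsigned long pte_pfn;  /* PFN read from the first PTE, when usable */
	int pte_cachemode;      /* cachemode read from the first PTE, when usable */
};

/*
 * Returns 0 on success and -1 on failure.  Pass cachemode == NULL when
 * the caller only needs the physical address (the untrack case).
 */
static inline int model_get_pat_info(const struct model_vma *vma,
				     unsigned long long *paddr, int *cachemode)
{
	if (vma->first_pte_usable) {
		*paddr = (unsigned long long)vma->pte_pfn << MODEL_PAGE_SHIFT;
		if (cachemode)
			*cachemode = vma->pte_cachemode;
		return 0;
	}

	if (vma->is_cow) {
		/* The cachemode is not stored in the VMA, so give up. */
		if (cachemode)
			return -1;
		*paddr = (unsigned long long)vma->vm_pgoff << MODEL_PAGE_SHIFT;
		return 0;
	}

	return -1;
}
```

      In this model the untrack path passes NULL for the cachemode and
      succeeds, while the fork path asks for the cachemode and fails once
      the first page has been replaced.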
      
      Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn():
      
      <--- C reproducer --->
       #include <stdio.h>
       #include <sys/mman.h>
       #include <unistd.h>
       #include <liburing.h>
      
       int main(void)
       {
               struct io_uring_params p = {};
               int ring_fd;
               size_t size;
               char *map;
      
               ring_fd = io_uring_setup(1, &p);
               if (ring_fd < 0) {
                       perror("io_uring_setup");
                       return 1;
               }
               size = p.sq_off.array + p.sq_entries * sizeof(unsigned);
      
               /* Map the submission queue ring MAP_PRIVATE */
               map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
                          ring_fd, IORING_OFF_SQ_RING);
               if (map == MAP_FAILED) {
                       perror("mmap");
                       return 1;
               }
      
               /* We have at least one page. Let's COW it. */
               *map = 0;
               pause();
               return 0;
       }
      <--- C reproducer --->
      
      On a system with 16 GiB RAM and swap configured:
       # ./iouring &
       # memhog 16G
       # killall iouring
      [  301.552930] ------------[ cut here ]------------
      [  301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100
      [  301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g
      [  301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1
      [  301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebu4
      [  301.559569] RIP: 0010:untrack_pfn+0xf4/0x100
      [  301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000
      [  301.561189] RSP: 0018:ffffba2c0377fab8 EFLAGS: 00010282
      [  301.561590] RAX: 00000000ffffffea RBX: ffff9208c8ce9cc0 RCX: 000000010455e047
      [  301.562105] RDX: 07fffffff0eb1e0a RSI: 0000000000000000 RDI: ffff9208c391d200
      [  301.562628] RBP: 0000000000000000 R08: ffffba2c0377fab8 R09: 0000000000000000
      [  301.563145] R10: ffff9208d2292d50 R11: 0000000000000002 R12: 00007fea890e0000
      [  301.563669] R13: 0000000000000000 R14: ffffba2c0377fc08 R15: 0000000000000000
      [  301.564186] FS:  0000000000000000(0000) GS:ffff920c2fbc0000(0000) knlGS:0000000000000000
      [  301.564773] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  301.565197] CR2: 00007fea88ee8a20 CR3: 00000001033a8000 CR4: 0000000000750ef0
      [  301.565725] PKRU: 55555554
      [  301.565944] Call Trace:
      [  301.566148]  <TASK>
      [  301.566325]  ? untrack_pfn+0xf4/0x100
      [  301.566618]  ? __warn+0x81/0x130
      [  301.566876]  ? untrack_pfn+0xf4/0x100
      [  301.567163]  ? report_bug+0x171/0x1a0
      [  301.567466]  ? handle_bug+0x3c/0x80
      [  301.567743]  ? exc_invalid_op+0x17/0x70
      [  301.568038]  ? asm_exc_invalid_op+0x1a/0x20
      [  301.568363]  ? untrack_pfn+0xf4/0x100
      [  301.568660]  ? untrack_pfn+0x65/0x100
      [  301.568947]  unmap_single_vma+0xa6/0xe0
      [  301.569247]  unmap_vmas+0xb5/0x190
      [  301.569532]  exit_mmap+0xec/0x340
      [  301.569801]  __mmput+0x3e/0x130
      [  301.570051]  do_exit+0x305/0xaf0
      ...
      
      Link: https://lkml.kernel.org/r/20240403212131.929421-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reported-by: Wupeng Ma <mawupeng1@huawei.com>
      Closes: https://lkml.kernel.org/r/20240227122814.3781907-1-mawupeng1@huawei.com
      Fixes: b1a86e15 ("x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines")
      Fixes: 5899329b ("x86: PAT: implement track/untrack of pfnmap regions for x86 - v3")
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • MAINTAINERS: change vmware.com addresses to broadcom.com · 87f0e65c
      Alexey Makhalov authored
      Update all remaining vmware.com email addresses to the current
      broadcom.com addresses.
      
      Add corresponding .mailmap entries for maintainers who contributed in the
      past, as the vmware.com addresses will start bouncing soon.
      
      Maintainership update: Jeff Sipek has left VMware; Nick Shi will be
      maintaining VMware PTP.
      
      Link: https://lkml.kernel.org/r/20240402232334.33167-1-alexey.makhalov@broadcom.com
      Signed-off-by: Alexey Makhalov <alexey.makhalov@broadcom.com>
      Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Acked-by: Ajay Kaher <ajay.kaher@broadcom.com>
      Acked-by: Ronak Doshi <ronak.doshi@broadcom.com>
      Acked-by: Nick Shi <nick.shi@broadcom.com>
      Acked-by: Bryan Tan <bryan-bt.tan@broadcom.com>
      Acked-by: Vishnu Dasa <vishnu.dasa@broadcom.com>
      Acked-by: Vishal Bhakta <vishal.bhakta@broadcom.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: include strings.h for ffsl · 176517c9
      Edward Liaw authored
      Got a compilation error on Android for ffsl after 91b80cc5
      ("selftests: mm: fix map_hugetlb failure on 64K page size systems")
      included vm_util.h.
      
      Link: https://lkml.kernel.org/r/20240329185814.16304-1-edliaw@google.com
      Fixes: af605d26 ("selftests/mm: merge util.h into vm_util.h")
      Signed-off-by: Edward Liaw <edliaw@google.com>
      Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: vmalloc: fix lockdep warning · fc2c2269
      Uladzislau Rezki (Sony) authored
      Lockdep reports a possible deadlock in the
      find_vmap_area_exceed_addr_lock() function:
      
      ============================================
      WARNING: possible recursive locking detected
      6.9.0-rc1-00060-ged3ccc57b108-dirty #6140 Not tainted
      --------------------------------------------
      drgn/455 is trying to acquire lock:
      ffff0000c00131d0 (&vn->busy.lock/1){+.+.}-{2:2}, at: find_vmap_area_exceed_addr_lock+0x64/0x124
      
      but task is already holding lock:
      ffff0000c0011878 (&vn->busy.lock/1){+.+.}-{2:2}, at: find_vmap_area_exceed_addr_lock+0x64/0x124
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&vn->busy.lock/1);
        lock(&vn->busy.lock/1);
      
       *** DEADLOCK ***
      
      This can indeed happen if find_vmap_area_exceed_addr_lock() is called
      concurrently, because it tries to acquire the locks of two nodes.  That
      was done to prevent the lowest VA found in a previous step from being
      removed.
      
      To address this, the lowest VA is now found first without holding the
      lock of the node where it resides.  As a last step, we check whether the
      VA is still there, because it can go away; if it was removed, we proceed
      with the next lowest.
      
      [akpm@linux-foundation.org: fix comment typos, per Baoquan]
      Link: https://lkml.kernel.org/r/20240328140330.4747-1-urezki@gmail.com
      Fixes: 53becf32 ("mm: vmalloc: support multiple nodes in vread_iter")
      Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Tested-by: Jens Axboe <axboe@kernel.dk>
      Tested-by: Omar Sandoval <osandov@fb.com>
      Reported-by: Jens Axboe <axboe@kernel.dk>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: vmalloc: bail out early in find_vmap_area() if vmap is not init · 4ed91fa9
      Uladzislau Rezki (Sony) authored
      During boot, an s390 system triggers "spinlock bad magic" messages
      if spinlock debugging is enabled:
      
      [    0.465445] BUG: spinlock bad magic on CPU#0, swapper/0
      [    0.465490]  lock: single+0x1860/0x1958, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
      [    0.466067] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-12955-g8e938e39 #1
      [    0.466188] Hardware name: QEMU 8561 QEMU (KVM/Linux)
      [    0.466270] Call Trace:
      [    0.466470]  [<00000000011f26c8>] dump_stack_lvl+0x98/0xd8
      [    0.466516]  [<00000000001dcc6a>] do_raw_spin_lock+0x8a/0x108
      [    0.466545]  [<000000000042146c>] find_vmap_area+0x6c/0x108
      [    0.466572]  [<000000000042175a>] find_vm_area+0x22/0x40
      [    0.466597]  [<000000000012f152>] __set_memory+0x132/0x150
      [    0.466624]  [<0000000001cc0398>] vmem_map_init+0x40/0x118
      [    0.466651]  [<0000000001cc0092>] paging_init+0x22/0x68
      [    0.466677]  [<0000000001cbbed2>] setup_arch+0x52a/0x708
      [    0.466702]  [<0000000001cb6140>] start_kernel+0x80/0x5c8
      [    0.466727]  [<0000000000100036>] startup_continue+0x36/0x40
      
      This happens because such a system tries to access some vmap areas
      before vmalloc initialization has been done:
      
      [    0.465490] lock: single+0x1860/0x1958, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
      [    0.466067] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-12955-g8e938e39 #1
      [    0.466188] Hardware name: QEMU 8561 QEMU (KVM/Linux)
      [    0.466270] Call Trace:
      [    0.466470] dump_stack_lvl (lib/dump_stack.c:117)
      [    0.466516] do_raw_spin_lock (kernel/locking/spinlock_debug.c:87 kernel/locking/spinlock_debug.c:115)
      [    0.466545] find_vmap_area (mm/vmalloc.c:1059 mm/vmalloc.c:2364)
      [    0.466572] find_vm_area (mm/vmalloc.c:3150)
      [    0.466597] __set_memory (arch/s390/mm/pageattr.c:360 arch/s390/mm/pageattr.c:393)
      [    0.466624] vmem_map_init (./arch/s390/include/asm/set_memory.h:55 arch/s390/mm/vmem.c:660)
      [    0.466651] paging_init (arch/s390/mm/init.c:97)
      [    0.466677] setup_arch (arch/s390/kernel/setup.c:972)
      [    0.466702] start_kernel (init/main.c:899)
      [    0.466727] startup_continue (arch/s390/kernel/head64.S:35)
      [    0.466811] INFO: lockdep is turned off.
      ...
      [    0.718250] vmalloc init - busy lock init 0000000002871860
      [    0.718328] vmalloc init - busy lock init 00000000028731b8
      
      Some background: this worked before because the lock in question was
      statically defined and initialized.  Now the locks and data structures
      are initialized in the vmalloc_init() function.
      
      To address the issue, check whether the "vmap_initialized" variable is
      set; if it is not, find_vmap_area() bails out on entry, returning NULL.
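
      The early bail-out can be sketched as a userspace model.  All names
      here (model_vmap_initialized, model_vmalloc_init(), model_insert_area(),
      model_find_area()) are hypothetical and exist only for this
      illustration; the point is that the lookup returns NULL before it
      touches any state that the init function has not set up yet.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_area {
	unsigned long addr;
};

static bool model_vmap_initialized;      /* set once by model_vmalloc_init() */
static struct model_area model_areas[4]; /* toy replacement for the real trees */
static size_t model_nr_areas;

/* Stand-in for the init step that sets up locks and data structures. */
static inline void model_vmalloc_init(void)
{
	model_nr_areas = 0;
	model_vmap_initialized = true;
}

/* Record an area; refuses to work before initialization or when full. */
static inline bool model_insert_area(unsigned long addr)
{
	if (!model_vmap_initialized || model_nr_areas >= 4)
		return false;
	model_areas[model_nr_areas++].addr = addr;
	return true;
}

/* Look up an area by address. */
static inline struct model_area *model_find_area(unsigned long addr)
{
	/* Bail out on entry: nothing below is valid before init. */
	if (!model_vmap_initialized)
		return NULL;

	for (size_t i = 0; i < model_nr_areas; i++)
		if (model_areas[i].addr == addr)
			return &model_areas[i];
	return NULL;
}
```

      Callers that run very early, like the s390 boot path in the trace
      above, then simply see "no area found" instead of taking an
      uninitialized lock.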
      
      Link: https://lkml.kernel.org/r/20240323141544.4150-1-urezki@gmail.com
      Fixes: 72210662 ("mm: vmalloc: offload free_vmap_area_lock lock")
      Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • init: open output files from cpio unpacking with O_LARGEFILE · 8434f9aa
      John Sperbeck authored
      If a member of a cpio archive for an initrd or initramfs is larger than
      2 GiB, we'll eventually fail to write to that file when we get to that
      limit, unless O_LARGEFILE is set.
      
      The problem can be seen with this recipe, assuming that BLK_DEV_RAM
      is not configured:
      
      cd /tmp
      dd if=/dev/zero of=BIGFILE bs=1048576 count=2200
      echo BIGFILE | cpio -o -H newc -R root:root > initrd.img
      kexec -l /boot/vmlinuz-$(uname -r) --initrd=initrd.img --reuse-cmdline
      kexec -e
      
      The console will show 'Initramfs unpacking failed: write error'.  With
      the patch, the error is gone.
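
      The limit can be illustrated with a simplified userspace model of a
      write-size check.  model_write_limit() and MODEL_MAX_NON_LFS are
      hypothetical names for this sketch, and the exact boundary of
      2^31 - 1 bytes is an assumption about how the "2 GiB" limit is
      applied; the real check lives in the kernel's write path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed largest file offset without large-file support: 2^31 - 1. */
#define MODEL_MAX_NON_LFS ((int64_t)INT32_MAX)

/*
 * Returns how many of 'count' bytes may be written at offset 'pos',
 * or -1 if the write must fail because the file would be too large.
 */
static inline int64_t model_write_limit(int64_t pos, int64_t count,
					bool largefile)
{
	int64_t max_size = largefile ? INT64_MAX : MODEL_MAX_NON_LFS;
	int64_t room;

	if (pos >= max_size)
		return -1;

	/* Trim the write so it does not cross the limit. */
	room = max_size - pos;
	return count < room ? count : room;
}
```

      In this model a 2200 MiB archive member, as in the recipe above, hits
      the -1 case partway through unpacking unless the large-file flag is
      set.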
      
      Link: https://lkml.kernel.org/r/20240323152934.3307391-1-jsperbeck@google.com
      Signed-off-by: John Sperbeck <jsperbeck@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/secretmem: fix GUP-fast succeeding on secretmem folios · 65291dcf
      David Hildenbrand authored
      folio_is_secretmem() currently relies on secretmem folios being LRU
      folios, to save some cycles.
      
      However, folios might reside in a folio batch without the LRU flag set, or
      temporarily have their LRU flag cleared.  Consequently, the LRU flag is
      unreliable for this purpose.
      
      In particular, this is the case when secretmem_fault() allocates a fresh
      page and calls filemap_add_folio()->folio_add_lru().  The folio might be
      added to the per-cpu folio batch and won't get the LRU flag set until the
      batch is drained using, e.g., lru_add_drain().
      
      Consequently, folio_is_secretmem() might not detect secretmem folios and
      GUP-fast can succeed in grabbing a secretmem folio, crashing the kernel
      when we would later try reading/writing to the folio, because the folio
      has been unmapped from the directmap.
      
      Fix it by removing that unreliable check.
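
      The false negative can be shown with a tiny userspace model.  struct
      model_folio and both helper functions are hypothetical and written only
      for illustration; they are not the kernel's folio_is_secretmem().  The
      first variant short-circuits on the LRU flag, the second drops that
      shortcut.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a folio. */
struct model_folio {
	bool lru_flag;          /* may be clear while queued in a per-cpu batch */
	bool secretmem_mapping; /* true if the folio belongs to secretmem */
};

/* Variant with the LRU shortcut: misses folios still sitting in a batch. */
static inline bool model_is_secretmem_old(const struct model_folio *folio)
{
	if (!folio->lru_flag)
		return false;
	return folio->secretmem_mapping;
}

/* Variant without the shortcut: decides on the mapping alone. */
static inline bool model_is_secretmem_fixed(const struct model_folio *folio)
{
	return folio->secretmem_mapping;
}
```

      For a freshly faulted secretmem folio whose LRU flag is not yet set,
      the old variant answers "not secretmem" and the fixed variant answers
      correctly.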
      
      Link: https://lkml.kernel.org/r/20240326143210.291116-2-david@redhat.com
      Fixes: 1507f512 ("mm: introduce memfd_secret system call to create "secret" memory areas")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reported-by: xingwei lee <xrivendell7@gmail.com>
      Reported-by: yue sun <samsun1006219@gmail.com>
      Closes: https://lore.kernel.org/lkml/CABOYnLyevJeravW=QrH0JUPYEcDN160aZFb7kwndm-J2rmz0HQ@mail.gmail.com/
      Debugged-by: Miklos Szeredi <miklos@szeredi.hu>
      Tested-by: Miklos Szeredi <mszeredi@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      65291dcf
    • Rafael J. Wysocki's avatar
      Merge branch 'acpi-thermal' · 6f824c9f
      Rafael J. Wysocki authored
      * acpi-thermal:
        ACPI: thermal: Register thermal zones without valid trip points
      6f824c9f
    • Linus Torvalds's avatar
      Merge tag '9p-for-6.9-rc3' of https://github.com/martinetd/linux · e8b0ccb2
      Linus Torvalds authored
      Pull minor 9p cleanups from Dominique Martinet:
      
       - kernel doc fix & removal of unused flag
      
       - fix some bogus debug statement for read/write
      
      * tag '9p-for-6.9-rc3' of https://github.com/martinetd/linux:
        9p: remove SLAB_MEM_SPREAD flag usage
        9p: Fix read/write debug statements to report server reply
        9p/trans_fd: remove Excess kernel-doc comment
      e8b0ccb2
    • Linus Torvalds's avatar
      Merge tag '6.9-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd · 405ac6a5
      Linus Torvalds authored
      Pull smb server fixes from Steve French:
       "Three fixes, all also for stable:
      
         - encryption fix
      
         - memory overrun fix
      
         - oplock break fix"
      
      * tag '6.9-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
        ksmbd: do not set SMB2_GLOBAL_CAP_ENCRYPTION for SMB 3.1.1
        ksmbd: validate payload size in ipc response
        ksmbd: don't send oplock break if rename fails
      405ac6a5
    • Linus Torvalds's avatar
      Merge tag 'vfs-6.9-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · fae02687
      Linus Torvalds authored
      Pull vfs fixes from Christian Brauner:
       "This contains a few small fixes. This comes with some delay because I
        wanted to wait on people running their reproducers and the Easter
        Holidays meant that those replies came in a little later than usual:
      
         - Fix handling of preventing writes to mounted block devices.
      
           Since the last kernel release, writes to mounted block devices
           can be prevented provided CONFIG_BLK_DEV_WRITE_MOUNTED isn't set
           and the block device is opened with restricted writes. When we
           switched to opening block devices as files we altered the
           mechanism by which we recognize when a block device has been
           opened with write restrictions.
      
           The detection logic assumed that only read-write mounted
           filesystems would apply write restrictions to their block devices
           from other openers. That of course is not true since it also makes
           sense to apply write restrictions for filesystems that are
           read-only.
      
           Fix the detection logic using an FMODE_* bit. We still have a few
           left since we freed up a couple a while ago. I also picked up a
           patch to free up four additional FMODE_* bits scheduled for the
           next merge window.
      
         - Fix counting the number of writers to a block device. This just
           changes the logic to be consistent.
      
         - Fix a bug in aio causing a NULL pointer dereference after we
           implemented batched processing in aio.
      
         - Finally, add the changes we discussed that allow block devices
           to be yielded early even though file closing itself is deferred.
      
           This also allows us to remove two holder operations to get and
           release the holder to align lifetime of file and holder of the
           block device"
      
      * tag 'vfs-6.9-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        aio: Fix null ptr deref in aio_complete() wakeup
        fs,block: yield devices early
        block: count BLK_OPEN_RESTRICT_WRITES openers
        block: handle BLK_OPEN_RESTRICT_WRITES correctly
      fae02687
    • Kent Overstreet's avatar
      aio: Fix null ptr deref in aio_complete() wakeup · caeb4b0a
      Kent Overstreet authored
      list_del_init_careful() needs to be the last access to the wait queue
      entry - it effectively unlocks access.
      
      Previously, finish_wait() would see the empty list head and skip taking
      the lock, and then we'd return - but the completion path would still
      attempt to do the wakeup after the task_struct pointer had been
      overwritten.
      
      Fixes: 71eb6b6b ("fs/aio: obey min_nr when doing wakeups")
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/linux-fsdevel/CAHTA-ubfwwB51A5Wg5M6H_rPEQK9pNf8FkAGH=vr=FEkyRrtqw@mail.gmail.com/
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
      Link: https://lore.kernel.org/stable/20240331215212.522544-1-kent.overstreet%40linux.dev
      Link: https://lore.kernel.org/r/20240331215212.522544-1-kent.overstreet@linux.dev
      Signed-off-by: Christian Brauner <brauner@kernel.org>
      caeb4b0a
    • Takashi Iwai's avatar
      Merge tag 'asoc-fix-v6.9-rc2' of... · 100c8542
      Takashi Iwai authored
      Merge tag 'asoc-fix-v6.9-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
      
      ASoC: Fixes for v6.9
      
      A relatively large set of fixes here, the biggest piece of it is a
      series correcting some problems with the delay reporting for Intel SOF
      cards but there's a bunch of other things.  Everything here is driver
      specific except for a fix in the core for an issue with sign extension
      handling volume controls.
      100c8542
    • Dave Airlie's avatar
      Merge tag 'drm-intel-fixes-2024-04-04' of... · 4c859574
      Dave Airlie authored
      Merge tag 'drm-intel-fixes-2024-04-04' of https://anongit.freedesktop.org/git/drm/drm-intel into drm-fixes
      
      Display fixes:
      - A few DisplayPort related fixes (Imre, Arun, Ankit, Ville)
      - eDP PSR fixes (Jouni)
      
      Core/GT fixes:
      - Remove some VM space restrictions on older platforms (Andi)
      - Disable automatic load CCS load balancing (Andi)
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/Zg7nSK5oTmWfKPPI@intel.com
      4c859574
    • Dave Airlie's avatar
      Merge tag 'drm-xe-fixes-2024-04-04' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes · a5b5ab33
      Dave Airlie authored
      - Stop using system_unbound_wq for preempt fences,
        as this can cause starvation when reaching more
        than max_active defined by workqueue
      - Fix saving unordered rebinding fences by attaching
        them as kernel fences to the vm's resv
      - Fix TLB invalidation fences completing out of order
      - Move rebind TLB invalidation to the ring ops to reduce
        the latency
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Lucas De Marchi <lucas.demarchi@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/tizan6wdpxu4ayudeikjglxdgzmnhdzj3li3z2pgkierjtozzw@lbfddeg43a7h
      a5b5ab33
    • Dave Airlie's avatar
      Merge tag 'drm-misc-fixes-2024-04-04' of... · 4cf09f17
      Dave Airlie authored
      Merge tag 'drm-misc-fixes-2024-04-04' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
      
      Short summary of fixes pull:
      
      display:
      - fix typos in kerneldoc
      
      nouveau:
      - uvmm: fix remap address calculation
      - minor cleanups
      
      panfrost:
      - fix power-transition timeouts
      
      prime:
      - unbreak dma-buf export for virt-gpu
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Thomas Zimmermann <tzimmermann@suse.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240404104813.GA27376@localhost.localdomain
      4cf09f17
    • Sean Christopherson's avatar
      x86/cpufeatures: Add CPUID_LNX_5 to track recently added Linux-defined word · 8cb4a9a8
      Sean Christopherson authored
      Add CPUID_LNX_5 to track cpufeatures' word 21, and add the appropriate
      compile-time assert in KVM to prevent direct lookups on the features in
      CPUID_LNX_5.  KVM uses X86_FEATURE_* flags to manage guest CPUID, and so
      must translate features that are scattered by Linux from the Linux-defined
      bit to the hardware-defined bit, i.e. should never try to directly access
      scattered features in guest CPUID.
      
      Opportunistically add NR_CPUID_WORDS to enum cpuid_leafs, along with a
      compile-time assert in KVM's CPUID infrastructure to ensure that future
      additions update cpuid_leafs along with NCAPINTS.
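
      The sentinel-plus-assert pattern described above can be sketched in
      standalone C11. This is an illustration under assumptions, not the
      kernel's definitions: the enum is cut down to three words, and every
      SKETCH_/sketch_ name is hypothetical.

```c
#include <assert.h>

/* Illustrative subset only; the real enum has many more words. */
enum sketch_cpuid_leafs {
	SKETCH_CPUID_1_EDX = 0,
	SKETCH_CPUID_LNX_1,
	SKETCH_CPUID_LNX_5,
	SKETCH_NR_CPUID_WORDS,  /* sentinel: must stay last */
};

/* Stand-in for the separately maintained word count. */
#define SKETCH_NCAPINTS 3

/* Build fails if a word is added to one definition but not the other. */
static_assert(SKETCH_NR_CPUID_WORDS == SKETCH_NCAPINTS,
	      "cpuid leaf enum and word count are out of sync");

/* Returns nonzero if the word index names a known feature word. */
int sketch_word_is_known(unsigned int word)
{
	return word < SKETCH_NR_CPUID_WORDS;
}
```

      Adding a new enumerator before the sentinel without bumping
      SKETCH_NCAPINTS makes the translation unit fail to compile, which is
      the kind of protection the commit message describes.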
      
      No functional change intended.
      
      Fixes: 7f274e60 ("x86/cpufeatures: Add new word for scattered features")
      Cc: Sandipan Das <sandipan.das@amd.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8cb4a9a8
  2. 04 Apr, 2024 12 commits
    • Linus Torvalds's avatar
      Merge tag 'net-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · c88b9b4c
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from netfilter, bluetooth and bpf.
      
        Fairly usual collection of driver and core fixes. The large selftest
        accompanying one of the fixes is also becoming a common occurrence.
      
        Current release - regressions:
      
         - ipv6: fix infinite recursion in fib6_dump_done()
      
         - net/rds: fix possible null-deref in newly added error path
      
        Current release - new code bugs:
      
         - net: do not consume a full cacheline for system_page_pool
      
         - bpf: fix bpf_arena-related file descriptor leaks in the verifier
      
         - drv: ice: fix freeing uninitialized pointers, fixing misuse of the
           newfangled __free() auto-cleanup
      
        Previous releases - regressions:
      
         - x86/bpf: fixes the BPF JIT with retbleed=stuff
      
         - xen-netfront: add missing skb_mark_for_recycle, fix page pool
           accounting leaks, revealed by recently added explicit warning
      
         - tcp: fix bind() regression for v6-only wildcard and v4-mapped-v6
           non-wildcard addresses
      
         - Bluetooth:
            - replace "hci_qca: Set BDA quirk bit if fwnode exists in DT" with
              better workarounds to un-break some buggy Qualcomm devices
            - set conn encrypted before conn establishes, fix re-connecting to
              some headsets which use slightly unusual sequence of msgs
      
         - mptcp:
            - prevent BPF accessing lowat from a subflow socket
            - don't account accept() of non-MPC client as fallback to TCP
      
         - drv: mana: fix Rx DMA datasize and skb_over_panic
      
         - drv: i40e: fix VF MAC filter removal
      
        Previous releases - always broken:
      
         - gro: various fixes related to UDP tunnels - netns crossing
           problems, incorrect checksum conversions, and incorrect packet
           transformations which may lead to panics
      
         - bpf: support deferring bpf_link dealloc to after RCU grace period
      
         - nf_tables:
            - release batch on table validation from abort path
            - release mutex after nft_gc_seq_end from abort path
            - flush pending destroy work before exit_net release
      
         - drv: r8169: skip DASH fw status checks when DASH is disabled"
      
      * tag 'net-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
        netfilter: validate user input for expected length
        net/sched: act_skbmod: prevent kernel-infoleak
        net: usb: ax88179_178a: avoid the interface always configured as random address
        net: dsa: sja1105: Fix parameters order in sja1110_pcs_mdio_write_c45()
        net: ravb: Always update error counters
        net: ravb: Always process TX descriptor ring
        netfilter: nf_tables: discard table flag update with pending basechain deletion
        netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
        netfilter: nf_tables: reject new basechain after table flag update
        netfilter: nf_tables: flush pending destroy work before exit_net release
        netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
        netfilter: nf_tables: release batch on table validation from abort path
        Revert "tg3: Remove residual error handling in tg3_suspend"
        tg3: Remove residual error handling in tg3_suspend
        net: mana: Fix Rx DMA datasize and skb_over_panic
        net/sched: fix lockdep splat in qdisc_tree_reduce_backlog()
        net: phy: micrel: lan8814: Fix when enabling/disabling 1-step timestamping
        net: stmmac: fix rx queue priority assignment
        net: txgbe: fix i2c dev name cannot match clkdev
        net: fec: Set mac_managed_pm during probe
        ...
      c88b9b4c
    • Linus Torvalds's avatar
      Merge tag 'bcachefs-2024-04-03' of https://evilpiepirate.org/git/bcachefs · ec25bd8d
      Linus Torvalds authored
      Pull bcachefs repair code from Kent Overstreet:
       "A couple more small fixes, and new repair code.
      
        We can now automatically recover from arbitrary corrupted interior
        btree nodes by scanning, and we can reconstruct metadata as needed to
        bring a filesystem back into a working, consistent, read-write state
        and preserve access to whatever wasn't corrupted.
      
        Meaning - you can blow away all metadata except for extents and
        dirents leaf nodes, and repair will reconstruct everything else and
        give you your data, and under the correct paths. If inodes are missing
        i_size will be slightly off and permissions/ownership/timestamps will
        be gone, and we do still need the snapshots btree if snapshots were in
        use - in the future we'll be able to guess the snapshot tree structure
        in some situations.
      
        IOW - aside from shaking out remaining bugs (fuzz testing is still
        coming), repair code should be complete and if repair ever doesn't
        work that's the highest priority bug that I want to know about
        immediately.
      
        This patchset was kindly tested by a user from India who accidentally
        wiped one drive out of a three drive filesystem with no replication on
        the family computer - it took a couple weeks but we got everything
        important back"
      
      * tag 'bcachefs-2024-04-03' of https://evilpiepirate.org/git/bcachefs:
        bcachefs: reconstruct_inode()
        bcachefs: Subvolume reconstruction
        bcachefs: Check for extents that point to same space
        bcachefs: Reconstruct missing snapshot nodes
        bcachefs: Flag btrees with missing data
        bcachefs: Topology repair now uses nodes found by scanning to fill holes
        bcachefs: Repair pass for scanning for btree nodes
        bcachefs: Don't skip fake btree roots in fsck
        bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake()
        bcachefs: Etyzinger cleanups
        bcachefs: bch2_shoot_down_journal_keys()
        bcachefs: Clear recovery_passes_required as they complete without errors
        bcachefs: ratelimit informational fsck errors
        bcachefs: Check for bad needs_discard before doing discard
        bcachefs: Improve bch2_btree_update_to_text()
        mean_and_variance: Drop always failing tests
        bcachefs: fix nocow lock deadlock
        bcachefs: BCH_WATERMARK_interior_updates
        bcachefs: Fix btree node reserve
      ec25bd8d
    • Stefan O'Rear's avatar
      riscv: process: Fix kernel gp leakage · d14fa1fc
      Stefan O'Rear authored
      childregs represents the registers which are active for the new thread
      in user context. For a kernel thread, childregs->gp is never used since
      the kernel gp is not touched by switch_to. For a user mode helper, the
      gp value can be observed in user space after execve or possibly by other
      means.
      
      [From the email thread]
      
      The /* Kernel thread */ comment is somewhat inaccurate in that it is also used
      for user_mode_helper threads, which exec a user process, e.g. /sbin/init or
      when /proc/sys/kernel/core_pattern is a pipe. Such threads do not have
      PF_KTHREAD set and are valid targets for ptrace etc. even before they exec.
      
      childregs is the *user* context during syscall execution and it is observable
      from userspace in at least five ways:
      
      1. kernel_execve does not currently clear integer registers, so the starting
         register state for PID 1 and other user processes started by the kernel has
         sp = user stack, gp = kernel __global_pointer$, all other integer registers
         zeroed by the memset in the patch comment.
      
         This is a bug in its own right, but I'm unwilling to bet that it is the only
         way to exploit the issue addressed by this patch.
      
      2. ptrace(PTRACE_GETREGSET): you can PTRACE_ATTACH to a user_mode_helper thread
         before it execs, but ptrace requires SIGSTOP to be delivered which can only
         happen at user/kernel boundaries.
      
      3. /proc/*/task/*/syscall: this is perfectly happy to read pt_regs for
         user_mode_helpers before the exec completes, but gp is not one of the
         registers it returns.
      
      4. PERF_SAMPLE_REGS_USER: LOCKDOWN_PERF normally prevents access to kernel
         addresses via PERF_SAMPLE_REGS_INTR, but due to this bug kernel addresses
         are also exposed via PERF_SAMPLE_REGS_USER which is permitted under
         LOCKDOWN_PERF. I have not attempted to write exploit code.
      
      5. Much of the tracing infrastructure allows access to user registers. I have
         not attempted to determine which forms of tracing allow access to user
         registers without already allowing access to kernel registers.
      
      Fixes: 7db91e57 ("RISC-V: Task implementation")
      Cc: stable@vger.kernel.org
      Signed-off-by: Stefan O'Rear <sorear@fastmail.com>
      Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
      Link: https://lore.kernel.org/r/20240327061258.2370291-1-sorear@fastmail.com
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
      d14fa1fc
    • Alexandre Ghiti's avatar
      riscv: Disable preemption when using patch_map() · a370c241
      Alexandre Ghiti authored
      patch_map() uses fixmap mappings to circumvent the non-writability of
      the kernel text mapping.
      
      The __set_fixmap() function only flushes the current CPU's TLB; it does
      not emit an IPI, so we must make sure that while we use a fixmap mapping,
      the current task is not migrated to another CPU, which could miss the
      newly introduced fixmap mapping.
      
      So in order to avoid any task migration, disable preemption.
      
      Reported-by: Andrea Parri <andrea@rivosinc.com>
      Closes: https://lore.kernel.org/all/ZcS+GAaM25LXsBOl@andrea/
      Reported-by: Andy Chiu <andy.chiu@sifive.com>
      Closes: https://lore.kernel.org/linux-riscv/CABgGipUMz3Sffu-CkmeUB1dKVwVQ73+7=sgC45-m0AE9RCjOZg@mail.gmail.com/
      Fixes: cad539ba ("riscv: implement a memset like function for text")
      Fixes: 0ff7c3b3 ("riscv: Use text_mutex instead of patch_lock")
      Co-developed-by: Andy Chiu <andy.chiu@sifive.com>
      Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
      Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
      Acked-by: Puranjay Mohan <puranjay12@gmail.com>
      Link: https://lore.kernel.org/r/20240326203017.310422-3-alexghiti@rivosinc.com
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
      a370c241
    • Alexandre Ghiti's avatar
      riscv: Fix warning by declaring arch_cpu_idle() as noinstr · 8a48ea87
      Alexandre Ghiti authored
      The following warning appears when using ftrace:
      
      [89855.443413] RCU not on for: arch_cpu_idle+0x0/0x1c
      [89855.445640] WARNING: CPU: 5 PID: 0 at include/linux/trace_recursion.h:162 arch_ftrace_ops_list_func+0x208/0x228
      [89855.445824] Modules linked in: xt_conntrack(E) nft_chain_nat(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) xt_addrtype(E) nft_compat(E) nf_tables(E) nfnetlink(E) br_netfilter(E) cfg80211(E) nls_iso8859_1(E) ofpart(E) redboot(E) cmdlinepart(E) cfi_cmdset_0001(E) virtio_net(E) cfi_probe(E) cfi_util(E) 9pnet_virtio(E) gen_probe(E) net_failover(E) virtio_rng(E) failover(E) 9pnet(E) physmap(E) map_funcs(E) chipreg(E) mtd(E) uio_pdrv_genirq(E) uio(E) dm_multipath(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) drm(E) efi_pstore(E) backlight(E) ip_tables(E) x_tables(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) xor(E) async_tx(E) raid6_pq(E) raid1(E) raid0(E) virtio_blk(E)
      [89855.451563] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G            E      6.8.0-rc6ubuntu-defconfig #2
      [89855.451726] Hardware name: riscv-virtio,qemu (DT)
      [89855.451899] epc : arch_ftrace_ops_list_func+0x208/0x228
      [89855.452016]  ra : arch_ftrace_ops_list_func+0x208/0x228
      [89855.452119] epc : ffffffff8016b216 ra : ffffffff8016b216 sp : ffffaf808090fdb0
      [89855.452171]  gp : ffffffff827c7680 tp : ffffaf808089ad40 t0 : ffffffff800c0dd8
      [89855.452216]  t1 : 0000000000000001 t2 : 0000000000000000 s0 : ffffaf808090fe30
      [89855.452306]  s1 : 0000000000000000 a0 : 0000000000000026 a1 : ffffffff82cd6ac8
      [89855.452423]  a2 : ffffffff800458c8 a3 : ffffaf80b1870640 a4 : 0000000000000000
      [89855.452646]  a5 : 0000000000000000 a6 : 00000000ffffffff a7 : ffffffffffffffff
      [89855.452698]  s2 : ffffffff82766872 s3 : ffffffff80004caa s4 : ffffffff80ebea90
      [89855.452743]  s5 : ffffaf808089bd40 s6 : 8000000a00006e00 s7 : 0000000000000008
      [89855.452787]  s8 : 0000000000002000 s9 : 0000000080043700 s10: 0000000000000000
      [89855.452831]  s11: 0000000000000000 t3 : 0000000000100000 t4 : 0000000000000064
      [89855.452874]  t5 : 000000000000000c t6 : ffffaf80b182dbfc
      [89855.452929] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
      [89855.453053] [<ffffffff8016b216>] arch_ftrace_ops_list_func+0x208/0x228
      [89855.453191] [<ffffffff8000e082>] ftrace_call+0x8/0x22
      [89855.453265] [<ffffffff800a149c>] do_idle+0x24c/0x2ca
      [89855.453357] [<ffffffff8000da54>] return_to_handler+0x0/0x26
      [89855.453429] [<ffffffff8000b716>] smp_callin+0x92/0xb6
      [89855.453785] ---[ end trace 0000000000000000 ]---
      
      To fix this, mark arch_cpu_idle() as noinstr, like it is done in commit
      a9cbc1b4 ("s390/idle: mark arch_cpu_idle() noinstr").
      
      Reported-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
      Closes: https://lore.kernel.org/linux-riscv/51f21b87-ebed-4411-afbc-c00d3dea2bab@yadro.com/
      Fixes: cfbc4f81 ("riscv: Select ARCH_WANTS_NO_INSTR")
      Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
      Reviewed-by: Andy Chiu <andy.chiu@sifive.com>
      Tested-by: Andy Chiu <andy.chiu@sifive.com>
      Acked-by: Puranjay Mohan <puranjay12@gmail.com>
      Link: https://lore.kernel.org/r/20240326203017.310422-2-alexghiti@rivosinc.com
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
      8a48ea87
    • Andreas Schwab's avatar
      riscv: use KERN_INFO in do_trap · dd33e5dc
      Andreas Schwab authored
      Print the instruction dump with info instead of emergency level.  The
      unhandled signal message is for informational purposes only.
      
      Fixes: b8a03a63 ("riscv: add userland instruction dump to RISC-V splats")
      Signed-off-by: Andreas Schwab <schwab@suse.de>
      Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
      Reviewed-by: Atish Patra <atishp@rivosinc.com>
      Reviewed-by: Yunhui Cui <cuiyunhui@bytedance.com>
      Link: https://lore.kernel.org/r/mvmy1aegrhm.fsf@suse.de
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
      dd33e5dc
    • Chaitanya Kumar Borah's avatar
      ASoC: SOF: Core: Add remove_late() to sof_init_environment failure path · 90f8917e
      Chaitanya Kumar Borah authored
      In cases where the sof driver is unable to find the firmware and/or
      topology file [1], it exits without releasing the i915 runtime
      pm wakeref [2]. This results in dmesg warnings [3] during
      suspend/resume or driver unbind. Add remove_late() to the failure path
      of sof_init_environment so that the i915 wakeref is released appropriately.
      
      [1]
      
      [    8.990366] sof-audio-pci-intel-mtl 0000:00:1f.3: SOF firmware and/or topology file not found.
      [    8.990396] sof-audio-pci-intel-mtl 0000:00:1f.3: Supported default profiles
      [    8.990398] sof-audio-pci-intel-mtl 0000:00:1f.3: - ipc type 1 (Requested):
      [    8.990399] sof-audio-pci-intel-mtl 0000:00:1f.3:  Firmware file: intel/sof-ipc4/mtl/sof-mtl.ri
      [    8.990401] sof-audio-pci-intel-mtl 0000:00:1f.3:  Topology file: intel/sof-ace-tplg/sof-mtl-rt711-2ch.tplg
      [    8.990402] sof-audio-pci-intel-mtl 0000:00:1f.3: Check if you have 'sof-firmware' package installed.
      [    8.990403] sof-audio-pci-intel-mtl 0000:00:1f.3: Optionally it can be manually downloaded from:
      [    8.990404] sof-audio-pci-intel-mtl 0000:00:1f.3:    https://github.com/thesofproject/sof-bin/
      [    8.999088] sof-audio-pci-intel-mtl 0000:00:1f.3: error: sof_probe_work failed err: -2
      
      [2]
      
      ref_tracker: 0000:00:02.0@ffff9b8511b6a378 has 1/5 users at
           track_intel_runtime_pm_wakeref.part.0+0x36/0x70 [i915]
           __intel_runtime_pm_get+0x51/0xb0 [i915]
           intel_runtime_pm_get+0x17/0x20 [i915]
           intel_display_power_get+0x2f/0x70 [i915]
           i915_audio_component_get_power+0x23/0x120 [i915]
           snd_hdac_display_power+0x89/0x130 [snd_hda_core]
           hda_codec_i915_init+0x3f/0x50 [snd_sof_intel_hda]
           hda_dsp_probe_early+0x170/0x250 [snd_sof_intel_hda_common]
           snd_sof_device_probe+0x224/0x320 [snd_sof]
           sof_pci_probe+0x15b/0x220 [snd_sof_pci]
           hda_pci_intel_probe+0x30/0x70 [snd_sof_intel_hda_common]
           local_pci_probe+0x4c/0xb0
           pci_device_probe+0xcc/0x250
           really_probe+0x18e/0x420
           __driver_probe_device+0x7e/0x170
           driver_probe_device+0x23/0xa0
      
      [3]
      [  484.105070] ------------[ cut here ]------------
      [  484.108238] thunderbolt 0000:00:0d.2: PM: pci_pm_suspend_late+0x0/0x50 returned 0 after 0 usecs
      [  484.117106] i915 0000:00:02.0: i915 raw-wakerefs=1 wakelocks=1 on cleanup
      [  484.792005] WARNING: CPU: 2 PID: 2405 at drivers/gpu/drm/i915/intel_runtime_pm.c:444 intel_runtime_pm_driver_release+0x6c/0x80
      
      Tested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
      Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
      Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
      Signed-off-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
      Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
      Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
      Link: https://github.com/thesofproject/linux/pull/4878
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Link: https://msgid.link/r/20240404184813.134566-1-pierre-louis.bossart@linux.intel.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
      90f8917e
    • Jakub Kicinski's avatar
      Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 1cfa2f10
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2024-04-04
      
      We've added 7 non-merge commits during the last 5 day(s) which contain
      a total of 9 files changed, 75 insertions(+), 24 deletions(-).
      
      The main changes are:
      
      1) Fix x86 BPF JIT under retbleed=stuff which causes kernel panics due to
         incorrect destination IP calculation and incorrect IP for relocations,
         from Uros Bizjak and Joan Bruguera Micó.
      
      2) Fix BPF arena file descriptor leaks in the verifier,
         from Anton Protopopov.
      
      3) Defer bpf_link deallocation to after RCU grace period as currently
         running multi-{kprobes,uprobes} programs might still access cookie
         information from the link, from Andrii Nakryiko.
      
      4) Fix a BPF sockmap lock inversion deadlock in map_delete_elem reported
         by syzkaller, from Jakub Sitnicki.
      
      5) Fix resolve_btfids build with musl libc due to missing linux/types.h
         include, from Natanael Copa.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf, sockmap: Prevent lock inversion deadlock in map delete elem
        x86/bpf: Fix IP for relocating call depth accounting
        x86/bpf: Fix IP after emitting call depth accounting
        bpf: fix possible file descriptor leaks in verifier
        tools/resolve_btfids: fix build with musl libc
        bpf: support deferring bpf_link dealloc to after RCU grace period
        bpf: put uprobe link's path and task in release callback
      ====================
      
      Link: https://lore.kernel.org/r/20240404183258.4401-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1cfa2f10
    • Vincent Guittot's avatar
      PM: EM: fix wrong utilization estimation in em_cpu_energy() · 8130b05c
      Vincent Guittot authored
      Commit 1b600da5 ("PM: EM: Optimize em_cpu_energy() and remove division")
      added map_util_perf() back into the em_cpu_energy() computation, although
      it had been removed with the rework of the scheduler/cpufreq interface.
      This is wrong because sugov_effective_cpu_perf() already takes care of
      mapping the utilization to a performance level.
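
      The cost of mapping twice can be seen with a little arithmetic. The
      sketch below assumes the mapping adds 25% headroom, computed as
      util + (util >> 2); that formula and both sketch_ function names are
      assumptions made for illustration and are not taken from this commit.

```c
/*
 * Assumed mapping: add 25% headroom to a utilization value.
 * Hypothetical stand-in, not the kernel helper itself.
 */
unsigned long sketch_map_util_perf(unsigned long util)
{
	return util + (util >> 2);
}

/* What happens when an already mapped value is mapped a second time. */
unsigned long sketch_double_mapped(unsigned long util)
{
	return sketch_map_util_perf(sketch_map_util_perf(util));
}
```

      Under that assumption a utilization of 400 maps to 500 once, but to
      625 when the mapping is applied a second time, so an energy estimate
      built on the doubly mapped value would be inflated.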
      
      Fixes: 1b600da5 ("PM: EM: Optimize em_cpu_energy() and remove division")
      Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
      Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      8130b05c
    • Kent Gibson's avatar
      gpio: cdev: fix missed label sanitizing in debounce_setup() · 83092341
      Kent Gibson authored
      When adding sanitization of the label, the path through
      edge_detector_setup() that leads to debounce_setup() was overlooked.
      A request taking this path does not allocate a new label and the
      request label is freed twice when the request is released, resulting
      in memory corruption.
      
      Add label sanitization to debounce_setup().
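
The ownership rule behind this fix can be sketched in userspace C: the label handed to the interrupt must be a separate allocation, so that freeing it on release never frees the request's own label a second time. The helper name sanitized_label_copy() and the choice of which character gets replaced are assumptions made for illustration; this is not the driver's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated copy of 'label' in which
 * every occurrence of 'from' is replaced by 'to'. The caller owns the copy
 * and frees it exactly once; the original label is left untouched.
 */
static char *sanitized_label_copy(const char *label, char from, char to)
{
	assert(label != NULL);

	size_t n = strlen(label) + 1;
	char *copy = malloc(n);

	if (!copy)
		return NULL;

	memcpy(copy, label, n);
	for (char *p = copy; *p; p++) {
		if (*p == from)
			*p = to;
	}
	return copy;
}
```

Because the copy and the original are distinct pointers, a release path that frees both frees each allocation only once.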
      
      Cc: stable@vger.kernel.org
      Fixes: b3449087 ("gpio: cdev: sanitize the label before requesting the interrupt")
      Signed-off-by: Kent Gibson <warthog618@gmail.com>
      [Bartosz: rebased on top of the fix for empty GPIO labels]
      Co-developed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
      Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
      83092341
    • Eric Dumazet's avatar
      netfilter: validate user input for expected length · 0c83842d
      Eric Dumazet authored
      I got multiple syzbot reports showing old bugs exposed
      by BPF after commit 20f2505f ("bpf: Try to avoid kzalloc
      in cgroup/{s,g}etsockopt")
      
      setsockopt() @optlen argument should be taken into account
      before copying data.
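
A minimal userspace sketch of the check described above, assuming the handler copies a fixed-size header out of the caller's buffer. struct replace_hdr, copy_checked() and do_replace_model() are hypothetical names for illustration; memcpy() stands in for the kernel's copy helper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size header that a setsockopt() handler expects. */
struct replace_hdr {
	char name[32];
	unsigned int num_entries;
	unsigned int size;
};

/*
 * Copy 'ksize' bytes from the caller's buffer only if the caller really
 * supplied at least that many bytes; otherwise reject the request.
 */
static int copy_checked(void *dst, size_t ksize,
			const void *optval, size_t optlen)
{
	assert(dst != NULL && optval != NULL);

	if (optlen < ksize)
		return -EINVAL;	/* buffer shorter than the expected header */

	memcpy(dst, optval, ksize);
	return 0;
}

/* Model of a handler that validates optlen before touching the data. */
static int do_replace_model(const void *optval, size_t optlen,
			    struct replace_hdr *out)
{
	return copy_checked(out, sizeof(*out), optval, optlen);
}
```

In the report below, the source buffer was a 1-byte allocation while 96 bytes were read from it; with a length check of this kind such a request is rejected before any copy takes place.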
      
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
       BUG: KASAN: slab-out-of-bounds in do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline]
       BUG: KASAN: slab-out-of-bounds in do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627
      Read of size 96 at addr ffff88802cd73da0 by task syz-executor.4/7238
      
      CPU: 1 PID: 7238 Comm: syz-executor.4 Not tainted 6.9.0-rc2-next-20240403-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
        print_address_description mm/kasan/report.c:377 [inline]
        print_report+0x169/0x550 mm/kasan/report.c:488
        kasan_report+0x143/0x180 mm/kasan/report.c:601
        kasan_check_range+0x282/0x290 mm/kasan/generic.c:189
        __asan_memcpy+0x29/0x70 mm/kasan/shadow.c:105
        copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
        copy_from_sockptr include/linux/sockptr.h:55 [inline]
        do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline]
        do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627
        nf_setsockopt+0x295/0x2c0 net/netfilter/nf_sockopt.c:101
        do_sock_setsockopt+0x3af/0x720 net/socket.c:2311
        __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xfb/0x240
       entry_SYSCALL_64_after_hwframe+0x72/0x7a
      RIP: 0033:0x7fd22067dde9
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fd21f9ff0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 00007fd2207abf80 RCX: 00007fd22067dde9
      RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000003
      RBP: 00007fd2206ca47a R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000020000880 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000000b R14: 00007fd2207abf80 R15: 00007ffd2d0170d8
       </TASK>
      
      Allocated by task 7238:
        kasan_save_stack mm/kasan/common.c:47 [inline]
        kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
        poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
        __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
        kasan_kmalloc include/linux/kasan.h:211 [inline]
        __do_kmalloc_node mm/slub.c:4069 [inline]
        __kmalloc_noprof+0x200/0x410 mm/slub.c:4082
        kmalloc_noprof include/linux/slab.h:664 [inline]
        __cgroup_bpf_run_filter_setsockopt+0xd47/0x1050 kernel/bpf/cgroup.c:1869
        do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293
        __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xfb/0x240
       entry_SYSCALL_64_after_hwframe+0x72/0x7a
      
      The buggy address belongs to the object at ffff88802cd73da0
       which belongs to the cache kmalloc-8 of size 8
      The buggy address is located 0 bytes inside of
       allocated 1-byte region [ffff88802cd73da0, ffff88802cd73da1)
      
      The buggy address belongs to the physical page:
      page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88802cd73020 pfn:0x2cd73
      flags: 0xfff80000000000(node=0|zone=1|lastcpupid=0xfff)
      page_type: 0xffffefff(slab)
      raw: 00fff80000000000 ffff888015041280 dead000000000100 dead000000000122
      raw: ffff88802cd73020 000000008080007f 00000001ffffefff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 5103, tgid 2119833701 (syz-executor.4), ts 5103, free_ts 70804600828
        set_page_owner include/linux/page_owner.h:32 [inline]
        post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1490
        prep_new_page mm/page_alloc.c:1498 [inline]
        get_page_from_freelist+0x2e7e/0x2f40 mm/page_alloc.c:3454
        __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4712
        __alloc_pages_node_noprof include/linux/gfp.h:244 [inline]
        alloc_pages_node_noprof include/linux/gfp.h:271 [inline]
        alloc_slab_page+0x5f/0x120 mm/slub.c:2249
        allocate_slab+0x5a/0x2e0 mm/slub.c:2412
        new_slab mm/slub.c:2465 [inline]
        ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3615
        __slab_alloc+0x58/0xa0 mm/slub.c:3705
        __slab_alloc_node mm/slub.c:3758 [inline]
        slab_alloc_node mm/slub.c:3936 [inline]
        __do_kmalloc_node mm/slub.c:4068 [inline]
        kmalloc_node_track_caller_noprof+0x286/0x450 mm/slub.c:4089
        kstrdup+0x3a/0x80 mm/util.c:62
        device_rename+0xb5/0x1b0 drivers/base/core.c:4558
        dev_change_name+0x275/0x860 net/core/dev.c:1232
        do_setlink+0xa4b/0x41f0 net/core/rtnetlink.c:2864
        __rtnl_newlink net/core/rtnetlink.c:3680 [inline]
        rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3727
        rtnetlink_rcv_msg+0x89b/0x10d0 net/core/rtnetlink.c:6594
        netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2559
        netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
        netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
      page last free pid 5146 tgid 5146 stack trace:
        reset_page_owner include/linux/page_owner.h:25 [inline]
        free_pages_prepare mm/page_alloc.c:1110 [inline]
        free_unref_page+0xd3c/0xec0 mm/page_alloc.c:2617
        discard_slab mm/slub.c:2511 [inline]
        __put_partials+0xeb/0x130 mm/slub.c:2980
        put_cpu_partial+0x17c/0x250 mm/slub.c:3055
        __slab_free+0x2ea/0x3d0 mm/slub.c:4254
        qlink_free mm/kasan/quarantine.c:163 [inline]
        qlist_free_all+0x9e/0x140 mm/kasan/quarantine.c:179
        kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
        __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:322
        kasan_slab_alloc include/linux/kasan.h:201 [inline]
        slab_post_alloc_hook mm/slub.c:3888 [inline]
        slab_alloc_node mm/slub.c:3948 [inline]
        __do_kmalloc_node mm/slub.c:4068 [inline]
        __kmalloc_node_noprof+0x1d7/0x450 mm/slub.c:4076
        kmalloc_node_noprof include/linux/slab.h:681 [inline]
        kvmalloc_node_noprof+0x72/0x190 mm/util.c:634
        bucket_table_alloc lib/rhashtable.c:186 [inline]
        rhashtable_rehash_alloc+0x9e/0x290 lib/rhashtable.c:367
        rht_deferred_worker+0x4e1/0x2440 lib/rhashtable.c:427
        process_one_work kernel/workqueue.c:3218 [inline]
        process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299
        worker_thread+0x86d/0xd70 kernel/workqueue.c:3380
        kthread+0x2f0/0x390 kernel/kthread.c:388
        ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
        ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
      
      Memory state around the buggy address:
       ffff88802cd73c80: 07 fc fc fc 05 fc fc fc 05 fc fc fc fa fc fc fc
       ffff88802cd73d00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
      >ffff88802cd73d80: fa fc fc fc 01 fc fc fc fa fc fc fc fa fc fc fc
                                     ^
       ffff88802cd73e00: fa fc fc fc fa fc fc fc 05 fc fc fc 07 fc fc fc
       ffff88802cd73e80: 07 fc fc fc 07 fc fc fc 07 fc fc fc 07 fc fc fc
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Link: https://lore.kernel.org/r/20240404122051.2303764-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0c83842d
    • Jakub Kicinski's avatar
      Merge tag 'nf-24-04-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · d432f7bd
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      Patch #1 unlike early commit path stage which triggers a call to abort,
               an explicit release of the batch is required on abort, otherwise
               mutex is released and commit_list remains in place.
      
      Patch #2 release mutex after nft_gc_seq_end() in commit path, otherwise
               async GC worker could collect expired objects.
      
      Patch #3 flush pending destroy work in module removal path, otherwise UaF
               is possible.
      
      Patch #4 and #6 restrict the table dormant flag with basechain updates
               to fix state inconsistency in the hook registration.
      
      Patch #5 adds missing RCU read side lock to flowtable type to avoid races
               with module removal.
      
      * tag 'nf-24-04-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: discard table flag update with pending basechain deletion
        netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
        netfilter: nf_tables: reject new basechain after table flag update
        netfilter: nf_tables: flush pending destroy work before exit_net release
        netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
        netfilter: nf_tables: release batch on table validation from abort path
      ====================
      
      Link: https://lore.kernel.org/r/20240404104334.1627-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d432f7bd