- 23 May, 2024 2 commits
-
-
Alexandre Ghiti authored
Commit c97bf629 ("riscv: Fix text patching when IPI are used") converted ftrace_make_nop() to use patch_insn_write() which does not emit any icache flush relying entirely on __ftrace_modify_code() to do that. But we missed that ftrace_make_nop() was called very early directly when converting mcount calls into nops (actually on riscv it converts 2B nops emitted by the compiler into 4B nops). This caused crashes on multiple HW as reported by Conor and Björn since the booting core could have half-patched instructions in its icache which would trigger an illegal instruction trap: fix this by emitting a local flush icache when early patching nops. Fixes: c97bf629 ("riscv: Fix text patching when IPI are used") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reported-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20240523115134.70380-1-alexghiti@rivosinc.comSigned-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Palmer Dabbelt authored
There was a semantic conflict between 21a8f8a0 ("irqchip: Add RISC-V incoming MSI controller early driver") and dc892fb4 ("riscv: Use IPIs for remote cache/TLB flushes by default") due to an API change. This manifests as a build failure post-merge. Reported-by: Tomasz Jeznach <tjeznach@rivosinc.com> Link: https://lore.kernel.org/all/mhng-10b71228-cf3e-42ca-9abf-5464b15093f1@palmer-ri-x1c9/ Fixes: 0bfbc914 ("Merge tag 'riscv-for-linus-6.10-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux") Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240522184953.28531-3-palmer@rivosinc.comSigned-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
- 22 May, 2024 25 commits
-
-
Palmer Dabbelt authored
Charlie Jenkins <charlie@rivosinc.com> says: This series contains two minor fixes for the extension parsing in cpufeature.c. Some T-Head boards without vector 1.0 support report "v" in the isa string in their DT which will cause the kernel to run vector code. The code to blacklist "v" from these boards was doing so by using riscv_cached_mvendorid() which has not been populated at the time of extension parsing. This fix instead greedily reads the mvendorid CSR of the boot hart to determine if the cpu is from T-Head. The other fix is for an incorrect indexing bug. riscv extensions sometimes imply other extensions. When adding these "subset" extensions to the hardware capabilities array, they need to be checked if they are valid. The current code only checks if the extension that is including other extensions is valid and not the subset extensions. These patches were previously included in: https://lore.kernel.org/lkml/20240420-dev-charlie-support_thead_vector_6_9-v3-0-67cff4271d1d@rivosinc.com/ * b4-shazam-merge: riscv: cpufeature: Fix extension subset checking riscv: cpufeature: Fix thead vector hwcap removal Link: https://lore.kernel.org/r/20240502-cpufeature_fixes-v4-0-b3d1a088722d@rivosinc.comSigned-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Charlie Jenkins authored
Add two tests to check vector save/restore when a signal is received during a vector routine. One test ensures that a value is not clobbered during signal handling. The other verifies that vector registers modified in the signal handler are properly reflected when the signal handling is complete. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Andy Chiu <andy.chiu@sifive.com> Tested-by: Andy Chiu <andy.chiu@sifive.com> Link: https://lore.kernel.org/r/20240403-vector_sigreturn_tests-v1-1-2e68b7a3b8d7@rivosinc.comSigned-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Kefeng Wang authored
The access_error() of vma already checked under per-VMA lock, if it is a bad access, directly handle error, no need to retry with mmap_lock again. Since the page faut is handled under per-VMA lock, count it as a vma lock event with VMA_LOCK_SUCCESS. Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240403083805.1818160-6-wangkefeng.wang@huawei.comSigned-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Xiao Wang authored
The bytes copy for unaligned head would cover at most SZREG-1 bytes, so it's better to set the threshold as >= (SZREG-1 + word_copy stride size) which equals to 9*SZREG-1. Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240313091929.4029960-1-xiao.w.wang@intel.comSigned-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Xiao Wang authored
When the dst buffer pointer points to the last accessible aligned addr, we could still run another iteration of unrolled copy. Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240313103334.4036554-1-xiao.w.wang@intel.comSigned-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Xingyou Chen authored
Signed-off-by: Xingyou Chen <rockrush@rockwork.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20240317055556.9449-1-rockrush@rockwork.orgSigned-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Zhao Ke authored
The declaration of set_cpu_online() takes a bool value. So replace int here to make it consistent with the declaration. Signed-off-by: Zhao Ke <ke.zhao@shingroup.cn> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240318065404.123668-1-ke.zhao@shingroup.cnSigned-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Charlie Jenkins authored
The cbo and which-cpu hwprobe selftests leave their artifacts in the kernel tree and end up being tracked by git. Add the binaries to the hwprobe selftest .gitignore so this no longer happens. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Fixes: a29e2a48 ("RISC-V: selftests: Add CBO tests") Fixes: ef7d6abb ("RISC-V: selftests: Add which-cpus hwprobe test") Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240425-gitignore_hwprobe_artifacts-v1-1-dfc5a20da469@rivosinc.comSigned-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Palmer Dabbelt authored
Nam Cao <namcao@linutronix.de> says: The debug_pagealloc feature is not functional on RISCV. With this feature enabled (CONFIG_DEBUG_PAGEALLOC=y and debug_pagealloc=on), kernel crashes early during boot. QEMU command that can reproduce this problem: qemu-system-riscv64 -machine virt \ -kernel Image \ -append "console=ttyS0 root=/dev/vda debug_pagealloc=on" \ -nographic \ -drive "file=root.img,format=raw,id=hd0" \ -device virtio-blk-device,drive=hd0 \ -m 4G \ This series makes debug_pagealloc functional. * b4-shazam-merge: riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled Link: https://lore.kernel.org/r/cover.1715750938.git.namcao@linutronix.deSigned-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Matthew Bystrin authored
If the load access fault occures in a leaf function (with CONFIG_FRAME_POINTER=y), when wrong stack trace will be displayed: [<ffffffff804853c2>] regmap_mmio_read32le+0xe/0x1c ---[ end trace 0000000000000000 ]--- Registers dump: ra 0xffffffff80485758 <regmap_mmio_read+36> sp 0xffffffc80200b9a0 fp 0xffffffc80200b9b0 pc 0xffffffff804853ba <regmap_mmio_read32le+6> Stack dump: 0xffffffc80200b9a0: 0xffffffc80200b9e0 0xffffffc80200b9e0 0xffffffc80200b9b0: 0xffffffff8116d7e8 0x0000000000000100 0xffffffc80200b9c0: 0xffffffd8055b9400 0xffffffd8055b9400 0xffffffc80200b9d0: 0xffffffc80200b9f0 0xffffffff8047c526 0xffffffc80200b9e0: 0xffffffc80200ba30 0xffffffff8047fe9a The assembler dump of the function preambula: add sp,sp,-16 sd s0,8(sp) add s0,sp,16 In the fist stack frame, where ra is not stored on the stack we can observe: 0(sp) 8(sp) .---------------------------------------------. sp->| frame->fp | frame->ra (saved fp) | |---------------------------------------------| fp->| .... | .... | |---------------------------------------------| | | | and in the code check is performed: if (regs && (regs->epc == pc) && (frame->fp & 0x7)) I see no reason to check frame->fp value at all, because it is can be uninitialized value on the stack. A better way is to check frame->ra to be an address on the stack. After the stacktrace shows as expect: [<ffffffff804853c2>] regmap_mmio_read32le+0xe/0x1c [<ffffffff80485758>] regmap_mmio_read+0x24/0x52 [<ffffffff8047c526>] _regmap_bus_reg_read+0x1a/0x22 [<ffffffff8047fe9a>] _regmap_read+0x5c/0xea [<ffffffff80480376>] _regmap_update_bits+0x76/0xc0 ... ---[ end trace 0000000000000000 ]--- As pointed by Samuel Holland it is incorrect to remove check of the stackframe entirely. Changes since v2 [2]: - Add accidentally forgotten curly brace Changes since v1 [1]: - Instead of just dropping frame->fp check, replace it with validation of frame->ra, which should be a stack address. - Move frame pointer validation into the separate function. [1] https://lore.kernel.org/linux-riscv/20240426072701.6463-1-dev.mbstr@gmail.com/ [2] https://lore.kernel.org/linux-riscv/20240521131314.48895-1-dev.mbstr@gmail.com/ Fixes: f766f77a ("riscv/stacktrace: Fix stack output without ra on the stack top") Signed-off-by: Matthew Bystrin <dev.mbstr@gmail.com> Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20240521191727.62012-1-dev.mbstr@gmail.comSigned-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Puranjay Mohan authored
This commit replaces riscv's support for FTRACE_WITH_REGS with support for FTRACE_WITH_ARGS. This is required for the ongoing effort to stop relying on stop_machine() for RISCV's implementation of ftrace. The main relevant benefit that this change will bring for the above use-case is that now we don't have separate ftrace_caller and ftrace_regs_caller trampolines. This will allow the callsite to call ftrace_caller by modifying a single instruction. Now the callsite can do something similar to: When not tracing: | When tracing: func: func: auipc t0, ftrace_caller_top auipc t0, ftrace_caller_top nop <=========<Enable/Disable>=========> jalr t0, ftrace_caller_bottom [...] [...] The above assumes that we are dropping the support of calling a direct trampoline from the callsite. We need to drop this as the callsite can't change the target address to call, it can only enable/disable a call to a preset target (ftrace_caller in the above diagram). We can later optimize this by calling an intermediate dispatcher trampoline before ftrace_caller. Currently, ftrace_regs_caller saves all CPU registers in the format of struct pt_regs and allows the tracer to modify them. We don't need to save all of the CPU registers because at function entry only a subset of pt_regs is live: |----------+----------+---------------------------------------------| | Register | ABI Name | Description | |----------+----------+---------------------------------------------| | x1 | ra | Return address for traced function | | x2 | sp | Stack pointer | | x5 | t0 | Return address for ftrace_caller trampoline | | x8 | s0/fp | Frame pointer | | x10-11 | a0-1 | Function arguments/return values | | x12-17 | a2-7 | Function arguments | |----------+----------+---------------------------------------------| See RISCV calling convention[1] for the above table. Saving just the live registers decreases the amount of stack space required from 288 Bytes to 112 Bytes. Basic testing was done with this on the VisionFive 2 development board. Note: - Moving from REGS to ARGS will mean that RISCV will stop supporting KPROBES_ON_FTRACE as it requires full pt_regs to be saved. - KPROBES_ON_FTRACE will be supplanted by FPROBES see [2]. [1] https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf [2] https://lore.kernel.org/all/170887410337.564249.6360118840946697039.stgit@devnote2/Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20240405142453.4187-1-puranjay@kernel.orgSigned-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Palmer Dabbelt authored
Samuel Holland <samuel.holland@sifive.com> says: This series optimizes access_ok() by defining TASK_SIZE_MAX. At Alex's suggestion, I also tried making TASK_SIZE constant (specifically by making PGDIR_SHIFT a variable instead of a ternary expression, then replacing the load with an immediate using ALTERNATIVE). This appeared to slightly improve performance on some implementations (C906) but regressed it on others (FU740). So I am leaving further optimizations to a later series. * b4-shazam-merge: riscv: Define TASK_SIZE_MAX for __access_ok() riscv: Remove PGDIR_SIZE_L3 and TASK_SIZE_MIN Link: https://lore.kernel.org/r/20240327143858.711792-1-samuel.holland@sifive.comSigned-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Qingfang Deng authored
Since commit aad15bc8 ("riscv: Change code model of module to medany to improve data accessing"), kernel modules have not been built with -fPIC, so they wouldn't have R_RISCV_GOT_HI20 or R_RISCV_CALL_PLT relocations, and handling of those relocations is unnecessary. If RELOCATABLE=y, kernel modules will be built with -fPIE, which would reintroduce said relocations, so only select MODULE_SECTIONS when RELOCATABLE. Signed-off-by: Qingfang Deng <qingfang.deng@siflower.com.cn> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240511015725.1162-1-dqfext@gmail.comSigned-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Emil Renner Berthing authored
Define the archhelp variable so that 'make ACRH=riscv help' will show the targets specific to building a RISC-V kernel like other architectures. Tested-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20240504193446.196886-3-emil.renner.berthing@canonical.comSigned-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Emil Renner Berthing authored
Previously the build process would always set KBUILD_IMAGE to the uncompressed Image file (unless XIP_KERNEL or EFI_ZBOOT was enabled) and unconditionally compress it into Image.gz. However there are already build targets for Image.bz2, Image.lz4, Image.lzma, Image.lzo and Image.zstd, so let's make use of those, make the compression method configurable and set KBUILD_IMAGE accordingly so that targets like 'make install' and 'make bindeb-pkg' will use the chosen image. Tested-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Reviewed-by: Nicolas Schier <n.schier@avm.de> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20240504193446.196886-2-emil.renner.berthing@canonical.comSigned-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linuxLinus Torvalds authored
Pull RISC-V updates from Palmer Dabbelt: - Add byte/half-word compare-and-exchange, emulated via LR/SC loops - Support for Rust - Support for Zihintpause in hwprobe - Add PR_RISCV_SET_ICACHE_FLUSH_CTX prctl() - Support lockless lockrefs * tag 'riscv-for-linus-6.10-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (42 commits) riscv: defconfig: Enable CONFIG_CLK_SOPHGO_CV1800 riscv: select ARCH_HAS_FAST_MULTIPLIER riscv: mm: still create swiotlb buffer for kmalloc() bouncing if required riscv: Annotate pgtable_l{4,5}_enabled with __ro_after_init riscv: Remove redundant CONFIG_64BIT from pgtable_l{4,5}_enabled riscv: mm: Always use an ASID to flush mm contexts riscv: mm: Preserve global TLB entries when switching contexts riscv: mm: Make asid_bits a local variable riscv: mm: Use a fixed layout for the MM context ID riscv: mm: Introduce cntx2asid/cntx2version helper macros riscv: Avoid TLB flush loops when affected by SiFive CIP-1200 riscv: Apply SiFive CIP-1200 workaround to single-ASID sfence.vma riscv: mm: Combine the SMP and UP TLB flush code riscv: Only send remote fences when some other CPU is online riscv: mm: Broadcast kernel TLB flushes only when needed riscv: Use IPIs for remote cache/TLB flushes by default riscv: Factor out page table TLB synchronization riscv: Flush the instruction cache during SMP bringup riscv: hwprobe: export Zihintpause ISA extension riscv: misaligned: remove CONFIG_RISCV_M_MODE specific code ...
-
Linus Torvalds authored
Merge tag 'loongarch-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch updates from Huacai Chen: - Select some options in Kconfig - Give a chance to build with !CONFIG_SMP - Switch to use built-in rustc target - Add new supported device nodes to dts - Some bug fixes and other small changes - Update the default config file * tag 'loongarch-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: Update Loongson-3 default config file LoongArch: dts: Add new supported device nodes to Loongson-2K2000 LoongArch: dts: Add new supported device nodes to Loongson-2K0500 LoongArch: dts: Remove "disabled" state of clock controller node LoongArch: rust: Switch to use built-in rustc target LoongArch: Fix callchain parse error with kernel tracepoint events again LoongArch: Give a chance to build with !CONFIG_SMP LoongArch: Select THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE LoongArch: Select ARCH_WANT_DEFAULT_BPF_JIT LoongArch: Select ARCH_SUPPORTS_INT128 if CC_HAS_INT128 LoongArch: Select ARCH_HAS_FAST_MULTIPLIER
-
Charlie Jenkins authored
This loop is supposed to check if ext->subset_ext_ids[j] is valid, rather than if ext->subset_ext_ids[i] is valid, before setting the extension id ext->subset_ext_ids[j] in isainfo->isa. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Fixes: 0d8295ed ("riscv: add ISA extension parsing for scalar crypto") Link: https://lore.kernel.org/r/20240502-cpufeature_fixes-v4-2-b3d1a088722d@rivosinc.comSigned-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Charlie Jenkins authored
The riscv_cpuinfo struct that contains mvendorid and marchid is not populated until all harts are booted which happens after the DT parsing. Use the mvendorid/marchid from the boot hart to determine if the DT contains an invalid V. Fixes: d82f3220 ("RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs") Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20240502-cpufeature_fixes-v4-1-b3d1a088722d@rivosinc.comSigned-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
git://git.monstr.eu/linux-2.6-microblazeLinus Torvalds authored
Pull microblaze updates from Michal Simek: - Cleanup code around removed early_printk * tag 'microblaze-v6.10' of git://git.monstr.eu/linux-2.6-microblaze: microblaze: Remove early printk call from cpuinfo-static.c microblaze: Remove gcc flag for non existing early_printk.c file
-
git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfsLinus Torvalds authored
Pull overlayfs updates from Miklos Szeredi: - Add tmpfile support - Clean up include * tag 'ovl-update-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: remove duplicate included header ovl: remove upper umask handling from ovl_create_upper() ovl: implement tmpfile
-
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuseLinus Torvalds authored
Pull fuse updates from Miklos Szeredi: - Add fs-verity support (Richard Fung) - Add multi-queue support to virtio-fs (Peter-Jan Gootzen) - Fix a bug in NOTIFY_RESEND handling (Hou Tao) - page -> folio cleanup (Matthew Wilcox) * tag 'fuse-update-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: virtio-fs: add multi-queue support virtio-fs: limit number of request queues fuse: clear FR_SENT when re-adding requests into pending list fuse: set FR_PENDING atomically in fuse_resend() fuse: Add initial support for fs-verity fuse: Convert fuse_readpages_end() to use folio_end_read()
-
Nam Cao authored
__kernel_map_pages() is a debug function which clears the valid bit in page table entry for deallocated pages to detect illegal memory accesses to freed pages. This function set/clear the valid bit using __set_memory(). __set_memory() acquires init_mm's semaphore, and this operation may sleep. This is problematic, because __kernel_map_pages() can be called in atomic context, and thus is illegal to sleep. An example warning that this causes: BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd preempt_count: 2, expected: 0 CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37 Hardware name: riscv-virtio,qemu (DT) Call Trace: [<ffffffff800060dc>] dump_backtrace+0x1c/0x24 [<ffffffff8091ef6e>] show_stack+0x2c/0x38 [<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72 [<ffffffff8092bb24>] dump_stack+0x14/0x1c [<ffffffff8003b7ac>] __might_resched+0x104/0x10e [<ffffffff8003b7f4>] __might_sleep+0x3e/0x62 [<ffffffff8093276a>] down_write+0x20/0x72 [<ffffffff8000cf00>] __set_memory+0x82/0x2fa [<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4 [<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a [<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba [<ffffffff80011904>] copy_process+0x72c/0x17ec [<ffffffff80012ab4>] kernel_clone+0x60/0x2fe [<ffffffff80012f62>] kernel_thread+0x82/0xa0 [<ffffffff8003552c>] kthreadd+0x14a/0x1be [<ffffffff809357de>] ret_from_fork+0xe/0x1c Rewrite this function with apply_to_existing_page_range(). It is fine to not have any locking, because __kernel_map_pages() works with pages being allocated/deallocated and those pages are not changed by anyone else in the meantime. Fixes: 5fde3db5 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support") Signed-off-by: Nam Cao <namcao@linutronix.de> Cc: stable@vger.kernel.org Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.deSigned-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Nam Cao authored
debug_pagealloc is a debug feature which clears the valid bit in page table entry for freed pages to detect illegal accesses to freed memory. For this feature to work, virtual mapping must have PAGE_SIZE resolution. (No, we cannot map with huge pages and split them only when needed; because pages can be allocated/freed in atomic context and page splitting cannot be done in atomic context) Force linear mapping to use small pages if debug_pagealloc is enabled. Note that it is not necessary to force the entire linear mapping, but only those that are given to memory allocator. Some parts of memory can keep using huge page mapping (for example, kernel's executable code). But these parts are minority, so keep it simple. This is just a debug feature, some extra overhead should be acceptable. Fixes: 5fde3db5 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support") Signed-off-by: Nam Cao <namcao@linutronix.de> Cc: stable@vger.kernel.org Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/2e391fa6c6f9b3fcf1b41cefbace02ee4ab4bf59.1715750938.git.namcao@linutronix.deSigned-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Yafang Shao authored
Our applications, built on Elasticsearch[0], frequently create and delete files. These applications operate within containers, some with a memory limit exceeding 100GB. Over prolonged periods, the accumulation of negative dentries within these containers can amount to tens of gigabytes. Upon container exit, directories are deleted. However, due to the numerous associated dentries, this process can be time-consuming. Our users have expressed frustration with this prolonged exit duration, which constitutes our first issue. Simultaneously, other processes may attempt to access the parent directory of the Elasticsearch directories. Since the task responsible for deleting the dentries holds the inode lock, processes attempting directory lookup experience significant delays. This issue, our second problem, is easily demonstrated: - Task 1 generates negative dentries: $ pwd ~/test $ mkdir es && cd es/ && ./create_and_delete_files.sh [ After generating tens of GB dentries ] $ cd ~/test && rm -rf es [ It will take a long duration to finish ] - Task 2 attempts to lookup the 'test/' directory $ pwd ~/test $ ls The 'ls' command in Task 2 experiences prolonged execution as Task 1 is deleting the dentries. We've devised a solution to address both issues by deleting associated dentry when removing a file. Interestingly, we've noted that a similar patch was proposed years ago[1], although it was rejected citing the absence of tangible issues caused by negative dentries. Given our current challenges, we're resubmitting the proposal. All relevant stakeholders from previous discussions have been included for reference. Some alternative solutions are also under discussion[2][3], such as shrinking child dentries outside of the parent inode lock or even asynchronously shrinking child dentries. However, given the straightforward nature of the current solution, I believe this approach is still necessary. [ NOTE! This is a pretty fundamental change in how we deal with unlinking dentries, and it doesn't change the fact that you can have lots of negative dentries from just doing negative lookups. But the kernel test robot is at least initially happy with this from a performance angle, so I'm applying this ASAP just to get more testing and as a "known fix for an issue people hit in real life". Put another way: we should still look at the alternatives, and this patch may get reverted if somebody finds a performance regression on some other load. - Linus ] Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://github.com/elastic/elasticsearch [0] Link: https://patchwork.kernel.org/project/linux-fsdevel/patch/1502099673-31620-1-git-send-email-wangkai86@huawei.com [1] Link: https://lore.kernel.org/linux-fsdevel/20240511200240.6354-2-torvalds@linux-foundation.org/ [2] Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wjEMf8Du4UFzxuToGDnF3yLaMcrYeyNAaH1NJWa6fwcNQ@mail.gmail.com/ [3] Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Wangkai <wangkai86@huawei.com> Cc: Colin Walters <walters@verbum.org> Tested-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/all/202405221518.ecea2810-oliver.sang@intel.com/Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 21 May, 2024 13 commits
-
-
Linus Torvalds authored
Merge tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Arnaldo Carvalho de Melo: "General: - Integrate the shellcheck utility with the build of perf to allow catching shell problems early in areas such as 'perf test', 'perf trace' scrape scripts, etc - Add 'uretprobe' variant in the 'perf bench uprobe' tool - Add script to run instances of 'perf script' in parallel - Allow parsing tracepoint names that start with digits, such as 9p/9p_client_req, etc. Make sure 'perf test' tests it even on systems where those tracepoints aren't available - Add Kan Liang to MAINTAINERS as a perf tools reviewer - Add support for using the 'capstone' disassembler library in various tools, such as 'perf script' and 'perf annotate'. This is an alternative for the use of the 'xed' and 'objdump' disassemblers Data-type profiling improvements: - Resolve types for a->b->c by backtracking the assignments until it finds DWARF info for one of those members - Support for global variables, keeping a cache to speed up lookups - Handle the 'call' instruction, dealing with effects on registers and handling its return when tracking register data types - Handle x86's segment based addressing like %gs:0x28, to support things like per CPU variables, the stack canary, etc - Data-type profiling got big speedups when using capstone for disassembling. The objdump outoput parsing method is left as a fallback when capstone fails or isn't available. There are patches posted for 6.11 that to use a LLVM disassembler - Support event group display in the TUI when annotating types with --data-type, for instance to show memory load and store events for the data type fields - Optimize the 'perf annotate' data structures, reducing memory usage - Add a initial 'perf test' for 'perf annotate', checking that a target symbol appears on the output, specifying objdump via the command line, etc Vendor Events: - Update Intel JSON files for Cascade Lake X, Emerald Rapids, Grand Ridge, Ice Lake X, Lunar Lake, Meteor Lake, Sapphire Rapids, Sierra Forest, Sky Lake X, Sky Lake and Snow Ridge X. Remove info metrics erroneously in TopdownL1 - Add AMD's Zen 5 core and uncore events and metrics. Those come from the "Performance Monitor Counters for AMD Family 1Ah Model 00h- 0Fh Processors" document, with events that capture information on op dispatch, execution and retirement, branch prediction, L1 and L2 cache activity, TLB activity, etc - Mark L1D_CACHE_INVAL impacted by errata for ARM64's AmpereOne/ AmpereOneX Miscellaneous: - Sync header copies with the kernel sources - Move some header copies used only for generating translation string tables for ioctl cmds and other syscall integer arguments to a new directory under tools/perf/beauty/, to separate from copies in tools/include/ that are used to build the tools - Introduce scrape script for several syscall 'flags'/'mask' arguments - Improve cpumap utilization, fixing up pairing of refcounts, using the right iterators (perf_cpu_map__for_each_cpu), etc - Give more details about raw event encodings in 'perf list', show tracepoint encoding in the detailed output - Refactor the DSOs handling code, reducing memory usage - Document the BPF event modifier and add a 'perf test' for it - Improve the event parser, better error messages and add further 'perf test's for it - Add reference count checking to 'struct comm_str' and 'struct mem_info' - Make ARM64's 'perf test' entries for the Neoverse N1 more robust - Tweak the ARM64's Coresight 'perf test's - Improve ARM64's CoreSight ETM version detection and error reporting - Fix handling of symbols when using kcore - Fix PAI (Processor Activity Instrumentation) counter names for s390 virtual machines in 'perf report' - Fix -g/--call-graph option failure in 'perf sched timehist' - Add LIBTRACEEVENT_DIR build option to allow building with libtraceevent installed in non-standard directories, such as when doing cross builds - Various 'perf test' and 'perf bench' fixes - Improve 'perf probe' error message for long C++ probe names" * tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (260 commits) tools lib subcmd: Show parent options in help perf pmu: Count sys and cpuid JSON events separately perf stat: Don't display metric header for non-leader uncore events perf annotate-data: Ensure the number of type histograms perf annotate: Fix segfault on sample histogram perf daemon: Fix file leak in daemon_session__control libsubcmd: Fix parse-options memory leak perf lock: Avoid memory leaks from strdup() perf sched: Rename 'switches' column header to 'count' and add usage description, options for latency perf tools: Ignore deleted cgroups perf parse: Allow tracepoint names to start with digits perf parse-events: Add new 'fake_tp' parameter for tests perf parse-events: pass parse_state to add_tracepoint perf symbols: Fix ownership of string in dso__load_vmlinux() perf symbols: Update kcore map before merging in remaining symbols perf maps: Re-use __maps__free_maps_by_name() perf symbols: Remove map from list before updating addresses perf tracepoint: Don't scan all tracepoints to test if one exists perf dwarf-aux: Fix build with HAVE_DWARF_CFI_SUPPORT perf thread: Fixes to thread__new() related to initializing comm ...
-
https://github.com/norov/linuxLinus Torvalds authored
Pull bitmap updates from Yury Norov: - topology_span_sane() optimization from Kyle Meyer - fns() rework from Kuan-Wei Chiu (used in cpumask_local_spread() and other places) - headers cleanup from Andy - add a MAINTAINERS record for bitops API * tag 'bitmap-for-6.10v2' of https://github.com/norov/linux: usercopy: Don't use "proxy" headers bitops: Move aligned_byte_mask() to wordpart.h MAINTAINERS: add BITOPS API record bitmap: relax find_nth_bit() limitation on return value lib: make test_bitops compilable into the kernel image bitops: Optimize fns() for improved performance lib/test_bitops: Add benchmark test for fns() Compiler Attributes: Add __always_used macro sched/topology: Optimize topology_span_sane() cpumask: Add for_each_cpu_from()
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds authored
Pull misc vfs updates from Al Viro: "Assorted commits that had missed the last merge window..." * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: remove call_{read,write}_iter() functions do_dentry_open(): kill inode argument kernel_file_open(): get rid of inode argument get_file_rcu(): no need to check for NULL separately fd_is_open(): move to fs/file.c close_on_exec(): pass files_struct instead of fdtable
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds authored
Pull bdev flags update from Al Viro: "Compactifying bdev flags. We can easily have up to 24 flags with sane atomicity, _without_ pushing anything out of the first cacheline of struct block_device" * tag 'pull-bd_flags-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: bdev: move ->bd_make_it_fail to ->__bd_flags bdev: move ->bd_ro_warned to ->__bd_flags bdev: move ->bd_has_subit_bio to ->__bd_flags bdev: move ->bd_write_holder into ->__bd_flags bdev: move ->bd_read_only to ->__bd_flags bdev: infrastructure for flags wrapper for access to ->bd_partno Use bdev_is_paritition() instead of open-coding it
-
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linuxLinus Torvalds authored
Pull more s390 updates from Alexander Gordeev: - Switch read and write software bits for PUDs - Add missing hardware bits for PUDs and PMDs - Generate unwind information for C modules to fix GDB unwind error for vDSO functions - Create .build-id links for unstripped vDSO files to enable vDSO debugging with symbols - Use standard stack frame layout for vDSO generated stack frames to manually walk stack frames without DWARF information - Rework perf_callchain_user() and arch_stack_walk_user() functions to reduce code duplication - Skip first stack frame when walking user stack - Add basic checks to identify invalid instruction pointers when walking stack frames - Introduce and use struct stack_frame_vdso_wrapper within vDSO user wrapper code to automatically generate an asm-offset define. Also use STACK_FRAME_USER_OVERHEAD instead of STACK_FRAME_OVERHEAD to document that the code works with user space stack - Clear the backchain of the extra stack frame added by the vDSO user wrapper code. This allows the user stack walker to detect and skip the non-standard stack frame. Without this an incorrect instruction pointer would be added to stack traces. - Rewrite psw_idle() function in C to ease maintenance and further enhancements - Remove get_vtimer() function and use get_cpu_timer() instead - Mark psw variable in __load_psw_mask() as __unitialized to avoid superfluous clearing of PSW - Remove obsolete and superfluous comment about removed TIF_FPU flag - Replace memzero_explicit() and kfree() with kfree_sensitive() to fix warnings reported by Coccinelle - Wipe sensitive data and all copies of protected- or secure-keys from stack when an IOCTL fails - Both do_airq_interrupt() and do_io_interrupt() functions set CIF_NOHZ_DELAY flag. Move it in do_io_irq() to simplify the code - Provide iucv_alloc_device() and iucv_release_device() helpers, which can be used to deduplicate more or less identical IUCV device allocation and release code in four different drivers - Make use of iucv_alloc_device() and iucv_release_device() helpers to get rid of quite some code and also remove a cast to an incompatible function (clang W=1) - There is no user of iucv_root outside of the core IUCV code left. Therefore remove the EXPORT_SYMBOL - __apply_alternatives() contains a runtime check which verifies that the size of the to be patched code area is even. Convert this to a compile time check - Increase size of buffers for sending z/VM CP DIAGNOSE X'008' commands from 128 to 240 - Do not accept z/VM CP DIAGNOSE X'008' commands longer than maximally allowed - Use correct defines IPL_BP_NVME_LEN and IPL_BP0_NVME_LEN instead of IPL_BP_FCP_LEN and IPL_BP0_FCP_LEN ones to initialize NVMe reIPL block on 'scp_data' sysfs attribute update - Initialize the correct fields of the NVMe dump block, which were confused with FCP fields - Refactor macros for 'scp_data' (re-)IPL sysfs attribute to reduce code duplication - Introduce 'scp_data' sysfs attribute for dump IPL to allow tools such as dumpconf passing additional kernel command line parameters to a stand-alone dumper - Rework the CPACF query functions to use the correct RRE or RRF instruction formats and set instruction register fields correctly - Instead of calling BUG() at runtime force a link error during compile when a unsupported opcode is used with __cpacf_query() or __cpacf_check_opcode() functions - Fix a crash in ap_parse_bitmap_str() function on /sys/bus/ap/apmask or /sys/bus/ap/aqmask sysfs file update with a relative mask value - Fix "bindings complete" udev event which should be sent once all AP devices have been bound to device drivers and again when unbind/bind actions take place and all AP devices are bound again - Facility list alt_stfle_fac_list is nowhere used in the decompressor, therefore remove it there - Remove custom kprobes insn slot allocator in favour of the standard module_alloc() one, since kernel image and module areas are located within 4GB - Use kvcalloc() instead of kvmalloc_array() in zcrypt driver to avoid calling memset() with a large byte count and get rid of the sparse warning as result * tag 's390-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits) s390/zcrypt: Use kvcalloc() instead of kvmalloc_array() s390/kprobes: Remove custom insn slot allocator s390/boot: Remove alt_stfle_fac_list from decompressor s390/ap: Fix bind complete udev event sent after each AP bus scan s390/ap: Fix crash in AP internal function modify_bitmap() s390/cpacf: Make use of invalid opcode produce a link error s390/cpacf: Split and rework cpacf query functions s390/ipl: Introduce sysfs attribute 'scp_data' for dump ipl s390/ipl: Introduce macros for (re)ipl sysfs attribute 'scp_data' s390/ipl: Fix incorrect initialization of nvme dump block s390/ipl: Fix incorrect initialization of len fields in nvme reipl block s390/ipl: Do not accept z/VM CP diag X'008' cmds longer than max length s390/ipl: Fix size of vmcmd buffers for sending z/VM CP diag X'008' cmds s390/alternatives: Convert runtime sanity check into compile time check s390/iucv: Unexport iucv_root tty: hvc-iucv: Make use of iucv_alloc_device() s390/smsgiucv_app: Make use of iucv_alloc_device() s390/netiucv: Make use of iucv_alloc_device() s390/vmlogrdr: Make use of iucv_alloc_device() s390/iucv: Provide iucv_alloc_device() / iucv_release_device() ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommuLinus Torvalds authored
Pull m68knommu update from Greg Ungerer: . remove use of kernel config option from uapi header * tag 'm68knommu-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68k: Avoid CONFIG_COLDFIRE switch in uapi header
-
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efiLinus Torvalds authored
Pull EFI fix from Ard Biesheuvel: - Followup fix for the EFI boot sequence refactor, which may result in physical KASLR putting the kernel in a region which is being used for a special purpose via a command line argument. * tag 'efi-fixes-for-v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: x86/efistub: Omit physical KASLR when memory reservations exist
-
Linus Torvalds authored
Merge tag 'for-6.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Fix DM discard regressions due to DM core switching over to using queue_limits_set() without DM core and targets first being updated to set (and stack) discard limits in terms of max_hw_discard_sectors and not max_discard_sectors - Fix stable@ DM integrity discard support to set device's discard_granularity limit to the device's logical block size * tag 'for-6.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: always manage discard support in terms of max_hw_discard_sectors dm-integrity: set discard_granularity to logical block size
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds authored
Pull power management fixes from Rafael Wysocki: "These fix the amd-pstate driver and the operating performance point (OPP) handling related to generic PM domains. Specifics: - Fix a memory leak in the exit path of amd-pstate (Peng Ma) - Fix required_opp_tables handling in the cases when multiple generic PM domains share one OPP table (Viresh Kumar)" * tag 'pm-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: OPP: Fix required_opp_tables for multiple genpds using same table cpufreq: amd-pstate: fix memory leak on CPU EPP exit
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds authored
Pull ACPI fixes from Rafael Wysocki: "These make the ACPI EC driver always install the EC address space handler at the root of the ACPI namespace which causes it to take care of all EC operation regions everywhere. This means that the custom EC address space handler in the WMI driver is not needed any more and accordingly it gets removed altogether" * tag 'acpi-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: platform/x86: wmi: Remove custom EC address space handler ACPI: EC: Install address space handler at the namespace root
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds authored
Pull thermal control fixes from Rafael Wysocki: "These fix the MediaTek lvts_thermal driver and the handling of trip points that start as invalid and are adjusted later by user space via sysfs. Specifics: - Fix and clean up the MediaTek lvts_thermal driver (Julien Panis) - Prevent invalid trip point handling from triggering spurious trip point crossing events and allow passive polling to stop when a passive trip point involved in it becomes invalid (Rafael Wysocki)" * tag 'thermal-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: core: Fix the handling of invalid trip points thermal/drivers/mediatek/lvts_thermal: Fix wrong lvts_ctrl index thermal/drivers/mediatek/lvts_thermal: Remove unused members from struct lvts_ctrl_data thermal/drivers/mediatek/lvts_thermal: Check NULL ptr on lvts_data
-
Linus Torvalds authored
Merge tag 'intel-gpio-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-gpio-intel Pull intel-gpio fixes from Andy Shevchenko: - NULL pointer dereference fix in GPIO APCI library - Restore ACPI handle matching for GPIO devices represented in banks * tag 'intel-gpio-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-gpio-intel: gpiolib: acpi: Fix failed in acpi_gpiochip_find() by adding parent node match gpiolib: acpi: Move ACPI device NULL check to acpi_can_fallback_to_crs()
-
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwireLinus Torvalds authored
Pull soundwire updates from Vinod Koul: - cleanup and conversion for soundwire sysfs groups - intel support for ace2x bits, auxdevice pm improvements - qcom multi link device support * tag 'soundwire-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: (33 commits) soundwire: intel_ace2.x: add support for DOAISE property soundwire: intel_ace2.x: add support for DODSE property soundwire: intel_ace2x: use DOAIS and DODS settings from firmware soundwire: intel_ace2x: cleanup DOAIS/DODS settings soundwire: intel_ace2x: simplify check_wake() soundwire: intel_ace2x: fix wakeup handling soundwire: intel_init: resume all devices on exit. soundwire: intel: export intel_resume_child_device soundwire: intel_auxdevice: use pm_runtime_resume() instead of pm_request_resume() ASoC: SOF: Intel: hda: disable SoundWire interrupt later soundwire: qcom: allow multi-link on newer devices soundwire: intel_ace2x: use legacy formula for intel_alh_id soundwire: reconcile dp0_prop and dpn_prop soundwire: intel_ace2x: set the clock source soundwire: intel_ace2.x: power-up first before setting SYNCPRD soundwire: intel_ace2x: move and extend clock selection soundwire: intel: add support for MeteorLake additional clocks soundwire: intel: add more values for SYNCPRD soundwire: bus: extend base clock checks to 96 MHz soundwire: cadence: show the bus frequency and frame shape ...
-