- 24 May, 2024 5 commits
-
-
Jeff Xu authored
Sealing read-only of elf mapping so it can't be changed by mprotect.

[jeffxu@chromium.org: style change]
  Link: https://lkml.kernel.org/r/20240416220944.2481203-2-jeffxu@chromium.org
[amer.shanawany@gmail.com: fix linker error for inline function]
  Link: https://lkml.kernel.org/r/20240420202346.546444-1-amer.shanawany@gmail.com
[jeffxu@chromium.org: fix compile warning]
  Link: https://lkml.kernel.org/r/20240420003515.345982-2-jeffxu@chromium.org
[jeffxu@chromium.org: fix arm build]
  Link: https://lkml.kernel.org/r/20240502225331.3806279-2-jeffxu@chromium.org
Link: https://lkml.kernel.org/r/20240415163527.626541-6-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Signed-off-by: Amer Al Shanawany <amer.shanawany@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Jeff Xu authored
Add documentation for mseal().

Link: https://lkml.kernel.org/r/20240415163527.626541-5-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Jeff Xu authored
Add selftests for the memory sealing changes in mmap() and mseal().

Link: https://lkml.kernel.org/r/20240415163527.626541-4-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Jeff Xu authored
The new mseal() is a syscall on 64-bit CPUs, with the following signature:

    int mseal(void *addr, size_t len, unsigned long flags)

    addr/len: memory range.
    flags: reserved.

mseal() blocks the following operations for the given memory range.

1> Unmapping, moving to another location, and shrinking the size, via munmap() and mremap(). These can leave an empty space, which can then be replaced with a VMA with a new set of attributes.

2> Moving or expanding a different VMA into the current location, via mremap().

3> Modifying a VMA via mmap(MAP_FIXED).

4> Size expansion, via mremap(), does not appear to pose any specific risks to sealed VMAs. It is included anyway because the use case is unclear. In any case, users can rely on merging to expand a sealed VMA.

5> mprotect() and pkey_mprotect().

6> Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for anonymous memory, when users don't have write permission to the memory. Those behaviors can alter region contents by discarding pages, effectively a memset(0) for anonymous memory.

The following input received during the RFC is incorporated into this patch:

    Jann Horn: raising awareness and providing valuable insights on the destructive madvise operations.
    Linus Torvalds: assisting in defining system call signature and scope.
    Liam R. Howlett: perf optimization.
    Theo de Raadt: sharing the experiences and insight gained from implementing mimmutable() in OpenBSD.

Finally, the idea that inspired this patch comes from Stephen Röttger's work in Chrome V8 CFI.

[jeffxu@chromium.org: add branch prediction hint, per Pedro]
  Link: https://lkml.kernel.org/r/20240423192825.1273679-2-jeffxu@chromium.org
Link: https://lkml.kernel.org/r/20240415163527.626541-3-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
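The signature and the mprotect() restriction described in this commit can be exercised from userspace with a raw syscall. The following is a minimal sketch, not code from the patch: the helper names seal_range() and seal_demo() are hypothetical, and the fallback syscall number 462 is an assumption that should be replaced by the value from your kernel headers when they define __NR_mseal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: 462 is the number assigned to mseal on most architectures.
 * Headers that already know about mseal take precedence. */
#ifndef __NR_mseal
#define __NR_mseal 462
#endif

/* Hypothetical wrapper: returns 0 on success, -errno on failure. */
static int seal_range(void *addr, size_t len)
{
	if (syscall(__NR_mseal, addr, len, 0UL) == 0)
		return 0;
	return -errno;
}

/*
 * Maps one read-only anonymous page, seals it, then tries to make it
 * writable. Returns 1 if the seal held (mprotect refused with EPERM),
 * 0 if mseal is not available on this kernel, -1 on anything unexpected.
 */
static int seal_demo(void)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, page, PROT_READ,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;

	if (seal_range(p, page) != 0) {
		/* Not sealed, so the mapping can still be removed. */
		munmap(p, page);
		return 0;
	}

	/* A sealed mapping stays in place until the process exits. */
	if (mprotect(p, page, PROT_READ | PROT_WRITE) == -1 && errno == EPERM)
		return 1;
	return -1;
}
```

Calling seal_demo() from main() and printing the result is enough to check whether the running kernel supports sealing. The flags argument is passed as 0 because the commit message describes it as reserved.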
-
Jeff Xu authored
Patch series "Introduce mseal", v10.

This patchset proposes a new mseal() syscall for the Linux kernel.

In a nutshell, mseal() protects the VMAs of a given virtual memory range against modifications, such as changes to their permission bits.

Modern CPUs support memory permissions, such as the read/write (RW) and no-execute (NX) bits. Linux has supported NX since the release of kernel version 2.6.8 in August 2004 [1]. The memory permission feature improves the security stance on memory corruption bugs, as an attacker cannot simply write to arbitrary memory and point the code to it. The memory must be marked with the X bit, or else an exception will occur. Internally, the kernel maintains the memory permissions in a data structure called VMA (vm_area_struct). mseal() additionally protects the VMA itself against modifications of the selected seal type.

Memory sealing is useful to mitigate memory corruption issues where a corrupted pointer is passed to a memory management system. For example, such an attacker primitive can break control-flow integrity guarantees since read-only memory that is supposed to be trusted can become writable or .text pages can get remapped. Memory sealing can automatically be applied by the runtime loader to seal .text and .rodata pages, and applications can additionally seal security critical data at runtime. A similar feature already exists in the XNU kernel with the VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the mimmutable syscall [4]. Also, Chrome wants to adopt this feature for their CFI work [2] and this patchset has been designed to be compatible with the Chrome use case.

Two system calls are involved in sealing the map: mmap() and mseal().

The new mseal() is a syscall on 64-bit CPUs, with the following signature:

    int mseal(void *addr, size_t len, unsigned long flags)

    addr/len: memory range.
    flags: reserved.

mseal() blocks the following operations for the given memory range.

1> Unmapping, moving to another location, and shrinking the size, via munmap() and mremap(). These can leave an empty space, which can then be replaced with a VMA with a new set of attributes.

2> Moving or expanding a different VMA into the current location, via mremap().

3> Modifying a VMA via mmap(MAP_FIXED).

4> Size expansion, via mremap(), does not appear to pose any specific risks to sealed VMAs. It is included anyway because the use case is unclear. In any case, users can rely on merging to expand a sealed VMA.

5> mprotect() and pkey_mprotect().

6> Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for anonymous memory, when users don't have write permission to the memory. Those behaviors can alter region contents by discarding pages, effectively a memset(0) for anonymous memory.

The idea that inspired this patch comes from Stephen Röttger’s work in V8 CFI [5]. Chrome browser in ChromeOS will be the first user of this API.

Indeed, the Chrome browser has very specific requirements for sealing, which are distinct from those of most applications. For example, in the case of libc, sealing is only applied to read-only (RO) or read-execute (RX) memory segments (such as .text and .RELRO) to prevent them from becoming writable; the lifetime of those mappings is tied to the lifetime of the process.

Chrome wants to seal two large address space reservations that are managed by different allocators. The memory is mapped RW- and RWX respectively, but write access to it is restricted using pkeys (or in the future ARM permission overlay extensions). The lifetime of those mappings is not tied to the lifetime of the process; therefore, while the memory is sealed, the allocators still need to free or discard the unused memory, for example with madvise(DONTNEED).

However, always allowing madvise(DONTNEED) on this range poses a security risk.
For example, if a jump instruction crosses a page boundary and the second page gets discarded, it will overwrite the target bytes with zeros and change the control flow. Checking write-permission before the discard operation allows us to control when the operation is valid. In this case, the madvise will only succeed if the executing thread has PKEY write permissions and PKRU changes are protected in software by control-flow integrity.

Although the initial version of this patch series is targeting the Chrome browser as its first user, it became evident during upstream discussions that we would also want to ensure that the patch set eventually is a complete solution for memory sealing and compatible with other use cases. The specific scenario currently in mind is glibc's use case of loading and sealing ELF executables. To this end, Stephen is working on a change to glibc to add sealing support to the dynamic linker, which will seal all non-writable segments at startup. Once this work is completed, all applications will be able to automatically benefit from these new protections.

In closing, I would like to formally acknowledge the valuable contributions received during the RFC process, which were instrumental in shaping this patch:

    Jann Horn: raising awareness and providing valuable insights on the destructive madvise operations.
    Liam R. Howlett: perf optimization.
    Linus Torvalds: assisting in defining system call signature and scope.
    Theo de Raadt: sharing the experiences and insight gained from implementing mimmutable() in OpenBSD.

MM perf benchmarks
==================

This patch adds a loop in mprotect/munmap/madvise(DONTNEED) to check the VMAs’ sealing flag, so that no partial update can be made when any segment within the given memory range is sealed.

To measure the performance impact of this loop, two tests were developed [8].

The first measures the time taken for a particular system call, using clock_gettime(CLOCK_MONOTONIC).
The second uses PERF_COUNT_HW_REF_CPU_CYCLES (excluding user space). Both tests have similar results.

The tests have roughly the following sequence:

    for (i = 0; i < 1000; i++)
        create 1000 mappings (1 page per VMA)
        start the sampling
        for (j = 0; j < 1000; j++)
            mprotect one mapping
        stop and save the sample
        delete 1000 mappings
    calculate all samples.

The tests below were performed on an Intel(R) Pentium(R) Gold 7505 @ 2.00GHz, 4G memory, Chromebook.

Based on the latest upstream code:

The first test (measuring time)

syscall__ vmas  t        t_mseal  delta_ns   per_vma  %
munmap__  1     909      944      35         35       104%
munmap__  2     1398     1502     104        52       107%
munmap__  4     2444     2594     149        37       106%
munmap__  8     4029     4323     293        37       107%
munmap__  16    6647     6935     288        18       104%
munmap__  32    11811    12398    587        18       105%
mprotect  1     439      465      26         26       106%
mprotect  2     1659     1745     86         43       105%
mprotect  4     3747     3889     142        36       104%
mprotect  8     6755     6969     215        27       103%
mprotect  16    13748    14144    396        25       103%
mprotect  32    27827    28969    1142       36       104%
madvise_  1     240      262      22         22       109%
madvise_  2     366      442      76         38       121%
madvise_  4     623      751      128        32       121%
madvise_  8     1110     1324     215        27       119%
madvise_  16    2127     2451     324        20       115%
madvise_  32    4109     4642     534        17       113%

The second test (measuring cpu cycle)

syscall__ vmas  cpu      cmseal   delta_cpu  per_vma  %
munmap__  1     1790     1890     100        100      106%
munmap__  2     2819     3033     214        107      108%
munmap__  4     4959     5271     312        78       106%
munmap__  8     8262     8745     483        60       106%
munmap__  16    13099    14116    1017       64       108%
munmap__  32    23221    24785    1565       49       107%
mprotect  1     906      967      62         62       107%
mprotect  2     3019     3203     184        92       106%
mprotect  4     6149     6569     420        105      107%
mprotect  8     9978     10524    545        68       105%
mprotect  16    20448    21427    979        61       105%
mprotect  32    40972    42935    1963       61       105%
madvise_  1     434      497      63         63       115%
madvise_  2     752      899      147        74       120%
madvise_  4     1313     1513     200        50       115%
madvise_  8     2271     2627     356        44       116%
madvise_  16    4312     4883     571        36       113%
madvise_  32    8376     9319     943        29       111%

Based on the result, for the 6.8 kernel, the sealing check adds 20-40 nanoseconds, or around 50-100 CPU cycles, per VMA.

In addition, I applied the sealing to the 5.10 kernel:

The first test (measuring time)

syscall__ vmas  t        t_mseal  delta_ns   per_vma  %
munmap__  1     357      390      33         33       109%
munmap__  2     442      463      21         11       105%
munmap__  4     614      634      20         5        103%
munmap__  8     1017     1137     120        15       112%
munmap__  16    1889     2153     263        16       114%
munmap__  32    4109     4088     -21        -1       99%
mprotect  1     235      227      -7         -7       97%
mprotect  2     495      464      -30        -15      94%
mprotect  4     741      764      24         6        103%
mprotect  8     1434     1437     2          0        100%
mprotect  16    2958     2991     33         2        101%
mprotect  32    6431     6608     177        6        103%
madvise_  1     191      208      16         16       109%
madvise_  2     300      324      24         12       108%
madvise_  4     450      473      23         6        105%
madvise_  8     753      806      53         7        107%
madvise_  16    1467     1592     125        8        108%
madvise_  32    2795     3405     610        19       122%

The second test (measuring cpu cycle)

syscall__ vmas  cpu      cmseal   delta_cpu  per_vma  %
munmap__  1     684      715      31         31       105%
munmap__  2     861      898      38         19       104%
munmap__  4     1183     1235     51         13       104%
munmap__  8     1999     2045     46         6        102%
munmap__  16    3839     3816     -23        -1       99%
munmap__  32    7672     7887     216        7        103%
mprotect  1     397      443      46         46       112%
mprotect  2     738      788      50         25       107%
mprotect  4     1221     1256     35         9        103%
mprotect  8     2356     2429     72         9        103%
mprotect  16    4961     4935     -26        -2       99%
mprotect  32    9882     10172    291        9        103%
madvise_  1     351      380      29         29       108%
madvise_  2     565      615      49         25       109%
madvise_  4     872      933      61         15       107%
madvise_  8     1508     1640     132        16       109%
madvise_  16    3078     3323     245        15       108%
madvise_  32    5893     6704     811        25       114%

For the 5.10 kernel, the sealing check adds 0-15 ns in time, or 10-30 CPU cycles; there is even a decrease in some cases.

It might be interesting to compare the 5.10 and 6.8 kernels.

The first test (measuring time)

syscall__ vmas  t_5_10   t_6_8    delta_ns   per_vma  %
munmap__  1     357      909      552        552      254%
munmap__  2     442      1398     956        478      316%
munmap__  4     614      2444     1830       458      398%
munmap__  8     1017     4029     3012       377      396%
munmap__  16    1889     6647     4758       297      352%
munmap__  32    4109     11811    7702       241      287%
mprotect  1     235      439      204        204      187%
mprotect  2     495      1659     1164       582      335%
mprotect  4     741      3747     3006       752      506%
mprotect  8     1434     6755     5320       665      471%
mprotect  16    2958     13748    10790      674      465%
mprotect  32    6431     27827    21397      669      433%
madvise_  1     191      240      49         49       125%
madvise_  2     300      366      67         33       122%
madvise_  4     450      623      173        43       138%
madvise_  8     753      1110     357        45       147%
madvise_  16    1467     2127     660        41       145%
madvise_  32    2795     4109     1314       41       147%

The second test (measuring cpu cycle)

syscall__ vmas  cpu_5_10 cpu_6_8  delta_cpu  per_vma  %
munmap__  1     684      1790     1106       1106     262%
munmap__  2     861      2819     1958       979      327%
munmap__  4     1183     4959     3776       944      419%
munmap__  8     1999     8262     6263       783      413%
munmap__  16    3839     13099    9260       579      341%
munmap__  32    7672     23221    15549      486      303%
mprotect  1     397      906      509        509      228%
mprotect  2     738      3019     2281       1140     409%
mprotect  4     1221     6149     4929       1232     504%
mprotect  8     2356     9978     7622       953      423%
mprotect  16    4961     20448    15487      968      412%
mprotect  32    9882     40972    31091      972      415%
madvise_  1     351      434      82         82       123%
madvise_  2     565      752      186        93       133%
madvise_  4     872      1313     442        110      151%
madvise_  8     1508     2271     763        95       151%
madvise_  16    3078     4312     1234       77       140%
madvise_  32    5893     8376     2483       78       142%

From 5.10 to 6.8:

    munmap: added 250-550 ns in time, or 500-1100 in cpu cycle, per vma.
    mprotect: added 200-750 ns in time, or 500-1200 in cpu cycle, per vma.
    madvise: added 33-50 ns in time, or 70-110 in cpu cycle, per vma.

In comparison to mseal, which adds 20-40 ns or 50-100 CPU cycles, the increase from 5.10 to 6.8 is significantly larger, approximately ten times greater for munmap and mprotect.

When I discussed the mm performance with Brian Makin, an engineer who worked on performance, it was brought to my attention that such performance benchmarks, which measure millions of mm syscalls in a tight loop, may not accurately reflect real-world scenarios, such as that of a database service. Also, this was tested on a single piece of hardware running ChromeOS; the data from other hardware or distributions might be different. It might be best to take this data with a grain of salt.

This patch (of 5):

Wire up mseal syscall for all architectures.

Link: https://lkml.kernel.org/r/20240415163527.626541-1-jeffxu@chromium.org
Link: https://lkml.kernel.org/r/20240415163527.626541-2-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com> [Bug #2]
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
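The first measurement method from the benchmark section (timing a batch of mprotect() calls with clock_gettime(CLOCK_MONOTONIC)) can be sketched as follows. This is an illustrative approximation, not the test program referenced in the cover letter: the names elapsed_ns() and avg_mprotect_ns() are hypothetical, only one outer iteration is run, and the trick of mapping two pages and unmapping the second is an assumption made here to keep neighbouring mappings from merging into a single VMA.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define NR_MAPS 1000

/* Nanoseconds from *a to *b. */
static int64_t elapsed_ns(const struct timespec *a, const struct timespec *b)
{
	return (int64_t)(b->tv_sec - a->tv_sec) * 1000000000LL +
	       (b->tv_nsec - a->tv_nsec);
}

/*
 * Creates NR_MAPS one-page mappings, times one mprotect() per mapping,
 * and returns the average cost in ns per call, or -1 on error.
 */
static int64_t avg_mprotect_ns(void)
{
	static void *maps[NR_MAPS];
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	struct timespec t0, t1;
	int64_t result = -1;
	int i, n;

	for (n = 0; n < NR_MAPS; n++) {
		/* Map two pages, drop the second: leaves a hole after each VMA. */
		char *p = mmap(NULL, 2 * page, PROT_READ,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			goto out;
		munmap(p + page, page);
		maps[n] = p;
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < NR_MAPS; i++) {
		if (mprotect(maps[i], page, PROT_READ | PROT_WRITE) != 0)
			goto out;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	result = elapsed_ns(&t0, &t1) / NR_MAPS;
out:
	/* Tear down whatever was created, including on the error paths. */
	for (i = 0; i < n; i++)
		munmap(maps[i], page);
	return result;
}
```

Running avg_mprotect_ns() on a kernel with and without the sealing check, and comparing the two averages, gives a rough per-call delta in the spirit of the delta_ns column. As the cover letter itself cautions, numbers from a tight loop like this should be read with care.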
-
- 23 May, 2024 2 commits
-
-
Linus Torvalds authored
Merge tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more non-mm updates from Andrew Morton:

 - A series ("kbuild: enable more warnings by default") from Arnd Bergmann which enables a number of additional build-time warnings. We fixed all the fallout which we could find, there may still be a few stragglers.

 - Samuel Holland has developed the series "Unified cross-architecture kernel-mode FPU API". This does a lot of consolidation of per-architecture kernel-mode FPU usage and enables the use of newer AMD GPUs on RISC-V.

 - Tao Su has fixed some selftests build warnings in the series "Selftests: Fix compilation warnings due to missing _GNU_SOURCE definition".

 - This pull also includes a nilfs2 fixup from Ryusuke Konishi.

* tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits)
  nilfs2: make block erasure safe in nilfs_finish_roll_forward()
  selftests/harness: use 1024 in place of LINE_MAX
  Revert "selftests/harness: remove use of LINE_MAX"
  selftests/fpu: allow building on other architectures
  selftests/fpu: move FP code to a separate translation unit
  drm/amd/display: use ARCH_HAS_KERNEL_FPU_SUPPORT
  drm/amd/display: only use hard-float, not altivec on powerpc
  riscv: add support for kernel-mode FPU
  x86: implement ARCH_HAS_KERNEL_FPU_SUPPORT
  powerpc: implement ARCH_HAS_KERNEL_FPU_SUPPORT
  LoongArch: implement ARCH_HAS_KERNEL_FPU_SUPPORT
  lib/raid6: use CC_FLAGS_FPU for NEON CFLAGS
  arm64: crypto: use CC_FLAGS_FPU for NEON CFLAGS
  arm64: implement ARCH_HAS_KERNEL_FPU_SUPPORT
  ARM: crypto: use CC_FLAGS_FPU for NEON CFLAGS
  ARM: implement ARCH_HAS_KERNEL_FPU_SUPPORT
  arch: add ARCH_HAS_KERNEL_FPU_SUPPORT
  x86/fpu: fix asm/fpu/types.h include guard
  kbuild: enable -Wcast-function-type-strict unconditionally
  kbuild: enable -Wformat-truncation on clang
  ...
-
Linus Torvalds authored
Merge tag 'mm-stable-2024-05-22-17-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more mm updates from Andrew Morton:
 "A series from Dave Chinner which cleans up and fixes the handling of nested allocations within stackdepot and page-owner"

* tag 'mm-stable-2024-05-22-17-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/page-owner: use gfp_nested_mask() instead of open coded masking
  stackdepot: use gfp_nested_mask() instead of open coded masking
  mm: lift gfp_kmemleak_mask() to gfp.h
-
- 22 May, 2024 19 commits
-
-
Linus Torvalds authored
Use '%pD' to print out the filename, and print out the actual offset within the file too, rather than just what the virtual address of the mapping is (which doesn't tell you anything about any mapping offsets).

Also, use the exact vma_lookup() instead of find_vma() - the latter looks up any vma _after_ the address, which is of questionable value (yes, maybe you fell off the beginning, but you'd be more likely to fall off the end).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
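The difference between the two lookups mentioned in this commit can be illustrated with a toy userspace model. Everything here is a hypothetical stand-in (struct toy_vma, toy_find_vma(), toy_vma_lookup()); it only mirrors the semantics described above, with areas kept in a sorted array rather than the kernel's real data structure.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a VMA: covers the half-open range [start, end). */
struct toy_vma {
	unsigned long start;
	unsigned long end;
};

/*
 * find_vma()-like: first area that ends above addr. The result may
 * start after addr, i.e. addr can sit in a hole before it.
 */
static const struct toy_vma *toy_find_vma(const struct toy_vma *v, size_t n,
					  unsigned long addr)
{
	for (size_t i = 0; i < n; i++) {
		if (addr < v[i].end)
			return &v[i];
	}
	return NULL;
}

/* vma_lookup()-like: only an area that actually contains addr. */
static const struct toy_vma *toy_vma_lookup(const struct toy_vma *v, size_t n,
					    unsigned long addr)
{
	const struct toy_vma *vma = toy_find_vma(v, n, addr);

	if (vma && addr < vma->start)
		return NULL;
	return vma;
}

/* Two areas with a hole between 0x2000 and 0x5000. */
static const struct toy_vma toy_areas[] = {
	{ 0x1000, 0x2000 },
	{ 0x5000, 0x6000 },
};
```

For an address in the hole, such as 0x3000, the find_vma()-like helper returns the second area while the exact lookup returns NULL, which is the behaviour the commit prefers when reporting where a fault landed.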
-
Linus Torvalds authored
Merge trivial x86 code generation annoyances

 - Introduce helper macros for clang asm input problems

 - use said macros to improve trivially stupid code generation issues in bitops and array_index_mask_nospec

 - also improve codegen with 32-bit array index comparisons

None of these really matter, but I look at code generation and profiles fairly regularly, and these misfeatures caused the generated code to look really odd and distract from the real issues.

* branch 'x86-codegen' of local tree:
  x86: improve bitop code generation with clang
  x86: improve array_index_mask_nospec() code generation
  clang: work around asm input constraint problems
-
Linus Torvalds authored
This uses the new ASM_INPUT_RM macro to avoid the bad code generation issue that clang has with more generic asm inputs.

This ends up avoiding generating code like this:

        mov    %r10,(%rsp)
        tzcnt  (%rsp),%rcx

which now becomes just

        tzcnt  %r10,%rcx

and in the process ends up also removing a few unnecessary stack frames when the only use was that pointless "asm uses memory location off stack".

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
Don't force the inputs to be 'unsigned long', when the comparison can easily be done in 32-bit if that's more appropriate.

Note that while we can look at the inputs to choose an appropriate size for the compare instruction, the output is fixed at 'unsigned long'. That's not technically optimal either, since a 32-bit 'sbbl' would often be sufficient.

But for the outgoing mask we don't know how the mask ends up being used (ie we have uses that have an incoming 32-bit array index, but end up using the mask for other things). That said, it only costs the extra REX prefix to always generate the 64-bit mask.

[ A 'sbbl' also always technically generates a 64-bit mask, but with the upper 32 bits clear: that's fine for when the incoming index that will be masked is already 32-bit, but not if you use the mask to mask a pointer afterwards, like the file table lookup does ]

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
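What the mask is meant to compute can be sketched in portable C. This is not the x86 cmp/sbb sequence the commit tunes; it is a branch-free C formulation of the same result, with a hypothetical function name. It assumes both arguments fit in LONG_MAX and that right-shifting a negative long is an arithmetic shift, as it is with gcc and clang.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * Sketch: returns all ones when index < size, and zero otherwise,
 * without a conditional branch.
 * Assumptions: index and size are <= LONG_MAX; >> on a negative long
 * is an arithmetic shift (implementation-defined, true on gcc/clang).
 */
static inline unsigned long index_mask_nospec(unsigned long index,
					      unsigned long size)
{
	/* In range: both operands of | have the top bit clear, so the
	 * inverted value is negative and shifts down to all ones. */
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (BITS_PER_LONG - 1));
}
```

A caller would apply the result as `index &= index_mask_nospec(index, size);` so that an out-of-range index collapses to zero even under speculation. The mask is always a full 'unsigned long' here, which matches the commit's point that the output width stays fixed because the mask may later be applied to something wider than the index.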
-
Linus Torvalds authored
Work around clang problems with asm constraints that have multiple possibilities, particularly "g" and "rm".

Clang seems to turn inputs like that into the most generic form, which is the memory input - but to make matters worse, clang won't even use a possible original memory location, but will spill the value to stack, and use the stack for the asm input.

See https://github.com/llvm/llvm-project/issues/20571#issuecomment-980933442 for some explanation of why clang has this strange behavior, but the end result is that "g" and "rm" really end up generating horrid code.

Link: https://github.com/llvm/llvm-project/issues/20571
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
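The workaround described here can be sketched as a compiler-dependent constraint macro. The macro name ASM_INPUT_RM appears in this series, but the definitions below are an assumption about its shape rather than a copy of the kernel header, and trailing_zeros() is a hypothetical userspace helper. On non-x86-64 targets the sketch falls back to a compiler builtin.

```c
#include <assert.h>

/*
 * Assumed shape of the workaround: clang gets a register-only
 * constraint so it does not spill the input to the stack, while other
 * compilers keep the more flexible register-or-memory form.
 */
#if defined(__clang__)
#define ASM_INPUT_RM "r"
#else
#define ASM_INPUT_RM "rm"
#endif

/*
 * Index of the lowest set bit. The caller must pass a non-zero word:
 * the result for zero is not defined in this sketch.
 */
static inline unsigned long trailing_zeros(unsigned long word)
{
#if defined(__x86_64__)
	unsigned long out;

	/* "rep; bsf" is decoded as tzcnt on CPUs that support it. */
	__asm__("rep; bsf %1,%0"
		: "=r" (out)
		: ASM_INPUT_RM (word)
		: "cc");
	return out;
#else
	return (unsigned long)__builtin_ctzl(word);
#endif
}
```

With the register-only constraint, clang has no reason to emit the store-then-load pair, so the instruction can take its input straight from a register. The cost is that a value already living in memory must first be loaded into a register.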
-
Linus Torvalds authored
Merge tag 'char-misc-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc and other driver subsystem updates from Greg KH:
 "Here is the big set of char/misc and other driver subsystem updates for 6.10-rc1. Nothing major here, just lots of new drivers and updates for apis and new hardware types. Included in here are:

   - big IIO driver updates with more devices and drivers added
   - fpga driver updates
   - hyper-v driver updates
   - uio_pruss driver removal, no one uses it, other drivers control the same hardware now
   - binder minor updates
   - mhi driver updates
   - extcon driver updates
   - counter driver updates
   - accessibility driver updates
   - coresight driver updates
   - other hwtracing driver updates
   - nvmem driver updates
   - slimbus driver updates
   - spmi driver updates
   - other smaller misc and char driver updates

  All of these have been in linux-next for a while with no reported issues"

* tag 'char-misc-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (319 commits)
  misc: ntsync: mark driver as "broken" to prevent from building
  spmi: pmic-arb: Add multi bus support
  spmi: pmic-arb: Register controller for bus instead of arbiter
  spmi: pmic-arb: Make core resources acquiring a version operation
  spmi: pmic-arb: Make the APID init a version operation
  spmi: pmic-arb: Fix some compile warnings about members not being described
  dt-bindings: spmi: Deprecate qcom,bus-id
  dt-bindings: spmi: Add X1E80100 SPMI PMIC ARB schema
  spmi: pmic-arb: Replace three IS_ERR() calls by null pointer checks in spmi_pmic_arb_probe()
  spmi: hisi-spmi-controller: Do not override device identifier
  dt-bindings: spmi: hisilicon,hisi-spmi-controller: clean up example
  dt-bindings: spmi: hisilicon,hisi-spmi-controller: fix binding references
  spmi: make spmi_bus_type const
  extcon: adc-jack: Document missing struct members
  extcon: realtek: Remove unused of_gpio.h
  extcon: usbc-cros-ec: Convert to platform remove callback returning void
  extcon: usb-gpio: Convert to platform remove callback returning void
  extcon: max77843: Convert to platform remove callback returning void
  extcon: max3355: Convert to platform remove callback returning void
  extcon: intel-mrfld: Convert to platform remove callback returning void
  ...
-
Linus Torvalds authored
Merge tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the small set of driver core and kernfs changes for 6.10-rc1. Nothing major here at all, just a small set of changes for some driver core apis, and minor fixups.

  Included in here are:

   - sysfs_bin_attr_simple_read() helper added and used
   - device_show_string() helper added and used

  All usages of these were acked by the various maintainers. Also in here are:

   - kernfs minor cleanup
   - removed unused functions
   - typo fix in documentation
   - pay attention to sysfs_create_link() failures in module.c finally

  All of these have been in linux-next for a very long time with no reported problems"

* tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  device property: Fix a typo in the description of device_get_child_node_count()
  kernfs: mount: Remove unnecessary ‘NULL’ values from knparent
  scsi: Use device_show_string() helper for sysfs attributes
  platform/x86: Use device_show_string() helper for sysfs attributes
  perf: Use device_show_string() helper for sysfs attributes
  IB/qib: Use device_show_string() helper for sysfs attributes
  hwmon: Use device_show_string() helper for sysfs attributes
  driver core: Add device_show_string() helper for sysfs attributes
  treewide: Use sysfs_bin_attr_simple_read() helper
  sysfs: Add sysfs_bin_attr_simple_read() helper
  module: don't ignore sysfs_create_link() failures
  driver core: Remove unused platform_notify, platform_notify_remove
-
Linus Torvalds authored
Merge tag 'staging-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver updates from Greg KH:
 "Here is the big set of staging driver changes for 6.10-rc1. Not a lot of cleanups happening this kernel release, intern applications must be out of sync at the moment.

  But we did delete two drivers, wlan-ng and pi433, as they are no longer in use and the developers involved wanted them just gone entirely, allowing us to drop 19k lines from the tree.

  Other than the normal coding style cleanups here, there has been a lot of work on the vc04_services code, with the intent to finally get that out of staging hopefully soon. It's getting closer, which is nice to see.

  All of these have been in linux-next for a while with no reported issues"

* tag 'staging-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (98 commits)
  staging: pi433: Remove unused driver
  staging: vchiq_core: Add missing blank lines
  staging: vchiq_core: Drop unnecessary blank lines
  staging: vchiq_core: Add parentheses to VCHIQ_MSG_SRCPORT
  staging: vchiq_core: Use printk messages for devices
  staging: vchiq_arm: Drop unnecessary NULL check
  staging: vc04_services: Delete unnecessary NULL check
  staging: vc04_services: vchiq_arm: Fix NULL ptr dereferences
  Staging: rtl8192e: Rename variable DssCCk
  Staging: rtl8192e: Rename variable ExtHTCapInfo
  Staging: rtl8192e: Rename variable MPDUDensity
  Staging: rtl8192e: Rename variable MaxRxAMPDUFactor
  Staging: rtl8192e: Rename variable MaxAMSDUSize
  Staging: rtl8192e: Rename variable DelayBA
  Staging: rtl8192e: Rename variable RxSTBC
  Staging: rtl8192e: Rename variable TxSTBC
  Staging: rtl8192e: Rename variable GreenField
  Staging: rtl8192e: Rename variable ShortGI20Mhz
  Staging: rtl8192e: Rename variable ShortGI40Mhz
  Staging: rtl8192e: Rename variable MimoPwrSave
  ...
-
Linus Torvalds authored
Merge tag 'tty-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty / serial updates from Greg KH:
 "Here is the big set of tty/serial driver changes for 6.10-rc1. Included in here are:

   - Usual good set of api cleanups and evolution by Jiri Slaby to make the serial interfaces move out of the 1990's by using kfifos instead of hand-rolling their own logic.
   - 8250_exar driver updates
   - max3100 driver updates
   - sc16is7xx driver updates
   - exar driver updates
   - sh-sci driver updates
   - tty ldisc api addition to help refuse bindings
   - other smaller serial driver updates

  All of these have been in linux-next for a while with no reported issues"

* tag 'tty-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (113 commits)
  serial: Clear UPF_DEAD before calling tty_port_register_device_attr_serdev()
  serial: imx: Raise TX trigger level to 8
  serial: 8250_pnp: Simplify "line" related code
  serial: sh-sci: simplify locking when re-issuing RXDMA fails
  serial: sh-sci: let timeout timer only run when DMA is scheduled
  serial: sh-sci: describe locking requirements for invalidating RXDMA
  serial: sh-sci: protect invalidating RXDMA on shutdown
  tty: add the option to have a tty reject a new ldisc
  serial: core: Call device_set_awake_path() for console port
  dt-bindings: serial: brcm,bcm2835-aux-uart: convert to dtschema
  tty: serial: uartps: Add support for uartps controller reset
  arm64: zynqmp: Add resets property for UART nodes
  dt-bindings: serial: cdns,uart: Add optional reset property
  serial: 8250_pnp: Switch to DEFINE_SIMPLE_DEV_PM_OPS()
  serial: 8250_exar: Keep the includes sorted
  serial: 8250_exar: Make type of bit the same in exar_ee_*_bit()
  serial: 8250_exar: Use BIT() in exar_ee_read()
  serial: 8250_exar: Switch to use dev_err_probe()
  serial: 8250_exar: Return directly from switch-cases
  serial: 8250_exar: Decrease indentation level
  ...
-
Linus Torvalds authored
Merge tag 'usb-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt updates from Greg KH: "Here is the big set of USB and Thunderbolt changes for 6.10-rc1. Nothing hugely earth-shattering, just constant forward progress for hardware support of new devices and cleanups over the drivers. Included in here are: - Thunderbolt / USB 4 driver updates - typec driver updates - dwc3 driver updates - gadget driver updates - uss720 driver id additions and fixes (people use USB->parallel port devices still!) - onboard-hub driver rename and additions for new hardware - xhci driver updates - other small USB driver updates and additions for quirks and api changes All of these have been in linux-next for a while with no reported problems" * tag 'usb-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (154 commits) drm/bridge: aux-hpd-bridge: correct devm_drm_dp_hpd_bridge_add() stub usb: fotg210: Add missing kernel doc description usb: dwc3: core: Fix unused variable warning in core driver usb: typec: tipd: rely on i2c_get_match_data() usb: typec: tipd: fix event checking for tps6598x usb: typec: tipd: fix event checking for tps25750 dt-bindings: usb: qcom,dwc3: fix interrupt max items usb: fotg210: Use *-y instead of *-objs in Makefile usb: phy: tegra: Replace of_gpio.h by proper one usb: typec: ucsi: displayport: Fix potential deadlock usb: typec: qcom-pmic-typec: split HPD bridge alloc and registration usb: musc: Remove unused list 'buffers' usb: dwc3: Wait unconditionally after issuing EndXfer command usb: gadget: u_audio: Clear uac pointer when freed. usb: gadget: u_audio: Fix race condition use of controls after free during gadget unbind. dt-bindings: usb: dwc3: Add QDU1000 compatible usb: core: Remove the useless struct usb_devmap which is just a bitmap MAINTAINERS: Remove {ehci,uhci}-platform.c from ARM/VT8500 entry USB: usb_parse_endpoint: ignore reserved bits usb: xhci: compact 'trb_in_td()' arguments ...
-
Linus Torvalds authored
Merge tag 'leds-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds
Pull LED updates from Lee Jones: "Core Frameworks: - Ensure seldom updated triggers have a brightness value before first update New Device Support: - Add support for Simatic IPC Device BX_59A to IPC LEDs Core - Add support for Qualcomm PMI8950 PWM to LPG Core New Functionality: - Add a bunch of new LED function identifiers - Add support for High Resolution Timers in LED Trigger Pattern Fix-ups: - Shift out Audio Trigger to the Sound subsystem - Convert suitable calls to devm_* managed resources - Device Tree binding adaptions/conversions/creation - Remove superfluous code/variables/attributes and simplify overall - Use/convert to new/better APIs/helpers/MACROs instead of hand-rolling implementations Bug Fixes: - Repair enabling Torch Mode from V4L2 on the second LED - Ensure PWM is disabled when suspending" * tag 'leds-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds: (28 commits) leds: mt6370: Remove unused field 'reg_cfgs' from 'struct mt6370_priv' leds: lp50xx: Remove unused field 'num_of_banked_leds' from 'struct lp50xx' leds: lp50xx: Remove unused field 'bank_modules' from 'struct lp50xx_led' leds: aat1290: Remove unused field 'torch_brightness' from 'struct aat1290_led' leds: sun50i-a100: Use match_string() helper to simplify the code leds: pwm: Disable PWM when going to suspend leds: trigger: pattern: Add support for hrtimer leds: mt6360: Fix the second LED can not enable torch mode by V4L2 dt-bindings: leds: leds-qcom-lpg: Add support for PMI8950 PWM leds: qcom-lpg: Add support for PMI8950 PWM leds: apu: Remove duplicate DMI lookup data leds: trigger: netdev: Remove not needed call to led_set_brightness in deactivate dt-bindings: leds: Add LED_FUNCTION_SPEED_* for link speed on LAN/WAN dt-bindings: leds: Add LED_FUNCTION_MOBILE for mobile network leds: simatic-ipc-leds-gpio: Add support for module BX-59A dt-bindings: leds: qcom-lpg: Document PM6150L compatible dt-bindings: leds: pca963x: Convert text bindings to YAML leds: an30259a: Use devm_mutex_init() for mutex initialization leds: mlxreg: Use devm_mutex_init() for mutex initialization leds: nic78bx: Use devm API to cleanup module's resources ...
-
Linus Torvalds authored
Merge tag 'backlight-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight
Pull backlight updates from Lee Jones: "Fix-ups: - FB Backlight interaction overhaul - Remove superfluous code and simplify overall - Constify various structs and struct attributes Bug Fixes: - Repair LED flickering - Fix signedness bugs" * tag 'backlight-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight: (42 commits) backlight: sky81452-backlight: Remove unnecessary call to of_node_get() backlight: mp3309c: Fix LEDs flickering in PWM mode backlight: otm3225a: Drop driver owner assignment backlight: lp8788: Drop support for platform data backlight: lcd: Make lcd_class constant backlight: Make backlight_class constant backlight: mp3309c: Fix signedness bug in mp3309c_parse_fwnode() const_structs.checkpatch: add lcd_ops fbdev: omap: lcd_ams_delta: Constify lcd_ops fbdev: imx: Constify lcd_ops fbdev: clps711x: Constify lcd_ops HID: picoLCD: Constify lcd_ops backlight: tdo24m: Constify lcd_ops backlight: platform_lcd: Constify lcd_ops backlight: otm3225a: Constify lcd_ops backlight: ltv350qv: Constify lcd_ops backlight: lms501kf03: Constify lcd_ops backlight: lms283gf05: Constify lcd_ops backlight: l4f00242t03: Constify lcd_ops backlight: jornada720_lcd: Constify lcd_ops ...
-
Linus Torvalds authored
Merge tag 'mfd-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones: "New Device Support: - Add support for X-Powers AXP717 PMIC to AXP22X - Add support for Rockchip RK816 PMIC to RK8XX - Add support for TI TPS65224 PMIC to TPS6594 New Functionality: - Add Power Off functionality to Rohm BD71828 - Allow I2C SMBus access in Renesas RSMU Fix-ups: - Device Tree binding adaptions/conversions/creation - Shift Intel support over to MSI interrupts - Generify adding platform data away from being ACPI specific - Use device core supplied attribute to register sysfs entries - Replace hand-rolled functionality with generic APIs - Utilise centrally provided helpers and macros - Clean-up error handling - Remove superfluous/duplicated/unused sections - Trivial; spelling, whitespace, coding-style adaptions - More Maple Tree conversions" * tag 'mfd-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (44 commits) dt-bindings: mfd: Use full path to other schemas mfd: rsmu: support I2C SMBus access dt-bindings: mfd: Convert lp873x.txt to json-schema dt-bindings: mfd: aspeed: Drop 'oneOf' for pinctrl node dt-bindings: mfd: allwinner,sun6i-a31-prcm: Use hyphens in node names mfd: ssbi: Remove unused field 'slave' from 'struct ssbi' mfd: kempld: Remove custom DMI matching code mfd: cs42l43: Update patching revision check dt-bindings: mfd: qcom: pm8xxx: Add pm8901 compatible mfd: timberdale: Remove redundant assignment to variable err dt-bindings: mfd: qcom,spmi-pmic: Add pbs to SPMI device types dt-bindings: mfd: syscon: Add ti,am62p-cpsw-mac-efuse compatible dt-bindings: mfd: qcom,tcsr: Add compatible for SDX75 mfd: axp20x: Convert to use Maple Tree register cache mfd: bd71828: Remove commented code lines mfd: intel-m10-bmc: Change staging size to a variable dt-bindings: mfd: Add ROHM BD71879 mfd: Tidy Kconfig dependency's parentheses mfd: ocelot-spi: Use spi_sync_transfer() dt-bindings: mfd: syscon: Add missing simple syscon compatibles ...
-
Linus Torvalds authored
Merge tag 'riscv-for-linus-6.10-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt: - Add byte/half-word compare-and-exchange, emulated via LR/SC loops - Support for Rust - Support for Zihintpause in hwprobe - Add PR_RISCV_SET_ICACHE_FLUSH_CTX prctl() - Support lockless lockrefs * tag 'riscv-for-linus-6.10-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (42 commits) riscv: defconfig: Enable CONFIG_CLK_SOPHGO_CV1800 riscv: select ARCH_HAS_FAST_MULTIPLIER riscv: mm: still create swiotlb buffer for kmalloc() bouncing if required riscv: Annotate pgtable_l{4,5}_enabled with __ro_after_init riscv: Remove redundant CONFIG_64BIT from pgtable_l{4,5}_enabled riscv: mm: Always use an ASID to flush mm contexts riscv: mm: Preserve global TLB entries when switching contexts riscv: mm: Make asid_bits a local variable riscv: mm: Use a fixed layout for the MM context ID riscv: mm: Introduce cntx2asid/cntx2version helper macros riscv: Avoid TLB flush loops when affected by SiFive CIP-1200 riscv: Apply SiFive CIP-1200 workaround to single-ASID sfence.vma riscv: mm: Combine the SMP and UP TLB flush code riscv: Only send remote fences when some other CPU is online riscv: mm: Broadcast kernel TLB flushes only when needed riscv: Use IPIs for remote cache/TLB flushes by default riscv: Factor out page table TLB synchronization riscv: Flush the instruction cache during SMP bringup riscv: hwprobe: export Zihintpause ISA extension riscv: misaligned: remove CONFIG_RISCV_M_MODE specific code ...
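The first item in the RISC-V pull above, byte/half-word compare-and-exchange emulated through a loop on a wider access, can be illustrated in portable C. What follows is only a sketch under stated assumptions: it uses C11 atomics and a compare-and-swap retry loop on a 32-bit word rather than the LR/SC instructions the kernel uses, and the helper name `cmpxchg_byte_sketch` is hypothetical, not a kernel function. Lanes are numbered by bit position within the word value (lane 0 is the least significant byte), not by memory address.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): compare-and-exchange a single
 * byte lane inside a 32-bit atomic word. Lane 0 is bits 0..7 of the
 * word value. Returns true if the lane held `expected` and was replaced
 * by `desired`; returns false without modifying the word otherwise.
 */
static bool cmpxchg_byte_sketch(_Atomic uint32_t *word, unsigned int lane,
                                uint8_t expected, uint8_t desired)
{
    assert(lane < 4);

    const unsigned int shift = lane * 8;
    const uint32_t mask = (uint32_t)0xff << shift;
    uint32_t old = atomic_load(word);

    for (;;) {
        /* Give up as soon as the target lane no longer matches. */
        if ((uint8_t)((old & mask) >> shift) != expected)
            return false;

        /* Splice the new byte into an otherwise unchanged word. */
        uint32_t updated = (old & ~mask) | ((uint32_t)desired << shift);

        /* On failure `old` is refreshed with the current value; retry. */
        if (atomic_compare_exchange_weak(word, &old, updated))
            return true;
    }
}
```

The retry only repeats when some other lane of the same word changed (or the weak compare-and-swap failed spuriously), which mirrors why a sub-word operation can be built on a full-word primitive.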
-
Linus Torvalds authored
Merge tag 'loongarch-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch updates from Huacai Chen: - Select some options in Kconfig - Give a chance to build with !CONFIG_SMP - Switch to use built-in rustc target - Add new supported device nodes to dts - Some bug fixes and other small changes - Update the default config file * tag 'loongarch-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: Update Loongson-3 default config file LoongArch: dts: Add new supported device nodes to Loongson-2K2000 LoongArch: dts: Add new supported device nodes to Loongson-2K0500 LoongArch: dts: Remove "disabled" state of clock controller node LoongArch: rust: Switch to use built-in rustc target LoongArch: Fix callchain parse error with kernel tracepoint events again LoongArch: Give a chance to build with !CONFIG_SMP LoongArch: Select THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE LoongArch: Select ARCH_WANT_DEFAULT_BPF_JIT LoongArch: Select ARCH_SUPPORTS_INT128 if CC_HAS_INT128 LoongArch: Select ARCH_HAS_FAST_MULTIPLIER
-
Linus Torvalds authored
Merge tag 'microblaze-v6.10' of git://git.monstr.eu/linux-2.6-microblaze
Pull microblaze updates from Michal Simek: - Cleanup code around removed early_printk * tag 'microblaze-v6.10' of git://git.monstr.eu/linux-2.6-microblaze: microblaze: Remove early printk call from cpuinfo-static.c microblaze: Remove gcc flag for non existing early_printk.c file
-
Linus Torvalds authored
Merge tag 'ovl-update-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs updates from Miklos Szeredi: - Add tmpfile support - Clean up include * tag 'ovl-update-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: remove duplicate included header ovl: remove upper umask handling from ovl_create_upper() ovl: implement tmpfile
-
Linus Torvalds authored
Merge tag 'fuse-update-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi: - Add fs-verity support (Richard Fung) - Add multi-queue support to virtio-fs (Peter-Jan Gootzen) - Fix a bug in NOTIFY_RESEND handling (Hou Tao) - page -> folio cleanup (Matthew Wilcox) * tag 'fuse-update-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: virtio-fs: add multi-queue support virtio-fs: limit number of request queues fuse: clear FR_SENT when re-adding requests into pending list fuse: set FR_PENDING atomically in fuse_resend() fuse: Add initial support for fs-verity fuse: Convert fuse_readpages_end() to use folio_end_read()
-
Yafang Shao authored
Our applications, built on Elasticsearch[0], frequently create and delete files. These applications operate within containers, some with a memory limit exceeding 100GB. Over prolonged periods, the accumulation of negative dentries within these containers can amount to tens of gigabytes. Upon container exit, directories are deleted. However, due to the numerous associated dentries, this process can be time-consuming. Our users have expressed frustration with this prolonged exit duration, which constitutes our first issue. Simultaneously, other processes may attempt to access the parent directory of the Elasticsearch directories. Since the task responsible for deleting the dentries holds the inode lock, processes attempting directory lookup experience significant delays. This issue, our second problem, is easily demonstrated: - Task 1 generates negative dentries: $ pwd ~/test $ mkdir es && cd es/ && ./create_and_delete_files.sh [ After generating tens of GB dentries ] $ cd ~/test && rm -rf es [ It will take a long duration to finish ] - Task 2 attempts to lookup the 'test/' directory $ pwd ~/test $ ls The 'ls' command in Task 2 experiences prolonged execution as Task 1 is deleting the dentries. We've devised a solution to address both issues by deleting associated dentry when removing a file. Interestingly, we've noted that a similar patch was proposed years ago[1], although it was rejected citing the absence of tangible issues caused by negative dentries. Given our current challenges, we're resubmitting the proposal. All relevant stakeholders from previous discussions have been included for reference. Some alternative solutions are also under discussion[2][3], such as shrinking child dentries outside of the parent inode lock or even asynchronously shrinking child dentries. However, given the straightforward nature of the current solution, I believe this approach is still necessary. [ NOTE! 
This is a pretty fundamental change in how we deal with unlinking dentries, and it doesn't change the fact that you can have lots of negative dentries from just doing negative lookups. But the kernel test robot is at least initially happy with this from a performance angle, so I'm applying this ASAP just to get more testing and as a "known fix for an issue people hit in real life". Put another way: we should still look at the alternatives, and this patch may get reverted if somebody finds a performance regression on some other load. - Linus ] Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://github.com/elastic/elasticsearch [0] Link: https://patchwork.kernel.org/project/linux-fsdevel/patch/1502099673-31620-1-git-send-email-wangkai86@huawei.com [1] Link: https://lore.kernel.org/linux-fsdevel/20240511200240.6354-2-torvalds@linux-foundation.org/ [2] Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wjEMf8Du4UFzxuToGDnF3yLaMcrYeyNAaH1NJWa6fwcNQ@mail.gmail.com/ [3] Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Wangkai <wangkai86@huawei.com> Cc: Colin Walters <walters@verbum.org> Tested-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/all/202405221518.ecea2810-oliver.sang@intel.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 21 May, 2024 14 commits
-
-
Linus Torvalds authored
Merge tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Arnaldo Carvalho de Melo: "General: - Integrate the shellcheck utility with the build of perf to allow catching shell problems early in areas such as 'perf test', 'perf trace' scrape scripts, etc - Add 'uretprobe' variant in the 'perf bench uprobe' tool - Add script to run instances of 'perf script' in parallel - Allow parsing tracepoint names that start with digits, such as 9p/9p_client_req, etc. Make sure 'perf test' tests it even on systems where those tracepoints aren't available - Add Kan Liang to MAINTAINERS as a perf tools reviewer - Add support for using the 'capstone' disassembler library in various tools, such as 'perf script' and 'perf annotate'. This is an alternative for the use of the 'xed' and 'objdump' disassemblers Data-type profiling improvements: - Resolve types for a->b->c by backtracking the assignments until it finds DWARF info for one of those members - Support for global variables, keeping a cache to speed up lookups - Handle the 'call' instruction, dealing with effects on registers and handling its return when tracking register data types - Handle x86's segment based addressing like %gs:0x28, to support things like per CPU variables, the stack canary, etc - Data-type profiling got big speedups when using capstone for disassembling. The objdump output parsing method is left as a fallback when capstone fails or isn't available. 
There are patches posted for 6.11 that use an LLVM disassembler - Support event group display in the TUI when annotating types with --data-type, for instance to show memory load and store events for the data type fields - Optimize the 'perf annotate' data structures, reducing memory usage - Add an initial 'perf test' for 'perf annotate', checking that a target symbol appears on the output, specifying objdump via the command line, etc Vendor Events: - Update Intel JSON files for Cascade Lake X, Emerald Rapids, Grand Ridge, Ice Lake X, Lunar Lake, Meteor Lake, Sapphire Rapids, Sierra Forest, Sky Lake X, Sky Lake and Snow Ridge X. Remove info metrics erroneously in TopdownL1 - Add AMD's Zen 5 core and uncore events and metrics. Those come from the "Performance Monitor Counters for AMD Family 1Ah Model 00h- 0Fh Processors" document, with events that capture information on op dispatch, execution and retirement, branch prediction, L1 and L2 cache activity, TLB activity, etc - Mark L1D_CACHE_INVAL impacted by errata for ARM64's AmpereOne/ AmpereOneX Miscellaneous: - Sync header copies with the kernel sources - Move some header copies used only for generating translation string tables for ioctl cmds and other syscall integer arguments to a new directory under tools/perf/beauty/, to separate from copies in tools/include/ that are used to build the tools - Introduce scrape script for several syscall 'flags'/'mask' arguments - Improve cpumap utilization, fixing up pairing of refcounts, using the right iterators (perf_cpu_map__for_each_cpu), etc - Give more details about raw event encodings in 'perf list', show tracepoint encoding in the detailed output - Refactor the DSOs handling code, reducing memory usage - Document the BPF event modifier and add a 'perf test' for it - Improve the event parser, better error messages and add further 'perf test's for it - Add reference count checking to 'struct comm_str' and 'struct mem_info' - Make ARM64's 'perf test' entries for the 
Neoverse N1 more robust - Tweak the ARM64's Coresight 'perf test's - Improve ARM64's CoreSight ETM version detection and error reporting - Fix handling of symbols when using kcore - Fix PAI (Processor Activity Instrumentation) counter names for s390 virtual machines in 'perf report' - Fix -g/--call-graph option failure in 'perf sched timehist' - Add LIBTRACEEVENT_DIR build option to allow building with libtraceevent installed in non-standard directories, such as when doing cross builds - Various 'perf test' and 'perf bench' fixes - Improve 'perf probe' error message for long C++ probe names" * tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (260 commits) tools lib subcmd: Show parent options in help perf pmu: Count sys and cpuid JSON events separately perf stat: Don't display metric header for non-leader uncore events perf annotate-data: Ensure the number of type histograms perf annotate: Fix segfault on sample histogram perf daemon: Fix file leak in daemon_session__control libsubcmd: Fix parse-options memory leak perf lock: Avoid memory leaks from strdup() perf sched: Rename 'switches' column header to 'count' and add usage description, options for latency perf tools: Ignore deleted cgroups perf parse: Allow tracepoint names to start with digits perf parse-events: Add new 'fake_tp' parameter for tests perf parse-events: pass parse_state to add_tracepoint perf symbols: Fix ownership of string in dso__load_vmlinux() perf symbols: Update kcore map before merging in remaining symbols perf maps: Re-use __maps__free_maps_by_name() perf symbols: Remove map from list before updating addresses perf tracepoint: Don't scan all tracepoints to test if one exists perf dwarf-aux: Fix build with HAVE_DWARF_CFI_SUPPORT perf thread: Fixes to thread__new() related to initializing comm ...
-
Linus Torvalds authored
Merge tag 'bitmap-for-6.10v2' of https://github.com/norov/linux
Pull bitmap updates from Yury Norov: - topology_span_sane() optimization from Kyle Meyer - fns() rework from Kuan-Wei Chiu (used in cpumask_local_spread() and other places) - headers cleanup from Andy - add a MAINTAINERS record for bitops API * tag 'bitmap-for-6.10v2' of https://github.com/norov/linux: usercopy: Don't use "proxy" headers bitops: Move aligned_byte_mask() to wordpart.h MAINTAINERS: add BITOPS API record bitmap: relax find_nth_bit() limitation on return value lib: make test_bitops compilable into the kernel image bitops: Optimize fns() for improved performance lib/test_bitops: Add benchmark test for fns() Compiler Attributes: Add __always_used macro sched/topology: Optimize topology_span_sane() cpumask: Add for_each_cpu_from()
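The fns() rework mentioned above concerns finding the N-th set bit in a word. A minimal user-space sketch of one way to do that follows: it clears the lowest set bit `n` times and then reports the position of the lowest bit that remains. The names `nth_set_bit_sketch`, `lowest_set_bit_index` and `WORD_BITS` are illustrative assumptions, and the code is not claimed to match the kernel's implementation line for line. Here `n` is zero-based, and `WORD_BITS` is returned when the word has fewer than `n + 1` bits set.

```c
#include <assert.h>
#include <limits.h>

/* Number of bits in an unsigned long on the build target. */
#define WORD_BITS ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Index of the lowest set bit; the caller must pass a non-zero word. */
static unsigned int lowest_set_bit_index(unsigned long word)
{
    unsigned int pos = 0;

    assert(word != 0);
    while (!(word & 1UL)) {
        word >>= 1;
        pos++;
    }
    return pos;
}

/*
 * Hypothetical helper: return the bit index of the n-th (zero-based)
 * set bit in `word`, or WORD_BITS if there is no such bit.
 */
static unsigned int nth_set_bit_sketch(unsigned long word, unsigned int n)
{
    /* Each iteration drops the lowest set bit: x & (x - 1). */
    while (word && n--)
        word &= word - 1;

    return word ? lowest_set_bit_index(word) : WORD_BITS;
}
```

The loop runs at most once per set bit skipped, so the cost depends on `n` rather than on the width of the word.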
-
Linus Torvalds authored
Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro: "Assorted commits that had missed the last merge window..." * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: remove call_{read,write}_iter() functions do_dentry_open(): kill inode argument kernel_file_open(): get rid of inode argument get_file_rcu(): no need to check for NULL separately fd_is_open(): move to fs/file.c close_on_exec(): pass files_struct instead of fdtable
-
Linus Torvalds authored
Merge tag 'pull-bd_flags-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull bdev flags update from Al Viro: "Compactifying bdev flags. We can easily have up to 24 flags with sane atomicity, _without_ pushing anything out of the first cacheline of struct block_device" * tag 'pull-bd_flags-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: bdev: move ->bd_make_it_fail to ->__bd_flags bdev: move ->bd_ro_warned to ->__bd_flags bdev: move ->bd_has_subit_bio to ->__bd_flags bdev: move ->bd_write_holder into ->__bd_flags bdev: move ->bd_read_only to ->__bd_flags bdev: infrastructure for flags wrapper for access to ->bd_partno Use bdev_is_paritition() instead of open-coding it
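The idea of keeping a small partition number and up to 24 independent flags in one atomically updated 32-bit word can be sketched in user-space C. Everything named below (`struct bdev_sketch`, the `SK_*` constants, the helper functions) is hypothetical and chosen for illustration; the layout assumed here, with the low 8 bits for the partition number and the upper 24 bits for flags, follows from the "up to 24 flags" remark but is not copied from the kernel sources.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bits 0..7 hold the partition number, bits 8..31 flags. */
#define SK_PARTNO_MASK   0xffu
#define SK_READ_ONLY     (1u << 8)
#define SK_WRITE_HOLDER  (1u << 9)

struct bdev_sketch {
    _Atomic uint32_t flags_word; /* partition number plus flag bits */
};

/* The partition number is written once, before the object is shared. */
static void sk_init(struct bdev_sketch *b, uint8_t partno)
{
    atomic_init(&b->flags_word, (uint32_t)partno);
}

static uint8_t sk_partno(struct bdev_sketch *b)
{
    return (uint8_t)(atomic_load(&b->flags_word) & SK_PARTNO_MASK);
}

/* Atomic read-modify-write, so concurrent flag updates do not get lost. */
static void sk_set_flag(struct bdev_sketch *b, uint32_t flag)
{
    assert((flag & SK_PARTNO_MASK) == 0);
    atomic_fetch_or(&b->flags_word, flag);
}

static void sk_clear_flag(struct bdev_sketch *b, uint32_t flag)
{
    assert((flag & SK_PARTNO_MASK) == 0);
    atomic_fetch_and(&b->flags_word, ~flag);
}

static bool sk_test_flag(struct bdev_sketch *b, uint32_t flag)
{
    return (atomic_load(&b->flags_word) & flag) != 0;
}
```

Compared with separate `bool` fields, a single word keeps all of this state in four bytes, which is what lets it stay within the first cacheline of the structure.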
-
Linus Torvalds authored
Merge tag 's390-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Alexander Gordeev: - Switch read and write software bits for PUDs - Add missing hardware bits for PUDs and PMDs - Generate unwind information for C modules to fix GDB unwind error for vDSO functions - Create .build-id links for unstripped vDSO files to enable vDSO debugging with symbols - Use standard stack frame layout for vDSO generated stack frames to manually walk stack frames without DWARF information - Rework perf_callchain_user() and arch_stack_walk_user() functions to reduce code duplication - Skip first stack frame when walking user stack - Add basic checks to identify invalid instruction pointers when walking stack frames - Introduce and use struct stack_frame_vdso_wrapper within vDSO user wrapper code to automatically generate an asm-offset define. Also use STACK_FRAME_USER_OVERHEAD instead of STACK_FRAME_OVERHEAD to document that the code works with user space stack - Clear the backchain of the extra stack frame added by the vDSO user wrapper code. This allows the user stack walker to detect and skip the non-standard stack frame. Without this an incorrect instruction pointer would be added to stack traces. - Rewrite psw_idle() function in C to ease maintenance and further enhancements - Remove get_vtimer() function and use get_cpu_timer() instead - Mark psw variable in __load_psw_mask() as __unitialized to avoid superfluous clearing of PSW - Remove obsolete and superfluous comment about removed TIF_FPU flag - Replace memzero_explicit() and kfree() with kfree_sensitive() to fix warnings reported by Coccinelle - Wipe sensitive data and all copies of protected- or secure-keys from stack when an IOCTL fails - Both do_airq_interrupt() and do_io_interrupt() functions set CIF_NOHZ_DELAY flag. 
Move it in do_io_irq() to simplify the code - Provide iucv_alloc_device() and iucv_release_device() helpers, which can be used to deduplicate more or less identical IUCV device allocation and release code in four different drivers - Make use of iucv_alloc_device() and iucv_release_device() helpers to get rid of quite some code and also remove a cast to an incompatible function (clang W=1) - There is no user of iucv_root outside of the core IUCV code left. Therefore remove the EXPORT_SYMBOL - __apply_alternatives() contains a runtime check which verifies that the size of the to be patched code area is even. Convert this to a compile time check - Increase size of buffers for sending z/VM CP DIAGNOSE X'008' commands from 128 to 240 - Do not accept z/VM CP DIAGNOSE X'008' commands longer than maximally allowed - Use correct defines IPL_BP_NVME_LEN and IPL_BP0_NVME_LEN instead of IPL_BP_FCP_LEN and IPL_BP0_FCP_LEN ones to initialize NVMe reIPL block on 'scp_data' sysfs attribute update - Initialize the correct fields of the NVMe dump block, which were confused with FCP fields - Refactor macros for 'scp_data' (re-)IPL sysfs attribute to reduce code duplication - Introduce 'scp_data' sysfs attribute for dump IPL to allow tools such as dumpconf passing additional kernel command line parameters to a stand-alone dumper - Rework the CPACF query functions to use the correct RRE or RRF instruction formats and set instruction register fields correctly - Instead of calling BUG() at runtime force a link error during compile when a unsupported opcode is used with __cpacf_query() or __cpacf_check_opcode() functions - Fix a crash in ap_parse_bitmap_str() function on /sys/bus/ap/apmask or /sys/bus/ap/aqmask sysfs file update with a relative mask value - Fix "bindings complete" udev event which should be sent once all AP devices have been bound to device drivers and again when unbind/bind actions take place and all AP devices are bound again - Facility list alt_stfle_fac_list is 
nowhere used in the decompressor, therefore remove it there - Remove custom kprobes insn slot allocator in favour of the standard module_alloc() one, since kernel image and module areas are located within 4GB - Use kvcalloc() instead of kvmalloc_array() in zcrypt driver to avoid calling memset() with a large byte count and get rid of the sparse warning as result * tag 's390-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits) s390/zcrypt: Use kvcalloc() instead of kvmalloc_array() s390/kprobes: Remove custom insn slot allocator s390/boot: Remove alt_stfle_fac_list from decompressor s390/ap: Fix bind complete udev event sent after each AP bus scan s390/ap: Fix crash in AP internal function modify_bitmap() s390/cpacf: Make use of invalid opcode produce a link error s390/cpacf: Split and rework cpacf query functions s390/ipl: Introduce sysfs attribute 'scp_data' for dump ipl s390/ipl: Introduce macros for (re)ipl sysfs attribute 'scp_data' s390/ipl: Fix incorrect initialization of nvme dump block s390/ipl: Fix incorrect initialization of len fields in nvme reipl block s390/ipl: Do not accept z/VM CP diag X'008' cmds longer than max length s390/ipl: Fix size of vmcmd buffers for sending z/VM CP diag X'008' cmds s390/alternatives: Convert runtime sanity check into compile time check s390/iucv: Unexport iucv_root tty: hvc-iucv: Make use of iucv_alloc_device() s390/smsgiucv_app: Make use of iucv_alloc_device() s390/netiucv: Make use of iucv_alloc_device() s390/vmlogrdr: Make use of iucv_alloc_device() s390/iucv: Provide iucv_alloc_device() / iucv_release_device() ...
-
Linus Torvalds authored
Merge tag 'm68knommu-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu update from Greg Ungerer: . remove use of kernel config option from uapi header * tag 'm68knommu-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68k: Avoid CONFIG_COLDFIRE switch in uapi header
-
Linus Torvalds authored
Merge tag 'efi-fixes-for-v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fix from Ard Biesheuvel: - Followup fix for the EFI boot sequence refactor, which may result in physical KASLR putting the kernel in a region which is being used for a special purpose via a command line argument. * tag 'efi-fixes-for-v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: x86/efistub: Omit physical KASLR when memory reservations exist
-
Linus Torvalds authored
Merge tag 'for-6.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Fix DM discard regressions due to DM core switching over to using queue_limits_set() without DM core and targets first being updated to set (and stack) discard limits in terms of max_hw_discard_sectors and not max_discard_sectors - Fix stable@ DM integrity discard support to set device's discard_granularity limit to the device's logical block size * tag 'for-6.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: always manage discard support in terms of max_hw_discard_sectors dm-integrity: set discard_granularity to logical block size
-
Linus Torvalds authored
Merge tag 'pm-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki: "These fix the amd-pstate driver and the operating performance point (OPP) handling related to generic PM domains. Specifics: - Fix a memory leak in the exit path of amd-pstate (Peng Ma) - Fix required_opp_tables handling in the cases when multiple generic PM domains share one OPP table (Viresh Kumar)" * tag 'pm-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: OPP: Fix required_opp_tables for multiple genpds using same table cpufreq: amd-pstate: fix memory leak on CPU EPP exit
-
Linus Torvalds authored
Merge tag 'acpi-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki: "These make the ACPI EC driver always install the EC address space handler at the root of the ACPI namespace which causes it to take care of all EC operation regions everywhere. This means that the custom EC address space handler in the WMI driver is not needed any more and accordingly it gets removed altogether" * tag 'acpi-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: platform/x86: wmi: Remove custom EC address space handler ACPI: EC: Install address space handler at the namespace root
-
Linus Torvalds authored
Merge tag 'thermal-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki: "These fix the MediaTek lvts_thermal driver and the handling of trip points that start as invalid and are adjusted later by user space via sysfs. Specifics: - Fix and clean up the MediaTek lvts_thermal driver (Julien Panis) - Prevent invalid trip point handling from triggering spurious trip point crossing events and allow passive polling to stop when a passive trip point involved in it becomes invalid (Rafael Wysocki)" * tag 'thermal-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: core: Fix the handling of invalid trip points thermal/drivers/mediatek/lvts_thermal: Fix wrong lvts_ctrl index thermal/drivers/mediatek/lvts_thermal: Remove unused members from struct lvts_ctrl_data thermal/drivers/mediatek/lvts_thermal: Check NULL ptr on lvts_data
-
Linus Torvalds authored
Merge tag 'intel-gpio-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-gpio-intel Pull intel-gpio fixes from Andy Shevchenko: - NULL pointer dereference fix in GPIO ACPI library - Restore ACPI handle matching for GPIO devices represented in banks * tag 'intel-gpio-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-gpio-intel: gpiolib: acpi: Fix failed in acpi_gpiochip_find() by adding parent node match gpiolib: acpi: Move ACPI device NULL check to acpi_can_fallback_to_crs()
-
Linus Torvalds authored
Pull soundwire updates from Vinod Koul: - cleanup and conversion for soundwire sysfs groups - intel support for ace2x bits, auxdevice pm improvements - qcom multi link device support * tag 'soundwire-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: (33 commits) soundwire: intel_ace2.x: add support for DOAISE property soundwire: intel_ace2.x: add support for DODSE property soundwire: intel_ace2x: use DOAIS and DODS settings from firmware soundwire: intel_ace2x: cleanup DOAIS/DODS settings soundwire: intel_ace2x: simplify check_wake() soundwire: intel_ace2x: fix wakeup handling soundwire: intel_init: resume all devices on exit. soundwire: intel: export intel_resume_child_device soundwire: intel_auxdevice: use pm_runtime_resume() instead of pm_request_resume() ASoC: SOF: Intel: hda: disable SoundWire interrupt later soundwire: qcom: allow multi-link on newer devices soundwire: intel_ace2x: use legacy formula for intel_alh_id soundwire: reconcile dp0_prop and dpn_prop soundwire: intel_ace2x: set the clock source soundwire: intel_ace2.x: power-up first before setting SYNCPRD soundwire: intel_ace2x: move and extend clock selection soundwire: intel: add support for MeteorLake additional clocks soundwire: intel: add more values for SYNCPRD soundwire: bus: extend base clock checks to 96 MHz soundwire: cadence: show the bus frequency and frame shape ...
-
Linus Torvalds authored
Pull generic phy updates from Vinod Koul: "New HW Support: - Support for Embedded DisplayPort and DisplayPort submodes and driver support on Qualcomm X1E80100 edp driver - Qualcomm QMP UFS PHY for SM8475, QMP USB phy for QDU1000/QRU1000 and eusb2-repeater for SMB2360 - Samsung HDMI PHY for i.MX8MP, gs101 UFS phy - Mediatek XFI T-PHY support for mt7988 - Rockchip usbdp combo phy driver Updates: - Qualcomm x4 lane EP support for sa8775p, v4 and v6 support for X1E80100, SM8650 tables for UFS Gear 4 & 5 and correct voltage swing tables - Freescale imx8m-pci pcie link-up updates - Rockchip rx-common-refclk-mode support - More platform remove callback returning void conversions" * tag 'phy-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (43 commits) dt-bindings: phy: qcom,usb-snps-femto-v2: use correct fallback for sc8180x dt-bindings: phy: qcom,sc8280xp-qmp-ufs-phy: fix msm899[68] power-domains dt-bindings: phy: qcom,sc8280xp-qmp-pcie-phy: fix x1e80100-gen3x2 schema phy: qcpm-qmp-usb: Add support for QDU1000/QRU1000 dt-bindings: phy: qcom,qmp-usb: Add QDU1000 USB3 PHY dt-bindings: phy: qcom,usb-snps-femto-v2: Add bindings for QDU1000 phy: qcom-qmp-pcie: add x4 lane EP support for sa8775p phy: samsung-ufs: ufs: exit on first reported error phy: samsung-ufs: ufs: remove superfluous mfd/syscon.h header phy: rockchip: fix CONFIG_TYPEC dependency phy: rockchip: usbdp: fix uninitialized variable phy: rockchip-snps-pcie3: add support for rockchip,rx-common-refclk-mode dt-bindings: phy: rockchip,pcie3-phy: add rockchip,rx-common-refclk-mode phy: rockchip: add usbdp combo phy driver dt-bindings: phy: add rockchip usbdp combo phy document phy: add driver for MediaTek XFI T-PHY dt-bindings: phy: mediatek,mt7988-xfi-tphy: add new bindings phy: freescale: fsl-samsung-hdmi: Convert to platform remove callback returning void phy: qcom: qmp-ufs: update SM8650 tables for Gear 4 & 5 MAINTAINERS: Add phy-gs101-ufs file to Tensor GS101. ...
-