- 17 Feb, 2021 1 commit
-
-
Michal Hocko authored
Preemption mode selection is currently hardcoded on Kconfig choices. Introduce a dedicated option to tune preemption flavour at boot time, This will be only available on architectures efficiently supporting static calls in order not to tempt with the feature against additional overhead that might be prohibitive or undesirable. CONFIG_PREEMPT_DYNAMIC is automatically selected by CONFIG_PREEMPT if the architecture provides the necessary support (CONFIG_STATIC_CALL_INLINE, CONFIG_GENERIC_ENTRY, and provide with __preempt_schedule_function() / __preempt_schedule_notrace_function()). Suggested-by:
Peter Zijlstra <peterz@infradead.org> Signed-off-by:
Michal Hocko <mhocko@suse.com> Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> [peterz: relax requirement to HAVE_STATIC_CALL] Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20210118141223.123667-5-frederic@kernel.org
-
- 13 Feb, 2021 1 commit
-
-
Heiko Carstens authored
s390 and alpha are the only 64 bit architectures with a 32-bit ino_t. Since this is quite unusual this causes bugs from time to time. See e.g. commit ebce3eb2 ("ceph: fix inode number handling on arches with 32-bit ino_t") for an example. This (obviously) also prevents s390 and alpha to use 64-bit ino_t for tmpfs. See commit b85a7a8b ("tmpfs: disallow CONFIG_TMPFS_INODE64 on s390"). Therefore switch both s390 and alpha to 64-bit ino_t. This should only have an effect on the ustat system call. To prevent ABI breakage define struct ustat compatible to the old layout and change sys_ustat() accordingly. Acked-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Heiko Carstens <hca@linux.ibm.com> Signed-off-by:
Vasily Gorbik <gor@linux.ibm.com>
-
- 29 Jan, 2021 1 commit
-
-
Viresh Kumar authored
The "oprofile" user-space tools don't use the kernel OPROFILE support any more, and haven't in a long time. User-space has been converted to the perf interfaces. Remove kernel's old oprofile support. Suggested-by:
Christoph Hellwig <hch@infradead.org> Suggested-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Viresh Kumar <viresh.kumar@linaro.org> Acked-by:
Robert Richter <rric@kernel.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> #RCU Acked-by:
William Cohen <wcohen@redhat.com> Acked-by:
Al Viro <viro@zeniv.linux.org.uk> Acked-by:
Thomas Gleixner <tglx@linutronix.de>
-
- 21 Jan, 2021 1 commit
-
-
Lukas Bulwahn authored
Commit adab66b7 ("Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"") added the config HAVE_64BIT_ALIGNED_ACCESS back into arch/Kconfig with this revert. In the meantime, commit c9b54d6f ("docs: move other kAPI documents to core-api") changed ./Documentation/unaligned-memory-access.txt to ./Documentation/core-api/unaligned-memory-access.rst. Fortunately, ./scripts/documentation-file-ref-check detects this and warns about this broken reference. Update the file reference in arch/Kconfig. Fixes: adab66b7 ("Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"") Signed-off-by:
Lukas Bulwahn <lukas.bulwahn@gmail.com> Link: https://lore.kernel.org/r/20210119095326.13896-1-lukas.bulwahn@gmail.com Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
- 14 Jan, 2021 2 commits
-
-
Sami Tolvanen authored
With CONFIG_MODVERSIONS, version information is linked into each compilation unit that exports symbols. With LTO, we cannot use this method as all C code is compiled into LLVM bitcode instead. This change collects symbol versions into .symversions files and merges them in link-vmlinux.sh where they are all linked into vmlinux.o at the same time. Signed-off-by:
Sami Tolvanen <samitolvanen@google.com> Reviewed-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20201211184633.3213045-4-samitolvanen@google.com
-
Sami Tolvanen authored
This change adds build system support for Clang's Link Time Optimization (LTO). With -flto, instead of ELF object files, Clang produces LLVM bitcode, which is compiled into native code at link time, allowing the final binary to be optimized globally. For more details, see: https://llvm.org/docs/LinkTimeOptimization.html The Kconfig option CONFIG_LTO_CLANG is implemented as a choice, which defaults to LTO being disabled. To use LTO, the architecture must select ARCH_SUPPORTS_LTO_CLANG and support: - compiling with Clang, - compiling all assembly code with Clang's integrated assembler, - and linking with LLD. While using CONFIG_LTO_CLANG_FULL results in the best runtime performance, the compilation is not scalable in time or memory. CONFIG_LTO_CLANG_THIN enables ThinLTO, which allows parallel optimization and faster incremental builds. ThinLTO is used by default if the architecture also selects ARCH_SUPPORTS_LTO_CLANG_THIN: https://clang.llvm.org/docs/ThinLTO.html To enable LTO, LLVM tools must be used to handle bitcode files, by passing LLVM=1 and LLVM_IAS=1 options to make: $ make LLVM=1 LLVM_IAS=1 defconfig $ scripts/config -e LTO_CLANG_THIN $ make LLVM=1 LLVM_IAS=1 To prepare for LTO support with other compilers, common parts are gated behind the CONFIG_LTO option, and LTO can be disabled for specific files by filtering out CC_FLAGS_LTO. Signed-off-by:
Sami Tolvanen <samitolvanen@google.com> Reviewed-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20201211184633.3213045-3-samitolvanen@google.com
-
- 06 Jan, 2021 1 commit
-
-
Al Viro authored
To get rid of hardcoded size/offset in those macros we need to have a definition of i386 variant of struct elf_prstatus. However, we can't do that in asm/compat.h - the types needed for that are not there and adding an include of asm/user32.h into asm/compat.h would cause a lot of mess. That could be conveniently done in elfcore-compat.h, but currently there is nowhere to put arch-dependent parts of it - no asm/elfcore-compat.h. So we introduce a new file (asm/elfcore-compat.h, present on architectures that have CONFIG_ARCH_HAS_ELFCORE_COMPAT set, currently only on x86), have it pulled by linux/elfcore-compat.h and move the definitions there. As a side benefit, we don't need to worry about accidental inclusion of that file into binfmt_elf.c itself, so we don't need the dance with COMPAT_PRSTATUS_SIZE, etc. - only fs/compat_binfmt_elf.c will see that header. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 28 Dec, 2020 1 commit
-
-
Brian Gerst authored
Commit 121b32a5 ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments") converted native x86-32 which take 64-bit arguments to use the compat handlers to allow conversion to passing args via pt_regs. sys_fanotify_mark() was however missed, as it has a general compat handler. Add a config option that will use the syscall wrapper that takes the split args for native 32-bit. [ bp: Fix typo in Kconfig help text. ] Fixes: 121b32a5 ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments") Reported-by:
Paweł Jasiak <pawel@jasiak.xyz> Signed-off-by:
Brian Gerst <brgerst@gmail.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Acked-by:
Jan Kara <jack@suse.cz> Acked-by:
Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20201130223059.101286-1-brgerst@gmail.com
-
- 22 Dec, 2020 1 commit
-
-
Andrey Konovalov authored
Even though hardware tag-based mode currently doesn't support checking vmalloc allocations, it doesn't use shadow memory and works with VMAP_STACK as is. Change VMAP_STACK definition accordingly. Link: https://lkml.kernel.org/r/ecdb2a1658ebd88eb276dee2493518ac0e82de41.1606162397.git.andreyknvl@google.com Link: https://linux-review.googlesource.com/id/I3552cbc12321dec82cd7372676e9372a2eb452ac Signed-off-by:
Andrey Konovalov <andreyknvl@google.com> Reviewed-by:
Marco Elver <elver@google.com> Acked-by:
Catalin Marinas <catalin.marinas@arm.com> Reviewed-by:
Dmitry Vyukov <dvyukov@google.com> Tested-by:
Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 15 Dec, 2020 4 commits
-
-
Mike Rapoport authored
The design of DEBUG_PAGEALLOC presumes that __kernel_map_pages() must never fail. With this assumption is wouldn't be safe to allow general usage of this function. Moreover, some architectures that implement __kernel_map_pages() have this function guarded by #ifdef DEBUG_PAGEALLOC and some refuse to map/unmap pages when page allocation debugging is disabled at runtime. As all the users of __kernel_map_pages() were converted to use debug_pagealloc_map_pages() it is safe to make it available only when DEBUG_PAGEALLOC is set. Link: https://lkml.kernel.org/r/20201109192128.960-4-rppt@kernel.org Signed-off-by:
Mike Rapoport <rppt@linux.ibm.com> Acked-by:
David Hildenbrand <david@redhat.com> Acked-by:
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Len Brown <len.brown@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Mike Rapoport authored
ARM and ARM64 free unused parts of the memory map just before the initialization of the page allocator. To allow holes in the memory map both architectures overload pfn_valid() and define HAVE_ARCH_PFN_VALID. Allowing holes in the memory map for FLATMEM may be useful for small machines, such as ARC and m68k and will enable those architectures to cease using DISCONTIGMEM and still support more than one memory bank. Move the functions that free unused memory map to generic mm and enable them in case HAVE_ARCH_PFN_VALID=y. Link: https://lkml.kernel.org/r/20201101170454.9567-10-rppt@kernel.org Signed-off-by:
Mike Rapoport <rppt@linux.ibm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matt Turner <mattst88@gmail.com> Cc: Meelis Roos <mroos@linux.ee> Cc: Michael Schmitz <schmitzmic@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Kalesh Singh authored
Android needs to move large memory regions for garbage collection. The GC requires moving physical pages of multi-gigabyte heap using mremap. During this move, the application threads have to be paused for correctness. It is critical to keep this pause as short as possible to avoid jitters during user interaction. Optimize mremap for >= 1GB-sized regions by moving at the PUD/PGD level if the source and destination addresses are PUD-aligned. For CONFIG_PGTABLE_LEVELS == 3, moving at the PUD level in effect moves PGD entries, since the PUD entry is “folded back” onto the PGD entry. Add HAVE_MOVE_PUD so that architectures where moving at the PUD level isn't supported/tested can turn this off by not selecting the config. Link: https://lkml.kernel.org/r/20201014005320.2233162-4-kaleshsingh@google.com Signed-off-by:
Kalesh Singh <kaleshsingh@google.com> Acked-by:
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by:
kernel test robot <lkp@intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Geffon <bgeffon@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Gavin Shan <gshan@redhat.com> Cc: Hassan Naveed <hnaveed@wavecomp.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kees Cook <keescook@chromium.org> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Mina Almasry <almasrymina@google.com> Cc: Minchan Kim <minchan@google.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Sandipan Das <sandipan@linux.ibm.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Colin Ian King authored
There are a few spelling mistakes in the Kconfig comments and help text. Fix these. Link: https://lkml.kernel.org/r/20201207155004.171962-1-colin.king@canonical.com Signed-off-by:
Colin Ian King <colin.king@canonical.com> Acked-by:
Randy Dunlap <rdunlap@infradead.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 14 Dec, 2020 1 commit
-
-
Steven Rostedt (VMware) authored
It was believed that metag was the only architecture that required the ring buffer to keep 8 byte words aligned on 8 byte architectures, and with its removal, it was assumed that the ring buffer code did not need to handle this case. It appears that sparc64 also requires this. The following was reported on a sparc64 boot up: kernel: futex hash table entries: 65536 (order: 9, 4194304 bytes, linear) kernel: Running postponed tracer tests: kernel: Testing tracer function: kernel: Kernel unaligned access at TPC[552a20] trace_function+0x40/0x140 kernel: Kernel unaligned access at TPC[552a24] trace_function+0x44/0x140 kernel: Kernel unaligned access at TPC[552a20] trace_function+0x40/0x140 kernel: Kernel unaligned access at TPC[552a24] trace_function+0x44/0x140 kernel: Kernel unaligned access at TPC[552a20] trace_function+0x40/0x140 kernel: PASSED Need to put back the 64BIT aligned code for the ring buffer. Link: https://lore.kernel.org/r/CADxRZqzXQRYgKc=y-KV=S_yHL+Y8Ay2mh5ezeZUnpRvg+syWKw@mail.gmail.com Cc: stable@vger.kernel.org Fixes: 86b3de60 ("ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS") Reported-by:
Anatoly Pugachev <matorola@gmail.com> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
-
- 02 Dec, 2020 1 commit
-
-
Frederic Weisbecker authored
s390 has its own version of IRQ entry accounting because it doesn't account the idle time the same way the other architectures do. Only the actual idle sleep time is accounted as idle time, the rest of the idle task execution is accounted as system time. Make the generic IRQ entry accounting aware of architectures that have their own way of accounting idle time and convert s390 to use it. This prepares s390 to get involved in further consolidations of IRQ time accounting. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201202115732.27827-3-frederic@kernel.org
-
- 01 Dec, 2020 1 commit
-
-
Nathan Chancellor authored
Currently, '--orphan-handling=warn' is spread out across four different architectures in their respective Makefiles, which makes it a little unruly to deal with in case it needs to be disabled for a specific linker version (in this case, ld.lld 10.0.1). To make it easier to control this, hoist this warning into Kconfig and the main Makefile so that disabling it is simpler, as the warning will only be enabled in a couple places (main Makefile and a couple of compressed boot folders that blow away LDFLAGS_vmlinx) and making it conditional is easier due to Kconfig syntax. One small additional benefit of this is saving a call to ld-option on incremental builds because we will have already evaluated it for CONFIG_LD_ORPHAN_WARN. To keep the list of supported architectures the same, introduce CONFIG_ARCH_WANT_LD_ORPHAN_WARN, which an architecture can select to gain this automatically after all of the sections are specified and size asserted. A special thanks to Kees Cook for the help text on this config. Link: https://github.com/ClangBuiltLinux/linux/issues/1187 Acked-by:
Kees Cook <keescook@chromium.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Reviewed-by:
Nick Desaulniers <ndesaulniers@google.com> Tested-by:
Nick Desaulniers <ndesaulniers@google.com> Signed-off-by:
Nathan Chancellor <natechancellor@gmail.com> Signed-off-by:
Masahiro Yamada <masahiroy@kernel.org>
-
- 20 Nov, 2020 1 commit
-
-
YiFei Zhu authored
Currently the kernel does not provide an infrastructure to translate architecture numbers to a human-readable name. Translating syscall numbers to syscall names is possible through FTRACE_SYSCALL infrastructure but it does not provide support for compat syscalls. This will create a file for each PID as /proc/pid/seccomp_cache. The file will be empty when no seccomp filters are loaded, or be in the format of: <arch name> <decimal syscall number> <ALLOW | FILTER> where ALLOW means the cache is guaranteed to allow the syscall, and filter means the cache will pass the syscall to the BPF filter. For the docker default profile on x86_64 it looks like: x86_64 0 ALLOW x86_64 1 ALLOW x86_64 2 ALLOW x86_64 3 ALLOW [...] x86_64 132 ALLOW x86_64 133 ALLOW x86_64 134 FILTER x86_64 135 FILTER x86_64 136 FILTER x86_64 137 ALLOW x86_64 138 ALLOW x86_64 139 FILTER x86_64 140 ALLOW x86_64 141 ALLOW [...] This file is guarded by CONFIG_SECCOMP_CACHE_DEBUG with a default of N because I think certain users of seccomp might not want the application to know which syscalls are definitely usable. For the same reason, it is also guarded by CAP_SYS_ADMIN. Suggested-by:
Jann Horn <jannh@google.com> Link: https://lore.kernel.org/lkml/CAG48ez3Ofqp4crXGksLmZY6=fGrF_tWyUCg7PBkAetvbbOPeOA@mail.gmail.com/ Signed-off-by:
YiFei Zhu <yifeifz2@illinois.edu> Signed-off-by:
Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/94e663fa53136f5a11f432c661794d1ee7060779.1605101222.git.yifeifz2@illinois.edu
-
- 19 Nov, 2020 1 commit
-
-
Frederic Weisbecker authored
Historically, context tracking had to deal with fragile entry code path, ie: before user_exit() is called and after user_enter() is called, in case some of those spots would call schedule() or use RCU. On such cases, the site had to be protected between exception_enter() and exception_exit() that save the context tracking state in the task stack. Such sleepable fragile code path had many different origins: tracing, exceptions, early or late calls to context tracking on syscalls... Aside of that not being pretty, saving the context tracking state on the task stack forces us to run context tracking on all CPUs, including housekeepers, and prevents us to completely shutdown nohz_full at runtime on a CPU in the future as context tracking and its overhead would still need to run system wide. Now thanks to the extensive efforts to sanitize x86 entry code, those conditions have been removed and we can now get rid of these workarounds in this architecture. Create a Kconfig feature to express this achievement. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201117151637.259084-2-frederic@kernel.org
-
- 08 Oct, 2020 1 commit
-
-
YiFei Zhu authored
In order to make adding configurable features into seccomp easier, it's better to have the options at one single location, considering especially that the bulk of seccomp code is arch-independent. An quick look also show that many SECCOMP descriptions are outdated; they talk about /proc rather than prctl. As a result of moving the config option and keeping it default on, architectures arm, arm64, csky, riscv, sh, and xtensa did not have SECCOMP on by default prior to this and SECCOMP will be default in this change. Architectures microblaze, mips, powerpc, s390, sh, and sparc have an outdated depend on PROC_FS and this dependency is removed in this change. Suggested-by:
Jann Horn <jannh@google.com> Link: https://lore.kernel.org/lkml/CAG48ez1YWz9cnp08UZgeieYRhHdqh-ch7aNwc4JRBnGyrmgfMg@mail.gmail.com/ Signed-off-by:
YiFei Zhu <yifeifz2@illinois.edu> [kees: added HAVE_ARCH_SECCOMP help text, tweaked wording] Signed-off-by:
Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/9ede6ef35c847e58d61e476c6a39540520066613.1600951211.git.yifeifz2@illinois.edu
-
- 16 Sep, 2020 1 commit
-
-
Nicholas Piggin authored
Reading and modifying current->mm and current->active_mm and switching mm should be done with irqs off, to prevent races seeing an intermediate state. This is similar to commit 38cf307c ("mm: fix kthread_use_mm() vs TLB invalidate"). At exec-time when the new mm is activated, the old one should usually be single-threaded and no longer used, unless something else is holding an mm_users reference (which may be possible). Absent other mm_users, there is also a race with preemption and lazy tlb switching. Consider the kernel_execve case where the current thread is using a lazy tlb active mm: call_usermodehelper() kernel_execve() old_mm = current->mm; active_mm = current->active_mm; *** preempt *** --------------------> schedule() prev->active_mm = NULL; mmdrop(prev active_mm); ... <-------------------- schedule() current->mm = mm; current->active_mm = mm; if (!old_mm) mmdrop(active_mm); If we switch back to the kernel thread from a different mm, there is a double free of the old active_mm, and a missing free of the new one. Closing this race only requires interrupts to be disabled while ->mm and ->active_mm are being switched, but the TLB problem requires also holding interrupts off over activate_mm. Unfortunately not all archs can do that yet, e.g., arm defers the switch if irqs are disabled and expects finish_arch_post_lock_switch() to be called to complete the flush; um takes a blocking lock in activate_mm(). So as a first step, disable interrupts across the mm/active_mm updates to close the lazy tlb preempt race, and provide an arch option to extend that to activate_mm which allows architectures doing IPI based TLB shootdowns to close the second race. This is a bit ugly, but in the interest of fixing the bug and backporting before all architectures are converted this is a compromise. Signed-off-by:
Nicholas Piggin <npiggin@gmail.com> Acked-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200914045219.3736466-2-npiggin@gmail.com
-
- 09 Sep, 2020 1 commit
-
-
Christoph Hellwig authored
Add a CONFIG_SET_FS option that is selected by architecturess that implement set_fs, which is all of them initially. If the option is not set stubs for routines related to overriding the address space are provided so that architectures can start to opt out of providing set_fs. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 01 Sep, 2020 3 commits
-
-
Peter Zijlstra authored
Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200818135804.922581202@infradead.org
-
Josh Poimboeuf authored
Add infrastructure for an arch-specific CONFIG_HAVE_STATIC_CALL_INLINE option, which is a faster version of CONFIG_HAVE_STATIC_CALL. At runtime, the static call sites are patched directly, rather than using the out-of-line trampolines. Compared to out-of-line static calls, the performance benefits are more modest, but still measurable. Steven Rostedt did some tracepoint measurements: https://lkml.kernel.org/r/20181126155405.72b4f718@gandalf.local.home This code is heavily inspired by the jump label code (aka "static jumps"), as some of the concepts are very similar. For more details, see the comments in include/linux/static_call.h. [peterz: simplified interface; merged trampolines] Signed-off-by:
Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Reviewed-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20200818135804.684334440@infradead.org
-
Josh Poimboeuf authored
Static calls are a replacement for global function pointers. They use code patching to allow direct calls to be used instead of indirect calls. They give the flexibility of function pointers, but with improved performance. This is especially important for cases where retpolines would otherwise be used, as retpolines can significantly impact performance. The concept and code are an extension of previous work done by Ard Biesheuvel and Steven Rostedt: https://lkml.kernel.org/r/20181005081333.15018-1-ard.biesheuvel@linaro.org https://lkml.kernel.org/r/20181006015110.653946300@goodmis.org There are two implementations, depending on arch support: 1) out-of-line: patched trampolines (CONFIG_HAVE_STATIC_CALL) 2) basic function pointers For more details, see the comments in include/linux/static_call.h. [peterz: simplified interface] Signed-off-by:
Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Reviewed-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20200818135804.623259796@infradead.org
-
- 06 Aug, 2020 1 commit
-
-
Sven Schnelle authored
The initial assumption that all VDSO related data can be completely generic does not hold. S390 needs architecture specific storage to access the clock steering information. Add struct arch_vdso_data to the vdso data struct. For architectures which do not need extra data this defaults to an empty struct. Architectures which require it, enable CONFIG_ARCH_HAS_VDSO_DATA and provide their specific struct in asm/vdso/data.h. Signed-off-by:
Sven Schnelle <svens@linux.ibm.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200804150124.41692-2-svens@linux.ibm.com
-
- 24 Jul, 2020 1 commit
-
-
Thomas Gleixner authored
On syscall entry certain work needs to be done: - Establish state (lockdep, context tracking, tracing) - Conditional work (ptrace, seccomp, audit...) This code is needlessly duplicated and different in all architectures. Provide a generic version based on the x86 implementation which has all the RCU and instrumentation bits right. As interrupt/exception entry from user space needs parts of the same functionality, provide a function for this as well. syscall_enter_from_user_mode() and irqentry_enter_from_user_mode() must be called right after the low level ASM entry. The calling code must be non-instrumentable. After the functions returns state is correct and the subsequent functions can be instrumented. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Acked-by:
Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200722220519.513463269@linutronix.de
-
- 07 Jul, 2020 1 commit
-
-
Masahiro Yamada authored
Some Makefiles already pass -fno-stack-protector unconditionally. For example, arch/arm64/kernel/vdso/Makefile, arch/x86/xen/Makefile. No problem report so far about hard-coding this option. So, we can assume all supported compilers know -fno-stack-protector. GCC 4.8 and Clang support this option (https://godbolt.org/z/_HDGzN ) Get rid of cc-option from -fno-stack-protector. Remove CONFIG_CC_HAS_STACKPROTECTOR_NONE, which is always 'y'. Note: arch/mips/vdso/Makefile adds -fno-stack-protector twice, first unconditionally, and second conditionally. I removed the second one. Signed-off-by:
Masahiro Yamada <masahiroy@kernel.org> Reviewed-by:
Kees Cook <keescook@chromium.org> Acked-by:
Ard Biesheuvel <ardb@kernel.org> Reviewed-by:
Nick Desaulniers <ndesaulniers@google.com>
-
- 04 Jul, 2020 1 commit
-
-
Christian Brauner authored
All architectures support copy_thread_tls() now, so remove the legacy copy_thread() function and the HAVE_COPY_THREAD_TLS config option. Everyone uses the same process creation calling convention based on copy_thread_tls() and struct kernel_clone_args. This will make it easier to maintain the core process creation code under kernel/, simplifies the callpaths and makes the identical for all architectures. Cc: linux-arch@vger.kernel.org Acked-by:
Thomas Bogendoerfer <tsbogend@alpha.franken.de> Acked-by:
Greentime Hu <green.hu@gmail.com> Acked-by:
Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Christian Brauner <christian.brauner@ubuntu.com>
-
- 26 Jun, 2020 1 commit
-
-
Mauro Carvalho Chehab authored
There are a number of random documents that seem to be describing some aspects of the core-api. Move them to such directory, adding them at the core-api/index.rst file. Signed-off-by:
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/86d979ed183adb76af93a92f20189bccf97f0055.1592918949.git.mchehab+huawei@kernel.org Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
- 13 Jun, 2020 1 commit
-
-
Masahiro Yamada authored
Since commit 84af7a61 ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances. This commit finishes the conversion. While I touched the lines, I also fixed the indentation. There are a variety of indentation styles found. a) 4 spaces + '---help---' b) 7 spaces + '---help---' c) 8 spaces + '---help---' d) 1 space + 1 tab + '---help---' e) 1 tab + '---help---' (correct indentation) f) 1 tab + 1 space + '---help---' g) 1 tab + 2 spaces + '---help---' In order to convert all of them to 1 tab + 'help', I ran the following commend: $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/' Signed-off-by:
Masahiro Yamada <masahiroy@kernel.org>
-
- 18 May, 2020 1 commit
-
-
Will Deacon authored
asm/scs.h is no longer needed by the core code, so remove a redundant header inclusion and update the stale Kconfig text. Tested-by:
Sami Tolvanen <samitolvanen@google.com> Reviewed-by:
Mark Rutland <mark.rutland@arm.com> Signed-off-by:
Will Deacon <will@kernel.org>
-
- 15 May, 2020 2 commits
-
-
Sami Tolvanen authored
The graph tracer hooks returns by modifying frame records on the (regular) stack, but with SCS the return address is taken from the shadow stack, and the value in the frame record has no effect. As we don't currently have a mechanism to determine the corresponding slot on the shadow stack (and to pass this through the ftrace infrastructure), for now let's disable SCS when the graph tracer is enabled. With SCS the return address is taken from the shadow stack and the value in the frame record has no effect. The mcount based graph tracer hooks returns by modifying frame records on the (regular) stack, and thus is not compatible. The patchable-function-entry graph tracer used for DYNAMIC_FTRACE_WITH_REGS modifies the LR before it is saved to the shadow stack, and is compatible. Modifying the mcount based graph tracer to work with SCS would require a mechanism to determine the corresponding slot on the shadow stack (and to pass this through the ftrace infrastructure), and we expect that everyone will eventually move to the patchable-function-entry based graph tracer anyway, so for now let's disable SCS when the mcount-based graph tracer is enabled. SCS and patchable-function-entry are both supported from LLVM 10.x. Signed-off-by:
Sami Tolvanen <samitolvanen@google.com> Reviewed-by:
Kees Cook <keescook@chromium.org> Reviewed-by:
Mark Rutland <mark.rutland@arm.com> Signed-off-by:
Will Deacon <will@kernel.org>
-
Sami Tolvanen authored
This change adds generic support for Clang's Shadow Call Stack, which uses a shadow stack to protect return addresses from being overwritten by an attacker. Details are available here: https://clang.llvm.org/docs/ShadowCallStack.html Note that security guarantees in the kernel differ from the ones documented for user space. The kernel must store addresses of shadow stacks in memory, which means an attacker capable reading and writing arbitrary memory may be able to locate them and hijack control flow by modifying the stacks. Signed-off-by:
Sami Tolvanen <samitolvanen@google.com> Reviewed-by:
Kees Cook <keescook@chromium.org> Reviewed-by:
Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> [will: Numerous cosmetic changes] Signed-off-by:
Will Deacon <will@kernel.org>
-
- 13 May, 2020 1 commit
-
-
Stephen Boyd authored
The implementation of 'struct clk' is not really an architectual detail anymore now that most architectures have migrated to the common clk framework. To sway new architecture ports away from trying to implement their own 'struct clk', move the config next to the common clk framework config. Cc: Russell King <linux@armlinux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Stephen Boyd <sboyd@kernel.org> Link: https://lkml.kernel.org/r/20200409064416.83340-11-sboyd@kernel.org Reviewed-by:
Arnd Bergmann <arnd@arndb.de>
-
- 16 Mar, 2020 3 commits
-
-
Christoph Hellwig authored
This allows the arch code to reset the page tables to cached access when freeing a dma coherent allocation that was set to uncached using arch_dma_set_uncached. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Robin Murphy <robin.murphy@arm.com>
-
Christoph Hellwig authored
Rename the symbol to arch_dma_set_uncached, and pass a size to it as well as allow an error return. That will allow reusing this hook for in-place pagetable remapping. As the in-place remap doesn't always require an explicit cache flush, also detangle ARCH_HAS_DMA_PREP_COHERENT from ARCH_HAS_DMA_SET_UNCACHED. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Robin Murphy <robin.murphy@arm.com>
-
Christoph Hellwig authored
dma-direct now finds the kernel address for coherent allocations based on the dma address, so the cached_kernel_address hooks is unused and can be removed entirely. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Robin Murphy <robin.murphy@arm.com>
-
- 06 Mar, 2020 1 commit
-
-
Miroslav Benes authored
save_stack_trace_tsk_reliable() is not the only function providing the reliable stack traces anymore. Architecture might define ARCH_STACKWALK which provides a newer stack walking interface and has arch_stack_walk_reliable() function. Update the description accordingly. Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Miroslav Benes <mbenes@suse.cz> Acked-by:
Josh Poimboeuf <jpoimboe@redhat.com> Link: http://lkml.kernel.org/r/20200120154042.9934-1-mbenes@suse.cz Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 14 Feb, 2020 1 commit
-
-
Frederic Weisbecker authored
A few archs (x86, arm, arm64) don't rely anymore on TIF_NOHZ to call into context tracking on user entry/exit but instead use static keys (or not) to optimize those calls. Ideally every arch should migrate to that behaviour in the long run. Settle a config option to let those archs remove their TIF_NOHZ definitions. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paulburton@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: David S. Miller <davem@davemloft.net>
-
- 04 Feb, 2020 1 commit
-
-
Peter Zijlstra authored
As described in the comment, the correct order for freeing pages is: 1) unhook page 2) TLB invalidate page 3) free page This order equally applies to page directories. Currently there are two correct options: - use tlb_remove_page(), when all page directores are full pages and there are no futher contraints placed by things like software walkers (HAVE_FAST_GUP). - use MMU_GATHER_RCU_TABLE_FREE and tlb_remove_table() when the architecture does not do IPI based TLB invalidate and has HAVE_FAST_GUP (or software TLB fill). This however leaves architectures that don't have page based directories but don't need RCU in a bind. For those, provide MMU_GATHER_TABLE_FREE, which provides the independent batching for directories without the additional RCU freeing. Link: http://lkml.kernel.org/r/20200116064531.483522-10-aneesh.kumar@linux.ibm.com Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-