03 Oct, 2022 · 40 commits
    • mm: kmsan: maintain KMSAN metadata for page operations · b073d7f8
      Alexander Potapenko authored
      Insert KMSAN hooks that make the necessary bookkeeping changes:
       - poison page shadow and origins in alloc_pages()/free_page();
       - clear page shadow and origins in clear_page(), copy_user_highpage();
       - copy page metadata in copy_highpage(), wp_page_copy();
       - handle vmap()/vunmap()/iounmap();
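
      As an illustration of that bookkeeping, here is a minimal user-space
      sketch, not KMSAN code. A page is modelled as a data array with a
      parallel shadow array (0xff = uninitialized byte, 0x00 = initialized),
      and the hypothetical helpers toy_poison_page(), toy_clear_page() and
      toy_copy_page() stand in for the hooks in alloc_pages()/free_page(),
      clear_page() and copy_highpage(). Origins are left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model: one 4 KiB "page" plus a parallel shadow page.
 * Shadow 0xff marks a data byte as uninitialized, 0x00 as initialized. */
#define TOY_PAGE_SIZE 4096

struct toy_page {
	unsigned char data[TOY_PAGE_SIZE];
	unsigned char shadow[TOY_PAGE_SIZE];
};

/* Like the alloc/free hooks: the page contents are now garbage. */
static void toy_poison_page(struct toy_page *p)
{
	memset(p->shadow, 0xff, TOY_PAGE_SIZE);
}

/* Like the clear_page() hook: zeroed data is fully initialized. */
static void toy_clear_page(struct toy_page *p)
{
	memset(p->data, 0, TOY_PAGE_SIZE);
	memset(p->shadow, 0x00, TOY_PAGE_SIZE);
}

/* Like the copy_highpage() hook: metadata travels with the data. */
static void toy_copy_page(struct toy_page *dst, const struct toy_page *src)
{
	memcpy(dst->data, src->data, TOY_PAGE_SIZE);
	memcpy(dst->shadow, src->shadow, TOY_PAGE_SIZE);
}

/* Returns 1 if every byte in [off, off + len) is initialized. */
static int toy_range_initialized(const struct toy_page *p, size_t off, size_t len)
{
	for (size_t i = off; i < off + len; i++)
		if (p->shadow[i] != 0)
			return 0;
	return 1;
}
```

      Forgetting the copy hook would leave the destination's stale shadow in
      place, which is exactly the kind of false report (or missed bug) the
      hooks are meant to prevent.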
      
      Link: https://lkml.kernel.org/r/20220915150417.722975-15-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • MAINTAINERS: add entry for KMSAN · d596b04f
      Alexander Potapenko authored
      Add entry for KMSAN maintainers/reviewers.
      
      Link: https://lkml.kernel.org/r/20220915150417.722975-14-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kmsan: disable instrumentation of unsupported common kernel code · 79dbd006
      Alexander Potapenko authored
      The EFI stub cannot be linked with the KMSAN runtime, so we disable
      instrumentation for it.
      
      Instrumenting kcov, stackdepot or lockdep leads to infinite recursion
      caused by instrumentation hooks calling instrumented code again.
      
      Link: https://lkml.kernel.org/r/20220915150417.722975-13-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kmsan: add KMSAN runtime core · f80be457
      Alexander Potapenko authored
      For each memory location KernelMemorySanitizer maintains two types of
      metadata:
      
      1. The so-called shadow of that location - a byte:byte mapping describing
         whether individual bits of memory are initialized (shadow bit is 0)
         or uninitialized (shadow bit is 1).
      2. The origins of that location - a 4-byte:4-byte mapping containing
         4-byte IDs of the stack traces where uninitialized values were
         created.
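
      A toy model of these two mappings, assuming a 64-byte region and
      hypothetical helper names (shadow_for(), origin_for(), toy_poison(),
      toy_write_bits()); the real runtime locates metadata through struct
      page rather than through fixed arrays:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy metadata for a 64-byte region: one shadow byte per data byte
 * (a set shadow bit means the matching data bit is uninitialized) and
 * one 4-byte origin ID per aligned 4-byte granule of data. */
#define REGION_SIZE 64
#define ORIGIN_GRANULE 4

static uint8_t shadow[REGION_SIZE];
static uint32_t origins[REGION_SIZE / ORIGIN_GRANULE];

/* Byte:byte mapping: data offset off has its shadow at offset off. */
static uint8_t *shadow_for(size_t off)
{
	return &shadow[off];
}

/* 4-byte:4-byte mapping: all bytes of a granule share one origin slot. */
static uint32_t *origin_for(size_t off)
{
	return &origins[off / ORIGIN_GRANULE];
}

/* Mark [off, off + len) uninitialized and record which stack did it. */
static void toy_poison(size_t off, size_t len, uint32_t stack_id)
{
	for (size_t i = off; i < off + len; i++) {
		*shadow_for(i) = 0xff;
		*origin_for(i) = stack_id;
	}
}

/* Writing only the bits in mask initializes just those bits. */
static void toy_write_bits(size_t off, uint8_t mask)
{
	*shadow_for(off) &= (uint8_t)~mask;
}
```

      Because origins are tracked per 4-byte granule, a partially written
      granule keeps a single origin ID even though its shadow records
      initialization bit by bit.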
      
      Each struct page now contains pointers to two struct pages holding KMSAN
      metadata (shadow and origins) for the original struct page.  Utility
      routines in mm/kmsan/core.c and mm/kmsan/shadow.c handle the metadata
      creation, addressing, copying and checking.  mm/kmsan/report.c performs
      error reporting in cases where an uninitialized value is used in a way
      that leads to undefined behavior.
      
      KMSAN compiler instrumentation is responsible for tracking the metadata
      along with the kernel memory.  mm/kmsan/instrumentation.c provides the
      implementation for instrumentation hooks that are called from files
      compiled with -fsanitize=kernel-memory.
      
      To aid parameter passing (also done at instrumentation level), each
      task_struct now contains a struct kmsan_task_state used to track the
      metadata of function parameters and return values for that task.
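
      The idea of passing argument metadata out of band can be sketched as
      follows. struct toy_task_state, its fields, and the OR rule for the
      shadow of a sum are simplifying assumptions for illustration, not the
      layout of struct kmsan_task_state or the exact rules the compiler emits:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for per-task parameter metadata: instrumented
 * callers store each argument's shadow in param_shadow before a call;
 * instrumented callees store their return value's shadow in
 * retval_shadow before returning. */
struct toy_task_state {
	uint64_t param_shadow[8];
	uint64_t retval_shadow;
};

static struct toy_task_state current_state; /* a single "task" */

/* An instrumented callee: approximates shadow(a + b) as the OR of the
 * operand shadows. */
static uint64_t toy_add(uint64_t a, uint64_t b)
{
	uint64_t sa = current_state.param_shadow[0];
	uint64_t sb = current_state.param_shadow[1];

	current_state.retval_shadow = sa | sb;
	return a + b;
}

/* An instrumented call site: publish argument shadows, call, then pick
 * up the return value's shadow. */
static uint64_t toy_call_add(uint64_t a, uint64_t sa, uint64_t b, uint64_t sb,
			     uint64_t *ret_shadow)
{
	uint64_t r;

	current_state.param_shadow[0] = sa;
	current_state.param_shadow[1] = sb;
	r = toy_add(a, b);
	*ret_shadow = current_state.retval_shadow;
	return r;
}
```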
      
      Finally, this patch provides CONFIG_KMSAN that enables KMSAN, and declares
      CFLAGS_KMSAN, which are applied to files compiled with KMSAN.  The
      KMSAN_SANITIZE:=n Makefile directive can be used to completely disable
      KMSAN instrumentation for certain files.
      
      Similarly, KMSAN_ENABLE_CHECKS:=n disables KMSAN checks and makes newly
      created stack memory initialized.
      
      Users can also use functions from include/linux/kmsan-checks.h to mark
      certain memory regions as uninitialized or initialized (this is called
      "poisoning" and "unpoisoning") or check that a particular region is
      initialized.
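
      The poison/unpoison/check trio can be pictured with this
      self-contained user-space sketch over a single buffer. The toy_* names
      are hypothetical; the kernel's own entry points in
      include/linux/kmsan-checks.h (kmsan_poison_memory(),
      kmsan_unpoison_memory() and kmsan_check_memory(), whose exact
      signatures should be checked in the header) print reports rather than
      returning offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy shadow for a single 32-byte buffer; nonzero = uninitialized. */
#define BUF_SIZE 32
static unsigned char buf[BUF_SIZE];
static unsigned char buf_shadow[BUF_SIZE];

static size_t shadow_index(const void *addr)
{
	return (size_t)((const unsigned char *)addr - buf);
}

/* "Poisoning": mark a range as uninitialized. */
static void toy_poison_memory(const void *addr, size_t size)
{
	memset(&buf_shadow[shadow_index(addr)], 0xff, size);
}

/* "Unpoisoning": mark a range as initialized, e.g. after a device wrote it. */
static void toy_unpoison_memory(const void *addr, size_t size)
{
	memset(&buf_shadow[shadow_index(addr)], 0x00, size);
}

/* Check a range; return the offset within buf of the first uninitialized
 * byte, or -1 if the whole range is initialized. */
static long toy_check_memory(const void *addr, size_t size)
{
	size_t start = shadow_index(addr);

	for (size_t i = 0; i < size; i++)
		if (buf_shadow[start + i])
			return (long)(start + i);
	return -1;
}
```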
      
      Link: https://lkml.kernel.org/r/20220915150417.722975-12-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Acked-by: Marco Elver <elver@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE · 6e9f05dc
      Alexander Potapenko authored
      KMSAN adds extra metadata fields to struct page, so it does not fit into
      64 bytes anymore.
      
      This change leads to increased memory consumption of the nvdimm driver,
      regardless of whether the kernel is built with KMSAN or not.
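
      A back-of-the-envelope sketch of both effects. It assumes a 64-byte
      struct page on a 64-bit build, two extra metadata pointers under KMSAN,
      and a per-page reservation raised from 64 to 128 bytes; the toy_* names
      and both constants are assumptions for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 64-byte struct page. */
struct toy_page_base {
	unsigned char fields[64];
};

/* With KMSAN: two extra pointers to the shadow and origin pages. */
struct toy_page_kmsan {
	struct toy_page_base base;
	struct toy_page_kmsan *kmsan_shadow;
	struct toy_page_kmsan *kmsan_origin;
};

/* Assumed old and new per-page reservations. */
#define TOY_OLD_MAX_STRUCT_PAGE_SIZE 64u
#define TOY_NEW_MAX_STRUCT_PAGE_SIZE 128u

/* Extra bytes reserved for a device of dev_bytes split into page_size
 * pages; paid whether or not KMSAN is enabled. */
static uint64_t toy_extra_reservation(uint64_t dev_bytes, uint64_t page_size)
{
	uint64_t pages = dev_bytes / page_size;

	return pages * (TOY_NEW_MAX_STRUCT_PAGE_SIZE - TOY_OLD_MAX_STRUCT_PAGE_SIZE);
}
```

      Under those assumptions each 4 KiB page reserves 64 extra bytes, i.e.
      1/64 of the device capacity, or 16 MiB per GiB.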
      
      Link: https://lkml.kernel.org/r/20220915150417.722975-11-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • x86: kmsan: pgtable: reduce vmalloc space · 1a167ddd
      Alexander Potapenko authored
      KMSAN is going to use 3/4 of the existing vmalloc space to hold the
      metadata, so we lower VMALLOC_END to make sure vmalloc() doesn't
      allocate past the first 1/4.
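
      The address arithmetic can be sketched as below, assuming x86-64
      4-level paging (vmalloc area starting at 0xffffc90000000000, 32 TiB
      long) and a layout where vmalloc shadow sits one quarter above the data
      and origins two quarters above, leaving the last quarter for other
      metadata. The TOY_* macros are hypothetical stand-ins, not the kernel's
      definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 4-level paging values. */
#define TOY_VMALLOC_START   0xffffc90000000000ull
#define TOY_VMALLOC_SIZE    (32ull << 40)           /* 32 TiB */
#define TOY_QUARTER         (TOY_VMALLOC_SIZE >> 2) /* 8 TiB */

/* vmalloc() may only hand out addresses from the first quarter. */
#define TOY_VMALLOC_END     (TOY_VMALLOC_START + TOY_QUARTER - 1)

/* Metadata for a vmalloc address is found at a fixed offset from it. */
#define TOY_SHADOW_OFFSET   TOY_QUARTER
#define TOY_ORIGIN_OFFSET   (TOY_QUARTER << 1)

static uint64_t toy_vmalloc_shadow(uint64_t addr)
{
	return addr + TOY_SHADOW_OFFSET;
}

static uint64_t toy_vmalloc_origin(uint64_t addr)
{
	return addr + TOY_ORIGIN_OFFSET;
}
```

      Fixed offsets keep the metadata lookup for vmalloc memory a single
      addition, at the price of shrinking the usable vmalloc area to a
      quarter.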
      
      Link: https://lkml.kernel.org/r/20220915150417.722975-10-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kmsan: mark noinstr as __no_sanitize_memory · 5de0ce85
      Alexander Potapenko authored
      noinstr functions should never be instrumented, so make KMSAN skip them by
      applying the __no_sanitize_memory attribute.
      
      Link: https://lkml.kernel.org/r/20220915150417.722975-9-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kmsan: introduce __no_sanitize_memory and __no_kmsan_checks · 9b448bc2
      Alexander Potapenko authored
      __no_sanitize_memory is a function attribute that instructs KMSAN to skip
      a function during instrumentation.  This is needed, e.g., to implement
      noinstr functions.
      
      __no_kmsan_checks is a function attribute that makes KMSAN ignore the
      uninitialized values coming from the function's inputs, and initialize the
      function's outputs.
      
      Functions marked with this attribute can't be inlined into functions not
      marked with it, and vice versa.  This behavior is overridden by
      __always_inline.
      
      __SANITIZE_MEMORY__ is a macro that's defined iff the file is instrumented
      with KMSAN.  This is not the same as CONFIG_KMSAN, which is defined for
      every file.
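
      A sketch of how such macros could be wired up so they vanish when a
      file is not instrumented. The attribute spellings
      (disable_sanitizer_instrumentation, no_sanitize("kernel-memory")) and
      the __has_feature(memory_sanitizer) test are assumptions about Clang;
      the authoritative definitions live in the kernel's compiler headers.
      early_helper() and trusted_reader() are hypothetical examples:

```c
#include <assert.h>

/* Compilers without __has_feature(): treat every feature as absent. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

#if __has_feature(memory_sanitizer)
/* This translation unit is instrumented with KMSAN. */
#define __SANITIZE_MEMORY__
#define __no_sanitize_memory __attribute__((disable_sanitizer_instrumentation))
#define __no_kmsan_checks __attribute__((no_sanitize("kernel-memory")))
#else
/* Not instrumented: the attributes expand to nothing. */
#define __no_sanitize_memory
#define __no_kmsan_checks
#endif

/* Left uninstrumented, as noinstr code must be. */
static __no_sanitize_memory int early_helper(int x)
{
	return x + 1;
}

/* Inputs are not checked; outputs are treated as initialized. */
static __no_kmsan_checks int trusted_reader(const int *p)
{
	return *p;
}

/* Per-file answer, unlike a global config option. */
static int file_is_kmsan_instrumented(void)
{
#ifdef __SANITIZE_MEMORY__
	return 1;
#else
	return 0;
#endif
}
```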
      
      Link: https://lkml.kernel.org/r/20220915150417.722975-8-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kmsan: add ReST documentation · 93858ae7
      Alexander Potapenko authored
      Add Documentation/dev-tools/kmsan.rst and reference it in the dev-tools
      index.
      
      Link: https://lkml.kernel.org/r/20220915150417.722975-7-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • asm-generic: instrument usercopy in cacheflush.h · 2b420aaf
      Alexander Potapenko authored
      Notify memory tools about usercopy events in copy_to_user_page() and
      copy_from_user_page().
      
      Link: https://lkml.kernel.org/r/20220915150417.722975-6-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • x86: asm: instrument usercopy in get_user() and put_user() · 888f84a6
      Alexander Potapenko authored
      Use hooks from instrumented.h to notify bug detection tools about usercopy
      events in variations of get_user() and put_user().
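
      The two directions can be modelled with a value/shadow pair. In this
      hypothetical sketch (not the instrumented.h API), toy_put_user()
      reports when uninitialized bits would reach userspace, and
      toy_get_user() treats data read from userspace as initialized:

```c
#include <assert.h>
#include <stdint.h>

/* A value paired with its shadow (set bits = uninitialized bits). */
struct toy_val {
	uint32_t v;
	uint32_t shadow;
};

static int toy_reports; /* count of would-be infoleak reports */

/* put_user() direction: kernel data is about to reach userspace, so
 * any uninitialized bits are an infoleak and get reported. */
static void toy_put_user(struct toy_val x, uint32_t *uptr)
{
	if (x.shadow)
		toy_reports++;
	*uptr = x.v;
}

/* get_user() direction: whatever userspace supplied counts as
 * initialized once it lands in a kernel variable. */
static struct toy_val toy_get_user(const uint32_t *uptr)
{
	struct toy_val x = { .v = *uptr, .shadow = 0 };

	return x;
}
```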
      
      Link: https://lkml.kernel.org/r/20220915150417.722975-5-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • instrumented.h: allow instrumenting both sides of copy_from_user() · 33b75c1d
      Alexander Potapenko authored
      Introduce instrument_copy_from_user_before() and
      instrument_copy_from_user_after() hooks to be invoked before and after the
      call to copy_from_user().
      
      KASAN and KCSAN will only be using instrument_copy_from_user_before(),
      but for KMSAN we'll need to insert code after copy_from_user().
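
      The need for a hook after the copy becomes clear once copy_from_user()
      can fail partway, since it returns the number of bytes left uncopied.
      In this toy (hypothetical names, with the fault simulated by an avail
      limit), only the n - left bytes that actually arrived are unpoisoned,
      which a hook running before the copy cannot know:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy shadow for a kernel buffer: nonzero = uninitialized. */
#define KBUF_SIZE 16
static unsigned char kbuf[KBUF_SIZE];
static unsigned char kbuf_shadow[KBUF_SIZE];

/* Before the copy: a KASAN-style tool would validate the destination;
 * this toy only asserts that it fits in kbuf. */
static void toy_copy_from_user_before(void *to, size_t n)
{
	assert((unsigned char *)to + n <= kbuf + KBUF_SIZE);
}

/* After the copy: mark only the bytes that were actually copied. */
static void toy_copy_from_user_after(void *to, size_t n, size_t left)
{
	size_t off = (size_t)((unsigned char *)to - kbuf);

	memset(&kbuf_shadow[off], 0x00, n - left);
}

/* Simulated copy_from_user(): copies at most avail bytes (as if the
 * rest faulted) and returns the number of bytes NOT copied. */
static size_t toy_copy_from_user(void *to, const void *from, size_t n, size_t avail)
{
	size_t copied = n < avail ? n : avail;
	size_t left;

	toy_copy_from_user_before(to, n);
	memcpy(to, from, copied);
	left = n - copied;
	toy_copy_from_user_after(to, n, left);
	return left;
}
```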
      
      Link: https://lkml.kernel.org/r/20220915150417.722975-4-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • stackdepot: reserve 5 extra bits in depot_stack_handle_t · 83a4f1ef
      Alexander Potapenko authored
      Some users (currently only KMSAN) may want to use spare bits in
      depot_stack_handle_t.  Let them do so by adding @extra_bits to
      __stack_depot_save() to store arbitrary flags, and providing
      stack_depot_get_extra_bits() to retrieve those flags.
      
      Also adapt KASAN to the new prototype by passing extra_bits=0, as KASAN
      does not intend to store additional information in the stack handle.
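
      A sketch of packing flags into spare handle bits. The bit positions
      (flags in the top 5 bits, a 27-bit trace ID below) and the toy_* names
      are assumptions for illustration; the real depot_stack_handle_t layout
      may differ:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t toy_handle_t;

/* Assumed layout: 5 caller-owned bits on top, trace ID below. */
#define TOY_EXTRA_BITS 5
#define TOY_ID_BITS    (32 - TOY_EXTRA_BITS)
#define TOY_ID_MASK    ((1u << TOY_ID_BITS) - 1)

static toy_handle_t toy_make_handle(uint32_t id, unsigned int extra)
{
	assert(id <= TOY_ID_MASK);
	assert(extra < (1u << TOY_EXTRA_BITS));
	return ((toy_handle_t)extra << TOY_ID_BITS) | id;
}

static unsigned int toy_get_extra_bits(toy_handle_t h)
{
	return h >> TOY_ID_BITS;
}

static uint32_t toy_get_id(toy_handle_t h)
{
	return h & TOY_ID_MASK;
}
```

      A caller with nothing to store, like KASAN above, simply passes 0 and
      gets back the same handle it would have had without the extra bits.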
      
      Link: https://lkml.kernel.org/r/20220915150417.722975-3-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • x86: add missing include to sparsemem.h · e41e614f
      Dmitry Vyukov authored
      Patch series "Add KernelMemorySanitizer infrastructure", v7.
      
      KernelMemorySanitizer (KMSAN) is a detector of errors related to uses of
      uninitialized memory.  It relies on compile-time Clang instrumentation
      (similar to MSan in userspace [1]) and tracks the state of every bit
      of kernel memory, which lets it report an error if an uninitialized
      value is used in a condition, dereferenced, or escapes to userspace,
      USB or DMA.
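
      To make "tracks the state of every bit" concrete, here is a toy of
      MSan-style bit-precise propagation through a bitwise AND, with a report
      raised only when the result feeds a branch. The names are hypothetical;
      the AND rule is the classic MSan approximation, assumed here to carry
      over to the kernel:

```c
#include <assert.h>
#include <stdint.h>

struct toy_val {
	uint8_t v;
	uint8_t shadow; /* set bit = matching bit of v is uninitialized */
};

/* A result bit is initialized if both input bits are initialized, or if
 * either input has an initialized 0 in that position. */
static struct toy_val toy_and(struct toy_val a, struct toy_val b)
{
	struct toy_val r;

	r.v = a.v & b.v;
	r.shadow = (uint8_t)((a.shadow & b.shadow) | (a.v & b.shadow) |
			     (a.shadow & b.v));
	return r;
}

static int toy_reports;

/* Using a value as a branch condition is where a report is raised. */
static int toy_branch(struct toy_val cond)
{
	if (cond.shadow)
		toy_reports++;
	return cond.v != 0;
}
```

      Masking an uninitialized byte with a known 0 yields a fully
      initialized result, so branching on it stays silent; masking with 0x0f
      leaves the low four bits uninitialized, and branching on that is
      reported.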
      
      KMSAN has reported more than 300 bugs in the past few years (recently
      fixed bugs: [2]), most of them with the help of syzkaller.  Such bugs keep
      getting introduced into the kernel despite new compiler warnings and other
      analyses (the 6.0 cycle already resulted in several KMSAN-reported bugs,
      e.g. [3]).  Mitigations like total stack and heap initialization are
      unfortunately very far from being deployable.
      
      The proposed patchset contains the KMSAN runtime implementation together with
      small changes to other subsystems needed to make KMSAN work.
      
      The latter changes fall into several categories:
      
      1. Changes and refactorings of existing code required to add KMSAN:
       - [01/43] x86: add missing include to sparsemem.h
       - [02/43] stackdepot: reserve 5 extra bits in depot_stack_handle_t
       - [03/43] instrumented.h: allow instrumenting both sides of copy_from_user()
       - [04/43] x86: asm: instrument usercopy in get_user() and __put_user_size()
       - [05/43] asm-generic: instrument usercopy in cacheflush.h
       - [10/43] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
      
      2. KMSAN-related declarations in generic code, KMSAN runtime library,
         docs and configs:
       - [06/43] kmsan: add ReST documentation
       - [07/43] kmsan: introduce __no_sanitize_memory and __no_kmsan_checks
       - [09/43] x86: kmsan: pgtable: reduce vmalloc space
       - [11/43] kmsan: add KMSAN runtime core
       - [13/43] MAINTAINERS: add entry for KMSAN
       - [24/43] kmsan: add tests for KMSAN
       - [31/43] objtool: kmsan: list KMSAN API functions as uaccess-safe
       - [35/43] x86: kmsan: use __msan_ string functions where possible
       - [43/43] x86: kmsan: enable KMSAN builds for x86
      
      3. Adding hooks from different subsystems to notify KMSAN about memory
         state changes:
       - [14/43] mm: kmsan: maintain KMSAN metadata for page operations
       - [15/43] mm: kmsan: call KMSAN hooks from SLUB code
       - [16/43] kmsan: handle task creation and exiting
       - [17/43] init: kmsan: call KMSAN initialization routines
       - [18/43] instrumented.h: add KMSAN support
       - [19/43] kmsan: add iomap support
       - [20/43] Input: libps2: mark data received in __ps2_command() as initialized
       - [21/43] dma: kmsan: unpoison DMA mappings
       - [34/43] x86: kmsan: handle open-coded assembly in lib/iomem.c
       - [36/43] x86: kmsan: sync metadata pages on page fault
      
      4. Changes that prevent false reports by explicitly initializing memory,
         disabling optimized code that may trick KMSAN, selectively skipping
         instrumentation:
       - [08/43] kmsan: mark noinstr as __no_sanitize_memory
       - [12/43] kmsan: disable instrumentation of unsupported common kernel code
       - [22/43] virtio: kmsan: check/unpoison scatterlist in vring_map_one_sg()
       - [23/43] kmsan: handle memory sent to/from USB
       - [25/43] kmsan: disable strscpy() optimization under KMSAN
       - [26/43] crypto: kmsan: disable accelerated configs under KMSAN
       - [27/43] kmsan: disable physical page merging in biovec
       - [28/43] block: kmsan: skip bio block merging logic for KMSAN
       - [29/43] kcov: kmsan: unpoison area->list in kcov_remote_area_put()
       - [30/43] security: kmsan: fix interoperability with auto-initialization
       - [32/43] x86: kmsan: disable instrumentation of unsupported code
       - [33/43] x86: kmsan: skip shadow checks in __switch_to()
       - [37/43] x86: kasan: kmsan: support CONFIG_GENERIC_CSUM on x86, enable it for KASAN/KMSAN
       - [38/43] x86: fs: kmsan: disable CONFIG_DCACHE_WORD_ACCESS
       - [39/43] x86: kmsan: don't instrument stack walking functions
       - [40/43] entry: kmsan: introduce kmsan_unpoison_entry_regs()
      
      5. Fixes for bugs detected with CONFIG_KMSAN_CHECK_PARAM_RETVAL:
       - [41/43] bpf: kmsan: initialize BPF registers with zeroes
       - [42/43] mm: fs: initialize fsdata passed to write_begin/write_end interface
      
      This patchset allows one to boot and run a defconfig+KMSAN kernel in
      QEMU without known false positives.  It does not, however, guarantee
      that there are no false positives in drivers for certain devices or in
      less tested subsystems, although KMSAN is actively tested on syzbot with
      a large config.
      
      By default, KMSAN enforces conservative checks of most kernel function
      parameters passed by value (via CONFIG_KMSAN_CHECK_PARAM_RETVAL, which
      maps to the -fsanitize-memory-param-retval compiler flag).  As discussed
      in [4] and [5], passing uninitialized values as function parameters is
      considered undefined behavior; therefore, KMSAN now reports such cases as
      errors.  Several newly added patches fix known manifestations of these
      errors.
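      The fixes in category 5 follow a simple pattern: make sure a value is
      defined on every path before it is passed by value. The following is a
      hedged userspace sketch. It assumes a write_begin-style callee that only
      fills in its out-parameter on some paths. my_write_begin(),
      my_write_end() and do_write() are hypothetical stand-ins, not the VFS
      interface. Without the "= NULL", the needs_private == 0 path would pass
      an uninitialized pointer to my_write_end(). That is the kind of use
      -fsanitize-memory-param-retval reports.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a write_begin() implementation that only
 * fills in *fsdata when it has private state to hand over. */
static int my_write_begin(int needs_private, void **fsdata)
{
	static int private_state = 1;

	if (needs_private)
		*fsdata = &private_state;
	/* Otherwise *fsdata is left untouched. */
	return 0;
}

/* Hypothetical stand-in for write_end(): receives fsdata by value. */
static int my_write_end(void *fsdata)
{
	return fsdata != NULL;
}

/* Caller pattern: initialize fsdata before the call, so the value passed
 * to my_write_end() is defined on every path. */
static int do_write(int needs_private)
{
	void *fsdata = NULL;
	int err = my_write_begin(needs_private, &fsdata);

	if (err)
		return err;
	return my_write_end(fsdata);
}
```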
      
      
      This patch (of 43):
      
      Including sparsemem.h from other files (e.g.  transitively via
      asm/pgtable_64_types.h) results in compilation errors due to unknown
      types:
      
      sparsemem.h:34:32: error: unknown type name 'phys_addr_t'
      extern int phys_to_target_node(phys_addr_t start);
                                     ^
      sparsemem.h:36:39: error: unknown type name 'u64'
      extern int memory_add_physaddr_to_nid(u64 start);
                                            ^
      
      Fix these errors by including linux/types.h from sparsemem.h.  This is
      required for the upcoming KMSAN patches.
      
      Link: https://lkml.kernel.org/r/20220915150417.722975-1-glider@google.com
      Link: https://lkml.kernel.org/r/20220915150417.722975-2-glider@google.com
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hugetlb: clean up code checking for fault/truncation races · fa27759a
      Mike Kravetz authored
      With the new hugetlb vma lock in place, it can also be used to handle page
      fault races with file truncation.  The lock is taken at the beginning of
      the fault code path in read mode.  During truncation, it is taken in write
      mode for each vma which has the file mapped.  The file's size (i_size) is
      modified before taking the vma lock to unmap.
      
      How are races handled?
      
      The page fault code checks i_size early in processing after taking the vma
      lock.  If the fault is beyond i_size, the fault is aborted.  If the fault
      is not beyond i_size, the fault will continue and a new page will be added
      to the file.  It could be that truncation code modifies i_size after the
      check in fault code.  That is OK, as truncation code will soon remove the
      page.  The truncation code will wait until the fault is finished, as it
      must obtain the vma lock in write mode.
      
      This patch cleans up/removes late checks in the fault paths that try to
      back out pages racing with truncation.  As noted above, we just let the
      truncation code remove the pages.
      
      [mike.kravetz@oracle.com: fix reserve_alloc set but not used compiler warning]
        Link: https://lkml.kernel.org/r/Yyj7HsJWfHDoU24U@monkey
      Link: https://lkml.kernel.org/r/20220914221810.95771-10-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: James Houghton <jthoughton@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
      Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hugetlb: use new vma_lock for pmd sharing synchronization · 40549ba8
      Mike Kravetz authored
      The new hugetlb vma lock is used to address this race:
      
      Faulting thread                                 Unsharing thread
      ...                                                  ...
      ptep = huge_pte_offset()
            or
      ptep = huge_pte_alloc()
      ...
                                                      i_mmap_lock_write
                                                      lock page table
      ptep invalid   <------------------------        huge_pmd_unshare()
      Could be in a previously                        unlock_page_table
      sharing process or worse                        i_mmap_unlock_write
      ...
      
      The vma_lock is used as follows:
      - During fault processing, the lock is acquired in read mode before
        doing a page table lock and allocation (huge_pte_alloc).  The lock is
        held until code is finished with the page table entry (ptep).
      - The lock must be held in write mode whenever huge_pmd_unshare is
        called.
      
      Lock ordering issues come into play when unmapping a page from all
      vmas mapping the page.  The i_mmap_rwsem must be held to search for the
      vmas, and the vma lock must be held before calling unmap which will
      call huge_pmd_unshare.  This is done today in:
      - try_to_migrate_one and try_to_unmap_one for page migration and memory
        error handling.  In these routines we 'try' to obtain the vma lock and
        fail to unmap if unsuccessful.  Calling routines already deal with the
        failure of unmapping.
      - hugetlb_vmdelete_list for truncation and hole punch.  This routine
        also tries to acquire the vma lock.  If it fails, it skips the
        unmapping.  However, we cannot let file truncation or hole punch
        fail because of contention.  After hugetlb_vmdelete_list, truncation
        and hole punch call remove_inode_hugepages.  remove_inode_hugepages
        checks for mapped pages and calls hugetlb_unmap_file_folio to unmap
        them.  hugetlb_unmap_file_folio is designed to drop locks and
        reacquire them in the
        correct order to guarantee unmap success.
      
      Link: https://lkml.kernel.org/r/20220914221810.95771-9-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: James Houghton <jthoughton@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
      Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hugetlb: create hugetlb_unmap_file_folio to unmap single file folio · 378397cc
      Mike Kravetz authored
      Create the new routine hugetlb_unmap_file_folio that will unmap a single
      file folio.  This is refactored code from hugetlb_vmdelete_list.  It is
      modified to do locking within the routine itself and check whether the
      page is mapped within a specific vma before unmapping.
      
      This refactoring will be put to use and expanded upon in a subsequent
      patch adding vma specific locking.
      
      Link: https://lkml.kernel.org/r/20220914221810.95771-8-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: James Houghton <jthoughton@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
      Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hugetlb: add vma based lock for pmd sharing · 8d9bfb26
      Mike Kravetz authored
      Allocate a new hugetlb_vma_lock structure and hang it off vm_private_data for
      synchronization use by vmas that could be involved in pmd sharing.  This
      data structure contains a rw semaphore that is the primary tool used for
      synchronization.
      
      This new structure is ref counted, so that it can exist when NOT attached
      to a vma.  This is only helpful in resolving lock ordering issues where
      code may need to obtain the vma_lock while there is no guarantee that the
      vma will not go away.  By obtaining a ref on the structure, it can be
      guaranteed that at least the rw semaphore will not go away.
      
      Only add infrastructure for the new lock here.  Actual use will be added
      in subsequent patches.
      
      [mike.kravetz@oracle.com: fix build issue for missing hugetlb_vma_lock_release]
        Link: https://lkml.kernel.org/r/YyNUtA1vRASOE4+M@monkey
      Link: https://lkml.kernel.org/r/20220914221810.95771-7-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: James Houghton <jthoughton@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
      Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hugetlb: rename vma_shareable() and refactor code · 12710fd6
      Mike Kravetz authored
      Rename the routine vma_shareable to vma_addr_pmd_shareable as it is
      checking a specific address within the vma.  Refactor code to check if an
      aligned range is shareable as this will be needed in a subsequent patch.
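      The address form reduces to checking the PUD-aligned range that contains
      the address, which is why it is natural to split out a range form. The
      sketch below assumes x86-64 with 4-level paging (PUD_SIZE of 1 GiB). It
      uses a trimmed-down vma with a boolean in place of VM_MAYSHARE. The
      function names are hypothetical. The containment test is the same
      check the kernel's range_in_vma() helper performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: x86-64 with 4-level paging, where one PMD page maps 1 GiB. */
#define PUD_SIZE (1UL << 30)
#define PUD_MASK (~(PUD_SIZE - 1))

/* Hypothetical, trimmed-down vma: [vm_start, vm_end) plus a share flag. */
struct vma {
	unsigned long vm_start, vm_end;
	bool may_share;               /* stands in for VM_MAYSHARE */
};

/* Range form: is the aligned range [start, end) shareable in this vma? */
static bool aligned_range_pmd_shareable(const struct vma *v,
					unsigned long start, unsigned long end)
{
	return v->may_share && v->vm_start <= start && end <= v->vm_end;
}

/* Address form: check the PUD-aligned range that contains addr. */
static bool addr_pmd_shareable(const struct vma *v, unsigned long addr)
{
	unsigned long start = addr & PUD_MASK;

	return aligned_range_pmd_shareable(v, start, start + PUD_SIZE);
}
```

      A vma that starts 2 MiB past a 1 GiB boundary cannot share the PMD page
      for that first, partial gigabyte. It can still share the PMD page for
      the next fully covered one.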
      
      Link: https://lkml.kernel.org/r/20220914221810.95771-6-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: James Houghton <jthoughton@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
      Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hugetlb: create remove_inode_single_folio to remove single file folio · c8627228
      Mike Kravetz authored
      Create the new routine remove_inode_single_folio that will remove a single
      folio from a file.  This is refactored code from remove_inode_hugepages. 
      It checks for the uncommon case in which the folio is still mapped and
      unmaps it.
      
      No functional change.  This refactoring will be put to use and expanded
      upon in subsequent patches.
      
      Link: https://lkml.kernel.org/r/20220914221810.95771-5-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: James Houghton <jthoughton@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
      Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hugetlb: rename remove_huge_page to hugetlb_delete_from_page_cache · 7e1813d4
      Mike Kravetz authored
      remove_huge_page removes a hugetlb page from the page cache.  Rename it to
      hugetlb_delete_from_page_cache, as that is a more descriptive name.
      huge_add_to_page_cache is global in scope, but only deals with hugetlb
      pages.  For consistency and clarity, rename to hugetlb_add_to_page_cache.
      
      Link: https://lkml.kernel.org/r/20220914221810.95771-4-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: James Houghton <jthoughton@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
      Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hugetlbfs: revert use i_mmap_rwsem for more pmd sharing synchronization · 3a47c54f
      Mike Kravetz authored
      Commit c0d0381a ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
      synchronization") added code to take i_mmap_rwsem in read mode for the
      duration of fault processing.  However, this has been shown to cause
      performance/scaling issues.  Revert the code and go back to only taking
      the semaphore in huge_pmd_share during the fault path.
      
      Keep the code that takes i_mmap_rwsem in write mode before calling
      try_to_unmap as this is required if huge_pmd_unshare is called.
      
      NOTE: Reverting this code does expose the following race condition.
      
      Faulting thread                                 Unsharing thread
      ...                                                  ...
      ptep = huge_pte_offset()
            or
      ptep = huge_pte_alloc()
      ...
                                                      i_mmap_lock_write
                                                      lock page table
      ptep invalid   <------------------------        huge_pmd_unshare()
      Could be in a previously                        unlock_page_table
      sharing process or worse                        i_mmap_unlock_write
      ...
      ptl = huge_pte_lock(ptep)
      get/update pte
      set_pte_at(pte, ptep)
      
      It is unknown if the above race was ever experienced by a user.  It was
      discovered via code inspection when initially addressed.
      
      In subsequent patches, a new synchronization mechanism will be added to
      coordinate pmd sharing and eliminate this race.
      
      Link: https://lkml.kernel.org/r/20220914221810.95771-3-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: James Houghton <jthoughton@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
      Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hugetlbfs: revert use i_mmap_rwsem to address page fault/truncate race · 188a3972
      Mike Kravetz authored
      Patch series "hugetlb: Use new vma lock for huge pmd sharing
      synchronization", v2.
      
      hugetlb fault scalability regressions have recently been reported [1]. 
      This is not the first such report, as regressions were also noted when
      commit c0d0381a ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
      synchronization") was added [2] in v5.7.  At that time, a proposal to
      address the regression was suggested [3] but went nowhere.
      
      The regression and benefit of this patch series are not evident when
      using the vm_scalability benchmark reported in [2] on a recent kernel.
      Results from running:
      "./usemem -n 48 --prealloc --prefault -O -U 3448054972"
      
      48 sample Avg
      next-20220913   next-20220913                next-20220913
      unmodified      revert i_mmap_sema locking   vma sema locking, this series
      --------------------------------------------------------------------------
      498150 KB/s     501934 KB/s                  504793 KB/s
      
      The recent regression report [1] notes page fault and fork latency of
      shared hugetlb mappings.  To measure this, I created two simple programs:
      1) map a shared hugetlb area, write fault all pages, unmap area
         Do this in a continuous loop to measure faults per second
      2) map a shared hugetlb area, write fault a few pages, fork and exit
         Do this in a continuous loop to measure forks per second
      These programs were run on a 48 CPU VM with 320GB memory.  The shared
      mapping size was 250GB.  For comparison, a single instance of the program
      was run.  Then, multiple instances were run in parallel to introduce
      lock contention.  Changing the locking scheme results in a significant
      performance benefit.
      
      test             instances   unmodified    revert       vma
      -----------------------------------------------------------
      faults per sec   1               393043    395680    389932
      faults per sec   24               71405     81191     79048
      forks per sec    1                 2802      2747      2725
      forks per sec    24                 439       536       500
      Combined faults  24                1621     68070     53662
      Combined forks   24                 358        67       142
      
      The combined test runs the faulting program and the forking program
      simultaneously.
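      The source of the two test programs is not part of this series. The
      following is a hypothetical reconstruction of one iteration of the first
      program (map a shared area, write-fault every page, unmap).
      fault_iteration() and its parameters are assumptions. Passing
      MAP_HUGETLB and the huge page size gives a shared hugetlb mapping,
      provided enough huge pages are reserved.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * One iteration of a "fault all pages" loop: map a shared area, touch one
 * byte per page to write-fault it in, then unmap.  Returns the number of
 * pages touched, or -1 if the mapping failed.
 *
 * For a hugetlb run, pass extra_flags = MAP_HUGETLB and page_size = the
 * huge page size (e.g. 2 MiB on x86-64), with enough huge pages reserved.
 */
static long fault_iteration(size_t len, size_t page_size, int extra_flags)
{
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS | extra_flags, -1, 0);
	long faults = 0;

	if (p == MAP_FAILED)
		return -1;
	for (size_t off = 0; off < len; off += page_size) {
		p[off] = 1;               /* write fault on first touch */
		faults++;
	}
	munmap(p, len);
	return faults;
}
```

      Calling this in a loop timed with clock_gettime(CLOCK_MONOTONIC, ...)
      gives a faults-per-second figure. With extra_flags = 0 and a 4 KiB
      page_size, it also runs without reserved huge pages.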
      
      Patches 1 and 2 of this series revert c0d0381a and 87bf91d3, which
      depends on c0d0381a.  Acquisition of i_mmap_rwsem is still required in
      the fault path to establish pmd sharing, so this is moved back to
      huge_pmd_share.  With c0d0381a reverted, this race is exposed:
      
      Faulting thread                                 Unsharing thread
      ...                                                  ...
      ptep = huge_pte_offset()
            or
      ptep = huge_pte_alloc()
      ...
                                                      i_mmap_lock_write
                                                      lock page table
      ptep invalid   <------------------------        huge_pmd_unshare()
      Could be in a previously                        unlock_page_table
      sharing process or worse                        i_mmap_unlock_write
      ...
      ptl = huge_pte_lock(ptep)
      get/update pte
      set_pte_at(pte, ptep)
      
      Reverting 87bf91d3 exposes races in page fault/file truncation.  When
      the new vma lock is put to use in patch 8, this will handle the fault/file
      truncation races.  This is explained in patch 9 where code associated with
      these races is cleaned up.
      
      Patches 3 - 5 restructure existing code in preparation for using the new
      vma lock (rw semaphore) for pmd sharing synchronization.  The idea is that
      this semaphore will be held in read mode for the duration of fault
      processing, and held in write mode for unmap operations which may call
      huge_pmd_unshare.  Acquiring i_mmap_rwsem is also still required to
      synchronize huge pmd sharing.  However, it is only required in the fault
      path when setting up sharing, and will be acquired in huge_pmd_share().
      
      Patch 6 adds the new vma lock and all supporting routines, but does not
      actually change code to use the new lock.
      
      Patch 7 refactors code in preparation for using the new lock, and patch
      8 finally adds code to make use of this new vma lock.  Unfortunately, the
      fault code and truncate/hole punch code would naturally take locks in the
      opposite order which could lead to deadlock.  Since the performance of
      page faults is more important, the truncation/hole punch code is modified
      to back out and take locks in the correct order if necessary.
      
      [1] https://lore.kernel.org/linux-mm/43faf292-245b-5db5-cce9-369d8fb6bd21@infradead.org/
      [2] https://lore.kernel.org/lkml/20200622005551.GK5535@shao2-debian/
      [3] https://lore.kernel.org/linux-mm/20200706202615.32111-1-mike.kravetz@oracle.com/
      
      
      This patch (of 9):
      
      Commit c0d0381a ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
      synchronization") added code to take i_mmap_rwsem in read mode for the
      duration of fault processing.  The use of i_mmap_rwsem to prevent
      fault/truncate races depends on this.  However, this has been shown to
      cause performance/scaling issues.  As a result, that code will be
      reverted.  Since the use of i_mmap_rwsem to address page fault/truncate races
      depends on this, it must also be reverted.
      
      In a subsequent patch, code will be added to detect the fault/truncate
      race and back out operations as required.
      
      Link: https://lkml.kernel.org/r/20220914221810.95771-1-mike.kravetz@oracle.com
      Link: https://lkml.kernel.org/r/20220914221810.95771-2-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: James Houghton <jthoughton@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
      Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/hugetlb: remove unnecessary 'NULL' values from pointer · 3259914f
      XU pengfei authored
      The pointer variables are assigned the result of an allocation before
      they are checked, so there is no need to initialize them to NULL first.
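      The pattern being cleaned up is an initializer that is always
      overwritten before use. The following userspace illustration uses
      malloc() in place of the kernel allocator; struct item and
      item_create() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct item {
	int value;
};

/* Before: "struct item *it = NULL;" followed by "it = malloc(...);".
 * After: assign from the allocator directly; the NULL initializer was
 * dead because it was always overwritten before being read. */
static struct item *item_create(int value)
{
	struct item *it = malloc(sizeof(*it));

	if (!it)
		return NULL;
	it->value = value;
	return it;
}
```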
      
      Link: https://lkml.kernel.org/r/20220914012113.6271-1-xupengfei@nfschina.com
      Signed-off-by: XU pengfei <xupengfei@nfschina.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/filemap: make folio_put_wait_locked static · c195c321
      Ke Sun authored
      It has only been used in mm/filemap.c since commit ffa65753
      ("mm/migrate.c: rework migration_entry_wait() to not take a pageref").
      
      Make it static.
      
      Link: https://lkml.kernel.org/r/20220914021738.3228011-1-sunke@kylinos.cn
      Signed-off-by: Ke Sun <sunke@kylinos.cn>
      Reported-by: k2ci <kernel-bot@kylinos.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: hugetlb: eliminate memory-less nodes handling · a4a00b45
      Muchun Song authored
      The memory-notify-based approach aims to handle memory-less nodes;
      however, it just adds complexity to the code, as pointed out by David in
      thread [1].  The handling of memory-less nodes was introduced by commit
      4faf8d95 ("hugetlb: handle memory hot-plug events").  From its commit
      message, we cannot find any need to handle this case.  So, we can simply
      register/unregister sysfs entries in register_node/unregister_node to
      simplify the code.
      
      BTW, the hotplug callback was added because hugetlb_register_all_nodes()
      registers sysfs nodes only for N_MEMORY nodes; see commit 9b5e5d0f, which
      said it was a preparation for handling memory-less nodes via memory
      hotplug.  Since we want to remove the memory hotplug handling, make sure
      hugetlb_register_all_nodes() registers per-node sysfs entries only for
      online (N_ONLINE) nodes.
      
      https://lore.kernel.org/linux-mm/60933ffc-b850-976c-78a0-0ee6e0ea9ef0@redhat.com/ [1]
      Link: https://lkml.kernel.org/r/20220914072603.60293-3-songmuchun@bytedance.com
      Suggested-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: hugetlb: simplify per-node sysfs creation and removal · b958d4d0
      Muchun Song authored
      Patch series "simplify handling of per-node sysfs creation and removal",
      v4.
      
      
      This patch (of 2):
      
      The following commit offloaded per-node sysfs creation and removal to a
      kworker without saying why that was needed.  It even said "I don't know
      that this is absolutely required", so it seems the author was not sure
      either.  Since it only complicates the code, this patch reverts those
      changes to simplify the code.
      
        39da08cb ("hugetlb: offload per node attribute registrations")
      
      We could use a memory hotplug notifier to do per-node sysfs creation and
      removal instead of inserting those operations into node registration and
      unregistration.  That reduces the code coupling between node.c and
      hugetlb.c, and it also simplifies the code.
      
      Link: https://lkml.kernel.org/r/20220914072603.60293-1-songmuchun@bytedance.com
      Link: https://lkml.kernel.org/r/20220914072603.60293-2-songmuchun@bytedance.com
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • ze zuo
      aaa31e05
    • mm/page_alloc.c: document bulkfree_pcp_prepare() return value · d452289f
      Andrew Morton authored
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: ke.wang <ke.wang@unisoc.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Zhaoyang Huang <huangzhaoyang@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/page_alloc.c: rename check_free_page() to free_page_is_bad() · a8368cd8
      Andrew Morton authored
      The name "check_free_page()" provides no information regarding its return
      value when the page is indeed found to be bad.
      
      Renaming it to "free_page_is_bad()" makes it clear that a `true' return
      value means the page was bad.
      
      And make it return a bool, not an int.
      
      [akpm@linux-foundation.org: don't use bool as int]
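
      A rough userspace sketch of the calling convention the new name encourages, in the spirit of the "don't use bool as int" note: the predicate returns bool, and the caller branches on it instead of doing arithmetic with it. struct fake_page, the flag mask and count_bad_pages() are hypothetical; the real checks in mm/page_alloc.c look at many more page fields.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page descriptor; not the kernel's struct page. */
struct fake_page {
	int mapcount;
	unsigned long flags;
};

/* Hypothetical mask of flags that must be clear when a page is freed. */
#define FAKE_PAGE_FLAGS_CHECK_AT_FREE 0x1UL

/* The name says what "true" means: the page is bad. */
bool free_page_is_bad(const struct fake_page *page)
{
	return page->mapcount != 0 ||
	       (page->flags & FAKE_PAGE_FLAGS_CHECK_AT_FREE) != 0;
}

/* Branch on the bool rather than writing "bad += free_page_is_bad(p)". */
size_t count_bad_pages(const struct fake_page *pages, size_t n)
{
	size_t bad = 0;

	for (size_t i = 0; i < n; i++)
		if (free_page_is_bad(&pages[i]))
			bad++;
	return bad;
}
```
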
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: ke.wang <ke.wang@unisoc.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Zhaoyang Huang <huangzhaoyang@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/memcontrol: use kstrtobool for swapaccount param parsing · 4988fe69
      Liu Shixin authored
      Use kstrtobool(), which is more flexible and handles all kinds of
      parameter values, such as 'Yy1Nn0' or [oO][NnFf] for "on" and "off".
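
      The accepted spellings described above can be modelled in userspace as follows. parse_bool_like_kstrtobool() is a hypothetical stand-in written from this description, not the kernel's kstrtobool() source. Depending on the kernel version, the in-kernel function may accept additional spellings.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of the rules above: 'Yy1' -> true, 'Nn0' -> false,
 * "on"/"of..." (any case) -> true/false; anything else is -EINVAL. */
int parse_bool_like_kstrtobool(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		/* Only the second character decides between "on" and "off". */
		switch (s[1]) {
		case 'n': case 'N':
			*res = true;
			return 0;
		case 'f': case 'F':
			*res = false;
			return 0;
		}
		break;
	}
	return -EINVAL;
}
```

      In this sketch only the first one or two characters are inspected, so values such as "yes" and "no" parse as well, while "o" alone or "maybe" is rejected with -EINVAL.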
      
      Link: https://lkml.kernel.org/r/20220913071358.1812206-1-liushixin2@huawei.com
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Shakeel Butt <shakeelb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon/core: simplify the kdamond stop mechanism by removing 'done' · 29454cf6
      Kaixu Xia authored
      When kdamond_wait_activation() or the after_sampling() or
      after_aggregation() DAMON callbacks return an error, there is no need to
      use the bool 'done' to check whether kdamond should finish.  This commit
      simplifies the kdamond stop mechanism by removing 'done' and breaking out
      of the while loop directly in those cases.
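
      A simplified userspace model of the loop shape after this change, assuming each hook returns nonzero on error. struct loop_ops, run_loop() and the demo_* callbacks are hypothetical; the real kdamond loop does more work per iteration and calls after_aggregation() only when an aggregation interval has passed.

```c
#include <stdbool.h>

/* Hypothetical hooks standing in for kdamond_wait_activation(),
 * after_sampling() and after_aggregation(); nonzero means error. */
struct loop_ops {
	int (*wait_activation)(void *arg);
	int (*after_sampling)(void *arg);
	int (*after_aggregation)(void *arg);
	bool (*should_stop)(void *arg);
};

/* Returns the number of completed iterations.  Each error path leaves
 * the loop with break instead of setting a 'done' flag that the loop
 * condition would have to re-test. */
int run_loop(const struct loop_ops *ops, void *arg)
{
	int iterations = 0;

	while (!ops->should_stop(arg)) {
		if (ops->wait_activation(arg))
			break;
		if (ops->after_sampling(arg))
			break;
		if (ops->after_aggregation(arg))
			break;
		iterations++;
	}
	return iterations;
}

/* Demo state: demo_fail_after() reports an error on its fail_at-th call. */
struct demo_state {
	int calls;
	int fail_at;
};

int demo_ok(void *arg) { (void)arg; return 0; }

int demo_fail_after(void *arg)
{
	struct demo_state *s = arg;

	return ++s->calls >= s->fail_at;
}

bool demo_never_stop(void *arg) { (void)arg; return false; }
```
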
      
      Link: https://lkml.kernel.org/r/1663060287-30201-4-git-send-email-kaixuxia@tencent.com
      Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
      Reviewed-by: SeongJae Park <sj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon/sysfs: simplify the variable 'pid' assignment operation · f1c71c28
      Kaixu Xia authored
      We can initialize the variable 'pid' to -1 in pid_show() to simplify the
      assignment and make the code more readable.
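
      A minimal sketch of the initialize-then-override pattern, assuming a simplified target object. struct target_sketch and pid_show_sketch() are hypothetical and only approximate what the sysfs show handler formats: -1 is the default, and it is overridden only when a target is present, so no else branch is needed.

```c
#include <stdio.h>

/* Hypothetical stand-in for the sysfs target object. */
struct target_sketch {
	int has_ctx;	/* nonzero when a monitoring target is attached */
	int pid;	/* meaningful only when has_ctx is nonzero */
};

/* Writes "<pid>\n" into buf and returns the snprintf() result. */
int pid_show_sketch(const struct target_sketch *t, char *buf, size_t len)
{
	int pid = -1;	/* default: no target */

	if (t && t->has_ctx)
		pid = t->pid;
	return snprintf(buf, len, "%d\n", pid);
}
```
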
      
      Link: https://lkml.kernel.org/r/1663060287-30201-3-git-send-email-kaixuxia@tencent.com
      Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
      Reviewed-by: SeongJae Park <sj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon: simplify the parameter passing for 'prepare_access_checks' · 8ef4d5ca
      Kaixu Xia authored
      Patch series "mm/damon: code simplifications and cleanups".
      
      This patchset contains some code simplifications and cleanups for DAMON.
      
      
      This patch (of 4):
      
      The parameter 'struct damon_ctx *ctx' isn't used in the functions
      __damon_{p,v}a_prepare_access_check(), so we can remove it and simplify
      the parameter passing.
      
      Link: https://lkml.kernel.org/r/1663060287-30201-1-git-send-email-kaixuxia@tencent.com
      Link: https://lkml.kernel.org/r/1663060287-30201-2-git-send-email-kaixuxia@tencent.com
      Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
      Reviewed-by: SeongJae Park <sj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon/lru_sort: deduplicate hot/cold schemes generators · a62518ab
      SeongJae Park authored
      damon_lru_sort_new_{hot,cold}_scheme() contain a lot of duplicated code.
      This commit factors the duplicated part out into a separate function and
      uses it in both to remove the duplication.
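
      A hypothetical sketch of this refactoring pattern: one shared constructor takes the values that differ, and the hot and cold generators become thin wrappers. struct scheme_sketch, its fields and the SKETCH_* actions are simplified stand-ins, not the real struct damos or the kernel's scheme constructor.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified scheme description. */
enum sketch_action { SKETCH_LRU_PRIO, SKETCH_LRU_DEPRIO };

struct scheme_sketch {
	unsigned long min_nr_accesses;
	unsigned long max_nr_accesses;
	unsigned long min_age;
	enum sketch_action action;
};

/* Shared part: allocation and field setup now live in one place. */
static struct scheme_sketch *new_scheme(unsigned long min_acc,
					unsigned long max_acc,
					unsigned long min_age,
					enum sketch_action action)
{
	struct scheme_sketch *s = malloc(sizeof(*s));

	if (!s)
		return NULL;
	s->min_nr_accesses = min_acc;
	s->max_nr_accesses = max_acc;
	s->min_age = min_age;
	s->action = action;
	return s;
}

/* Hot regions: accessed at least hot_thres times, any age. */
struct scheme_sketch *new_hot_scheme(unsigned long hot_thres)
{
	return new_scheme(hot_thres, ULONG_MAX, 0, SKETCH_LRU_PRIO);
}

/* Cold regions: not accessed at all for at least cold_min_age. */
struct scheme_sketch *new_cold_scheme(unsigned long cold_min_age)
{
	return new_scheme(0, 0, cold_min_age, SKETCH_LRU_DEPRIO);
}
```

      Keeping the varying values as arguments means a new field only has to be set once, in new_scheme().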
      
      Link: https://lkml.kernel.org/r/20220913174449.50645-23-sj@kernel.org
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon/lru_sort: use quotas param generator · 45b8212f
      SeongJae Park authored
      This commit makes DAMON_LRU_SORT generate the module parameters for DAMOS
      quotas using the generator macro, to simplify the code and reduce
      duplication.
      
      Link: https://lkml.kernel.org/r/20220913174449.50645-22-sj@kernel.org
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon/reclaim: use the quota params generator macro · a9d57c73
      SeongJae Park authored
      This commit makes DAMON_RECLAIM generate the module parameters for DAMOS
      quotas using the generator macro, to simplify the code and reduce
      duplication.
      
      Link: https://lkml.kernel.org/r/20220913174449.50645-21-sj@kernel.org
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon/modules-common: implement damos time quota params generator · 1f554026
      SeongJae Park authored
      DAMON_LRU_SORT has module parameters only for the DAMOS time quota, not
      the size quota.  This commit implements a macro for generating those
      module parameters so that it can be reused later.
      
      Link: https://lkml.kernel.org/r/20220913174449.50645-20-sj@kernel.org
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon/modules-common: implement a damos quota params generator · 63e0f90b
      SeongJae Park authored
      DAMON_RECLAIM and DAMON_LRU_SORT have module parameters for DAMOS quotas
      that share the same names.  This commit implements a macro for generating
      such module parameters so that it can be reused later.
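
      The idea behind such a generator can be sketched in userspace with a macro that expands to the same set of identically named parameter entries for whichever quota object it is given. This does not reproduce the kernel macro, which registers real module parameters. struct param_sketch, struct quota_sketch, DEFINE_QUOTA_PARAMS_SKETCH(), find_param() and the example values are all hypothetical.

```c
#include <string.h>

/* Hypothetical stand-in for a module parameter table entry. */
struct param_sketch {
	const char *name;
	unsigned long *value;
};

/* Hypothetical quota settings shared by both modules. */
struct quota_sketch {
	unsigned long ms;
	unsigned long sz;
	unsigned long reset_interval_ms;
};

/* One invocation defines the same identically named entries for the
 * quota object passed in, so no module repeats them by hand. */
#define DEFINE_QUOTA_PARAMS_SKETCH(table, quota)			\
	struct param_sketch table[] = {					\
		{ "quota_ms", &(quota).ms },				\
		{ "quota_sz", &(quota).sz },				\
		{ "quota_reset_interval_ms", &(quota).reset_interval_ms }, \
	}

/* Arbitrary example values, not the modules' real defaults. */
static struct quota_sketch reclaim_quota = { 10, 1024, 1000 };
static struct quota_sketch lru_sort_quota = { 10, 0, 1000 };

DEFINE_QUOTA_PARAMS_SKETCH(reclaim_params, reclaim_quota);
DEFINE_QUOTA_PARAMS_SKETCH(lru_sort_params, lru_sort_quota);

/* Looks up a parameter by name in a generated table. */
unsigned long *find_param(struct param_sketch *table, size_t n,
			  const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(table[i].name, name) == 0)
			return table[i].value;
	return NULL;
}
```
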
      
      Link: https://lkml.kernel.org/r/20220913174449.50645-19-sj@kernel.org
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon/lru_sort: use stat generator · dd172fbf
      SeongJae Park authored
      This commit makes DAMON_LRU_SORT generate the module parameters for DAMOS
      statistics using the generator macro, to simplify the code and reduce
      duplication.
      
      Link: https://lkml.kernel.org/r/20220913174449.50645-18-sj@kernel.org
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>