1. 30 Nov, 2022 8 commits
    • Merge branch 'slub-tiny-v1r6' into slab/for-next · dc19745a
      Vlastimil Babka authored
      Merge my series [1] to deprecate the SLOB allocator.
      - Renames CONFIG_SLOB to CONFIG_SLOB_DEPRECATED with a deprecation notice.
      - The recommended replacement is CONFIG_SLUB, optionally with the new
        CONFIG_SLUB_TINY tweaks for systems with 16MB or less RAM.
      - Use cases that stopped working with CONFIG_SLUB_TINY instead of SLOB
        should be reported to linux-mm@kvack.org and the slab maintainers;
        otherwise SLOB will be removed in a few cycles.
      
      [1] https://lore.kernel.org/all/20221121171202.22080-1-vbabka@suse.cz/
      dc19745a
    • Merge branch 'slab/for-6.2/kmalloc_redzone' into slab/for-next · 61766652
      Vlastimil Babka authored
      Add a new slub_kunit test for the extended kmalloc redzone check, by
      Feng Tang. Also prevent unwanted kfence interaction with all slub kunit
      tests.
      61766652
    • mm, slob: rename CONFIG_SLOB to CONFIG_SLOB_DEPRECATED · 149b6fa2
      Vlastimil Babka authored
      As explained in [1], we would like to remove SLOB if possible.
      
      - There are no known users that need its somewhat lower memory footprint
        so much that they cannot handle SLUB (after some modifications by the
        previous patches) instead.
      
      - It is an extra maintenance burden, and a number of features are
        incompatible with it.
      
      - It blocks the API improvement of allowing kfree() on objects allocated
        via kmem_cache_alloc().
      
      As the first step, rename the CONFIG_SLOB option in the slab allocator
      configuration choice to CONFIG_SLOB_DEPRECATED. Add CONFIG_SLOB
      depending on CONFIG_SLOB_DEPRECATED as an internal option to avoid code
      churn. This will cause existing .config files and defconfigs with
      CONFIG_SLOB=y to silently switch to the default (and recommended
      replacement) SLUB, while still allowing SLOB to be configured by anyone
      that notices and needs it. But those should contact the slab maintainers
      and linux-mm@kvack.org as explained in the updated help. If there are
      no valid objections, the plan is to update the existing defconfigs to
      SLUB and remove SLOB in a few cycles.
      
      To make SLUB a more suitable replacement for SLOB, a CONFIG_SLUB_TINY
      option was introduced to limit SLUB's memory overhead.
      There are a number of defconfigs specifying CONFIG_SLOB=y. As part of
      this patch, update them to select CONFIG_SLUB and CONFIG_SLUB_TINY.
      
      [1] https://lore.kernel.org/all/b35c3f82-f67b-2103-7d82-7a7ba7521439@suse.cz/
      
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
      Cc: Janusz Krzysztofik <jmkrzyszt@gmail.com>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Conor Dooley <conor@kernel.org>
      Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Aaro Koskinen <aaro.koskinen@iki.fi> # OMAP1
      Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> # riscv k210
      Acked-by: Arnd Bergmann <arnd@arndb.de> # arm
      Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
      Acked-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Christoph Lameter <cl@linux.com>
      149b6fa2
    • mm, slub: don't aggressively inline with CONFIG_SLUB_TINY · be784ba8
      Vlastimil Babka authored
      SLUB fast paths use __always_inline to avoid function calls. With
      CONFIG_SLUB_TINY we would rather save memory, so add a
      __fastpath_inline macro that is __always_inline normally but expands
      to nothing with CONFIG_SLUB_TINY.
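
      A minimal sketch of such a macro (a simplified form; the merged
      definition may differ in detail):

        /* Sketch: keep forcing inlining of the fast paths normally, but let
         * the compiler decide (and typically save text) with SLUB_TINY. */
        #ifndef CONFIG_SLUB_TINY
        #define __fastpath_inline __always_inline
        #else
        #define __fastpath_inline
        #endif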
      
      bloat-o-meter results on x86_64 mm/slub.o:
      
      add/remove: 3/1 grow/shrink: 1/8 up/down: 865/-1784 (-919)
      Function                                     old     new   delta
      kmem_cache_free                               20     281    +261
      slab_alloc_node.isra                           -     245    +245
      slab_free.constprop.isra                       -     231    +231
      __kmem_cache_alloc_lru.isra                    -     128    +128
      __kmem_cache_release                          88      83      -5
      __kmem_cache_create                         1446    1436     -10
      __kmem_cache_free                            271     142    -129
      kmem_cache_alloc_node                        330     127    -203
      kmem_cache_free_bulk.part                    826     613    -213
      __kmem_cache_alloc_node                      230      10    -220
      kmem_cache_alloc_lru                         325      12    -313
      kmem_cache_alloc                             325      10    -315
      kmem_cache_free.part                         376       -    -376
      Total: Before=26103, After=25184, chg -3.52%
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Christoph Lameter <cl@linux.com>
      Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      be784ba8
    • mm, slub: remove percpu slabs with CONFIG_SLUB_TINY · 0af8489b
      Vlastimil Babka authored
      SLUB gets most of its scalability from percpu slabs. However, for
      CONFIG_SLUB_TINY the goal is minimal memory overhead, not scalability.
      Thus, #ifdef out the whole kmem_cache_cpu percpu structure and the
      associated code. In addition to the slab page savings, this reduces
      percpu allocator usage and code size.
      
      This change builds on the recent commit c7323a5a ("mm/slub: restrict
      sysfs validation to debug caches and make it safe"), as caches with
      debugging enabled also avoid percpu slabs and all allocation and
      freeing ends up working with the partial list. With a bit more
      refactoring by the preceding patches, the same code paths can be used
      with CONFIG_SLUB_TINY.
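
      A simplified sketch of the pattern (field names are illustrative, not
      the kernel's full definitions):

        /* The per-CPU fast-path state only exists without CONFIG_SLUB_TINY;
         * with it, allocation and freeing go through the node partial lists,
         * as debug caches already do. The cpu_slab pointer in struct
         * kmem_cache is compiled out the same way. */
        #ifndef CONFIG_SLUB_TINY
        struct kmem_cache_cpu {
                void **freelist;        /* next available object */
                struct slab *slab;      /* slab we are allocating from */
        };
        #endif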
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Christoph Lameter <cl@linux.com>
      0af8489b
    • mm, slub: split out allocations from pre/post hooks · 56d5a2b9
      Vlastimil Babka authored
      In the following patch we want to introduce CONFIG_SLUB_TINY allocation
      paths that don't use the percpu slab. To prepare, refactor the
      allocation functions:
      
      Split out __slab_alloc_node() from slab_alloc_node() where the former
      does the actual allocation and the latter calls the pre/post hooks.
      
      Analogously, split out __kmem_cache_alloc_bulk() from
      kmem_cache_alloc_bulk().
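
      A rough sketch of the resulting shape (simplified signatures and
      placeholder bodies, illustrative only):

        /* does the actual allocation work (percpu fastpath, or later the
         * CONFIG_SLUB_TINY partial-list path) */
        static void *__slab_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
                                       int node)
        {
                return NULL;    /* placeholder in this sketch */
        }

        static __always_inline void *slab_alloc_node(struct kmem_cache *s,
                                                     gfp_t gfpflags, int node)
        {
                void *object;

                /* pre-alloc hooks (fault injection, memcg, kfence) go here */
                object = __slab_alloc_node(s, gfpflags, node);
                /* post-alloc hooks (zeroing, KASAN/KMSAN handling) go here */
                return object;
        }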
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Christoph Lameter <cl@linux.com>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      56d5a2b9
    • mm/slub, kunit: Add a test case for kmalloc redzone check · 6cd6d33c
      Feng Tang authored
      The kmalloc redzone check for SLUB has been merged, and it's better to
      add a kunit test case for it, inspired by a real-world case described
      in commit 120ee599 ("staging: octeon-usb: prevent memory corruption"):
      
      "
        octeon-hcd will crash the kernel when SLOB is used. This usually happens
        after the 18-byte control transfer when a device descriptor is read.
        The DMA engine is always transferring full 32-bit words and if the
        transfer is shorter, some random garbage appears after the buffer.
        The problem is not visible with SLUB since it rounds up the allocations
        to word boundary, and the extra bytes will go undetected.
      "
      
      To avoid interfering with the normal functioning of the kmalloc caches,
      a kmem_cache mimicking a kmalloc cache is created with similar flags,
      and kmalloc_trace() is used to really test the orig_size and redzone
      setup.
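
      A rough sketch of what such a test case could look like (the helper
      names validate_slab_cache()/slab_errors, the SLAB_KMALLOC flag, and the
      test_kmem_cache_create() wrapper shown under the next commit are
      assumptions based on the existing slub_kunit code; the merged test may
      differ in detail):

        static void test_kmalloc_redzone_access(struct kunit *test)
        {
                struct kmem_cache *s = test_kmem_cache_create("TestSlub_RZ_kmalloc",
                                32, SLAB_KMALLOC | SLAB_STORE_USER | SLAB_RED_ZONE);
                u8 *p = kmalloc_trace(s, GFP_KERNEL, 18);

                /* write past the 18 requested bytes, but within the 32-byte object */
                p[18] = 0xab;
                p[19] = 0xab;

                validate_slab_cache(s);
                KUNIT_EXPECT_EQ(test, 2, slab_errors);

                kmem_cache_free(s, p);
                kmem_cache_destroy(s);
        }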
      Suggested-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      6cd6d33c
    • mm/slub, kunit: add SLAB_SKIP_KFENCE flag for cache creation · 4d9dd4b0
      Feng Tang authored
      When kfence is enabled, the buffer allocated in the test case could
      come from a kfence pool, and the operation could also be caught and
      reported by kfence first, causing the test case to fail.
      
      With the default kfence settings, this is very difficult to trigger.
      After changing CONFIG_KFENCE_NUM_OBJECTS from 255 to 16383, and
      CONFIG_KFENCE_SAMPLE_INTERVAL from 100 to 5, allocations from kfence
      were hit 7 times in different slub_kunit cases out of 900 boot tests.
      
      To avoid this, we initially tried checking with is_kfence_address()
      and repeating the allocation until a non-kfence address was returned.
      Vlastimil Babka suggested that the SLAB_SKIP_KFENCE flag could be used
      to achieve this, and that it is better to add a wrapper function to
      simplify cache creation.
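
      A minimal sketch of such a wrapper (assuming kmem_cache_create() plus a
      post-creation flag tweak; the merged helper may differ in detail):

        static struct kmem_cache *test_kmem_cache_create(const char *name,
                                                         unsigned int size,
                                                         slab_flags_t flags)
        {
                struct kmem_cache *s = kmem_cache_create(name, size, 0, flags, NULL);

                /* make sure test allocations never come from the KFENCE pool */
                s->flags |= SLAB_SKIP_KFENCE;
                return s;
        }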
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      4d9dd4b0
  2. 27 Nov, 2022 8 commits
  3. 21 Nov, 2022 14 commits
    • Merge branch 'slab/for-6.2/alloc_size' into slab/for-next · b5e72d27
      Vlastimil Babka authored
      Two patches from Kees Cook [1]:
      
      These patches work around a deficiency in GCC (>=11) and Clang (<16)
      where the __alloc_size attribute does not apply to inlines.
      https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503
      
      This manifests as reduced overflow detection coverage for many allocation
      sites under CONFIG_FORTIFY_SOURCE=y, where the allocation size was not
      actually being propagated to __builtin_dynamic_object_size().
      
      [1] https://lore.kernel.org/all/20221118034713.gonna.754-kees@kernel.org/
      b5e72d27
    • Merge branch 'slab/for-6.2/kmalloc_redzone' into slab/for-next · 90e9b23a
      Vlastimil Babka authored
      kmalloc() redzone improvements by Feng Tang
      
      From cover letter [1]:
      
      The kmalloc() API family is critical for mm, and one of its
      characteristics is that it rounds up the request size to a fixed size
      (mostly a power of 2). When a user requests memory for '2^n + 1' bytes,
      2^(n+1) bytes could actually be allocated, so there is extra space
      beyond what was originally requested.

      This patchset extends the redzone sanity check to the extra kmalloc'ed
      space beyond the requested size, to better detect illegitimate accesses
      to it (depends on SLAB_STORE_USER & SLAB_RED_ZONE).
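
      As a hypothetical illustration of the rounding (function and sizes are
      examples only, not from the patchset):

        static void redzone_example(void)
        {
                /* a 10-byte request is served from the kmalloc-16 cache */
                char *buf = kmalloc(10, GFP_KERNEL);

                if (!buf)
                        return;
                /* within the 16-byte object but past the 10 requested bytes:
                 * with SLAB_STORE_USER and SLAB_RED_ZONE this extra space is
                 * now redzoned, so the stray write gets flagged */
                buf[12] = 0xab;
                kfree(buf);
        }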
      
      [1] https://lore.kernel.org/all/20221021032405.1825078-1-feng.tang@intel.com/
      90e9b23a
    • Merge branch 'slab/for-6.2/fit_rcu_head' into slab/for-next · 76537db3
      Vlastimil Babka authored
      A series by myself to reorder fields in struct slab to allow the
      embedded rcu_head to grow (for debugging purposes). It requires changes
      to isolate_movable_page() to skip slab pages, which can otherwise
      become false-positive __PageMovable due to the use of low bits in
      page->mapping.
      76537db3
    • Merge branch 'slab/for-6.2/tools' into slab/for-next · 1c1aaa33
      Vlastimil Babka authored
      A patch for tools/vm/slabinfo to give more useful feedback when not
      run as root, by Rong Tao.
      1c1aaa33
    • Merge branch 'slab/for-6.2/slub-sysfs' into slab/for-next · c64b95d3
      Vlastimil Babka authored
      - Two patches for SLUB's sysfs by Rasmus Villemoes to remove dead code
        and optimize boot time with late initialization.
      - Allow SLUB's sysfs 'failslab' parameter to be runtime-controllable
        again as it can be both useful and safe, by Alexander Atanasov.
      c64b95d3
    • Merge branch 'slab/for-6.2/locking' into slab/for-next · 14d3eb66
      Vlastimil Babka authored
      A patch from Jiri Kosina that makes SLAB's list_lock a raw_spinlock_t.
      While there are no plans to make SLAB actually compatible with
      PREEMPT_RT, now or in the future, it makes !PREEMPT_RT lockdep happy.
      14d3eb66
    • Merge branch 'slab/for-6.2/cleanups' into slab/for-next · 4b28ba9e
      Vlastimil Babka authored
      - Removal of dead code from deactivate_slab() by Hyeonggon Yoo.
      - Fix of BUILD_BUG_ON() for sufficient early percpu size by Baoquan He.
      - Make kmem_cache_alloc() kernel-doc less misleading, by myself.
      4b28ba9e
    • slab: Remove special-casing of const 0 size allocations · 6fa57d78
      Kees Cook authored
      Passing a constant-0 size allocation into kmalloc() or kmalloc_node()
      does not need to be a fast-path operation, so the static return value
      can be removed entirely. This makes sure that all paths through the
      inlines result in a full extern function call, where __alloc_size()
      hints will actually be seen[1] by GCC. (A constant return value of 0
      means the "0" allocation size won't be propagated by the inline.)
      
      [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503
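
      A small userspace illustration of the attribute involved (hypothetical
      my_alloc(), not kernel code; the reported size depends on compiler
      support and optimization level):

        #include <stdio.h>
        #include <stdlib.h>

        /* size hint analogous to the kernel's __alloc_size() annotation */
        __attribute__((alloc_size(1)))
        void *my_alloc(size_t size) { return malloc(size); }

        int main(void)
        {
                char *p = my_alloc(32);

                /* with the hint visible at the call site, recent GCC/Clang can
                 * report 32 here; when the call is hidden behind an inline that
                 * short-circuits the size, the hint is lost and (size_t)-1 is
                 * printed instead */
                printf("%zu\n", __builtin_dynamic_object_size(p, 0));
                free(p);
                return 0;
        }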
      
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Cc: linux-mm@kvack.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      6fa57d78
    • slab: Clean up SLOB vs kmalloc() definition · 3bf01933
      Kees Cook authored
      As already done for kmalloc_node(), clean up the #ifdef usage in the
      definition of kmalloc() so that the SLOB-only version is an entirely
      separate and much more readable function.
      
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Cc: linux-mm@kvack.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      3bf01933
    • mm/sl[au]b: rearrange struct slab fields to allow larger rcu_head · 130d4df5
      Vlastimil Babka authored
      Joel reports [1] that increasing the rcu_head size for debugging
      purposes used to work before struct slab was split from struct page, but
      now runs into the various SLAB_MATCH() sanity checks of the layout.
      
      This is because the rcu_head in struct page is in union with large
      sub-structures and has space to grow without exceeding their size, while
      in struct slab (for SLAB and SLUB) it's in union only with a list_head.
      
      On closer inspection (and after the previous patch) we can put all
      fields except slab_cache into a union with rcu_head, as slab_cache is
      sufficient for the rcu freeing callbacks to work and the rest can be
      overwritten by rcu_head without causing issues.
      
      This is only somewhat complicated by the need to keep SLUB's
      freelist+counters aligned for cmpxchg_double. As a result the fields
      need to be reordered so that slab_cache is first (after page flags) and
      the union with rcu_head follows. For consistency, do that for SLAB as
      well, although not necessary there.
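
      Schematically, the resulting layout looks like this (a simplified
      sketch, omitting most of the config-dependent fields of the real
      struct slab):

        struct slab {
                unsigned long __page_flags;     /* must stay first (page flags) */
                struct kmem_cache *slab_cache;  /* outside the union: the RCU
                                                 * freeing callback needs it */
                union {
                        struct {
                                struct list_head slab_list;
                                void *freelist;         /* SLUB: kept aligned
                                                         * for cmpxchg_double */
                                unsigned long counters;
                        };
                        struct rcu_head rcu_head;       /* free to grow in
                                                         * debug configs */
                };
                /* ... remaining fields (refcount etc.) unchanged ... */
        };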
      
      As a result, the rcu_head field in struct page and struct slab is no
      longer at the same offset, but that doesn't matter as there is no
      casting that would rely on that in the slab freeing callbacks, so we can
      just drop the respective SLAB_MATCH() check.
      
      Also we need to update the SLAB_MATCH() for compound_head to reflect the
      new ordering.
      
      While at it, also add a static_assert to check the alignment needed for
      cmpxchg_double so mistakes are found sooner than a runtime GPF.
      
      [1] https://lore.kernel.org/all/85afd876-d8bb-0804-b2c5-48ed3055e702@joelfernandes.org/
      Reported-by: Joel Fernandes <joel@joelfernandes.org>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      130d4df5
    • mm/migrate: make isolate_movable_page() skip slab pages · 8b881763
      Vlastimil Babka authored
      In the next commit we want to rearrange struct slab fields to allow a
      larger rcu_head. Afterwards, the page->mapping field will overlap with
      SLUB's "struct list_head slab_list", where the value of the prev
      pointer can become LIST_POISON2, which is 0x122 + POISON_POINTER_DELTA.
      Unfortunately, bit 1 being set can confuse PageMovable() into a false
      positive and cause a GPF, as reported by lkp [1].

      To fix this, make isolate_movable_page() skip pages with the PageSlab
      flag set. This is a bit tricky, as we need to add memory barriers to
      SLAB's and SLUB's page allocation and freeing, and their counterparts
      to isolate_movable_page().

      Based on my RFC from [2]. Added a comment update from Matthew's variant
      in [3] and, as done there, moved the PageSlab checks to happen before
      trying to take the page lock.
      
      [1] https://lore.kernel.org/all/208c1757-5edd-fd42-67d4-1940cc43b50f@intel.com/
      [2] https://lore.kernel.org/all/aec59f53-0e53-1736-5932-25407125d4d4@suse.cz/
      [3] https://lore.kernel.org/all/YzsVM8eToHUeTP75@casper.infradead.org/
      Reported-by: kernel test robot <yujie.liu@intel.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      8b881763
    • mm/slab: move and adjust kernel-doc for kmem_cache_alloc · 838de63b
      Vlastimil Babka authored
      Alexander reports an issue with the kmem_cache_alloc() comment in
      mm/slab.c:
      
      > The current comment mentioned that the flags only matters if the
      > cache has no available objects. It's different for the __GFP_ZERO
      > flag which will ensure that the returned object is always zeroed
      > in any case.
      
      > I have the feeling I run into this question already two times if
      > the user need to zero the object or not, but the user does not need
      > to zero the object afterwards. However another use of __GFP_ZERO
      > and only zero the object if the cache has no available objects would
      > also make no sense.
      
      and thus suggests mentioning __GFP_ZERO as the exception. But on closer
      inspection, the part about flags being only relevant if the cache has
      no available objects is misleading. The slab user has no reliable way
      to determine if there are available objects, and e.g. the might_sleep()
      debug check can be performed even if objects are available, so passing
      the correct flags given the allocation context always matters.
      
      Thus remove that sentence completely, and while at it, move the comment
      from the SLAB-specific mm/slab.c to the common include/linux/slab.h.
      The comment otherwise refers to the flags description for kmalloc(), so
      add a __GFP_ZERO comment there and remove a very misleading
      GFP_HIGHUSER (not applicable to slab) description from it. Mention the
      kzalloc() and kmem_cache_zalloc() shortcuts.
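
      For example, the shortcuts mentioned above are equivalent to passing
      __GFP_ZERO explicitly (foo_cache is a hypothetical cache here):

        static void zalloc_example(struct kmem_cache *foo_cache)
        {
                /* all three return zeroed memory */
                void *a = kmem_cache_alloc(foo_cache, GFP_KERNEL | __GFP_ZERO);
                void *b = kmem_cache_zalloc(foo_cache, GFP_KERNEL);
                void *c = kzalloc(128, GFP_KERNEL);     /* kmalloc + __GFP_ZERO */

                kfree(c);
                kmem_cache_free(foo_cache, b);
                kmem_cache_free(foo_cache, a);
        }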
      Reported-by: Alexander Aring <aahringo@redhat.com>
      Link: https://lore.kernel.org/all/20221011145413.8025-1-aahringo@redhat.com/
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      838de63b
    • mm/slub, percpu: correct the calculation of early percpu allocation size · a0dc161a
      Baoquan He authored
      The SLUB allocator relies on the percpu allocator to initialize its
      ->cpu_slab during early boot. For that, the dynamic percpu chunk which
      serves early allocations needs to be large enough to satisfy the
      kmalloc cache creation.

      However, the current BUILD_BUG_ON() in alloc_kmem_cache_cpus() doesn't
      take into account that there are NR_KMALLOC_TYPES kmalloc cache arrays.
      Fix that with the correct calculation.
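
      With the fix, the check reads (as also quoted in the build error of the
      next commit):

        BUILD_BUG_ON(PERCPU_DYNAMIC_EARLY_SIZE <
                     NR_KMALLOC_TYPES * KMALLOC_SHIFT_HIGH *
                     sizeof(struct kmem_cache_cpu));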
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Acked-by: Dennis Zhou <dennis@kernel.org>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      a0dc161a
    • percpu: adjust the value of PERCPU_DYNAMIC_EARLY_SIZE · e8753e41
      Baoquan He authored
      LKP reported a build failure as below on the following patch "mm/slub,
      percpu: correct the calculation of early percpu allocation size"
      
      ~~~~~~
      In file included from <command-line>:
      In function 'alloc_kmem_cache_cpus',
         inlined from 'kmem_cache_open' at mm/slub.c:4340:6:
      >> >> include/linux/compiler_types.h:357:45: error: call to '__compiletime_assert_474' declared with attribute error:
      BUILD_BUG_ON failed: PERCPU_DYNAMIC_EARLY_SIZE < NR_KMALLOC_TYPES * KMALLOC_SHIFT_HIGH * sizeof(struct kmem_cache_cpu)
           357 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
      ~~~~~~
      
      From the kernel config file provided by LKP, the build was done on
      arm64 with the Kconfig items below enabled:
      
        CONFIG_ZONE_DMA=y
        CONFIG_SLUB_CPU_PARTIAL=y
        CONFIG_DEBUG_LOCK_ALLOC=y
        CONFIG_SLUB_STATS=y
        CONFIG_ARM64_PAGE_SHIFT=16
        CONFIG_ARM64_64K_PAGES=y
      
      Then we will have:
        NR_KMALLOC_TYPES:4
        KMALLOC_SHIFT_HIGH:17
        sizeof(struct kmem_cache_cpu):184
      
      Their product is 12512, which is bigger than PERCPU_DYNAMIC_EARLY_SIZE
      (12K, i.e. 12288). Hence the BUILD_BUG_ON() in alloc_kmem_cache_cpus()
      triggers.
      
      Earlier, in commit 099a19d9 ("percpu: allow limited allocation
      before slab is online"), PERCPU_DYNAMIC_EARLY_SIZE was introduced and
      set to 12K, which was equal to the then PERCPU_DYNAMIC_RESERVE.
      Later, in commit 1a4d7607 ("percpu: implement asynchronous chunk
      population"), PERCPU_DYNAMIC_RESERVE was increased by 8K, while
      PERCPU_DYNAMIC_EARLY_SIZE was kept unchanged.

      So, increase PERCPU_DYNAMIC_EARLY_SIZE by 8K too, to accommodate
      SLUB's requirement.
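
      The change boils down to bumping the define (assumed form of the
      definition in include/linux/percpu.h):

        #define PERCPU_DYNAMIC_EARLY_SIZE       (20 << 10)      /* was (12 << 10) */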
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Acked-by: Dennis Zhou <dennis@kernel.org>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      e8753e41
  4. 11 Nov, 2022 1 commit
    • mm/slub: extend redzone check to extra allocated kmalloc space than requested · 946fa0db
      Feng Tang authored
      kmalloc will round up the request size to a fixed size (mostly a power
      of 2), so there can be extra space beyond what is requested, whose size
      is the actual buffer size minus the original request size.

      To better detect out-of-bounds access or abuse of this space, add a
      redzone sanity check for it.
      
      In the current kernel, some kmalloc users already know about the
      existence of this space and utilize it after calling ksize() to learn
      the real size of the allocated buffer. So skip the sanity check for
      objects on which ksize() has been called, treating them as legitimate
      users. Kees Cook is working on sanitizing all these use cases by using
      kmalloc_size_roundup() to avoid ambiguous usages, and after that is
      done, this special handling for ksize() can be removed.
      
      In some cases, the free pointer can be saved in the latter part of the
      object data area, which may overlap the redzone (for small kmalloc
      objects). As suggested by Hyeonggon Yoo, force the free pointer into
      the metadata area when kmalloc redzone debugging is enabled, so that
      all kmalloc objects are covered by the redzone check.
      Suggested-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      946fa0db
  5. 10 Nov, 2022 3 commits
  6. 07 Nov, 2022 1 commit
  7. 06 Nov, 2022 1 commit
    • mm/slab_common: Restore passing "caller" for tracing · 32868715
      Kees Cook authored
      The "caller" argument was accidentally being ignored in a few places
      that were recently refactored. Restore the use of these "caller"
      arguments instead of _RET_IP_.
      
      Fixes: 11e9734b ("mm/slab_common: unify NUMA and UMA version of tracepoints")
      Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: linux-mm@kvack.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      32868715
  8. 04 Nov, 2022 1 commit
  9. 03 Nov, 2022 1 commit
  10. 24 Oct, 2022 2 commits
    • mm/slub: perform free consistency checks before call_rcu · bc29d5bd
      Vlastimil Babka authored
      For SLAB_TYPESAFE_BY_RCU caches we use call_rcu to perform empty slab
      freeing. The rcu callback rcu_free_slab() calls __free_slab() that
      currently includes checking the slab consistency for caches with
      SLAB_CONSISTENCY_CHECKS flags. This check needs the slab->objects field
      to be intact.
      
      Because in the next patch we want to allow rcu_head in struct slab to
      become larger in debug configurations and thus potentially overwrite
      more fields through a union than slab_list, we want to limit the fields
      used in rcu_free_slab().  Thus move the consistency checks to
      free_slab() before call_rcu(). This can be done safely even for
      SLAB_TYPESAFE_BY_RCU caches where accesses to the objects can still
      occur after freeing them.
      
      As a result, only the slab->slab_cache field has to be physically
      separate from rcu_head for the freeing callback to work. We also save
      some cycles in the rcu callback for caches with consistency checks
      enabled.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      bc29d5bd
    • mm/slab: Annotate kmem_cache_node->list_lock as raw · b539ce9f
      Jiri Kosina authored
      The list_lock can be taken in hardirq context when do_drain() is being
      called via IPI on all cores, and therefore lockdep complains about it,
      because it can't be preempted on PREEMPT_RT.
      
      That's not a real issue, as SLAB can't be built on PREEMPT_RT anyway, but
      we still want to get rid of the warning on non-PREEMPT_RT builds.
      
      Annotate it therefore as a raw lock in order to get rid of the lockdep
      warning below.
      
      	 =============================
      	 [ BUG: Invalid wait context ]
      	 6.1.0-rc1-00134-ge35184f3 #4 Not tainted
      	 -----------------------------
      	 swapper/3/0 is trying to lock:
      	 ffff8bc88086dc18 (&parent->list_lock){..-.}-{3:3}, at: do_drain+0x57/0xb0
      	 other info that might help us debug this:
      	 context-{2:2}
      	 no locks held by swapper/3/0.
      	 stack backtrace:
      	 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.1.0-rc1-00134-ge35184f3 #4
      	 Hardware name: LENOVO 20K5S22R00/20K5S22R00, BIOS R0IET38W (1.16 ) 05/31/2017
      	 Call Trace:
      	  <IRQ>
      	  dump_stack_lvl+0x6b/0x9d
      	  __lock_acquire+0x1519/0x1730
      	  ? build_sched_domains+0x4bd/0x1590
      	  ? __lock_acquire+0xad2/0x1730
      	  lock_acquire+0x294/0x340
      	  ? do_drain+0x57/0xb0
      	  ? sched_clock_tick+0x41/0x60
      	  _raw_spin_lock+0x2c/0x40
      	  ? do_drain+0x57/0xb0
      	  do_drain+0x57/0xb0
      	  __flush_smp_call_function_queue+0x138/0x220
      	  __sysvec_call_function+0x4f/0x210
      	  sysvec_call_function+0x4b/0x90
      	  </IRQ>
      	  <TASK>
      	  asm_sysvec_call_function+0x16/0x20
      	 RIP: 0010:mwait_idle+0x5e/0x80
      	 Code: 31 d2 65 48 8b 04 25 80 ed 01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 75 14 66 90 0f 00 2d 0b 78 46 00 31 c0 48 89 c1 fb 0f 01 c9 <eb> 06 fb 0f 1f 44 00 00 65 48 8b 04 25 80 ed 01 00 f0 80 60 02 df
      	 RSP: 0000:ffffa90940217ee0 EFLAGS: 00000246
      	 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      	 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff9bb9f93a
      	 RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000001
      	 R10: ffffa90940217ea8 R11: 0000000000000000 R12: ffffffffffffffff
      	 R13: 0000000000000000 R14: ffff8bc88127c500 R15: 0000000000000000
      	  ? default_idle_call+0x1a/0xa0
      	  default_idle_call+0x4b/0xa0
      	  do_idle+0x1f1/0x2c0
      	  ? _raw_spin_unlock_irqrestore+0x56/0x70
      	  cpu_startup_entry+0x19/0x20
      	  start_secondary+0x122/0x150
      	  secondary_startup_64_no_verify+0xce/0xdb
      	  </TASK>
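
      Schematically, the change amounts to switching the lock and its
      lock/unlock sites to the raw_ variants (an illustrative sketch, not the
      full patch):

        struct kmem_cache_node {
                raw_spinlock_t list_lock;       /* was spinlock_t */
                /* ... other fields unchanged ... */
        };

        static void drain_example(struct kmem_cache_node *n)
        {
                raw_spin_lock(&n->list_lock);   /* was spin_lock() */
                /* drain per-cpu array caches back to the shared/free lists */
                raw_spin_unlock(&n->list_lock);
        }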
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      b539ce9f