    Merge tag 'slab-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab · cd97950c
    Linus Torvalds authored
    Pull slab updates from Vlastimil Babka:
     "This time it's mostly random cleanups and fixes, with two performance
      fixes that might have significant impact, but limited to systems
      experiencing particularly bad corner-case scenarios rather than being
      general performance improvements.
    
      The memcg hook changes are going through the mm tree due to
      dependencies.
    
       - Prevent stalls when reading /proc/slabinfo (Jianfeng Wang)
    
         This fixes the long-standing problem that can happen with workloads
         that have alloc/free patterns resulting in many partially used
         slabs (in e.g. dentry cache). Reading /proc/slabinfo will traverse
         the long partial slab list under spinlock with disabled irqs and
         thus can stall other processes or even trigger the lockup
         detection. The traversal is only done to count free objects so that
         <active_objs> column can be reported along with <num_objs>.
    
         To avoid affecting fast paths with another shared counter
         (attempted in the past) or complex partial list traversal schemes
         that allow rescheduling, the chosen solution resorts to
         approximation - when the partial list is over 10000 slabs long, we
         will only traverse the first 5000 slabs from the head and the last
         5000 from the tail, and use the average of those to estimate the
         whole list. Both head and tail are sampled because the slabs near
         the head tend to have more free objects than the slabs towards
         the tail.
    
         It is expected the approximation should not break existing
         /proc/slabinfo consumers. The <num_objs> field is still accurate
         and reflects the overall kmem_cache footprint. The <active_objs>
         was already imprecise due to cpu and percpu-partial slabs, so can't
         be relied upon to determine exact cache usage. The difference
         between <active_objs> and <num_objs> is mainly useful to determine
         the slab fragmentation, and that will be possible even with the
         approximation in place.
    
       - Prevent allocating many slabs when a NUMA node is full (Chen Jun)
    
         Currently, on NUMA systems with one node under significantly
         bigger pressure than the others, the fallback strategy may result
         in each kmalloc_node() call that can't be satisfied from the
         preferred node allocating a new slab on a fallback node instead
         of reusing the slabs already on that node's partial list.
    
         This is now fixed and partial lists of fallback nodes are checked
         even for kmalloc_node() allocations. It's still preferred to
         allocate a new slab on the requested node before a fallback, but
         only with a GFP_NOWAIT attempt, which will fail quickly when the
         node is under significant memory pressure.
    
       - More SLAB removal related cleanups (Xiu Jianfeng, Hyunmin Lee)
    
       - Fix slub_kunit self-test with hardened freelists (Guenter Roeck)
    
       - Mark racy accesses for KCSAN (linke li)
    
       - Misc cleanups (Xiongwei Song, Haifeng Xu, Sangyun Kim)"
    
    * tag 'slab-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
      mm/slub: remove the check for NULL kmalloc_caches
      mm/slub: create kmalloc 96 and 192 caches regardless cache size order
      mm/slub: mark racy access on slab->freelist
      slub: use count_partial_free_approx() in slab_out_of_memory()
      slub: introduce count_partial_free_approx()
      slub: Set __GFP_COMP in kmem_cache by default
      mm/slub: remove duplicate initialization for early_kmem_cache_node_alloc()
      mm/slub: correct comment in do_slab_free()
      mm/slub, kunit: Use inverted data to corrupt kmem cache
      mm/slub: simplify get_partial_node()
      mm/slub: add slub_get_cpu_partial() helper
      mm/slub: remove the check of !kmem_cache_has_cpu_partial()
      mm/slub: Reduce memory consumption in extreme scenarios
      mm/slub: mark racy accesses on slab->slabs
      mm/slub: remove dummy slabinfo functions