    mm, slub: stop freeing kmem_cache_node structures on node offline · 666716fd
    Vlastimil Babka authored
    Patch series "mm, slab, slub: remove cpu and memory hotplug locks".
    
    Some related work caused me to look at how we use get/put_online_mems()
    and get/put_online_cpus() during kmem cache
    creation/destruction/shrinking, and realize that it should actually be
    safe to remove all of that with rather small effort (as e.g. Michal Hocko
    already suspected in some past discussions).  This has the benefit of
    avoiding rather heavy locks that have already caused locking order issues
    in the past.  So this is the result: patches 2 and 3 remove memory hotplug
    and cpu hotplug locking, respectively.  Patch 1 is due to the realization
    that some races in fact exist despite the locks (even if they are not
    removed), but the most sane solution is not to introduce more locks, but
    rather to accept some wasted memory in scenarios that should be rare
    anyway (full memory hot remove), as we already do in other contexts.
    
    This patch (of 3):
    
    Commit e4f8e513 ("mm/slub: fix a deadlock in show_slab_objects()") has
    fixed a problematic locking order by removing the memory hotplug lock
    get/put_online_mems() from show_slab_objects().  During the discussion, it
    was argued [1] that this is OK, because existing slabs on the node would
    prevent a hotremove from proceeding.
    
    That's true, but per-node kmem_cache_node structures are not necessarily
    allocated on the same node and may exist even without actual slab pages on
    the same node.  Any path that uses get_node() directly or via
    for_each_kmem_cache_node() (such as show_slab_objects()) can race with
    freeing of kmem_cache_node even with the !NULL check, resulting in
    use-after-free.
    
    To that end, commit e4f8e513 argues in a comment that:
    
     * We don't really need mem_hotplug_lock (to hold off
     * slab_mem_going_offline_callback) here because slab's memory hot
     * unplug code doesn't destroy the kmem_cache->node[] data.
    
    While it's true that slab_mem_going_offline_callback() doesn't free the
    kmem_cache_node, the later callback slab_mem_offline_callback() actually
    does, so the race and use-after-free exist, not just for
    show_slab_objects() after commit e4f8e513, but also for many other places
    that are not under slab_mutex.  Adding slab_mutex locking or other
    synchronization to SLUB paths such as get_any_partial() would be bad for
    performance and error-prone.
    
    The easiest solution is therefore to make the above-mentioned comment true
    and stop freeing the kmem_cache_node structures, accepting some wasted
    memory in the full memory node removal scenario.  Analogously, we also
    don't free a hotremoved pgdat, as mentioned in [1], nor the similar
    per-node structures in SLAB.  Importantly, this approach will not block
    the hotremove, as such nodes generally have to be movable for hotremove
    to succeed in the first place, and thus the GFP_KERNEL allocated
    kmem_cache_node will come from elsewhere.
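
    The idea can be sketched as a simplified userspace model.  This is not
    the kernel code: all toy_* names and TOY_MAX_NODES are hypothetical
    stand-ins, and calloc() stands in for the GFP_KERNEL allocation.  The
    assumption illustrated is that the offline step leaves the per-node
    structure in place, so a lockless lookup never sees a freed pointer,
    and a later online step reuses whatever structure is already there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_NODES 4	/* hypothetical; stands in for MAX_NUMNODES */

/* Toy stand-in for struct kmem_cache_node. */
struct toy_cache_node {
	unsigned long nr_slabs;
};

/* Toy stand-in for struct kmem_cache and its node[] array. */
struct toy_cache {
	struct toy_cache_node *node[TOY_MAX_NODES];
};

/* Lockless lookup, analogous in spirit to get_node(). */
static struct toy_cache_node *toy_get_node(struct toy_cache *s, int nid)
{
	assert(nid >= 0 && nid < TOY_MAX_NODES);
	return s->node[nid];
}

/* Node comes online: allocate only if no structure survives from before. */
static int toy_node_going_online(struct toy_cache *s, int nid)
{
	struct toy_cache_node *n;

	if (toy_get_node(s, nid))
		return 0;	/* reuse the structure kept across an offline */

	n = calloc(1, sizeof(*n));
	if (!n)
		return -1;
	s->node[nid] = n;
	return 0;
}

/* Node went offline: deliberately neither free nor clear s->node[nid]. */
static void toy_node_offline(struct toy_cache *s, int nid)
{
	(void)s;
	(void)nid;
}
```

    In this model the cost is one small structure per cache for each node
    that has ever been online, which corresponds to the wasted memory
    accepted above.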
    
    [1] https://lore.kernel.org/linux-mm/20190924151147.GB23050@dhcp22.suse.cz/
    
    Link: https://lkml.kernel.org/r/20210113131634.3671-1-vbabka@suse.cz
    Link: https://lkml.kernel.org/r/20210113131634.3671-2-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
    Cc: Qian Cai <cai@redhat.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>