    mm/memory_hotplug: don't free usage map when removing a re-added early section · 8068df3b
    David Hildenbrand authored
    When we remove an early section, we don't free the usage map, as the
    usage maps of other sections are placed into the same page.  Once the
    section is removed, it is no longer an early section (in particular, the
    memmap is freed).  When we re-add that section, the usage map is reused,
    however, it is no longer an early section.  When removing that section
    again, we try to kfree() a usage map that was allocated during early
    boot - bad.
    
    Let's check against PageReserved() to see if we are dealing with a
    usage map that was allocated during boot.  We could also check against
    !(PageSlab(usage_page) || PageCompound(usage_page)), but PageReserved() is
    cleaner.
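
    The check described above can be modeled in plain userspace C.  This is
    only a sketch of the idea, not the kernel code: struct usage_map,
    struct section, the boot_allocated field (standing in for
    PageReserved() on the usage map's page), and all function names are
    hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a section's usage map. */
struct usage_map {
    bool boot_allocated;  /* models PageReserved() on the usage map's page */
};

/* Hypothetical, heavily simplified memory section. */
struct section {
    struct usage_map *usage;
    bool early;           /* models the SECTION_IS_EARLY flag */
    bool present;
};

static int frees;         /* number of usage maps actually released */

/* Boot-time setup: the usage map lives in a shared boot allocation. */
static void section_init_early(struct section *s, struct usage_map *boot_usage)
{
    boot_usage->boot_allocated = true;
    s->usage = boot_usage;
    s->early = true;
    s->present = true;
}

/* Hot-add: reuse a leftover usage map if there is one, else allocate. */
static void section_add(struct section *s)
{
    if (!s->usage) {
        s->usage = calloc(1, sizeof(*s->usage));
        assert(s->usage);
    }
    s->early = false;     /* a re-added section is never an early section */
    s->present = true;
}

/*
 * Removal: decide based on how the usage map was allocated, not on the
 * early flag, which is already lost for a re-added section.
 */
static void section_remove(struct section *s)
{
    assert(s->present);
    if (!s->usage->boot_allocated) {
        free(s->usage);
        s->usage = NULL;
        frees++;
    }
    s->early = false;
    s->present = false;
}
```

    In this model, removing, re-adding, and removing a boot section again
    never calls free() on the boot-time usage map, while a section that was
    only ever hot-added gets its usage map released on removal.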
    
    Can be triggered using memtrace under ppc64/powernv:
    
      $ mount -t debugfs none /sys/kernel/debug/
      $ echo 0x20000000 > /sys/kernel/debug/powerpc/memtrace/enable
      $ echo 0x20000000 > /sys/kernel/debug/powerpc/memtrace/enable
       ------------[ cut here ]------------
       kernel BUG at mm/slub.c:3969!
       Oops: Exception in kernel mode, sig: 5 [#1]
       LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
       Modules linked in:
       CPU: 0 PID: 154 Comm: sh Not tainted 5.5.0-rc2-next-20191216-00005-g0be1dba7b7c0 #61
       NIP kfree+0x338/0x3b0
       LR section_deactivate+0x138/0x200
       Call Trace:
         section_deactivate+0x138/0x200
         __remove_pages+0x114/0x150
         arch_remove_memory+0x3c/0x160
         try_remove_memory+0x114/0x1a0
         __remove_memory+0x20/0x40
         memtrace_enable_set+0x254/0x850
         simple_attr_write+0x138/0x160
         full_proxy_write+0x8c/0x110
         __vfs_write+0x38/0x70
         vfs_write+0x11c/0x2a0
         ksys_write+0x84/0x140
         system_call+0x5c/0x68
       ---[ end trace 4b053cbd84e0db62 ]---
    
    The first invocation will offline+remove memory blocks.  The second
    invocation will first add+online them again, in order to offline+remove
    them again (usually we are lucky and the exact same memory blocks will
    get "reallocated").
    
    Tested on powernv with boot memory: The usage map will not get freed.
    Tested on x86-64 with DIMMs: The usage map will get freed.
    
    Using Dynamic Memory under a Power DLPAR can trigger it easily.
    
    Triggering removal (I assume after previously removed+re-added) of
    memory from the HMC GUI can crash the kernel with the same call trace
    and is fixed by this patch.
    
    Link: http://lkml.kernel.org/r/20191217104637.5509-1-david@redhat.com
    Fixes: 326e1b8f ("mm/sparsemem: introduce a SECTION_IS_EARLY flag")
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Tested-by: Pingfan Liu <piliu@redhat.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Michal Hocko <mhocko@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>