    mm/migrate: optimize hotplug-time demotion order updates · 295be91f
    Dave Hansen authored
    Patch series "mm/migrate: 5.15 fixes for automatic demotion", v2.
    
    This contains two fixes for the "automatic demotion" code which was
    merged into 5.15:
    
     * Fix memory hotplug performance regression by suppressing any
       real action on irrelevant hotplug events.
    
     * Ensure CPU hotplug handler is registered when memory hotplug
       is disabled.
    
    This patch (of 2):
    
    == tl;dr ==
    
    Automatic demotion opted for a simple, lazy approach to handling hotplug
    events.  This noticeably slows down memory hotplug[1].  Optimize away
    updates to the demotion order when memory hotplug events should have no
    effect.
    
    This has no effect on CPU hotplug.  There is no known problem on the CPU
    side and any work there will be in a separate series.
    
    == Background ==
    
    Automatic demotion is a memory migration strategy to ensure that new
    allocations have room in faster memory tiers on tiered memory systems.
    The kernel maintains an array (node_demotion[]) to drive these
    migrations.
    
    The node_demotion[] path is calculated by starting at nodes with CPUs
    and then "walking" to nodes with memory.  Only hotplug events which
    online or offline a node with memory (N_ONLINE) or CPUs (N_CPU) will
    actually affect the migration order.
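    
    As a rough illustration of the walk described above, the userspace
    sketch below builds a demotion order from per-node bitmasks and a
    distance table.  It is not the kernel's implementation:
    build_demotion_order(), MAX_NODES, NO_TARGET and the distance values
    are assumptions made for this example.

```c
#include <stdint.h>

#define MAX_NODES 8
#define NO_TARGET (-1)

/*
 * Hypothetical model: bit N of cpu_nodes/mem_nodes is set when node N
 * has CPUs/memory.  order[N] receives the demotion target of node N,
 * or NO_TARGET.  Returns the number of targets that were assigned.
 */
static int build_demotion_order(int nr_nodes, uint32_t cpu_nodes,
				uint32_t mem_nodes,
				const int dist[MAX_NODES][MAX_NODES],
				int order[MAX_NODES])
{
	uint32_t used = cpu_nodes;	/* nodes that may no longer be targets */
	uint32_t sources = cpu_nodes;	/* start the walk at nodes with CPUs */
	int nr_targets = 0;

	for (int n = 0; n < nr_nodes; n++)
		order[n] = NO_TARGET;

	while (sources) {
		uint32_t next = 0;

		for (int src = 0; src < nr_nodes; src++) {
			int best = NO_TARGET;

			if (!(sources & (1u << src)))
				continue;

			/* Pick the closest unused node that has memory. */
			for (int dst = 0; dst < nr_nodes; dst++) {
				uint32_t bit = 1u << dst;

				if (!(mem_nodes & bit) || (used & bit))
					continue;
				if (best == NO_TARGET ||
				    dist[src][dst] < dist[src][best])
					best = dst;
			}
			if (best == NO_TARGET)
				continue;

			order[src] = best;
			used |= 1u << best;
			next |= 1u << best;
			nr_targets++;
		}
		/* This pass's targets become the next pass's sources. */
		sources = next;
	}
	return nr_targets;
}
```

    In this model, onlining a node that has neither CPUs nor memory
    leaves both masks unchanged, so the resulting order is unchanged
    as well.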
    
    == Problem ==
    
    However, the current code is lazy.  It completely regenerates the
    migration order on *any* CPU or memory hotplug event.  The logic was
    that these events are extremely rare and that the overhead from
    indiscriminate order regeneration is minimal.
    
    Part of the update logic involves a synchronize_rcu(), which is a pretty
    big hammer.  Its overhead was large enough to be detected by some 0day
    tests that watch memory hotplug performance[1].
    
    == Solution ==
    
    Add a new helper (node_demotion_topo_changed()) which can differentiate
    between superfluous and impactful hotplug events.  Skip the expensive
    update operation for superfluous events.
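    
    The filtering idea can be sketched in userspace as follows.  This
    is only a model under stated assumptions, not the code in this
    patch: struct topo_snapshot, sketch_topo_changed(),
    sketch_handle_hotplug_event() and the rebuild counter are
    hypothetical names, and the counter stands in for the expensive
    order regeneration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per NUMA node. */
struct topo_snapshot {
	uint64_t nodes_with_memory;
	uint64_t nodes_with_cpus;
};

static struct topo_snapshot sketch_last_topo;	/* topology at last rebuild */
static unsigned int sketch_nr_rebuilds;		/* counts expensive rebuilds */

/* True only if the set of nodes with memory or with CPUs changed. */
static bool sketch_topo_changed(const struct topo_snapshot *now)
{
	return now->nodes_with_memory != sketch_last_topo.nodes_with_memory ||
	       now->nodes_with_cpus != sketch_last_topo.nodes_with_cpus;
}

/* Returns true if the event triggered a rebuild, false if skipped. */
static bool sketch_handle_hotplug_event(const struct topo_snapshot *now)
{
	if (!sketch_topo_changed(now))
		return false;		/* superfluous event: do nothing */

	sketch_last_topo = *now;
	sketch_nr_rebuilds++;		/* stand-in for the costly update */
	return true;
}
```

    The sketch assumes callers are already serialized, so the cached
    snapshot needs no locking of its own.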
    
    == Aside: Locking ==
    
    It took me a few moments to declare the locking to be safe enough for
    node_demotion_topo_changed() to work.  It all hinges on the memory
    hotplug lock:
    
    During memory hotplug events, 'mem_hotplug_lock' is held for write.
    This ensures that two memory hotplug events cannot be processed
    simultaneously.
    
    CPU hotplug has a similar lock (cpuhp_state_mutex) which also provides
    mutual exclusion between CPU hotplug events.  In addition, the demotion
    code acquires and holds the mem_hotplug_lock for read during its CPU
    hotplug handlers.  This provides mutual exclusion between the demotion
    memory hotplug callbacks and the CPU hotplug callbacks.
    
    This effectively allows the migration target generation code to be
    treated as if it is single-threaded.
    
    1. https://lore.kernel.org/all/20210905135932.GE15026@xsang-OptiPlex-9020/
    
    Link: https://lkml.kernel.org/r/20210924161251.093CCD06@davehans-spike.ostc.intel.com
    Link: https://lkml.kernel.org/r/20210924161253.D7673E31@davehans-spike.ostc.intel.com
    Fixes: 884a6e5d ("mm/migrate: update node demotion order on hotplug events")
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Reported-by: kernel test robot <oliver.sang@intel.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Cc: "Huang, Ying" <ying.huang@intel.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Wei Xu <weixugc@google.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Yang Shi <yang.shi@linux.alibaba.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>