    NUMA Balancing: add page promotion counter · e39bb6be
    Huang Ying authored
    Patch series "NUMA balancing: optimize memory placement for memory tiering system", v13
    
    With the advent of various new memory types, some machines will have
    multiple types of memory, e.g.  DRAM and PMEM (persistent memory).  The
    memory subsystem of these machines can be called a memory tiering
    system, because the performance of the different types of memory
    differs.
    
    After commit c221c0b0 ("device-dax: "Hotplug" persistent memory for
    use like normal RAM"), PMEM can be used as cost-effective volatile
    memory in separate NUMA nodes.  In a typical memory tiering system,
    there are CPUs, DRAM and PMEM in each physical NUMA node.  The CPUs
    and the DRAM are put in one logical node, while the PMEM is put in
    another (faked) logical node.
    
    To optimize the overall system performance, the hot pages should be
    placed in the DRAM node.  To do that, we need to identify the hot
    pages in the PMEM node and migrate them to the DRAM node via NUMA
    migration.
    
    The original NUMA balancing already has mechanisms to identify the
    pages recently accessed by the CPUs of a node and migrate those pages
    to that node.  We can reuse these mechanisms to optimize the page
    placement in a memory tiering system.  This is implemented in this
    patchset.
    
    On the other hand, the cold pages should be placed in the PMEM node.
    So, we also need to identify the cold pages in the DRAM node and
    migrate them to the PMEM node.
    
    In commit 26aa2d19 ("mm/migrate: demote pages during reclaim"), a
    mechanism was added to demote cold DRAM pages to the PMEM node under
    memory pressure.  Based on that, cold DRAM pages can be demoted to the
    PMEM node proactively to free memory space on the DRAM node to
    accommodate the promoted hot PMEM pages.  This is implemented in this
    patchset too.
    
    We have tested the solution with the pmbench memory accessing benchmark
    with an 80:20 read/write ratio and a Gaussian access address
    distribution on a 2-socket Intel server with Optane DC Persistent
    Memory Modules.  The test results show that the pmbench score improves
    by up to 95.9%.
    
    This patch (of 3):
    
    In a system with multiple memory types, e.g.  DRAM and PMEM, the CPUs
    and DRAM in one socket are put in one NUMA node as before, while the
    PMEM is put in another NUMA node, as described in commit c221c0b0
    ("device-dax: "Hotplug" persistent memory for use like normal RAM").
    So, the NUMA balancing mechanism will identify all PMEM accesses as
    remote accesses and try to promote the PMEM pages to DRAM.
    
    To distinguish the number of inter-type promoted pages from that of
    inter-socket migrated pages, a new vmstat counter is added.  The
    counter is per-node (counted in the target node), so it can be used to
    identify promotion imbalance among the NUMA nodes.
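    
    As a rough sketch of how such a per-node promotion counter can be
    wired up (the item name PGPROMOTE_SUCCESS, the node_is_toptier()
    helper and the exact hook point below are assumptions for
    illustration, not quoted from the patch): a new node_stat_item is
    defined, exported through vmstat, and bumped in the target node after
    a successful NUMA-balancing migration that crosses memory tiers.
    
        /*
         * Sketch only: counter name, tier check and hook point are
         * assumptions about one possible wiring, not the exact patch.
         */
        
        /* include/linux/mmzone.h: a new per-node vmstat item */
        enum node_stat_item {
        	/* ... existing items ... */
        	PGPROMOTE_SUCCESS,	/* pages promoted from slow to fast tier */
        	NR_VM_NODE_STAT_ITEMS
        };
        
        /*
         * mm/migrate.c: after NUMA-balancing migration succeeds, count
         * the pages in the *target* (DRAM) node, but only when the move
         * crossed memory tiers rather than just sockets.
         */
        static void count_promotion(struct pglist_data *pgdat, int src_nid,
        			    int dst_nid, long nr_succeeded)
        {
        	if (!node_is_toptier(src_nid) && node_is_toptier(dst_nid))
        		mod_node_page_state(pgdat, PGPROMOTE_SUCCESS,
        				    nr_succeeded);
        }
    
    The counter would then show up as "pgpromote_success" in
    /proc/vmstat and in the per-node vmstat files, which is what makes
    the per-node imbalance visible.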
    
    Link: https://lkml.kernel.org/r/20220301085329.3210428-1-ying.huang@intel.com
    Link: https://lkml.kernel.org/r/20220221084529.1052339-1-ying.huang@intel.com
    Link: https://lkml.kernel.org/r/20220221084529.1052339-2-ying.huang@intel.com
    Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
    Reviewed-by: Yang Shi <shy828301@gmail.com>
    Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
    Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Reviewed-by: Oscar Salvador <osalvador@suse.de>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Zi Yan <ziy@nvidia.com>
    Cc: Wei Xu <weixugc@google.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>
    Cc: Feng Tang <feng.tang@intel.com>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>