    mm: fix draining remote pageset · fa8c4f9a
    Huang Ying authored
    If there is no memory allocation or freeing in the PCP (Per-CPU Pageset)
    of a remote zone (a zone on a remote NUMA node) for some time (currently
    3 seconds), the pages in that PCP are drained to avoid wasting memory.
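    To make the timeout concrete, here is a minimal user-space model of the
    countdown. It assumes the periodic worker runs once per second, so three
    ticks stand in for the 3-second timeout. `struct pcp_model`,
    `pcp_model_activity()` and `pcp_model_tick()` are hypothetical names; this
    is an illustration, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of one CPU's pageset for one remote zone.
 * It loosely mirrors the kernel's count/expire fields; it is not kernel code. */
struct pcp_model {
	int count;	/* pages currently cached in this PCP */
	int expire;	/* worker ticks left before draining; 0 means not armed */
};

/* "3 seconds for now", assuming the worker runs once per second. */
#define PCP_EXPIRE_TICKS 3

/* Allocation/freeing on the remote zone caches pages and re-arms the timer. */
static void pcp_model_activity(struct pcp_model *pcp, int pages)
{
	assert(pages >= 0);
	pcp->count += pages;
	pcp->expire = PCP_EXPIRE_TICKS;
}

/* One periodic worker run: count down and drain when the timer reaches zero.
 * Returns true if the pageset was drained on this run. */
static bool pcp_model_tick(struct pcp_model *pcp)
{
	if (!pcp->expire || !pcp->count)
		return false;	/* timer not armed, or nothing to drain */
	if (--pcp->expire)
		return false;	/* still waiting for the timeout */
	pcp->count = 0;		/* timeout reached: give the pages back */
	return true;
}
```

    Calling `pcp_model_activity()` once and then `pcp_model_tick()` three
    times leaves `count` at 0; any activity in between re-arms the timer and
    restarts the countdown.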
    
    This behavior was introduced in commit 4ae7c039 ("[PATCH] Periodically
    drain non local pagesets") and commit 4037d452 ("Move remote node
    draining out of slab allocators").
    
    However, since commit 7cc36bbd ("vmstat: on-demand vmstat workers V8"),
    the vmstat updater worker, which is used to drain the PCPs of remote
    zones, may not be re-queued while we are waiting for the timeout
    (pcp->expire != 0) if there are no vmstat changes on this CPU, for
    example, when the CPU goes idle or runs only user-space workloads.  As a
    result, the pages of a remote zone may be kept in this CPU's PCP for a
    long time, and page reclaim on the remote zone may be triggered
    prematurely.  This isn't a severe problem in practice, because the PCP of
    the remote zone will be drained if memory is allocated or freed on this
    CPU again, and the PCP will eventually be drained during direct reclaim
    if necessary.
    
    Anyway, the problem still deserves a fix: guarantee that the vmstat
    updater worker is always re-queued while we are waiting for the timeout.
    In effect, this restores the behavior from before commit 7cc36bbd.
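    The interaction with the on-demand worker can be sketched with a similar
    model. As a simplification, it assumes an idle CPU produces no other
    vmstat changes, so the worker is re-queued only when its previous run
    reported a change. `refresh_model()` and `idle_for()` are hypothetical;
    the `count_pending` flag selects between reporting nothing while the
    countdown is pending (the problem described above) and reporting a change
    (the restored behavior). It illustrates the idea and is not the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: pages cached for a remote zone plus the drain timer. */
struct pcp_model {
	int count;	/* pages cached in this PCP */
	int expire;	/* worker runs left before draining; 0 = not armed */
};

/* One worker run; returns the number of changes it reports. If
 * count_pending is false, waiting for the timeout reports no change (the
 * problem described above); if true, it reports one (restored behavior). */
static int refresh_model(struct pcp_model *pcp, bool count_pending)
{
	if (!pcp->expire || !pcp->count)
		return 0;
	if (--pcp->expire)
		return count_pending ? 1 : 0;
	pcp->count = 0;		/* timeout reached: drain */
	return 1;
}

/* Idle CPU: assume no other vmstat changes, so the worker is re-queued only
 * if its previous run reported a change. Returns pages still cached. */
static int idle_for(struct pcp_model *pcp, int seconds, bool count_pending)
{
	bool queued = true;	/* queued by the last allocation/free */

	assert(seconds >= 0);
	for (int s = 0; s < seconds && queued; s++)
		queued = refresh_model(pcp, count_pending) != 0;
	return pcp->count;
}
```

    Starting from `{ 64, 3 }`, `idle_for(&pcp, 60, false)` stops after the
    first run and leaves all 64 pages cached, while
    `idle_for(&pcp, 60, true)` lets the countdown finish and drains them.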
    
    The bug can be reproduced by allocating and freeing pages from a remote
    zone and then going idle, as follows.  This patch fixes it.
    
    - Run some workloads, using `numactl` to bind the CPUs to node 0 and
      memory to node 1, so that the PCP of a CPU on node 0 for a zone on
      node 1 is filled.
    
    - After the workloads finish, stay idle for 60s.

    - Check /proc/zoneinfo.
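    For the last step, a small helper can pull one CPU's `count:` value out
    of /proc/zoneinfo. `pcp_count()` is hypothetical, and it assumes the
    layout printed by recent kernels (a `Node N, zone NAME` header followed by
    `cpu:` and `count:` lines under `pagesets`), which may differ between
    kernel versions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the PCP "count:" value that /proc/zoneinfo
 * reports for the given node, zone and CPU, or -1 if it is not found.
 * Assumes "Node N, zone NAME" headers followed by "cpu: C" and "count: X"
 * lines under "pagesets"; the exact layout varies across kernel versions. */
static long pcp_count(FILE *fp, int node, const char *zone, int cpu)
{
	char line[256], name[32];
	int n, c, in_zone = 0, cur_cpu = -1;
	long count;

	assert(fp != NULL && zone != NULL);
	while (fgets(line, sizeof(line), fp)) {
		if (sscanf(line, "Node %d, zone %31s", &n, name) == 2) {
			/* New zone header: track whether it is the one we want. */
			in_zone = (n == node && strcmp(name, zone) == 0);
			cur_cpu = -1;
		} else if (sscanf(line, " cpu: %d", &c) == 1) {
			cur_cpu = c;	/* following count: belongs to this CPU */
		} else if (sscanf(line, " count: %ld", &count) == 1) {
			if (in_zone && cur_cpu == cpu)
				return count;
		}
	}
	return -1;
}
```

    On a live system, open /proc/zoneinfo with `fopen()` and call, for
    example, `pcp_count(fp, 1, "Normal", 0)`, assuming CPU 0 is on node 0 and
    node 1 has a `Normal` zone. Per the results below, the value stays
    non-zero after idling on the original kernel and drops to 0 on the
    patched one.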
    
    With the original kernel, the number of pages in the PCP of the CPU on
    node 0 for the zone on node 1 is non-zero after idling.  With the patched
    kernel, it drops to 0.  That is, we avoid keeping pages in the remote PCP
    while idle.
    
    Link: https://lkml.kernel.org/r/20231007062356.187621-1-ying.huang@intel.com
    Link: https://lkml.kernel.org/r/20230811090819.60845-1-ying.huang@intel.com
    Fixes: 7cc36bbd ("vmstat: on-demand vmstat workers V8")
    Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
    Reviewed-by: Christoph Lameter <cl@linux.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Michal Hocko <mhocko@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>