    mm: thp: khugepaged: add policy for finding target node · fc2dd02e
    Bob Liu authored
    commit 9f1b868a upstream.
    
    Khugepaged scans and frees HPAGE_PMD_NR normal pages and replaces them
    with a hugepage allocated from the node of the first scanned normal
    page.  This policy is too coarse and may produce unexpected results for
    users.
    
    The problem is that the original page balancing among nodes is broken
    once khugepaged starts.  Consider the case where the first scanned
    normal page was allocated from node A, but most of the other scanned
    normal pages were allocated from node B or C.  Khugepaged will always
    allocate the hugepage from node A, which puts extra memory pressure on
    node A that did not exist before khugepaged started.
    
    This patch tries to fix the problem by making khugepaged allocate the
    hugepage from the node with the highest count of scanned normal pages,
    so that the effect on the original page balancing is minimized.
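
    The selection described above can be sketched in userspace C as
    follows.  This is an illustration under stated assumptions, not the
    kernel code: find_max_hit_node() and the node_load[] counter array are
    hypothetical names, and the array is assumed to hold, per node, the
    number of scanned normal pages that came from that node.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace sketch (hypothetical names, not the kernel implementation).
 * node_load[nid] is assumed to count the scanned normal pages that were
 * found on node nid.  Returns the first node with the highest count.
 */
int find_max_hit_node(const int *node_load, int nr_nodes)
{
	int nid, target_node = 0, max_value = 0;

	assert(node_load != NULL && nr_nodes > 0);

	for (nid = 0; nid < nr_nodes; nid++) {
		/* strict '>' keeps the lowest-numbered node on a tie */
		if (node_load[nid] > max_value) {
			max_value = node_load[nid];
			target_node = nid;
		}
	}
	return target_node;
}
```

    With counts of 1, 510 and 1 for nodes 0-2 the sketch picks node 1.
    Because of the strict comparison, equal counts always resolve to the
    lowest-numbered node, so this step alone does not balance ties.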
    
    The other problem is that if the scanned normal pages are evenly
    allocated from nodes A, B and C, node A will still suffer extra memory
    pressure after khugepaged starts.
    
    Andrew Davidoff reported a related issue several days ago.  He wanted
    his application's memory interleaved among all nodes and ran the
    testcase with "numactl --interleave=all ./test", but the result was not
    as expected.
    
      cat /proc/2814/numa_maps:
      7f50bd440000 interleave:0-3 anon=51403 dirty=51403 N0=435 N1=435 N2=435 N3=50098
    
    The end result showed that most pages came from node 3 instead of being
    interleaved among nodes 0-3, which was unreasonable.
    
    This patch also fixes this issue by allocating hugepages round robin
    among all nodes that have the same count.  After this patch the result
    was as expected:
    
      7f78399c0000 interleave:0-3 anon=51403 dirty=51403 N0=12723 N1=12723 N2=13235 N3=12722
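
    A minimal userspace sketch of the round-robin tie-breaking follows.
    It is an assumption-laden illustration rather than the kernel code:
    find_target_node(), node_load[], last_target_node and NO_NODE are
    hypothetical names.  The function-local static mirrors the note below
    about keeping the last target node local to the lookup function.

```c
#include <assert.h>
#include <stddef.h>

#define NO_NODE (-1)	/* hypothetical marker: no target chosen yet */

/*
 * Userspace sketch (hypothetical names, not the kernel implementation).
 * Picks the node with the highest count; when several nodes share that
 * count, successive calls rotate through them.
 */
int find_target_node(const int *node_load, int nr_nodes)
{
	static int last_target_node = NO_NODE;
	int nid, target_node = 0, max_value = 0;

	assert(node_load != NULL && nr_nodes > 0);

	/* find the first node with the highest count */
	for (nid = 0; nid < nr_nodes; nid++) {
		if (node_load[nid] > max_value) {
			max_value = node_load[nid];
			target_node = nid;
		}
	}

	/*
	 * If that node is not past the previous target, prefer a later
	 * node with the same count; keep the first one if none exists.
	 */
	if (target_node <= last_target_node) {
		for (nid = last_target_node + 1; nid < nr_nodes; nid++) {
			if (node_load[nid] == max_value) {
				target_node = nid;
				break;
			}
		}
	}

	last_target_node = target_node;
	return target_node;
}
```

    With equal counts on four nodes, successive calls from a fresh
    process return nodes 0, 1, 2, 3 and then wrap back to 0.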
    
    The simple testcase is like this:
    
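    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void) {
    	char *p;
    	int i;
    	int j;

    	/* allocate and touch 200 x 1 MiB, one block per second */
    	for (i = 0; i < 200; i++) {
    		p = malloc(1048576);
    		printf("malloc done\n");

    		if (p == NULL) {
    			printf("Out of memory\n");
    			return 1;
    		}
    		/* write every byte so the pages are actually faulted in */
    		for (j = 0; j < 1048576; j++) {
    			p[j] = 'A';
    		}
    		printf("touched memory\n");

    		sleep(1);
    	}
    	printf("enter sleep\n");
    	/* stay alive so /proc/<pid>/numa_maps can be inspected */
    	while (1) {
    		sleep(100);
    	}
    }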
    
    [akpm@linux-foundation.org: make last_khugepaged_target_node local to khugepaged_find_target_node()]
    Reported-by: Andrew Davidoff <davidoff@qedmf.net>
    Tested-by: Andrew Davidoff <davidoff@qedmf.net>
    Signed-off-by: Bob Liu <bob.liu@oracle.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Mel Gorman <mel@csn.ul.ie>
    Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
    Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Mel Gorman <mgorman@suse.de>
    Signed-off-by: Jiri Slaby <jslaby@suse.cz>