    mm/mempolicy: add set_mempolicy_home_node syscall · c6018b4b
    Aneesh Kumar K.V authored
    This syscall can be used to set a home node for the MPOL_BIND and
    MPOL_PREFERRED_MANY memory policies.  Users should call it after
    setting up a memory policy for the specified range, as shown below.
    
      mbind(p, nr_pages * page_size, MPOL_BIND, new_nodes->maskp,
            new_nodes->size + 1, 0);
      sys_set_mempolicy_home_node((unsigned long)p, nr_pages * page_size,
    				home_node, 0);
    
    The syscall allows specifying a home node/preferred node from which
    the kernel will fulfill memory allocation requests first.
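
    The sys_set_mempolicy_home_node() call in the snippet above is not a
    libc function, so a caller has to issue the raw syscall.  The
    following is a minimal sketch of one way to do that, not the test
    code behind this commit.  It assumes syscall number 450 from the
    generic syscall table when the installed headers do not define
    __NR_set_mempolicy_home_node, and it assumes MPOL_BIND has the value
    2 as in the kernel's uapi header.  bind_with_home_node() is a
    hypothetical helper name, and it takes a single-word nodemask for
    brevity.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: 450 is the generic syscall table number. The value from the
 * installed kernel headers is used instead whenever they define it. */
#ifndef __NR_set_mempolicy_home_node
#define __NR_set_mempolicy_home_node 450
#endif

/* Assumption: matches MPOL_BIND in the kernel's uapi mempolicy header. */
#ifndef MPOL_BIND
#define MPOL_BIND 2UL
#endif

/* Thin wrapper around the raw syscall.
 * Returns 0 on success, -1 with errno set on failure. */
static long sys_set_mempolicy_home_node(unsigned long start, unsigned long len,
                                        unsigned long home_node,
                                        unsigned long flags)
{
    return syscall(__NR_set_mempolicy_home_node, start, len, home_node, flags);
}

/* Hypothetical helper: apply MPOL_BIND to [addr, addr + len) first, then set
 * the home node, which is the order the commit message asks for.
 * The nodemask is one word, so only the nodes that fit in an unsigned long
 * can be named. Returns 0 on success or a negative errno value. */
static int bind_with_home_node(void *addr, size_t len, unsigned long nodemask,
                               unsigned long home_node)
{
    /* Same "number of bits + 1" convention as the mbind() call above. */
    unsigned long maxnode = 8 * sizeof(nodemask) + 1;

    /* A zero-length range is treated as a caller bug in this sketch. */
    assert(len != 0);

    if (syscall(SYS_mbind, addr, len, (unsigned long)MPOL_BIND, &nodemask,
                maxnode, 0UL) != 0)
        return -errno;

    if (sys_set_mempolicy_home_node((unsigned long)addr, len, home_node, 0) != 0)
        return -errno;

    return 0;
}
```

    On a kernel without this syscall the wrapper fails with ENOSYS, so
    a caller that can tolerate the default MPOL_BIND behavior may want
    to treat that error as non-fatal.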
    
    For an address range with an MPOL_BIND memory policy, if the nodemask
    specifies more than one node, page allocations will come from the
    node in the nodemask with sufficient free memory that is closest to
    the home node/preferred node.
    
    For MPOL_PREFERRED_MANY, if the nodemask specifies more than one node,
    page allocations will come from the node in the nodemask with
    sufficient free memory that is closest to the home node/preferred
    node.  If none of the nodes specified in the nodemask has enough
    memory, the allocation will be attempted from the NUMA node in the
    system that is closest to the home node.
    
    This lets applications hint at a preferred node for memory allocation
    and fall back to _only_ a set of nodes if memory is not available on
    the preferred node.  Fallback allocation is attempted from the node
    nearest to the preferred node.
    
    This gives applications control over which NUMA nodes memory is
    allocated from and avoids the default fallback to slow memory NUMA
    nodes.  For example, consider a system where NUMA nodes 1, 2 and 3
    have DRAM and nodes 10, 11 and 12 have slow memory:
    
     new_nodes = numa_bitmask_alloc(nr_nodes);
    
     numa_bitmask_setbit(new_nodes, 1);
     numa_bitmask_setbit(new_nodes, 2);
     numa_bitmask_setbit(new_nodes, 3);
    
     p = mmap(NULL, nr_pages * page_size, protflag, mapflag, -1, 0);
     mbind(p, nr_pages * page_size, MPOL_BIND, new_nodes->maskp,
           new_nodes->size + 1, 0);

     sys_set_mempolicy_home_node((unsigned long)p, nr_pages * page_size, 2, 0);
    
    This will allocate from the nodes closest to node 2 and ensures that
    the kernel only allocates from nodes 1, 2 and 3.  Memory will not be
    allocated from slow memory nodes 10, 11 and 12.  This differs from
    the default MPOL_BIND behavior, where the allocation is attempted
    from the node closest to the local node.  One reason to specify a
    home node is to allow allocations from a CPU-less NUMA node and its
    nearby NUMA nodes.
    
    MPOL_PREFERRED_MANY, on the other hand, will first try to allocate
    from the node closest to node 2 among nodes 1, 2 and 3.  If those
    nodes don't have enough memory, the kernel will allocate from slow
    memory node 10, 11 or 12, whichever is closest to node 2.
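
    The difference between the two policies for this example can be
    illustrated with a small user-space model.  This is a toy sketch
    only: the kernel works with zonelists and watermarks rather than a
    plain free-page count, and every name here (pick_node(),
    closest_with_memory(), the dist[] and free_pages[] arrays) is
    hypothetical.  dist[n] is assumed to hold the distance from the home
    node to node n, and the caller supplies those distances.

```c
#include <assert.h>

#define MAX_NODES 13   /* enough for node ids 0..12 used in the example */
#define NO_NODE   (-1)

enum policy { POLICY_BIND, POLICY_PREFERRED_MANY };

/* Among the nodes set in `allowed`, return the one closest to the home node
 * that still has at least `need` free pages, or NO_NODE if there is none. */
static int closest_with_memory(const int dist[MAX_NODES],
                               const long free_pages[MAX_NODES],
                               unsigned long allowed, long need)
{
    int best = NO_NODE;

    for (int n = 0; n < MAX_NODES; n++) {
        if (!(allowed & (1UL << n)) || free_pages[n] < need)
            continue;
        if (best == NO_NODE || dist[n] < dist[best])
            best = n;
    }
    return best;
}

/* Model of the behavior described in the text:
 *  - both policies first try the nodemask, closest to the home node first;
 *  - MPOL_BIND stops there, so the allocation fails if the mask is exhausted;
 *  - MPOL_PREFERRED_MANY then falls back to any node in the system. */
static int pick_node(enum policy pol, unsigned long nodemask,
                     const int dist[MAX_NODES],
                     const long free_pages[MAX_NODES], long need)
{
    int n;

    assert(need > 0);

    n = closest_with_memory(dist, free_pages, nodemask, need);
    if (n == NO_NODE && pol == POLICY_PREFERRED_MANY)
        n = closest_with_memory(dist, free_pages, ~0UL, need);
    return n;
}
```

    With a nodemask of nodes 1, 2 and 3 and home node 2, the model
    returns node 2 while it has memory.  Once nodes 1, 2 and 3 are all
    exhausted, the MPOL_BIND case returns NO_NODE, while the
    MPOL_PREFERRED_MANY case returns whichever of nodes 10, 11 and 12 has
    the smallest dist[] entry.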
    
    Link: https://lkml.kernel.org/r/20211202123810.267175-3-aneesh.kumar@linux.ibm.com
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Cc: Ben Widawsky <ben.widawsky@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Feng Tang <feng.tang@intel.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Huang Ying <ying.huang@intel.com>
    Cc: <linux-api@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>