    mm/cma.c: use exact_nid true to fix possible per-numa cma leak
    Calling cma_declare_contiguous_nid() with exact_nid=false for per-numa
    reservations can easily cause cma leaks and various confusion.  For
    example, mm/hugetlb.c tries to reserve per-numa cma for gigantic pages,
    but it can easily leak cma and confuse users when the system has
    memoryless nodes.
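    For context, hugetlb's reservation path walks every online node,
    including memoryless ones, and asks cma_declare_contiguous_nid() for an
    area on each of them.  A simplified sketch of that loop (not the exact
    mm/hugetlb.c code; the per_node/reserved bookkeeping is abbreviated):

        for_each_node_state(nid, N_ONLINE) {
                phys_addr_t size = min(per_node, hugetlb_cma_size - reserved);

                size = round_up(size, PAGE_SIZE << order);
                /*
                 * With exact_nid=false inside cma_declare_contiguous_nid(),
                 * a memoryless nid can silently be satisfied from another
                 * node's memory.
                 */
                if (!cma_declare_contiguous_nid(0, size, 0,
                                PAGE_SIZE << order, 0, false, "hugetlb",
                                &hugetlb_cma[nid], nid))
                        reserved += size;
        }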
    
    In case the system has 4 numa nodes and only node0 has memory: if we set
    hugetlb_cma=4G in bootargs, mm/hugetlb.c will get 4 cma areas for the 4
    numa nodes.  Since exact_nid=false in the current code, all 4 numa nodes
    will successfully get cma from node0, but hugetlb_cma[1] to
    hugetlb_cma[3] will never be available to hugepages, as mm/hugetlb.c
    will only allocate memory from hugetlb_cma[0].
    
    In case the system has 4 numa nodes and only node0 and node2 have
    memory, the other nodes having none: if we set hugetlb_cma=4G in
    bootargs, mm/hugetlb.c will again get 4 cma areas for the 4 numa nodes.
    Since exact_nid=false in the current code, all 4 numa nodes will
    successfully get cma from node0 or node2, but hugetlb_cma[1] and
    hugetlb_cma[3] will never be available to hugepages, as mm/hugetlb.c
    will only allocate memory from hugetlb_cma[0] and hugetlb_cma[2].  This
    permanently leaks the cma areas that were reserved on behalf of the
    memoryless nodes.
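    The allocation side, by contrast, only walks the nodes present in the
    nodemask, which is why the areas parked in slots 1 and 3 are never
    touched.  Roughly sketched from the CMA branch of alloc_gigantic_page()
    of that era (simplified, not the exact code):

        for_each_node_mask(node, *nodemask) {
                /*
                 * Memoryless nodes never appear in the mask, so the areas
                 * leaked into hugetlb_cma[1] and hugetlb_cma[3] are never
                 * reached.
                 */
                if (!hugetlb_cma[node])
                        continue;
                page = cma_alloc(hugetlb_cma[node], nr_pages,
                                 huge_page_order(h), true);
                if (page)
                        return page;
        }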
    
    Of course we could work around the issue by letting mm/hugetlb.c scan
    all cma areas in alloc_gigantic_page() even when node_mask includes only
    node0; that way we could still get pages from hugetlb_cma[1] to
    hugetlb_cma[3].  But this would cause a kernel crash in
    free_gigantic_page() when it tries to free the page via:
    cma_release(hugetlb_cma[page_to_nid(page)], page, 1 << order)
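    The problem is a mismatch: page_to_nid() reports the node the memory
    physically lives on, not the array slot the area was reserved under, and
    cma_release() rejects pages outside the passed area's pfn range.  A
    simplified sketch of that containment check, assuming the mm/cma.c code
    of that era:

        bool cma_release(struct cma *cma, const struct page *pages,
                         unsigned int count)
        {
                unsigned long pfn;

                if (!cma || !pages)
                        return false;

                pfn = page_to_pfn(pages);

                /*
                 * A page carved out of hugetlb_cma[1] but physically on
                 * node0 gets checked against hugetlb_cma[0] and fails
                 * here, so it can never be returned to the area it
                 * actually came from.
                 */
                if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
                        return false;

                VM_BUG_ON(pfn + count > cma->base_pfn + cma->count);

                free_contig_range(pfn, count);
                cma_clear_bitmap(cma, pfn, count);

                return true;
        }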
    
    On the other hand, exact_nid=false doesn't consider numa distance
    either, so leveraging cma areas on remote nodes might not be that useful
    anyway.  I feel it is much simpler to make exact_nid true so that
    everything is clear.  After that, memoryless nodes won't be able to
    reserve per-numa CMA from other nodes which have memory.
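    Concretely, that means cma_declare_contiguous_nid() should pass
    exact_nid=true down to memblock for these per-numa requests.  A sketch
    of the core of the change, assuming the memblock_alloc_range_nid()
    signature of that era:

        /*
         * Pass exact_nid=true so memblock will not fall back to a
         * different node when nid has no suitable memory.
         */
        addr = memblock_alloc_range_nid(size, alignment, base,
                                        limit, nid, true);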
    
    Fixes: cf11e85f ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
    Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Roman Gushchin <guro@fb.com>
    Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Cc: Aslan Bakirov <aslan@fb.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Andreas Schaufler <andreas.schaufler@gmx.de>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Joonsoo Kim <js1304@gmail.com>
    Cc: Robin Murphy <robin.murphy@arm.com>
    Cc: <stable@vger.kernel.org>
    Link: http://lkml.kernel.org/r/20200628074345.27228-1-song.bao.hua@hisilicon.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>