    mm: hugetlb: make the hugetlb migration strategy consistent · 42d0c3fb
    Baolin Wang authored
    As discussed in the previous thread [1], there is an inconsistency when
    handling hugetlb migration.  When handling the migration of freed hugetlb,
    it prevents fallback to other NUMA nodes in
    alloc_and_dissolve_hugetlb_folio().  However, when dealing with in-use
    hugetlb, it allows fallback to other NUMA nodes in
    alloc_hugetlb_folio_nodemask(), which can break the per-node hugetlb pool
    and might result in unexpected failures when node-bound workloads don't
    get what is assumed to be available.
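    To make this failure mode concrete, the following is a toy user-space
    model in C, not kernel code: two NUMA nodes, each with its own count of
    free huge pages standing in for the per-node hugetlb pool.  The names
    free_huge_pages_node and toy_alloc_huge_page() are hypothetical, and the
    model ignores everything except the pool counts.  It shows how a single
    fallback allocation made on behalf of node 0 can consume the page that a
    workload bound to node 1 was counting on.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not kernel code): two NUMA nodes, each with
 * its own pool of free huge pages. */
#define NR_NODES 2

static int free_huge_pages_node[NR_NODES];

/*
 * Allocate one huge page for a request that prefers node 'nid'.
 * If 'nid' has no free page and 'allow_fallback' is true, take a page
 * from another node instead; otherwise fail.
 * Returns the node the page came from, or -1 on failure.
 */
static int toy_alloc_huge_page(int nid, bool allow_fallback)
{
	assert(nid >= 0 && nid < NR_NODES);

	/* Preferred node first. */
	if (free_huge_pages_node[nid] > 0) {
		free_huge_pages_node[nid]--;
		return nid;
	}

	/* Node-bound request: never touch another node's pool. */
	if (!allow_fallback)
		return -1;

	/* Fallback: drain the first other node that still has a page. */
	for (int n = 0; n < NR_NODES; n++) {
		if (n != nid && free_huge_pages_node[n] > 0) {
			free_huge_pages_node[n]--;
			return n;
		}
	}
	return -1;
}
```

    With node 0 empty and one free page on node 1, toy_alloc_huge_page(0,
    true) returns 1, and a later node-bound toy_alloc_huge_page(1, false)
    returns -1.  With fallback disabled, the first call fails instead and
    node 1 keeps its page.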
    
    To make the hugetlb migration strategy clearer, we should list all the
    scenarios of hugetlb migration and analyze whether allocation fallback is
    permitted:
    
    1) Memory offline: will call dissolve_free_huge_pages() to dissolve the
       free hugetlb, and call do_migrate_range() to migrate the in-use
       hugetlb.  Both can break the per-node hugetlb pool, but as this is an
       explicit offlining operation, there is no better choice.  So hugetlb
       allocation fallback should be allowed.
    
    2) Memory failure: same as memory offline.  Fallback should be allowed,
       since falling back to a different node might be the only option to
       handle it; otherwise the impact of poisoned memory can be amplified.
    
    3) Longterm pinning: will call migrate_longterm_unpinnable_pages() to
       migrate in-use and not-longterm-pinnable hugetlb, which can break the
       per-node pool.  However, longterm pinning should fail if the hugetlb
       cannot be allocated on the current node, to avoid breaking the
       per-node pool.
    
    4) Syscalls (mbind, migrate_pages, move_pages): these are explicit
       user operations to move pages to other nodes, so fallback to other
       nodes should not be prohibited.
    
    5) alloc_contig_range: used by CMA allocation and virtio-mem
       fake-offline to allocate a given range of pages.  Currently, migration
       of freed hugetlb is not allowed to fall back; for consistency,
       migration of in-use hugetlb should not be allowed to fall back either.
    
    6) alloc_contig_pages: used by kfence, pgtable_debug etc.  The strategy
       should be consistent with that of alloc_contig_range().
    
    Based on the analysis of the various scenarios above, introduce a new
    helper to determine whether fallback is permitted according to the
    migration reason.
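    The decision table above can be sketched as a single switch over the
    migration reason.  This is a minimal user-space sketch, not the helper
    added by this commit: the enum, TOY_GFP_THISNODE and both function names
    are hypothetical, loosely modelled on the kernel's MR_* migration reasons
    and on the role __GFP_THISNODE plays in pinning an allocation to the
    requested node.  Folding scenarios 5) and 6) into one reason is an
    assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's migration reasons; only the
 * scenarios analyzed above are modelled. */
enum toy_migrate_reason {
	TOY_MR_MEMORY_HOTPLUG,	/* 1) memory offline */
	TOY_MR_MEMORY_FAILURE,	/* 2) memory failure */
	TOY_MR_LONGTERM_PIN,	/* 3) longterm pinning */
	TOY_MR_SYSCALL,		/* 4) migrate_pages, move_pages */
	TOY_MR_MEMPOLICY_MBIND,	/* 4) mbind */
	TOY_MR_CONTIG_RANGE,	/* 5) and 6), folded together (assumption) */
};

/* Hypothetical flag: restrict the allocation to the requested node. */
#define TOY_GFP_THISNODE 0x1u

/* Return true if a hugetlb allocation for this migration reason may fall
 * back to other NUMA nodes. */
static bool toy_allow_alloc_fallback(enum toy_migrate_reason reason)
{
	switch (reason) {
	case TOY_MR_MEMORY_HOTPLUG:
	case TOY_MR_MEMORY_FAILURE:
	case TOY_MR_SYSCALL:
	case TOY_MR_MEMPOLICY_MBIND:
		/* Explicit or unavoidable: breaking the pool is accepted. */
		return true;
	default:
		/* Longterm pinning, contig allocations, anything unlisted:
		 * keep the per-node pool intact. */
		return false;
	}
}

/* Build the allocation flags for a migration target: node-bound unless
 * fallback is permitted for this reason. */
static unsigned int toy_migration_gfp(unsigned int gfp,
				      enum toy_migrate_reason reason)
{
	if (!toy_allow_alloc_fallback(reason))
		gfp |= TOY_GFP_THISNODE;
	return gfp;
}
```

    Keeping the policy in one switch with a restrictive default means a
    newly added migration reason stays node-bound unless it is explicitly
    listed, which errs on the side of preserving the per-node pool.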
    
    [1] https://lore.kernel.org/all/6f26ce22d2fcd523418a085f2c588fe0776d46e7.1706794035.git.baolin.wang@linux.alibaba.com/
    Link: https://lkml.kernel.org/r/3519fcd41522817307a05b40fb551e2e17e68101.1709719720.git.baolin.wang@linux.alibaba.com
    Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>