    hugetlb: freeze allocated pages before creating hugetlb pages · 2b21624f
    Mike Kravetz authored
    When creating hugetlb pages, the hugetlb code must first allocate
    contiguous pages from a low level allocator such as buddy, cma or
    memblock.  The pages returned from these low level allocators are ref
    counted.  This creates potential issues with other code taking speculative
    references on these pages before they can be transformed to a hugetlb
    page.  This issue has been addressed with methods and code such as that
    provided in [1].
    
    Recent discussions about vmemmap freeing [2] have indicated that it would
    be beneficial to freeze all sub pages, including the head page, of pages
    returned from low level allocators before converting to a hugetlb page.
    This helps avoid races if we want to replace the page containing vmemmap
    for the head page.
    
    There have been proposals to change at least the buddy allocator to return
    frozen pages as described at [3].  If such a change is made, it can be
    employed by the hugetlb code.  However, as mentioned above, hugetlb uses
    several low level allocators, so each would need to be modified to return
    frozen pages.  For now, we can manually freeze the returned pages.  This
    is done in two places:
    
    1) alloc_buddy_huge_page, only the returned head page is ref counted.
       We freeze the head page, retrying once in the VERY rare case where
       there may be an inflated ref count.
    2) prep_compound_gigantic_page, for gigantic pages the current code
       freezes all pages except the head page.  New code will simply freeze
       the head page as well.
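
    The freeze-and-retry-once idea in 1) can be sketched in userspace with
    C11 atomics.  This is a model under stated assumptions, not the hugetlb
    code: struct fake_page, struct fake_pool, pool_alloc(), ref_freeze() and
    alloc_frozen() are hypothetical names, and the pool stands in for a low
    level allocator such as buddy.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the ref count is modeled. */
struct fake_page {
    atomic_int refcount;
};

/* Hypothetical low level allocator: hands out pages from an array in order. */
struct fake_pool {
    struct fake_page *pages;
    size_t count;
    size_t next;
};

static struct fake_page *pool_alloc(struct fake_pool *pool)
{
    assert(pool != NULL);
    if (pool->next >= pool->count)
        return NULL;
    return &pool->pages[pool->next++];
}

/*
 * Freeze: atomically set the count to 0, but only if it is exactly
 * 'expected'.  Fails when someone else holds a speculative reference.
 */
static bool ref_freeze(struct fake_page *page, int expected)
{
    int old = expected;
    return atomic_compare_exchange_strong(&page->refcount, &old, 0);
}

/* Allocate a page and freeze it, retrying once on an inflated count. */
static struct fake_page *alloc_frozen(struct fake_pool *pool)
{
    for (int attempt = 0; attempt < 2; attempt++) {
        struct fake_page *page = pool_alloc(pool);

        if (!page)
            return NULL;
        if (ref_freeze(page, 1))
            return page;    /* count is now 0: frozen */
        /* Inflated count: drop our own reference and try another page. */
        atomic_fetch_sub(&page->refcount, 1);
    }
    return NULL;
}
```

    A page whose count is 1 at allocation time comes back with a count of 0.
    A page with a count of 2 is given back and the next page is tried once.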
    
    In a few other places, code checks for inflated ref counts on newly
    allocated hugetlb pages.  With the modifications to freeze after
    allocating, this code can be removed.
    
    After hugetlb pages are freshly allocated, they are often added to the
    hugetlb free lists.  Since these pages were previously ref counted, this
    was done via put_page() which would end up calling the hugetlb destructor:
    free_huge_page.  With changes to freeze pages, we simply call
    free_huge_page directly to add the pages to the free list.
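
    The difference between the two ways of reaching the free list can be
    modeled as follows.  This is a userspace sketch only: struct fake_page,
    struct free_list, fake_free_huge_page() and fake_put_page() are
    hypothetical stand-ins, and the list handling is simplified.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a hugetlb page. */
struct fake_page {
    atomic_int refcount;
    struct fake_page *next_free;
};

/* Hypothetical free list: a singly linked stack plus a counter. */
struct free_list {
    struct fake_page *head;
    size_t nr_free;
};

/* Model of the destructor: puts a page with no references on the list. */
static void fake_free_huge_page(struct free_list *list, struct fake_page *page)
{
    assert(atomic_load(&page->refcount) == 0);
    page->next_free = list->head;
    list->head = page;
    list->nr_free++;
}

/*
 * Old flow: the fresh page holds one reference, so dropping it brings the
 * count to zero and the destructor runs as a side effect.
 */
static void fake_put_page(struct free_list *list, struct fake_page *page)
{
    if (atomic_fetch_sub(&page->refcount, 1) == 1)
        fake_free_huge_page(list, page);
}
```

    With a frozen page the count is already 0, so the new flow in this model
    calls fake_free_huge_page() directly and skips the reference drop.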
    
    In a few other places, freshly allocated hugetlb pages were immediately
    put into use, and the expectation was they were already ref counted.  In
    these cases, we must manually ref count the page.
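
    Taking the first reference on a frozen page before handing it to a user
    can be sketched as below.  Again this is a userspace model, and
    take_first_ref() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the ref count is modeled. */
struct fake_page {
    atomic_int refcount;
};

/*
 * Give a frozen page (count 0) its first reference.  Returns false and
 * leaves the count alone if the page was not frozen.
 */
static bool take_first_ref(struct fake_page *page)
{
    int expected = 0;

    assert(page != NULL);
    return atomic_compare_exchange_strong(&page->refcount, &expected, 1);
}
```

    The compare-and-swap makes the assumption explicit: the call only
    succeeds when the page really was frozen.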
    
    [1] https://lore.kernel.org/linux-mm/20210622021423.154662-3-mike.kravetz@oracle.com/
    [2] https://lore.kernel.org/linux-mm/20220802180309.19340-1-joao.m.martins@oracle.com/
    [3] https://lore.kernel.org/linux-mm/20220809171854.3725722-1-willy@infradead.org/
    
    [mike.kravetz@oracle.com: fix NULL pointer dereference]
      Link: https://lkml.kernel.org/r/20220921202702.106069-1-mike.kravetz@oracle.com
    Link: https://lkml.kernel.org/r/20220916214638.155744-1-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Oscar Salvador <osalvador@suse.de>
    Reviewed-by: Muchun Song <songmuchun@bytedance.com>
    Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Joao Martins <joao.m.martins@oracle.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Peter Xu <peterx@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
hugetlb.c 205 KB