    mm/Kconfig: CONFIG_PGTABLE_HAS_HUGE_LEAVES · ac3830c3
    Peter Xu authored
    Patch series "mm/gup: Unify hugetlb, part 2", v4.
    
    The series removes the hugetlb slow gup path, following an earlier
    refactoring [1], so that slow gup now uses exactly the same path to
    process all kinds of memory, including hugetlb.
    
    In the long term, we may want to remove most, if not all, call sites of
    huge_pte_offset().  Ideally that API would be dropped completely from
    the arch hugetlb API.  This series is one small step towards merging
    hugetlb-specific code into generic mm paths.  From that POV, this series
    removes one reference to huge_pte_offset() out of many others.
    
    One goal of such a route is that we can reconsider merging hugetlb
    features like High Granularity Mapping (HGM).  It was not accepted in
    the past because it may add a lot of hugetlb-specific code and make the
    mm code even harder to maintain.  With a merged code base, features like
    HGM can hopefully share some code with THP, legacy (PMD+) or modern
    (contiguous PTEs).
    
    To make it work, the generic slow gup code needs to at least understand
    hugepd, as fast-gup already does.  Because hugepd is a software-only
    solution (no hardware recognizes the hugepd format, so these are purely
    artificial structures), there is a chance we can merge some or all
    hugepd formats with cont_pte in the future.  That question is still
    unsettled, pending an acknowledgement from the Power side.  For this
    series, I kept the hugepd handling because we may still need it until
    there is a clearer picture of the future of hugepd.  The other reason is
    simply that we already did it for fast-gup and most of the code is still
    around to be reused.  It makes more sense to keep slow/fast gup behaving
    the same until a decision is made to remove hugepd.
    
    There's one major difference for slow-gup in cont_pte / cont_pmd
    handling, which is currently supported on three architectures (aarch64,
    riscv, ppc).  Before the series, slow gup was able to recognize e.g.
    cont_pte entries with the help of huge_pte_offset() when an hstate was
    around.  Now that help is gone, but it still works, by looking up
    pgtable entries one by one.
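
    To illustrate the trade-off described above, here is a minimal
    user-space sketch, not kernel code.  Every name in it (struct toy_pte,
    TOY_CONT_PTES, the toy_* helpers) is hypothetical, and the run length of
    16 entries is an assumption chosen only for the example.  It compares an
    entry-by-entry walk with a walk that recognizes a contiguous run from
    its first entry, counting table lookups in each case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of a page table; none of this is kernel API. */
#define TOY_CONT_PTES 16 /* assumed number of entries in one contiguous run */

struct toy_pte {
    uint64_t pfn;  /* page frame number mapped by this entry */
    int present;   /* non-zero if the entry is valid */
    int cont;      /* non-zero if the entry belongs to a contiguous run */
};

/* Fill one aligned run of TOY_CONT_PTES entries with consecutive pfns. */
static void toy_fill_cont_run(struct toy_pte *table, size_t first, uint64_t base_pfn)
{
    for (size_t i = 0; i < TOY_CONT_PTES; i++) {
        table[first + i].pfn = base_pfn + i;
        table[first + i].present = 1;
        table[first + i].cont = 1;
    }
}

/* Entry-by-entry walk: one table lookup per base page. */
static size_t toy_walk_one_by_one(const struct toy_pte *table, size_t start,
                                  size_t npages, uint64_t *pfns)
{
    size_t lookups = 0;

    for (size_t i = 0; i < npages; i++) {
        const struct toy_pte *pte = &table[start + i];

        lookups++;
        if (!pte->present)
            break;
        pfns[i] = pte->pfn;
    }
    return lookups;
}

/* Run-aware walk: one lookup covers the rest of an aligned contiguous run. */
static size_t toy_walk_cont_aware(const struct toy_pte *table, size_t start,
                                  size_t npages, uint64_t *pfns)
{
    size_t lookups = 0;
    size_t i = 0;

    while (i < npages) {
        size_t idx = start + i;
        const struct toy_pte *pte = &table[idx];
        size_t n = 1;

        lookups++;
        if (!pte->present)
            break;
        if (pte->cont) {
            /* Entries left in this aligned run, capped at the request size. */
            n = (idx / TOY_CONT_PTES + 1) * TOY_CONT_PTES - idx;
            if (n > npages - i)
                n = npages - i;
        }
        for (size_t k = 0; k < n; k++)
            pfns[i + k] = pte->pfn + k;
        i += n;
    }
    return lookups;
}
```

    Both walks return the same pfns; only the number of lookups differs,
    which is the cost the series accepts for now in exchange for a single
    generic path.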
    
    It's not ideal, but hopefully this change will not affect major
    workloads for now.  There's some more information in the commit message
    of the last patch.  If this turns out to be a concern, we can consider
    teaching slow gup to recognize cont pte/pmd entries, which should
    recover the lost performance.  But I doubt it is necessary for now, so I
    kept it as simple as possible.
    
    Patch layout
    ============
    
    Patch 1-8:    Preparation work, or cleanups in relevant code paths
    Patch 9-11:   Teach slow gup about all kinds of huge entries (pXd, hugepd)
    Patch 12:     Drop hugetlb_follow_page_mask()
    
    More information can be found in the commit messages of each patch.
    
    [1] https://lore.kernel.org/all/20230628215310.73782-1-peterx@redhat.com
    [2] https://lore.kernel.org/r/20240321215047.678172-1-peterx@redhat.com
    
    Introduce a config option that will be selected as long as huge leaves are
    involved in pgtable (thp or hugetlbfs).  It is useful for marking any
    code that can process either hugetlb or thp pages at any level higher
    than the pte level.
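
    As a rough illustration of how such an option might be used, here is a
    minimal user-space sketch.  It is an assumption-laden model, not the
    patch itself: the CONFIG_* macros are defined by hand (in a real build
    they come from Kconfig-generated headers), the derivation of
    CONFIG_PGTABLE_HAS_HUGE_LEAVES simply mirrors the "thp or hugetlbfs"
    wording above, and enum toy_level and toy_may_be_huge_leaf() are
    hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumption: hand-written stand-ins for Kconfig output. Here THP is
 * enabled and hugetlbfs is not.
 */
#define CONFIG_TRANSPARENT_HUGEPAGE 1
/* CONFIG_HUGETLB_PAGE intentionally left undefined */

/* Mirrors the description: set when either THP or hugetlbfs is enabled. */
#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLB_PAGE)
#define CONFIG_PGTABLE_HAS_HUGE_LEAVES 1
#endif

/* Hypothetical page table levels for the sketch. */
enum toy_level { TOY_PTE, TOY_PMD, TOY_PUD };

#ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
/* Huge leaves can only appear above the pte level. */
static bool toy_may_be_huge_leaf(enum toy_level level)
{
    return level > TOY_PTE;
}
#else
/* Without huge leaves, the huge handling compiles down to a constant. */
static bool toy_may_be_huge_leaf(enum toy_level level)
{
    (void)level;
    return false;
}
#endif
```

    The point of a single option is that code handling both hugetlb and thp
    leaves needs one guard rather than a repeated "either of two configs"
    check.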
    
    Link: https://lkml.kernel.org/r/20240327152332.950956-1-peterx@redhat.com
    Link: https://lkml.kernel.org/r/20240327152332.950956-2-peterx@redhat.com
    Signed-off-by: Peter Xu <peterx@redhat.com>
    Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
    Tested-by: Ryan Roberts <ryan.roberts@arm.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Andrew Jones <andrew.jones@linux.dev>
    Cc: Aneesh Kumar K.V (IBM) <aneesh.kumar@kernel.org>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: James Houghton <jthoughton@google.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Kirill A. Shutemov <kirill@shutemov.name>
    Cc: Lorenzo Stoakes <lstoakes@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Yang Shi <shy828301@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>