    mm/shmem: add flag to enforce shmem THP in hugepage_vma_check() · 7c6c6cc4
    Zach O'Keefe authored
    Patch series "mm: add file/shmem support to MADV_COLLAPSE", v4.
    
    This series builds on top of the previous "mm: userspace hugepage
    collapse" series which introduced the MADV_COLLAPSE madvise mode and added
    support for private, anonymous mappings[2], by adding support for file and
    shmem backed memory to CONFIG_READ_ONLY_THP_FOR_FS=y kernels.
    
    File and shmem support have been added with effort to align with existing
    MADV_COLLAPSE semantics and policy decisions[3].  Collapse of shmem-backed
    memory ignores kernel-guiding directives and heuristics including all
    sysfs settings (transparent_hugepage/shmem_enabled), and tmpfs huge= mount
    options (shmem always supports large folios).  Like anonymous mappings, on
    successful return of MADV_COLLAPSE on file/shmem memory, the contents of
    memory mapped by the addresses provided will be synchronously pmd-mapped
    THPs.
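    
    As an illustration only, a minimal userspace sketch of collapsing one
    hugepage of shmem (memfd) memory might look like the following.  It
    assumes a 2 MiB PMD-sized hugepage and a kernel that carries this
    series; the helper name demo_shmem_collapse() and the memfd name are
    hypothetical, and the fallback value for MADV_COLLAPSE is taken from
    the uapi headers for libcs that do not define it yet.  The mapping is
    placed at a hugepage-aligned address so that the virtual address and
    file offset line up.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* value from the Linux uapi headers */
#endif

#define HPAGE_SIZE (2UL << 20)	/* assumption: 2 MiB PMD-sized THP */

/*
 * Hypothetical helper: map one hugepage worth of memfd-backed (shmem)
 * memory at a hugepage-aligned address, populate it, and request a
 * synchronous collapse.  Returns 0 on success or a negative errno
 * (for example -EINVAL on kernels without file/shmem MADV_COLLAPSE).
 */
int demo_shmem_collapse(void)
{
	int rc = 0;
	char *reserve, *aligned, *p;
	int fd = memfd_create("collapse-demo", 0);

	if (fd < 0)
		return -errno;
	if (ftruncate(fd, HPAGE_SIZE) < 0) {
		rc = -errno;
		goto out_fd;
	}

	/* Reserve twice the size so an aligned window is guaranteed to fit. */
	reserve = mmap(NULL, 2 * HPAGE_SIZE, PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (reserve == MAP_FAILED) {
		rc = -errno;
		goto out_fd;
	}
	aligned = (char *)(((uintptr_t)reserve + HPAGE_SIZE - 1) &
			   ~(HPAGE_SIZE - 1));

	/* Place the shared file mapping over the aligned window, offset 0. */
	p = mmap(aligned, HPAGE_SIZE, PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_FIXED, fd, 0);
	if (p == MAP_FAILED) {
		rc = -errno;
		goto out_map;
	}
	assert(((uintptr_t)p % HPAGE_SIZE) == 0);

	/* Fault in every native page so there is something to collapse. */
	memset(p, 0xab, HPAGE_SIZE);

	if (madvise(p, HPAGE_SIZE, MADV_COLLAPSE) < 0)
		rc = -errno;

out_map:
	munmap(reserve, 2 * HPAGE_SIZE);
out_fd:
	close(fd);
	return rc;
}
```

    A return value of 0 would indicate the range was collapsed; any
    negative value is the errno of the step that failed, so callers on
    older kernels can fall back gracefully.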
    
    This functionality unlocks two important uses:
    
    (1)	Immediately back executable text by THPs.  Current support provided
    	by CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large
    	system which might impair services from serving at their full rated
    	load after (re)starting.  Tricks like mremap(2)'ing text onto
    	anonymous memory to immediately realize iTLB performance prevent
    	page sharing and demand paging, both of which increase steady state
    	memory footprint.  Now, we can have the best of both worlds: Peak
    	upfront performance and lower RAM footprints.
    
    (2)	userfaultfd-based live migration of virtual machines satisfies UFFD
    	faults by fetching native-sized pages over the network (to avoid
    	latency of transferring an entire hugepage).  However, after guest
    	memory has been fully copied to the new host, MADV_COLLAPSE can
    	be used to immediately increase guest performance.
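    
    In both uses, only naturally aligned, hugepage-sized windows of a
    mapping can end up pmd-mapped, so a caller may want to know which part
    of a text segment or guest memory region is a candidate before calling
    madvise().  A small sketch of that arithmetic follows; the struct and
    function names are hypothetical, and the hugepage size is a parameter
    because it depends on the architecture (2 MiB is assumed in the usage
    note).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: half-open range [start, end). */
struct hpage_range {
	uintptr_t start;
	uintptr_t end;
};

/*
 * Shrink [start, end) inward to hugepage boundaries.  Returns false if
 * no complete hugepage fits in the range, in which case *out is left
 * untouched.  hpage_size must be a power of two.  Addresses close to
 * UINTPTR_MAX are not handled (the round-up could wrap).
 */
bool hpage_inner_range(uintptr_t start, uintptr_t end,
		       uintptr_t hpage_size, struct hpage_range *out)
{
	uintptr_t mask, hstart, hend;

	assert(hpage_size != 0 && (hpage_size & (hpage_size - 1)) == 0);

	mask = ~(hpage_size - 1);
	hstart = (start + hpage_size - 1) & mask;	/* round start up */
	hend = end & mask;				/* round end down */

	if (hstart >= hend)
		return false;

	out->start = hstart;
	out->end = hend;
	return true;
}
```

    For example, with a 2 MiB hugepage size the range
    [0x201000, 0x801000) shrinks to [0x400000, 0x800000), while
    [0x201000, 0x3ff000) contains no complete hugepage at all.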
    
    khugepaged has received a small improvement by association and can now
    detect and collapse pte-mapped THPs.  However, there is still work to be
    done along the file collapse path.  Compound pages of arbitrary order
    still need to be supported and THP collapse needs to be converted to
    using folios in general.  Eventually, we'd like to move away from the
    read-only and executable-mapped constraints currently imposed on eligible
    files and support any inode claiming huge folio support.  That said, I
    think the series as-is covers enough to claim that MADV_COLLAPSE supports
    file/shmem memory.
    
    Patches 1-3	Implement the guts of the series.
    Patch 4 	Is a tracepoint for debugging.
    Patches 5-9 	Refactor existing khugepaged selftests to work with new
    		memory types + new collapse tests.
    Patch 10 	Adds a userfaultfd selftest mode to mimic a functional test
    		of UFFDIO_REGISTER_MODE_MINOR+MADV_COLLAPSE live migration.
    		(v4 note: "userfaultfd shmem" selftest is failing as of
    		Sep 22 mm-unstable)
    
    [1] https://lore.kernel.org/linux-mm/YyiK8YvVcrtZo0z3@google.com/
    [2] https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
    [3] https://lore.kernel.org/linux-mm/YtBmhaiPHUTkJml8@google.com/
    [4] https://lore.kernel.org/linux-mm/20220922222731.1124481-1-zokeefe@google.com/
    [5] https://lore.kernel.org/linux-mm/20220922184651.1016461-1-zokeefe@google.com/
    
    
    This patch (of 10):
    
    Extend 'mm/thp: add flag to enforce sysfs THP in hugepage_vma_check()' to
    shmem, allowing callers to ignore
    /sys/kernel/mm/transparent_hugepage/shmem_enabled and the tmpfs huge=
    mount option.
    
    This is intended to be used by MADV_COLLAPSE, and the rationale is
    analogous to the anon/file case: MADV_COLLAPSE is not coupled to
    directives that advise the kernel's decisions on when THPs should be
    considered eligible.  shmem/tmpfs always claims large folio support,
    regardless of sysfs or mount options.
    
    [shy828301@gmail.com: test shmem_huge_force explicitly]
      Link: https://lore.kernel.org/linux-mm/CAHbLzko3A5-TpS0BgBeKkx5cuOkWgLvWXQH=TdgW-baO4rPtdg@mail.gmail.com/
    Link: https://lkml.kernel.org/r/20220922224046.1143204-1-zokeefe@google.com
    Link: https://lkml.kernel.org/r/20220907144521.3115321-2-zokeefe@google.com
    Link: https://lkml.kernel.org/r/20220922224046.1143204-2-zokeefe@google.com
    Signed-off-by: Zach O'Keefe <zokeefe@google.com>
    Reviewed-by: Yang Shi <shy828301@gmail.com>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: Chris Kennelly <ckennelly@google.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: James Houghton <jthoughton@google.com>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Song Liu <songliubraving@fb.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>