Commit 2686f71f authored by Yang Shi's avatar Yang Shi Committed by Greg Kroah-Hartman

mm: thp: handle page cache THP correctly in PageTransCompoundMap

commit 169226f7 upstream.

We have a usecase to use tmpfs as QEMU memory backend and we would like
to take the advantage of THP as well.  But, our test shows the EPT is
not PMD mapped even though the underlying THP are PMD mapped on host.
The number showed by /sys/kernel/debug/kvm/largepage is much less than
the number of PMD mapped shmem pages as the below:

  7f2778200000-7f2878200000 rw-s 00000000 00:14 262232 /dev/shm/qemu_back_mem.mem.Hz2hSf (deleted)
  Size:            4194304 kB
  [snip]
  AnonHugePages:         0 kB
  ShmemPmdMapped:   579584 kB
  [snip]
  Locked:                0 kB

  cat /sys/kernel/debug/kvm/largepages
  12

And some benchmarks do worse than with anonymous THPs.

By digging into the code we figured out that commit 127393fb ("mm:
thp: kvm: fix memory corruption in KVM with THP enabled") checks if
there is a single PTE mapping on the page for anonymous THP when setting
up EPT map.  But the _mapcount < 0 check doesn't work for page cache THP
since every subpage of page cache THP would get _mapcount inc'ed once it
is PMD mapped, so PageTransCompoundMap() always returns false for page
cache THP.  This would prevent KVM from setting up PMD mapped EPT entry.

So we need handle page cache THP correctly.  However, when page cache
THP's PMD gets split, kernel just remove the map instead of setting up
PTE map like what anonymous THP does.  Before KVM calls get_user_pages()
the subpages may get PTE mapped even though it is still a THP since the
page cache THP may be mapped by other processes at the mean time.

Checking its _mapcount and whether the THP has PTE mapped or not.
Although this may report some false negative cases (PTE mapped by other
processes), it looks not trivial to make this accurate.

With this fix /sys/kernel/debug/kvm/largepage would show reasonable
pages are PMD mapped by EPT as the below:

  7fbeaee00000-7fbfaee00000 rw-s 00000000 00:14 275464 /dev/shm/qemu_back_mem.mem.SKUvat (deleted)
  Size:            4194304 kB
  [snip]
  AnonHugePages:         0 kB
  ShmemPmdMapped:   557056 kB
  [snip]
  Locked:                0 kB

  cat /sys/kernel/debug/kvm/largepages
  271

And the benchmarks are as same as anonymous THPs.

[yang.shi@linux.alibaba.com: v4]
  Link: http://lkml.kernel.org/r/1571865575-42913-1-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1571769577-89735-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: dd78fedd ("rmap: support file thp")
Signed-off-by: default avatarYang Shi <yang.shi@linux.alibaba.com>
Reported-by: default avatarGang Deng <gavin.dg@linux.alibaba.com>
Tested-by: default avatarGang Deng <gavin.dg@linux.alibaba.com>
Suggested-by: default avatarHugh Dickins <hughd@google.com>
Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
parent 7dfa51be
...@@ -602,11 +602,6 @@ static inline void *kvcalloc(size_t n, size_t size, gfp_t flags) ...@@ -602,11 +602,6 @@ static inline void *kvcalloc(size_t n, size_t size, gfp_t flags)
extern void kvfree(const void *addr); extern void kvfree(const void *addr);
static inline atomic_t *compound_mapcount_ptr(struct page *page)
{
return &page[1].compound_mapcount;
}
static inline int compound_mapcount(struct page *page) static inline int compound_mapcount(struct page *page)
{ {
VM_BUG_ON_PAGE(!PageCompound(page), page); VM_BUG_ON_PAGE(!PageCompound(page), page);
......
...@@ -226,6 +226,11 @@ struct page_frag_cache { ...@@ -226,6 +226,11 @@ struct page_frag_cache {
typedef unsigned long vm_flags_t; typedef unsigned long vm_flags_t;
static inline atomic_t *compound_mapcount_ptr(struct page *page)
{
return &page[1].compound_mapcount;
}
/* /*
* A region containing a mapping of a non-memory backed file under NOMMU * A region containing a mapping of a non-memory backed file under NOMMU
* conditions. These are held in a global tree and are pinned by the VMAs that * conditions. These are held in a global tree and are pinned by the VMAs that
......
...@@ -577,12 +577,28 @@ static inline int PageTransCompound(struct page *page) ...@@ -577,12 +577,28 @@ static inline int PageTransCompound(struct page *page)
* *
* Unlike PageTransCompound, this is safe to be called only while * Unlike PageTransCompound, this is safe to be called only while
* split_huge_pmd() cannot run from under us, like if protected by the * split_huge_pmd() cannot run from under us, like if protected by the
* MMU notifier, otherwise it may result in page->_mapcount < 0 false * MMU notifier, otherwise it may result in page->_mapcount check false
* positives. * positives.
*
* We have to treat page cache THP differently since every subpage of it
* would get _mapcount inc'ed once it is PMD mapped. But, it may be PTE
* mapped in the current process so comparing subpage's _mapcount to
* compound_mapcount to filter out PTE mapped case.
*/ */
static inline int PageTransCompoundMap(struct page *page) static inline int PageTransCompoundMap(struct page *page)
{ {
return PageTransCompound(page) && atomic_read(&page->_mapcount) < 0; struct page *head;
if (!PageTransCompound(page))
return 0;
if (PageAnon(page))
return atomic_read(&page->_mapcount) < 0;
head = compound_head(page);
/* File THP is PMD mapped and not PTE mapped */
return atomic_read(&page->_mapcount) ==
atomic_read(compound_mapcount_ptr(head));
} }
/* /*
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment