Commit cd62734c authored by Alistair Popple, committed by Linus Torvalds

mm/rmap: split try_to_munlock from try_to_unmap

The behaviour of try_to_unmap_one() is difficult to follow because it
performs different operations based on a fairly large set of flags used in
different combinations.

TTU_MUNLOCK is one such flag.  However, it is exclusively used by
try_to_munlock(), which specifies no other flags.  Therefore, rather than
overload try_to_unmap_one() with unrelated behaviour, split this out into
its own function and remove the flag.

Link: https://lkml.kernel.org/r/20210616105937.23201-4-apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent 4dd845b5
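
For orientation before the diff: the new page_mlock() (added in mm/rmap.c below) hands a
dedicated page_mlock_one() callback to rmap_walk() instead of overloading
try_to_unmap_one() with TTU_MUNLOCK. The sketch below summarises the rmap_walk()
contract this relies on. It is a simplified illustration, not the kernel implementation:
locking and the anon/file/KSM walk variants are omitted, and
for_each_vma_mapping_the_page() is a hypothetical placeholder for the reverse-map
iteration that rmap_walk_anon()/rmap_walk_file() actually perform.

/*
 * Simplified sketch of the rmap_walk() contract used by page_mlock().
 * Not the kernel implementation; for_each_vma_mapping_the_page() is a
 * hypothetical placeholder for the reverse-map iteration.
 */
static void rmap_walk_sketch(struct page *page, struct rmap_walk_control *rwc)
{
	struct vm_area_struct *vma;
	unsigned long address;

	for_each_vma_mapping_the_page(page, vma, address) {
		/* rmap_one returning false terminates the walk early */
		if (!rwc->rmap_one(page, vma, address, rwc->arg))
			break;
		/* done() lets the walk stop once the page is no longer mapped */
		if (rwc->done && rwc->done(page))
			break;
	}
}

Returning true from the rmap_one callback continues the scan; returning false stops it,
which is how page_mlock_one() below terminates the walk as soon as one VM_LOCKED VMA has
re-mlocked the page.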
@@ -389,14 +389,14 @@ mlocked, munlock_vma_page() updates that zone statistics for the number of
 mlocked pages. Note, however, that at this point we haven't checked whether
 the page is mapped by other VM_LOCKED VMAs.

-We can't call try_to_munlock(), the function that walks the reverse map to
+We can't call page_mlock(), the function that walks the reverse map to
 check for other VM_LOCKED VMAs, without first isolating the page from the LRU.
-try_to_munlock() is a variant of try_to_unmap() and thus requires that the page
+page_mlock() is a variant of try_to_unmap() and thus requires that the page
 not be on an LRU list [more on these below]. However, the call to
-isolate_lru_page() could fail, in which case we couldn't try_to_munlock(). So,
+isolate_lru_page() could fail, in which case we can't call page_mlock(). So,
 we go ahead and clear PG_mlocked up front, as this might be the only chance we
-have. If we can successfully isolate the page, we go ahead and
-try_to_munlock(), which will restore the PG_mlocked flag and update the zone
+have. If we can successfully isolate the page, we go ahead and call
+page_mlock(), which will restore the PG_mlocked flag and update the zone
 page statistics if it finds another VMA holding the page mlocked. If we fail
 to isolate the page, we'll have left a potentially mlocked page on the LRU.
 This is fine, because we'll catch it later if and if vmscan tries to reclaim
@@ -545,31 +545,24 @@ munlock or munmap system calls, mm teardown (munlock_vma_pages_all), reclaim,
 holepunching, and truncation of file pages and their anonymous COWed pages.

-try_to_munlock() Reverse Map Scan
+page_mlock() Reverse Map Scan
 ---------------------------------

-.. warning::
-   [!] TODO/FIXME: a better name might be page_mlocked() - analogous to the
-   page_referenced() reverse map walker.
-
 When munlock_vma_page() [see section :ref:`munlock()/munlockall() System Call
 Handling <munlock_munlockall_handling>` above] tries to munlock a
 page, it needs to determine whether or not the page is mapped by any
 VM_LOCKED VMA without actually attempting to unmap all PTEs from the
 page. For this purpose, the unevictable/mlock infrastructure
-introduced a variant of try_to_unmap() called try_to_munlock().
+introduced a variant of try_to_unmap() called page_mlock().

-try_to_munlock() calls the same functions as try_to_unmap() for anonymous and
-mapped file and KSM pages with a flag argument specifying unlock versus unmap
-processing. Again, these functions walk the respective reverse maps looking
-for VM_LOCKED VMAs. When such a VMA is found, as in the try_to_unmap() case,
-the functions mlock the page via mlock_vma_page() and return SWAP_MLOCK. This
-undoes the pre-clearing of the page's PG_mlocked done by munlock_vma_page.
+page_mlock() walks the respective reverse maps looking for VM_LOCKED VMAs. When
+such a VMA is found the page is mlocked via mlock_vma_page(). This undoes the
+pre-clearing of the page's PG_mlocked done by munlock_vma_page.

-Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's
+Note that page_mlock()'s reverse map walk must visit every VMA in a page's
 reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA.
 However, the scan can terminate when it encounters a VM_LOCKED VMA.

-Although try_to_munlock() might be called a great many times when munlocking a
+Although page_mlock() might be called a great many times when munlocking a
 large region or tearing down a large address space that has been mlocked via
 mlockall(), overall this is a fairly rare event.
@@ -602,7 +595,7 @@ inactive lists to the appropriate node's unevictable list.
 shrink_inactive_list() should only see SHM_LOCK'd pages that became SHM_LOCK'd
 after shrink_active_list() had moved them to the inactive list, or pages mapped
 into VM_LOCKED VMAs that munlock_vma_page() couldn't isolate from the LRU to
-recheck via try_to_munlock(). shrink_inactive_list() won't notice the latter,
+recheck via page_mlock(). shrink_inactive_list() won't notice the latter,
 but will pass on to shrink_page_list().

 shrink_page_list() again culls obviously unevictable pages that it could
...
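
The flow described in the documentation above (clear PG_mlocked up front, try to isolate
the page from the LRU, and only then walk the reverse map with page_mlock()) can be
sketched roughly as follows. This is a simplified, hypothetical illustration of what
munlock_vma_page() does, not the mm/mlock.c code itself: statistics accounting, THP
handling and the pagevec batching are omitted, and munlock_flow_sketch() is an invented
name.

/*
 * Simplified sketch of the munlock flow described above. Hypothetical
 * helper, not the mm/mlock.c implementation: statistics, THP handling
 * and pagevec batching are omitted.
 */
static void munlock_flow_sketch(struct page *page)
{
	/* Clear PG_mlocked up front; this may be our only chance. */
	if (!TestClearPageMlocked(page))
		return;

	if (isolate_lru_page(page) == 0) {
		/*
		 * Isolation succeeded: page_mlock() walks the reverse map
		 * and sets PG_mlocked again if another VM_LOCKED VMA still
		 * maps the page.
		 */
		page_mlock(page);
		putback_lru_page(page);
	}
	/*
	 * If isolation failed, a potentially mlocked page is left on the
	 * LRU; vmscan will detect and re-mlock it later when it tries to
	 * reclaim the page.
	 */
}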
@@ -87,7 +87,6 @@ struct anon_vma_chain {
 enum ttu_flags {
 	TTU_MIGRATION		= 0x1,	/* migration mode */
-	TTU_MUNLOCK		= 0x2,	/* munlock mode */
 	TTU_SPLIT_HUGE_PMD	= 0x4,	/* split huge PMD if any */
 	TTU_IGNORE_MLOCK	= 0x8,	/* ignore mlock */
@@ -240,7 +239,7 @@ int page_mkclean(struct page *);
  * called in munlock()/munmap() path to check for other vmas holding
  * the page mlocked.
  */
-void try_to_munlock(struct page *);
+void page_mlock(struct page *page);

 void remove_migration_ptes(struct page *old, struct page *new, bool locked);
...
@@ -108,7 +108,7 @@ void mlock_vma_page(struct page *page)
 /*
  * Finish munlock after successful page isolation
  *
- * Page must be locked. This is a wrapper for try_to_munlock()
+ * Page must be locked. This is a wrapper for page_mlock()
  * and putback_lru_page() with munlock accounting.
  */
 static void __munlock_isolated_page(struct page *page)
@@ -118,7 +118,7 @@ static void __munlock_isolated_page(struct page *page)
 	 * and we don't need to check all the other vmas.
 	 */
 	if (page_mapcount(page) > 1)
-		try_to_munlock(page);
+		page_mlock(page);

 	/* Did try_to_unlock() succeed or punt? */
 	if (!PageMlocked(page))
@@ -158,7 +158,7 @@ static void __munlock_isolation_failed(struct page *page)
  * munlock()ed or munmap()ed, we want to check whether other vmas hold the
  * page locked so that we can leave it on the unevictable lru list and not
  * bother vmscan with it. However, to walk the page's rmap list in
- * try_to_munlock() we must isolate the page from the LRU. If some other
+ * page_mlock() we must isolate the page from the LRU. If some other
  * task has removed the page from the LRU, we won't be able to do that.
  * So we clear the PageMlocked as we might not get another chance. If we
  * can't isolate the page, we leave it for putback_lru_page() and vmscan
@@ -168,7 +168,7 @@ unsigned int munlock_vma_page(struct page *page)
 {
 	int nr_pages;

-	/* For try_to_munlock() and to serialize with page migration */
+	/* For page_mlock() and to serialize with page migration */
 	BUG_ON(!PageLocked(page));

 	VM_BUG_ON_PAGE(PageTail(page), page);
@@ -205,7 +205,7 @@ static int __mlock_posix_error_return(long retval)
  *
  * The fast path is available only for evictable pages with single mapping.
  * Then we can bypass the per-cpu pvec and get better performance.
- * when mapcount > 1 we need try_to_munlock() which can fail.
+ * when mapcount > 1 we need page_mlock() which can fail.
  * when !page_evictable(), we need the full redo logic of putback_lru_page to
  * avoid leaving evictable page in unevictable list.
  *
@@ -414,7 +414,7 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
  *
  * We don't save and restore VM_LOCKED here because pages are
  * still on lru. In unmap path, pages might be scanned by reclaim
- * and re-mlocked by try_to_{munlock|unmap} before we unmap and
+ * and re-mlocked by page_mlock/try_to_unmap before we unmap and
  * free them. This will result in freeing mlocked pages.
  */
 void munlock_vma_pages_range(struct vm_area_struct *vma,
...
@@ -1411,10 +1411,6 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	if (flags & TTU_SYNC)
 		pvmw.flags = PVMW_SYNC;

-	/* munlock has nothing to gain from examining un-locked vmas */
-	if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
-		return true;
-
 	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
 	    is_zone_device_page(page) && !is_device_private_page(page))
 		return true;
@@ -1476,8 +1472,6 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 			page_vma_mapped_walk_done(&pvmw);
 			break;
 		}
-		if (flags & TTU_MUNLOCK)
-			continue;
 	}

 	/* Unexpected PMD-mapped THP? */
@@ -1790,20 +1784,58 @@ void try_to_unmap(struct page *page, enum ttu_flags flags)
 	rmap_walk(page, &rwc);
 }

+/*
+ * Walks the vma's mapping a page and mlocks the page if any locked vma's are
+ * found. Once one is found the page is locked and the scan can be terminated.
+ */
+static bool page_mlock_one(struct page *page, struct vm_area_struct *vma,
+				 unsigned long address, void *unused)
+{
+	struct page_vma_mapped_walk pvmw = {
+		.page = page,
+		.vma = vma,
+		.address = address,
+	};
+
+	/* An un-locked vma doesn't have any pages to lock, continue the scan */
+	if (!(vma->vm_flags & VM_LOCKED))
+		return true;
+
+	while (page_vma_mapped_walk(&pvmw)) {
+		/*
+		 * Need to recheck under the ptl to serialise with
+		 * __munlock_pagevec_fill() after VM_LOCKED is cleared in
+		 * munlock_vma_pages_range().
+		 */
+		if (vma->vm_flags & VM_LOCKED) {
+			/* PTE-mapped THP are never mlocked */
+			if (!PageTransCompound(page))
+				mlock_vma_page(page);
+			page_vma_mapped_walk_done(&pvmw);
+		}
+
+		/*
+		 * no need to continue scanning other vma's if the page has
+		 * been locked.
+		 */
+		return false;
+	}
+
+	return true;
+}
+
 /**
- * try_to_munlock - try to munlock a page
- * @page: the page to be munlocked
+ * page_mlock - try to mlock a page
+ * @page: the page to be mlocked
  *
- * Called from munlock code. Checks all of the VMAs mapping the page
- * to make sure nobody else has this page mlocked. The page will be
- * returned with PG_mlocked cleared if no other vmas have it mlocked.
+ * Called from munlock code. Checks all of the VMAs mapping the page and mlocks
+ * the page if any are found. The page will be returned with PG_mlocked cleared
+ * if it is not mapped by any locked vmas.
  */
-void try_to_munlock(struct page *page)
+void page_mlock(struct page *page)
 {
 	struct rmap_walk_control rwc = {
-		.rmap_one = try_to_unmap_one,
-		.arg = (void *)TTU_MUNLOCK,
+		.rmap_one = page_mlock_one,
 		.done = page_not_mapped,
 		.anon_lock = page_lock_anon_vma_read,
@@ -1855,7 +1887,7 @@ static struct anon_vma *rmap_walk_anon_lock(struct page *page,
  * Find all the mappings of a page using the mapping pointer and the vma chains
  * contained in the anon_vma struct it points to.
  *
- * When called from try_to_munlock(), the mmap_lock of the mm containing the vma
+ * When called from page_mlock(), the mmap_lock of the mm containing the vma
  * where the page was found will be held for write. So, we won't recheck
  * vm_flags for that VMA. That should be OK, because that vma shouldn't be
  * LOCKED.
@@ -1908,7 +1940,7 @@ static void rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc,
  * Find all the mappings of a page using the mapping pointer and the vma chains
  * contained in the address_space struct it points to.
  *
- * When called from try_to_munlock(), the mmap_lock of the mm containing the vma
+ * When called from page_mlock(), the mmap_lock of the mm containing the vma
  * where the page was found will be held for write. So, we won't recheck
  * vm_flags for that VMA. That should be OK, because that vma shouldn't be
  * LOCKED.
...