Commit 9dc8af80, authored by Andrew Morton, committed by Linus Torvalds

[PATCH] rmap pte_chain speedup and space saving

The pte_chains presently consist of a pte pointer and a `next' link.
So half of each structure is spent on the link (50% memory wastage),
and walks of the singly-linked per-page list can take a lot of cache
misses.

This patch increases the pte_chain structure to occupy a full
cacheline.  There are 7, 15 or 31 pte pointers per structure rather
than just one.  So the wastage falls to a few percent and the number of
misses during the walk is reduced.
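
The arithmetic behind those figures can be sketched in a few lines of
userspace C.  This is only an illustration, not the kernel's code: the
helper names nrpte() and link_overhead_pct() are made up here, and the
pairing of 7, 15 and 31 slots with 4-byte pointers on 32-, 64- and
128-byte cachelines is an assumption that happens to match the numbers
above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: how many pte pointers fit in one cacheline
 * once the `next' link has taken its share.
 */
static size_t nrpte(size_t cacheline_bytes, size_t ptr_bytes)
{
	assert(ptr_bytes > 0 && cacheline_bytes > ptr_bytes);
	return (cacheline_bytes - ptr_bytes) / ptr_bytes;
}

/*
 * Hypothetical helper: percentage of the cacheline that does not
 * hold pte pointers (the link plus any leftover bytes).
 */
static double link_overhead_pct(size_t cacheline_bytes, size_t ptr_bytes)
{
	size_t used = nrpte(cacheline_bytes, ptr_bytes) * ptr_bytes;

	return 100.0 * (double)(cacheline_bytes - used) / (double)cacheline_bytes;
}

/* Illustrative layout for the assumed 32-byte cacheline case. */
#define EXAMPLE_NRPTE 7

struct pte_chain_example {
	struct pte_chain_example *next;	/* link to the next block */
	void *ptes[EXAMPLE_NRPTE];	/* pte pointers stored in this block */
};
```

Under these assumptions nrpte(32, 4), nrpte(64, 4) and nrpte(128, 4)
give 7, 15 and 31, and the share of each structure taken by the link
falls from 50% to 12.5%, 6.25% and roughly 3% respectively.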

The patch doesn't make much difference in simple testing, because in
those tests the pte_chain list from the previous page has good cache
locality with the next page's list.

The patch sped up Anton's "10,000 concurrently exiting shells" test by
3x or 4x.  It gives a 10% reduction in system time for a kernel build
on 16p NUMAQ.

It saves memory and reduces the amount of work performed in the slab
allocator.

Pages which are mapped by only a single process still do not have a
pte_chain.  The pointer in struct page points directly at the mapping
pte (a "PageDirect" pte pointer).  Once the page is shared, a pte_chain
is allocated and both the new and old pte pointers are moved into it.
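
The transition from a direct pointer to a chain can be modelled in
userspace roughly as follows.  This is a simplified sketch, not the
code in the patch: add_rmap(), the `direct' flag and the struct page
layout are stand-ins invented for illustration, locking is ignored,
and NRPTE is fixed at 7 on the assumption of a 32-byte cacheline with
4-byte pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NRPTE 7			/* assumed slots per pte_chain block */

typedef int pte_t;		/* stand-in for a real page table entry */

struct pte_chain {
	struct pte_chain *next;
	pte_t *ptes[NRPTE];
};

struct page {
	bool direct;			/* models the PageDirect flag */
	union {
		pte_t *direct_pte;	/* valid while direct is true */
		struct pte_chain *chain;/* valid while direct is false */
	} pte;
};

/* Record that ptep maps page.  Returns 0, or -1 if allocation fails. */
static int add_rmap(struct page *page, pte_t *ptep)
{
	struct pte_chain *pc;
	int i;

	if (!page->direct && page->pte.chain == NULL) {
		/* First mapping: no pte_chain, point straight at the pte. */
		page->pte.direct_pte = ptep;
		page->direct = true;
		return 0;
	}

	if (page->direct) {
		/* Page becomes shared: move old and new pointers into a chain. */
		pc = calloc(1, sizeof(*pc));
		if (pc == NULL)
			return -1;
		pc->ptes[0] = page->pte.direct_pte;
		pc->ptes[1] = ptep;
		page->pte.chain = pc;
		page->direct = false;
		return 0;
	}

	/* Already chained: fill a free slot in the head block if there is one. */
	pc = page->pte.chain;
	for (i = 0; i < NRPTE; i++) {
		if (pc->ptes[i] == NULL) {
			pc->ptes[i] = ptep;
			return 0;
		}
	}

	/* Head block is full: push a new block onto the front of the list. */
	pc = calloc(1, sizeof(*pc));
	if (pc == NULL)
		return -1;
	pc->ptes[0] = ptep;
	pc->next = page->pte.chain;
	page->pte.chain = pc;
	return 0;
}
```

In this model a page mapped once costs no allocation at all, and a
second block is only needed once more than NRPTE ptes map the page.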

We used to collapse the pte_chain back to a PageDirect representation
in page_remove_rmap().  That has been changed.  That collapse is now
performed inside page reclaim, via page_referenced().  The thinking
here is that if a page was previously shared then it may become shared
again, so leave the pte_chain structure in place.  But if the system is
under memory pressure then start reaping them anyway.
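
The deferred collapse can be pictured with another small userspace
model.  Again this is a sketch under assumptions, not the patch
itself: remove_rmap() and reclaim_scan() are hypothetical stand-ins
for page_remove_rmap() and the walk done by page_referenced(), the
struct layouts are simplified, and freeing an empty chain during the
scan is a convenience of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NRPTE 7			/* assumed slots per pte_chain block */

typedef int pte_t;		/* stand-in for a real page table entry */

struct pte_chain {
	struct pte_chain *next;
	pte_t *ptes[NRPTE];
};

struct page {
	bool direct;		/* models the PageDirect flag */
	pte_t *direct_pte;	/* used while direct is true */
	struct pte_chain *chain;/* used while direct is false */
};

/* Unmap path: drop ptep, but leave the chain in place even if one pte remains. */
static void remove_rmap(struct page *page, pte_t *ptep)
{
	struct pte_chain *pc;
	int i;

	if (page->direct) {
		if (page->direct_pte == ptep) {
			page->direct_pte = NULL;
			page->direct = false;
		}
		return;
	}
	for (pc = page->chain; pc != NULL; pc = pc->next) {
		for (i = 0; i < NRPTE; i++) {
			if (pc->ptes[i] == ptep) {
				pc->ptes[i] = NULL;
				return;
			}
		}
	}
}

/*
 * Reclaim path: count the ptes still mapping the page.  If at most one
 * is left, free the chain; if exactly one is left, go back to the
 * direct representation.  Returns the number of ptes found.
 */
static int reclaim_scan(struct page *page)
{
	struct pte_chain *pc, *next;
	pte_t *last = NULL;
	int i, nr = 0;

	if (page->direct)
		return 1;

	for (pc = page->chain; pc != NULL; pc = pc->next) {
		for (i = 0; i < NRPTE; i++) {
			if (pc->ptes[i] != NULL) {
				last = pc->ptes[i];
				nr++;
			}
		}
	}

	if (nr <= 1) {
		for (pc = page->chain; pc != NULL; pc = next) {
			next = pc->next;
			free(pc);
		}
		page->chain = NULL;
		if (nr == 1) {
			page->direct_pte = last;
			page->direct = true;
		}
	}
	return nr;
}
```

The point of the split is that the unmap path stays cheap and a page
that becomes shared again reuses its chain, while the reclaim scan,
which has to walk the chain anyway, is where the memory is given back.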

Parent commit: e182d612