    [PATCH] rmap pte_chain speedup and space saving · 9dc8af80
    Andrew Morton authored
    The pte_chains presently consist of a pte pointer and a `next' link.
    So there's a 50% memory wastage here as well as potential for a lot of
    cache misses during walks of the singly-linked per-page list.
    
    This patch increases the pte_chain structure to occupy a full
    cacheline.  There are 7, 15 or 31 pte pointers per structure rather
    than just one.  So the wastage falls to a few percent and the number of
    misses during the walk is reduced.
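
    As a rough illustration of the layout described above, here is a
    minimal sketch, not the code from the patch.  CACHELINE_BYTES,
    pte_chain_sketch, the pte_t stand-in and slots_per_chain() are
    names assumed for this example.  It shows one `next' link plus as
    many pte pointers as fit in the remainder of one cacheline.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed line size for this sketch; the kernel uses a per-arch value. */
#define CACHELINE_BYTES 64

/* Pointer-sized slots in one line, minus the one used by the `next' link. */
#define NRPTE (CACHELINE_BYTES / sizeof(void *) - 1)

typedef unsigned long pte_t;    /* stand-in for the kernel's pte type */

struct pte_chain_sketch {
    struct pte_chain_sketch *next;  /* link to the next chain node */
    pte_t *ptes[NRPTE];             /* pte pointers packed into the line */
};

/* The node is meant to fill exactly one cacheline. */
static_assert(sizeof(struct pte_chain_sketch) == CACHELINE_BYTES,
              "pte_chain_sketch should occupy one cacheline");

/* Hypothetical helper: pte slots per node for a given line/pointer size. */
size_t slots_per_chain(size_t cacheline_bytes, size_t pointer_bytes)
{
    return cacheline_bytes / pointer_bytes - 1;
}
```

    With 4-byte pointers, 32-, 64- and 128-byte lines give 7, 15 and
    31 slots, which is consistent with the figures quoted above.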
    
    The patch doesn't make much difference in simple testing, because in
    those tests the pte_chain list from the previous page has good cache
    locality with the next page's list.
    
    The patch sped up Anton's "10,000 concurrently exiting shells" test by
    3x or 4x.  It gives a 10% reduction in system time for a kernel build
    on 16p NUMAQ.
    
    It saves memory and reduces the amount of work performed in the slab
    allocator.
    
    Pages which are mapped by only a single process still do not have a
    pte_chain.  The pointer in struct page points directly at the mapping
    pte (a "PageDirect" pte pointer).  Once the page is shared a pte_chain
    is allocated and both the new and old pte pointers are moved into it.
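
    The direct-to-chain transition described above might look roughly
    like the following user-space sketch.  It is an illustration under
    assumptions, not the kernel code: the is_direct field stands in for
    the PageDirect flag, NRPTE is fixed at 7, calloc() replaces the
    slab allocator, and page_add_rmap_sketch() is a hypothetical name.

```c
#include <stddef.h>
#include <stdlib.h>

#define NRPTE 7                 /* assumed number of slots per chain node */

typedef unsigned long pte_t;    /* stand-in for the kernel's pte type */

struct pte_chain {
    struct pte_chain *next;
    pte_t *ptes[NRPTE];
};

struct page {
    int is_direct;              /* stand-in for the PageDirect flag */
    union {
        pte_t *direct;          /* valid when is_direct is set */
        struct pte_chain *chain;/* valid otherwise (NULL if unmapped) */
    } pte;
};

/* Record a new mapping; returns 0 on success, -1 if allocation fails. */
int page_add_rmap_sketch(struct page *page, pte_t *ptep)
{
    struct pte_chain *pc;
    int i;

    if (!page->is_direct && page->pte.chain == NULL) {
        /* First mapping: point straight at the pte, no chain needed. */
        page->pte.direct = ptep;
        page->is_direct = 1;
        return 0;
    }

    if (page->is_direct) {
        /* Page becomes shared: move old and new pointers into a chain. */
        pc = calloc(1, sizeof(*pc));
        if (pc == NULL)
            return -1;
        pc->ptes[0] = page->pte.direct;
        pc->ptes[1] = ptep;
        page->is_direct = 0;
        page->pte.chain = pc;
        return 0;
    }

    /* Already chained: use a free slot in the head node if there is one. */
    pc = page->pte.chain;
    for (i = 0; i < NRPTE; i++) {
        if (pc->ptes[i] == NULL) {
            pc->ptes[i] = ptep;
            return 0;
        }
    }

    /* Head node is full: push a fresh node onto the front of the list. */
    pc = calloc(1, sizeof(*pc));
    if (pc == NULL)
        return -1;
    pc->ptes[0] = ptep;
    pc->next = page->pte.chain;
    page->pte.chain = pc;
    return 0;
}
```

    The sketch only fills slots in the head node, which is enough for
    an add-only illustration; slot ordering and locking in the real
    code are not modelled here.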
    
    We used to collapse the pte_chain back to a PageDirect representation
    in page_remove_rmap().  That has been changed.  That collapse is now
    performed inside page reclaim, via page_referenced().  The thinking
    here is that if a page was previously shared then it may become shared
    again, so leave the pte_chain structure in place.  But if the system is
    under memory pressure then start reaping them anyway.
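
    The reclaim-time collapse could be sketched as below.  This is a
    simplified user-space illustration, not the logic in
    page_referenced() itself: collapse_chain_sketch(), the is_direct
    field, the fixed NRPTE of 7 and the use of free() are assumptions
    made for the example.  A chain is folded back to the direct form
    only when it has a single node holding a single pte pointer.

```c
#include <stddef.h>
#include <stdlib.h>

#define NRPTE 7                 /* assumed number of slots per chain node */

typedef unsigned long pte_t;    /* stand-in for the kernel's pte type */

struct pte_chain {
    struct pte_chain *next;
    pte_t *ptes[NRPTE];
};

struct page {
    int is_direct;              /* stand-in for the PageDirect flag */
    union {
        pte_t *direct;
        struct pte_chain *chain;
    } pte;
};

/* Returns 1 if the chain was collapsed to a direct pointer, else 0. */
int collapse_chain_sketch(struct page *page)
{
    struct pte_chain *pc;
    pte_t *only = NULL;
    int i, count = 0;

    if (page->is_direct || page->pte.chain == NULL)
        return 0;               /* nothing to collapse */

    pc = page->pte.chain;
    if (pc->next != NULL)
        return 0;               /* more than one node: still shared */

    /* Count the occupied slots in the single remaining node. */
    for (i = 0; i < NRPTE; i++) {
        if (pc->ptes[i] != NULL) {
            only = pc->ptes[i];
            count++;
        }
    }
    if (count != 1)
        return 0;

    /* Exactly one mapping left: free the node and go back to direct. */
    free(pc);
    page->pte.direct = only;
    page->is_direct = 1;
    return 1;
}
```

    Calling this from the reclaim path rather than from the unmap path
    reflects the trade-off described above: the chain stays in place
    in case the page is shared again, and is reaped only under memory
    pressure.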
rmap.c 12 KB