    [PATCH] minimal rmap · c48c43e6
    Andrew Morton authored
    This is the "minimal rmap" patch, written by Rik, ported to 2.5 by Craig
    Kulesa.
    
    Basically,
    
    before: When the page reclaim code decides that it has scanned too many
    unreclaimable pages on the LRU, it does a scan of process virtual
    address spaces for pages to add to swapcache.  ptes pointing at the
    page are unmapped as the scan proceeds.  When all ptes referring to a
    page have been unmapped and it has been written to swap, the page is
    reclaimable.
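    A toy userspace model of the "before" behaviour, loosely after the 2.4
    swap_out() scan, may make the cost clearer.  There is no way to get
    from a page to its ptes, so reclaim walks whole address spaces and
    unmaps whatever it finds.  A shared page becomes reclaimable only after
    every mm that maps it has been scanned.  All names (toy_page, toy_mm,
    toy_swap_out_mm) are hypothetical, and the toy ignores the accessed-bit
    check the real scan performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4   /* physical pages in the toy model (hypothetical) */
#define NPTES  8   /* ptes per toy address space (hypothetical) */

struct toy_page {
    int  mapcount;      /* number of ptes still pointing at this page */
    bool in_swapcache;  /* toy: set once the page is "written to swap" */
};

struct toy_mm {
    int pte[NPTES];     /* index of the mapped page, or -1 if unmapped */
};

/* Old-style scan: walk one address space linearly, put each mapped page
 * in swapcache and unmap the pte, whichever page it happens to be. */
static void toy_swap_out_mm(struct toy_mm *mm, struct toy_page *pages)
{
    for (size_t i = 0; i < NPTES; i++) {
        int p = mm->pte[i];
        if (p < 0)
            continue;
        assert(pages[p].mapcount > 0);
        pages[p].in_swapcache = true;
        pages[p].mapcount--;
        mm->pte[i] = -1;
    }
}

/* A page can be freed only when no pte maps it and it is on swap. */
static bool toy_page_reclaimable(const struct toy_page *page)
{
    return page->mapcount == 0 && page->in_swapcache;
}
```

    With a page mapped by two processes, scanning the first mm leaves it
    unreclaimable; only the second scan frees it.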
    
    after: When an anonymous page is encountered on the tail of the LRU we
    use the rmap to see whether it has been referenced lately.  If not, we
    add it to swapcache.  When the page is again encountered on the LRU, if
    it is still unreferenced then we try to unmap all ptes which refer to it
    in one hit, and if it is clean (ie: on swap) then we free it.
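    The "after" policy can be sketched as a per-visit decision on a page
    found at the LRU tail.  This is a toy model under stated assumptions,
    not the patch's reclaim code: the fixed-size ptes[] array stands in for
    the pte chain, MAX_MAPPERS and every toy_ name are hypothetical, and
    swap writeback is pretended to complete immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MAPPERS 4   /* hypothetical cap for the toy pte list */

struct toy_pte {
    bool present;
    bool young;         /* hardware "accessed" bit */
};

struct toy_page {
    struct toy_pte *ptes[MAX_MAPPERS];  /* stands in for the pte chain */
    size_t nr_ptes;
    bool anon;
    bool in_swapcache;
    bool dirty;
};

enum toy_verdict {
    TOY_KEEP, TOY_ADDED_TO_SWAPCACHE, TOY_WRITEBACK_STARTED, TOY_FREED
};

/* Test and clear the accessed bit in every pte that maps the page. */
static bool toy_page_referenced(struct toy_page *page)
{
    bool referenced = false;
    assert(page->nr_ptes <= MAX_MAPPERS);
    for (size_t i = 0; i < page->nr_ptes; i++) {
        if (page->ptes[i]->young) {
            referenced = true;
            page->ptes[i]->young = false;
        }
    }
    return referenced;
}

/* Unmap every pte that refers to the page in one pass. */
static void toy_try_to_unmap(struct toy_page *page)
{
    for (size_t i = 0; i < page->nr_ptes; i++)
        page->ptes[i]->present = false;
    page->nr_ptes = 0;
}

/* One visit to a page at the LRU tail. */
static enum toy_verdict toy_shrink_one(struct toy_page *page)
{
    if (toy_page_referenced(page))
        return TOY_KEEP;                 /* used lately: leave it alone */
    if (page->anon && !page->in_swapcache) {
        page->in_swapcache = true;       /* first visit: add to swapcache */
        page->dirty = true;              /* contents must reach swap */
        return TOY_ADDED_TO_SWAPCACHE;
    }
    toy_try_to_unmap(page);              /* later visit: unmap in one hit */
    if (page->dirty) {
        page->dirty = false;             /* toy: swap write completes at once */
        return TOY_WRITEBACK_STARTED;
    }
    return TOY_FREED;                    /* clean (on swap): free it */
}
```

    A page referenced through any one of its ptes survives a visit; an idle
    anonymous page goes swapcache, then unmap plus writeback, then free.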
    
    The rest of the VM (list management, the classzone concept, etc.)
    remains unchanged.
    
    There are a number of things which the per-page pte chain could be
    used for.  Bill Irwin has identified the following.
    
    
    (1)  page replacement no longer goes around randomly unmapping things
    
    (2)  referenced bits are more accurate because there aren't several ms
            or even seconds between finding the multiple ptes mapping a page
    
    (3)  reduces page replacement from O(total virtually mapped) to O(physical)
    
    (4)  enables defragmentation of physical memory
    
    (5)  enables cooperative offlining of memory for friendly guest instance
            behavior in UML and/or LPAR settings
    
    (6)  demonstrable benefit in performance of swapping, which is common in
            end-user interactive workstation workloads (I don't like the word
            "desktop"); cf. Craig Kulesa's post wrt. swapping performance
    
    (7)  evidence from 2.4-based rmap trees indicates approximate parity
            with mainline in kernel compiles with appropriate locking bits
    
    (8)  partitioning of physical memory can reduce the complexity of page
            replacement searches by scanning only the "interesting" zones
            (implemented and merged in 2.4-based rmap)
    
    (9)  partitioning of physical memory can increase the parallelism of page
            replacement searches by independently processing different zones
            (implemented, but not merged, in 2.4-based rmap)
    
    (10) the reverse mappings may be used for efficiently keeping pte cache
            attributes coherent
    
    (11) they may be used for virtual cache invalidation (with changes)
    
    (12) the reverse mappings enable proper RSS limit enforcement
            (implemented and merged in 2.4-based rmap)
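    Most of the items above rest on one data structure: a per-page chain of
    pointers to the ptes that map it.  A minimal userspace sketch of that
    chain follows, assuming a singly linked node of (next, ptep).  The
    toy_ names are hypothetical stand-ins, not the patch's mm/rmap.c.  The
    point it illustrates for (1) and (3) is that unmapping a page walks only
    that page's own mappers, never whole address spaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef unsigned long toy_pte_t;   /* hypothetical stand-in for pte_t */

/* One link per pte that maps the page. */
struct toy_pte_chain {
    struct toy_pte_chain *next;
    toy_pte_t *ptep;
};

struct toy_page {
    struct toy_pte_chain *pte_chain;   /* the extra pointer in struct page */
};

/* Record that ptep now maps page.  Returns false on allocation failure. */
static bool toy_page_add_rmap(struct toy_page *page, toy_pte_t *ptep)
{
    struct toy_pte_chain *pc = malloc(sizeof(*pc));
    if (!pc)
        return false;
    pc->ptep = ptep;
    pc->next = page->pte_chain;
    page->pte_chain = pc;
    return true;
}

/* Forget the mapping through ptep (e.g. on munmap or exit). */
static void toy_page_remove_rmap(struct toy_page *page, toy_pte_t *ptep)
{
    struct toy_pte_chain **pp = &page->pte_chain;
    while (*pp) {
        if ((*pp)->ptep == ptep) {
            struct toy_pte_chain *victim = *pp;
            *pp = victim->next;
            free(victim);
            return;
        }
        pp = &(*pp)->next;
    }
    assert(!"pte not found on the page's chain");
}

/* Clear every pte mapping the page; cost is O(number of mappers). */
static size_t toy_try_to_unmap(struct toy_page *page)
{
    size_t unmapped = 0;
    while (page->pte_chain) {
        struct toy_pte_chain *pc = page->pte_chain;
        *pc->ptep = 0;                 /* toy: 0 means "not present" */
        page->pte_chain = pc->next;
        free(pc);
        unmapped++;
    }
    return unmapped;
}
```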
    
    
    
    The code adds a pointer to struct page, consumes additional storage for
    the pte chains and adds computational expense to the page reclaim code
    (I measured it at 3% additional load during streaming I/O).  The
    benefits which we get back for all this are, I must say, theoretical
    and unproven.  If it has real advantages (or, indeed, disadvantages)
    then why has nobody demonstrated them?
    
    
    
    There are a number of things remaining to be done:
    
    1: Demonstrate the above advantages.
    
    2: Make it work with pte-highmem  (Bill Irwin is signed up for this)
    
    3: Optimisation: don't add pte_chains to non-shared pages (Dave
       McCracken's patch does this)
    
    4: Move the pte_chains into highmem too (Bill, I guess)
    
    5: per-cpu pte_chain freelists (Rik?)
    
    6: maybe GC the pte_chain backing pages. (Seems unavoidable.  Rik?)
    
    7: multithread the page reclaim code.  (I have patches).
    
    8: clustered add-to-swap.  Not sure if I buy this.  anon pages are
       often well-ordered-by-virtual-address on the LRU, so it "just
       works" for benchmarky loads.  But there may be some other loads...
    
    9: Fix bad I/O latency in page reclaim (I have lame patches)
    
    10: Develop tuning tools, use them.
    
    11: The nightly updatedb run is still evicting everything.
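    Item 3 could take roughly the following shape.  This is one possible
    sketch, not Dave McCracken's patch: a page mapped by a single pte stores
    that pte pointer inline, and chain nodes are allocated only when a
    second mapping appears.  The union layout, the direct flag, the
    toy_nr_chain_nodes counter and all toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef unsigned long toy_pte_t;      /* hypothetical stand-in for pte_t */

struct toy_pte_chain {
    struct toy_pte_chain *next;
    toy_pte_t *ptep;
};

struct toy_page {
    bool direct;                      /* true: pte.direct is valid */
    union {
        struct toy_pte_chain *chain;  /* shared page: list of ptes */
        toy_pte_t *direct;            /* single mapping: no allocation */
    } pte;
};

static size_t toy_nr_chain_nodes;     /* counts allocations, for the demo */

static struct toy_pte_chain *toy_chain_push(struct toy_pte_chain *head,
                                            toy_pte_t *ptep)
{
    struct toy_pte_chain *pc = malloc(sizeof(*pc));
    if (!pc)
        return NULL;
    pc->ptep = ptep;
    pc->next = head;
    toy_nr_chain_nodes++;
    return pc;
}

/* Add a mapping; chain nodes are allocated only once the page is shared. */
static bool toy_page_add_rmap(struct toy_page *page, toy_pte_t *ptep)
{
    assert(ptep != NULL);
    if (!page->direct && page->pte.chain == NULL) {
        page->direct = true;          /* first mapper: store it inline */
        page->pte.direct = ptep;
        return true;
    }
    if (page->direct) {
        /* second mapper: turn the inline pointer into a one-node chain */
        struct toy_pte_chain *first = toy_chain_push(NULL, page->pte.direct);
        if (!first)
            return false;
        page->direct = false;
        page->pte.chain = first;
    }
    struct toy_pte_chain *pc = toy_chain_push(page->pte.chain, ptep);
    if (!pc)
        return false;
    page->pte.chain = pc;
    return true;
}
```

    Since most anonymous pages are mapped by exactly one process, the
    common case then costs no chain allocation at all.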