    mm: compaction: acquire the zone->lru_lock as late as possible · 2a1402aa
    Mel Gorman authored
    Richard Davies and Shaohua Li have both reported lock contention problems
    in compaction on the zone and LRU locks as well as significant amounts of
    time being spent in compaction.  This series aims to reduce lock
    contention and scanning rates to reduce that CPU usage.  Richard reported
    at https://lkml.org/lkml/2012/9/21/91 that this series made a big
    difference to a problem he reported in August:
    
       http://marc.info/?l=kvm&m=134511507015614&w=2
    
    Patch 1 defers acquiring the zone->lru_lock as long as possible.
    
    Patch 2 defers acquiring the zone->lock as long as possible.
    
    Patch 3 reverts Rik's "skip-free" patches as the core concept gets
    	reimplemented later and the remaining patches are easier to
    	understand if this is reverted first.
    
    Patch 4 adds a pageblock-skip bit to the pageblock flags to cache what
    	pageblocks should be skipped by the migrate and free scanners.
    	This drastically reduces the amount of scanning compaction has
    	to do.
    
    Patch 5 reimplements something similar to Rik's idea except it uses the
    	pageblock-skip information to decide where the scanners should
    	restart from and does not need to wrap around.
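
The pageblock-skip idea in patch 4 can be illustrated with a small userspace
sketch.  This is not the kernel implementation: the names `skip_cache`,
`scan_blocks` and `NR_PAGEBLOCKS` are hypothetical, and the rule "mark a block
as skippable when a scan finds nothing in it" is an assumption made only to
show why a cached bit cuts down repeated scanning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical number of pageblocks, chosen so one 64-bit word suffices. */
#define NR_PAGEBLOCKS 64

/* One skip bit per pageblock.  This is NOT the kernel's pageblock flags layout. */
struct skip_cache {
	uint64_t bits;
};

void set_skip(struct skip_cache *c, size_t block)
{
	assert(block < NR_PAGEBLOCKS);
	c->bits |= (uint64_t)1 << block;
}

bool should_skip(const struct skip_cache *c, size_t block)
{
	assert(block < NR_PAGEBLOCKS);
	return (c->bits >> block) & 1;
}

/* Forget all cached hints so that every block is scanned again. */
void reset_skip(struct skip_cache *c)
{
	c->bits = 0;
}

/*
 * Walk blocks [start, end).  has_candidates[b] stands in for "a scan of
 * block b found something worth isolating".  Blocks with the skip bit set
 * are not examined; blocks that yield nothing get the bit set for next time.
 * Returns the number of blocks that were actually examined.
 */
size_t scan_blocks(struct skip_cache *c, const bool *has_candidates,
		   size_t start, size_t end)
{
	size_t examined = 0;

	for (size_t b = start; b < end; b++) {
		if (should_skip(c, b))
			continue;
		examined++;
		if (!has_candidates[b])
			set_skip(c, b);
	}
	return examined;
}
```

In this toy model a second pass over the same range only examines the blocks
that were useful the first time, and `reset_skip()` restores full scanning.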
    
    I tested this on 3.6-rc6 + linux-next/akpm. Kernels tested were
    
    akpm-20120920	3.6-rc6 + linux-next/akpm as of September 20th, 2012
    lesslock	Patches 1-6
    revert		Patches 1-7
    cachefail	Patches 1-8
    skipuseless	Patches 1-9
    
    Stress high-order allocation tests looked ok.  Success rates are more or
    less the same with the full series applied but there is an expectation
    that there is less opportunity to race with other allocation requests if
    there is less scanning.  The time to complete the tests did not vary
    much and is uninteresting, as were the vmstat statistics, so I will not
    present them here.
    
    Using ftrace I recorded how much scanning was done by compaction and got this:
    
                                3.6.0-rc6     3.6.0-rc6   3.6.0-rc6  3.6.0-rc6 3.6.0-rc6
                                akpm-20120920 lockless  revert-v2r2  cachefail skipuseless
    
    Total   free    scanned         360753976  515414028  565479007   17103281   18916589
    Total   free    isolated          2852429    3597369    4048601     670493     727840
    Total   free    efficiency        0.0079%    0.0070%    0.0072%    0.0392%    0.0385%
    Total   migrate scanned         247728664  822729112 1004645830   17946827   14118903
    Total   migrate isolated          2555324    3245937    3437501     616359     658616
    Total   migrate efficiency        0.0103%    0.0039%    0.0034%    0.0343%    0.0466%
    
    The efficiency is worthless because of the nature of the test and the
    number of failures.  The really interesting point as far as this patch
    series is concerned is the number of pages scanned.  Note that reverting
    Rik's patches massively increases the number of pages scanned indicating
    that those patches really did make a difference to CPU usage.
    
    However, caching what pageblocks should be skipped has a much higher
    impact.  With patches 1-8 applied, free page and migrate page scanning are
    reduced by about 95% and 93% respectively in comparison to the akpm
    kernel.  If the basic concept of Rik's patches is implemented on top,
    the free scanner barely changes but migrate scanning is further reduced.
    That said, tests on 3.6-rc5 indicated that the last patch had greater
    impact than what was measured here so it is a bit variable.
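
The scanning reduction can be checked directly against the "scanned" rows of
the table.  The helper below is only an illustrative calculation (the name
`scan_reduction_pct` is hypothetical); fed with the akpm-20120920 and
cachefail columns it gives roughly 95% for the free scanner and about 93% for
the migrate scanner.

```c
#include <assert.h>

/*
 * Percentage by which scanning dropped, given the number of pages scanned
 * by the baseline kernel and by the patched kernel.
 */
double scan_reduction_pct(unsigned long long baseline,
			  unsigned long long patched)
{
	assert(baseline > 0);
	return 100.0 * (1.0 - (double)patched / (double)baseline);
}
```

For example, `scan_reduction_pct(360753976ULL, 17103281ULL)` covers the free
scanner and `scan_reduction_pct(247728664ULL, 17946827ULL)` the migrate
scanner, both comparing akpm-20120920 with cachefail.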
    
    One way or the other, this series has a large impact on the amount of
    scanning compaction does when there is a storm of THP allocations.
    
    This patch:
    
    Compaction's migrate scanner acquires the zone->lru_lock when scanning a
    range of pages looking for LRU pages to acquire.  It does this even if
    there are no LRU pages in the range.  If multiple processes are compacting
    then this can cause severe locking contention.  To make matters worse
    commit b2eef8c0 ("mm: compaction: minimise the time IRQs are disabled
    while isolating pages for migration") releases the lru_lock every
    SWAP_CLUSTER_MAX pages that are scanned.
    
    This patch makes two changes to how the migrate scanner acquires the LRU
    lock.  First, it only releases the LRU lock every SWAP_CLUSTER_MAX pages
    if the lock is contended.  This reduces the number of times it
    unnecessarily disables and re-enables IRQs.  The second is that it defers
    acquiring the LRU lock for as long as possible.  If there are no LRU pages
    or the only LRU pages are transhuge then the LRU lock will not be acquired
    at all which reduces contention on zone->lru_lock.
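
The second change, taking the lock as late as possible, can be sketched in
userspace as follows.  This is a simplified model and not the code in
compaction.c: `toy_lock`, `page_desc` and `scan_range` are hypothetical names,
a C11 `atomic_flag` spinlock stands in for zone->lru_lock, and the
contention-based release every SWAP_CLUSTER_MAX pages is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy spinlock standing in for zone->lru_lock (assumption, not kernel API). */
struct toy_lock {
	atomic_flag held;
};

void toy_lock_acquire(struct toy_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin until the current holder releases the lock */
}

void toy_lock_release(struct toy_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical page descriptor with just the two properties checked here. */
struct page_desc {
	bool on_lru;
	bool transhuge;
};

struct scan_stats {
	size_t isolated;          /* pages that would be isolated */
	size_t lock_acquisitions; /* how often the lock was taken */
};

/*
 * Scan n pages.  The cheap checks run without the lock; the lock is only
 * taken when the first plain LRU page is found, and it is dropped once at
 * the end.  A range with no LRU pages, or only transhuge ones, never
 * touches the lock at all.
 */
struct scan_stats scan_range(struct toy_lock *lru_lock,
			     const struct page_desc *pages, size_t n)
{
	struct scan_stats st = {0, 0};
	bool locked = false;

	assert(pages != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!pages[i].on_lru || pages[i].transhuge)
			continue;

		if (!locked) {
			toy_lock_acquire(lru_lock);
			locked = true;
			st.lock_acquisitions++;
		}
		/* Real code would recheck the page state under the lock. */
		st.isolated++;
	}

	if (locked)
		toy_lock_release(lru_lock);
	return st;
}
```

With this structure, concurrent scanners walking ranges that contain no
candidate pages never compete for the lock, which is the effect the patch
aims for.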
    
    [minchan@kernel.org: augment comment]
    [akpm@linux-foundation.org: tweak comment text]
    Signed-off-by: Mel Gorman <mgorman@suse.de>
    Acked-by: Rik van Riel <riel@redhat.com>
    Cc: Richard Davies <richard@arachsys.com>
    Cc: Shaohua Li <shli@kernel.org>
    Cc: Avi Kivity <avi@redhat.com>
    Acked-by: Rafael Aquini <aquini@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>