    vmscan: bail out of direct reclaim after swap_cluster_max pages · a79311c1
    Rik van Riel authored
    When the VM is under pressure, it can happen that several direct reclaim
    processes are in the pageout code simultaneously.  It also happens that
    the reclaiming processes run into mostly referenced, mapped and dirty
    pages in the first round.
    
    This results in multiple direct reclaim processes having a lower
    pageout priority, which corresponds to a higher target of pages to
    scan.
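
    As a rough illustration (the helper below is made up for this
    changelog; what is real is that the per-zone scan target in this
    era's vmscan.c is shifted right by the priority):

        /* Illustrative only: each drop in priority doubles the target. */
        static unsigned long scan_target(unsigned long zone_lru_pages,
                                         int priority)
        {
                /* priority runs from DEF_PRIORITY (12) down toward 0 */
                return zone_lru_pages >> priority;
        }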
    
    This in turn can result in each direct reclaim process freeing
    many pages.  Together, they can end up freeing way too many pages.
    
    This kicks useful data out of memory (in some cases more than half
    of all memory is swapped out).  It also impacts performance by
    keeping tasks stuck in the pageout code for too long.
    
    A 30% improvement in hackbench has been observed with this patch.
    
    The fix is relatively simple: in shrink_zone() we can check how many
    pages we have already freed; direct reclaim tasks break out of the
    scanning loop once they have freed enough pages and have reached a
    lower priority level.
    
    We do not break out of shrink_zone() when priority == DEF_PRIORITY,
    to ensure that equal pressure is applied to every zone in the common
    case.
    
    However, in order to do this we need to know how many pages we have
    already freed, so move nr_reclaimed into scan_control.
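
    As a minimal sketch (not the verbatim diff; it assumes nr_reclaimed
    and swap_cluster_max are now fields of struct scan_control, and uses
    the existing current_is_kswapd() helper, which tests PF_KSWAPD), the
    check at the bottom of the shrink_zone() scan loop looks roughly
    like this:

        /*
         * Sketch: let a direct reclaimer stop once it has freed
         * enough pages, but never cut the first (DEF_PRIORITY)
         * pass short, and never bail out of kswapd this way.
         */
        if (sc->nr_reclaimed > sc->swap_cluster_max &&
            priority < DEF_PRIORITY && !current_is_kswapd())
                break;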
    
    akpm: a historical interlude...
    
    We tried this in 2004:
    
    :commit e468e46a9bea3297011d5918663ce6d19094cf87
    :Author: akpm <akpm>
    :Date:   Thu Jun 24 15:53:52 2004 +0000
    :
    :[PATCH] vmscan.c: dont reclaim too many pages
    :
    :    The shrink_zone() logic can, under some circumstances, cause far too many
    :    pages to be reclaimed.  Say, we're scanning at high priority and suddenly hit
    :    a large number of reclaimable pages on the LRU.
    :
    :    Change things so we bale out when SWAP_CLUSTER_MAX pages have been reclaimed.
    
    And we reverted it in 2006:
    
    :commit 210fe530
    :Author: Andrew Morton <akpm@osdl.org>
    :Date:   Fri Jan 6 00:11:14 2006 -0800
    :
    :    [PATCH] vmscan: balancing fix
    :
    :    Revert a patch which went into 2.6.8-rc1.  The changelog for that patch was:
    :
    :      The shrink_zone() logic can, under some circumstances, cause far too many
    :      pages to be reclaimed.  Say, we're scanning at high priority and suddenly
    :      hit a large number of reclaimable pages on the LRU.
    :
    :      Change things so we bale out when SWAP_CLUSTER_MAX pages have been
    :      reclaimed.
    :
    :    Problem is, this change caused significant imbalance in inter-zone scan
    :    balancing by truncating scans of larger zones.
    :
    :    Suppose, for example, ZONE_HIGHMEM is 10x the size of ZONE_NORMAL.  The zone
    :    balancing algorithm would require that if we're scanning 100 pages of
    :    ZONE_HIGHMEM, we should scan 10 pages of ZONE_NORMAL.  But this logic will
    :    cause the scanning of ZONE_HIGHMEM to bale out after only 32 pages are
    :    reclaimed.  Thus effectively causing smaller zones to be scanned relatively
    :    harder than large ones.
    :
    :    Now I need to remember what the workload was which caused me to write this
    :    patch originally, then fix it up in a different way...
    
    And we haven't demonstrated that whatever problem caused that reversion is
    not being reintroduced by this change in 2008.

    Signed-off-by: Rik van Riel <riel@redhat.com>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>