    mm: vmscan: move dirty pages out of the way until they're flushed · c55e8d03
    Johannes Weiner authored
    We noticed a performance regression when moving hadoop workloads from
    3.10 kernels to 4.0 and 4.6.  This is accompanied by increased pageout
    activity initiated by kswapd as well as frequent bursts of allocation
    stalls and direct reclaim scans.  Even lowering the dirty ratios to the
    equivalent of less than 1% of memory would not eliminate the issue,
    suggesting that dirty pages concentrate where the scanner is looking.
    
    This can be traced back to recent thrash-avoidance efforts.  Where
    3.10 would not detect refaulting pages and would continuously supply clean
    cache to the inactive list, a thrashing workload on 4.0+ will detect and
    activate refaulting pages right away, distilling use-once pages on the
    inactive list much more effectively.  This is by design, and it makes
    sense for clean cache.  But for the most part our workload's cache
    faults are refaults and its use-once cache is from streaming writes.  We
    end up with most of the inactive list dirty, and we don't go after the
    active cache as long as we have use-once pages around.
    
    But waiting for writes to avoid reclaiming clean cache that *might*
    refault is a bad trade-off.  Even if the refaults happen, reads are
    faster than writes.  Before getting bogged down in writeback, reclaim
    should first look at *all* cache in the system, even active cache.
    
    To accomplish this, activate pages that are dirty or under writeback
    when they reach the end of the inactive LRU.  The pages are marked for
    immediate reclaim, meaning they'll get moved back to the inactive LRU
    tail as soon as they're written back and become reclaimable.  But in the
    meantime, by reducing the inactive list to only immediately reclaimable
    pages, we allow the scanner to deactivate and refill the inactive list
    with clean cache from the active list tail to guarantee forward
    progress.
    
    [hannes@cmpxchg.org: update comment]
      Link: http://lkml.kernel.org/r/20170202191957.22872-8-hannes@cmpxchg.org
    Link: http://lkml.kernel.org/r/20170123181641.23938-6-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Rik van Riel <riel@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>