    mm: use zonelists instead of zones when direct reclaiming pages · dac1d27b
    Mel Gorman authored
    The following patches replace multiple zonelists per node with two zonelists
    that are filtered based on the GFP flags.  The patches as a set fix a bug with
    regard to the use of MPOL_BIND and ZONE_MOVABLE.  With this patchset,
    MPOL_BIND applies to the two highest zones when the highest zone is
    ZONE_MOVABLE.  This should be considered an alternative fix for the
    MPOL_BIND+ZONE_MOVABLE problem in 2.6.23, in place of the previously
    discussed hack that filters only custom zonelists.
    
    The first patch cleans up an inconsistency where direct reclaim uses
    zonelist->zones while other places use zonelist.
    
    The second patch introduces a helper function node_zonelist() for looking up
    the appropriate zonelist for a GFP mask which simplifies patches later in the
    set.
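
As a rough illustration of what such a lookup helper does, here is a userspace
toy model.  Only the name node_zonelist() comes from the text above; the struct
layouts, the toy_ prefixed names and the flag value are assumptions made for
this sketch, and it models the end state of the series (two lists per node)
rather than the kernel's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: every name and value below is hypothetical, not a kernel definition. */
#define TOY_MAX_NODES    4
#define TOY_GFP_THISNODE 0x1u   /* stand-in for __GFP_THISNODE */

struct toy_zonelist {
    int owner_node;     /* node this list belongs to */
    int thisnode_only;  /* 1 if the list holds only the local node's zones */
};

struct toy_node {
    /* index 0: normal fallback list, index 1: local-node-only list */
    struct toy_zonelist zonelists[2];
};

static struct toy_node toy_nodes[TOY_MAX_NODES];

/* Fill in the two lists of every node. */
static void toy_init_nodes(void)
{
    int nid;

    for (nid = 0; nid < TOY_MAX_NODES; nid++) {
        toy_nodes[nid].zonelists[0].owner_node = nid;
        toy_nodes[nid].zonelists[0].thisnode_only = 0;
        toy_nodes[nid].zonelists[1].owner_node = nid;
        toy_nodes[nid].zonelists[1].thisnode_only = 1;
    }
}

/* Pick a node's zonelist from the GFP mask in one place. */
static struct toy_zonelist *toy_node_zonelist(int nid, unsigned int gfp_flags)
{
    assert(nid >= 0 && nid < TOY_MAX_NODES);
    return &toy_nodes[nid].zonelists[(gfp_flags & TOY_GFP_THISNODE) ? 1 : 0];
}
```

Keeping the selection in a single helper means callers never index the per-node
array themselves, so a later change to how lists are stored touches one place.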
    
    The third patch defines/remembers the "preferred zone" for numa statistics, as
    it is no longer always the first zone in a zonelist.
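
A minimal sketch of why the preferred zone has to be remembered: the statistics
compare the zone that satisfied the allocation against the zone the caller
would have liked, and the latter can no longer be read off as entry 0 of the
list.  The struct and the counter names below are assumptions for
illustration, not the kernel's accounting code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-zone NUMA counters. */
struct toy_zone {
    int node;
    unsigned long numa_hit;      /* allocation landed on the preferred node */
    unsigned long numa_miss;     /* allocation landed here, another node was preferred */
    unsigned long numa_foreign;  /* this node was preferred but another one was used */
};

/* Account one allocation: 'preferred' is passed in explicitly by the caller. */
static void toy_zone_statistics(struct toy_zone *preferred, struct toy_zone *used)
{
    assert(preferred != NULL && used != NULL);
    if (used->node == preferred->node) {
        used->numa_hit++;
    } else {
        used->numa_miss++;
        preferred->numa_foreign++;
    }
}
```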
    
    The fourth patch replaces multiple zonelists with two zonelists that are
    filtered.  Two zonelists are needed because the memoryless patchset
    introduces a second set of zonelists for __GFP_THISNODE.
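
The idea of filtering one list instead of keeping a list per zone type can be
sketched as follows.  This is a toy model under stated assumptions: the enum,
the structs and the toy_ functions are hypothetical, and the real iterators
may differ in both name and mechanism.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ENTRIES 8

/* Hypothetical zone types, ordered from most to least restrictive. */
enum toy_zone_type {
    TOY_ZONE_DMA,
    TOY_ZONE_NORMAL,
    TOY_ZONE_HIGHMEM,
    TOY_ZONE_MOVABLE
};

struct toy_zone {
    int node;
    enum toy_zone_type type;
};

/* One list holding every zone in fallback order, NULL-terminated. */
struct toy_zonelist {
    struct toy_zone *zones[TOY_MAX_ENTRIES + 1];
};

/*
 * Return the next entry at or after *pos whose type does not exceed
 * 'highest', and advance *pos past it.  Returns NULL at the end.
 */
static struct toy_zone *toy_next_zone(struct toy_zonelist *zl, size_t *pos,
                                      enum toy_zone_type highest)
{
    assert(zl != NULL && pos != NULL);
    while (*pos < TOY_MAX_ENTRIES && zl->zones[*pos] != NULL) {
        struct toy_zone *z = zl->zones[(*pos)++];

        if (z->type <= highest)
            return z;
    }
    return NULL;
}

/* Count how many entries an allocation limited to 'highest' may use. */
static size_t toy_count_usable(struct toy_zonelist *zl, enum toy_zone_type highest)
{
    size_t pos = 0, n = 0;

    while (toy_next_zone(zl, &pos, highest) != NULL)
        n++;
    return n;
}
```

An allocation limited to TOY_ZONE_NORMAL walks the same list as one allowed
TOY_ZONE_MOVABLE and simply skips the entries it may not use.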
    
    The fifth patch introduces helper macros for retrieving the zone and node
    indices of entries in a zonelist.
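
A hedged sketch of what such accessors might look like in a toy model.  The
macro names and the struct layout are invented for illustration; the point is
only that callers ask the entry for its indices instead of reaching into the
zone themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical zone and list layout for this sketch. */
struct toy_zone {
    int node_id;
    int zone_idx;
};

struct toy_zonelist {
    struct toy_zone *zones[4];
};

/* Accessors for entry i of a list; hypothetical names. */
#define TOY_ENTRY_ZONE_IDX(zl, i) ((zl)->zones[(i)]->zone_idx)
#define TOY_ENTRY_NODE_IDX(zl, i) ((zl)->zones[(i)]->node_id)

/* Does entry i sit on node 'nid' and at or below zone index 'highest_idx'? */
static int toy_entry_matches(const struct toy_zonelist *zl, int i,
                             int nid, int highest_idx)
{
    assert(zl != NULL && i >= 0 && i < 4 && zl->zones[i] != NULL);
    return TOY_ENTRY_NODE_IDX(zl, i) == nid &&
           TOY_ENTRY_ZONE_IDX(zl, i) <= highest_idx;
}
```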
    
    The final patch introduces filtering of the zonelists based on a nodemask.
    Two zonelists exist per node, one for normal allocations and one for
    __GFP_THISNODE.
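
Nodemask filtering can be pictured with the toy model below, which skips
entries whose node is not in an allowed set.  All names are hypothetical, and
the node set is a plain unsigned long bitmask here (an assumption of this
sketch), with a NULL mask meaning "no restriction".

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ENTRIES 8

struct toy_zone {
    int node;      /* must be smaller than the bit width of unsigned long */
    int zone_idx;  /* higher index means a "higher" zone */
};

/* Fallback-ordered, NULL-terminated list of zones. */
struct toy_zonelist {
    const struct toy_zone *zones[TOY_MAX_ENTRIES + 1];
};

/*
 * Return the first entry whose zone index is at most 'highest_idx' and
 * whose node is set in *allowed_nodes.  A NULL mask allows every node.
 */
static const struct toy_zone *toy_first_allowed(const struct toy_zonelist *zl,
                                                int highest_idx,
                                                const unsigned long *allowed_nodes)
{
    size_t i;

    assert(zl != NULL);
    for (i = 0; i < TOY_MAX_ENTRIES && zl->zones[i] != NULL; i++) {
        const struct toy_zone *z = zl->zones[i];

        if (z->zone_idx > highest_idx)
            continue;   /* zone type not permitted for this request */
        if (allowed_nodes != NULL && !(*allowed_nodes & (1UL << z->node)))
            continue;   /* node excluded by the mask */
        return z;
    }
    return NULL;
}
```

With a mask containing only node 1, both of that node's zones remain eligible
while node 0 is skipped entirely, which is the kind of behaviour a bind policy
needs without building a custom list.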
    
    Performance results varied depending on the machine configuration.  In real
    workloads the gain/loss will depend on how much the userspace portion of the
    benchmark benefits from having more cache available due to reduced referencing
    of zonelists.
    
    This is the range of performance losses/gains when running against
    2.6.24-rc4-mm1.  The machines tested are a mix of i386, x86_64 and
    ppc64, both NUMA and non-NUMA.
                                 loss   to  gain
    Total CPU time on Kernbench: -0.86% to  1.13%
    Elapsed   time on Kernbench: -0.79% to  0.76%
    page_test from aim9:         -4.37% to  0.79%
    brk_test  from aim9:         -0.71% to  4.07%
    fork_test from aim9:         -1.84% to  4.60%
    exec_test from aim9:         -0.71% to  1.08%
    
    This patch:
    
    The allocator deals with zonelists which indicate the order in which zones
    should be targeted for an allocation.  Similarly, direct reclaim of pages
    iterates over an array of zones.  For consistency, this patch converts direct
    reclaim to use a zonelist.  No functionality is changed by this patch.  This
    simplifies zonelist iterators in the next patch.
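
The shape of the conversion can be sketched in a userspace toy model.  The
function names, the struct layout and the "reclaimable" field are assumptions
for illustration and are not taken from the patch; the sketch only shows that
passing the list instead of its array leaves the loop itself unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical zone with a count of pages that could be reclaimed. */
struct toy_zone {
    unsigned long reclaimable;
};

/* NULL-terminated list of zones. */
struct toy_zonelist {
    struct toy_zone *zones[4];
};

/* Before: the caller hands over the bare array (zonelist->zones). */
static unsigned long toy_reclaim_from_zones(struct toy_zone **zones)
{
    unsigned long total = 0;
    size_t i;

    assert(zones != NULL);
    for (i = 0; zones[i] != NULL; i++)
        total += zones[i]->reclaimable;
    return total;
}

/* After: the caller hands over the zonelist; the walk is the same. */
static unsigned long toy_reclaim_from_zonelist(struct toy_zonelist *zl)
{
    struct toy_zone **zones;
    unsigned long total = 0;
    size_t i;

    assert(zl != NULL);
    zones = zl->zones;
    for (i = 0; zones[i] != NULL; i++)
        total += zones[i]->reclaimable;
    return total;
}
```

Once every caller takes the list rather than the array, the walk can later be
swapped for a filtering iterator without touching the callers again.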
    
    Signed-off-by: Mel Gorman <mel@csn.ul.ie>
    Acked-by: Christoph Lameter <clameter@sgi.com>
    Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Mel Gorman <mel@csn.ul.ie>
    Cc: Christoph Lameter <clameter@sgi.com>
    Cc: Hugh Dickins <hugh@veritas.com>
    Cc: Nick Piggin <nickpiggin@yahoo.com.au>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>