1. 17 Jun, 2009 40 commits
    • David Rientjes's avatar
      oom: move oom_adj value from task_struct to mm_struct · 2ff05b2b
      David Rientjes authored
      The per-task oom_adj value is a characteristic of its mm more than the
      task itself since it's not possible to oom kill any thread that shares the
      mm.  If a task were to be killed while attached to an mm that could not be
      freed because another thread were set to OOM_DISABLE, it would have
      needlessly been terminated since there is no potential for future memory
      freeing.
      
      This patch moves oomkilladj (now more appropriately named oom_adj) from
      struct task_struct to struct mm_struct.  This requires task_lock() on a
      task to check its oom_adj value to protect against exec, but it's already
      necessary to take the lock when dereferencing the mm to find the total VM
      size for the badness heuristic.
      
      This fixes a livelock if the oom killer chooses a task and another thread
      sharing the same memory has an oom_adj value of OOM_DISABLE.  This occurs
      because oom_kill_task() repeatedly returns 1 and refuses to kill the
      chosen task while select_bad_process() will repeatedly choose the same
      task during the next retry.
      
      Taking task_lock() in select_bad_process() to check for OOM_DISABLE and in
      oom_kill_task() to check for threads sharing the same memory will be
      removed in the next patch in this series where it will no longer be
      necessary.
      
      Writing to /proc/pid/oom_adj for a kthread will now return -EINVAL since
      these threads are immune from oom killing already.  They simply report an
      oom_adj value of OOM_DISABLE.
      
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2ff05b2b
    • KAMEZAWA Hiroyuki's avatar
      mm: reuse unused swap entry if necessary · c9e44410
      KAMEZAWA Hiroyuki authored
      Presently we can know a swap entry is just used as SwapCache via swap_map,
      without looking up swap cache.
      
      Then, we have a chance to reuse swap-cache-only swap entries in
      get_swap_pages().
      
      This patch tries to free swap-cache-only swap entries if swap is not
      enough.
      
      Note: We hit following path when swap_cluster code cannot find a free
      cluster.  Then, vm_swap_full() is not only condition to allow the kernel
      to reclaim unused swap.
      Signed-off-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: default avatarBalbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Tested-by: default avatarDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c9e44410
    • KAMEZAWA Hiroyuki's avatar
      mm: modify swap_map and add SWAP_HAS_CACHE flag · 355cfa73
      KAMEZAWA Hiroyuki authored
      This is a part of the patches for fixing memcg's swap accountinf leak.
      But, IMHO, not a bad patch even if no memcg.
      
      There are 2 kinds of references to swap.
       - reference from swap entry
       - reference from swap cache
      
      Then,
      
       - If there is swap cache && swap's refcnt is 1, there is only swap cache.
        (*) swapcount(entry) == 1 && find_get_page(swapper_space, entry) != NULL
      
      This counting logic have worked well for a long time.  But considering
      that we cannot know there is a _real_ reference or not by swap_map[],
      current usage of counter is not very good.
      
      This patch adds a flag SWAP_HAS_CACHE and recored information that a swap
      entry has a cache or not.  This will remove -1 magic used in swapfile.c
      and be a help to avoid unnecessary find_get_page().
      Signed-off-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Tested-by: default avatarDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      355cfa73
    • KAMEZAWA Hiroyuki's avatar
      mm: add swap cache interface for swap reference · cb4b86ba
      KAMEZAWA Hiroyuki authored
      In a following patch, the usage of swap cache is recorded into swap_map.
      This patch is for necessary interface changes to do that.
      
      2 interfaces:
      
        - swapcache_prepare()
        - swapcache_free()
      
      are added for allocating/freeing refcnt from swap-cache to existing swap
      entries.  But implementation itself is not changed under this patch.  At
      adding swapcache_free(), memcg's hook code is moved under
      swapcache_free().  This is better than using scattered hooks.
      Signed-off-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: default avatarDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: default avatarBalbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      cb4b86ba
    • KOSAKI Motohiro's avatar
      mm: remove CONFIG_UNEVICTABLE_LRU config option · 68377659
      KOSAKI Motohiro authored
      Currently, nobody wants to turn UNEVICTABLE_LRU off.  Thus this
      configurability is unnecessary.
      Signed-off-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Acked-by: default avatarMinchan Kim <minchan.kim@gmail.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      68377659
    • Minchan Kim's avatar
      page-allocator: reset wmark_min and inactive ratio of zone when hotplug happens · bce7394a
      Minchan Kim authored
      Solve two problems.
      
      Whenever memory hotplug sucessfully happens, zone->present_pages
      have to be changed.
      
      1) Now memory hotplug calls setup_per_zone_wmark_min only when
         online_pages called, not offline_pages.
      
         It breaks balance.
      
      2) If zone->present_pages is changed, we also have to change
         zone->inactive_ratio.  That's because inactive_ratio depends on
         zone->present_pages.
      Signed-off-by: default avatarMinchan Kim <minchan.kim@gmail.com>
      Acked-by: default avatarYasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bce7394a
    • Minchan Kim's avatar
      page-allocator: add inactive ratio calculation function of each zone · 96cb4df5
      Minchan Kim authored
      Factor the per-zone arithemetic inside setup_per_zone_inactive_ratio()'s
      loop into a a separate function, calculate_zone_inactive_ratio().  This
      function will be used in a later patch
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarMinchan Kim <minchan.kim@gmail.com>
      Reviewed-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      96cb4df5
    • Minchan Kim's avatar
      page-allocator: clean up functions related to pages_min · bc75d33f
      Minchan Kim authored
      Change the names of two functions. It doesn't affect behavior.
      
      Presently, setup_per_zone_pages_min() changes low, high of zone as well as
      min.  So a better name is setup_per_zone_wmarks().  That's because Mel
      changed zone->pages_[hig/low/min] to zone->watermark array in "page
      allocator: replace the watermark-related union in struct zone with a
      watermark[] array".
      
       * setup_per_zone_pages_min => setup_per_zone_wmarks
      
      Of course, we have to change init_per_zone_pages_min, too.  There are not
      pages_min any more.
      
       * init_per_zone_pages_min => init_per_zone_wmark_min
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarMinchan Kim <minchan.kim@gmail.com>
      Acked-by: default avatarMel Gorman <mel@csn.ul.ie>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bc75d33f
    • Christoph Lameter's avatar
      page-allocator: use integer fields lookup for gfp_zone and check for errors in... · b70d94ee
      Christoph Lameter authored
      page-allocator: use integer fields lookup for gfp_zone and check for errors in flags passed to the page allocator
      
      This simplifies the code in gfp_zone() and also keeps the ability of the
      compiler to use constant folding to get rid of gfp_zone processing.
      
      The lookup of the zone is done using a bitfield stored in an integer.  So
      the code in gfp_zone is a simple extraction of bits from a constant
      bitfield.  The compiler is generating a load of a constant into a register
      and then performs a shift and mask operation to get the zone from a gfp_t.
       No cachelines are touched and no branches have to be predicted by the
      compiler.
      
      We are doing some macro tricks here to convince the compiler to always do
      the constant folding if possible.
      Signed-off-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: default avatarMel Gorman <mel@csn.ul.ie>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b70d94ee
    • Matthew Wilcox's avatar
      mm: check the argument of kunmap on architectures without highmem · 31c91132
      Matthew Wilcox authored
      If you're using a non-highmem architecture, passing an argument with the
      wrong type to kunmap() doesn't give you a warning because the ifdef
      doesn't check the type.
      
      Using a static inline function solves the problem nicely.
      Reported-by: default avatarDavid Woodhouse <dwmw2@infradead.org>
      Signed-off-by: default avatarMatthew Wilcox <willy@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      31c91132
    • MinChan Kim's avatar
      vmscan: prevent shrinking of active anon lru list in case of no swap space V3 · 69c85481
      MinChan Kim authored
      shrink_zone() can deactivate active anon pages even if we don't have a
      swap device.  Many embedded products don't have a swap device.  So the
      deactivation of anon pages is unnecessary.
      
      This patch prevents unnecessary deactivation of anon lru pages.  But, it
      don't prevent aging of anon pages to swap out.
      Signed-off-by: default avatarMinchan Kim <minchan.kim@gmail.com>
      Acked-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      69c85481
    • Brice Goglin's avatar
      migration: only migrate_prep() once per move_pages() · 35282a2d
      Brice Goglin authored
      migrate_prep() is fairly expensive (72us on 16-core barcelona 1.9GHz).
      Commit 3140a227 improved move_pages()
      throughput by breaking it into chunks, but it also made migrate_prep() be
      called once per chunk (every 128pages or so) instead of once per
      move_pages().
      
      This patch reverts to calling migrate_prep() only once per chunk as we did
      before 2.6.29.  It is also a followup to commit
      0aedadf9 ("mm: move migrate_prep out from
      under mmap_sem").
      
      This improves migration throughput on the above machine from 600MB/s to
      750MB/s.
      Signed-off-by: default avatarBrice Goglin <Brice.Goglin@inria.fr>
      Acked-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Reviewed-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      35282a2d
    • Rafael J. Wysocki's avatar
      mm, PM/Freezer: Disable OOM killer when tasks are frozen · 7f33d49a
      Rafael J. Wysocki authored
      Currently, the following scenario appears to be possible in theory:
      
      * Tasks are frozen for hibernation or suspend.
      * Free pages are almost exhausted.
      * Certain piece of code in the suspend code path attempts to allocate
        some memory using GFP_KERNEL and allocation order less than or
        equal to PAGE_ALLOC_COSTLY_ORDER.
      * __alloc_pages_internal() cannot find a free page so it invokes the
        OOM killer.
      * The OOM killer attempts to kill a task, but the task is frozen, so
        it doesn't die immediately.
      * __alloc_pages_internal() jumps to 'restart', unsuccessfully tries
        to find a free page and invokes the OOM killer.
      * No progress can be made.
      
      Although it is now hard to trigger during hibernation due to the memory
      shrinking carried out by the hibernation code, it is theoretically
      possible to trigger during suspend after the memory shrinking has been
      removed from that code path.  Moreover, since memory allocations are
      going to be used for the hibernation memory shrinking, it will be even
      more likely to happen during hibernation.
      
      To prevent it from happening, introduce the oom_killer_disabled switch
      that will cause __alloc_pages_internal() to fail in the situations in
      which the OOM killer would have been called and make the freezer set
      this switch after tasks have been successfully frozen.
      
      [akpm@linux-foundation.org: be nicer to the namespace]
      Signed-off-by: default avatarRafael J. Wysocki <rjw@sisk.pl>
      Cc: Fengguang Wu <fengguang.wu@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Acked-by: default avatarPavel Machek <pavel@ucw.cz>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7f33d49a
    • Nick Piggin's avatar
      mm: madvise(): correct return code · 75927af8
      Nick Piggin authored
      The posix_madvise() function succeeds (and does nothing) when called with
      parameters (NULL, 0, -1); according to LSB tests, it should fail with
      EINVAL because -1 is not a valid flag.
      
      When called with a valid address and size, it correctly fails.
      
      So perform an initial check for valid flags first.
      Reported-by: default avatarJiri Dluhos <jdluhos@novell.com>
      Signed-off-by: default avatarNick Piggin <npiggin@suse.de>
      Reviewed-and-Tested-by: default avatarWANG Cong <xiyou.wangcong@gmail.com>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      75927af8
    • Andrew Morton's avatar
      page-allocator: warn if __GFP_NOFAIL is used for a large allocation · dab48dab
      Andrew Morton authored
      __GFP_NOFAIL is a bad fiction.  Allocations _can_ fail, and callers should
      detect and suitably handle this (and not by lamely moving the infinite
      loop up to the caller level either).
      
      Attempting to use __GFP_NOFAIL for a higher-order allocation is even
      worse, so add a once-off runtime check for this to slap people around for
      even thinking about trying it.
      
      Cc: David Rientjes <rientjes@google.com>
      Acked-by: default avatarMel Gorman <mel@csn.ul.ie>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dab48dab
    • Magnus Damm's avatar
      videobuf-dma-contig: zero copy USERPTR support · 720b17e7
      Magnus Damm authored
      Since videobuf-dma-contig is designed to handle physically contiguous
      memory, this patch modifies the videobuf-dma-contig code to only accept a
      user space pointer to physically contiguous memory.  For now only
      VM_PFNMAP vmas are supported, so forget hotplug.
      
      On SuperH Mobile we use this with our sh_mobile_ceu_camera driver together
      with various multimedia accelerator blocks that are exported to user space
      using UIO.  The UIO kernel code exports physically contiguous memory to
      user space and lets the user space application mmap() this memory and pass
      a pointer using the USERPTR interface for V4L2 zero copy operation.
      
      With this approach we support zero copy capture, hardware scaling and
      various forms of hardware encoding and decoding.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarMagnus Damm <damm@igel.co.jp>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Acked-by: default avatarMauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Hans Verkuil <hverkuil@xs4all.nl>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      720b17e7
    • Johannes Weiner's avatar
      mm: introduce follow_pfn() · 3b6748e2
      Johannes Weiner authored
      Analoguous to follow_phys(), add a helper that looks up the PFN at a
      user virtual address in an IO mapping or a raw PFN mapping.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Acked-by: default avatarMagnus Damm <magnus.damm@gmail.com>
      Cc: Hans Verkuil <hverkuil@xs4all.nl>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3b6748e2
    • Johannes Weiner's avatar
      mm: use generic follow_pte() in follow_phys() · 03668a4d
      Johannes Weiner authored
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Acked-by: default avatarMagnus Damm <magnus.damm@gmail.com>
      Cc: Hans Verkuil <hverkuil@xs4all.nl>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      03668a4d
    • Johannes Weiner's avatar
      mm: introduce follow_pte() · f8ad0f49
      Johannes Weiner authored
      A generic readonly page table lookup helper to map an address space and an
      address from it to a pte.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Acked-by: default avatarMagnus Damm <magnus.damm@gmail.com>
      Cc: Hans Verkuil <hverkuil@xs4all.nl>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f8ad0f49
    • Cyrill Gorcunov's avatar
      mm: setup_per_zone_inactive_ratio - fix comment and make it __init · e9bb35df
      Cyrill Gorcunov authored
      The caller of setup_per_zone_inactive_ratio is an __init function.  There
      is no need to keep the callee after it completed as well.  Also fix a
      comment.
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Reviewed-by: default avatarMinchan Kim <minchan.kim@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e9bb35df
    • Cyrill Gorcunov's avatar
      mm: setup_per_zone_inactive_ratio - do not call for int_sqrt if not needed · 5c87eada
      Cyrill Gorcunov authored
      int_sqrt() returns 0 if its argument is zero so call it if only needed.
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5c87eada
    • Wu Fengguang's avatar
      vmscan: ZVC updates in shrink_active_list() can be done once · af166777
      Wu Fengguang authored
      This effectively lifts the unit of updates to nr_inactive_* and
      pgdeactivate from PAGEVEC_SIZE=14 to SWAP_CLUSTER_MAX=32, or
      MAX_ORDER_NR_PAGES=1024 for reclaim_zone().
      
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Reviewed-by: default avatarMinchan Kim <minchan.kim@gmail.com>
      Signed-off-by: default avatarWu Fengguang <fengguang.wu@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      af166777
    • Wu Fengguang's avatar
      vmscan: don't export nr_saved_scan in /proc/zoneinfo · 08d9ae7c
      Wu Fengguang authored
      The lru->nr_saved_scan's are not meaningful counters for even kernel
      developers.  They typically are smaller than 32 and are always 0 for large
      lists.  So remove them from /proc/zoneinfo.
      
      Hopefully this interface change won't break too many scripts.
      /proc/zoneinfo is too unstructured to be script friendly, and I wonder the
      affected scripts - if there are any - are still bleeding since the not
      long ago commit "vmscan: split LRU lists into anon & file sets", which
      also touched the "scanned" line :)
      
      If we are to re-export accumulated vmscan counts in the future, they can
      go to new lines in /proc/zoneinfo instead of the current form, or to
      /sys/devices/system/node/node0/meminfo?
      Signed-off-by: default avatarWu Fengguang <fengguang.wu@intel.com>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Acked-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      08d9ae7c
    • Wu Fengguang's avatar
      vmscan: cleanup the scan batching code · 6e08a369
      Wu Fengguang authored
      The vmscan batching logic is twisting.  Move it into a standalone function
      nr_scan_try_batch() and document it.  No behavior change.
      Signed-off-by: default avatarWu Fengguang <fengguang.wu@intel.com>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6e08a369
    • Rik van Riel's avatar
      vmscan: evict use-once pages first · 56e49d21
      Rik van Riel authored
      When the file LRU lists are dominated by streaming IO pages, evict those
      pages first, before considering evicting other pages.
      
      This should be safe from deadlocks or performance problems
      because only three things can happen to an inactive file page:
      
      1) referenced twice and promoted to the active list
      2) evicted by the pageout code
      3) under IO, after which it will get evicted or promoted
      
      The pages freed in this way can either be reused for streaming IO, or
      allocated for something else.  If the pages are used for streaming IO,
      this pageout pattern continues.  Otherwise, we will fall back to the
      normal pageout pattern.
      Signed-off-by: default avatarRik van Riel <riel@redhat.com>
      Reported-by: default avatarElladan <elladan@eskimo.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      56e49d21
    • Wu Fengguang's avatar
      pagemap: add page-types tool · 35efa5e9
      Wu Fengguang authored
      Add page-types, a handy tool for querying page flags.
      
      It will expand some of the overloaded flags:
      	PG_slob_free   = PG_private
      	PG_slub_frozen = PG_active
      	PG_slub_debug  = PG_error
      	PG_readahead   = PG_reclaim
      
      and mask out obscure flags except in -raw mode:
      	PG_reserved
      	PG_mlocked
      	PG_mappedtodisk
      	PG_private
      	PG_private_2
      	PG_owner_priv_1
      	PG_arch_1
      	PG_uncached
      	PG_compound* for non hugeTLB pages
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: default avatarWu Fengguang <fengguang.wu@intel.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      35efa5e9
    • Wu Fengguang's avatar
      pagemap: document 9 more exported page flags · 17e89501
      Wu Fengguang authored
      Also add short descriptions for all of the 20 exported page flags.
      Signed-off-by: default avatarWu Fengguang <fengguang.wu@intel.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      17e89501
    • Wu Fengguang's avatar
      pagemap: document clarifications · c9ba78e2
      Wu Fengguang authored
      Some bit ranges were inclusive and some not.  Fix them to be consistently
      inclusive.
      Signed-off-by: default avatarWu Fengguang <fengguang.wu@intel.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c9ba78e2
    • Wu Fengguang's avatar
      proc: export more page flags in /proc/kpageflags · 17797549
      Wu Fengguang authored
      Export all page flags faithfully in /proc/kpageflags.
      
      	11. KPF_MMAP		(pseudo flag) memory mapped page
      	12. KPF_ANON		(pseudo flag) memory mapped page (anonymous)
      	13. KPF_SWAPCACHE	page is in swap cache
      	14. KPF_SWAPBACKED	page is swap/RAM backed
      	15. KPF_COMPOUND_HEAD	(*)
      	16. KPF_COMPOUND_TAIL	(*)
      	17. KPF_HUGE		hugeTLB pages
      	18. KPF_UNEVICTABLE	page is in the unevictable LRU list
      	19. KPF_HWPOISON(TBD)	hardware detected corruption
      	20. KPF_NOPAGE		(pseudo flag) no page frame at the address
      	32-39.			more obscure flags for kernel developers
      
      	(*) For compound pages, exporting _both_ head/tail info enables
      	    users to tell where a compound page starts/ends, and its order.
      
      The accompanying page-types tool will handle the details like decoupling
      overloaded flags and hiding obscure flags to normal users.
      
      Thanks to KOSAKI and Andi for their valuable recommendations!
      Signed-off-by: default avatarWu Fengguang <fengguang.wu@intel.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      17797549
    • Wu Fengguang's avatar
      proc: kpagecount/kpageflags code cleanup · ed7ce0f1
      Wu Fengguang authored
      Move increments of pfn/out to bottom of the loop.
      Signed-off-by: default avatarWu Fengguang <fengguang.wu@intel.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Acked-by: default avatarMatt Mackall <mpm@selenic.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ed7ce0f1
    • Wu Fengguang's avatar
      mm: introduce PageHuge() for testing huge/gigantic pages · 20a0307c
      Wu Fengguang authored
      A series of patches to enhance the /proc/pagemap interface and to add a
      userspace executable which can be used to present the pagemap data.
      
      Export 10 more flags to end users (and more for kernel developers):
      
              11. KPF_MMAP            (pseudo flag) memory mapped page
              12. KPF_ANON            (pseudo flag) memory mapped page (anonymous)
              13. KPF_SWAPCACHE       page is in swap cache
              14. KPF_SWAPBACKED      page is swap/RAM backed
              15. KPF_COMPOUND_HEAD   (*)
              16. KPF_COMPOUND_TAIL   (*)
              17. KPF_HUGE		hugeTLB pages
              18. KPF_UNEVICTABLE     page is in the unevictable LRU list
              19. KPF_HWPOISON        hardware detected corruption
              20. KPF_NOPAGE          (pseudo flag) no page frame at the address
      
              (*) For compound pages, exporting _both_ head/tail info enables
                  users to tell where a compound page starts/ends, and its order.
      
      a simple demo of the page-types tool
      
      # ./page-types -h
      page-types [options]
                  -r|--raw                  Raw mode, for kernel developers
                  -a|--addr    addr-spec    Walk a range of pages
                  -b|--bits    bits-spec    Walk pages with specified bits
                  -l|--list                 Show page details in ranges
                  -L|--list-each            Show page details one by one
                  -N|--no-summary           Don't show summay info
                  -h|--help                 Show this usage message
      addr-spec:
                  N                         one page at offset N (unit: pages)
                  N+M                       pages range from N to N+M-1
                  N,M                       pages range from N to M-1
                  N,                        pages range from N to end
                  ,M                        pages range from 0 to M
      bits-spec:
                  bit1,bit2                 (flags & (bit1|bit2)) != 0
                  bit1,bit2=bit1            (flags & (bit1|bit2)) == bit1
                  bit1,~bit2                (flags & (bit1|bit2)) == bit1
                  =bit1,bit2                flags == (bit1|bit2)
      bit-names:
                locked              error         referenced           uptodate
                 dirty                lru             active               slab
             writeback            reclaim              buddy               mmap
             anonymous          swapcache         swapbacked      compound_head
         compound_tail               huge        unevictable           hwpoison
                nopage           reserved(r)         mlocked(r)    mappedtodisk(r)
               private(r)       private_2(r)   owner_private(r)            arch(r)
              uncached(r)       readahead(o)       slob_free(o)     slub_frozen(o)
            slub_debug(o)
                                         (r) raw mode bits  (o) overloaded bits
      
      # ./page-types
                   flags      page-count       MB  symbolic-flags                     long-symbolic-flags
      0x0000000000000000          487369     1903  _________________________________
      0x0000000000000014               5        0  __R_D____________________________  referenced,dirty
      0x0000000000000020               1        0  _____l___________________________  lru
      0x0000000000000024              34        0  __R__l___________________________  referenced,lru
      0x0000000000000028            3838       14  ___U_l___________________________  uptodate,lru
      0x0001000000000028              48        0  ___U_l_______________________I___  uptodate,lru,readahead
      0x000000000000002c            6478       25  __RU_l___________________________  referenced,uptodate,lru
      0x000100000000002c              47        0  __RU_l_______________________I___  referenced,uptodate,lru,readahead
      0x0000000000000040            8344       32  ______A__________________________  active
      0x0000000000000060               1        0  _____lA__________________________  lru,active
      0x0000000000000068             348        1  ___U_lA__________________________  uptodate,lru,active
      0x0001000000000068              12        0  ___U_lA______________________I___  uptodate,lru,active,readahead
      0x000000000000006c             988        3  __RU_lA__________________________  referenced,uptodate,lru,active
      0x000100000000006c              48        0  __RU_lA______________________I___  referenced,uptodate,lru,active,readahead
      0x0000000000004078               1        0  ___UDlA_______b__________________  uptodate,dirty,lru,active,swapbacked
      0x000000000000407c              34        0  __RUDlA_______b__________________  referenced,uptodate,dirty,lru,active,swapbacked
      0x0000000000000400             503        1  __________B______________________  buddy
      0x0000000000000804               1        0  __R________M_____________________  referenced,mmap
      0x0000000000000828            1029        4  ___U_l_____M_____________________  uptodate,lru,mmap
      0x0001000000000828              43        0  ___U_l_____M_________________I___  uptodate,lru,mmap,readahead
      0x000000000000082c             382        1  __RU_l_____M_____________________  referenced,uptodate,lru,mmap
      0x000100000000082c              12        0  __RU_l_____M_________________I___  referenced,uptodate,lru,mmap,readahead
      0x0000000000000868             192        0  ___U_lA____M_____________________  uptodate,lru,active,mmap
      0x0001000000000868              12        0  ___U_lA____M_________________I___  uptodate,lru,active,mmap,readahead
      0x000000000000086c             800        3  __RU_lA____M_____________________  referenced,uptodate,lru,active,mmap
      0x000100000000086c              31        0  __RU_lA____M_________________I___  referenced,uptodate,lru,active,mmap,readahead
      0x0000000000004878               2        0  ___UDlA____M__b__________________  uptodate,dirty,lru,active,mmap,swapbacked
      0x0000000000001000             492        1  ____________a____________________  anonymous
      0x0000000000005808               4        0  ___U_______Ma_b__________________  uptodate,mmap,anonymous,swapbacked
      0x0000000000005868            2839       11  ___U_lA____Ma_b__________________  uptodate,lru,active,mmap,anonymous,swapbacked
      0x000000000000586c              30        0  __RU_lA____Ma_b__________________  referenced,uptodate,lru,active,mmap,anonymous,swapbacked
                   total          513968     2007
      
      # ./page-types -r
                   flags      page-count       MB  symbolic-flags                     long-symbolic-flags
      0x0000000000000000          468002     1828  _________________________________
      0x0000000100000000           19102       74  _____________________r___________  reserved
      0x0000000000008000              41        0  _______________H_________________  compound_head
      0x0000000000010000             188        0  ________________T________________  compound_tail
      0x0000000000008014               1        0  __R_D__________H_________________  referenced,dirty,compound_head
      0x0000000000010014               4        0  __R_D___________T________________  referenced,dirty,compound_tail
      0x0000000000000020               1        0  _____l___________________________  lru
      0x0000000800000024              34        0  __R__l__________________P________  referenced,lru,private
      0x0000000000000028            3794       14  ___U_l___________________________  uptodate,lru
      0x0001000000000028              46        0  ___U_l_______________________I___  uptodate,lru,readahead
      0x0000000400000028              44        0  ___U_l_________________d_________  uptodate,lru,mappedtodisk
      0x0001000400000028               2        0  ___U_l_________________d_____I___  uptodate,lru,mappedtodisk,readahead
      0x000000000000002c            6434       25  __RU_l___________________________  referenced,uptodate,lru
      0x000100000000002c              47        0  __RU_l_______________________I___  referenced,uptodate,lru,readahead
      0x000000040000002c              14        0  __RU_l_________________d_________  referenced,uptodate,lru,mappedtodisk
      0x000000080000002c              30        0  __RU_l__________________P________  referenced,uptodate,lru,private
      0x0000000800000040            8124       31  ______A_________________P________  active,private
      0x0000000000000040             219        0  ______A__________________________  active
      0x0000000800000060               1        0  _____lA_________________P________  lru,active,private
      0x0000000000000068             322        1  ___U_lA__________________________  uptodate,lru,active
      0x0001000000000068              12        0  ___U_lA______________________I___  uptodate,lru,active,readahead
      0x0000000400000068              13        0  ___U_lA________________d_________  uptodate,lru,active,mappedtodisk
      0x0000000800000068              12        0  ___U_lA_________________P________  uptodate,lru,active,private
      0x000000000000006c             977        3  __RU_lA__________________________  referenced,uptodate,lru,active
      0x000100000000006c              48        0  __RU_lA______________________I___  referenced,uptodate,lru,active,readahead
      0x000000040000006c               5        0  __RU_lA________________d_________  referenced,uptodate,lru,active,mappedtodisk
      0x000000080000006c               3        0  __RU_lA_________________P________  referenced,uptodate,lru,active,private
      0x0000000c0000006c               3        0  __RU_lA________________dP________  referenced,uptodate,lru,active,mappedtodisk,private
      0x0000000c00000068               1        0  ___U_lA________________dP________  uptodate,lru,active,mappedtodisk,private
      0x0000000000004078               1        0  ___UDlA_______b__________________  uptodate,dirty,lru,active,swapbacked
      0x000000000000407c              34        0  __RUDlA_______b__________________  referenced,uptodate,dirty,lru,active,swapbacked
      0x0000000000000400             538        2  __________B______________________  buddy
      0x0000000000000804               1        0  __R________M_____________________  referenced,mmap
      0x0000000000000828            1029        4  ___U_l_____M_____________________  uptodate,lru,mmap
      0x0001000000000828              43        0  ___U_l_____M_________________I___  uptodate,lru,mmap,readahead
      0x000000000000082c             382        1  __RU_l_____M_____________________  referenced,uptodate,lru,mmap
      0x000100000000082c              12        0  __RU_l_____M_________________I___  referenced,uptodate,lru,mmap,readahead
      0x0000000000000868             192        0  ___U_lA____M_____________________  uptodate,lru,active,mmap
      0x0001000000000868              12        0  ___U_lA____M_________________I___  uptodate,lru,active,mmap,readahead
      0x000000000000086c             800        3  __RU_lA____M_____________________  referenced,uptodate,lru,active,mmap
      0x000100000000086c              31        0  __RU_lA____M_________________I___  referenced,uptodate,lru,active,mmap,readahead
      0x0000000000004878               2        0  ___UDlA____M__b__________________  uptodate,dirty,lru,active,mmap,swapbacked
      0x0000000000001000             492        1  ____________a____________________  anonymous
      0x0000000000005008               2        0  ___U________a_b__________________  uptodate,anonymous,swapbacked
      0x0000000000005808               4        0  ___U_______Ma_b__________________  uptodate,mmap,anonymous,swapbacked
      0x000000000000580c               1        0  __RU_______Ma_b__________________  referenced,uptodate,mmap,anonymous,swapbacked
      0x0000000000005868            2839       11  ___U_lA____Ma_b__________________  uptodate,lru,active,mmap,anonymous,swapbacked
      0x000000000000586c              29        0  __RU_lA____Ma_b__________________  referenced,uptodate,lru,active,mmap,anonymous,swapbacked
                   total          513968     2007
      
      # ./page-types --raw --list --no-summary --bits reserved
      offset  count   flags
      0       15      _____________________r___________
      31      4       _____________________r___________
      159     97      _____________________r___________
      4096    2067    _____________________r___________
      6752    2390    _____________________r___________
      9355    3       _____________________r___________
      9728    14526   _____________________r___________
      
      This patch:
      
      Introduce PageHuge(), which identifies huge/gigantic pages by their
      dedicated compound destructor functions.
      
      Also move prep_compound_gigantic_page() to hugetlb.c and make
      __free_pages_ok() non-static.
      Signed-off-by: default avatarWu Fengguang <fengguang.wu@intel.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      20a0307c
    • Mel Gorman's avatar
      mm: use alloc_pages_exact() in alloc_large_system_hash() to avoid duplicated logic · a1dd268c
      Mel Gorman authored
      alloc_large_system_hash() has logic for freeing pages at the end of an
      excessively large power-of-two buffer that is a duplicate of what is in
      alloc_pages_exact().  This patch converts alloc_large_system_hash() to use
      alloc_pages_exact().
      Signed-off-by: default avatarMel Gorman <mel@csn.ul.ie>
      Acked-by: default avatarHugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a1dd268c
    • Mel Gorman's avatar
      page allocator: sanity check order in the page allocator slow path · 72807a74
      Mel Gorman authored
      Callers may speculatively call different allocators in order of preference
      trying to allocate a buffer of a given size.  The order needed to allocate
      this may be larger than what the page allocator can normally handle.
      While the allocator mostly does the right thing, it should not direct
      reclaim or wakeup kswapd with a bogus order.  This patch sanity checks the
      order in the slow path and returns NULL if it is too large.
      Signed-off-by: default avatarMel Gorman <mel@csn.ul.ie>
      Signed-off-by: default avatarDave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      72807a74
    • KOSAKI Motohiro's avatar
      page allocator: move free_page_mlock() to page_alloc.c · 092cead6
      KOSAKI Motohiro authored
      Currently, free_page_mlock() is only called from page_alloc.c.  Thus, we
      can move it to page_alloc.c.
      
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      092cead6
    • Mel Gorman's avatar
      page allocator: slab: use nr_online_nodes to check for a NUMA platform · b6e68bc1
      Mel Gorman authored
      SLAB currently avoids checking a bitmap repeatedly by checking once and
      storing a flag.  When the addition of nr_online_nodes as a cheaper version
      of num_online_nodes(), this check can be replaced by nr_online_nodes.
      
      (Christoph did a patch that this is lifted almost verbatim from)
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      Signed-off-by: default avatarMel Gorman <mel@csn.ul.ie>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: default avatarPekka Enberg <penberg@cs.helsinki.fi>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b6e68bc1
    • Christoph Lameter's avatar
      page allocator: use a pre-calculated value instead of num_online_nodes() in fast paths · 62bc62a8
      Christoph Lameter authored
      num_online_nodes() is called in a number of places but most often by the
      page allocator when deciding whether the zonelist needs to be filtered
      based on cpusets or the zonelist cache.  This is actually a heavy function
      and touches a number of cache lines.
      
      This patch stores the number of online nodes at boot time and updates the
      value when nodes get onlined and offlined.  The value is then used in a
      number of important paths in place of num_online_nodes().
      
      [rientjes@google.com: do not override definition of node_set_online() with macro]
      Signed-off-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Signed-off-by: default avatarMel Gorman <mel@csn.ul.ie>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      62bc62a8
    • Mel Gorman's avatar
      page allocator: get the pageblock migratetype without disabling interrupts · 974709bd
      Mel Gorman authored
      Local interrupts are disabled when freeing pages to the PCP list.  Part of
      that free checks what the migratetype of the pageblock the page is in but
      it checks this with interrupts disabled and interupts should never be
      disabled longer than necessary.  This patch checks the pagetype with
      interrupts enabled with the impact that it is possible a page is freed to
      the wrong list when a pageblock changes type.  As that block is now
      already considered mixed from an anti-fragmentation perspective, it's not
      of vital importance.
      Signed-off-by: default avatarMel Gorman <mel@csn.ul.ie>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      974709bd
    • Mel Gorman's avatar
      page allocator: update NR_FREE_PAGES only as necessary · f2260e6b
      Mel Gorman authored
      When pages are being freed to the buddy allocator, the zone NR_FREE_PAGES
      counter must be updated.  In the case of bulk per-cpu page freeing, it's
      updated once per page.  This retouches cache lines more than necessary.
      Update the counters one per per-cpu bulk free.
      Signed-off-by: default avatarMel Gorman <mel@csn.ul.ie>
      Reviewed-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Reviewed-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f2260e6b
    • Mel Gorman's avatar
      page allocator: use allocation flags as an index to the zone watermark · 41858966
      Mel Gorman authored
      ALLOC_WMARK_MIN, ALLOC_WMARK_LOW and ALLOC_WMARK_HIGH determin whether
      pages_min, pages_low or pages_high is used as the zone watermark when
      allocating the pages.  Two branches in the allocator hotpath determine
      which watermark to use.
      
      This patch uses the flags as an array index into a watermark array that is
      indexed with WMARK_* defines accessed via helpers.  All call sites that
      use zone->pages_* are updated to use the helpers for accessing the values
      and the array offsets for setting.
      Signed-off-by: default avatarMel Gorman <mel@csn.ul.ie>
      Reviewed-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      41858966
    • Nick Piggin's avatar
      page allocator: do not check for compound pages during the page allocator sanity checks · a3af9c38
      Nick Piggin authored
      A number of sanity checks are made on each page allocation and free
      including that the page count is zero.  page_count() checks for compound
      pages and checks the count of the head page if true.  However, in these
      paths, we do not care if the page is compound or not as the count of each
      tail page should also be zero.
      
      This patch makes two changes to the use of page_count() in the free path.
      It converts one check of page_count() to a VM_BUG_ON() as the count should
      have been unconditionally checked earlier in the free path.  It also
      avoids checking for compound pages.
      
      [mel@csn.ul.ie: Wrote changelog]
      Signed-off-by: default avatarMel Gorman <mel@csn.ul.ie>
      Signed-off-by: default avatarNick Piggin <nickpiggin@yahoo.com.au>
      Reviewed-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a3af9c38