1. 24 Feb, 2013 8 commits
    • Johannes Weiner's avatar
      mm: vmscan: clarify how swappiness, highest priority, memcg interact · 10316b31
      Johannes Weiner authored
      A swappiness of 0 has a slightly different meaning for global reclaim
      (may swap if file cache really low) and memory cgroup reclaim (never
      swap, ever).
      
      In addition, global reclaim at highest priority will scan all LRU lists
      equal to their size and ignore other balancing heuristics.  UNLESS
      swappiness forbids swapping, then the lists are balanced based on recent
      reclaim effectiveness.  UNLESS file cache is running low, then anonymous
      pages are force-scanned.
      
      This (total mess of a) behaviour is implicit and not obvious from the
      way the code is organized.  At least make it apparent in the code flow
      and document the conditions.  It will be it easier to come up with sane
      semantics later.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Reviewed-by: default avatarSatoru Moriya <satoru.moriya@hds.com>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Simon Jeons <simon.jeons@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      10316b31
    • Johannes Weiner's avatar
      mm: vmscan: save work scanning (almost) empty LRU lists · d778df51
      Johannes Weiner authored
      In certain cases (kswapd reclaim, memcg target reclaim), a fixed minimum
      amount of pages is scanned from the LRU lists on each iteration, to make
      progress.
      
      Do not make this minimum bigger than the respective LRU list size,
      however, and save some busy work trying to isolate and reclaim pages
      that are not there.
      
      Empty LRU lists are quite common with memory cgroups in NUMA
      environments because there exists a set of LRU lists for each zone for
      each memory cgroup, while the memory of a single cgroup is expected to
      stay on just one node.  The number of expected empty LRU lists is thus
      
        memcgs * (nodes - 1) * lru types
      
      Each attempt to reclaim from an empty LRU list does expensive size
      comparisons between lists, acquires the zone's lru lock etc.  Avoid
      that.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Satoru Moriya <satoru.moriya@hds.com>
      Cc: Simon Jeons <simon.jeons@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d778df51
    • Johannes Weiner's avatar
      mm: memcg: only evict file pages when we have plenty · 7c5bd705
      Johannes Weiner authored
      Commit e9868505 ("mm, vmscan: only evict file pages when we have
      plenty") makes a point of not going for anonymous memory while there is
      still enough inactive cache around.
      
      The check was added only for global reclaim, but it is just as useful to
      reduce swapping in memory cgroup reclaim:
      
          200M-memcg-defconfig-j2
      
                                           vanilla                   patched
          Real time              454.06 (  +0.00%)         453.71 (  -0.08%)
          User time              668.57 (  +0.00%)         668.73 (  +0.02%)
          System time            128.92 (  +0.00%)         129.53 (  +0.46%)
          Swap in               1246.80 (  +0.00%)         814.40 ( -34.65%)
          Swap out              1198.90 (  +0.00%)         827.00 ( -30.99%)
          Pages allocated   16431288.10 (  +0.00%)    16434035.30 (  +0.02%)
          Major faults           681.50 (  +0.00%)         593.70 ( -12.86%)
          THP faults             237.20 (  +0.00%)         242.40 (  +2.18%)
          THP collapse           241.20 (  +0.00%)         248.50 (  +3.01%)
          THP splits             157.30 (  +0.00%)         161.40 (  +2.59%)
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Satoru Moriya <satoru.moriya@hds.com>
      Cc: Simon Jeons <simon.jeons@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7c5bd705
    • Srinivas Pandruvada's avatar
      CMA: make putback_lru_pages() call conditional · 2a6f5124
      Srinivas Pandruvada authored
      As per documentation and other places calling putback_lru_pages(),
      putback_lru_pages() is called on error only.  Make the CMA code behave
      consistently.
      
      [akpm@linux-foundation.org: remove a test-n-branch in the wrapup code]
      Signed-off-by: default avatarSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Acked-by: default avatarMichal Nazarewicz <mina86@mina86.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2a6f5124
    • Andrew Morton's avatar
      mm/hugetlb.c: convert to pr_foo() · ffb22af5
      Andrew Morton authored
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: default avatarHillf Danton <dhillf@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ffb22af5
    • Andrew Morton's avatar
      mm/memcontrol.c: convert printk(KERN_FOO) to pr_foo() · d045197f
      Andrew Morton authored
      Acked-by: default avatarSha Zhengju <handai.szj@taobao.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d045197f
    • Sha Zhengju's avatar
      memcg, oom: provide more precise dump info while memcg oom happening · 58cf188e
      Sha Zhengju authored
      Currently when a memcg oom is happening the oom dump messages is still
      global state and provides few useful info for users.  This patch prints
      more pointed memcg page statistics for memcg-oom and take hierarchy into
      consideration:
      
      Based on Michal's advice, we take hierarchy into consideration: supppose
      we trigger an OOM on A's limit
      
              root_memcg
                  |
                  A (use_hierachy=1)
                 / \
                B   C
                |
                D
      then the printed info will be:
      
        Memory cgroup stats for /A:...
        Memory cgroup stats for /A/B:...
        Memory cgroup stats for /A/C:...
        Memory cgroup stats for /A/B/D:...
      
      Following are samples of oom output:
      
      (1) Before change:
      
          mal-80 invoked oom-killer:gfp_mask=0xd0, order=0, oom_score_adj=0
          mal-80 cpuset=/ mems_allowed=0
          Pid: 2976, comm: mal-80 Not tainted 3.7.0+ #10
          Call Trace:
           [<ffffffff8167fbfb>] dump_header+0x83/0x1ca
           ..... (call trace)
           [<ffffffff8168a818>] page_fault+0x28/0x30
                                   <<<<<<<<<<<<<<<<<<<<< memcg specific information
          Task in /A/B/D killed as a result of limit of /A
          memory: usage 101376kB, limit 101376kB, failcnt 57
          memory+swap: usage 101376kB, limit 101376kB, failcnt 0
          kmem: usage 0kB, limit 9007199254740991kB, failcnt 0
                                   <<<<<<<<<<<<<<<<<<<<< print per cpu pageset stat
          Mem-Info:
          Node 0 DMA per-cpu:
          CPU    0: hi:    0, btch:   1 usd:   0
          ......
          CPU    3: hi:    0, btch:   1 usd:   0
          Node 0 DMA32 per-cpu:
          CPU    0: hi:  186, btch:  31 usd: 173
          ......
          CPU    3: hi:  186, btch:  31 usd: 130
                                   <<<<<<<<<<<<<<<<<<<<< print global page state
          active_anon:92963 inactive_anon:40777 isolated_anon:0
           active_file:33027 inactive_file:51718 isolated_file:0
           unevictable:0 dirty:3 writeback:0 unstable:0
           free:729995 slab_reclaimable:6897 slab_unreclaimable:6263
           mapped:20278 shmem:35971 pagetables:5885 bounce:0
           free_cma:0
                                   <<<<<<<<<<<<<<<<<<<<< print per zone page state
          Node 0 DMA free:15836kB ... all_unreclaimable? no
          lowmem_reserve[]: 0 3175 3899 3899
          Node 0 DMA32 free:2888564kB ... all_unrelaimable? no
          lowmem_reserve[]: 0 0 724 724
          lowmem_reserve[]: 0 0 0 0
          Node 0 DMA: 1*4kB (U) ... 3*4096kB (M) = 15836kB
          Node 0 DMA32: 41*4kB (UM) ... 702*4096kB (MR) = 2888316kB
          120710 total pagecache pages
          0 pages in swap cache
                                   <<<<<<<<<<<<<<<<<<<<< print global swap cache stat
          Swap cache stats: add 0, delete 0, find 0/0
          Free swap  = 499708kB
          Total swap = 499708kB
          1040368 pages RAM
          58678 pages reserved
          169065 pages shared
          173632 pages non-shared
          [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
          [ 2693]     0  2693     6005     1324      17        0             0 god
          [ 2754]     0  2754     6003     1320      16        0             0 god
          [ 2811]     0  2811     5992     1304      18        0             0 god
          [ 2874]     0  2874     6005     1323      18        0             0 god
          [ 2935]     0  2935     8720     7742      21        0             0 mal-30
          [ 2976]     0  2976    21520    17577      42        0             0 mal-80
          Memory cgroup out of memory: Kill process 2976 (mal-80) score 665 or sacrifice child
          Killed process 2976 (mal-80) total-vm:86080kB, anon-rss:69964kB, file-rss:344kB
      
      We can see that messages dumped by show_free_areas() are longsome and can
      provide so limited info for memcg that just happen oom.
      
      (2) After change
          mal-80 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
          mal-80 cpuset=/ mems_allowed=0
          Pid: 2704, comm: mal-80 Not tainted 3.7.0+ #10
          Call Trace:
           [<ffffffff8167fd0b>] dump_header+0x83/0x1d1
           .......(call trace)
           [<ffffffff8168a918>] page_fault+0x28/0x30
          Task in /A/B/D killed as a result of limit of /A
                                   <<<<<<<<<<<<<<<<<<<<< memcg specific information
          memory: usage 102400kB, limit 102400kB, failcnt 140
          memory+swap: usage 102400kB, limit 102400kB, failcnt 0
          kmem: usage 0kB, limit 9007199254740991kB, failcnt 0
          Memory cgroup stats for /A: cache:32KB rss:30984KB mapped_file:0KB swap:0KB inactive_anon:6912KB active_anon:24072KB inactive_file:32KB active_file:0KB unevictable:0KB
          Memory cgroup stats for /A/B: cache:0KB rss:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
          Memory cgroup stats for /A/C: cache:0KB rss:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
          Memory cgroup stats for /A/B/D: cache:32KB rss:71352KB mapped_file:0KB swap:0KB inactive_anon:6656KB active_anon:64696KB inactive_file:16KB active_file:16KB unevictable:0KB
          [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
          [ 2260]     0  2260     6006     1325      18        0             0 god
          [ 2383]     0  2383     6003     1319      17        0             0 god
          [ 2503]     0  2503     6004     1321      18        0             0 god
          [ 2622]     0  2622     6004     1321      16        0             0 god
          [ 2695]     0  2695     8720     7741      22        0             0 mal-30
          [ 2704]     0  2704    21520    17839      43        0             0 mal-80
          Memory cgroup out of memory: Kill process 2704 (mal-80) score 669 or sacrifice child
          Killed process 2704 (mal-80) total-vm:86080kB, anon-rss:71016kB, file-rss:340kB
      
      This version provides more pointed info for memcg in "Memory cgroup stats
      for XXX" section.
      Signed-off-by: default avatarSha Zhengju <handai.szj@taobao.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      58cf188e
    • Andrew Morton's avatar
      drivers/md/persistent-data/dm-transaction-manager.c: rename HASH_SIZE · df855798
      Andrew Morton authored
      Fix the warning:
      
        drivers/md/persistent-data/dm-transaction-manager.c:28:1: warning: "HASH_SIZE" redefined
        In file included from include/linux/elevator.h:5,
                         from include/linux/blkdev.h:216,
                         from drivers/md/persistent-data/dm-block-manager.h:11,
                         from drivers/md/persistent-data/dm-transaction-manager.h:10,
                         from drivers/md/persistent-data/dm-transaction-manager.c:6:
        include/linux/hashtable.h:22:1: warning: this is the location of the previous definition
      
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      df855798
  2. 22 Feb, 2013 32 commits