    memcg: improve resource counter scalability · 0c3e73e8
    Balbir Singh authored
    Reduce the resource counter overhead (mostly spinlock) associated with the
    root cgroup.  This is one of several patches to reduce mem cgroup
    overhead.  I had posted other approaches earlier (including using percpu
    counters).  Those patches will be a natural addition and will be added
    iteratively on top of these.
    
    The patch stops resource counter accounting for the root cgroup.  The data
    for display is derived from the statistics we maintain via
    mem_cgroup_charge_statistics() (which is more scalable).  Today we do
    double accounting: once using res_counter_charge() and once using
    mem_cgroup_charge_statistics().  Since we no longer implement limits for
    the root, we don't need to track every charge via res_counter_charge(),
    check whether the limit is exceeded, and reclaim.
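
    The idea can be sketched in a small user-space model.  This is not the
    kernel code: struct group, struct counter, counter_charge() and
    group_charge() are hypothetical names chosen for illustration, and the
    lock_acquisitions field merely stands in for taking the res_counter
    spinlock.  It only shows the root skipping the locked, limit-checked
    counter while the statistics are still updated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical statistic indices, loosely modelled on MEM_CGROUP_STAT_*. */
enum stat_index { STAT_CACHE, STAT_RSS, STAT_SWAPOUT, NR_STATS };

/* Hypothetical stand-in for a limit-checked resource counter. */
struct counter {
    long usage;
    long limit;
    long lock_acquisitions; /* models how often the spinlock would be taken */
};

struct group {
    bool is_root;
    struct counter res;
    long stat[NR_STATS]; /* statistics, kept for every group */
};

/* Charge the counter; fails with -1 if the limit would be exceeded. */
static int counter_charge(struct counter *c, long pages)
{
    c->lock_acquisitions++; /* the contended step */
    if (c->usage + pages > c->limit)
        return -1;
    c->usage += pages;
    return 0;
}

/*
 * Charge a group.  The root has no limit to enforce, so it skips the
 * counter entirely and only updates the statistics.
 */
static int group_charge(struct group *g, enum stat_index idx, long pages)
{
    assert(pages > 0);
    if (!g->is_root && counter_charge(&g->res, pages) != 0)
        return -1;
    g->stat[idx] += pages;
    return 0;
}
```

    In this model a charge against the root never touches res, so the
    contended step is avoided, while a non-root group is still limited as
    before.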
    
    The main mem->res usage_in_bytes can be derived by summing the cache and
    rss usage data from memory statistics (MEM_CGROUP_STAT_RSS and
    MEM_CGROUP_STAT_CACHE).  However, for memsw->res usage_in_bytes, we need
    additional data about swapped out memory.  This patch adds a
    MEM_CGROUP_STAT_SWAPOUT and uses that along with MEM_CGROUP_STAT_RSS and
    MEM_CGROUP_STAT_CACHE to derive the memsw data.  This data is computed
    recursively when hierarchy is enabled.
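
    As a rough illustration of that derivation, here is a self-contained
    user-space sketch.  It is an assumption-laden model, not the patch
    itself: struct group, sum_stat(), usage_in_bytes(), the child/sibling
    links and the 4096-byte page size are all hypothetical, and statistics
    are assumed to be kept in pages.

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096L /* assumed page size for this sketch */

/* Hypothetical statistic indices, loosely modelled on MEM_CGROUP_STAT_*. */
enum stat_index { STAT_CACHE, STAT_RSS, STAT_SWAPOUT, NR_STATS };

struct group {
    long stat[NR_STATS];   /* per-group statistics, in pages */
    struct group *child;   /* first child, or NULL */
    struct group *sibling; /* next sibling, or NULL */
};

/* Sum one statistic for a group, descending into children if requested. */
static long sum_stat(const struct group *g, enum stat_index idx, int hierarchy)
{
    long total = g->stat[idx];

    if (hierarchy)
        for (const struct group *c = g->child; c != NULL; c = c->sibling)
            total += sum_stat(c, idx, 1);
    return total;
}

/*
 * Derive usage without a resource counter:
 *   memory usage = cache + rss
 *   memsw usage  = cache + rss + swapout
 */
static long usage_in_bytes(const struct group *g, int include_swap,
                           int hierarchy)
{
    long pages = sum_stat(g, STAT_CACHE, hierarchy) +
                 sum_stat(g, STAT_RSS, hierarchy);

    if (include_swap)
        pages += sum_stat(g, STAT_SWAPOUT, hierarchy);
    return pages * PAGE_SIZE_BYTES;
}
```

    With a root holding 10 cache, 5 rss and 2 swapped-out pages, the model
    reports 15 pages of memory usage and 17 pages of memsw usage; enabling
    hierarchy adds the children's statistics to each sum.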
    
    The test results I see on a 24-way system show that
    
    1. The lock contention disappears from /proc/lock_stat
    2. The results of the test are comparable to running with
       cgroup_disable=memory.
    
    Here is a sample of my program runs
    
    Without Patch
    
     Performance counter stats for '/home/balbir/parallel_pagefault':
    
     7192804.124144  task-clock-msecs         #     23.937 CPUs
             424691  context-switches         #      0.000 M/sec
                267  CPU-migrations           #      0.000 M/sec
           28498113  page-faults              #      0.004 M/sec
      5826093739340  cycles                   #    809.989 M/sec
       408883496292  instructions             #      0.070 IPC
         7057079452  cache-references         #      0.981 M/sec
         3036086243  cache-misses             #      0.422 M/sec
    
      300.485365680  seconds time elapsed
    
    With cgroup_disable=memory
    
     Performance counter stats for '/home/balbir/parallel_pagefault':
    
     7182183.546587  task-clock-msecs         #     23.915 CPUs
             425458  context-switches         #      0.000 M/sec
                203  CPU-migrations           #      0.000 M/sec
           92545093  page-faults              #      0.013 M/sec
      6034363609986  cycles                   #    840.185 M/sec
       437204346785  instructions             #      0.072 IPC
         6636073192  cache-references         #      0.924 M/sec
         2358117732  cache-misses             #      0.328 M/sec
    
      300.320905827  seconds time elapsed
    
    With this patch applied
    
     Performance counter stats for '/home/balbir/parallel_pagefault':
    
     7191619.223977  task-clock-msecs         #     23.955 CPUs
             422579  context-switches         #      0.000 M/sec
                 88  CPU-migrations           #      0.000 M/sec
           91946060  page-faults              #      0.013 M/sec
      5957054385619  cycles                   #    828.333 M/sec
      1058117350365  instructions             #      0.178 IPC
         9161776218  cache-references         #      1.274 M/sec
         1920494280  cache-misses             #      0.267 M/sec
    
      300.218764862  seconds time elapsed
    
    Data from Prarit (kernel compile with make -j64 on a 64
    CPU/32G machine)
    
    For a single run
    
    Without patch
    
    real 27m8.988s
    user 87m24.916s
    sys 382m6.037s
    
    With patch
    
    real    4m18.607s
    user    84m58.943s
    sys     50m52.682s
    
    With config turned off
    
    real    4m54.972s
    user    90m13.456s
    sys     50m19.711s
    
    NOTE: The data looks counterintuitive because performance with the patch
    is better even than with the config turned off.  We probably need more
    runs, but so far all testing has shown that the patches definitely
    help.
    
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
    Cc: Prarit Bhargava <prarit@redhat.com>
    Cc: Andi Kleen <andi@firstfloor.org>
    Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Paul Menage <menage@google.com>
    Cc: Li Zefan <lizf@cn.fujitsu.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>