    mm, memcg: consider subtrees in memory.events · 9852ae3f
    Chris Down authored
    memory.stat and other files already consider subtrees in their output, and
    memory.events should too, so that the interface is consistent.
    
    The current situation is fairly confusing, because people interacting with
    cgroups expect hierarchical behaviour in the vein of memory.stat,
    cgroup.events, and other files.  For example, this causes confusion when
    debugging reclaim events under low, as currently these always read "0" at
    non-leaf memcg nodes, which frequently causes people to misdiagnose breach
    behaviour.  The same confusion applies to other counters in this file when
    debugging issues.
    
    Aggregation is done at write time instead of at read time since these
    counters aren't hot (unlike memory.stat, which is per-page and so is
    aggregated at read time), and it makes sense to bundle this with the file
    notifications.
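
A toy model of write-time aggregation may help make the idea concrete. This is only an illustrative sketch in Python, not the kernel code: `ToyCgroup`, `record_event`, and the `notifications` counter are hypothetical names, and the unit names are made up. Each event bumps the counter on the node where it happened and on every ancestor, and the notification is issued in the same walk.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyCgroup:
    """Hypothetical stand-in for a memcg node; not a kernel structure."""

    name: str
    parent: ToyCgroup | None = None
    events: Counter = field(default_factory=Counter)
    notifications: int = 0

    def record_event(self, event: str) -> None:
        # Write-time aggregation: update this node and each ancestor now,
        # so that a later read is a plain lookup with no subtree walk.
        node = self
        while node is not None:
            node.events[event] += 1
            node.notifications += 1  # stands in for the file notification
            node = node.parent


# Made-up hierarchy: one parent with two children.
parent = ToyCgroup("system.slice")
child_a = ToyCgroup("a.service", parent=parent)
child_b = ToyCgroup("b.service", parent=parent)

child_a.record_event("max")
child_a.record_event("oom")
child_b.record_event("max")

print(parent.events["max"], child_a.events["max"], child_b.events["max"])
```

The cost is paid once per event rather than once per read, which is a reasonable trade when events are rare and reads should stay cheap.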
    
    After this patch, events are propagated up the hierarchy:
    
        [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
        low 0
        high 0
        max 0
        oom 0
        oom_kill 0
        [root@ktst ~]# systemd-run -p MemoryMax=1 true
        Running as unit: run-r251162a189fb4562b9dabfdc9b0422f5.service
        [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
        low 0
        high 0
        max 7
        oom 1
        oom_kill 1
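
For scripted checks, the two snapshots above can be compared mechanically. The following is a minimal sketch that assumes the flat "key value" layout shown in the transcript; `parse_memory_events` and `diff_events` are hypothetical helper names, and the inputs are the two outputs copied from above.

```python
from __future__ import annotations


def parse_memory_events(text: str) -> dict[str, int]:
    """Parse "key value" lines as shown in the transcript into a dict."""
    events: dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, value = line.split()
        events[key] = int(value)
    return events


def diff_events(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return only the counters that changed, with their increase."""
    return {
        key: after[key] - before.get(key, 0)
        for key in after
        if after[key] != before.get(key, 0)
    }


BEFORE = """low 0
high 0
max 0
oom 0
oom_kill 0
"""

AFTER = """low 0
high 0
max 7
oom 1
oom_kill 1
"""

delta = diff_events(parse_memory_events(BEFORE), parse_memory_events(AFTER))
print(delta)  # {'max': 7, 'oom': 1, 'oom_kill': 1}
```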
    
    As this is a change in behaviour, this can be reverted to the old
    behaviour by mounting with the `memory_localevents' flag set.  However, we
    use the new behaviour by default as there's a lack of evidence that there
    are any current users of memory.events that would find this change
    undesirable.
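
Tooling that needs to know which behaviour is in effect could look for the flag in the mount table. This is a hedged sketch: it assumes the option appears among the cgroup2 mount options in a /proc/mounts-style listing, `uses_local_events` is a hypothetical helper, and both sample lines are made up for illustration.

```python
def uses_local_events(mounts_text: str) -> bool:
    """Return True if a cgroup2 mount lists the memory_localevents option."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected layout: device mountpoint fstype options dump pass
        if len(fields) >= 4 and fields[2] == "cgroup2":
            return "memory_localevents" in fields[3].split(",")
    return False


# Made-up sample lines; real mount options will differ per system.
SAMPLE_DEFAULT = "cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0\n"
SAMPLE_LOCAL = (
    "cgroup2 /sys/fs/cgroup cgroup2 "
    "rw,nosuid,nodev,noexec,relatime,memory_localevents 0 0\n"
)

print(uses_local_events(SAMPLE_DEFAULT))  # False
print(uses_local_events(SAMPLE_LOCAL))    # True
```

On a live system the text would come from reading /proc/mounts rather than from a sample string.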
    
    akpm: this is a behaviour change, so Cc:stable.  This is so that
    forthcoming distros which use cgroup v2 are more likely to pick up the
    revised behaviour.
    
    Link: http://lkml.kernel.org/r/20190208224419.GA24772@chrisdown.name
    Signed-off-by: Chris Down <chris@chrisdown.name>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Dennis Zhou <dennis@kernel.org>
    Cc: Suren Baghdasaryan <surenb@google.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cgroup-v2.rst 87.3 KB