    mm: memcontrol: fix possible memcg leak due to interrupted reclaim · 6df38689
    Vladimir Davydov authored
    Memory cgroup reclaim can be interrupted with mem_cgroup_iter_break()
    once enough pages have been reclaimed, in which case, in contrast to a
    full round-trip over a cgroup sub-tree, the current position stored in
    mem_cgroup_reclaim_iter of the target cgroup does not get invalidated
    and so is left holding the reference to the last scanned cgroup.  If the
    target cgroup does not get scanned again (we might have just reclaimed
    the last page or all processes might exit and free their memory
    voluntarily), we will leak it, because there is nobody to put the
    reference held by the iterator.
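
    The leak can be modelled in a few lines of userspace C.  This is only
    an illustrative sketch, not the kernel code: struct group,
    struct pinning_iter, get_group(), put_group() and walk() are
    hypothetical names, and a plain int stands in for the css reference
    count.  It assumes the pre-fix behaviour described above, where the
    saved position owns a reference that is dropped only when the position
    advances or when a full round-trip completes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup; a plain int models its refcount. */
struct group {
	int refcnt;
};

/* Hypothetical pre-fix iterator: the saved position owns a reference. */
struct pinning_iter {
	struct group *position;
};

static void get_group(struct group *g) { g->refcnt++; }
static void put_group(struct group *g) { g->refcnt--; }

/*
 * Visit groups[0..n).  If stop_after < n, the walk is interrupted before
 * visiting groups[stop_after], which models breaking out of the iteration
 * once enough pages have been reclaimed.
 */
static void walk(struct pinning_iter *it, struct group *groups,
		 size_t n, size_t stop_after)
{
	assert(it && groups);

	for (size_t i = 0; i < n; i++) {
		if (i == stop_after)
			return;	/* interrupted: position keeps its reference */

		/* Advance: drop the old position's reference, pin the new one. */
		if (it->position)
			put_group(it->position);
		get_group(&groups[i]);
		it->position = &groups[i];
	}

	/* Full round-trip: invalidate the position and drop its reference. */
	if (it->position) {
		put_group(it->position);
		it->position = NULL;
	}
}
```

    In this model, a full walk leaves every refcnt at its initial value,
    while an interrupted walk leaves one extra reference on the last
    visited group.  If walk() is never called on that iterator again,
    nothing ever drops it.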
    
    The problem is easy to reproduce by running the following command
    sequence in a loop:
    
        mkdir /sys/fs/cgroup/memory/test
        echo 100M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
        echo $$ > /sys/fs/cgroup/memory/test/cgroup.procs
        memhog 150M
        echo $$ > /sys/fs/cgroup/memory/cgroup.procs
        rmdir /sys/fs/cgroup/memory/test
    
    The cgroups generated by it will never get freed.
    
    This patch fixes the issue by making mem_cgroup_iter avoid taking a
    reference to the current position.  In order not to hit a use-after-free
    bug while running reclaim in parallel with cgroup deletion, we make use
    of the ->css_released cgroup callback to clear references to the dying
    cgroup in all reclaim iterators that might refer to it.  This callback
    is called right before scheduling the rcu work which will free the css,
    so if we access iter->position from an rcu read section, we can be sure
    it won't go away under us.
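
    The idea of a position that holds no reference and is cleared on release
    can be sketched in userspace C11.  This is a simplified model under
    stated assumptions, not the code in memcontrol.c: struct group,
    struct reclaim_iter, iter_set_position(), group_released() and
    iter_load_position() are hypothetical names, C11 atomics stand in for
    the kernel primitives, and the rcu grace period is not modelled at all
    (the sketch never frees memory, so dereferencing a stale pointer is
    harmless here).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup with an atomic reference count. */
struct group {
	int id;
	atomic_int refcnt;
};

/* Hypothetical iterator: position is a weak pointer, no reference held. */
struct reclaim_iter {
	_Atomic(struct group *) position;
};

/* Remember the last visited group without pinning it. */
static void iter_set_position(struct reclaim_iter *iter, struct group *g)
{
	assert(iter);
	atomic_store(&iter->position, g);
}

/*
 * Models the release callback: clear every iterator slot that still points
 * at the dying group.  Compare-and-swap leaves alone any slot that has
 * already moved on to another group.
 */
static void group_released(struct group *dying,
			   struct reclaim_iter *iters, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct group *expected = dying;

		atomic_compare_exchange_strong(&iters[i].position,
					       &expected, NULL);
	}
}

/*
 * Reader side: take a reference only if the group is still alive, i.e.
 * its refcount has not yet dropped to zero.  Returns NULL if the position
 * was cleared or the group is already dying.
 */
static struct group *iter_load_position(struct reclaim_iter *iter)
{
	struct group *g = atomic_load(&iter->position);
	int old;

	if (!g)
		return NULL;

	old = atomic_load(&g->refcnt);
	while (old > 0) {
		/* On failure, old is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&g->refcnt, &old, old + 1))
			return g;
	}
	return NULL;
}
```

    Because the iterator never owns a reference in this model, an
    interrupted walk cannot pin a group.  The only thing that must be
    guaranteed is that the pointer stays dereferenceable while a reader
    inspects it, which is the role the rcu read section plays in the
    description above.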
    
    [hannes@cmpxchg.org: clean up css ref handling]
    Fixes: 5ac8fb31 ("mm: memcontrol: convert reclaim iterator to simple css refcounting")
    Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Acked-by: Michal Hocko <mhocko@kernel.org>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: <stable@vger.kernel.org>	[3.19+]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>