• Muchun Song's avatar
    mm: memcontrol: slab: fix obtain a reference to a freeing memcg · 9f38f03a
    Muchun Song authored
    Patch series "Use obj_cgroup APIs to charge kmem pages", v5.
    
    Since Roman's series "The new cgroup slab memory controller" applied.
    All slab objects are charged with the new APIs of obj_cgroup.  The new
    APIs introduce a struct obj_cgroup to charge slab objects.  It prevents
    long-living objects from pinning the original memory cgroup in the
    memory.  But there are still some corner objects (e.g.  allocations
    larger than order-1 page on SLUB) which are not charged with the new
    APIs.  Those objects (include the pages which are allocated from buddy
    allocator directly) are charged as kmem pages which still hold a
    reference to the memory cgroup.
    
    E.g.  We know that the kernel stack is charged as kmem pages because the
    size of the kernel stack can be greater than 2 pages (e.g.  16KB on
    x86_64 or arm64).  If we create a thread (suppose the thread stack is
    charged to memory cgroup A) and then move it from memory cgroup A to
    memory cgroup B.  Because the kernel stack of the thread hold a
    reference to the memory cgroup A.  The thread can pin the memory cgroup
    A in the memory even if we remove the cgroup A.  If we want to see this
    scenario by using the following script.  We can see that the system has
    added 500 dying cgroups (This is not a real world issue, just a script
    to show that the large kmallocs are charged as kmem pages which can pin
    the memory cgroup in the memory).
    
    	#!/bin/bash
    
    	cat /proc/cgroups | grep memory
    
    	cd /sys/fs/cgroup/memory
    	echo 1 > memory.move_charge_at_immigrate
    
    	for i in range{1..500}
    	do
    		mkdir kmem_test
    		echo $$ > kmem_test/cgroup.procs
    		sleep 3600 &
    		echo $$ > cgroup.procs
    		echo `cat kmem_test/cgroup.procs` > cgroup.procs
    		rmdir kmem_test
    	done
    
    	cat /proc/cgroups | grep memory
    
    This patchset aims to make those kmem pages to drop the reference to
    memory cgroup by using the APIs of obj_cgroup.  Finally, we can see that
    the number of the dying cgroups will not increase if we run the above test
    script.
    
    This patch (of 7):
    
    The rcu_read_lock/unlock only can guarantee that the memcg will not be
    freed, but it cannot guarantee the success of css_get (which is in the
    refill_stock when cached memcg changed) to memcg.
    
      rcu_read_lock()
      memcg = obj_cgroup_memcg(old)
      __memcg_kmem_uncharge(memcg)
          refill_stock(memcg)
              if (stock->cached != memcg)
                  // css_get can change the ref counter from 0 back to 1.
                  css_get(&memcg->css)
      rcu_read_unlock()
    
    This fix is very like the commit:
    
      eefbfa7f ("mm: memcg/slab: fix use after free in obj_cgroup_charge")
    
    Fix this by holding a reference to the memcg which is passed to the
    __memcg_kmem_uncharge() before calling __memcg_kmem_uncharge().
    
    Link: https://lkml.kernel.org/r/20210319163821.20704-1-songmuchun@bytedance.com
    Link: https://lkml.kernel.org/r/20210319163821.20704-2-songmuchun@bytedance.com
    Fixes: 3de7d4f2 ("mm: memcg/slab: optimize objcg stock draining")
    Signed-off-by: default avatarMuchun Song <songmuchun@bytedance.com>
    Reviewed-by: default avatarShakeel Butt <shakeelb@google.com>
    Acked-by: default avatarRoman Gushchin <guro@fb.com>
    Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
    Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    9f38f03a
memcontrol.c 190 KB