    cgroup/pids: turn cgroup_subsys->free() into cgroup_subsys->release() to fix the accounting · f3b3b543
    Oleg Nesterov authored
    [ Upstream commit 51bee5ab ]
    
    The only user of the cgroup_subsys->free() callback is pids_cgrp_subsys, which
    needs pids_free() to uncharge the pid.
    
    However, ->free() is called from __put_task_struct()->cgroup_free(), which
    is too late. Even a trivial program which does
    
    	for (;;) {
    		int pid = fork();
    		assert(pid >= 0);
    		if (pid)
    			wait(NULL);
    		else
    			exit(0);
    	}
    
    can hit the pids limit because release_task()->call_rcu(delayed_put_task_struct)
    implies an RCU grace period after the task/pid goes away and before the final
    put(), so an already-reaped child stays charged during that window.
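
    For experimenting outside the commit, the loop above can be written as a
    bounded, compilable function. This is only a sketch: the name
    fork_wait_loop() and the iteration bound are hypothetical additions, and
    the EAGAIN remark reflects how the pids controller normally rejects fork().

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch: fork and reap one child at
 * a time, up to `iterations` times. Returns the number of completed
 * fork/wait cycles and stops early if fork() fails (typically EAGAIN
 * when a pids.max limit is hit).
 */
static int fork_wait_loop(int iterations)
{
	int done = 0;

	for (int i = 0; i < iterations; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			fprintf(stderr, "fork failed after %d cycles: %s\n",
				done, strerror(errno));
			break;
		}
		if (pid == 0)
			_exit(0);	/* child: exit immediately */

		/* parent: reap the child before forking again */
		if (waitpid(pid, NULL, 0) == pid)
			done++;
	}
	return done;
}
```

    Calling fork_wait_loop() with a large count from a process placed in the
    limited cgroup is one way to observe whether fork() starts failing even
    though at most one child is alive at any time.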
    
    Test-case:
    
    	mkdir -p /tmp/CG
    	mount -t cgroup2 none /tmp/CG
    	echo '+pids' > /tmp/CG/cgroup.subtree_control
    
    	mkdir /tmp/CG/PID
    	echo 2 > /tmp/CG/PID/pids.max
    
    	perl -e 'while ($p = fork) { wait; } $p // die "fork failed: $!\n"' &
    	echo $! > /tmp/CG/PID/cgroup.procs
    
    Without this patch the forking process fails soon after migration.
    
    Rename cgroup_subsys->free() to cgroup_subsys->release() and move the callsite
    into the new helper, cgroup_release(), called by release_task() which actually
    frees the pid(s).
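
    The effect of moving the uncharge can be illustrated with a toy userspace
    model. This is not kernel code: every name below (pids_model, model_run()
    and so on) is hypothetical, and the RCU grace period is crudely modelled
    as elapsing once every gp_every fork/wait cycles.

```c
#include <stdbool.h>

/* Where the model uncharges a pid: at the deferred final put, or at release. */
enum uncharge_point { UNCHARGE_AT_FREE, UNCHARGE_AT_RELEASE };

struct pids_model {
	int limit;	/* stands in for pids.max */
	int charged;	/* pids currently charged to the cgroup */
	int pending;	/* released tasks still waiting for the final put */
	enum uncharge_point when;
};

/* A fork succeeds only if charging one more pid stays within the limit. */
static bool model_fork(struct pids_model *m)
{
	if (m->charged + 1 > m->limit)
		return false;
	m->charged++;
	return true;
}

/* The child has exited and been reaped; its final put is deferred. */
static void model_release(struct pids_model *m)
{
	if (m->when == UNCHARGE_AT_RELEASE)
		m->charged--;
	m->pending++;
}

/* A grace period elapses: all deferred final puts run now. */
static void model_grace_period(struct pids_model *m)
{
	if (m->when == UNCHARGE_AT_FREE)
		m->charged -= m->pending;
	m->pending = 0;
}

/*
 * Run the fork/wait loop for at most `cycles` iterations and return the
 * number of forks that succeeded before the first failure.
 */
static int model_run(enum uncharge_point when, int limit, int cycles,
		     int gp_every)
{
	/* The forking parent itself accounts for one charged pid. */
	struct pids_model m = { .limit = limit, .charged = 1, .when = when };
	int ok = 0;

	for (int i = 0; i < cycles; i++) {
		if (!model_fork(&m))
			break;
		ok++;
		model_release(&m);
		if ((i + 1) % gp_every == 0)
			model_grace_period(&m);
	}
	return ok;
}
```

    With a limit of 2 and a grace period every 3 cycles, the model's second
    fork fails when the uncharge waits for the final put, while uncharging at
    release lets every cycle succeed.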
    
    Reported-by: Herton R. Krzesinski <hkrzesin@redhat.com>
    Reported-by: Jan Stancek <jstancek@redhat.com>
    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>