    cgroup: add clone_children control file · 97978e6d
    Daniel Lezcano authored
    The ns_cgroup is a control group that interacts with namespaces.  When a
    new namespace is created, a corresponding cgroup is automatically created
    too.  The cgroup name is the pid of the process that called 'unshare' or
    of the child created by 'clone'.
    
    This cgroup is tied to the namespace because it prevents a process from
    escaping the control group, and it uses the post_clone callback so that
    the child cgroup inherits the values of the parent cgroup.
    
    Unfortunately, the more we use this cgroup, the more problems we face
    with it:
    
    (1) when a process unshares, the cgroup name may conflict with a
        previous cgroup with the same pid, so unshare or clone returns -EEXIST
    
    (2) cgroup creation is out of control: an application may create
        several namespaces, and the system will then automatically create
        several cgroups behind its back and leave them on the cgroupfs
        (e.g. a vrf based on the network namespace).
    
    (3) the combination of (1) and (2) forces an administrator to regularly
        check and clean up these cgroups.
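    
    The collision in (1) can be pictured with an ordinary directory tree.
    The sketch below is a user-space analogy only, not the ns_cgroup code:
    the helper name create_pid_named_group is hypothetical, and a plain
    directory stands in for the cgroupfs.

```python
import errno
import os


def create_pid_named_group(root, pid):
    """Create a directory named after pid; return 0 or a negative errno."""
    try:
        os.mkdir(os.path.join(root, str(pid)))
    except FileExistsError:
        # A stale group left behind by an earlier process with the same
        # pid makes the second creation fail.
        return -errno.EEXIST
    return 0
```

    Calling the helper twice with the same pid, without removing the first
    directory, returns -EEXIST the second time.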
    
    This patchset removes the ns_cgroup by adding a new flag to the cgroup
    and a matching cgroupfs mount option.  The flag enables copying the parent
    cgroup's values when a child cgroup is created.  We can then safely remove
    the ns_cgroup, as this flag preserves compatibility.  We now have to
    manually create a cgroup and add the task to it, which is consistent with
    the cgroup framework.
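    
    A minimal sketch of the manual step, assuming a mounted cgroup hierarchy
    in which each cgroup directory exposes a 'tasks' file.  The helper name
    create_and_attach and its parameters are hypothetical.

```python
import os


def create_and_attach(parent_dir, name, pid):
    """Create parent_dir/name and write pid to its 'tasks' file."""
    cgroup_dir = os.path.join(parent_dir, name)
    os.mkdir(cgroup_dir)
    # Assumption: on a cgroup mount the kernel provides 'tasks', and
    # writing a pid to it moves that task into the cgroup.
    with open(os.path.join(cgroup_dir, "tasks"), "w") as f:
        f.write(str(pid))
    return cgroup_dir
```

    On a real hierarchy this needs sufficient privileges, and the write can
    fail if the new cgroup is not yet configured to accept tasks.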
    
    This patch:
    
    Sent as an answer to a previous thread around the ns_cgroup.
    
    https://lists.linux-foundation.org/pipermail/containers/2009-June/018627.html
    
    It adds a control file 'clone_children' for a cgroup.  This control file
    is a boolean specifying whether the child cgroup should be a clone of the
    parent cgroup.  The default value is 'false'.
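    
    As a rough illustration of driving the control file from user space,
    the helpers below write and read the boolean inside a cgroup directory.
    The on-disk file name 'cgroup.clone_children' and both helper names are
    assumptions; this message only names the file 'clone_children'.

```python
import os

# Assumption: the control file shows up under this name in each cgroup
# directory; the commit message itself only calls it 'clone_children'.
CLONE_CHILDREN_FILE = "cgroup.clone_children"


def set_clone_children(cgroup_dir, enabled):
    """Write '1' or '0' to the clone_children control file."""
    path = os.path.join(cgroup_dir, CLONE_CHILDREN_FILE)
    with open(path, "w") as f:
        f.write("1" if enabled else "0")
    return path


def get_clone_children(cgroup_dir):
    """Return True if the control file currently holds '1'."""
    path = os.path.join(cgroup_dir, CLONE_CHILDREN_FILE)
    with open(path) as f:
        return f.read().strip() == "1"
```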
    
    This flag makes the child cgroup call the post_clone callback of every
    subsystem that provides one.
    
    At present, cpuset is the only subsystem that implements the post_clone
    callback.
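    
    The behaviour described above can be modelled in a few lines of
    user-space code.  This is a toy model, not the kernel implementation:
    the class names are hypothetical, the cpuset copy is simplified to two
    keys, and checking the parent's flag at creation time is an assumption.

```python
class Cgroup:
    """Toy model of a cgroup; not the kernel data structure."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.clone_children = False  # the default value is 'false'
        self.settings = {}           # per-subsystem values


class Cpuset:
    """Stand-in for a subsystem that implements post_clone."""

    def post_clone(self, child):
        # Simplified: copy the parent's cpus and mems into the child.
        for key in ("cpus", "mems"):
            if key in child.parent.settings:
                child.settings[key] = child.parent.settings[key]


class Memory:
    """Stand-in for a subsystem without a post_clone callback."""


def create_child(parent, name, subsystems):
    """Create a child and run post_clone callbacks if the flag is set."""
    child = Cgroup(name, parent)
    # Assumption: the parent's flag decides whether the callbacks run.
    if parent.clone_children:
        for ss in subsystems:
            callback = getattr(ss, "post_clone", None)
            if callback is not None:
                callback(child)
    return child
```

    With the flag cleared, a new child starts with empty settings; with the
    flag set on the parent, the child starts with the parent's cpus and
    mems, and subsystems without the callback are skipped.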
    
    The option can be set at mount time by specifying the 'clone_children'
    mount option.
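    
    A sketch of how the mount invocation might be assembled, assuming the
    usual 'mount -t cgroup -o <options>' form.  The helper name, the device
    name 'cgroup' and the mount point used in the usage note are
    hypothetical.

```python
def cgroup_mount_argv(mountpoint, subsystems, clone_children=False):
    """Build an argv list for mounting a cgroup hierarchy."""
    opts = list(subsystems)
    if clone_children:
        # The mount option named in this patch.
        opts.append("clone_children")
    argv = ["mount", "-t", "cgroup"]
    if opts:
        argv += ["-o", ",".join(opts)]
    # The device name is arbitrary for this filesystem type.
    argv += ["cgroup", mountpoint]
    return argv
```

    For example, cgroup_mount_argv("/cgroup", ["cpuset"], True) yields the
    arguments for 'mount -t cgroup -o cpuset,clone_children cgroup /cgroup',
    which could then be passed to subprocess.run by a privileged caller.
    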
    Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
    Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Acked-by: Paul Menage <menage@google.com>
    Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
    Cc: Jamal Hadi Salim <hadi@cyberus.ca>
    Cc: Matt Helsley <matthltc@us.ibm.com>
    Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>