Commit 8cfd8147 authored by Tejun Heo

cgroup: implement cgroup v2 thread support

This patch implements cgroup v2 thread support.  The goal of the
thread mode is supporting hierarchical accounting and control at
thread granularity while staying inside the resource domain model
which allows coordination across different resource controllers and
handling of anonymous resource consumptions.

A cgroup is always created as a domain and can be made threaded by
writing to the "cgroup.type" file.  When a cgroup becomes threaded, it
becomes a member of a threaded subtree which is anchored at the
closest ancestor which isn't threaded.

The threads of the processes which are in a threaded subtree can be
placed anywhere without being restricted by process granularity or
no-internal-process constraint.  Note that the threads aren't allowed
to escape to a different threaded subtree.  To be used inside a
threaded subtree, a controller should explicitly support threaded mode
and be able to handle internal competition in the way which is
appropriate for the resource.

The root of a threaded subtree, the nearest ancestor which isn't
threaded, is called the threaded domain and serves as the resource
domain for the whole subtree.  This is the last cgroup where domain
controllers are operational and where all the domain-level resource
consumptions in the subtree are accounted.  This allows threaded
controllers to operate at thread granularity when requested while
staying inside the scope of system-level resource distribution.

As the root cgroup is exempt from the no-internal-process constraint,
it can serve as both a threaded domain and a parent to normal cgroups,
so, unlike non-root cgroups, the root cgroup can have both domain and
threaded children.

Internally, in a threaded subtree, each css_set has its ->dom_cset
pointing to a matching css_set which belongs to the threaded domain.
This ensures that thread root level cgroup_subsys_state for all
threaded controllers are readily accessible for domain-level
operations.

This patch enables threaded mode for the pids and perf_events
controllers.  Neither has to worry about domain-level resource
consumptions and it's enough to simply set the flag.

For more details on the interface and behavior of the thread mode,
please refer to the section 2-2-2 in Documentation/cgroup-v2.txt added
by this patch.

v5: - Dropped silly no-op ->dom_cgrp init from cgroup_create().
      Spotted by Waiman.
    - Documentation updated as suggested by Waiman.
    - cgroup.type content slightly reformatted.
    - Mark the debug controller threaded.

v4: - Updated to the general idea of marking specific cgroups
      domain/threaded as suggested by PeterZ.

v3: - Dropped "join" and always make mixed children join the parent's
      threaded subtree.

v2: - After discussions with Waiman, support for mixed thread mode is
      added.  This should address the issue that Peter pointed out
      where any nesting should be avoided for thread subtrees while
      coexisting with other domain cgroups.
    - Enabling / disabling thread mode now piggy backs on the existing
      control mask update mechanism.
    - Bug fixes and cleanup.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
parent 450ee0c1
@@ -18,7 +18,9 @@ v1 is available under Documentation/cgroup-v1/.
 1-2. What is cgroup?
 2. Basic Operations
 2-1. Mounting
-2-2. Organizing Processes
+2-2. Organizing Processes and Threads
+2-2-1. Processes
+2-2-2. Threads
 2-3. [Un]populated Notification
 2-4. Controlling Controllers
 2-4-1. Enabling and Disabling
@@ -167,8 +169,11 @@ cgroup v2 currently supports the following mount options.
 Delegation section for details.
-Organizing Processes
---------------------
+Organizing Processes and Threads
+--------------------------------
+Processes
+~~~~~~~~~
 Initially, only the root cgroup exists to which all processes belong.
 A child cgroup can be created by creating a sub-directory::
@@ -219,6 +224,104 @@ is removed subsequently, " (deleted)" is appended to the path::
 0::/test-cgroup/test-cgroup-nested (deleted)
Threads
~~~~~~~
cgroup v2 supports thread granularity for a subset of controllers to
support use cases requiring hierarchical resource distribution across
the threads of a group of processes. By default, all threads of a
process belong to the same cgroup, which also serves as the resource
domain to host resource consumptions which are not specific to a
process or thread. The thread mode allows threads to be spread across
a subtree while still maintaining the common resource domain for them.
Controllers which support thread mode are called threaded controllers.
The ones which don't are called domain controllers.
Marking a cgroup threaded makes it join the resource domain of its
parent as a threaded cgroup. The parent may be another threaded
cgroup whose resource domain is further up in the hierarchy. The root
of a threaded subtree, that is, the nearest ancestor which is not
threaded, is called threaded domain or thread root interchangeably and
serves as the resource domain for the entire subtree.
Inside a threaded subtree, threads of a process can be put in
different cgroups and are not subject to the no internal process
constraint - threaded controllers can be enabled on non-leaf cgroups
whether they have threads in them or not.
As the threaded domain cgroup hosts all the domain resource
consumptions of the subtree, it is considered to have internal
resource consumptions whether there are processes in it or not and
can't have populated child cgroups which aren't threaded. Because the
root cgroup is not subject to no internal process constraint, it can
serve both as a threaded domain and a parent to domain cgroups.
The current operation mode or type of the cgroup is shown in the
"cgroup.type" file which indicates whether the cgroup is a normal
domain, a domain which is serving as the domain of a threaded subtree,
or a threaded cgroup.
On creation, a cgroup is always a domain cgroup and can be made
threaded by writing "threaded" to the "cgroup.type" file. The
operation is single direction::
# echo threaded > cgroup.type
Once threaded, the cgroup can't be made a domain again. To enable the
thread mode, the following conditions must be met.
- As the cgroup will join the parent's resource domain, the parent
must either be a valid (threaded) domain or a threaded cgroup.
- The cgroup must be empty. No enabled controllers, child cgroups or
processes.
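
The two conditions above can be sketched as a predicate over a small user-space model. This is only an illustration: `struct cg_model`, `enum cg_type` and `cg_may_become_threaded()` are hypothetical names, not kernel interfaces, and the check covers only what the text states for a cgroup that is not yet threaded.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a cgroup; not the kernel's struct cgroup. */
enum cg_type { CG_DOMAIN, CG_DOMAIN_THREADED, CG_DOMAIN_INVALID, CG_THREADED };

struct cg_model {
	enum cg_type type;
	const struct cg_model *parent;	/* NULL for the root cgroup */
	unsigned int nr_children;	/* number of child cgroups */
	unsigned int nr_procs;		/* number of processes in this cgroup */
	unsigned int subtree_control;	/* bitmask of enabled controllers */
};

/*
 * Returns true if a cgroup that is currently a domain satisfies the
 * documented conditions for becoming threaded.
 */
static bool cg_may_become_threaded(const struct cg_model *cg)
{
	/* The root has no "cgroup.type" file and no parent domain to join. */
	if (cg->parent == NULL)
		return false;

	/* The parent must be a valid (threaded) domain or a threaded cgroup. */
	if (cg->parent->type == CG_DOMAIN_INVALID)
		return false;

	/* The cgroup must be empty: no controllers, children or processes. */
	return cg->nr_children == 0 && cg->nr_procs == 0 &&
	       cg->subtree_control == 0;
}
```

In this model a freshly created, empty child of a normal domain passes the check, while the same child fails it as soon as it holds a process.
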
Topology-wise, a cgroup can be in an invalid state. Please consider
the following topology::
A (threaded domain) - B (threaded) - C (domain, just created)
C is created as a domain but isn't connected to a parent which can
host child domains. C can't be used until it is turned into a
threaded cgroup. "cgroup.type" file will report "domain invalid" in
these cases. Operations which fail due to invalid topology use
EOPNOTSUPP as the errno.
A domain cgroup is turned into a threaded domain when one of its child
cgroups becomes threaded or threaded controllers are enabled in the
"cgroup.subtree_control" file while there are processes in the cgroup.
A threaded domain reverts to a normal domain when the conditions
clear.
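
The rule for when a domain acts as a threaded domain can be written as a small predicate. The sketch below is a user-space model under assumptions: `struct cg_state`, `THREADED_CONTROLLERS` and `cg_is_threaded_domain()` are hypothetical names, and the mask value is arbitrary.

```c
#include <stdbool.h>

/* Hypothetical bitmask of the controllers that support threaded mode. */
#define THREADED_CONTROLLERS 0x3u

/* Hypothetical per-cgroup state; not the kernel's struct cgroup. */
struct cg_state {
	bool threaded;				/* cgroup.type is "threaded" */
	unsigned int nr_threaded_children;	/* children marked threaded */
	unsigned int nr_procs;			/* processes in this cgroup */
	unsigned int subtree_control;		/* controllers enabled for children */
};

/* Returns true while the conditions for being a threaded domain hold. */
static bool cg_is_threaded_domain(const struct cg_state *cg)
{
	/* A threaded cgroup is a member of a subtree, never its domain. */
	if (cg->threaded)
		return false;

	/* A domain with at least one threaded child is a threaded domain. */
	if (cg->nr_threaded_children > 0)
		return true;

	/* So is a populated domain with threaded controllers enabled. */
	return cg->nr_procs > 0 &&
	       (cg->subtree_control & THREADED_CONTROLLERS) != 0;
}
```

Because the result is recomputed from the current state, the model reverts to a normal domain as soon as the last threaded child goes away or the cgroup is emptied.
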
When read, "cgroup.threads" contains the list of the thread IDs of all
threads in the cgroup. Except that the operations are per-thread
instead of per-process, "cgroup.threads" has the same format and
behaves the same way as "cgroup.procs". While "cgroup.threads" can be
written to in any cgroup, as it can only move threads inside the same
threaded domain, its operations are confined inside each threaded
subtree.
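
The confinement rule for "cgroup.threads" can be illustrated by resolving each cgroup to its resource domain, the nearest ancestor that is not threaded, and comparing the results. This is a minimal sketch over a hypothetical model; `struct cg_node`, `cg_resource_domain()` and `cg_may_move_thread()` are illustrative names and the sketch ignores permission checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tree node; not the kernel's struct cgroup. */
struct cg_node {
	const struct cg_node *parent;	/* NULL for the root cgroup */
	bool threaded;			/* cgroup.type is "threaded" */
};

/* Walks up to the nearest ancestor (or self) which is not threaded. */
static const struct cg_node *cg_resource_domain(const struct cg_node *cg)
{
	while (cg->threaded && cg->parent != NULL)
		cg = cg->parent;
	return cg;
}

/* A thread may only move between cgroups sharing one resource domain. */
static bool cg_may_move_thread(const struct cg_node *src,
			       const struct cg_node *dst)
{
	return cg_resource_domain(src) == cg_resource_domain(dst);
}
```

For a normal domain cgroup the resource domain is the cgroup itself, so in this model a thread cannot be moved between two unrelated domains.
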
The threaded domain cgroup serves as the resource domain for the whole
subtree, and, while the threads can be scattered across the subtree,
all the processes are considered to be in the threaded domain cgroup.
"cgroup.procs" in a threaded domain cgroup contains the PIDs of all
processes in the subtree and is not readable in the subtree proper.
However, "cgroup.procs" can be written to from anywhere in the subtree
to migrate all threads of the matching process to the cgroup.
Only threaded controllers can be enabled in a threaded subtree. When
a threaded controller is enabled inside a threaded subtree, it only
accounts for and controls resource consumptions associated with the
threads in the cgroup and its descendants. All consumptions which
aren't tied to a specific thread belong to the threaded domain cgroup.
Because a threaded subtree is exempt from no internal process
constraint, a threaded controller must be able to handle competition
between threads in a non-leaf cgroup and its child cgroups. Each
threaded controller defines how such competitions are handled.
 [Un]populated Notification
 --------------------------
@@ -302,15 +405,15 @@ disabled if one or more children have it enabled.
 No Internal Process Constraint
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Non-root cgroups can only distribute resources to their children when
-they don't have any processes of their own. In other words, only
-cgroups which don't contain any processes can have controllers enabled
-in their "cgroup.subtree_control" files.
+Non-root cgroups can distribute domain resources to their children
+only when they don't have any processes of their own. In other words,
+only domain cgroups which don't contain any processes can have domain
+controllers enabled in their "cgroup.subtree_control" files.
-This guarantees that, when a controller is looking at the part of the
-hierarchy which has it enabled, processes are always only on the
-leaves. This rules out situations where child cgroups compete against
-internal processes of the parent.
+This guarantees that, when a domain controller is looking at the part
+of the hierarchy which has it enabled, processes are always only on
+the leaves. This rules out situations where child cgroups compete
+against internal processes of the parent.
 The root cgroup is exempt from this restriction. Root contains
 processes and anonymous resource consumption which can't be associated
@@ -334,10 +437,10 @@ Model of Delegation
 ~~~~~~~~~~~~~~~~~~~
 A cgroup can be delegated in two ways. First, to a less privileged
-user by granting write access of the directory and its "cgroup.procs"
-and "cgroup.subtree_control" files to the user. Second, if the
-"nsdelegate" mount option is set, automatically to a cgroup namespace
-on namespace creation.
+user by granting write access of the directory and its "cgroup.procs",
+"cgroup.threads" and "cgroup.subtree_control" files to the user.
+Second, if the "nsdelegate" mount option is set, automatically to a
+cgroup namespace on namespace creation.
 Because the resource control interface files in a given directory
 control the distribution of the parent's resources, the delegatee
@@ -644,6 +747,29 @@ Core Interface Files
 All cgroup core files are prefixed with "cgroup."
cgroup.type
A read-write single value file which exists on non-root
cgroups.
When read, it indicates the current type of the cgroup, which
can be one of the following values.
- "domain" : A normal valid domain cgroup.
- "domain threaded" : A threaded domain cgroup which is
serving as the root of a threaded subtree.
- "domain invalid" : A cgroup which is in an invalid state.
It can't be populated or have controllers enabled. It may
be allowed to become a threaded cgroup.
- "threaded" : A threaded cgroup which is a member of a
threaded subtree.
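
A tool reading "cgroup.type" might map the four documented values onto an enum. The following is a minimal sketch; `enum cg_type_value` and `cg_parse_type()` are hypothetical names, and the only strings recognized are the ones listed above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical enum mirroring the documented "cgroup.type" values. */
enum cg_type_value {
	CG_TYPE_UNKNOWN = -1,
	CG_TYPE_DOMAIN,
	CG_TYPE_DOMAIN_THREADED,
	CG_TYPE_DOMAIN_INVALID,
	CG_TYPE_THREADED,
};

/* Maps the content of "cgroup.type" (first line only) to an enum value. */
static enum cg_type_value cg_parse_type(const char *s)
{
	static const struct {
		const char *name;
		enum cg_type_value type;
	} table[] = {
		{ "domain", CG_TYPE_DOMAIN },
		{ "domain threaded", CG_TYPE_DOMAIN_THREADED },
		{ "domain invalid", CG_TYPE_DOMAIN_INVALID },
		{ "threaded", CG_TYPE_THREADED },
	};
	size_t len = strcspn(s, "\n");	/* length without the trailing newline */
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		/* Require an exact match so "domain" != "domain threaded". */
		if (strlen(table[i].name) == len &&
		    strncmp(s, table[i].name, len) == 0)
			return table[i].type;
	}
	return CG_TYPE_UNKNOWN;
}
```

Matching on the full line rather than a prefix keeps "domain" from being confused with the two longer values that start with the same word.
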
A cgroup can be turned into a threaded cgroup by writing
"threaded" to this file.
 cgroup.procs
 A read-write new-line separated values file which exists on
 all cgroups.
@@ -666,6 +792,35 @@ All cgroup core files are prefixed with "cgroup."
 When delegating a sub-hierarchy, write access to this file
 should be granted along with the containing directory.
In a threaded cgroup, reading this file fails with EOPNOTSUPP
as all the processes belong to the thread root. Writing is
supported and moves every thread of the process to the cgroup.
cgroup.threads
A read-write new-line separated values file which exists on
all cgroups.
When read, it lists the TIDs of all threads which belong to
the cgroup one-per-line. The TIDs are not ordered and the
same TID may show up more than once if the thread got moved to
another cgroup and then back or the TID got recycled while
reading.
A TID can be written to migrate the thread associated with the
TID to the cgroup. The writer should match all of the
following conditions.
- It must have write access to the "cgroup.threads" file.
- The cgroup that the thread is currently in must be in the
same resource domain as the destination cgroup.
- It must have write access to the "cgroup.procs" file of the
common ancestor of the source and destination cgroups.
When delegating a sub-hierarchy, write access to this file
should be granted along with the containing directory.
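
Since the listing is unordered and a TID may appear more than once, a reader of "cgroup.threads" may want to de-duplicate what it reads. The sketch below parses an in-memory buffer with the documented one-per-line format; `cg_parse_tids()` is a hypothetical helper, and reading the file into the buffer is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parses new-line separated TIDs from buf into out[], skipping TIDs
 * that were already stored. Returns the number of distinct TIDs
 * stored, which never exceeds cap.
 */
static size_t cg_parse_tids(const char *buf, long *out, size_t cap)
{
	size_t n = 0;

	while (*buf != '\0') {
		char *end;
		long tid = strtol(buf, &end, 10);	/* skips leading whitespace */
		bool seen = false;
		size_t i;

		if (end == buf) {
			/* No number at this position: step over one byte. */
			buf++;
			continue;
		}
		buf = end;

		for (i = 0; i < n; i++) {
			if (out[i] == tid) {
				seen = true;
				break;
			}
		}
		if (!seen && n < cap)
			out[n++] = tid;
	}
	return n;
}
```

The linear duplicate scan keeps the sketch short; a caller handling very large thread counts would likely sort the array or use a hash set instead.
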
 cgroup.controllers
 A read-only space separated values file which exists on all
 cgroups.
......
@@ -521,6 +521,18 @@ struct cgroup_subsys {
 	 */
 	bool implicit_on_dfl:1;
+	/*
+	 * If %true, the controller supports threaded mode on the default
+	 * hierarchy. In a threaded subtree, both process granularity and
+	 * no-internal-process constraint are ignored and a threaded
+	 * controller should be able to handle that.
+	 *
+	 * Note that as an implicit controller is automatically enabled on
+	 * all cgroups on the default hierarchy, it should also be
+	 * threaded. implicit && !threaded is not supported.
+	 */
+	bool threaded:1;
 	/*
 	 * If %false, this subsystem is properly hierarchical -
 	 * configuration, resource accounting and restriction on a parent
......
@@ -170,7 +170,7 @@ struct dentry *cgroup_do_mount(struct file_system_type *fs_type, int flags,
 	struct cgroup_root *root, unsigned long magic,
 	struct cgroup_namespace *ns);
-bool cgroup_may_migrate_to(struct cgroup *dst_cgrp);
+int cgroup_migrate_vet_dst(struct cgroup *dst_cgrp);
 void cgroup_migrate_finish(struct cgroup_mgctx *mgctx);
 void cgroup_migrate_add_src(struct css_set *src_cset, struct cgroup *dst_cgrp,
 	struct cgroup_mgctx *mgctx);
......
@@ -99,8 +99,9 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
 	if (cgroup_on_dfl(to))
 		return -EINVAL;
-	if (!cgroup_may_migrate_to(to))
-		return -EBUSY;
+	ret = cgroup_migrate_vet_dst(to);
+	if (ret)
+		return ret;
 	mutex_lock(&cgroup_mutex);
......
This diff is collapsed.
@@ -352,6 +352,7 @@ static int __init enable_cgroup_debug(char *str)
 {
 	debug_cgrp_subsys.dfl_cftypes = debug_files;
 	debug_cgrp_subsys.implicit_on_dfl = true;
+	debug_cgrp_subsys.threaded = true;
 	return 1;
 }
 __setup("cgroup_debug", enable_cgroup_debug);
@@ -345,4 +345,5 @@ struct cgroup_subsys pids_cgrp_subsys = {
 	.free = pids_free,
 	.legacy_cftypes = pids_files,
 	.dfl_cftypes = pids_files,
+	.threaded = true,
 };
@@ -11210,5 +11210,6 @@ struct cgroup_subsys perf_event_cgrp_subsys = {
 	 * controller is not mounted on a legacy hierarchy.
 	 */
 	.implicit_on_dfl = true,
+	.threaded = true,
 };
 #endif /* CONFIG_CGROUP_PERF */