Commit 895b9b12 authored by Linus Torvalds

Merge tag 'cgroup-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

 - Added Michal Koutný as a maintainer

 - Counters in pids.events were behaving inconsistently. pids.events was
   made properly hierarchical and pids.events.local was added

 - misc.peak and misc.events.local added

 - cpuset remote partition creation and cpuset.cpus.exclusive handling
   improved

 - Code cleanups, non-critical fixes, doc updates

 - The for-6.10-fixes branch is merged in to receive two non-critical
   fixes that didn't warrant a separate pull

* tag 'cgroup-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (23 commits)
  cgroup: Add Michal Koutný as a maintainer
  cgroup/misc: Introduce misc.events.local
  cgroup/rstat: add force idle show helper
  cgroup: Protect css->cgroup write under css_set_lock
  cgroup/misc: Introduce misc.peak
  cgroup_misc: add kernel-doc comments for enum misc_res_type
  cgroup/cpuset: Prevent UAF in proc_cpuset_show()
  selftest/cgroup: Update test_cpuset_prs.sh to match changes
  cgroup/cpuset: Make cpuset.cpus.exclusive independent of cpuset.cpus
  cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE until valid partition
  selftest/cgroup: Fix test_cpuset_prs.sh problems reported by test robot
  cgroup/cpuset: Fix remote root partition creation problem
  cgroup: avoid the unnecessary list_add(dying_tasks) in cgroup_exit()
  cgroup/cpuset: Optimize isolated partition only generate_sched_domains() calls
  cgroup/cpuset: Reduce the lock protecting CS_SCHED_LOAD_BALANCE
  kernel/cgroup: cleanup cgroup_base_files when fail to add cgroup_psi_files
  selftests: cgroup: Add basic tests for pids controller
  selftests: cgroup: Lexicographic order in Makefile
  cgroup/pids: Add pids.events.local
  cgroup/pids: Make event counters hierarchical
  ...
parents f97b956b 9283ff5b
diff --git a/Documentation/admin-guide/cgroup-v1/pids.rst b/Documentation/admin-guide/cgroup-v1/pids.rst
@@ -36,7 +36,8 @@ superset of parent/child/pids.current.

 The pids.events file contains event counters:
-  - max: Number of times fork failed because limit was hit.
+  - max: Number of times fork failed in the cgroup because limit was hit in
+    self or ancestors.

 Example
 -------
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
@@ -239,6 +239,13 @@ cgroup v2 currently supports the following mount options.
 	  will not be tracked by the memory controller (even if cgroup
 	  v2 is remounted later on).

+  pids_localevents
+	The option restores v1-like behavior of pids.events:max, that is
+	only local (inside cgroup proper) fork failures are counted.
+	Without this option pids.events.max represents any pids.max
+	enforcement across the cgroup's subtree.
+

 Organizing Processes and Threads
 --------------------------------
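[As an aside, not part of the patch: like the other cgroup2 mount options, pids_localevents can be toggled with a remount. A minimal sketch using mount(2); the mount point and the bare-bones error handling are assumptions for the example::]

    #include <stdio.h>
    #include <sys/mount.h>

    int main(void)
    {
    	/* Remount the default cgroup2 hierarchy with pids_localevents
    	 * enabled; "/sys/fs/cgroup" is the conventional mount point. */
    	if (mount(NULL, "/sys/fs/cgroup", NULL, MS_REMOUNT, "pids_localevents"))
    		perror("remount cgroup2");
    	return 0;
    }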
@@ -2205,12 +2212,18 @@ PID Interface Files
 	descendants has ever reached.

   pids.events
-	A read-only flat-keyed file which exists on non-root cgroups. The
-	following entries are defined. Unless specified otherwise, a value
-	change in this file generates a file modified event.
+	A read-only flat-keyed file which exists on non-root cgroups. Unless
+	specified otherwise, a value change in this file generates a file
+	modified event. The following entries are defined.

 	  max
-		Number of times fork failed because limit was hit.
+		The number of times the cgroup's total number of processes hit the pids.max
+		limit (see also pids_localevents).
+
+  pids.events.local
+	Similar to pids.events but the fields in the file are local
+	to the cgroup i.e. not hierarchical. The file modified event
+	generated on this file reflects only the local events.

 Organisational operations are not blocked by cgroup policies, so it is
 possible to have pids.current > pids.max. This can be done by either
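[An illustration of the difference between the two files, not from the patch: with the hierarchical semantics, a fork rejected in a descendant because of this cgroup's pids.max shows up in this cgroup's pids.events, while pids.events.local counts only fork failures that happened in the cgroup itself. A minimal C sketch for reading the counters; the cgroup paths are hypothetical::]

    #include <stdio.h>

    /* Read the "max" field of a pids.events[.local] file; -1 on error. */
    static long read_max_event(const char *path)
    {
    	long val = -1;
    	FILE *f = fopen(path, "r");

    	if (f) {
    		fscanf(f, "max %ld", &val);
    		fclose(f);
    	}
    	return val;
    }

    int main(void)
    {
    	printf("parent hierarchical max events: %ld\n",
    	       read_max_event("/sys/fs/cgroup/parent/pids.events"));
    	printf("child local max events: %ld\n",
    	       read_max_event("/sys/fs/cgroup/parent/child/pids.events.local"));
    	return 0;
    }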
@@ -2346,8 +2359,12 @@ Cpuset Interface Files
 	is always a subset of it.

 	Users can manually set it to a value that is different from
-	"cpuset.cpus". The only constraint in setting it is that the
-	list of CPUs must be exclusive with respect to its sibling.
+	"cpuset.cpus". One constraint in setting it is that the list of
+	CPUs must be exclusive with respect to "cpuset.cpus.exclusive"
+	of its sibling. If "cpuset.cpus.exclusive" of a sibling cgroup
+	isn't set, its "cpuset.cpus" value, if set, cannot be a subset
+	of it to leave at least one CPU available when the exclusive
+	CPUs are taken away.

 	For a parent cgroup, any one of its exclusive CPUs can only
 	be distributed to at most one of its child cgroups. Having an
@@ -2363,8 +2380,8 @@ Cpuset Interface Files
 	cpuset-enabled cgroups.

 	This file shows the effective set of exclusive CPUs that
-	can be used to create a partition root. The content of this
-	file will always be a subset of "cpuset.cpus" and its parent's
+	can be used to create a partition root. The content
+	of this file will always be a subset of its parent's
 	"cpuset.cpus.exclusive.effective" if its parent is not the root
 	cgroup. It will also be a subset of "cpuset.cpus.exclusive"
 	if it is set. If "cpuset.cpus.exclusive" is not set, it is
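[A small illustration of the sibling exclusivity rule above, not from the patch: sibling B's cpuset.cpus may overlap A's exclusive CPUs but must not be subsumed by them. The hierarchy names and mount point are assumptions for the example::]

    #include <stdio.h>

    /* Write a CPU list to a cpuset interface file; 0 on success. */
    static int write_str(const char *path, const char *val)
    {
    	FILE *f = fopen(path, "w");

    	if (!f)
    		return -1;
    	fprintf(f, "%s", val);
    	return fclose(f);
    }

    int main(void)
    {
    	/* Sibling "A" claims CPUs 2-3 exclusively. */
    	if (write_str("/sys/fs/cgroup/A/cpuset.cpus.exclusive", "2-3"))
    		perror("A/cpuset.cpus.exclusive");
    	/* Sibling "B" may overlap 2-3, but 3-5 is not a subset of 2-3,
    	 * so at least one CPU remains when the exclusive CPUs go away. */
    	if (write_str("/sys/fs/cgroup/B/cpuset.cpus", "3-5"))
    		perror("B/cpuset.cpus");
    	return 0;
    }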
@@ -2625,6 +2642,15 @@ Miscellaneous controller provides 3 interface files. If two misc resources (res_
 	  res_a 3
 	  res_b 0

+  misc.peak
+        A read-only flat-keyed file shown in all cgroups. It shows the
+        historical maximum usage of the resources in the cgroup and its
+        children.::
+
+	  $ cat misc.peak
+	  res_a 10
+	  res_b 8
+
   misc.max
         A read-write flat-keyed file shown in the non root cgroups. Allowed
         maximum usage of the resources in the cgroup and its children.::
@@ -2654,6 +2680,11 @@ Miscellaneous controller provides 3 interface files. If two misc resources (res_
 		The number of times the cgroup's resource usage was
 		about to go over the max boundary.

+  misc.events.local
+        Similar to misc.events but the fields in the file are local to the
+        cgroup i.e. not hierarchical. The file modified event generated on
+        this file reflects only the local events.
+
 Migration and Ownership
 ~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/MAINTAINERS b/MAINTAINERS
@@ -5528,6 +5528,7 @@ CONTROL GROUP (CGROUP)
 M:	Tejun Heo <tj@kernel.org>
 M:	Zefan Li <lizefan.x@bytedance.com>
 M:	Johannes Weiner <hannes@cmpxchg.org>
+M:	Michal Koutný <mkoutny@suse.com>
 L:	cgroups@vger.kernel.org
 S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
@@ -120,6 +120,11 @@ enum {
 	 * Enable hugetlb accounting for the memory controller.
 	 */
 	CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING = (1 << 19),
+
+	/*
+	 * Enable legacy local pids.events.
+	 */
+	CGRP_ROOT_PIDS_LOCAL_EVENTS = (1 << 20),
 };

 /* cftype->flags */
diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h
@@ -9,15 +9,16 @@
 #define _MISC_CGROUP_H_

 /**
- * Types of misc cgroup entries supported by the host.
+ * enum misc_res_type - Types of misc cgroup entries supported by the host.
  */
 enum misc_res_type {
 #ifdef CONFIG_KVM_AMD_SEV
-	/* AMD SEV ASIDs resource */
+	/** @MISC_CG_RES_SEV: AMD SEV ASIDs resource */
 	MISC_CG_RES_SEV,
-	/* AMD SEV-ES ASIDs resource */
+	/** @MISC_CG_RES_SEV_ES: AMD SEV-ES ASIDs resource */
 	MISC_CG_RES_SEV_ES,
 #endif
+	/** @MISC_CG_RES_TYPES: count of enum misc_res_type constants */
 	MISC_CG_RES_TYPES
 };

@@ -30,13 +31,16 @@ struct misc_cg;
 /**
  * struct misc_res: Per cgroup per misc type resource
  * @max: Maximum limit on the resource.
+ * @watermark: Historical maximum usage of the resource.
  * @usage: Current usage of the resource.
  * @events: Number of times, the resource limit exceeded.
  */
 struct misc_res {
 	u64 max;
+	atomic64_t watermark;
 	atomic64_t usage;
 	atomic64_t events;
+	atomic64_t events_local;
 };

 /**
@@ -50,6 +54,8 @@ struct misc_cg {

 	/* misc.events */
 	struct cgroup_file events_file;
+	/* misc.events.local */
+	struct cgroup_file events_local_file;

 	struct misc_res res[MISC_CG_RES_TYPES];
 };
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
@@ -1744,9 +1744,12 @@ static int css_populate_dir(struct cgroup_subsys_state *css)
 		if (cgroup_psi_enabled()) {
 			ret = cgroup_addrm_files(css, cgrp,
 						 cgroup_psi_files, true);
-			if (ret < 0)
+			if (ret < 0) {
+				cgroup_addrm_files(css, cgrp,
+						   cgroup_base_files, false);
 				return ret;
+			}
 		}
 	} else {
 		ret = cgroup_addrm_files(css, cgrp,
 					 cgroup1_base_files, true);
@@ -1839,9 +1842,9 @@ int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
 		RCU_INIT_POINTER(scgrp->subsys[ssid], NULL);
 		rcu_assign_pointer(dcgrp->subsys[ssid], css);
 		ss->root = dst_root;
-		css->cgroup = dcgrp;

 		spin_lock_irq(&css_set_lock);
+		css->cgroup = dcgrp;
 		WARN_ON(!list_empty(&dcgrp->e_csets[ss->id]));
 		list_for_each_entry_safe(cset, cset_pos, &scgrp->e_csets[ss->id],
 					 e_cset_node[ss->id]) {
@@ -1922,6 +1925,7 @@ enum cgroup2_param {
 	Opt_memory_localevents,
 	Opt_memory_recursiveprot,
 	Opt_memory_hugetlb_accounting,
+	Opt_pids_localevents,
 	nr__cgroup2_params
 };
@@ -1931,6 +1935,7 @@ static const struct fs_parameter_spec cgroup2_fs_parameters[] = {
 	fsparam_flag("memory_localevents",	Opt_memory_localevents),
 	fsparam_flag("memory_recursiveprot",	Opt_memory_recursiveprot),
 	fsparam_flag("memory_hugetlb_accounting", Opt_memory_hugetlb_accounting),
+	fsparam_flag("pids_localevents",	Opt_pids_localevents),
 	{}
 };
@@ -1960,6 +1965,9 @@ static int cgroup2_parse_param(struct fs_context *fc, struct fs_parameter *param
 	case Opt_memory_hugetlb_accounting:
 		ctx->flags |= CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
 		return 0;
+	case Opt_pids_localevents:
+		ctx->flags |= CGRP_ROOT_PIDS_LOCAL_EVENTS;
+		return 0;
 	}
 	return -EINVAL;
 }
@@ -1989,6 +1997,11 @@ static void apply_cgroup_root_flags(unsigned int root_flags)
 			cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
 		else
 			cgrp_dfl_root.flags &= ~CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
+
+		if (root_flags & CGRP_ROOT_PIDS_LOCAL_EVENTS)
+			cgrp_dfl_root.flags |= CGRP_ROOT_PIDS_LOCAL_EVENTS;
+		else
+			cgrp_dfl_root.flags &= ~CGRP_ROOT_PIDS_LOCAL_EVENTS;
 	}
 }
@@ -2004,6 +2017,8 @@ static int cgroup_show_options(struct seq_file *seq, struct kernfs_root *kf_root
 		seq_puts(seq, ",memory_recursiveprot");
 	if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING)
 		seq_puts(seq, ",memory_hugetlb_accounting");
+	if (cgrp_dfl_root.flags & CGRP_ROOT_PIDS_LOCAL_EVENTS)
+		seq_puts(seq, ",pids_localevents");

 	return 0;
 }
@@ -6686,8 +6701,10 @@ void cgroup_exit(struct task_struct *tsk)
 	WARN_ON_ONCE(list_empty(&tsk->cg_list));
 	cset = task_css_set(tsk);
 	css_set_move_task(tsk, cset, NULL, false);
-	list_add_tail(&tsk->cg_list, &cset->dying_tasks);
 	cset->nr_tasks--;
+	/* matches the signal->live check in css_task_iter_advance() */
+	if (thread_group_leader(tsk) && atomic_read(&tsk->signal->live))
+		list_add_tail(&tsk->cg_list, &cset->dying_tasks);

 	if (dl_task(tsk))
 		dec_dl_tasks_cs(tsk);
@@ -6714,10 +6731,12 @@ void cgroup_release(struct task_struct *task)
 		ss->release(task);
 	} while_each_subsys_mask();

-	spin_lock_irq(&css_set_lock);
-	css_set_skip_task_iters(task_css_set(task), task);
-	list_del_init(&task->cg_list);
-	spin_unlock_irq(&css_set_lock);
+	if (!list_empty(&task->cg_list)) {
+		spin_lock_irq(&css_set_lock);
+		css_set_skip_task_iters(task_css_set(task), task);
+		list_del_init(&task->cg_list);
+		spin_unlock_irq(&css_set_lock);
+	}
 }

 void cgroup_free(struct task_struct *task)
@@ -7062,7 +7081,8 @@ static ssize_t features_show(struct kobject *kobj, struct kobj_attribute *attr,
 			"favordynmods\n"
 			"memory_localevents\n"
 			"memory_recursiveprot\n"
-			"memory_hugetlb_accounting\n");
+			"memory_hugetlb_accounting\n"
+			"pids_localevents\n");
 }

 static struct kobj_attribute cgroup_features_attr = __ATTR_RO(features);
[diff collapsed in this view: kernel/cgroup/cpuset.c]
diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
@@ -121,6 +121,30 @@ static void misc_cg_cancel_charge(enum misc_res_type type, struct misc_cg *cg,
 			misc_res_name[type]);
 }

+static void misc_cg_update_watermark(struct misc_res *res, u64 new_usage)
+{
+	u64 old;
+
+	while (true) {
+		old = atomic64_read(&res->watermark);
+		if (new_usage <= old)
+			break;
+		if (atomic64_cmpxchg(&res->watermark, old, new_usage) == old)
+			break;
+	}
+}
+
+static void misc_cg_event(enum misc_res_type type, struct misc_cg *cg)
+{
+	atomic64_inc(&cg->res[type].events_local);
+	cgroup_file_notify(&cg->events_local_file);
+
+	for (; parent_misc(cg); cg = parent_misc(cg)) {
+		atomic64_inc(&cg->res[type].events);
+		cgroup_file_notify(&cg->events_file);
+	}
+}
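[misc_cg_update_watermark() is a lock-free running maximum: re-read the watermark and retry the compare-and-swap until either the stored value is already at least new_usage or the store wins the race. A userspace sketch of the same pattern with C11 atomics, not part of the patch::]

    #include <stdatomic.h>
    #include <stdint.h>

    /* Raise *wm to usage if usage is larger, without taking a lock. */
    static void update_watermark(_Atomic uint64_t *wm, uint64_t usage)
    {
    	uint64_t old = atomic_load(wm);

    	/* A failed compare-exchange reloads 'old' with the current
    	 * value, so the loop terminates once *wm >= usage. */
    	while (usage > old &&
    	       !atomic_compare_exchange_weak(wm, &old, usage))
    		;
    }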
 /**
  * misc_cg_try_charge() - Try charging the misc cgroup.
  * @type: Misc res type to charge.
@@ -159,14 +183,12 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, u64 amount)
 			ret = -EBUSY;
 			goto err_charge;
 		}
+		misc_cg_update_watermark(res, new_usage);
 	}
 	return 0;

 err_charge:
-	for (j = i; j; j = parent_misc(j)) {
-		atomic64_inc(&j->res[type].events);
-		cgroup_file_notify(&j->events_file);
-	}
+	misc_cg_event(type, i);

 	for (j = cg; j != i; j = parent_misc(j))
 		misc_cg_cancel_charge(type, j, amount);
@@ -307,6 +329,29 @@ static int misc_cg_current_show(struct seq_file *sf, void *v)
 	return 0;
 }

+/**
+ * misc_cg_peak_show() - Show the peak usage of the misc cgroup.
+ * @sf: Interface file
+ * @v: Arguments passed
+ *
+ * Context: Any context.
+ * Return: 0 to denote successful print.
+ */
+static int misc_cg_peak_show(struct seq_file *sf, void *v)
+{
+	int i;
+	u64 watermark;
+	struct misc_cg *cg = css_misc(seq_css(sf));
+
+	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
+		watermark = atomic64_read(&cg->res[i].watermark);
+		if (READ_ONCE(misc_res_capacity[i]) || watermark)
+			seq_printf(sf, "%s %llu\n", misc_res_name[i], watermark);
+	}
+
+	return 0;
+}
+
 /**
  * misc_cg_capacity_show() - Show the total capacity of misc res on the host.
  * @sf: Interface file
@@ -331,13 +376,16 @@ static int misc_cg_capacity_show(struct seq_file *sf, void *v)
 	return 0;
 }

-static int misc_events_show(struct seq_file *sf, void *v)
+static int __misc_events_show(struct seq_file *sf, bool local)
 {
 	struct misc_cg *cg = css_misc(seq_css(sf));
 	u64 events;
 	int i;

 	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
-		events = atomic64_read(&cg->res[i].events);
+		if (local)
+			events = atomic64_read(&cg->res[i].events_local);
+		else
+			events = atomic64_read(&cg->res[i].events);
 		if (READ_ONCE(misc_res_capacity[i]) || events)
 			seq_printf(sf, "%s.max %llu\n", misc_res_name[i], events);
 	}
@@ -345,6 +393,16 @@ static int misc_events_show(struct seq_file *sf, void *v)
 	return 0;
 }

+static int misc_events_show(struct seq_file *sf, void *v)
+{
+	return __misc_events_show(sf, false);
+}
+
+static int misc_events_local_show(struct seq_file *sf, void *v)
+{
+	return __misc_events_show(sf, true);
+}
+
 /* Misc cgroup interface files */
 static struct cftype misc_cg_files[] = {
 	{
@@ -357,6 +415,10 @@ static struct cftype misc_cg_files[] = {
 		.name = "current",
 		.seq_show = misc_cg_current_show,
 	},
+	{
+		.name = "peak",
+		.seq_show = misc_cg_peak_show,
+	},
 	{
 		.name = "capacity",
 		.seq_show = misc_cg_capacity_show,
@@ -368,6 +430,12 @@ static struct cftype misc_cg_files[] = {
 		.file_offset = offsetof(struct misc_cg, events_file),
 		.seq_show = misc_events_show,
 	},
+	{
+		.name = "events.local",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.file_offset = offsetof(struct misc_cg, events_local_file),
+		.seq_show = misc_events_local_show,
+	},
 	{}
 };
diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
@@ -38,6 +38,14 @@
 #define PIDS_MAX	(PID_MAX_LIMIT + 1ULL)
 #define PIDS_MAX_STR	"max"

+enum pidcg_event {
+	/* Fork failed in subtree because this pids_cgroup limit was hit. */
+	PIDCG_MAX,
+	/* Fork failed in this pids_cgroup because ancestor limit was hit. */
+	PIDCG_FORKFAIL,
+	NR_PIDCG_EVENTS,
+};
+
 struct pids_cgroup {
 	struct cgroup_subsys_state	css;
@@ -49,11 +57,12 @@ struct pids_cgroup {
 	atomic64_t			limit;
 	int64_t				watermark;

-	/* Handle for "pids.events" */
+	/* Handles for pids.events[.local] */
 	struct cgroup_file		events_file;
+	struct cgroup_file		events_local_file;

-	/* Number of times fork failed because limit was hit. */
-	atomic64_t			events_limit;
+	atomic64_t			events[NR_PIDCG_EVENTS];
+	atomic64_t			events_local[NR_PIDCG_EVENTS];
 };

 static struct pids_cgroup *css_pids(struct cgroup_subsys_state *css)
@@ -148,12 +157,13 @@ static void pids_charge(struct pids_cgroup *pids, int num)
  * pids_try_charge - hierarchically try to charge the pid count
  * @pids: the pid cgroup state
  * @num: the number of pids to charge
+ * @fail: storage of pid cgroup causing the fail
  *
  * This function follows the set limit. It will fail if the charge would cause
  * the new value to exceed the hierarchical limit. Returns 0 if the charge
  * succeeded, otherwise -EAGAIN.
  */
-static int pids_try_charge(struct pids_cgroup *pids, int num)
+static int pids_try_charge(struct pids_cgroup *pids, int num, struct pids_cgroup **fail)
 {
 	struct pids_cgroup *p, *q;
@@ -166,9 +176,10 @@ static int pids_try_charge(struct pids_cgroup *pids, int num)
 		 * p->limit is %PIDS_MAX then we know that this test will never
 		 * fail.
 		 */
-		if (new > limit)
+		if (new > limit) {
+			*fail = p;
 			goto revert;
+		}
 		/*
 		 * Not technically accurate if we go over limit somewhere up
 		 * the hierarchy, but that's tolerable for the watermark.
@@ -229,6 +240,36 @@ static void pids_cancel_attach(struct cgroup_taskset *tset)
 	}
 }

+static void pids_event(struct pids_cgroup *pids_forking,
+		       struct pids_cgroup *pids_over_limit)
+{
+	struct pids_cgroup *p = pids_forking;
+	bool limit = false;
+
+	/* Only log the first time limit is hit. */
+	if (atomic64_inc_return(&p->events_local[PIDCG_FORKFAIL]) == 1) {
+		pr_info("cgroup: fork rejected by pids controller in ");
+		pr_cont_cgroup_path(p->css.cgroup);
+		pr_cont("\n");
+	}
+	cgroup_file_notify(&p->events_local_file);
+
+	if (!cgroup_subsys_on_dfl(pids_cgrp_subsys) ||
+	    cgrp_dfl_root.flags & CGRP_ROOT_PIDS_LOCAL_EVENTS)
+		return;
+
+	for (; parent_pids(p); p = parent_pids(p)) {
+		if (p == pids_over_limit) {
+			limit = true;
+			atomic64_inc(&p->events_local[PIDCG_MAX]);
+			cgroup_file_notify(&p->events_local_file);
+		}
+		if (limit)
+			atomic64_inc(&p->events[PIDCG_MAX]);
+
+		cgroup_file_notify(&p->events_file);
+	}
+}
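[Summarizing the walk above: the forking cgroup always records a local FORKFAIL, and, on the default hierarchy without pids_localevents, the hierarchical "max" count starts at the cgroup whose limit was actually hit and accumulates in it and every ancestor above it. A toy userspace model of that walk, not kernel code and with illustrative names::]

    #include <stdint.h>

    struct group {
    	struct group *parent;	/* NULL for the root */
    	uint64_t forkfail_local;
    	uint64_t max_local;
    	uint64_t max;
    };

    static void record_fork_failure(struct group *forking,
    				    struct group *over_limit)
    {
    	struct group *g;
    	int limit_seen = 0;

    	/* The group where the fork was attempted notes it locally. */
    	forking->forkfail_local++;

    	/* Walk up; from the limiting group onward, bump the
    	 * hierarchical counter on every non-root ancestor. */
    	for (g = forking; g->parent; g = g->parent) {
    		if (g == over_limit) {
    			limit_seen = 1;
    			g->max_local++;
    		}
    		if (limit_seen)
    			g->max++;
    	}
    }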
 /*
  * task_css_check(true) in pids_can_fork() and pids_cancel_fork() relies
  * on cgroup_threadgroup_change_begin() held by the copy_process().
@@ -236,7 +277,7 @@ static void pids_cancel_attach(struct cgroup_taskset *tset)
 static int pids_can_fork(struct task_struct *task, struct css_set *cset)
 {
 	struct cgroup_subsys_state *css;
-	struct pids_cgroup *pids;
+	struct pids_cgroup *pids, *pids_over_limit;
 	int err;

 	if (cset)
@@ -244,16 +285,10 @@ static int pids_can_fork(struct task_struct *task, struct css_set *cset)
 	else
 		css = task_css_check(current, pids_cgrp_id, true);
 	pids = css_pids(css);
-	err = pids_try_charge(pids, 1);
-	if (err) {
-		/* Only log the first time events_limit is incremented. */
-		if (atomic64_inc_return(&pids->events_limit) == 1) {
-			pr_info("cgroup: fork rejected by pids controller in ");
-			pr_cont_cgroup_path(css->cgroup);
-			pr_cont("\n");
-		}
-		cgroup_file_notify(&pids->events_file);
-	}
+	err = pids_try_charge(pids, 1, &pids_over_limit);
+	if (err)
+		pids_event(pids, pids_over_limit);

 	return err;
 }
@@ -337,11 +372,32 @@ static s64 pids_peak_read(struct cgroup_subsys_state *css,
 	return READ_ONCE(pids->watermark);
 }

-static int pids_events_show(struct seq_file *sf, void *v)
+static int __pids_events_show(struct seq_file *sf, bool local)
 {
 	struct pids_cgroup *pids = css_pids(seq_css(sf));
+	enum pidcg_event pe = PIDCG_MAX;
+	atomic64_t *events;
+
+	if (!cgroup_subsys_on_dfl(pids_cgrp_subsys) ||
+	    cgrp_dfl_root.flags & CGRP_ROOT_PIDS_LOCAL_EVENTS) {
+		pe = PIDCG_FORKFAIL;
+		local = true;
+	}
+	events = local ? pids->events_local : pids->events;
+
+	seq_printf(sf, "max %lld\n", (s64)atomic64_read(&events[pe]));
+	return 0;
+}
+
+static int pids_events_show(struct seq_file *sf, void *v)
+{
+	__pids_events_show(sf, false);
+	return 0;
+}

-	seq_printf(sf, "max %lld\n", (s64)atomic64_read(&pids->events_limit));
+static int pids_events_local_show(struct seq_file *sf, void *v)
+{
+	__pids_events_show(sf, true);
 	return 0;
 }
@@ -368,9 +424,42 @@ static struct cftype pids_files[] = {
 		.file_offset = offsetof(struct pids_cgroup, events_file),
 		.flags = CFTYPE_NOT_ON_ROOT,
 	},
+	{
+		.name = "events.local",
+		.seq_show = pids_events_local_show,
+		.file_offset = offsetof(struct pids_cgroup, events_local_file),
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
 	{ }	/* terminate */
 };

+static struct cftype pids_files_legacy[] = {
+	{
+		.name = "max",
+		.write = pids_max_write,
+		.seq_show = pids_max_show,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+	{
+		.name = "current",
+		.read_s64 = pids_current_read,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+	{
+		.name = "peak",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.read_s64 = pids_peak_read,
+	},
+	{
+		.name = "events",
+		.seq_show = pids_events_show,
+		.file_offset = offsetof(struct pids_cgroup, events_file),
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+	{ }	/* terminate */
+};
+
 struct cgroup_subsys pids_cgrp_subsys = {
 	.css_alloc	= pids_css_alloc,
 	.css_free	= pids_css_free,
@@ -379,7 +468,7 @@ struct cgroup_subsys pids_cgrp_subsys = {
 	.can_fork	= pids_can_fork,
 	.cancel_fork	= pids_cancel_fork,
 	.release	= pids_release,
-	.legacy_cftypes	= pids_files,
+	.legacy_cftypes	= pids_files_legacy,
 	.dfl_cftypes	= pids_files,
 	.threaded	= true,
 };
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
@@ -594,49 +594,46 @@ static void root_cgroup_cputime(struct cgroup_base_stat *bstat)
 	}
 }

+static void cgroup_force_idle_show(struct seq_file *seq, struct cgroup_base_stat *bstat)
+{
+#ifdef CONFIG_SCHED_CORE
+	u64 forceidle_time = bstat->forceidle_sum;
+
+	do_div(forceidle_time, NSEC_PER_USEC);
+	seq_printf(seq, "core_sched.force_idle_usec %llu\n", forceidle_time);
+#endif
+}
+
 void cgroup_base_stat_cputime_show(struct seq_file *seq)
 {
 	struct cgroup *cgrp = seq_css(seq)->cgroup;
 	u64 usage, utime, stime;
-	struct cgroup_base_stat bstat;
-#ifdef CONFIG_SCHED_CORE
-	u64 forceidle_time;
-#endif

 	if (cgroup_parent(cgrp)) {
 		cgroup_rstat_flush_hold(cgrp);
 		usage = cgrp->bstat.cputime.sum_exec_runtime;
 		cputime_adjust(&cgrp->bstat.cputime, &cgrp->prev_cputime,
 			       &utime, &stime);
-#ifdef CONFIG_SCHED_CORE
-		forceidle_time = cgrp->bstat.forceidle_sum;
-#endif
 		cgroup_rstat_flush_release(cgrp);
 	} else {
-		root_cgroup_cputime(&bstat);
-		usage = bstat.cputime.sum_exec_runtime;
-		utime = bstat.cputime.utime;
-		stime = bstat.cputime.stime;
-#ifdef CONFIG_SCHED_CORE
-		forceidle_time = bstat.forceidle_sum;
-#endif
+		/* cgrp->bstat of root is not actually used, reuse it */
+		root_cgroup_cputime(&cgrp->bstat);
+		usage = cgrp->bstat.cputime.sum_exec_runtime;
+		utime = cgrp->bstat.cputime.utime;
+		stime = cgrp->bstat.cputime.stime;
 	}

 	do_div(usage, NSEC_PER_USEC);
 	do_div(utime, NSEC_PER_USEC);
 	do_div(stime, NSEC_PER_USEC);
-#ifdef CONFIG_SCHED_CORE
-	do_div(forceidle_time, NSEC_PER_USEC);
-#endif

 	seq_printf(seq, "usage_usec %llu\n"
 		   "user_usec %llu\n"
 		   "system_usec %llu\n",
 		   usage, utime, stime);
-#ifdef CONFIG_SCHED_CORE
-	seq_printf(seq, "core_sched.force_idle_usec %llu\n", forceidle_time);
-#endif
+	cgroup_force_idle_show(seq, &cgrp->bstat);
 }

 /* Add bpf kfuncs for cgroup_rstat_updated() and cgroup_rstat_flush() */
diff --git a/tools/testing/selftests/cgroup/.gitignore b/tools/testing/selftests/cgroup/.gitignore
 # SPDX-License-Identifier: GPL-2.0-only
-test_memcontrol
 test_core
-test_freezer
-test_kmem
-test_kill
 test_cpu
 test_cpuset
-test_zswap
+test_freezer
 test_hugetlb_memcg
+test_kill
+test_kmem
+test_memcontrol
+test_pids
+test_zswap
 wait_inotify
diff --git a/tools/testing/selftests/cgroup/Makefile b/tools/testing/selftests/cgroup/Makefile
@@ -6,26 +6,29 @@ all: ${HELPER_PROGS}
 TEST_FILES := with_stress.sh
 TEST_PROGS := test_stress.sh test_cpuset_prs.sh test_cpuset_v1_hp.sh
 TEST_GEN_FILES := wait_inotify
-TEST_GEN_PROGS = test_memcontrol
-TEST_GEN_PROGS += test_kmem
-TEST_GEN_PROGS += test_core
-TEST_GEN_PROGS += test_freezer
-TEST_GEN_PROGS += test_kill
+# Keep the lists lexicographically sorted
+TEST_GEN_PROGS = test_core
 TEST_GEN_PROGS += test_cpu
 TEST_GEN_PROGS += test_cpuset
-TEST_GEN_PROGS += test_zswap
+TEST_GEN_PROGS += test_freezer
 TEST_GEN_PROGS += test_hugetlb_memcg
+TEST_GEN_PROGS += test_kill
+TEST_GEN_PROGS += test_kmem
+TEST_GEN_PROGS += test_memcontrol
+TEST_GEN_PROGS += test_pids
+TEST_GEN_PROGS += test_zswap

 LOCAL_HDRS += $(selfdir)/clone3/clone3_selftests.h $(selfdir)/pidfd/pidfd.h

 include ../lib.mk

-$(OUTPUT)/test_memcontrol: cgroup_util.c
-$(OUTPUT)/test_kmem: cgroup_util.c
 $(OUTPUT)/test_core: cgroup_util.c
-$(OUTPUT)/test_freezer: cgroup_util.c
-$(OUTPUT)/test_kill: cgroup_util.c
 $(OUTPUT)/test_cpu: cgroup_util.c
 $(OUTPUT)/test_cpuset: cgroup_util.c
-$(OUTPUT)/test_zswap: cgroup_util.c
+$(OUTPUT)/test_freezer: cgroup_util.c
 $(OUTPUT)/test_hugetlb_memcg: cgroup_util.c
+$(OUTPUT)/test_kill: cgroup_util.c
+$(OUTPUT)/test_kmem: cgroup_util.c
+$(OUTPUT)/test_memcontrol: cgroup_util.c
+$(OUTPUT)/test_pids: cgroup_util.c
+$(OUTPUT)/test_zswap: cgroup_util.c
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -28,6 +28,14 @@ CPULIST=$(cat $CGROUP2/cpuset.cpus.effective)
 NR_CPUS=$(lscpu | grep "^CPU(s):" | sed -e "s/.*:[[:space:]]*//")
 [[ $NR_CPUS -lt 8 ]] && skip_test "Test needs at least 8 cpus available!"

+# Check to see if /dev/console exists and is writable
+if [[ -c /dev/console && -w /dev/console ]]
+then
+	CONSOLE=/dev/console
+else
+	CONSOLE=/dev/null
+fi
+
 # Set verbose flag and delay factor
 PROG=$1
 VERBOSE=0
@@ -103,8 +111,8 @@ console_msg()
 {
 	MSG=$1
 	echo "$MSG"
-	echo "" > /dev/console
-	echo "$MSG" > /dev/console
+	echo "" > $CONSOLE
+	echo "$MSG" > $CONSOLE
 	pause 0.01
 }
@@ -161,6 +169,14 @@ test_add_proc()
 #  T     = put a task into cgroup
 #  O<c>=<v> = Write <v> to CPU online file of <c>
 #
+# ECPUs    - effective CPUs of cpusets
+# Pstate   - partition root state
+# ISOLCPUS - isolated CPUs (<icpus>[,<icpus2>])
+#
+# Note that if there are 2 fields in ISOLCPUS, the first one is for
+# sched-debug matching which includes offline CPUs and single-CPU partitions
+# while the second one is for matching cpuset.cpus.isolated.
+#
 SETUP_A123_PARTITIONS="C1-3:P1:S+ C2-3:P1:S+ C3:P1"
 TEST_MATRIX=(
 	# old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pstate ISOLCPUS
@@ -220,23 +236,29 @@ TEST_MATRIX=(
 	" C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3:P2 . . 0 A1:0-1,A2:2-3,A3:2-3 A1:P0,A2:P2 2-3"
 	" C0-3:S+ C1-3:S+ C2-3 . X2-3 X3:P2 . . 0 A1:0-2,A2:3,A3:3 A1:P0,A2:P2 3"
 	" C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2 . 0 A1:0-1,A2:1,A3:2-3 A1:P0,A3:P2 2-3"
-	" C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:C3 . 0 A1:0-2,A2:1-2,A3:3 A1:P0,A3:P2 3"
+	" C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:C3 . 0 A1:0-1,A2:1,A3:2-3 A1:P0,A3:P2 2-3"
 	" C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-3,A2:1-3,A3:2-3,B1:2-3 A1:P0,A3:P0,B1:P-2"
 	" C0-3:S+ C1-3:S+ C2-3 C4-5 . . . P2 0 B1:4-5 B1:P2 4-5"
 	" C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2 P2 0 A3:2-3,B1:4 A3:P2,B1:P2 2-4"
 	" C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2:C1-3 P2 0 A3:2-3,B1:4 A3:P2,B1:P2 2-4"
 	" C0-3:S+ C1-3:S+ C2-3 C4 X1-3 X1-3:P2 P2 . 0 A2:1,A3:2-3 A2:P2,A3:P2 1-3"
 	" C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2 P2:C4-5 0 A3:2-3,B1:4-5 A3:P2,B1:P2 2-5"
+	" C4:X0-3:S+ X1-3:S+ X2-3 . . P2 . . 0 A1:4,A2:1-3,A3:1-3 A2:P2 1-3"
+	" C4:X0-3:S+ X1-3:S+ X2-3 . . . P2 . 0 A1:4,A2:4,A3:2-3 A3:P2 2-3"

 	# Nested remote/local partition tests
 	" C0-3:S+ C1-3:S+ C2-3 C4-5 X2-3 X2-3:P1 P2 P1 0 A1:0-1,A2:,A3:2-3,B1:4-5 \
 					A1:P0,A2:P1,A3:P2,B1:P1 2-3"
 	" C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3:P1 P2 P1 0 A1:0-1,A2:,A3:2-3,B1:4 \
 					A1:P0,A2:P1,A3:P2,B1:P1 2-4,2-3"
+	" C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3:P1 . P1 0 A1:0-1,A2:2-3,A3:2-3,B1:4 \
+					A1:P0,A2:P1,A3:P0,B1:P1"
 	" C0-3:S+ C1-3:S+ C3 C4 X2-3 X2-3:P1 P2 P1 0 A1:0-1,A2:2,A3:3,B1:4 \
 					A1:P0,A2:P1,A3:P2,B1:P1 2-4,3"
 	" C0-4:S+ C1-4:S+ C2-4 . X2-4 X2-4:P2 X4:P1 . 0 A1:0-1,A2:2-3,A3:4 \
 					A1:P0,A2:P2,A3:P1 2-4,2-3"
+	" C0-4:S+ C1-4:S+ C2-4 . X2-4 X2-4:P2 X3-4:P1 . 0 A1:0-1,A2:2,A3:3-4 \
+					A1:P0,A2:P2,A3:P1 2"
 	" C0-4:X2-4:S+ C1-4:X2-4:S+:P2 C2-4:X4:P1 \
 					. . X5 . . 0 A1:0-4,A2:1-4,A3:2-4 \
 					A1:P0,A2:P-2,A3:P-1"
@@ -262,8 +284,8 @@ TEST_MATRIX=(
 					. . X2-3 P2 . . 0 A1:0-2,A2:3,XA2:3 A2:P2 3"

 	# Invalid to valid local partition direct transition tests
-	" C1-3:S+:P2 C2-3:X1:P2 . . . . . . 0 A1:1-3,XA1:1-3,A2:2-3:XA2: A1:P2,A2:P-2 1-3"
-	" C1-3:S+:P2 C2-3:X1:P2 . . . X3:P2 . . 0 A1:1-2,XA1:1-3,A2:3:XA2:3 A1:P2,A2:P2 1-3"
+	" C1-3:S+:P2 X4:P2 . . . . . . 0 A1:1-3,XA1:1-3,A2:1-3:XA2: A1:P2,A2:P-2 1-3"
+	" C1-3:S+:P2 X4:P2 . . . X3:P2 . . 0 A1:1-2,XA1:1-3,A2:3:XA2:3 A1:P2,A2:P2 1-3"
 	" C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4,B1:4-6 A1:P-2,B1:P0"
 	" C0-3:P2 . . C4-6 C0-4:C0-3 . . . 0 A1:0-3,B1:4-6 A1:P2,B1:P0 0-3"
 	" C0-3:P2 . . C3-5:C4-5 . . . . 0 A1:0-3,B1:4-5 A1:P2,B1:P0 0-3"
@@ -274,21 +296,18 @@ TEST_MATRIX=(
 	" C0-3:X1-3:S+:P2 C1-3:X2-3:S+:P2 C2-3:X3:P2 \
 					. . X4 . . 0 A1:1-3,A2:1-3,A3:2-3,XA2:,XA3: A1:P2,A2:P-2,A3:P-2 1-3"
 	" C0-3:X1-3:S+:P2 C1-3:X2-3:S+:P2 C2-3:X3:P2 \
-					. . C4 . . 0 A1:1-3,A2:1-3,A3:2-3,XA2:,XA3: A1:P2,A2:P-2,A3:P-2 1-3"
+					. . C4:X . . 0 A1:1-3,A2:1-3,A3:2-3,XA2:,XA3: A1:P2,A2:P-2,A3:P-2 1-3"

 	# Local partition CPU change tests
 	" C0-5:S+:P2 C4-5:S+:P1 . . . C3-5 . . 0 A1:0-2,A2:3-5 A1:P2,A2:P1 0-2"
 	" C0-5:S+:P2 C4-5:S+:P1 . . C1-5 . . . 0 A1:1-3,A2:4-5 A1:P2,A2:P1 1-3"

 	# cpus_allowed/exclusive_cpus update tests
 	" C0-3:X2-3:S+ C1-3:X2-3:S+ C2-3:X2-3 \
-					. C4 . P2 . 0 A1:4,A2:4,XA2:,XA3:,A3:4 \
+					. X:C4 . P2 . 0 A1:4,A2:4,XA2:,XA3:,A3:4 \
 					A1:P0,A3:P-2"
 	" C0-3:X2-3:S+ C1-3:X2-3:S+ C2-3:X2-3 \
 					. X1 . P2 . 0 A1:0-3,A2:1-3,XA1:1,XA2:,XA3:,A3:2-3 \
 					A1:P0,A3:P-2"
-	" C0-3:X2-3:S+ C1-3:X2-3:S+ C2-3:X2-3 \
-					. . C3 P2 . 0 A1:0-2,A2:0-2,XA2:3,XA3:3,A3:3 \
-					A1:P0,A3:P2 3"
 	" C0-3:X2-3:S+ C1-3:X2-3:S+ C2-3:X2-3 \
 					. . X3 P2 . 0 A1:0-2,A2:1-2,XA2:3,XA3:3,A3:3 \
 					A1:P0,A3:P2 3"
@@ -296,10 +315,7 @@ TEST_MATRIX=(
 					. . X3 . . 0 A1:0-3,A2:1-3,XA2:3,XA3:3,A3:2-3 \
 					A1:P0,A3:P-2"
 	" C0-3:X2-3:S+ C1-3:X2-3:S+ C2-3:X2-3:P2 \
-					. . C3 . . 0 A1:0-3,A2:3,XA2:3,XA3:3,A3:3 \
-					A1:P0,A3:P-2"
-	" C0-3:X2-3:S+ C1-3:X2-3:S+ C2-3:X2-3:P2 \
-					. C4 . . . 0 A1:4,A2:4,A3:4,XA1:,XA2:,XA3 \
+					. X4 . . . 0 A1:0-3,A2:1-3,A3:2-3,XA1:4,XA2:,XA3 \
 					A1:P0,A3:P-2"

 	# old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pstate ISOLCPUS
@@ -346,6 +362,9 @@ TEST_MATRIX=(
 	" C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2,B1:2-3 A1:P-1,B1:P-1"
 	" C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2,B1:2-3 A1:P0,B1:P-1"

+	# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
+	" C0-3 . . C4-5 X5 . . . 0 A1:0-3,B1:4-5"
+
 	# old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pstate ISOLCPUS
 	# ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ --------
 	# Failure cases:
@@ -355,6 +374,9 @@ TEST_MATRIX=(
 	# Changes to cpuset.cpus.exclusive that violate exclusivity rule is rejected
 	" C0-3 . . C4-5 X0-3 . . X3-5 1 A1:0-3,B1:4-5"

+	# cpuset.cpus cannot be a subset of sibling cpuset.cpus.exclusive
+	" C0-3 . . C4-5 X3-5 . . . 1 A1:0-3,B1:4-5"
+
 )
 #
@@ -556,14 +578,15 @@ check_cgroup_states()
 	do
 		set -- $(echo $CHK | sed -e "s/:/ /g")
 		CGRP=$1
+		CGRP_DIR=$CGRP
 		STATE=$2
 		FILE=
 		EVAL=$(expr substr $STATE 2 2)
-		[[ $CGRP = A2 ]] && CGRP=A1/A2
-		[[ $CGRP = A3 ]] && CGRP=A1/A2/A3
+		[[ $CGRP = A2 ]] && CGRP_DIR=A1/A2
+		[[ $CGRP = A3 ]] && CGRP_DIR=A1/A2/A3

 		case $STATE in
-			P*) FILE=$CGRP/cpuset.cpus.partition
+			P*) FILE=$CGRP_DIR/cpuset.cpus.partition
 				;;
 			*)  echo "Unknown state: $STATE!"
 				exit 1
@@ -587,6 +610,16 @@ check_cgroup_states()
 				;;
 		esac
 		[[ $EVAL != $VAL ]] && return 1
+
+		#
+		# For root partition, dump sched-domains info to console if
+		# verbose mode set for manual comparison with sched debug info.
+		#
+		[[ $VAL -eq 1 && $VERBOSE -gt 0 ]] && {
+			DOMS=$(cat $CGRP_DIR/cpuset.cpus.effective)
+			[[ -n "$DOMS" ]] &&
+				echo " [$CGRP] sched-domain: $DOMS" > $CONSOLE
+		}
 	done
 	return 0
 }
@@ -694,9 +727,9 @@ null_isolcpus_check()
 	[[ $VERBOSE -gt 0 ]] || return 0
 	# Retry a few times before printing error
 	RETRY=0
-	while [[ $RETRY -lt 5 ]]
+	while [[ $RETRY -lt 8 ]]
 	do
-		pause 0.01
+		pause 0.02
 		check_isolcpus "."
 		[[ $? -eq 0 ]] && return 0
 		((RETRY++))
@@ -726,7 +759,7 @@ run_state_test()
 	while [[ $I -lt $CNT ]]
 	do
-		echo "Running test $I ..." > /dev/console
+		echo "Running test $I ..." > $CONSOLE
 		[[ $VERBOSE -gt 1 ]] && {
 			echo ""
 			eval echo \${$TEST[$I]}
@@ -783,7 +816,7 @@ run_state_test()
 	while [[ $NEWLIST != $CPULIST && $RETRY -lt 8 ]]
 	do
 		# Wait a bit longer & recheck a few times
-		pause 0.01
+		pause 0.02
 		((RETRY++))
 		NEWLIST=$(cat cpuset.cpus.effective)
 	done
diff --git a/tools/testing/selftests/cgroup/test_pids.c b/tools/testing/selftests/cgroup/test_pids.c (new file)
// SPDX-License-Identifier: GPL-2.0
#define _GNU_SOURCE
#include <errno.h>
#include <linux/limits.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include "../kselftest.h"
#include "cgroup_util.h"
static int run_success(const char *cgroup, void *arg)
{
return 0;
}
static int run_pause(const char *cgroup, void *arg)
{
return pause();
}
/*
* This test checks that pids.max prevents forking new children above the
* specified limit in the cgroup.
*/
static int test_pids_max(const char *root)
{
int ret = KSFT_FAIL;
char *cg_pids;
int pid;
cg_pids = cg_name(root, "pids_test");
if (!cg_pids)
goto cleanup;
if (cg_create(cg_pids))
goto cleanup;
if (cg_read_strcmp(cg_pids, "pids.max", "max\n"))
goto cleanup;
if (cg_write(cg_pids, "pids.max", "2"))
goto cleanup;
if (cg_enter_current(cg_pids))
goto cleanup;
pid = cg_run_nowait(cg_pids, run_pause, NULL);
if (pid < 0)
goto cleanup;
if (cg_run_nowait(cg_pids, run_success, NULL) != -1 || errno != EAGAIN)
goto cleanup;
if (kill(pid, SIGINT))
goto cleanup;
ret = KSFT_PASS;
cleanup:
cg_enter_current(root);
cg_destroy(cg_pids);
free(cg_pids);
return ret;
}
/*
* This test checks that pids.events are counted in cgroup associated with pids.max
*/
static int test_pids_events(const char *root)
{
int ret = KSFT_FAIL;
char *cg_parent = NULL, *cg_child = NULL;
int pid;
cg_parent = cg_name(root, "pids_parent");
cg_child = cg_name(cg_parent, "pids_child");
if (!cg_parent || !cg_child)
goto cleanup;
if (cg_create(cg_parent))
goto cleanup;
if (cg_write(cg_parent, "cgroup.subtree_control", "+pids"))
goto cleanup;
if (cg_create(cg_child))
goto cleanup;
if (cg_write(cg_parent, "pids.max", "2"))
goto cleanup;
if (cg_read_strcmp(cg_child, "pids.max", "max\n"))
goto cleanup;
if (cg_enter_current(cg_child))
goto cleanup;
pid = cg_run_nowait(cg_child, run_pause, NULL);
if (pid < 0)
goto cleanup;
if (cg_run_nowait(cg_child, run_success, NULL) != -1 || errno != EAGAIN)
goto cleanup;
if (kill(pid, SIGINT))
goto cleanup;
if (cg_read_key_long(cg_child, "pids.events", "max ") != 0)
goto cleanup;
if (cg_read_key_long(cg_parent, "pids.events", "max ") != 1)
goto cleanup;
ret = KSFT_PASS;
cleanup:
cg_enter_current(root);
if (cg_child)
cg_destroy(cg_child);
if (cg_parent)
cg_destroy(cg_parent);
free(cg_child);
free(cg_parent);
return ret;
}
#define T(x) { x, #x }
struct pids_test {
int (*fn)(const char *root);
const char *name;
} tests[] = {
T(test_pids_max),
T(test_pids_events),
};
#undef T
int main(int argc, char **argv)
{
char root[PATH_MAX];
ksft_print_header();
ksft_set_plan(ARRAY_SIZE(tests));
if (cg_find_unified_root(root, sizeof(root), NULL))
ksft_exit_skip("cgroup v2 isn't mounted\n");
/*
* Check that pids controller is available:
* pids is listed in cgroup.controllers
*/
if (cg_read_strstr(root, "cgroup.controllers", "pids"))
ksft_exit_skip("pids controller isn't available\n");
if (cg_read_strstr(root, "cgroup.subtree_control", "pids"))
if (cg_write(root, "cgroup.subtree_control", "+pids"))
ksft_exit_skip("Failed to set pids controller\n");
for (int i = 0; i < ARRAY_SIZE(tests); i++) {
switch (tests[i].fn(root)) {
case KSFT_PASS:
ksft_test_result_pass("%s\n", tests[i].name);
break;
case KSFT_SKIP:
ksft_test_result_skip("%s\n", tests[i].name);
break;
default:
ksft_test_result_fail("%s\n", tests[i].name);
break;
}
}
ksft_finished();
}