- 10 Sep, 2024 1 commit
-
-
Michal Koutný authored
The cpuset filesystem is a legacy interface to cpuset controller with (pre-)v1 features. It makes little sense to co-mount it on systems without cpuset v1, so do not build it when cpuset v1 is not built neither. Signed-off-by:
Michal Koutný <mkoutny@suse.com> Reviewed-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
- 04 Sep, 2024 1 commit
-
-
Waiman Long authored
The newly created cpuset-v1.c file uses cpus_read_lock/unlock() functions which are defined in cpu.h but not included in cpuset-internal.h yet leading to compilation error under certain kernel configurations. Fix it by moving the cpu.h include from cpuset.c to cpuset-internal.h. While at it, sort the include files in alphabetic order. Reported-by:
kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202408311612.mQTuO946-lkp@intel.com/ Fixes: 047b8309 ("cgroup/cpuset: move relax_domain_level to cpuset-v1.c") Signed-off-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
- 30 Aug, 2024 14 commits
-
-
Chen Ridong authored
There is only hotplug test for cpuset v1, just add base read/write test for cpuset v1. Signed-off-by:
Chen Ridong <chenridong@huawei.com> Acked-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Chen Ridong authored
This patch introduces CONFIG_CPUSETS_V1 and guard cpuset-v1 code under CONFIG_CPUSETS_V1. The default value of CONFIG_CPUSETS_V1 is N, so that user who adopted v2 don't have 'pay' for cpuset v1. Signed-off-by:
Chen Ridong <chenridong@huawei.com> Acked-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Chen Ridong authored
Some functions name declared in cpuset-internel.h are generic. To avoid confilicting with other variables for the same name, rename these functions with cpuset_/cpuset1_ prefix to make them unique to cpuset. Signed-off-by:
Chen Ridong <chenridong@huawei.com> Acked-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Chen Ridong authored
Move legacy cpuset controller interfaces files and corresponding code into cpuset-v1.c. 'update_flag', 'cpuset_write_resmask' and 'cpuset_common_seq_show' are also used for v1, so declare them in cpuset-internal.h. 'cpuset_write_s64', 'cpuset_read_s64' and 'fmeter_getrate' are only used cpuset-v1.c now, make it static. Signed-off-by:
Chen Ridong <chenridong@huawei.com> Acked-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Chen Ridong authored
The validate_change_legacy functions is used for v1, move it to cpuset-v1.c. And two micro 'cpuset_for_each_child' and 'cpuset_for_each_descendant_pre' are common for v1 and v2, move them to cpuset-internal.h. Signed-off-by:
Chen Ridong <chenridong@huawei.com> Acked-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Chen Ridong authored
There are some differents about hotplug update between cpuset v1 and cpuset v2. Move the legacy code to cpuset-v1.c. 'update_tasks_cpumask' and 'update_tasks_nodemask' are both used in cpuset v1 and cpuset v2, declare them in cpuset-internal.h. The change from original code is that use callback_lock helpers to get callback_lock lock/unlock. Signed-off-by:
Chen Ridong <chenridong@huawei.com> Acked-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Chen Ridong authored
To modify cpuset, both cpuset_mutex and callback_lock are needed. Add helpers for cpuset-v1 to get callback_lock. Signed-off-by:
Chen Ridong <chenridong@huawei.com> Acked-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Chen Ridong authored
'memory_spread' is only set in cpuset v1. move corresponding code into cpuset-v1.c. Currently, 'cpuset_update_task_spread_flags' and 'update_tasks_flags' are exposed to cpuset.c. Signed-off-by:
Chen Ridong <chenridong@huawei.com> Acked-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Chen Ridong authored
Setting domain level is not supported at cpuset v2, so move corresponding code into cpuset-v1.c. The 'cpuset_write_s64' and 'cpuset_read_s64' are only used for setting domain level, move them to cpuset-v1.c. Currently, expose to cpuset.c. After cpuset legacy interface files are move to cpuset-v1.c, they can be static. The 'rebuild_sched_domains_locked' is exposed to cpuset-v1.c. The change from original code is that using 'cpuset_lock' and 'cpuset_unlock' functions to lock or unlock cpuset_mutex. Signed-off-by:
Chen Ridong <chenridong@huawei.com> Acked-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Chen Ridong authored
Collection of memory_pressure can be enabled by writing 1 to the cpuset file 'memory_pressure_enabled', which is only for cpuset-v1. Therefore, move the corresponding code to cpuset-v1.c. Currently, the 'fmeter_init' and 'fmeter_getrate' functions are called at cpuset.c, so expose them to cpuset.c. Signed-off-by:
Chen Ridong <chenridong@huawei.com> Acked-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Chen Ridong authored
Move some declarations that will be used for cpuset v1 and v2, including 'cpuset struct', 'cpuset_flagbits_t', cpuset_filetype_t,etc. No logical change. Signed-off-by:
Chen Ridong <chenridong@huawei.com> Acked-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Chen Ridong authored
This patch introduces the cgroup/cpuset-v1.c source file which will be used for all legacy (cgroup v1) cpuset cgroup code. It also introduces cgroup/cpuset-internal.h to keep declarations shared between cgroup/cpuset.c and cpuset/cpuset-v1.c. As of now, let's compile it if CONFIG_CPUSET is set. Later on it can be switched to use a separate config option, so that the legacy code won't be compiled if not required. Signed-off-by:
Chen Ridong <chenridong@huawei.com> Acked-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Waiman Long authored
Since isolated CPUs can be reserved at boot time via the "isolcpus" boot command line option, these pre-isolated CPUs may interfere with testing done by test_cpuset_prs.sh. With the previous commit that incorporates those boot time isolated CPUs into "cpuset.cpus.isolated", we can check for those before testing is started to make sure that there will be no interference. Otherwise, this test will be skipped if incorrect test failure can happen. As "cpuset.cpus.isolated" is now available in a non cgroup_debug kernel, we don't need to check for its existence anymore. Signed-off-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Waiman Long authored
With the "isolcpus" boot command line parameter, we are able to create isolated CPUs at boot time. These isolated CPUs aren't fully accounted for in the cpuset code. For instance, the root cgroup's "cpuset.cpus.isolated" control file does not include the boot time isolated CPUs. Fix that by looking for pre-isolated CPUs at init time. The prstate_housekeeping_conflict() function does check the HK_TYPE_DOMAIN housekeeping cpumask to make sure that CPUs outside of it can only be used in isolated partition. Given the fact that we are going to make housekeeping cpumasks dynamic, the current check may not be right anymore. Save the boot time HK_TYPE_DOMAIN cpumask and check against it instead of the upcoming dynamic HK_TYPE_DOMAIN housekeeping cpumask. Signed-off-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
- 20 Aug, 2024 3 commits
-
-
Chen Ridong authored
use_parent_ecpus is used to track whether the children are using the parent's effective_cpus. When a parent's effective_cpus is changed due to changes in a child partition's effective_xcpus, any child using parent'effective_cpus must call update_cpumasks_hier. However, if a child is not a valid partition, it is sufficient to determine whether to call update_cpumasks_hier based on whether the child's effective_cpus is going to change. To make the code more succinct, it is suggested to remove use_parent_ecpus. Signed-off-by:
Chen Ridong <chenridong@huawei.com> Reviewed-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Chen Ridong authored
Both fetch_xcpus and user_xcpus functions are used to retrieve the value of exclusive_cpus. If exclusive_cpus is not set, cpus_allowed is the implicit value used as exclusive in a local partition. I can not imagine a scenario where effective_xcpus is not empty when exclusive_cpus is empty. Therefore, I suggest removing the fetch_xcpus function. Signed-off-by:
Chen Ridong <chenridong@huawei.com> Reviewed-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Chen Ridong authored
When enable a remote partition, I found that: cd /sys/fs/cgroup/ mkdir test mkdir test/test1 echo +cpuset > cgroup.subtree_control echo +cpuset > test/cgroup.subtree_control echo 3 > test/test1/cpuset.cpus echo root > test/test1/cpuset.cpus.partition cat test/test1/cpuset.cpus.partition root invalid (Parent is not a partition root) The parent of a remote partition could not be a root. This is due to the emtpy effective_xcpus. It would be better to prompt the message "invalid cpu list in cpuset.cpus.exclusive". Signed-off-by:
Chen Ridong <chenridong@huawei.com> Reviewed-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
- 19 Aug, 2024 1 commit
-
-
Chen Ridong authored
The comment in cgroup_file_write is missing some interfaces, such as 'cgroup.threads'. All delegatable files are listed in '/sys/kernel/cgroup/delegate', so update the comment in cgroup_file_write. Besides, add a statement that files outside the namespace shouldn't be visible from inside the delegated namespace. tj: Reflowed text for consistency. Signed-off-by:
Chen Ridong <chenridong@huawei.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
- 09 Aug, 2024 1 commit
-
-
Waiman Long authored
It turns out that the WARN_ON_ONCE() call in css_release_work_fn introduced by commit ab031252 ("cgroup: Show # of subsystem CSSes in cgroup.stat") is incorrect. Although css->nr_descendants must be 0 when a css is released and ready to be freed, the corresponding cgrp->nr_dying_subsys[ss->id] may not be 0 if a subsystem is activated and deactivated multiple times with one or more of its previous activation leaving behind dying csses. Fix the incorrect warning by removing the cgrp->nr_dying_subsys check. Fixes: ab031252 ("cgroup: Show # of subsystem CSSes in cgroup.stat") Closes: https://lore.kernel.org/cgroups/6f301773-2fce-4602-a391-8af7ef00b2fb@redhat.com/T/#tSigned-off-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
- 05 Aug, 2024 7 commits
-
-
Waiman Long authored
Add new test cases to test_cpuset_prs.sh to cover corner cases reported in previous fix commits. Signed-off-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Waiman Long authored
With the previous commit that eliminates the overlapping partition root corner cases in the hotplug code, the partition roots passed down to generate_sched_domains() should not have overlapping CPUs. Enable overlapping cpuset check for v2 and warn if that happens. This patch also has the benefit of increasing test coverage of the new Union-Find cpuset merging code to cgroup v2. Signed-off-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Tejun Heo authored
cgroup/for-6.12 is about to receive updates that are dependent on changes from both for-6.11-fixes and for-6.12. Pull in for-6.11-fixes. Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Waiman Long authored
It was found that some hotplug operations may cause multiple rebuild_sched_domains_locked() calls. Some of those intermediate calls may use cpuset states not in the final correct form leading to incorrect sched domain setting. Fix this problem by using the existing force_rebuild flag to inhibit immediate rebuild_sched_domains_locked() calls if set and only doing one final call at the end. Also renaming the force_rebuild flag to force_sd_rebuild to make its meaning for clear. Signed-off-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Waiman Long authored
Commit e2ffe502 ("cgroup/cpuset: Add cpuset.cpus.exclusive for v2") adds a user writable cpuset.cpus.exclusive file for setting exclusive CPUs to be used for the creation of partitions. Since then effective_xcpus depends on both the cpuset.cpus and cpuset.cpus.exclusive setting. If cpuset.cpus.exclusive is set, effective_xcpus will depend only on cpuset.cpus.exclusive. When it is not set, effective_xcpus will be set according to the cpuset.cpus value when the cpuset becomes a valid partition root. When cpuset.cpus is being cleared by the user, effective_xcpus should only be cleared when cpuset.cpus.exclusive is not set. However, that is not currently the case. # cd /sys/fs/cgroup/ # mkdir test # echo +cpuset > cgroup.subtree_control # cd test # echo 3 > cpuset.cpus.exclusive # cat cpuset.cpus.exclusive.effective 3 # echo > cpuset.cpus # cat cpuset.cpus.exclusive.effective // was cleared Fix it by clearing effective_xcpus only if cpuset.cpus.exclusive is not set. Fixes: e2ffe502 ("cgroup/cpuset: Add cpuset.cpus.exclusive for v2") Cc: stable@vger.kernel.org # v6.7+ Reported-by:
Chen Ridong <chenridong@huawei.com> Signed-off-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Chen Ridong authored
We find a bug as below: BUG: unable to handle page fault for address: 00000003 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 3 PID: 358 Comm: bash Tainted: G W I 6.6.0-10893-g60d6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/4 RIP: 0010:partition_sched_domains_locked+0x483/0x600 Code: 01 48 85 d2 74 0d 48 83 05 29 3f f8 03 01 f3 48 0f bc c2 89 c0 48 9 RSP: 0018:ffffc90000fdbc58 EFLAGS: 00000202 RAX: 0000000100000003 RBX: ffff888100b3dfa0 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000002fe80 RBP: ffff888100b3dfb0 R08: 0000000000000001 R09: 0000000000000000 R10: ffffc90000fdbcb0 R11: 0000000000000004 R12: 0000000000000002 R13: ffff888100a92b48 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f44a5425740(0000) GS:ffff888237d80000(0000) knlGS:0000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000100030973 CR3: 000000010722c000 CR4: 00000000000006e0 Call Trace: <TASK> ? show_regs+0x8c/0xa0 ? __die_body+0x23/0xa0 ? __die+0x3a/0x50 ? page_fault_oops+0x1d2/0x5c0 ? partition_sched_domains_locked+0x483/0x600 ? search_module_extables+0x2a/0xb0 ? search_exception_tables+0x67/0x90 ? kernelmode_fixup_or_oops+0x144/0x1b0 ? __bad_area_nosemaphore+0x211/0x360 ? up_read+0x3b/0x50 ? bad_area_nosemaphore+0x1a/0x30 ? exc_page_fault+0x890/0xd90 ? __lock_acquire.constprop.0+0x24f/0x8d0 ? __lock_acquire.constprop.0+0x24f/0x8d0 ? asm_exc_page_fault+0x26/0x30 ? partition_sched_domains_locked+0x483/0x600 ? partition_sched_domains_locked+0xf0/0x600 rebuild_sched_domains_locked+0x806/0xdc0 update_partition_sd_lb+0x118/0x130 cpuset_write_resmask+0xffc/0x1420 cgroup_file_write+0xb2/0x290 kernfs_fop_write_iter+0x194/0x290 new_sync_write+0xeb/0x160 vfs_write+0x16f/0x1d0 ksys_write+0x81/0x180 __x64_sys_write+0x21/0x30 x64_sys_call+0x2f25/0x4630 do_syscall_64+0x44/0xb0 entry_SYSCALL_64_after_hwframe+0x78/0xe2 RIP: 0033:0x7f44a553c887 It can be reproduced with cammands: cd /sys/fs/cgroup/ mkdir test cd test/ echo +cpuset > ../cgroup.subtree_control echo root > cpuset.cpus.partition cat /sys/fs/cgroup/cpuset.cpus.effective 0-3 echo 0-3 > cpuset.cpus // taking away all cpus from root This issue is caused by the incorrect rebuilding of scheduling domains. In this scenario, test/cpuset.cpus.partition should be an invalid root and should not trigger the rebuilding of scheduling domains. When calling update_parent_effective_cpumask with partcmd_update, if newmask is not null, it should recheck newmask whether there are cpus is available for parect/cs that has tasks. Fixes: 0c7f293e ("cgroup/cpuset: Add cpuset.cpus.exclusive.effective for v2") Cc: stable@vger.kernel.org # v6.7+ Signed-off-by:
Chen Ridong <chenridong@huawei.com> Signed-off-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Xiu Jianfeng authored
According to the implementation of cgroup_css_set_fork(), it will fail if cset cannot be found and the can_fork/cancel_fork methods will not be called in this case, which means that the argument 'cset' for these methods must not be NULL, so remove the unrechable paths in them. Signed-off-by:
Xiu Jianfeng <xiujianfeng@huawei.com> Reviewed-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
- 02 Aug, 2024 1 commit
-
-
Xavier authored
Fix the compilation errors and warnings caused by merging Documentation/core-api/union_find.rst and Documentation/translations/zh_CN/core-api/union_find.rst. Signed-off-by:
Xavier <xavier_qy@163.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
- 31 Jul, 2024 1 commit
-
-
Waiman Long authored
Cgroup subsystem state (CSS) is an abstraction in the cgroup layer to help manage different structures in various cgroup subsystems by being an embedded element inside a larger structure like cpuset or mem_cgroup. The /proc/cgroups file shows the number of cgroups for each of the subsystems. With cgroup v1, the number of CSSes is the same as the number of cgroups. That is not the case anymore with cgroup v2. The /proc/cgroups file cannot show the actual number of CSSes for the subsystems that are bound to cgroup v2. So if a v2 cgroup subsystem is leaking cgroups (usually memory cgroup), we can't tell by looking at /proc/cgroups which cgroup subsystems may be responsible. As cgroup v2 had deprecated the use of /proc/cgroups, the hierarchical cgroup.stat file is now being extended to show the number of live and dying CSSes associated with all the non-inhibited cgroup subsystems that have been bound to cgroup v2. The number includes CSSes in the current cgroup as well as in all the descendants underneath it. This will help us pinpoint which subsystems are responsible for the increasing number of dying (nr_dying_descendants) cgroups. The CSSes dying counts are stored in the cgroup structure itself instead of inside the CSS as suggested by Johannes. This will allow us to accurately track dying counts of cgroup subsystems that have recently been disabled in a cgroup. It is now possible that a zero subsystem number is coupled with a non-zero dying subsystem number. The cgroup-v2.rst file is updated to discuss this new behavior. With this patch applied, a sample output from root cgroup.stat file was shown below. nr_descendants 56 nr_subsys_cpuset 1 nr_subsys_cpu 43 nr_subsys_io 43 nr_subsys_memory 56 nr_subsys_perf_event 57 nr_subsys_hugetlb 1 nr_subsys_pids 56 nr_subsys_rdma 1 nr_subsys_misc 1 nr_dying_descendants 30 nr_dying_subsys_cpuset 0 nr_dying_subsys_cpu 0 nr_dying_subsys_io 0 nr_dying_subsys_memory 30 nr_dying_subsys_perf_event 0 nr_dying_subsys_hugetlb 0 nr_dying_subsys_pids 0 nr_dying_subsys_rdma 0 nr_dying_subsys_misc 0 Another sample output from system.slice/cgroup.stat was: nr_descendants 34 nr_subsys_cpuset 0 nr_subsys_cpu 32 nr_subsys_io 32 nr_subsys_memory 34 nr_subsys_perf_event 35 nr_subsys_hugetlb 0 nr_subsys_pids 34 nr_subsys_rdma 0 nr_subsys_misc 0 nr_dying_descendants 30 nr_dying_subsys_cpuset 0 nr_dying_subsys_cpu 0 nr_dying_subsys_io 0 nr_dying_subsys_memory 30 nr_dying_subsys_perf_event 0 nr_dying_subsys_hugetlb 0 nr_dying_subsys_pids 0 nr_dying_subsys_rdma 0 nr_dying_subsys_misc 0 Note that 'debug' controller wasn't used to provide this information because the controller is not recommended in productions kernels, also many of them won't enable CONFIG_CGROUP_DEBUG by default. Similar information could be retrieved with debuggers like drgn but that's also not always available (e.g. lockdown) and the additional cost of runtime tracking here is deemed marginal. tj: Added Michal's paragraphs on why this is not added the debug controller to the commit message. Signed-off-by:
Waiman Long <longman@redhat.com> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Acked-by:
Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by:
Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Michal Koutný <mkoutny@suse.com> Link: http://lkml.kernel.org/r/20240715150034.2583772-1-longman@redhat.comSigned-off-by:
Tejun Heo <tj@kernel.org>
-
- 30 Jul, 2024 6 commits
-
-
Xavier authored
The process of constructing scheduling domains involves multiple loops and repeated evaluations, leading to numerous redundant and ineffective assessments that impact code efficiency. Here, we use union-find to optimize the merging of cpumasks. By employing path compression and union by rank, we effectively reduce the number of lookups and merge comparisons. Signed-off-by:
Xavier <xavier_qy@163.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Xavier authored
This patch implements a union-find data structure in the kernel library, which includes operations for allocating nodes, freeing nodes, finding the root of a node, and merging two nodes. Signed-off-by:
Xavier <xavier_qy@163.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Chen Ridong authored
There are several functions to decrease attach_in_progress, and they will wake up cpuset_attach_wq when attach_in_progress is zero. So, add a helper to make it concise. Signed-off-by:
Chen Ridong <chenridong@huawei.com> Reviewed-by:
Kamalesh Babulal <kamalesh.babulal@oracle.com> Reviewed-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Xiu Jianfeng authored
Since the SLAB implementation was removed in v6.8, so the cpuset_slab_spread_rotor is no longer used and can be removed. Signed-off-by:
Xiu Jianfeng <xiujianfeng@huawei.com> Reviewed-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Xiu Jianfeng authored
Currently when a process in a group forks and fails due to it's parent's max restriction, all the cgroups from 'pids_forking' to root will generate event notifications but only the cgroups from 'pids_over_limit' to root will increase the counter of PIDCG_MAX. Consider this scenario: there are 4 groups A, B, C,and D, the relationships are as follows, and user is watching on C.pids.events. root->A->B->C->D When a process in D forks and fails due to B.max restriction, the user will get a spurious event notification because when he wakes up and reads C.pids.events, he will find that the content has not changed. To address this issue, only the cgroups from 'pids_over_limit' to root will have their PIDCG_MAX counters increased and event notifications generated. Fixes: 385a635c ("cgroup/pids: Make event counters hierarchical") Signed-off-by:
Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Chen Ridong authored
The child_ecpus_count variable was previously used to update sibling cpumask when parent's effective_cpus is updated. However, it became obsolete after commit e2ffe502 ("cgroup/cpuset: Add cpuset.cpus.exclusive for v2"). It should be removed. tj: Restored {} for style consistency. Signed-off-by:
Chen Ridong <chenridong@huawei.com> Acked-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
- 28 Jul, 2024 4 commits
-
-
Linus Torvalds authored
-
Linus Torvalds authored
Merge tag 'kbuild-fixes-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Fix RPM package build error caused by an incorrect locale setup - Mark modules.weakdep as ghost in RPM package - Fix the odd combination of -S and -c in stack protector scripts, which is an error with the latest Clang * tag 'kbuild-fixes-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: Fix '-S -c' in x86 stack protector scripts kbuild: rpm-pkg: ghost modules.weakdep file kbuild: rpm-pkg: Fix C locale setup
-
Linus Torvalds authored
This simplifies the min_t() and max_t() macros by no longer making them work in the context of a C constant expression. That means that you can no longer use them for static initializers or for array sizes in type definitions, but there were only a couple of such uses, and all of them were converted (famous last words) to use MIN_T/MAX_T instead. Cc: David Laight <David.Laight@aculab.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
Commit 3a7e02c0 ("minmax: avoid overly complicated constant expressions in VM code") added the simpler MIN_T/MAX_T macros in order to avoid some excessive expansion from the rather complicated regular min/max macros. The complexity of those macros stems from two issues: (a) trying to use them in situations that require a C constant expression (in static initializers and for array sizes) (b) the type sanity checking and MIN_T/MAX_T avoids both of these issues. Now, in the whole (long) discussion about all this, it was pointed out that the whole type sanity checking is entirely unnecessary for min_t/max_t which get a fixed type that the comparison is done in. But that still leaves min_t/max_t unnecessarily complicated due to worries about the C constant expression case. However, it turns out that there really aren't very many cases that use min_t/max_t for this, and we can just force-convert those. This does exactly that. Which in turn will then allow for much simpler implementations of min_t()/max_t(). All the usual "macros in all upper case will evaluate the arguments multiple times" rules apply. We should do all the same things for the regular min/max() vs MIN/MAX() cases, but that has the added complexity of various drivers defining their own local versions of MIN/MAX, so that needs another level of fixes first. Link: https://lore.kernel.org/all/b47fad1d0cf8449886ad148f8c013dae@AcuMS.aculab.com/ Cc: David Laight <David.Laight@aculab.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-