cgroup: implement delayed destruction for cgroup_pidlist
Currently, pidlists are reference counted from file open and release methods. This means that holding onto an open file may waste memory and reads may return data which is very stale. Both aren't critical because pidlists are keyed and shared per namespace and, well, the user isn't supposed to have large delay between open and reads. cgroup is planned to be converted to use kernfs and it'd be best if we can stick to just the seq_file operations - start, next, stop and show. This can be achieved by loading pidlist on demand from start and release with time delay from stop, so that consecutive reads don't end up reloading the pidlist on each iteration. This would remove the need for hooking into open and release while also avoiding issues with holding onto pidlist for too long. This patch implements delayed release of pidlist. As pidlists could be lingering on cgroup removal waiting for the timer to expire, cgroup free path needs to queue the destruction work item immediately and flush. As those work items are self-destroying, each work item can't be flushed directly. A new workqueue - cgroup_pidlist_destroy_wq - is added to serve as flush domain. Note that this patch just adds delayed release on top of the current implementation and doesn't change where pidlist is loaded and released. Following patches will make those changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
Showing
Please register or sign in to comment