Commits · f83f3c515654474e19c7fc86e3b06564bb5cb4d4 · Kirill Smelkov / linux

21 Feb, 2017 1 commit

kernfs: fix locking around kernfs_ops->release() callback · f83f3c51

Tejun Heo authored Feb 12, 2017

The release callback may be called from two places - file release
operation and kernfs open file draining.  kernfs_open_file->mutex is
used to synchronize the two callsites.  This unfortunately leads to
possible circular locking because of->mutex is used to protect the
usual kernfs operations which may use locking constructs which are
held while removing and thus draining kernfs files.

@of->mutex is for synchronizing concurrent kernfs access operations
and all we need here is synchronization between the releaes and drain
paths.  As the drain path has to grab kernfs_open_file_mutex anyway,
let's use the mutex to synchronize the release operation instead.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Fixes: 0e67db2f ("kernfs: add kernfs_ops->open/release() callbacks")
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f83f3c51

02 Feb, 2017 3 commits

Merge branch 'cgroup/for-4.11-rdmacg' into cgroup/for-4.11 · 63f1ca59

Tejun Heo authored Feb 02, 2017

Merge in to resolve conflicts in Documentation/cgroup-v2.txt.  The
conflicts are from multiple section additions and trivial to resolve.
Signed-off-by: Tejun Heo <tj@kernel.org>

63f1ca59

cgroup: drop the matching uid requirement on migration for cgroup v2 · 576dd464

Tejun Heo authored Jan 20, 2017

Along with the write access to the cgroup.procs or tasks file, cgroup
has required the writer's euid, unless root, to match [s]uid of the
target process or task. On cgroup v1, this is necessary because
there's nothing preventing a delegatee from pulling in tasks or
processes from all over the system.

If a user has a cgroup subdirectory delegated to it, the user would
have write access to the cgroup.procs or tasks file. If there are no
further checks than file write access check, the user would be able to
pull processes from all over the system into its subhierarchy which is
clearly not the intended behavior. The matching [s]uid requirement
partially prevents this problem by allowing a delegatee to pull in the
processes that belongs to it. This isn't a sufficient protection
however, because a user would still be able to jump processes across
two disjoint sub-hierarchies that has been delegated to them.

cgroup v2 resolves the issue by requiring the writer to have access to
the common ancestor of the cgroup.procs file of the source and target
cgroups. This confines each delegatee to their own sub-hierarchy
proper and bases all permission decisions on the cgroup filesystem
rather than having to pull in explicit uid matching.

cgroup v2 has still been applying the matching [s]uid requirement just
for historical reasons. On cgroup2, the requirement doesn't serve any
purpose while unnecessarily complicating the permission model. Let's
drop it.
Signed-off-by: Tejun Heo <tj@kernel.org>

576dd464

cgroup, perf_event: make perf_event controller work on cgroup2 hierarchy · 968ebff1

Tejun Heo authored Jan 29, 2017

perf_event is a utility controller whose primary role is identifying
cgroup membership to filter perf events; however, because it also
tracks some per-css state, it can't be replaced by pure cgroup
membership test.  Mark the controller as implicitly enabled on the
default hierarchy so that perf events can always be filtered based on
cgroup v2 path as long as the controller is not mounted on a legacy
hierarchy.

"perf record" is updated accordingly so that it searches for both v1
and v2 hierarchies.  A v1 hierarchy is used if perf_event is mounted
on it; otherwise, it uses the v2 hierarchy.

v2: Doc updated to reflect more flexible rebinding behavior.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>

968ebff1

30 Jan, 2017 1 commit

cgroup: misc cleanups · b807421a

Tejun Heo authored Jan 20, 2017

* cgrp_dfl_implicit_ss_mask is ulong instead of u16 unlike other
  ss_masks.  Make it a u16.

* Move have_canfork_callback together with other callback ss_masks.
Signed-off-by: Tejun Heo <tj@kernel.org>

b807421a

26 Jan, 2017 2 commits

Merge branch 'for-4.10-fixes' into for-4.11 · bdf3d06b
Tejun Heo authored Jan 26, 2017

bdf3d06b

cgroup: don't online subsystems before cgroup_name/path() are operational · 07cd1294

Tejun Heo authored Jan 26, 2017

While refactoring cgroup creation, a5bca215 ("cgroup: factor out
cgroup_create() out of cgroup_mkdir()") incorrectly onlined subsystems
before the new cgroup is associated with it kernfs_node.  This is fine
for cgroup proper but cgroup_name/path() depend on the associated
kernfs_node and if a subsystem makes the new cgroup_subsys_state
visible, which they're allowed to after onlining, it can lead to NULL
dereference.

The current code performs cgroup creation and subsystem onlining in
cgroup_create() and cgroup_mkdir() makes the cgroup and subsystems
visible afterwards.  There's no reason to online the subsystems early
and we can simply drop cgroup_apply_control_enable() call from
cgroup_create() so that the subsystems are onlined and made visible at
the same time.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: a5bca215 ("cgroup: factor out cgroup_create() out of cgroup_mkdir()") 
Cc: stable@vger.kernel.org # v4.6+

07cd1294

16 Jan, 2017 3 commits

cgroup: call subsys->*attach() only for subsystems which are actually affected by migration · bfc2cf6f

Tejun Heo authored Jan 15, 2017

Currently, subsys->*attach() callbacks are called for all subsystems
which are attached to the hierarchy on which the migration is taking
place.

With cgroup_migrate_prepare_dst() filtering out identity migrations,
v1 hierarchies can avoid spurious ->*attach() callback invocations
where the source and destination csses are identical; however, this
isn't enough on v2 as only a subset of the attached controllers can be
affected on controller enable/disable.

While spurious ->*attach() invocations aren't critically broken,
they're unnecessary overhead and can lead to temporary overcharges on
certain controllers.  Fix it by tracking which subsystems are affected
by a migration and invoking ->*attach() callbacks only on those
subsystems.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Zefan Li <lizefan@huawei.com>

bfc2cf6f

cgroup: track migration context in cgroup_mgctx · e595cd70

Tejun Heo authored Jan 15, 2017

cgroup migration is performed in four steps - css_set preloading,
addition of target tasks, actual migration, and clean up.  A list
named preloaded_csets is used to track the preloading.  This is a bit
too restricted and the code is already depending on the subtlety that
all source css_sets appear before destination ones.

Let's create struct cgroup_mgctx which keeps track of everything
during migration.  Currently, it has separate preload lists for source
and destination csets and also embeds cgroup_taskset which is used
during the actual migration.  This moves struct cgroup_taskset
definition to cgroup-internal.h.

This patch doesn't cause any functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Zefan Li <lizefan@huawei.com>

e595cd70

cgroup: cosmetic update to cgroup_taskset_add() · d8ebf519

Tejun Heo authored Jan 15, 2017

cgroup_taskset_add() was using list_add_tail() when for source csets
but list_move_tail() for destination.  As the operations are gated by
list_empty() test, list_move_tail() is equivalent to list_add_tail()
here.  Use list_add_tail() too for destination csets too.

This doesn't cause any functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Zefan Li <lizefan@huawei.com>

d8ebf519

10 Jan, 2017 5 commits

rdmacg: Fixed uninitialized current resource usage · 7896dfb0

Parav Pandit authored Jan 10, 2017

Fixed warning reported by kbuild test robot.
When reading current resource usage value, when no resources are
allocated, its possible that it can report a uninitialized value
for current resource usage.
This fix avoids it by initializing it to zero as no resource is
allocated.
Signed-off-by: Parav Pandit <pandit.parav@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

7896dfb0

cgroup: Add missing cgroup-v2 PID controller documentation. · 20c56e59
Hans Ragas authored Jan 10, 2017
```
Signed-off-by: Hans Ragas <hansr@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
```
20c56e59

rdmacg: Added documentation for rdmacg · 9c1e67f9

Parav Pandit authored Jan 10, 2017

Added documentation for v1 and v2 version describing high
level design and usage examples on using rdma controller.
Signed-off-by: Parav Pandit <pandit.parav@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

9c1e67f9

IB/core: added support to use rdma cgroup controller · 43579b5f

Parav Pandit authored Jan 10, 2017

Added support APIs for IB core to register/unregister every IB/RDMA
device with rdma cgroup for tracking rdma resources.
IB core registers with rdma cgroup controller.
Added support APIs for uverbs layer to make use of rdma controller.
Added uverbs layer to perform resource charge/uncharge functionality.
Added support during query_device uverb operation to ensure it
returns resource limits by honoring rdma cgroup configured limits.
Signed-off-by: Parav Pandit <pandit.parav@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

43579b5f

rdmacg: Added rdma cgroup controller · 39d3e758

Parav Pandit authored Jan 10, 2017

Added rdma cgroup controller that does accounting, limit enforcement
on rdma/IB resources.

Added rdma cgroup header file which defines its APIs to perform
charging/uncharging functionality. It also defined APIs for RDMA/IB
stack for device registration. Devices which are registered will
participate in controller functions of accounting and limit
enforcements. It define rdmacg_device structure to bind IB stack
and RDMA cgroup controller.

RDMA resources are tracked using resource pool. Resource pool is per
device, per cgroup entity which allows setting up accounting limits
on per device basis.

Currently resources are defined by the RDMA cgroup.

Resource pool is created/destroyed dynamically whenever
charging/uncharging occurs respectively and whenever user
configuration is done. Its a tradeoff of memory vs little more code
space that creates resource pool object whenever necessary, instead of
creating them during cgroup creation and device registration time.
Signed-off-by: Parav Pandit <pandit.parav@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

39d3e758

27 Dec, 2016 15 commits

cgroup: fix a comment typo · 7b4632f0

Geliang Tang authored Dec 24, 2016

Fix a comment typo in cgroup.h.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>