• Tejun Heo's avatar
    cgroup: fix handling of multi-destination migration from subtree_control enabling · 1f7dd3e5
    Tejun Heo authored
    Consider the following v2 hierarchy.
    
      P0 (+memory) --- P1 (-memory) --- A
                                     \- B
           
    P0 has memory enabled in its subtree_control while P1 doesn't.  If
    both A and B contain processes, they would belong to the memory css of
    P1.  Now if memory is enabled on P1's subtree_control, memory csses
    should be created on both A and B and A's processes should be moved to
    the former and B's processes the latter.  IOW, enabling controllers
    can cause atomic migrations into different csses.
    
    The core cgroup migration logic has been updated accordingly but the
    controller migration methods haven't and still assume that all tasks
    migrate to a single target css; furthermore, the methods were fed the
    css in which subtree_control was updated which is the parent of the
    target csses.  pids controller depends on the migration methods to
    move charges and this made the controller attribute charges to the
    wrong csses often triggering the following warning by driving a
    counter negative.
    
     WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40()
     Modules linked in:
     CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #29
     ...
      ffffffff81f65382 ffff88007c043b90 ffffffff81551ffc 0000000000000000
      ffff88007c043bc8 ffffffff810de202 ffff88007a752000 ffff88007a29ab00
      ffff88007c043c80 ffff88007a1d8400 0000000000000001 ffff88007c043bd8
     Call Trace:
      [<ffffffff81551ffc>] dump_stack+0x4e/0x82
      [<ffffffff810de202>] warn_slowpath_common+0x82/0xc0
      [<ffffffff810de2fa>] warn_slowpath_null+0x1a/0x20
      [<ffffffff8118e031>] pids_cancel.constprop.6+0x31/0x40
      [<ffffffff8118e0fd>] pids_can_attach+0x6d/0xf0
      [<ffffffff81188a4c>] cgroup_taskset_migrate+0x6c/0x330
      [<ffffffff81188e05>] cgroup_migrate+0xf5/0x190
      [<ffffffff81189016>] cgroup_attach_task+0x176/0x200
      [<ffffffff8118949d>] __cgroup_procs_write+0x2ad/0x460
      [<ffffffff81189684>] cgroup_procs_write+0x14/0x20
      [<ffffffff811854e5>] cgroup_file_write+0x35/0x1c0
      [<ffffffff812e26f1>] kernfs_fop_write+0x141/0x190
      [<ffffffff81265f88>] __vfs_write+0x28/0xe0
      [<ffffffff812666fc>] vfs_write+0xac/0x1a0
      [<ffffffff81267019>] SyS_write+0x49/0xb0
      [<ffffffff81bcef32>] entry_SYSCALL_64_fastpath+0x12/0x76
    
    This patch fixes the bug by removing @css parameter from the three
    migration methods, ->can_attach, ->cancel_attach() and ->attach() and
    updating cgroup_taskset iteration helpers also return the destination
    css in addition to the task being migrated.  All controllers are
    updated accordingly.
    
    * Controllers which don't care whether there are one or multiple
      target csses can be converted trivially.  cpu, io, freezer, perf,
      netclassid and netprio fall in this category.
    
    * cpuset's current implementation assumes that there's single source
      and destination and thus doesn't support v2 hierarchy already.  The
      only change made by this patchset is how that single destination css
      is obtained.
    
    * memory migration path already doesn't do anything on v2.  How the
      single destination css is obtained is updated and the prep stage of
      mem_cgroup_can_attach() is reordered to accomodate the change.
    
    * pids is the only controller which was affected by this bug.  It now
      correctly handles multi-destination migrations and no longer causes
      counter underflow from incorrect accounting.
    Signed-off-by: default avatarTejun Heo <tj@kernel.org>
    Reported-and-tested-by: default avatarDaniel Wagner <daniel.wagner@bmw-carit.de>
    Cc: Aleksa Sarai <cyphar@cyphar.com>
    1f7dd3e5
core.c 220 KB