    cgroup: add cgroup->serial_nr and implement cgroup_next_sibling() · 53fa5261
    Tejun Heo authored
    Currently, there's no easy way to find the next sibling cgroup
    unless it's known that the current cgroup is accessed from the
    parent's children list in a single RCU critical section.  This in turn
    forces all iterators to require that the whole iteration be enclosed
    in a single RCU critical section, which is sometimes too restrictive.
    This patch implements cgroup_next_sibling(), which can reliably
    determine the next sibling regardless of the state of the current
    cgroup, as long as the current cgroup itself is accessible.
    
    It is currently impossible to determine the next sibling after
    dropping RCU read lock because the cgroup being iterated could be
    removed at any time and, if RCU read lock is dropped, nothing guarantees
    its ->sibling.next pointer is accessible.  A removed cgroup would
    continue to point to its next sibling for RCU accesses but stop
    receiving updates from the sibling.  IOW, the next sibling could be
    removed and then complete its grace period while RCU read lock is
    dropped, making it unsafe to dereference ->sibling.next after dropping
    and re-acquiring RCU read lock.
    
    This can be solved by adding a way to traverse to the next sibling
    without dereferencing ->sibling.next.  This patch adds a monotonically
    increasing cgroup serial number, cgroup->serial_nr, which guarantees
    that all cgroup->children lists are kept in increasing serial_nr
    order.  A new function, cgroup_next_sibling(), is implemented, which,
    if CGRP_REMOVED is not set on the current cgroup, follows
    ->sibling.next; otherwise, traverses the parent's ->children list
    until it sees a sibling with higher ->serial_nr.
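
    The two paths described above can be sketched in plain userspace C.
    This is only an illustration, not the code added to cgroup.c: struct
    node, node_add_child(), node_remove_child() and node_next_sibling()
    are hypothetical names, the list is a simple singly linked one, and
    RCU and locking are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cgroup; no RCU, no locking. */
struct node {
	struct node *parent;
	struct node *first_child;	/* children, in ascending serial_nr order */
	struct node *last_child;
	struct node *next;		/* stands in for ->sibling.next */
	uint64_t serial_nr;
	bool removed;			/* stands in for CGRP_REMOVED */
};

static uint64_t next_serial_nr;	/* only ever grows */

/* Append at the tail; with a growing counter the list stays sorted. */
static void node_add_child(struct node *parent, struct node *child)
{
	assert(parent && child);
	child->parent = parent;
	child->first_child = child->last_child = NULL;
	child->next = NULL;
	child->removed = false;
	child->serial_nr = next_serial_nr++;
	if (parent->last_child)
		parent->last_child->next = child;
	else
		parent->first_child = child;
	parent->last_child = child;
}

/* Unlink child but leave its own ->next stale, like a removed cgroup. */
static void node_remove_child(struct node *child)
{
	struct node *parent = child->parent;
	struct node **link = &parent->first_child;
	struct node *prev = NULL;

	while (*link != child) {
		assert(*link);		/* child must be on the list */
		prev = *link;
		link = &prev->next;
	}
	*link = child->next;
	if (parent->last_child == child)
		parent->last_child = prev;
	child->removed = true;
}

/* Fast path: follow ->next.  Slow path: rescan the parent by serial_nr. */
static struct node *node_next_sibling(struct node *pos)
{
	struct node *next;

	if (!pos->removed)
		return pos->next;

	for (next = pos->parent->first_child; next; next = next->next)
		if (next->serial_nr > pos->serial_nr)
			return next;
	return NULL;
}
```

    In this sketch the slow path never touches the stale ->next of a
    removed node; it relies only on the parent's list being sorted by
    serial_nr, which is the property the patch introduces.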
    
    This allows the function to always return the next sibling regardless
    of the state of the current cgroup without adding overhead in the fast
    path.
    
    Further patches will update the iterators to use cgroup_next_sibling()
    so that they allow dropping RCU read lock and blocking while iteration
    is in progress, which in turn will be used to simplify controllers.
    
    v2: Typo fix as per Serge.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>