    kernfs: restructure removal path to fix possible premature return · 35beab06
    Tejun Heo authored
    The recursive nature of kernfs_remove() means that, even if
    kernfs_remove() is not allowed to be called multiple times on the same
    node, there may be race conditions between removal of parent and its
    descendants.  While we can claim that kernfs_remove() shouldn't be
    called on one of the descendants while the removal of an ancestor is
    in progress, such a rule is unnecessarily restrictive and very difficult
    to enforce.  It's better to simply allow invoking kernfs_remove() as
    the caller sees fit as long as the caller ensures that the node is
    accessible.
    
    The current behavior in such situations is broken.  Whoever enters
    the removal path first takes the node off the hierarchy and then
    deactivates it.  Subsequent removers either return as soon as they
    notice that they are not the first or can't even find the target node
    because it has already been removed from the hierarchy.  In both
    cases, they may finish prematurely while the nodes which should be
    removed and drained are still being processed by the first remover.
    
    This patch restructures the removal path so that multiple removers,
    whether through recursion or direct invocation, always follow these
    rules.
    
    * When there are multiple concurrent removers, only one puts the base
      ref.
    
    * Regardless of which one puts the base ref, all removers are blocked
      until the target node is fully deactivated and removed.
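
    The two rules can be modeled in userspace.  What follows is only a
    sketch under stated assumptions, not the kernfs implementation: the
    toy_* names are hypothetical, a pthread mutex and condition variable
    stand in for kernfs_mutex and the drain wait, and "removal" is reduced
    to setting a flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of a removal target; not a kernfs type. */
struct toy_target {
    pthread_mutex_t lock;     /* stands in for kernfs_mutex */
    pthread_cond_t done_cv;   /* signalled once removal has completed */
    bool claimed;             /* a remover already owns the teardown */
    bool done;                /* target fully deactivated and removed */
    int base_ref_puts;        /* how many times the base ref was put */
};

/* Per-thread bookkeeping used to observe the outcome of each remover. */
struct toy_remover {
    struct toy_target *target;
    bool put_base_ref;
    bool saw_done;
};

static void toy_target_init(struct toy_target *t)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->done_cv, NULL);
    t->claimed = false;
    t->done = false;
    t->base_ref_puts = 0;
}

/* Returns true only for the single remover that puts the base ref. */
static bool toy_remove_target(struct toy_target *t)
{
    bool winner;

    pthread_mutex_lock(&t->lock);
    winner = !t->claimed;
    if (winner) {
        t->claimed = true;
        /* Model draining: the lock is dropped while the work is done. */
        pthread_mutex_unlock(&t->lock);
        pthread_mutex_lock(&t->lock);
        t->done = true;
        t->base_ref_puts++;
        pthread_cond_broadcast(&t->done_cv);
    } else {
        /* Rule 2: later removers block until removal has completed. */
        while (!t->done)
            pthread_cond_wait(&t->done_cv, &t->lock);
    }
    pthread_mutex_unlock(&t->lock);
    return winner;
}

/* Thread entry: remove, then record whether the target was done. */
static void *toy_remover_thread(void *arg)
{
    struct toy_remover *r = arg;

    r->put_base_ref = toy_remove_target(r->target);
    pthread_mutex_lock(&r->target->lock);
    r->saw_done = r->target->done;
    pthread_mutex_unlock(&r->target->lock);
    return NULL;
}
```

    Starting several threads on toy_remover_thread() against one target
    should leave base_ref_puts at 1 and saw_done set for every remover,
    regardless of which thread claimed the teardown.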
    
    To achieve the above, the removal path now first marks the node and
    all its descendants REMOVED and then deactivates and unlinks the
    leftmost descendant, one by one.  kernfs_deactivate() is called
    directly from __kernfs_remove() and drops and regrabs kernfs_mutex
    for each descendant to drain active refs.  As this means that
    multiple removers can enter kernfs_deactivate() for the same node,
    the function is updated to handle multiple deactivators of the same
    node: only one actually deactivates but all wait until draining
    completes.
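
    The ordering described above (mark everything first, then repeatedly
    deactivate and unlink the leftmost descendant) can be illustrated
    with a single-threaded toy tree.  This is a minimal sketch, not the
    code in dir.c: struct toy_node and the toy_* helpers are hypothetical,
    locking is omitted, and deactivation is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a node in the hierarchy; not struct kernfs_node. */
struct toy_node {
    struct toy_node *parent;
    struct toy_node *first_child;
    struct toy_node *next_sibling;
    bool removed;       /* models the REMOVED flag */
    bool deactivated;   /* models "deactivated and drained" */
    bool linked;        /* still reachable from its parent */
};

/* Link child at the head of parent's child list. */
static void toy_add_child(struct toy_node *parent, struct toy_node *child)
{
    child->parent = parent;
    child->next_sibling = parent->first_child;
    parent->first_child = child;
    child->linked = true;
}

/* Pass 1: mark the node and every descendant as removed. */
static void toy_mark_removed(struct toy_node *n)
{
    n->removed = true;
    for (struct toy_node *c = n->first_child; c; c = c->next_sibling)
        toy_mark_removed(c);
}

/* Follow first children down until a leaf is reached. */
static struct toy_node *toy_leftmost_descendant(struct toy_node *n)
{
    while (n->first_child)
        n = n->first_child;
    return n;
}

/* Detach a node from its parent's child list, if it has a parent. */
static void toy_unlink(struct toy_node *n)
{
    if (n->parent) {
        struct toy_node **pp = &n->parent->first_child;

        while (*pp != n)
            pp = &(*pp)->next_sibling;
        *pp = n->next_sibling;
    }
    n->linked = false;
}

/* Pass 2: deactivate, then unlink, leaves first; returns nodes removed. */
static int toy_remove(struct toy_node *root)
{
    struct toy_node *pos;
    int count = 0;

    toy_mark_removed(root);
    do {
        pos = toy_leftmost_descendant(root);
        pos->deactivated = true;    /* draining would happen here */
        toy_unlink(pos);            /* unlink only after deactivation */
        count++;
    } while (pos != root);
    return count;
}
```

    Because the leftmost descendant is always a leaf, every node is
    deactivated before it is unlinked and a parent is processed only
    after all of its children are gone.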
    
    The restructured removal path guarantees that a removed node gets
    unlinked only after the node is deactivated and drained.  Combined
    with proper multiple deactivator handling, this guarantees that any
    invocation of kernfs_remove() returns only after the node itself and
    all its descendants are deactivated, drained and removed.
    
    v2: Draining moved into a separate loop (used to be in the same
        loop as unlink) and done from __kernfs_deactivate().  This is to
        allow exposing deactivation as a separate interface later.
    
        Root node removal was broken in v1 patch.  Fixed.
    
    v3: Revert most of v2 except for root node removal fix and
        simplification of KERNFS_REMOVED setting loop.
    
    v4: Refreshed on top of ("kernfs: make kernfs_deactivate() honor
        KERNFS_LOCKDEP flag").
    
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>