    xfs: wake commit waiters on CIL abort before log item abort · 545aa41f
    Brian Foster authored
    XFS shutdown deadlocks have been reproduced by fstest generic/475.
    The deadlock signature involves log I/O completion running error
    handling to abort logged items and waiting for an inode cluster
    buffer lock in the buffer item unpin handler. The buffer lock is
    held by xfsaild attempting to flush an inode. The buffer happens to
    be pinned and so xfs_iflush() triggers an async log force to begin
    work required to get it unpinned. The log force is blocked waiting
    on the commit completion, which never occurs and thus leaves the
    filesystem deadlocked.
    
    The root problem is that aborted log I/O completion puts commit
    completion behind callback completion, which is unexpected for async
    log forces. Under normal running conditions, an async log force
    returns to the caller once the CIL ctx has been formatted/submitted
    and the commit completion event triggered at the tail end of
    xlog_cil_push(). If the filesystem has shut down, however, we rely on
    xlog_cil_committed() to trigger the completion event and it happens
    to do so after running log item unpin callbacks. This makes it
    unsafe to invoke an async log force from contexts that hold locks
    that might also be required in log completion processing.
    
    To address this problem, wake commit completion waiters before
    aborting log items in the log I/O completion handler. This ensures
    that an async log force will not deadlock on held locks if the
    filesystem happens to shut down. Note that it is still unsafe to
    issue a sync log force while holding such locks because a sync log
    force explicitly waits on the force completion, which occurs after
    log I/O completion processing.
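
The ordering argument above can be checked with a small userspace model. This is only a sketch under stated assumptions, not the kernel change itself: every name in it (`struct state`, `step_flusher`, `step_completion`, `deadlocks`) is hypothetical, the buffer lock and the commit wakeup are reduced to boolean flags, and the flusher is assumed to drop the buffer lock as soon as its async log force returns.

```c
#include <stdbool.h>

/* Hypothetical model; none of these names are kernel APIs. */
enum order { WAKE_AFTER_ABORT, WAKE_BEFORE_ABORT };

struct state {
    bool buf_locked;    /* inode cluster buffer lock, held by the flusher */
    bool commit_woken;  /* commit completion waiters have been woken */
    bool flusher_done;  /* flusher returned from its async log force */
    bool items_aborted; /* log items aborted (the unpin path needs the lock) */
};

/* Flusher (models xfsaild): blocked until the commit wakeup arrives. */
static bool step_flusher(struct state *s)
{
    if (s->flusher_done || !s->commit_woken)
        return false;          /* finished, or still waiting on the commit */
    s->buf_locked = false;     /* assumption: lock is dropped after the force */
    s->flusher_done = true;
    return true;
}

/* Aborted log I/O completion: performs two actions in the chosen order. */
static bool step_completion(struct state *s, enum order o)
{
    bool may_wake = (o == WAKE_BEFORE_ABORT) || s->items_aborted;
    bool may_abort = (o == WAKE_AFTER_ABORT) || s->commit_woken;

    if (!s->commit_woken && may_wake) {
        s->commit_woken = true;
        return true;
    }
    /* Aborting items requires the buffer lock to be free. */
    if (!s->items_aborted && may_abort && !s->buf_locked) {
        s->items_aborted = true;
        return true;
    }
    return false;
}

/* Returns true if neither actor can make progress before both finish. */
static bool deadlocks(enum order o)
{
    struct state s = { .buf_locked = true };

    for (;;) {
        bool a = step_flusher(&s);
        bool b = step_completion(&s, o);

        if (s.flusher_done && s.commit_woken && s.items_aborted)
            return false;      /* everything completed */
        if (!a && !b)
            return true;       /* no progress possible: deadlock */
    }
}
```

In this model, `deadlocks(WAKE_AFTER_ABORT)` reports a stall because the abort step waits for the buffer lock while the lock holder waits for the wakeup, whereas `deadlocks(WAKE_BEFORE_ABORT)` runs to completion.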

    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
xfs_log_cil.c 36.7 KB