• Dave Chinner's avatar
    xfs: abort metadata writeback on permanent errors · ac8809f9
    Dave Chinner authored
    If we are doing aysnc writeback of metadata, we can get write errors
    but have nobody to report them to. At the moment, we simply attempt
    to reissue the write from io completion in the hope that it's a
    transient error.
    
    When it's not a transient error, the buffer is stuck forever in
    this loop, and we cannot break out of it. Eventually, unmount will
    hang because the AIL cannot be emptied and everything goes downhill
    from them.
    
    To solve this problem, only retry the write IO once before aborting
    it. We don't throw the buffer away because some transient errors can
    last minutes (e.g.  FC path failover) or even hours (thin
    provisioned devices that have run out of backing space) before they
    go away. Hence we really want to keep trying until we can't try any
    more.
    
    Because the buffer was not cleaned, however, it does not get removed
    from the AIL and hence the next pass across the AIL will start IO on
    it again. As such, we still get the "retry forever" semantics that
    we currently have, but we allow other access to the buffer in the
    mean time. Meanwhile the filesystem can continue to modify the
    buffer and relog it, so the IO errors won't hang the log or the
    filesystem.
    
    Now when we are pushing the AIL, we can see all these "permanent IO
    error" buffers and we can issue a warning about failures before we
    retry the IO. We can also catch these buffers when unmounting an
    issue a corruption warning, too.
    Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
    Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
    Signed-off-by: default avatarBen Myers <bpm@sgi.com>
    ac8809f9
xfs_buf.h 11.9 KB