• Dave Chinner's avatar
    xfs: remote attribute headers contain an invalid LSN · e3c32ee9
    Dave Chinner authored
    In recent testing, a system that crashed failed log recovery on
    restart with a bad symlink buffer magic number:
    
    XFS (vda): Starting recovery (logdev: internal)
    XFS (vda): Bad symlink block magic!
    XFS: Assertion failed: 0, file: fs/xfs/xfs_log_recover.c, line: 2060
    
    On examination of the log via xfs_logprint, none of the symlink
    buffers in the log had a bad magic number, nor were any other types
    of buffer log format headers mis-identified as symlink buffers.
    Tracing was used to find the buffer the kernel was tripping over,
    and xfs_db identified it's contents as:
    
    000: 5841524d 00000000 00000346 64d82b48 8983e692 d71e4680 a5f49e2c b317576e
    020: 00000000 00602038 00000000 006034ce d0020000 00000000 4d4d4d4d 4d4d4d4d
    040: 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d
    060: 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d
    .....
    
    This is a remote attribute buffer, which are notable in that they
    are not logged but are instead written synchronously by the remote
    attribute code so that they exist on disk before the attribute
    transactions are committed to the journal.
    
    The above remote attribute block has an invalid LSN in it - cycle
    0xd002000, block 0 - which means when log recovery comes along to
    determine if the transaction that writes to the underlying block
    should be replayed, it sees a block that has a future LSN and so
    does not replay the buffer data in the transaction. Instead, it
    validates the buffer magic number and attaches the buffer verifier
    to it.  It is this buffer magic number check that is failing in the
    above assert, indicating that we skipped replay due to the LSN of
    the underlying buffer.
    
    The problem here is that the remote attribute buffers cannot have a
    valid LSN placed into them, because the transaction that contains 
    the attribute tree pointer changes and the block allocation that the
    attribute data is being written to hasn't yet been committed. Hence
    the LSN field in the attribute block is completely unwritten,
    thereby leaving the underlying contents of the block in the LSN
    field. It could have any value, and hence a future overwrite of the
    block by log recovery may or may not work correctly.
    
    Fix this by always writing an invalid LSN to the remote attribute
    block, as any buffer in log recovery that needs to write over the
    remote attribute should occur. We are protected from having old data
    written over the attribute by the fact that freeing the block before
    the remote attribute is written will result in the buffer being
    marked stale in the log and so all changes prior to the buffer stale
    transaction will be cancelled by log recovery.
    
    Hence it is safe to ignore the LSN in the case or synchronously
    written, unlogged metadata such as remote attribute blocks, and to
    ensure we do that correctly, we need to write an invalid LSN to all
    remote attribute blocks to trigger immediate recovery of metadata
    that is written over the top.
    
    As a further protection for filesystems that may already have remote
    attribute blocks with bad LSNs on disk, change the log recovery code
    to always trigger immediate recovery of metadata over remote
    attribute blocks.
    
    cc: <stable@vger.kernel.org>
    Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
    Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
    Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
    e3c32ee9
xfs_log_recover.c 128 KB