• Dave Chinner's avatar
    xfs: Enforce attr3 buffer recovery order · d8f4c2d0
    Dave Chinner authored
    From the department of "WTAF? How did we miss that!?"...
    
    When we are recovering a buffer, the first thing we do is check the
    buffer magic number and extract the LSN from the buffer. If the LSN
    is older than the current LSN, we replay the modification to it. If
    the metadata on disk is newer than the transaction in the log, we
    skip it. This is a fundamental v5 filesystem metadata recovery
    behaviour.
    
    generic/482 failed with an attribute writeback failure during log
    recovery. The write verifier caught the corruption before it got
    written to disk, and the attr buffer dump looked like:
    
    XFS (dm-3): Metadata corruption detected at xfs_attr3_leaf_verify+0x275/0x2e0, xfs_attr3_leaf block 0x19be8
    XFS (dm-3): Unmount and run xfs_repair
    XFS (dm-3): First 128 bytes of corrupted metadata buffer:
    00000000: 00 00 00 00 00 00 00 00 3b ee 00 00 4d 2a 01 e1  ........;...M*..
    00000010: 00 00 00 00 00 01 9b e8 00 00 00 01 00 00 05 38  ...............8
                                      ^^^^^^^^^^^^^^^^^^^^^^^
    00000020: df 39 5e 51 58 ac 44 b6 8d c5 e7 10 44 09 bc 17  .9^QX.D.....D...
    00000030: 00 00 00 00 00 02 00 83 00 03 00 cc 0f 24 01 00  .............$..
    00000040: 00 68 0e bc 0f c8 00 10 00 00 00 00 00 00 00 00  .h..............
    00000050: 00 00 3c 31 0f 24 01 00 00 00 3c 32 0f 88 01 00  ..<1.$....<2....
    00000060: 00 00 3c 33 0f d8 01 00 00 00 00 00 00 00 00 00  ..<3............
    00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    .....
    
    The highlighted bytes are the LSN that was replayed into the
    buffer: 0x100000538. This is cycle 1, block 0x538. Prior to replay,
    that block on disk looks like this:
    
    $ sudo xfs_db -c "fsb 0x417d" -c "type attr3" -c p /dev/mapper/thin-vol
    hdr.info.hdr.forw = 0
    hdr.info.hdr.back = 0
    hdr.info.hdr.magic = 0x3bee
    hdr.info.crc = 0xb5af0bc6 (correct)
    hdr.info.bno = 105448
    hdr.info.lsn = 0x100000900
                   ^^^^^^^^^^^
    hdr.info.uuid = df395e51-58ac-44b6-8dc5-e7104409bc17
    hdr.info.owner = 131203
    hdr.count = 2
    hdr.usedbytes = 120
    hdr.firstused = 3796
    hdr.holes = 1
    hdr.freemap[0-2] = [base,size]
    
    Note the LSN stamped into the buffer on disk: 1/0x900. The version
    on disk is much newer than the log transaction that was being
    replayed. That's a bug, and should -never- happen.
    
    So I immediately went to look at xlog_recover_get_buf_lsn() to check
    that we handled the LSN correctly. I was wondering if there was a
    similar "two commits with the same start LSN skips the second
    replay" problem with buffers. I didn't get that far, because I found
    a much more basic, rudimentary bug: xlog_recover_get_buf_lsn()
    doesn't recognise buffers with XFS_ATTR3_LEAF_MAGIC set in them!!!
    
    IOWs, attr3 leaf buffers fall through the magic number checks
    unrecognised, so trigger the "recover immediately" behaviour instead
    of undergoing an LSN check. IOWs, we incorrectly replay ATTR3 leaf
    buffers and that causes silent on disk corruption of inode attribute
    forks and potentially other things....
    
    Git history shows this is *another* zero day bug, this time
    introduced in commit 50d5c8d8
    
     ("xfs: check LSN ordering for v5
    superblocks during recovery") which failed to handle the attr3 leaf
    buffers in recovery. And we've failed to handle them ever since...
    Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
    Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
    Reviewed-by: default avatarDarrick J. Wong <djwong@kernel.org>
    Signed-off-by: default avatarDarrick J. Wong <djwong@kernel.org>
    d8f4c2d0
xfs_buf_item_recover.c 28 KB