• Dave Chinner's avatar
    xfs: ensure verifiers are attached to recovered buffers · 13c22aa3
    Dave Chinner authored
    commit 67dc288c upstream.
    
    Crash testing of CRC enabled filesystems has resulted in a number of
    reports of bad CRCs being detected after the filesystem was mounted.
    Errors such as the following were being seen:
    
    XFS (sdb3): Mounting V5 Filesystem
    XFS (sdb3): Starting recovery (logdev: internal)
    XFS (sdb3): Metadata CRC error detected at xfs_agf_read_verify+0x5a/0x100 [xfs], block 0x1
    XFS (sdb3): Unmount and run xfs_repair
    XFS (sdb3): First 64 bytes of corrupted metadata buffer:
    ffff880136ffd600: 58 41 47 46 00 00 00 01 00 00 00 00 00 0f aa 40  XAGF...........@
    ffff880136ffd610: 00 02 6d 53 00 02 77 f8 00 00 00 00 00 00 00 01  ..mS..w.........
    ffff880136ffd620: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 03  ................
    ffff880136ffd630: 00 00 00 04 00 08 81 d0 00 08 81 a7 00 00 00 00  ................
    XFS (sdb3): metadata I/O error: block 0x1 ("xfs_trans_read_buf_map") error 74 numblks 1
    
    The errors were typically being seen in AGF, AGI and their related
    btree block buffers some time after log recovery had run. Often it
    wasn't until later subsequent mounts that the problem was
    discovered. The common symptom was a buffer with the correct
    contents, but a CRC and an LSN that matched an older version of the
    contents.
    
    Some debug added to _xfs_buf_ioapply() indicated that buffers were
    being written without verifiers attached to them from log recovery,
    and Jan Kara isolated the cause to log recovery readahead an dit's
    interactions with buffers that had a more recent LSN on disk than
    the transaction being recovered. In this case, the buffer did not
    get a verifier attached, and os when the second phase of log
    recovery ran and recovered EFIs and unlinked inodes, the buffers
    were modified and written without the verifier running. Hence they
    had up to date contents, but stale LSNs and CRCs.
    
    Fix it by attaching verifiers to buffers we skip due to future LSN
    values so they don't escape into the buffer cache without the
    correct verifier attached.
    
    This patch is based on analysis and a patch from Jan Kara.
    Reported-by: default avatarJan Kara <jack@suse.cz>
    Reported-by: default avatarFanael Linithien <fanael4@gmail.com>
    Reported-by: default avatarGrozdan <neutrino8@gmail.com>
    Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
    Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
    Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
    Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    13c22aa3
xfs_log_recover.c 127 KB