• Brian Foster's avatar
    xfs: detect agfl count corruption and reset agfl · 2c7384c8
    Brian Foster authored
    BugLink: https://bugs.launchpad.net/bugs/1776177
    
    commit a27ba260 upstream.
    
    The struct xfs_agfl v5 header was originally introduced with
    unexpected padding that caused the AGFL to operate with one less
    slot than intended. The header has since been packed, but the fix
    left an incompatibility for users who upgrade from an old kernel
    with the unpacked header to a newer kernel with the packed header
    while the AGFL happens to wrap around the end. The newer kernel
    recognizes one extra slot at the physical end of the AGFL that the
    previous kernel did not. The new kernel will eventually attempt to
    allocate a block from that slot, which contains invalid data, and
    cause a crash.
    
    This condition can be detected by comparing the active range of the
    AGFL to the count. While this detects a padding mismatch, it can
    also trigger false positives for unrelated flcount corruption. Since
    we cannot distinguish a size mismatch due to padding from unrelated
    corruption, we can't trust the AGFL enough to simply repopulate the
    empty slot.
    
    Instead, avoid unnecessarily complex detection logic and and use a
    solution that can handle any form of flcount corruption that slips
    through read verifiers: distrust the entire AGFL and reset it to an
    empty state. Any valid blocks within the AGFL are intentionally
    leaked. This requires xfs_repair to rectify (which was already
    necessary based on the state the AGFL was found in). The reset
    mitigates the side effect of the padding mismatch problem from a
    filesystem crash to a free space accounting inconsistency. The
    generic approach also means that this patch can be safely backported
    to kernels with or without a packed struct xfs_agfl.
    
    Check the AGF for an invalid freelist count on initial read from
    disk. If detected, set a flag on the xfs_perag to indicate that a
    reset is required before the AGFL can be used. In the first
    transaction that attempts to use a flagged AGFL, reset it to empty,
    warn the user about the inconsistency and allow the freelist fixup
    code to repopulate the AGFL with new blocks. The xfs_perag flag is
    cleared to eliminate the need for repeated checks on each block
    allocation operation.
    
    This allows kernels that include the packing fix commit 96f859d5
    ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct")
    to handle older unpacked AGFL formats without a filesystem crash.
    Suggested-by: default avatarDave Chinner <david@fromorbit.com>
    Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
    Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
    Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com>
    Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: default avatarDave Chiluk <chiluk+linuxxfs@indeed.com>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: default avatarJuerg Haefliger <juergh@canonical.com>
    Signed-off-by: default avatarKhalid Elmously <khalid.elmously@canonical.com>
    2c7384c8
xfs_trace.h 66.8 KB