    x86/mce: Prevent duplicate error records · c3629dd7
    Borislav Petkov (AMD) authored
    
    
    A legitimate use case of the MCA infrastructure is to have the firmware
    log all uncorrectable errors while the OS sees all correctable errors.
    
    The uncorrectable UCNA errors are usually configured to be reported
    through an SMI. CMCI, which is the correctable error reporting
    interrupt, uses SMI too, and having both enabled leads to unnecessary
    overhead.
    
    As a result, people in the wild disable CMCI and leave only the UCNA
    SMI enabled.
    
    When CMCI is disabled, the MCA infrastructure resorts to polling the MCA
    banks. If an MCA MSR is shared between logical threads, a single error
    ends up being logged multiple times because the polling runs on every
    logical thread.
    
    Therefore, introduce locking on the Intel side of the polling routine to
    prevent such duplicate error records from appearing.
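
The effect of such locking can be illustrated with a small userspace sketch. This is not the kernel change itself: `struct shared_bank`, `poll_bank()` and `run_pollers()` are hypothetical names, a pthread mutex stands in for whatever lock the kernel uses, and a plain variable stands in for the shared MCA status MSR. The only detail taken from the architecture is that bit 63 of a bank status register marks a valid error.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

/* Bit 63 of an MCA bank status register flags a valid, latched error. */
#define BANK_STATUS_VALID (1ULL << 63)
#define MAX_POLLERS 64

/* Hypothetical stand-in for one bank shared by several logical threads. */
struct shared_bank {
    uint64_t status;         /* simulated status register */
    pthread_mutex_t lock;    /* serializes the poll path */
    int records_logged;      /* how many error records were emitted */
};

/* Check, log and clear under the lock so only one poller sees the error. */
static void poll_bank(struct shared_bank *bank)
{
    pthread_mutex_lock(&bank->lock);
    if (bank->status & BANK_STATUS_VALID) {
        bank->records_logged++;   /* "log" the error record */
        bank->status = 0;         /* clear it before anyone else polls */
    }
    pthread_mutex_unlock(&bank->lock);
}

static void *poller(void *arg)
{
    poll_bank(arg);
    return NULL;
}

/* Inject one error, let nthreads pollers race, return the record count. */
int run_pollers(int nthreads)
{
    struct shared_bank bank = { .status = BANK_STATUS_VALID | 0x1 };
    pthread_t tids[MAX_POLLERS];

    assert(nthreads > 0 && nthreads <= MAX_POLLERS);
    pthread_mutex_init(&bank.lock, NULL);

    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, poller, &bank);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&bank.lock);
    return bank.records_logged;
}
```

Because the check, the logging and the clearing of the status all happen while the lock is held, `run_pollers(8)` yields one record for the one injected error. Without the lock, several pollers could read the valid bit before any of them clears it, which is the duplication described above.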
    
    Based on a patch by Aristeu Rozanski <aris@ruivo.org>.
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Tested-by: Tony Luck <tony.luck@intel.com>
    Acked-by: Aristeu Rozanski <aris@ruivo.org>
    Link: https://lore.kernel.org/r/20230515143225.GC4090740@cathedrallabs.org