• Thomas Gleixner's avatar
    Revert "x86/apic: Ignore secondary threads if nosmt=force" · 506a66f3
    Thomas Gleixner authored
    Dave Hansen reported, that it's outright dangerous to keep SMT siblings
    disabled completely so they are stuck in the BIOS and wait for SIPI.
    
    The reason is that Machine Check Exceptions are broadcasted to siblings and
    the soft disabled sibling has CR4.MCE = 0. If a MCE is delivered to a
    logical core with CR4.MCE = 0, it asserts IERR#, which shuts down or
    reboots the machine. The MCE chapter in the SDM contains the following
    blurb:
    
        Because the logical processors within a physical package are tightly
        coupled with respect to shared hardware resources, both logical
        processors are notified of machine check errors that occur within a
        given physical processor. If machine-check exceptions are enabled when
        a fatal error is reported, all the logical processors within a physical
        package are dispatched to the machine-check exception handler. If
        machine-check exceptions are disabled, the logical processors enter the
        shutdown state and assert the IERR# signal. When enabling machine-check
        exceptions, the MCE flag in control register CR4 should be set for each
        logical processor.
    
    Reverting the commit which ignores siblings at enumeration time solves only
    half of the problem. The core cpuhotplug logic needs to be adjusted as
    well.
    
    This thoughtful engineered mechanism also turns the boot process on all
    Intel HT enabled systems into a MCE lottery. MCE is enabled on the boot CPU
    before the secondary CPUs are brought up. Depending on the number of
    physical cores the window in which this situation can happen is smaller or
    larger. On a HSW-EX it's about 750ms:
    
    MCE is enabled on the boot CPU:
    
    [    0.244017] mce: CPU supports 22 MCE banks
    
    The corresponding sibling #72 boots:
    
    [    1.008005] .... node  #0, CPUs:    #72
    
    That means if an MCE hits on physical core 0 (logical CPUs 0 and 72)
    between these two points the machine is going to shutdown. At least it's a
    known safe state.
    
    It's obvious that the early boot can be hit by an MCE as well and then runs
    into the same situation because MCEs are not yet enabled on the boot CPU.
    But after enabling them on the boot CPU, it does not make any sense to
    prevent the kernel from recovering.
    
    Adjust the nosmt kernel parameter documentation as well.
    
    Reverts: 2207def7 ("x86/apic: Ignore secondary threads if nosmt=force")
    Reported-by: default avatarDave Hansen <dave.hansen@intel.com>
    Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
    Tested-by: default avatarTony Luck <tony.luck@intel.com>
    506a66f3
boot.c 42.8 KB