    powernv/kdump: Fix cases where the kdump kernel can get HMI's · 4145f358
    Balbir Singh authored
    Certain HMIs, such as malfunction errors, propagate to
    all threads/cores on the system. If a thread was offline
    before we crashed the system and jumped to the kdump
    kernel, bad things happen when it wakes up due to an HMI
    in the kdump kernel.
    
    There are several possible ways to solve this problem:
    
    1. Put the offline cores in a state such that they are
    not woken up for machine check and HMI errors. This
    does not work, since we might need to wake up offline
    threads to handle TB (timebase) errors.
    2. Ignore HMI errors by setting up HMEER to mask them.
    This still leaves the window open for any MCEs, and
    masking them for the duration of the dump might be a
    concern.
    3. Wake up offline CPUs, as in send them to
    crash_ipi_callback() (not wake them up as in mark them
    online as seen by hotplug). kexec does a
    wake_offline_cpus() call; this patch does something
    similar, but instead sends an IPI and forces them into
    crash_ipi_callback().
    
    This patch takes approach #3.
    
    Care is taken to enable this only for powernv platforms
    via crash_wake_offline (a global value set at setup
    time). The crash code sends out IPIs to all CPUs,
    which then move to crash_ipi_callback() and kexec_smp_wait().
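
The selection logic described above can be sketched as a small userspace model. This is not the kernel code from this patch: only the crash_wake_offline flag and the crash_ipi_callback name come from the description, while the `struct model` type, the `MODEL_NR_CPUS` constant, the `cpu_state` enum and the `model_*` functions are hypothetical names invented for illustration. It assumes the flag is the only thing that decides whether offline CPUs also receive the crash IPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 8 /* hypothetical CPU count for the model */

/* Hypothetical per-CPU states; the kernel tracks this differently. */
enum cpu_state { CPU_OFFLINE, CPU_ONLINE, CPU_IN_CRASH_CALLBACK };

struct model {
	enum cpu_state state[MODEL_NR_CPUS];
	bool crash_wake_offline; /* assumed to be set at setup time on powernv only */
};

/* Stand-in for crash_ipi_callback(): the CPU is parked for the kdump kernel. */
static void model_crash_ipi_callback(struct model *m, size_t cpu)
{
	m->state[cpu] = CPU_IN_CRASH_CALLBACK;
}

/*
 * Send the crash "IPI" to every CPU except the crashing one.
 * Offline CPUs are skipped unless crash_wake_offline is set.
 * Returns the number of CPUs that were sent to the callback.
 */
static size_t model_send_crash_ipis(struct model *m, size_t crashing_cpu)
{
	size_t sent = 0;

	assert(crashing_cpu < MODEL_NR_CPUS);

	for (size_t cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (cpu == crashing_cpu)
			continue;
		if (m->state[cpu] == CPU_OFFLINE && !m->crash_wake_offline)
			continue;
		model_crash_ipi_callback(m, cpu);
		sent++;
	}
	return sent;
}
```

In this model, with the flag set every other CPU ends up in the callback state, so no offline thread is left to be woken later by an HMI. With the flag clear, offline CPUs are left untouched, which corresponds to the behaviour on platforms other than powernv.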

    Signed-off-by: Balbir Singh <bsingharora@gmail.com>
    Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
crash.c 8.79 KB