    powerpc/book3s: Fix machine check handling for unhandled errors · 2749a2f2
    Mahesh Salgaonkar authored
    The current code does not check for unhandled/unrecovered errors and returns
    from the interrupt whenever the exception is recoverable. This in turn triggers
    the same machine check exception in a loop, leaving the hypervisor unresponsive.
    
    This patch fixes this situation and forces the hypervisor to panic on
    unhandled/unrecovered errors.
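
    The decision described above can be sketched in C. This is only an
    illustration, not the assembly in exceptions-64s.S: the names mce_action,
    mce_decide and the "handled" flag are hypothetical, and the only value taken
    from the kernel is the MSR_RI bit position.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_RI (1ULL << 1) /* recoverable-interrupt bit of the MSR */

/* Hypothetical outcome names, used only in this sketch. */
enum mce_action { MCE_RETURN, MCE_PANIC };

/*
 * handled: assumed input, true if the early handler recovered the error.
 * msr:     the MSR value saved when the machine check was taken.
 */
static enum mce_action mce_decide(bool handled, uint64_t msr)
{
    if (!(msr & MSR_RI))
        return MCE_PANIC;  /* unrecoverable exception (MSR_RI = 0) */
    if (!handled)
        return MCE_PANIC;  /* returning would re-trigger the same MCE */
    return MCE_RETURN;     /* recovered: safe to return from interrupt */
}

/* Usage example with the MSR value from the console log in this message. */
static void mce_decide_self_check(void)
{
    uint64_t msr = 0x9000000000041002ULL; /* <SF,HV,ME,RI> */

    assert(mce_decide(true, msr) == MCE_RETURN);
    assert(mce_decide(false, msr) == MCE_PANIC);
    assert(mce_decide(true, msr & ~MSR_RI) == MCE_PANIC);
}
```

    The point of the sketch is that a set MSR_RI bit alone is no longer enough
    to return from the interrupt; the error must also have been handled.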
    
    This patch also fixes another issue where the unrecoverable_exception routine
    was called in real mode for an unrecoverable exception (MSR_RI = 0).
    That caused a further exception at vector 0x300 (data access) during the
    system crash, which made the cause of the crash confusing to debug.
    
    Also turn the ME bit off while going down, so that if another MCE is hit on
    the panic path, the system will checkstop and the hypervisor will be restarted
    cleanly by the SP.
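
    As a rough illustration of what turning the ME bit off means for the MSR
    value, the following C sketch clears the bit in the MSR shown in the console
    log of this message. The helper name msr_without_me is hypothetical; the real
    change is made in assembly on the panic path.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_ME (1ULL << 12) /* machine-check-enable bit of the MSR */

/* Hypothetical helper: return the MSR value with machine checks disabled. */
static uint64_t msr_without_me(uint64_t msr)
{
    return msr & ~MSR_ME;
}

/* Usage example with the MSR value from the console log in this message. */
static void msr_me_self_check(void)
{
    uint64_t msr = 0x9000000000041002ULL; /* <SF,HV,ME,RI> */

    assert(msr & MSR_ME);                                  /* ME is set on entry */
    assert(msr_without_me(msr) == 0x9000000000040002ULL);  /* only ME is cleared */
}
```

    With ME clear, a further machine check is not delivered as an interrupt but
    results in a checkstop, which is the behaviour described above.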
    
    With the above fixes we now print the correct console messages (see below)
    when crashing the system on unhandled/unrecoverable machine checks.
    
    --------------
    Severe Machine check interrupt [[Not recovered]
      Initiator: CPU
      Error type: UE [Instruction fetch]
        Effective address: 0000000030002864
    Oops: Machine check, sig: 7 [#1]
    SMP NR_CPUS=2048 NUMA PowerNV
    Modules linked in: bork(O) bridge stp llc kvm [last unloaded: bork]
    CPU: 36 PID: 55162 Comm: bash Tainted: G           O 3.14.0mce #1
    task: c000002d72d022d0 ti: c000000007ec0000 task.ti: c000002d72de4000
    NIP: 0000000030002864 LR: 00000000300151a4 CTR: 000000003001518c
    REGS: c000000007ec3d80 TRAP: 0200   Tainted: G           O  (3.14.0mce)
    MSR: 9000000000041002 <SF,HV,ME,RI>  CR: 28222848  XER: 20000000
    CFAR: 0000000030002838 DAR: d0000000004d0000 DSISR: 00000000 SOFTE: 1
    GPR00: 000000003001512c 0000000031f92cb0 0000000030078af0 0000000030002864
    GPR04: d0000000004d0000 0000000000000000 0000000030002864 ffffffffffffffc9
    GPR08: 0000000000000024 0000000030008af0 000000000000002c c00000000150e728
    GPR12: 9000000000041002 0000000031f90000 0000000010142550 0000000040000000
    GPR16: 0000000010143cdc 0000000000000000 00000000101306fc 00000000101424dc
    GPR20: 00000000101424e0 000000001013c6f0 0000000000000000 0000000000000000
    GPR24: 0000000010143ce0 00000000100f6440 c000002d72de7e00 c000002d72860250
    GPR28: c000002d72860240 c000002d72ac0038 0000000000000008 0000000000040000
    NIP [0000000030002864] 0x30002864
    LR [00000000300151a4] 0x300151a4
    Call Trace:
    Instruction dump:
    XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
    XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
    ---[ end trace 7285f0beac1e29d3 ]---
    
    Sending IPI to other CPUs
    IPI complete
    OPAL V3 detected !
    --------------
    Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
exceptions-64s.S 47.1 KB