Commit 680b6cfd authored by Hidetoshi Seto, committed by Ingo Molnar

x86, mce: CE in last bank prevents panic by unknown MCE

If the MCE handler is called but none of the mces_seen entries
holds a machine check event that could have signaled the MCE
(i.e. an event more severe than MCE_KEEP_SEVERITY), the kernel
panics with "Machine check from unknown source", because the MCE
is assumed to have been signaled by an external agent.

Usually mces_seen never points to an MCE_KEEP_SEVERITY event such
as a CE (corrected error). But it can happen, because the initial
value of mces_seen is accidentally modified by mce_no_way_out():
if mce_no_way_out() runs through all banks and the last bank
holds a CE, mces_seen ends up pointing to that CE and the
"unknown source" panic is not taken.
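
The ordering problem described above can be reproduced in a small user-space model. This is only a sketch under stated assumptions: the names `sim_mce`, `sim_grade`, `sim_no_way_out`, `seen_old_order` and `seen_new_order`, the three-bank layout and the toy status values are all hypothetical and not kernel code. The one behavior it assumes from the kernel is that the bank scan uses the caller's record as scratch space and leaves the last scanned bank's status in it.

```c
/* Toy severity levels (hypothetical; the kernel has more of them). */
enum sim_severity { SIM_NO = 0, SIM_KEEP = 1, SIM_PANIC = 2 };

struct sim_mce {
	unsigned status;	/* 0: nothing logged, 1: corrected error, 2: fatal */
};

#define SIM_NBANKS 3

/* Simulated banks: only the last one has logged a corrected error. */
static const unsigned sim_bank_status[SIM_NBANKS] = { 0, 0, 1 };

static enum sim_severity sim_grade(const struct sim_mce *m)
{
	if (m->status == 0)
		return SIM_NO;
	return m->status == 1 ? SIM_KEEP : SIM_PANIC;
}

/*
 * Scan all banks for a fatal event, using *m as scratch space.
 * If nothing fatal is found, *m is left holding the last bank's status.
 */
static int sim_no_way_out(struct sim_mce *m)
{
	for (int i = 0; i < SIM_NBANKS; i++) {
		m->status = sim_bank_status[i];
		if (sim_grade(m) >= SIM_PANIC)
			return 1;
	}
	return 0;
}

/* Old order: scan first, then record. The record inherits the CE. */
static struct sim_mce seen_old_order(void)
{
	struct sim_mce m = { 0 }, seen;

	sim_no_way_out(&m);
	seen = m;
	return seen;
}

/* New order: record the clean initial value, then scan. */
static struct sim_mce seen_new_order(void)
{
	struct sim_mce m = { 0 }, seen;

	seen = m;
	sim_no_way_out(&m);
	return seen;
}
```

In this model `seen_old_order()` returns a record whose status is the corrected error from the last bank, while `seen_new_order()` returns the untouched initial record, which is the state the "unknown source" check expects when no CPU has logged a signaling event.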

This patch fixes this undesired behavior and clarifies the logic.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Dongming <jin.dongming@np.css.fujitsu.com>
LKML-Reference: <4A94E244.3020301@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
parent bf783f9f
@@ -612,7 +612,7 @@ static int mce_timed_out(u64 *t)
  * This way we prevent any potential data corruption in a unrecoverable case
  * and also makes sure always all CPU's errors are examined.
  *
- * Also this detects the case of an machine check event coming from outer
+ * Also this detects the case of a machine check event coming from outer
  * space (not detected by any CPUs) In this case some external agent wants
  * us to shut down, so panic too.
  *
@@ -665,7 +665,7 @@ static void mce_reign(void)
 	 * No machine check event found. Must be some external
 	 * source or one CPU is hung. Panic.
 	 */
-	if (!m && tolerant < 3)
+	if (global_worst <= MCE_KEEP_SEVERITY && tolerant < 3)
 		mce_panic("Machine check from unknown source", NULL, NULL);
 
 	/*
@@ -889,11 +889,11 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	mce_setup(&m);
 
 	m.mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
-	no_way_out = mce_no_way_out(&m, &msg);
-
 	final = &__get_cpu_var(mces_seen);
 	*final = m;
 
+	no_way_out = mce_no_way_out(&m, &msg);
+
 	barrier();
 
 	/*