    [PATCH] New machine check handler for x86-64 · 047379fb
    Andi Kleen authored
    This adds a new, completely rewritten machine check handler for x86-64.
    The old one never worked on 2.6.
    
    The new handler has many improvements. It now closely follows the Intel and AMD
    recommendations on MCE handlers (the old one had many violations). It handles
    unrecoverable errors in user space better: where possible, it kills only the
    affected process instead of panicking.
    
    This one is CPU independent now - it should work on any CPU that supports the standard
    x86 MCA architecture.
    
    The new handler logs only fatal errors that lead to a kernel panic to the console.
    Non-fatal errors are logged race-free into a new (non-ring) buffer and
    supplied to the user through a new character device. The old handler could
    deadlock on console and printk locks. This also separates machine check errors
    from real kernel errors better. The new buffer has also been designed to
    be easily accessible from external debugging tools: it has a signature
    and could even be recovered after a reboot. It is not organized as a ring buffer,
    which means the first errors are kept unless explicitly cleared.
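
The "keep the first errors" behaviour described above can be illustrated with a minimal single-threaded sketch. All names, the signature string and the buffer size here are hypothetical and are not taken from mce.c; the sketch also makes no attempt at the race-free logging the real handler needs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical size and signature, chosen only for illustration. */
#define SKETCH_LOG_LEN 4
#define SKETCH_SIGNATURE "MCELOGSK" /* exactly 8 bytes, stored without NUL */

struct sketch_entry {
    unsigned long long status; /* raw error status word */
    unsigned int bank;         /* reporting bank */
    unsigned int cpu;          /* CPU that saw the error */
};

struct sketch_log {
    char signature[8];    /* lets an external tool find the buffer in memory */
    unsigned int next;    /* index of the next free slot */
    unsigned int dropped; /* entries rejected because the buffer was full */
    struct sketch_entry entry[SKETCH_LOG_LEN];
};

void sketch_log_init(struct sketch_log *log)
{
    assert(log != NULL);
    memset(log, 0, sizeof(*log));
    memcpy(log->signature, SKETCH_SIGNATURE, sizeof(log->signature));
}

/*
 * Append an entry. Unlike a ring buffer, a full log never overwrites old
 * entries: the new entry is dropped, so the first errors survive.
 * Returns 1 if the entry was stored, 0 if it was dropped.
 */
int sketch_log_add(struct sketch_log *log, const struct sketch_entry *e)
{
    assert(log != NULL && e != NULL);
    if (log->next >= SKETCH_LOG_LEN) {
        log->dropped++;
        return 0;
    }
    log->entry[log->next++] = *e;
    return 1;
}

/* Explicit clear: the only way to make room for new entries. */
void sketch_log_clear(struct sketch_log *log)
{
    assert(log != NULL);
    log->next = 0;
    log->dropped = 0;
}
```

Dropping new entries rather than old ones fits the stated goal: the earliest error is usually the one closest to the root cause, and later errors are often just its consequences.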
    
    The new error format can be parsed using ftp://ftp.suse.com/pub/people/ak/x86-64/mcelog.c
    The new character device can be created with: mknod /dev/mcelog c 10 227
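
For setups that create the node from a program rather than the shell, the same device can be created with mknod(2). This is a sketch assuming Linux with glibc (for `makedev` in `<sys/sysmacros.h>`); the helper names and the 0600 mode are assumptions, and only the major/minor numbers 10 and 227 come from the text above.

```c
#define _DEFAULT_SOURCE /* exposes mknod() and S_IFCHR under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Major/minor numbers as given in the mknod command above. */
#define SKETCH_MCELOG_MAJOR 10
#define SKETCH_MCELOG_MINOR 227

/* Build the device number for the mcelog character device. */
dev_t sketch_mcelog_dev(void)
{
    return makedev(SKETCH_MCELOG_MAJOR, SKETCH_MCELOG_MINOR);
}

/*
 * Create the character device node at `path` (e.g. "/dev/mcelog").
 * Needs root privileges. Returns 0 on success, -1 with errno set on failure.
 * The 0600 mode is an assumption; pick permissions to suit the system.
 */
int sketch_create_mcelog_node(const char *path)
{
    assert(path != NULL);
    return mknod(path, S_IFCHR | 0600, sketch_mcelog_dev());
}
```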
    
    There is a new sysfs interface to configure the machine check handler.
    It has a "tolerant" parameter that sets how readily the handler panics on a machine check:
    
    0: always panic
    1: panic if deadlock possible (e.g. MCE happened in the kernel)
    2: try to avoid panic
    
    Default is 2
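
The three levels can be read as a small decision function. The sketch below is a simplified model, not the logic in mce.c: the function name and the `in_kernel` flag are hypothetical, and level 2 is modelled as "never panic", although the real handler presumably still panics when it has no alternative.

```c
#include <assert.h>
#include <stdbool.h>

/* The levels listed above; the enumerator names are hypothetical. */
enum sketch_tolerant {
    SKETCH_ALWAYS_PANIC = 0,
    SKETCH_PANIC_IF_DEADLOCK_POSSIBLE = 1,
    SKETCH_AVOID_PANIC = 2 /* the stated default */
};

/*
 * Decide whether an unrecoverable machine check should panic the kernel.
 * `in_kernel` stands in for "a deadlock is possible", following the
 * example given for level 1 (the MCE happened in the kernel).
 */
bool sketch_should_panic(int tolerant, bool in_kernel)
{
    assert(tolerant >= 0);
    switch (tolerant) {
    case SKETCH_ALWAYS_PANIC:
        return true;
    case SKETCH_PANIC_IF_DEADLOCK_POSSIBLE:
        return in_kernel;
    default:
        /* Level 2 and above: simplified here to "do not panic". */
        return false;
    }
}
```

With the default level 2, an error hit while running user code falls through to the non-panic path, which matches the earlier description of killing only the affected process where possible.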
    
    Despite having more features, the new handler is shorter.
mce.c 10.6 KB