1. 27 Jan, 2012 1 commit
  2. 03 Jan, 2012 6 commits
    • x86/mce: Recognise machine check bank signature for data path error · 5f7b88d5
      Tony Luck authored
      Action required data path signature is defined in table 15-19 of SDM:
      
      +-----------------------------------------------------------------------------+
      | SRAR Error | Valid | OVER | UC | EN | MISCV | ADDRV | PCC | S | AR | MCACOD |
      | Data Load  |     1 |    0 |  1 |  1 |     1 |     1 |   0 | 1 |  1 |  0x134 |
      +-----------------------------------------------------------------------------+
      
      Recognise this, and pass MCE_AR_SEVERITY code back to do_machine_check() if
      we have the action handler configured (CONFIG_MEMORY_FAILURE=y).
      Acked-by: Borislav Petkov <bp@amd64.org>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
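      The signature row in the table above can be read as a mask-and-compare
      on the bank status value. The following is a minimal userspace sketch,
      not the kernel's code: the macro and function names are hypothetical,
      and the bit positions assume the usual IA32_MCi_STATUS layout (Valid in
      bit 63 down to AR in bit 55, MCACOD in the low 16 bits).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed IA32_MCi_STATUS bit positions; names are illustrative only. */
#define STATUS_VAL    (1ULL << 63)
#define STATUS_OVER   (1ULL << 62)
#define STATUS_UC     (1ULL << 61)
#define STATUS_EN     (1ULL << 60)
#define STATUS_MISCV  (1ULL << 59)
#define STATUS_ADDRV  (1ULL << 58)
#define STATUS_PCC    (1ULL << 57)
#define STATUS_S      (1ULL << 56)
#define STATUS_AR     (1ULL << 55)
#define STATUS_MCACOD 0xffffULL

/* Every column of the table takes part in the comparison. */
#define SRAR_MASK \
	(STATUS_VAL | STATUS_OVER | STATUS_UC | STATUS_EN | STATUS_MISCV | \
	 STATUS_ADDRV | STATUS_PCC | STATUS_S | STATUS_AR | STATUS_MCACOD)

/* The "Data Load" row: OVER and PCC must be 0, MCACOD must be 0x134. */
#define SRAR_DATA_LOAD \
	(STATUS_VAL | STATUS_UC | STATUS_EN | STATUS_MISCV | \
	 STATUS_ADDRV | STATUS_S | STATUS_AR | 0x0134ULL)

/* Returns true when the status value matches the data load signature. */
static bool is_srar_data_load(uint64_t status)
{
	return (status & SRAR_MASK) == SRAR_DATA_LOAD;
}
```

      Bits outside the mask (for example the model-specific error code) are
      ignored by this check, so only the columns listed in the table decide
      the match.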
    • x86/mce: Handle "action required" errors · a8c321fb
      Tony Luck authored
      All non-urgent actions (reporting low severity errors and handling
      "action-optional" errors) are now handled by a work queue. This
      means that TIF_MCE_NOTIFY can be used to block execution for a
      thread experiencing an "action-required" fault until we get all
      cpus out of the machine check handler (and the thread that hit
      the fault into mce_notify_process()).
      
      We use the new mce_{save,find,clear}_info() API to get information
      from do_machine_check() to mce_notify_process(), and then use the
      newly improved memory_failure(..., MF_ACTION_REQUIRED) to handle
      the error (possibly signalling the process).
      
      Update some comments to make the new code flows clearer.
      Signed-off-by: Tony Luck <tony.luck@intel.com>
    • x86/mce: Add mechanism to safely save information in MCE handler · af104e39
      Tony Luck authored
      Machine checks on Intel cpus interrupt execution on all cpus, regardless
      of interrupt masking.  We have a need to save some data about the cause
      of the machine check (physical address) in the machine check handler that
      can be retrieved later to attempt recovery in a more flexible execution
      state.
      Signed-off-by: Tony Luck <tony.luck@intel.com>
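      One way to save data from a context that cannot take locks is a small
      fixed table whose slots are claimed with an atomic compare-and-swap.
      The sketch below illustrates that idea in userspace C11. It is an
      assumption about the general technique, not a copy of the kernel code:
      the names, the slot count and the integer "owner" (standing in for a
      task) are all hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define INFO_SLOTS 16		/* hypothetical capacity */

struct saved_info {
	atomic_int inuse;	/* 0 = free, 1 = claimed */
	int owner;		/* stand-in for the faulting task */
	uint64_t paddr;		/* physical address of the error */
};

/* Static storage starts zeroed, so every slot begins free. */
static struct saved_info slots[INFO_SLOTS];

/* Claim a free slot without locking. Returns 0 on success, -1 if full. */
static int info_save(int owner, uint64_t paddr)
{
	for (size_t i = 0; i < INFO_SLOTS; i++) {
		int expected = 0;

		if (atomic_compare_exchange_strong(&slots[i].inuse,
						   &expected, 1)) {
			slots[i].owner = owner;
			slots[i].paddr = paddr;
			return 0;
		}
	}
	return -1;
}

/* Look up the slot saved for this owner, or NULL if there is none.
 * The sketch assumes the lookup runs later, in the owner's own context. */
static struct saved_info *info_find(int owner)
{
	for (size_t i = 0; i < INFO_SLOTS; i++) {
		if (atomic_load(&slots[i].inuse) && slots[i].owner == owner)
			return &slots[i];
	}
	return NULL;
}

/* Release the slot once recovery has consumed it. */
static void info_clear(struct saved_info *si)
{
	atomic_store(&si->inuse, 0);
}
```

      A fixed table avoids allocating memory in the handler. What to do when
      the table is full is a policy decision that this sketch leaves to the
      caller.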
    • x86/mce: Create helper function to save addr/misc when needed · 85f92694
      Tony Luck authored
      The MCI_STATUS_MISCV and MCI_STATUS_ADDRV bits in the bank status
      registers define whether the MISC and ADDR registers respectively
      contain valid data - provide a helper function to check these bits
      and read the registers when needed.
      
      In addition, processors that support software error recovery (as
      indicated by the MCG_SER_P bit in the MCG_CAP register) may include
      some undefined bits in the ADDR register - mask these out.
      Acked-by: Borislav Petkov <bp@amd64.org>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
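      The helper described above can be sketched as follows. This is a
      userspace illustration under stated assumptions: the register values
      come from a plain struct instead of MSR reads, all names are
      hypothetical, and the masking assumes that the low six bits of MISC
      give the least significant valid bit of the address, which the commit
      message itself does not spell out.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_MISCV (1ULL << 59)	/* MISC register holds valid data */
#define STATUS_ADDRV (1ULL << 58)	/* ADDR register holds valid data */

/* Assumed layout: MISC bits 5:0 = least significant valid address bit. */
#define MISC_ADDR_LSB(misc) ((unsigned int)((misc) & 0x3f))

/* Stand-in for the per-bank hardware registers. */
struct bank_regs {
	uint64_t status, addr, misc;
};

/* Stand-in for the software record of one error. */
struct record {
	uint64_t status, addr, misc;
};

/* Copy MISC and ADDR into the record only when the status bits mark them
 * valid; with recovery support, clear address bits below the reported LSB. */
static void read_aux(struct record *r, const struct bank_regs *bank,
		     bool ser_supported)
{
	if (r->status & STATUS_MISCV)
		r->misc = bank->misc;

	if (r->status & STATUS_ADDRV) {
		r->addr = bank->addr;
		if (ser_supported && (r->status & STATUS_MISCV)) {
			unsigned int lsb = MISC_ADDR_LSB(r->misc);

			r->addr = (r->addr >> lsb) << lsb;
		}
	}
}
```

      With an LSB of 12, for example, an address of 0x12345fff is recorded
      as 0x12345000, so later code sees only the bits the hardware vouches
      for.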
    • HWPOISON: Add code to handle "action required" errors. · 7329bbeb
      Tony Luck authored
      Add new flag bit "MF_ACTION_REQUIRED" to be used by machine check
      code to force a signal with si_code = BUS_MCEERR_AR in the case
      where the error occurs in processor execution context. Pass the
      flags argument along the call chain:
      	memory_failure()
      	  hwpoison_user_mappings()
      	    kill_procs()
      	      kill_proc()
      
      Drop the "_ao" suffix from kill_procs_ao() and kill_proc_ao() since
      they can now handle "action required" as well as "action optional" errors.
      Acked-by: Borislav Petkov <bp@amd64.org>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
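      The decision made at the end of that call chain can be sketched as a
      small pure function. This is an illustration only: the SK_ names are
      hypothetical, the flag's bit value is an assumption, and the numeric
      si_code values mirror Linux's BUS_MCEERR_AR (4) and BUS_MCEERR_AO (5).
      Falling back to a queued "action optional" signal in every other case
      is also an assumption of the sketch.

```c
#include <stdbool.h>

#define SK_MF_ACTION_REQUIRED (1 << 1)	/* assumed bit value */
#define SK_BUS_MCEERR_AR 4		/* action required */
#define SK_BUS_MCEERR_AO 5		/* action optional */

/* How a signal would be delivered to one affected task. */
struct delivery {
	int si_code;
	bool forced;	/* true: cannot be blocked or ignored */
};

/* Force an "action required" signal only when the caller asked for it and
 * the affected task is the one that was executing when the error hit. */
static struct delivery pick_delivery(int flags, bool task_is_current)
{
	struct delivery d;

	if ((flags & SK_MF_ACTION_REQUIRED) && task_is_current) {
		d.si_code = SK_BUS_MCEERR_AR;
		d.forced = true;
	} else {
		d.si_code = SK_BUS_MCEERR_AO;
		d.forced = false;
	}
	return d;
}
```

      Keeping the choice in one place is what lets a single kill_proc()
      style function serve both kinds of error once the flags reach it.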
    • HWPOISON: Clean up memory_failure() vs. __memory_failure() · cd42f4a3
      Tony Luck authored
      There is only one caller of memory_failure(), all other users call
      __memory_failure() and pass in the flags argument explicitly. The
      lone user of memory_failure() will soon need to pass flags too.
      
      Add flags argument to the callsite in mce.c. Delete the old memory_failure()
      function, and then rename __memory_failure() without the leading "__".
      
      Provide clearer message when action optional memory errors are ignored.
      Acked-by: Borislav Petkov <bp@amd64.org>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  3. 09 Dec, 2011 33 commits