1. 22 Sep, 2009 3 commits
    • Huang Ying's avatar
      x86: mce, inject: Use real inject-msg in raise_local · 14c0abf1
      Huang Ying authored
      Current raise_local() uses a struct mce that comes from mce_write()
      as a parameter instead of the real inject-msg, so when we set
      mce.finished = 0 to clear injected MCE, the real inject stays
      valid.
      
      This will cause the remaining inject-msg affect the next injection,
      which is not desired.
      
      To fix this, real inject-msg is used in raise_local instead of the
      one on the stack.
      
      This patch is based on the diagnosis and the fixes by Dean Nelson.
      Reported-by: default avatarDean Nelson <dnelson@redhat.com>
      Signed-off-by: default avatarHuang Ying <ying.huang@intel.com>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      LKML-Reference: <1253601357.15717.757.camel@yhuang-dev.sh.intel.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      14c0abf1
    • Ingo Molnar's avatar
      x86: mce: Fix thermal throttling message storm · b417c9fd
      Ingo Molnar authored
      If a system switches back and forth between hot and cold mode,
      the MCE code will print a stream of critical kernel messages.
      
      Extend the throttling code to properly notice this, by
      only printing the first hot + cold transition and omitting
      the rest up to CHECK_INTERVAL (5 minutes).
      
      This way we'll only get a single incident of:
      
       [  102.356584] CPU0: Temperature above threshold, cpu clock throttled (total events = 1)
       [  102.357000] Disabling lock debugging due to kernel taint
       [  102.369223] CPU0: Temperature/speed normal
      
      Every 5 minutes. The 'total events' count tells the number of cold/hot
      transitions detected, should overheating occur after 5 minutes again:
      
      [  402.357580] CPU0: Temperature above threshold, cpu clock throttled (total events = 24891)
      [  402.358001] CPU0: Temperature/speed normal
      [  450.704142] Machine check events logged
      
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      b417c9fd
    • Ingo Molnar's avatar
      x86: mce: Clean up thermal throttling state tracking code · 39676840
      Ingo Molnar authored
      Instead of a mess of three separate percpu variables, consolidate
      the state into a single structure.
      
      Also clean up therm_throt_process(), use cleaner and more
      understandable variable names and a clearer logic.
      
      This, without changing the logic, makes the code more
      streamlined, more readable and smaller as well:
      
         text	   data	    bss	    dec	    hex	filename
         1487	    169	      4	   1660	    67c	therm_throt.o.before
         1432	    176	      4	   1612	    64c	therm_throt.o.after
      
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      39676840
  2. 21 Sep, 2009 23 commits
  3. 20 Sep, 2009 14 commits