1. 26 Nov, 2010 2 commits
    • Heiko Carstens's avatar
      nohz: Fix printk_needs_cpu() return value on offline cpus · 61ab2544
      Heiko Carstens authored
      This patch fixes a hang observed with 2.6.32 kernels where timers got enqueued
      on offline cpus.
      
      printk_needs_cpu() may return 1 if called on offline cpus. When a cpu gets
      offlined it schedules the idle process which, before killing its own cpu, will
      call tick_nohz_stop_sched_tick(). That function in turn will call
      printk_needs_cpu() in order to check if the local tick can be disabled. On
      offline cpus this function should naturally return 0 since regardless if the
      tick gets disabled or not the cpu will be dead short after. That is besides the
      fact that __cpu_disable() should already have made sure that no interrupts on
      the offlined cpu will be delivered anyway.
      
      In this case it prevents tick_nohz_stop_sched_tick() to call
      select_nohz_load_balancer(). No idea if that really is a problem. However what
      made me debug this is that on 2.6.32 the function get_nohz_load_balancer() is
      used within __mod_timer() to select a cpu on which a timer gets enqueued. If
      printk_needs_cpu() returns 1 then the nohz_load_balancer cpu doesn't get
      updated when a cpu gets offlined. It may contain the cpu number of an offline
      cpu. In turn timers get enqueued on an offline cpu and not very surprisingly
      they never expire and cause system hangs.
      
      This has been observed 2.6.32 kernels. On current kernels __mod_timer() uses
      get_nohz_timer_target() which doesn't have that problem. However there might be
      other problems because of the too early exit tick_nohz_stop_sched_tick() in
      case a cpu goes offline.
      
      Easiest way to fix this is just to test if the current cpu is offline and call
      printk_tick() directly which clears the condition.
      
      Alternatively I tried a cpu hotplug notifier which would clear the condition,
      however between calling the notifier function and printk_needs_cpu() something
      could have called printk() again and the problem is back again. This seems to
      be the safest fix.
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org
      LKML-Reference: <20101126120235.406766476@de.ibm.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      61ab2544
    • Heiko Carstens's avatar
      printk: Fix wake_up_klogd() vs cpu hotplug · 49f41383
      Heiko Carstens authored
      wake_up_klogd() may get called from preemptible context but uses
      __raw_get_cpu_var() to write to a per cpu variable. If it gets preempted
      between getting the address and writing to it, the cpu in question could be
      offline if the process gets scheduled back and hence writes to the per cpu data
      of an offline cpu.
      
      This buggy behaviour was introduced with fa33507a "printk: robustify
      printk, fix #2" which was supposed to fix a "using smp_processor_id() in
      preemptible" warning.
      
      Let's use this_cpu_write() instead which disables preemption and makes sure
      that the outlined scenario cannot happen.
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101126124247.GC7023@osiris.boeblingen.de.ibm.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      49f41383
  2. 18 Nov, 2010 2 commits
  3. 17 Nov, 2010 3 commits
  4. 16 Nov, 2010 17 commits
  5. 15 Nov, 2010 16 commits