Commit a263898f authored by Ingo Molnar's avatar Ingo Molnar Committed by Linus Torvalds

CPU hotplug: fix cpu_is_offline() on !CONFIG_HOTPLUG_CPU

make randconfig bootup testing found that the cpufreq code
crashes on bootup, if the powernow-k8 driver is enabled and
if maxcpus=1 passed on the boot line to a !CONFIG_HOTPLUG_CPU
kernel.

First lockdep found out that there's an inconsistent unlock
sequence:

 =====================================
 [ BUG: bad unlock balance detected! ]
 -------------------------------------
 swapper/1 is trying to release lock (&per_cpu(cpu_policy_rwsem, cpu)) at:
 [<ffffffff806ffd8e>] unlock_policy_rwsem_write+0x3c/0x42
 but there are no more locks to release!

Call Trace:
 [<ffffffff806ffd8e>] unlock_policy_rwsem_write+0x3c/0x42
 [<ffffffff80251c29>] print_unlock_inbalance_bug+0x104/0x12c
 [<ffffffff80252f3a>] mark_held_locks+0x56/0x94
 [<ffffffff806ffd8e>] unlock_policy_rwsem_write+0x3c/0x42
 [<ffffffff807008b6>] cpufreq_add_dev+0x2a8/0x5c4
 ...

then shortly afterwards the cpufreq code crashed on an assert:

 ------------[ cut here ]------------
 kernel BUG at drivers/cpufreq/cpufreq.c:1068!
 invalid opcode: 0000 [1] SMP
 [...]
 Call Trace:
  [<ffffffff805145d6>] sysdev_driver_unregister+0x5b/0x91
  [<ffffffff806ff520>] cpufreq_register_driver+0x15d/0x1a2
  [<ffffffff80cc0596>] powernowk8_init+0x86/0x94
 [...]
 ---[ end trace 1e9219be2b4431de ]---

the bug was caused by maxcpus=1 bootup, which brought up the
secondary core as !cpu_online() but !cpu_is_offline() either,
which on on !CONFIG_HOTPLUG_CPU is always 0 (include/linux/cpu.h):

  /* CPUs don't go offline once they're online w/o CONFIG_HOTPLUG_CPU */
  static inline int cpu_is_offline(int cpu) { return 0; }

but the cpufreq code uses cpu_online() and cpu_is_offline() in
a mixed way - the low-level drivers use cpu_online(), while
the cpufreq core uses cpu_is_offline(). This opened up the
possibility to add the non-initialized sysdev device of the
secondary core:

 cpufreq-core: trying to register driver powernow-k8
 cpufreq-core: adding CPU 0
 powernow-k8: BIOS error - no PSB or ACPI _PSS objects
 cpufreq-core: initialization failed
 cpufreq-core: adding CPU 1
 cpufreq-core: initialization failed

which then blew up. The fix is to make cpu_is_offline() always
the negation of cpu_online(). With that fix applied the kernel
boots up fine without crashing:

 Calling initcall 0xffffffff80cc0510: powernowk8_init+0x0/0x94()
 powernow-k8: Found 1 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ processors (1 cpu cores) (version 2.20.00)
 powernow-k8: BIOS error - no PSB or ACPI _PSS objects
 initcall 0xffffffff80cc0510: powernowk8_init+0x0/0x94() returned -19.
 initcall 0xffffffff80cc0510 ran for 19 msecs: powernowk8_init+0x0/0x94()
 Calling initcall 0xffffffff80cc328f: init_lapic_nmi_sysfs+0x0/0x39()

We could fix this by making CPU enumeration aware of max_cpus, but that
would be more fragile IMO, and the cpu_online(cpu) != cpu_is_offline(cpu)
possibility was quite confusing and a continuous source of bugs too.

Most distributions have kernels with CPU hotplug enabled, so this bug
remained hidden for a long time.

Bug forensics:

The broken cpu_is_offline() API variant was introduced via:

 commit a59d2e4e
 Author: Rusty Russell <rusty@rustcorp.com.au>
 Date:   Mon Mar 8 06:06:03 2004 -0800

     [PATCH] minor cleanups for hotplug CPUs

( this predates linux-2.6.git, this commit is available from Thomas's
  historic git tree. )

Then 1.5 years later the cpufreq code made use of it:

 commit c32b6b8e
 Author: Ashok Raj <ashok.raj@intel.com>
 Date:   Sun Oct 30 14:59:54 2005 -0800

     [PATCH] create and destroy cpufreq sysfs entries based on cpu notifiers

 +       if (cpu_is_offline(cpu))
 +               return 0;

which is a correct use of the subtly broken new API. v2.6.15 then
shipped with this bug included.

then it took two more years for random-kernel qa to hit it.
Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
parent 57a04513
...@@ -107,7 +107,6 @@ extern void unlock_cpu_hotplug(void); ...@@ -107,7 +107,6 @@ extern void unlock_cpu_hotplug(void);
#define register_hotcpu_notifier(nb) register_cpu_notifier(nb) #define register_hotcpu_notifier(nb) register_cpu_notifier(nb)
#define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb) #define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb)
int cpu_down(unsigned int cpu); int cpu_down(unsigned int cpu);
#define cpu_is_offline(cpu) unlikely(!cpu_online(cpu))
#else /* CONFIG_HOTPLUG_CPU */ #else /* CONFIG_HOTPLUG_CPU */
...@@ -122,9 +121,6 @@ static inline void cpuhotplug_mutex_unlock(struct mutex *cpu_hp_mutex) ...@@ -122,9 +121,6 @@ static inline void cpuhotplug_mutex_unlock(struct mutex *cpu_hp_mutex)
/* These aren't inline functions due to a GCC bug. */ /* These aren't inline functions due to a GCC bug. */
#define register_hotcpu_notifier(nb) ({ (void)(nb); 0; }) #define register_hotcpu_notifier(nb) ({ (void)(nb); 0; })
#define unregister_hotcpu_notifier(nb) ({ (void)(nb); }) #define unregister_hotcpu_notifier(nb) ({ (void)(nb); })
/* CPUs don't go offline once they're online w/o CONFIG_HOTPLUG_CPU */
static inline int cpu_is_offline(int cpu) { return 0; }
#endif /* CONFIG_HOTPLUG_CPU */ #endif /* CONFIG_HOTPLUG_CPU */
#ifdef CONFIG_PM_SLEEP_SMP #ifdef CONFIG_PM_SLEEP_SMP
......
...@@ -397,6 +397,8 @@ extern cpumask_t cpu_present_map; ...@@ -397,6 +397,8 @@ extern cpumask_t cpu_present_map;
#define cpu_present(cpu) ((cpu) == 0) #define cpu_present(cpu) ((cpu) == 0)
#endif #endif
#define cpu_is_offline(cpu) unlikely(!cpu_online(cpu))
#ifdef CONFIG_SMP #ifdef CONFIG_SMP
extern int nr_cpu_ids; extern int nr_cpu_ids;
#define any_online_cpu(mask) __any_online_cpu(&(mask)) #define any_online_cpu(mask) __any_online_cpu(&(mask))
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment