Commit 9c388a5e authored by Thomas Gleixner

watchdog/hardlockup/perf: Revert a33d4484 ("watchdog/hardlockup/perf: Simplify deferred event destroy")

Guenter reported a crash in the watchdog/perf code, which is caused by
cleanup() and enable() running concurrently. The reason for this is:

The watchdog functions are serialized via the watchdog_mutex and cpu
hotplug locking, but the enable of the perf based watchdog happens in
context of the unpark callback of the smpboot thread. But that unpark
callback is not synchronous inside the locked region: unparking merely
wakes the thread up and returns, so there is no guarantee when the thread
actually executes.

If it starts running _before_ the cleanup has happened, it creates a new
event and overwrites the dead event pointer. The cleanup then frees this
new event, which is still active inside perf, because the CPU is still
flagged in dead_events_mask.

    lock(watchdog_mutex);
    lockup_detector_reconfigure();
        cpus_read_lock();
        stop();
           park()
        update();
        start();
           unpark()
        cpus_read_unlock();          thread runs()
                                       overwrite dead event ptr
        cleanup();
          free new event, which is active inside perf....
    unlock(watchdog_mutex);

The park side is safe as that actually waits for the thread to reach
parked state.
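
To make the restored scheme concrete, here is a condensed sketch of the two sides of the handoff (illustration only; the actual change is the diff below): disable() parks the event in a dedicated per-CPU dead_event slot and cleanup() only ever releases that slot, so an enable() racing in from the unparked smpboot thread can repopulate watchdog_ev without its freshly created event being freed.

/* Condensed sketch of the restored handoff; see the diff below for the
 * actual change.
 */
static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
static DEFINE_PER_CPU(struct perf_event *, dead_event);
static struct cpumask dead_events_mask;

void hardlockup_detector_perf_disable(void)
{
	struct perf_event *event = this_cpu_read(watchdog_ev);

	if (event) {
		perf_event_disable(event);
		/* Hand the event over to the dead slot so a racing enable()
		 * can safely install a new watchdog_ev.
		 */
		this_cpu_write(watchdog_ev, NULL);
		this_cpu_write(dead_event, event);
		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
	}
}

void hardlockup_detector_perf_cleanup(void)
{
	int cpu;

	for_each_cpu(cpu, &dead_events_mask) {
		struct perf_event *event = per_cpu(dead_event, cpu);

		/* Only the stashed dead event is released; an event created
		 * by a late running smpboot thread stays untouched.
		 */
		if (event)
			perf_event_release_kernel(event);
		per_cpu(dead_event, cpu) = NULL;
	}
	cpumask_clear(&dead_events_mask);
}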

Commit a33d4484 removed the protection against this kind of scenario
under the stupid assumption that the hotplug serialization and the
watchdog_mutex cover everything. 

Bring it back.

Reverts: a33d4484 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Feels-stupid Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710312145190.1942@nanos

parent 153fbd12
@@ -21,6 +21,7 @@
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
 static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
+static DEFINE_PER_CPU(struct perf_event *, dead_event);
 static struct cpumask dead_events_mask;
 
 static unsigned long hardlockup_allcpu_dumped;
@@ -203,6 +204,8 @@ void hardlockup_detector_perf_disable(void)
 
 	if (event) {
 		perf_event_disable(event);
+		this_cpu_write(watchdog_ev, NULL);
+		this_cpu_write(dead_event, event);
 		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
 		watchdog_cpus--;
 	}
@@ -218,7 +221,7 @@ void hardlockup_detector_perf_cleanup(void)
 	int cpu;
 
 	for_each_cpu(cpu, &dead_events_mask) {
-		struct perf_event *event = per_cpu(watchdog_ev, cpu);
+		struct perf_event *event = per_cpu(dead_event, cpu);
 
 		/*
 		 * Required because for_each_cpu() reports unconditionally
@@ -226,7 +229,7 @@ void hardlockup_detector_perf_cleanup(void)
 		 */
 		if (event)
 			perf_event_release_kernel(event);
-		per_cpu(watchdog_ev, cpu) = NULL;
+		per_cpu(dead_event, cpu) = NULL;
 	}
 	cpumask_clear(&dead_events_mask);
 }
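
For context, the enable side that races with this cleanup runs from the smpboot thread's unpark callback in the same file. A rough sketch of it, condensed and only assumed from the surrounding code of this kernel version (it is not part of the diff above):

/* Sketch only: runs in the smpboot thread after unpark(), possibly after
 * cpus_read_unlock() and therefore concurrently with
 * hardlockup_detector_perf_cleanup().
 */
void hardlockup_detector_perf_enable(void)
{
	/* Creates a new perf event and installs it in watchdog_ev. With the
	 * dedicated dead_event slot restored above, this store no longer
	 * clobbers the pointer that cleanup() is about to release.
	 */
	if (hardlockup_detector_event_create())
		return;

	watchdog_cpus++;
	perf_event_enable(this_cpu_read(watchdog_ev));
}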