Commit 5097cbcb authored by Oleg Nesterov's avatar Oleg Nesterov Committed by Ingo Molnar

sched/isolation: Prevent boot crash when the boot CPU is nohz_full

Documentation/timers/no_hz.rst states that the "nohz_full=" mask must not
include the boot CPU, which is no longer true after:

  08ae95f4 ("nohz_full: Allow the boot CPU to be nohz_full").

However after:

  aae17ebb ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work")

the kernel will crash at boot time in this case; housekeeping_any_cpu()
returns an invalid CPU number until smp_init() brings the first
housekeeping CPU up.

Change housekeeping_any_cpu() to check the result of cpumask_any_and() and
return smp_processor_id() in this case.

This is just the simple and backportable workaround which fixes the
symptom, but smp_processor_id() at boot time should be safe at least for
type == HK_TYPE_TIMER, this more or less matches the tick_do_timer_boot_cpu
logic.

There is no worry about cpu_down(); tick_nohz_cpu_down() will not allow to
offline tick_do_timer_cpu (the 1st online housekeeping CPU).

Fixes: aae17ebb ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work")
Reported-by: default avatarChris von Recklinghausen <crecklin@redhat.com>
Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
Reviewed-by: default avatarPhil Auld <pauld@redhat.com>
Acked-by: default avatarFrederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240411143905.GA19288@redhat.com
Closes: https://lore.kernel.org/all/20240402105847.GA24832@redhat.com/
parent 1560d1f6
...@@ -129,11 +129,8 @@ adaptive-tick CPUs: At least one non-adaptive-tick CPU must remain ...@@ -129,11 +129,8 @@ adaptive-tick CPUs: At least one non-adaptive-tick CPU must remain
online to handle timekeeping tasks in order to ensure that system online to handle timekeeping tasks in order to ensure that system
calls like gettimeofday() returns accurate values on adaptive-tick CPUs. calls like gettimeofday() returns accurate values on adaptive-tick CPUs.
(This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no running (This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no running
user processes to observe slight drifts in clock rate.) Therefore, the user processes to observe slight drifts in clock rate.) Note that this
boot CPU is prohibited from entering adaptive-ticks mode. Specifying a means that your system must have at least two CPUs in order for
"nohz_full=" mask that includes the boot CPU will result in a boot-time
error message, and the boot CPU will be removed from the mask. Note that
this means that your system must have at least two CPUs in order for
CONFIG_NO_HZ_FULL=y to do anything for you. CONFIG_NO_HZ_FULL=y to do anything for you.
Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded. Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.
......
...@@ -46,7 +46,16 @@ int housekeeping_any_cpu(enum hk_type type) ...@@ -46,7 +46,16 @@ int housekeeping_any_cpu(enum hk_type type)
if (cpu < nr_cpu_ids) if (cpu < nr_cpu_ids)
return cpu; return cpu;
return cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask); cpu = cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
if (likely(cpu < nr_cpu_ids))
return cpu;
/*
* Unless we have another problem this can only happen
* at boot time before start_secondary() brings the 1st
* housekeeping CPU up.
*/
WARN_ON_ONCE(system_state == SYSTEM_RUNNING ||
type != HK_TYPE_TIMER);
} }
} }
return smp_processor_id(); return smp_processor_id();
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment