    powerpc/smp: Wait until secondaries are active & online · 875ebe94
    Michael Ellerman authored
    Anton has a busy ppc64le KVM box where guests sometimes hit the infamous
    "kernel BUG at kernel/smpboot.c:134!" issue during boot:
    
      BUG_ON(td->cpu != smp_processor_id());
    
    Basically a per-CPU hotplug thread was scheduled on the wrong CPU. The oops
    output confirms it:
    
      CPU: 0
      Comm: watchdog/130
    
    The problem is that we aren't ensuring the CPU active bit is set for the
    secondary before allowing the master to continue on. The master unparks
    the secondary CPU's kthreads and the scheduler looks for a CPU to run
    on. It calls select_task_rq() and realises the suggested CPU is not in
    the cpus_allowed mask. It then ends up in select_fallback_rq(), and
    since the active bit isn't set we choose some other CPU to run on.
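
    The fallback behaviour described above can be illustrated with a small
    user-space model. This is only a sketch: struct model_cpu and
    model_pick_cpu() are hypothetical names invented for illustration, and
    the logic is a simplification, not the scheduler's actual
    select_task_rq() or select_fallback_rq() code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two per-CPU state bits discussed above. */
struct model_cpu {
    bool online;
    bool active;
};

/*
 * Simplified model: a per-CPU thread is bound to bound_cpu. If that CPU
 * is not both online and active, fall back to the first CPU that is.
 * Returns the chosen CPU index, or -1 if none is usable.
 */
static int model_pick_cpu(const struct model_cpu *cpus, int ncpus,
                          int bound_cpu)
{
    if (bound_cpu >= 0 && bound_cpu < ncpus &&
        cpus[bound_cpu].online && cpus[bound_cpu].active)
        return bound_cpu;

    /* Fallback path: any online and active CPU will do. */
    for (int i = 0; i < ncpus; i++) {
        if (cpus[i].online && cpus[i].active)
            return i;
    }
    return -1;
}
```

    In this model, a thread bound to a CPU that is online but not yet
    active lands on another CPU, which mirrors the "watchdog/130 running
    on CPU 0" symptom in the oops.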
    
    This seems to have been introduced by 6acbfb96 "sched: Fix hotplug
    vs. set_cpus_allowed_ptr()", which changed from setting active before
    online to setting active after online. However that was in turn fixing a
    bug where other code assumed an active CPU was also online, so we can't
    just revert that fix.
    
    The simplest fix is just to spin waiting for both active & online to be
    set. We already have a barrier prior to set_cpu_online() (which also
    sets active), to ensure all other setup is completed before online &
    active are set.
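
    The spin-wait can be sketched in user space with C11 atomics and a
    pthread standing in for the secondary CPU. All names here
    (model_cpu, model_secondary_start(), model_wait_for_secondary(),
    model_bring_up()) are hypothetical, and sched_yield() is an assumed
    stand-in for a relax primitive. The real change presumably uses the
    kernel's own cpu_online(), cpu_active() and cpu_relax() helpers; this
    model does not reproduce it.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a secondary CPU's online and active bits. */
struct model_cpu {
    atomic_bool online;
    atomic_bool active;
};

/* The "secondary": finishes setup, then sets online followed by active. */
static void *model_secondary_start(void *arg)
{
    struct model_cpu *cpu = arg;

    /* Release ordering: earlier setup is visible before the bits are. */
    atomic_store_explicit(&cpu->online, true, memory_order_release);
    sched_yield(); /* widen the window where online is set, active is not */
    atomic_store_explicit(&cpu->active, true, memory_order_release);
    return NULL;
}

/* The "master": spin until BOTH bits are set, not just online. */
static void model_wait_for_secondary(struct model_cpu *cpu)
{
    while (!atomic_load_explicit(&cpu->online, memory_order_acquire) ||
           !atomic_load_explicit(&cpu->active, memory_order_acquire))
        sched_yield();
}

/* Start the secondary and return only once it is online and active. */
static bool model_bring_up(struct model_cpu *cpu)
{
    pthread_t thread;
    bool ok;

    if (pthread_create(&thread, NULL, model_secondary_start, cpu) != 0)
        return false;

    model_wait_for_secondary(cpu);
    ok = atomic_load(&cpu->online) && atomic_load(&cpu->active);
    pthread_join(thread, NULL);
    return ok;
}
```

    Waiting on online alone would let the master continue inside the
    window between the two stores, which is the race described above.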
    
    Fixes: 6acbfb96 ("sched: Fix hotplug vs. set_cpus_allowed_ptr()")
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Signed-off-by: Anton Blanchard <anton@samba.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>