• Michael Ellerman's avatar
    powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi() · 7c6986ad
    Michael Ellerman authored
    In raise_backtrace_ipi() we iterate through the cpumask of CPUs, sending
    each an IPI asking them to do a backtrace, but we don't wait for the
    backtrace to happen.
    
    We then iterate through the CPU mask again, and if any CPU hasn't done
    the backtrace and cleared itself from the mask, we print a trace on its
    behalf, noting that the trace may be "stale".
    
    This works well enough when a CPU is not responding, because in that
    case it doesn't receive the IPI and the sending CPU is left to print the
    trace. But when all CPUs are responding we are left with a race between
    the sending and receiving CPUs, if the sending CPU wins the race then it
    will erroneously print a trace.
    
    This leads to spurious "stale" traces from the sending CPU, which can
    then be interleaved messily with the receiving CPU, note the CPU
    numbers, eg:
    
      [ 1658.929157][    C7] rcu: Stack dump where RCU GP kthread last ran:
      [ 1658.929223][    C7] Sending NMI from CPU 7 to CPUs 1:
      [ 1658.929303][    C1] NMI backtrace for cpu 1
      [ 1658.929303][    C7] CPU 1 didn't respond to backtrace IPI, inspecting paca.
      [ 1658.929362][    C1] CPU: 1 PID: 325 Comm: kworker/1:1H Tainted: G        W   E     5.13.0-rc2+ #46
      [ 1658.929405][    C7] irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 325 (kworker/1:1H)
      [ 1658.929465][    C1] Workqueue: events_highpri test_work_fn [test_lockup]
      [ 1658.929549][    C7] Back trace of paca->saved_r1 (0xc0000000057fb400) (possibly stale):
      [ 1658.929592][    C1] NIP:  c00000000002cf50 LR: c008000000820178 CTR: c00000000002cfa0
    
    To fix it, change the logic so that the sending CPU waits 5s for the
    receiving CPU to print its trace. If the receiving CPU prints its trace
    successfully then the sending CPU just continues, avoiding any spurious
    "stale" trace.
    
    This has the added benefit of allowing all CPUs to print their traces in
    order and avoids any interleaving of their output.
    
    Fixes: 5cc05910 ("powerpc/64s: Wire up arch_trigger_cpumask_backtrace()")
    Cc: stable@vger.kernel.org # v4.18+
    Reported-by: default avatarNathan Lynch <nathanl@linux.ibm.com>
    Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210625140408.3351173-1-mpe@ellerman.id.au
    7c6986ad
stacktrace.c 5.66 KB