    x86/nmi: Upgrade NMI backtrace stall checks & messages · 3186b618
    Paul E. McKenney authored
    The commit to improve NMI stall debuggability:
    
      344da544 ("x86/nmi: Print reasons why backtrace NMIs are ignored")
    
    ... has shown value, but widespread use has also identified a few
    opportunities for improvement.
    
    The systems have (as usual) shown far more creativity than that commit's
    author, demonstrating yet again that failing CPUs can do whatever they want.
    
    In addition, the current message format is less friendly than one might
    like to those attempting to use these messages to identify failing CPUs.
    
    Therefore, separately flag CPUs that, during the full time that the
    stack-backtrace request was waiting, were always in an NMI handler,
    were never in an NMI handler, or exited one NMI handler.
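    The three cases above can be sketched as a small classifier. This is
    an illustration only, not the code in nmi.c: it assumes a per-CPU
    sequence counter that is incremented on NMI handler entry and again on
    exit (so an odd value means "inside a handler"), and the names
    classify_nmi_stall() and enum nmi_stall_class are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical labels for the three cases named in the commit message. */
enum nmi_stall_class {
	NMI_ALWAYS_IN_HANDLER,	/* inside one handler for the whole wait */
	NMI_NEVER_IN_HANDLER,	/* no handler ran during the wait */
	NMI_EXITED_ONE_HANDLER,	/* in a handler at request time, then left it */
	NMI_OTHER,		/* anything else, e.g. entered or handled NMIs */
};

/*
 * Assumption: the counter is bumped on handler entry and on handler exit,
 * so an odd value means the CPU is currently inside an NMI handler.
 *
 * snap: counter value sampled when the backtrace was requested
 * now:  counter value sampled when the stall check runs
 */
enum nmi_stall_class classify_nmi_stall(unsigned long snap, unsigned long now)
{
	bool was_in_handler = snap & 0x1;

	/* Counter did not move: the CPU stayed in the same state throughout. */
	if (now == snap)
		return was_in_handler ? NMI_ALWAYS_IN_HANDLER
				      : NMI_NEVER_IN_HANDLER;

	/* Exactly one step from an odd value: one handler exit, nothing more. */
	if (now == snap + 1 && was_in_handler)
		return NMI_EXITED_ONE_HANDLER;

	return NMI_OTHER;
}
```

    Under these assumptions, classify_nmi_stall(3, 3) reports a CPU stuck
    in a handler, classify_nmi_stall(4, 4) a CPU that never entered one,
    and classify_nmi_stall(3, 4) a CPU that exited exactly one handler.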
    
    Also, split the message identifying the CPU and the time since that CPU's
    last NMI-related activity so that a single line identifies the CPU without
    any other variable information, greatly reducing the processing overhead
    required to identify repeat-offender CPUs.
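
    A minimal sketch of why the split helps, assuming a hypothetical
    "stall_check:" prefix and placeholder state text (neither is the
    kernel's exact output): the first line depends only on the CPU and its
    state, so repeated reports for the same CPU are byte-identical and can
    be counted directly, while the time-varying value sits on its own line.

```c
#include <stdio.h>

/* Line 1: identifies the CPU and its state, with no time-varying fields. */
int format_cpu_line(char *buf, size_t len, int cpu, const char *state)
{
	return snprintf(buf, len, "stall_check: CPU %d: %s", cpu, state);
}

/* Line 2: carries the variable part, the time since the last activity. */
int format_activity_line(char *buf, size_t len, unsigned long jiffies_ago)
{
	return snprintf(buf, len, "stall_check: last activity: %lu jiffies ago.",
			jiffies_ago);
}
```

    With this layout, two reports for the same CPU and state produce the
    same first line even when the jiffies values differ, so a log
    post-processor can find repeat offenders by counting identical lines
    rather than parsing each message.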

    Co-developed-by: Breno Leitao <leitao@debian.org>
    Signed-off-by: Breno Leitao <leitao@debian.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: https://lore.kernel.org/r/ab4d70c8-c874-42dc-b206-643018922393@paulmck-laptop