• Tom Hromatka's avatar
    sysrq: Reset the watchdog timers while displaying high-resolution timers · 75e67441
    Tom Hromatka authored
    
    [ Upstream commit 01070427 ]
    
    On systems with a large number of CPUs, running sysrq-<q> can cause
    watchdog timeouts.  There are two slow sections of code in the sysrq-<q>
    path in timer_list.c.
    
    1. print_active_timers() - This function is called by print_cpu() and
       contains a slow goto loop.  On a machine with hundreds of CPUs, this
       loop took approximately 100ms for the first CPU in a NUMA node.
       (Subsequent CPUs in the same node ran much quicker.)  The total time
       to print all of the CPUs is ultimately long enough to trigger the
       soft lockup watchdog.
    
    2. print_tickdevice() - This function outputs a large amount of textual
       information.  This function also took approximately 100ms per CPU.
    
    Since sysrq-<q> is not a performance critical path, there should be no
    harm in touching the nmi watchdog in both slow sections above.  Touching
    it in just one location was insufficient on systems with hundreds of
    CPUs as occasional timeouts were still observed during testing.
    
    This issue was observed on an Oracle T7 machine with 128 CPUs, but I
    anticipate it may affect other systems with similarly large numbers of
    CPUs.
    Signed-off-by: default avatarTom Hromatka <tom.hromatka@oracle.com>
    Reviewed-by: default avatarRob Gardner <rob.gardner@oracle.com>
    Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
    Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    75e67441
timer_list.c 9.71 KB