    softlockup: fix NMI hangs due to lock race - 2.6.26-rc regression · 9c106c11
    Jason Wessel authored
    The touch_nmi_watchdog() routine on x86 ultimately calls
    touch_softlockup_watchdog().  The problem is that to touch the
    softlockup watchdog, the cpu_clock code has to be called, which could
    involve multiple cpu locks and can lead to a hard hang if one of the
    locks is held by a processor that is not going to return anytime soon
    (such as could be the case with kgdb or perhaps even with some other
    kind of exception).
    
    This patch causes the public version of
    touch_softlockup_watchdog() to defer the cpu clock access to a later
    point.
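
    The deferral can be modelled in user space roughly as follows.  This
    is a minimal sketch and not the kernel code: touch_timestamp here is
    a plain variable rather than per-cpu data, and read_clock(),
    touch_watchdog(), touch_watchdog_now(), watchdog_tick() and
    watchdog_selfcheck() are hypothetical names chosen for illustration.
    It assumes a timestamp of 0 is used as the "touched, refresh later"
    marker.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the watchdog timestamp; 0 means
 * "touched, refresh from the clock at the next safe point". */
static uint64_t touch_timestamp;

/* Hypothetical stand-in for cpu_clock(). In the kernel this path may
 * take locks, which is why it must not be reached from an NMI. */
static uint64_t fake_clock_ns;
static unsigned int clock_reads;

static uint64_t read_clock(void)
{
	clock_reads++;
	return fake_clock_ns;
}

/* Internal version: reads the clock, so only call it from a safe context. */
static void touch_watchdog_now(void)
{
	touch_timestamp = read_clock();
}

/* Public version: no clock access and no locks, only a marker store. */
static void touch_watchdog(void)
{
	touch_timestamp = 0;
}

/* Periodic tick, run from a safe context. Returns 1 if a lockup would
 * be reported, 0 otherwise. */
static int watchdog_tick(uint64_t threshold_ns)
{
	uint64_t now;

	if (touch_timestamp == 0) {
		/* A touch was deferred: do the clock read now. */
		touch_watchdog_now();
		return 0;
	}

	now = read_clock();
	return now - touch_timestamp > threshold_ns;
}

/* Small self-check showing the intended behaviour of the sketch. */
static void watchdog_selfcheck(void)
{
	unsigned int reads_before = clock_reads;

	fake_clock_ns = 1000;
	touch_watchdog();                    /* e.g. from NMI context */
	assert(clock_reads == reads_before); /* the touch read no clock */

	assert(watchdog_tick(500) == 0);     /* deferred refresh happens here */
	assert(touch_timestamp == 1000);

	fake_clock_ns = 1600;                /* 600 ns with no touch */
	assert(watchdog_tick(500) == 1);
}
```

    In this model the touch itself is a single store, so it cannot block
    on a lock held by a stalled processor; the clock read moves to the
    next tick.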
    
    The test case for this problem is to use the following kernel config
    options:
    
    CONFIG_KGDB_TESTS=y
    CONFIG_KGDB_TESTS_ON_BOOT=y
    CONFIG_KGDB_TESTS_BOOT_STRING="V1F100I100000"
    
    It should be noted that the kgdb test suite and these options were
    not available until 2.6.26-rc2, so it was necessary to patch the kgdb
    test suite during the bisection.
    
    I would consider this patch a regression fix because the problem first
    appeared in commit 27ec4407 when some
    logic was added to try to periodically sync the clocks.  It was
    possible to work around this particular problem by simply not
    performing the sync anytime the system was in a critical context.
    This was ok until commit 3e51f33f,
    which added config option CONFIG_HAVE_UNSTABLE_SCHED_CLOCK and some
    multi-cpu locks to sync the clocks.  It became clear that accessing
    this code from an NMI was the source of the lockups.  Avoiding the
    access to the low level clock code from code inside the NMI
    processing also fixed the problem with the 27ec44... commit.
    
    Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
softlockup.c 7.8 KB