    timer/migration: Fix quick check reporting late expiry · 8ca18367
    Frederic Weisbecker authored
    When a CPU is the last active one in the hierarchy and it tries to enter
    idle, the quick check that looks up the next event for the cpuidle
    heuristics may report an expiry that is too late, as in the following
    scenario:
    
                            [GRP1:0]
                         migrator = NONE
                         active   = NONE
                         nextevt  = T0:0, T0:1
                         /              \
              [GRP0:0]                  [GRP0:1]
           migrator = NONE           migrator = NONE
           active   = NONE           active   = NONE
           nextevt  = T0, T1         nextevt  = T2
           /         \                /         \
          0           1              2           3
        idle       idle           idle         idle
    
    0) The whole system is idle, and CPU 0 was the last migrator. CPU 0 has
    a timer (T0), CPU 1 has a timer (T1) and CPU 2 has a timer (T2). The
    expiry order is T0 < T1 < T2.
    
                            [GRP1:0]
                         migrator = GRP0:0
                         active   = GRP0:0
                         nextevt  = T0:0(i), T0:1
                         /              \
              [GRP0:0]                  [GRP0:1]
           migrator = CPU0           migrator = NONE
           active   = CPU0           active   = NONE
           nextevt  = T0(i), T1      nextevt  = T2
           /         \                /         \
          0           1              2           3
        active       idle           idle         idle
    
    1) CPU 0 becomes active. The (i) means a now ignored timer.
    
                            [GRP1:0]
                         migrator = GRP0:0
                         active   = GRP0:0
                         nextevt  = T0:1
                         /              \
              [GRP0:0]                  [GRP0:1]
           migrator = CPU0           migrator = NONE
           active   = CPU0           active   = NONE
           nextevt  = T1             nextevt  = T2
           /         \                /         \
          0           1              2           3
        active       idle           idle         idle
    
    2) CPU 0 handles remote timers. No timer actually expired, but ignored
       timers have been cleaned out and their siblings' timers haven't been
       propagated. As a result, the top level's next event is T2 and not T1.
    
    3) CPU 0 tries to enter idle without any global timer enqueued and calls
       tmigr_quick_check(). The expiry of T2 is returned instead of the
       expiry of T1.
    
    When the quick check returns an expiry that is too late, the cpuidle
    governor may pick a C-state that is too deep. This may result in
    undesired CPU wakeup latency if the next timer is actually close enough.
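
To illustrate why a late expiry matters, here is a toy model of idle state
selection. It is only a sketch: the state table, its numbers and the
pick_state() helper are hypothetical, and real cpuidle governors weigh more
inputs than the predicted sleep length. The model assumes states are sorted
from shallowest to deepest and picks the deepest one whose target residency
fits the predicted sleep length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical idle state description, not the kernel's structure. */
struct idle_state {
	const char *name;
	uint64_t target_residency_ns;	/* minimum sleep that pays off */
};

/* Made-up table, sorted from shallowest to deepest. */
static const struct idle_state toy_states[3] = {
	{ "shallow", 1000 },
	{ "medium",  100000 },
	{ "deep",    1000000 },
};

/*
 * Return the index of the deepest state whose target residency does not
 * exceed the predicted sleep length. Index 0 is the fallback.
 */
static size_t pick_state(const struct idle_state *states, size_t n,
			 uint64_t predicted_sleep_ns)
{
	size_t best = 0;

	assert(states != NULL && n > 0);
	for (size_t i = 1; i < n; i++) {
		if (states[i].target_residency_ns <= predicted_sleep_ns)
			best = i;
	}
	return best;
}
```

In this model, a predicted sleep of 5000000 ns (standing in for the expiry of
T2) selects the "deep" state, while 50000 ns (standing in for the expiry of
T1) selects the "shallow" one. Reporting T2 when T1 is the real next event
therefore leads to a deeper state than the actual sleep length justifies.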
    
    Fix this by assuming that expiries aren't sorted top-down while
    performing the quick check. Instead, pick the earliest one encountered
    while walking up the hierarchy.
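
The difference between the two approaches can be sketched with a simplified
model. This is not the kernel code: struct group, EXPIRY_NONE and both helper
names are hypothetical, and locking, the migrator state and the checks that
the CPU is the last active one are left out. Each group only caches its next
expiry and points to its parent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for "no event queued". */
#define EXPIRY_NONE UINT64_MAX

/* Hypothetical, simplified group: only what the walk needs. */
struct group {
	struct group *parent;	/* NULL at the top level */
	uint64_t next_expiry;	/* cached first event of this group */
};

/* Model of the old behaviour: trust only the top-level value. */
static uint64_t quick_check_top_only(const struct group *leaf)
{
	assert(leaf != NULL);
	while (leaf->parent)
		leaf = leaf->parent;
	return leaf->next_expiry;
}

/* Model of the fix: keep the earliest expiry seen on the way up. */
static uint64_t quick_check_earliest(const struct group *leaf)
{
	uint64_t earliest = EXPIRY_NONE;

	assert(leaf != NULL);
	for (const struct group *g = leaf; g; g = g->parent) {
		if (g->next_expiry < earliest)
			earliest = g->next_expiry;
	}
	return earliest;
}
```

Applied to step 2 of the scenario, with GRP0:0 caching T1 and GRP1:0 caching
T2, the top-only walk starting from GRP0:0 returns T2 while the
earliest-expiry walk returns T1. The second walk does not depend on the
upper levels having been refreshed after ignored timers were cleaned out.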
    
    Fixes: 7ee98877 ("timers: Implement the hierarchical pull model")
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/r/20240305002822.18130-1-frederic@kernel.org
timer_migration.c 55.5 KB