    timers/nohz: Add a comment about broken iowait counter update race · ead70b75
    Frederic Weisbecker authored
    The per-cpu iowait task counter is incremented locally upon sleeping.
    But since the task can be woken to (and by) another CPU, the counter may
    then be decremented remotely. This is the source of a race between
    readers and writers of the idle/iowait sleeptime.
    
    The following scenario shows an example where a /proc/stat reader
    observes a pending sleep time as IO, whereas that same pending sleep
    time is later accounted as non-IO.
    
        CPU 0               CPU 1                             CPU 2
        -----               -----                             -----
        // io_schedule() TASK A
        current->in_iowait = 1
        rq(0)->nr_iowait++
        // switch to idle
                            // READ /proc/stat
                            // See nr_iowait_cpu(0) == 1
                            return ts->iowait_sleeptime +
                                   ktime_sub(ktime_get(), ts->idle_entrytime)

                                                              // try_to_wake_up(TASK A)
                                                              rq(0)->nr_iowait--
        // idle exit
        // See nr_iowait_cpu(0) == 0
        ts->idle_sleeptime += ktime_sub(ktime_get(), ts->idle_entrytime)
    
    As a result, subsequent reads of /proc/stat may show the iowait time
    going backward.
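
    The interleaving can be replayed deterministically in userspace. The
    sketch below is only an illustration, not kernel code: struct cpu_state,
    fake_now, read_iowait(), idle_exit() and replay_race() are hypothetical
    stand-ins for rq(0)->nr_iowait, ktime_get() and the tick-sched
    accounting, and the timestamps are arbitrary.

```c
/* Hypothetical, simplified model of one CPU's idle accounting state. */
struct cpu_state {
    int nr_iowait;              /* models rq(cpu)->nr_iowait */
    int idle_active;            /* 1 while the CPU sits in idle */
    long long idle_entrytime;   /* models ts->idle_entrytime */
    long long idle_sleeptime;   /* models ts->idle_sleeptime */
    long long iowait_sleeptime; /* models ts->iowait_sleeptime */
};

static long long fake_now;      /* stand-in for ktime_get() */

/* Reader side: add the pending sleep time if the CPU looks iowait-idle. */
static long long read_iowait(const struct cpu_state *c)
{
    long long t = c->iowait_sleeptime;

    if (c->idle_active && c->nr_iowait > 0)
        t += fake_now - c->idle_entrytime;
    return t;
}

/* Writer side: on idle exit, account the whole sleep to one bucket. */
static void idle_exit(struct cpu_state *c)
{
    long long delta = fake_now - c->idle_entrytime;

    if (c->nr_iowait > 0)
        c->iowait_sleeptime += delta;
    else
        c->idle_sleeptime += delta;
    c->idle_active = 0;
}

/* Replay the scenario and report what two successive readers observe. */
static void replay_race(long long *first, long long *second,
                        long long *idle_total)
{
    struct cpu_state cpu0 = {0};

    fake_now = 100;             /* CPU 0: io_schedule(), switch to idle */
    cpu0.nr_iowait++;
    cpu0.idle_active = 1;
    cpu0.idle_entrytime = fake_now;

    fake_now = 150;             /* CPU 1: reads while nr_iowait == 1 */
    *first = read_iowait(&cpu0);

    fake_now = 180;             /* CPU 2: try_to_wake_up(), remote decrement */
    cpu0.nr_iowait--;

    fake_now = 200;             /* CPU 0: idle exit sees nr_iowait == 0 */
    idle_exit(&cpu0);

    *second = read_iowait(&cpu0);
    *idle_total = cpu0.idle_sleeptime;
}
```

    With these timestamps the first read reports 50 units of iowait, the
    second reports 0, and the full 100 units end up in the idle bucket.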
    
    Unfortunately, this is hard to fix. Just add a comment documenting the
    condition.
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20230222144649.624380-5-frederic@kernel.org
tick-sched.c 39.9 KB