    sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting · 13d541b0
    Matt Fleming authored
    commit 6e5f32f7 upstream.
    
    If we crossed a sample window while in NO_HZ we will add LOAD_FREQ to
    the pending sample window time on exit, setting the next update not
    one window into the future, but two.
    
    This situation on exiting NO_HZ is described by:
    
      this_rq->calc_load_update < jiffies < calc_load_update
    
    In this scenario, what we should be doing is:
    
      this_rq->calc_load_update = calc_load_update		     [ next window ]
    
    But what we actually do is:
    
      this_rq->calc_load_update = calc_load_update + LOAD_FREQ   [ next+1 window ]
    
    This has the effect of delaying load average updates for potentially
    up to ~9 seconds.
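
The difference between the two assignments above can be sketched in a small user-space model. This is only an illustration, not the kernel code or the patch itself: the function names `exit_idle_old()` and `exit_idle_new()`, the helper `before()` (a stand-in for the kernel's `time_before()`), the assumed `HZ` of 1000, and the 10-tick grace window are all assumptions made for the example. Jiffies and the two `calc_load_update` values are passed in as plain integers.

```c
#include <assert.h>

/* Assumed for illustration: HZ = 1000, with LOAD_FREQ = 5*HZ+1 ticks. */
#define HZ        1000UL
#define LOAD_FREQ (5 * HZ + 1)

/* Wrap-safe "a is before b"; hypothetical stand-in for time_before(). */
int before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Model of the behaviour described as buggy: the per-rq window is
 * compared against jiffies first, then synced to the global window
 * and pushed forward by LOAD_FREQ.
 */
unsigned long exit_idle_old(unsigned long now, unsigned long rq_update,
			    unsigned long global_update)
{
	if (before(now, rq_update))
		return rq_update;	/* still before our pending window */

	rq_update = global_update;
	if (before(now, rq_update + 10))
		rq_update += LOAD_FREQ;	/* lands on the next+1 window */
	return rq_update;
}

/*
 * Model of the intended behaviour: sync to the global window first,
 * and only skip a window if we woke inside the current one.
 */
unsigned long exit_idle_new(unsigned long now, unsigned long rq_update,
			    unsigned long global_update)
{
	(void)rq_update;		/* the stale per-rq value is discarded */
	rq_update = global_update;
	if (before(now, rq_update))
		return rq_update;	/* next window, as desired */

	if (before(now, rq_update + 10))
		rq_update += LOAD_FREQ;
	return rq_update;
}

/* Scenario from the text: rq_update < jiffies < global calc_load_update. */
void self_check(void)
{
	assert(exit_idle_old(12000UL, 10000UL, 15001UL) == 15001UL + LOAD_FREQ);
	assert(exit_idle_new(12000UL, 10000UL, 15001UL) == 15001UL);
}
```

Calling `self_check()` exercises the scenario with illustrative tick values: the old ordering schedules the next update one full LOAD_FREQ too late, while the new ordering picks the next window. Both orderings agree when the CPU wakes inside the sample window itself.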
    
    This can result in huge spikes in the load average values due to
    per-cpu uninterruptible task counts being out of sync when accumulated
    across all CPUs.
    
    It's safe to update the per-cpu active count if we wake between sample
    windows because any load that we left in 'calc_load_idle' will have
    been zero'd when the idle load was folded in calc_global_load().
    
    This issue is easy to reproduce before,
    
      commit 9d89c257 ("sched/fair: Rewrite runnable load and utilization average tracking")
    
    just by forking short-lived process pipelines built from ps(1) and
    grep(1) in a loop. I'm unable to reproduce the spikes after that
    commit, but the bug still seems to be present from code review.

    Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
    Cc: Morten Rasmussen <morten.rasmussen@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vincent Guittot <vincent.guittot@linaro.org>
    Fixes: commit 5167e8d5 ("sched/nohz: Rewrite and fix load-avg computation -- again")
    Link: http://lkml.kernel.org/r/20170217120731.11868-2-matt@codeblueprint.co.uk
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    [bwh: Backported to 3.2: adjust filename]
    Signed-off-by: Ben Hutchings <ben@decadent.org.uk>