    sched/fair: Add over-utilization/tipping point indicator · 2802bf3c
    Morten Rasmussen authored
    Energy-aware scheduling is only meant to be active while the system is
    _not_ over-utilized. That is, there are spare cycles available to shift
    tasks around based on their actual utilization to get a more
    energy-efficient task distribution without depriving any tasks. When
    above the tipping point, task placement is done the traditional way based
    on load_avg, spreading the tasks across as many cpus as possible based
    on priority-scaled load to preserve smp_nice. Below the tipping point we
    want to use util_avg instead. We need to define a criterion for when we
    make the switch.
    
    The util_avg for each cpu converges towards 100% regardless of how many
    additional tasks we may put on it. If we define over-utilized as:
    
    sum_{cpus}(rq.cfs.avg.util_avg) + margin > sum_{cpus}(rq.capacity)
    
    some individual cpus may be over-utilized running multiple tasks even
    when the above condition is false. That should be okay as long as we try
    to spread the tasks out to avoid per-cpu over-utilization as much as
    possible and if all tasks have the _same_ priority. If the latter isn't
    true, we have to consider priority to preserve smp_nice.
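
    As a minimal user-space sketch of the system-wide definition above (not
    kernel code): struct cpu_stat, system_overutilized() and the 0..1024
    scale used in the test values are assumptions made for illustration; only
    the comparison itself comes from the formula in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cpu snapshot; field names only mirror the commit text. */
struct cpu_stat {
	unsigned long util_avg;	/* stands in for rq.cfs.avg.util_avg */
	unsigned long capacity;	/* stands in for rq.capacity */
};

/* sum_{cpus}(util_avg) + margin > sum_{cpus}(capacity) */
static bool system_overutilized(const struct cpu_stat *cpus, size_t n,
				unsigned long margin)
{
	unsigned long util = 0, cap = 0;

	assert(cpus != NULL && n > 0);
	for (size_t i = 0; i < n; i++) {
		util += cpus[i].util_avg;
		cap += cpus[i].capacity;
	}
	return util + margin > cap;
}
```

    With two cpus of capacity 1024, one at util_avg 1000 and one at 100, this
    check stays false for a margin of 128 even though the first cpu is nearly
    saturated, which is the per-cpu blind spot described above.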
    
    For example, we could have n_cpus nice=-10 util_avg=55% tasks and
    n_cpus/2 nice=0 util_avg=60% tasks. Balancing based on util_avg we are
    likely to end up with nice=-10 tasks sharing cpus and nice=0 tasks
    getting their own as we have 1.5*n_cpus tasks in total and 55%+55% is less
    over-utilized than 55%+60% for those cpus that have to be shared. The
    system utilization is only 85% of the system capacity, but we are
    breaking smp_nice.
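
    The arithmetic in this example can be checked with a small user-space
    sketch. The helper names below are hypothetical and percentages are kept
    as plain integers; the inputs (55%, 60%, n_cpus and n_cpus/2 tasks) are
    the ones given above.

```c
#include <assert.h>

/*
 * System utilization, as a percentage of total capacity, for n_cpus tasks
 * at util_nice_m10 percent plus n_cpus/2 tasks at util_nice_0 percent.
 * Every cpu is assumed to contribute 100% capacity.
 */
static unsigned int example_system_util_pct(unsigned int n_cpus,
					     unsigned int util_nice_m10,
					     unsigned int util_nice_0)
{
	unsigned int total;

	assert(n_cpus > 0 && n_cpus % 2 == 0);
	total = n_cpus * util_nice_m10 + (n_cpus / 2) * util_nice_0;
	return total / n_cpus;
}

/* Demand, in percent of one cpu, when two tasks have to share that cpu. */
static unsigned int shared_cpu_demand_pct(unsigned int util_a,
					   unsigned int util_b)
{
	return util_a + util_b;
}
```

    For any even n_cpus the first helper yields 85 with the values above,
    while pairing two nice=-10 tasks demands 110% of a cpu against 115% for
    a mixed pair, so a pure util_avg balance prefers to pair the
    higher-priority tasks.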
    
    To be sure not to break smp_nice, we have defined over-utilization
    conservatively as when any cpu in the system is fully utilized at its
    highest frequency instead:
    
    cpu_rq(any).cfs.avg.util_avg + margin > cpu_rq(any).capacity
    
    IOW, as soon as one cpu is (nearly) 100% utilized, we switch to load_avg
    to factor in priority to preserve smp_nice.
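
    A minimal user-space sketch of this per-cpu tipping point, assuming an
    additive margin as written in the formula above. struct cpu_stat,
    any_cpu_overutilized(), pick_signal() and the enum are hypothetical names
    for illustration and are not taken from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cpu snapshot; field names only mirror the commit text. */
struct cpu_stat {
	unsigned long util_avg;	/* stands in for cpu_rq(cpu).cfs.avg.util_avg */
	unsigned long capacity;	/* stands in for cpu_rq(cpu).capacity */
};

enum placement_signal { PLACE_BY_UTIL_AVG, PLACE_BY_LOAD_AVG };

/* True as soon as any single cpu satisfies util_avg + margin > capacity. */
static bool any_cpu_overutilized(const struct cpu_stat *cpus, size_t n,
				 unsigned long margin)
{
	assert(cpus != NULL && n > 0);
	for (size_t i = 0; i < n; i++) {
		if (cpus[i].util_avg + margin > cpus[i].capacity)
			return true;
	}
	return false;
}

/* Above the tipping point fall back to load_avg, below it use util_avg. */
static enum placement_signal pick_signal(const struct cpu_stat *cpus,
					 size_t n, unsigned long margin)
{
	return any_cpu_overutilized(cpus, n, margin) ? PLACE_BY_LOAD_AVG
						     : PLACE_BY_UTIL_AVG;
}
```

    Because the comparison is made against each cpu's own capacity, a cpu
    with reduced capacity trips the indicator at a lower util_avg than a
    full-capacity one.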
    
    With this definition, we can skip periodic load-balance as no cpu has an
    always-running task when the system is not over-utilized. All tasks will
    be periodic and we can balance them at wake-up. This conservative
    condition does however mean that some scenarios that could benefit from
    energy-aware decisions even if one cpu is fully utilized would not get
    those benefits.
    
    For systems where some cpus might have reduced capacity (RT-pressure
    and/or big.LITTLE), we want periodic load-balance checks as soon as just
    a single cpu is fully utilized, as it might be one of those with reduced
    capacity, and in that case we want to migrate it.
    
    [ peterz: Added a comment explaining why new tasks are not accounted during
              overutilization detection. ]
    Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
    Signed-off-by: Quentin Perret <quentin.perret@arm.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: adharmap@codeaurora.org
    Cc: chris.redpath@arm.com
    Cc: currojerez@riseup.net
    Cc: dietmar.eggemann@arm.com
    Cc: edubezval@gmail.com
    Cc: gregkh@linuxfoundation.org
    Cc: javi.merino@kernel.org
    Cc: joel@joelfernandes.org
    Cc: juri.lelli@redhat.com
    Cc: patrick.bellasi@arm.com
    Cc: pkondeti@codeaurora.org
    Cc: rjw@rjwysocki.net
    Cc: skannan@codeaurora.org
    Cc: smuckle@google.com
    Cc: srinivas.pandruvada@linux.intel.com
    Cc: thara.gopinath@linaro.org
    Cc: tkjos@google.com
    Cc: valentin.schneider@arm.com
    Cc: vincent.guittot@linaro.org
    Cc: viresh.kumar@linaro.org
    Link: https://lkml.kernel.org/r/20181203095628.11858-13-quentin.perret@arm.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>