    sched/fair: Improve spreading of utilization · c32b4308
    Vincent Guittot authored
    During load balancing, a group with spare capacity will try to pull some
    utilization from an overloaded group. In such a case, the load balancer
    looks for the runqueue with the highest utilization. However, it should
    also ensure that there are pending tasks to pull; otherwise the load
    balance will fail to pull a task and the spreading of the load will be
    delayed.
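
    The selection rule described above can be sketched in a simplified,
    user-space form. This is only an illustration under stated assumptions,
    not the kernel code: struct rq_model and pick_busiest_by_util() are
    hypothetical names, and the model keeps just a task count and a
    utilization value per runqueue.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a per-CPU runqueue. */
struct rq_model {
	unsigned int nr_running;	/* runnable tasks on this CPU */
	unsigned long util;		/* utilization of this CPU */
};

/*
 * Return the index of the runqueue with the highest utilization among
 * those that have more than one runnable task, or -1 if none qualifies.
 */
int pick_busiest_by_util(const struct rq_model *rqs, size_t n)
{
	int busiest = -1;
	unsigned long busiest_util = 0;

	for (size_t i = 0; i < n; i++) {
		/* A CPU running a single task has nothing pending to detach. */
		if (rqs[i].nr_running <= 1)
			continue;

		if (busiest < 0 || rqs[i].util > busiest_util) {
			busiest_util = rqs[i].util;
			busiest = (int)i;
		}
	}

	return busiest;
}
```

    In this model, a CPU with utilization 900 and a single task is skipped
    in favour of a CPU with utilization 600 and three tasks, because only
    the latter has a task that can actually be pulled.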
    
    This situation is quite transient, but its effect can be highlighted
    with a short sysbench run, where the time needed to spread the tasks
    significantly impacts the overall result.
    
    Below are the average results for 15 iterations on an arm64 octo core:
    sysbench --test=cpu --num-threads=8  --max-requests=1000 run
    
                               tip/sched/core  +patchset
    total time:                172ms           158ms
    per-request statistics:
             avg:                1.337ms         1.244ms
             max:               21.191ms        10.753ms
    
    The average of the max values doesn't fully reflect their wide spread:
    they range from 1.350ms to more than 41ms for tip/sched/core and from
    1.350ms to 21ms with the patch.
    
    Other factors, like waiting for an idle load balance or cache hotness,
    can delay the spreading of the tasks, which explains why we can still
    have up to 21ms with the patch.
    
    Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20200312165429.990-1-vincent.guittot@linaro.org