    sched/fair: Rework load_balance() · 0b0695f2
    Vincent Guittot authored
    The load_balance() algorithm contains some heuristics which have become
    meaningless since the scheduler's metrics were reworked, notably with the
    introduction of PELT.
    
    Furthermore, load is an ill-suited metric for solving certain task
    placement imbalance scenarios.
    
    For instance, in the presence of idle CPUs, we should simply try to get at
    least one task per CPU, whereas the current load-based algorithm can leave
    CPUs idle simply because the load happens to be somewhat balanced.
    
    The current algorithm ends up creating artificial and meaningless values,
    such as avg_load_per_task, or tweaking the state of a group to mark it
    overloaded even though it is not, just so that tasks can be migrated.
    
    load_balance() should better qualify the imbalance of the group and clearly
    define what has to be moved to fix this imbalance.
    
    The type of a sched_group has been extended to better reflect the type of
    imbalance. We now have the following group types (sketched as an enum right
    after this list):
    
    	group_has_spare
    	group_fully_busy
    	group_misfit_task
    	group_asym_packing
    	group_imbalanced
    	group_overloaded
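
    For reference, this classification can be written down as an enum roughly as
    follows; the values mirror the list above and the comments are paraphrased
    summaries of their meaning rather than the kernel's exact annotations (see
    kernel/sched/fair.c for the authoritative definition):

    	enum group_type {
    		group_has_spare,	/* the group has spare capacity for more tasks */
    		group_fully_busy,	/* no spare capacity, but CPUs are not overloaded */
    		group_misfit_task,	/* a task is too big for its CPU (asymmetric capacity) */
    		group_asym_packing,	/* a higher-priority CPU is available for packing */
    		group_imbalanced,	/* affinity constraints defeated a previous balance attempt */
    		group_overloaded,	/* more runnable tasks than available capacity */
    	};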
    
    Based on the type of the sched_group, load_balance() now sets what it wants
    to move in order to fix the imbalance. It can be some load, as before, but
    also some utilization, a number of tasks or a specific type of task (see the
    sketch after this list):
    
    	migrate_task
    	migrate_util
    	migrate_load
    	migrate_misfit
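
    As an illustration, the quantity to be migrated can be encoded roughly as
    follows; the comments summarize the intent and are not taken verbatim from
    the kernel sources:

    	enum migration_type {
    		migrate_task,		/* move a given number of tasks */
    		migrate_util,		/* move a given amount of utilization */
    		migrate_load,		/* move a given amount of load */
    		migrate_misfit		/* move the misfit task to a more capable CPU */
    	};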
    
    This new load_balance() algorithm fixes several pending wrong task
    placements:
    
     - the 1-task-per-CPU case on asymmetric systems
     - the case of CFS tasks preempted by tasks of another scheduling class
     - the case of tasks not evenly spread across groups with spare capacity
    
    The load-balance decisions have also been consolidated into the three
    functions below, after removing the few bypasses and hacks of the current
    code (a sketch of how they fit together follows the list):
    
     - update_sd_pick_busiest() selects the busiest sched_group.
     - find_busiest_group() checks whether there is an imbalance between the
       local and the busiest group.
     - calculate_imbalance() decides what has to be moved.
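
    To make the division of labour concrete, below is a small standalone sketch
    of how the three steps can fit together. The struct names and the simplified
    type-to-migration mapping are illustrative stand-ins derived from the
    description above, not the kernel's actual code:

    	#include <stdbool.h>

    	/* Simplified stand-ins for the kernel's per-group stats and balance env. */
    	/* Group types ordered from least to most pressing. */
    	enum group_type { group_has_spare, group_fully_busy, group_misfit_task,
    			  group_asym_packing, group_imbalanced, group_overloaded };
    	enum migration_type { migrate_task, migrate_util, migrate_load, migrate_misfit };

    	struct group_stats { enum group_type type; };
    	struct balance_env { enum migration_type migration_type; };

    	/* 1) update_sd_pick_busiest(): a group with a "worse" type is busier. */
    	static bool pick_busiest(const struct group_stats *sg,
    				 const struct group_stats *busiest)
    	{
    		return sg->type > busiest->type;
    	}

    	/* 3) calculate_imbalance(): derive what to move from the group types. */
    	static void calc_imbalance(const struct group_stats *local,
    				   const struct group_stats *busiest,
    				   struct balance_env *env)
    	{
    		if (busiest->type == group_misfit_task)
    			env->migration_type = migrate_misfit;	/* move the misfit task */
    		else if (local->type == group_has_spare &&
    			 busiest->type == group_overloaded)
    			env->migration_type = migrate_util;	/* pull utilization */
    		else if (local->type == group_has_spare)
    			env->migration_type = migrate_task;	/* even out task counts */
    		else
    			env->migration_type = migrate_load;	/* classic load balancing */
    	}

    	/* 2) find_busiest_group(): only balance when busiest is worse off than local. */
    	static bool imbalanced(const struct group_stats *local,
    			       const struct group_stats *busiest,
    			       struct balance_env *env)
    	{
    		if (local->type > busiest->type)
    			return false;		/* nothing useful to pull */
    		calc_imbalance(local, busiest, env);
    		return true;
    	}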
    
    Finally, the now unused field total_running of struct sd_lb_stats has been
    removed.
    Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
    Cc: Ben Segall <bsegall@google.com>
    Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
    Cc: Juri Lelli <juri.lelli@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Morten.Rasmussen@arm.com
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: hdanton@sina.com
    Cc: parth@linux.ibm.com
    Cc: pauld@redhat.com
    Cc: quentin.perret@arm.com
    Cc: riel@surriel.com
    Cc: srikar@linux.vnet.ibm.com
    Cc: valentin.schneider@arm.com
    Link: https://lkml.kernel.org/r/1571405198-27570-5-git-send-email-vincent.guittot@linaro.org
    [ Small readability and spelling updates. ]
    Signed-off-by: Ingo Molnar <mingo@kernel.org>