    sched/numa: Examine a task move when examining a task swap · 0132c3e1
    Rik van Riel authored
    
    
    Running "perf bench numa mem -0 -m -P 1000 -p 8 -t 20" on a 4
    node system results in 160 runnable threads on a system with 80
    CPU threads.
    
    Once a process has nearly converged, with 39 threads on one node
    and 1 thread on another node, the remaining thread will be unable
    to migrate to its preferred node through a task swap.
    
    However, a simple task move would make the workload converge,
    without causing an imbalance.
    
    Test for this unlikely occurrence, and attempt a task move to
    the preferred nid when it happens.
    
     # Running main, "perf bench numa mem -p 8 -t 20 -0 -m -P 1000"
    
     ###
     # 160 tasks will execute (on 4 nodes, 80 CPUs):
     #         -1x     0MB global  shared mem operations
     #         -1x  1000MB process shared mem operations
     #         -1x     0MB thread  local  mem operations
     ###
    
     ###
     #
     #    0.0%  [0.2 mins]  0/0   1/1  36/2   0/0  [36/3 ] l:  0-0   (  0) {0-2}
     #    0.0%  [0.3 mins] 43/3  37/2  39/2  41/3  [ 6/10] l:  0-1   (  1) {1-2}
     #    0.0%  [0.4 mins] 42/3  38/2  40/2  40/2  [ 4/9 ] l:  1-2   (  1) [50.0%] {1-2}
     #    0.0%  [0.6 mins] 41/3  39/2  40/2  40/2  [ 2/9 ] l:  2-4   (  2) [50.0%] {1-2}
     #    0.0%  [0.7 mins] 40/2  40/2  40/2  40/2  [ 0/8 ] l:  3-5   (  2) [40.0%] (  41.8s converged)
    
    Without this patch, this same perf bench numa mem run had to
    rely on the scheduler load balancer to first balance out the
    load (moving a random task), before a task swap could complete
    the NUMA convergence.
    
    The load balancer does not normally take action unless the load
    difference exceeds 25%. Convergence times of over half an hour
    have been observed without this patch.
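
    To illustrate why the load balancer stays out of the way here, the
    sketch below applies the 25% threshold quoted above to the 41 vs 39
    task counts from the benchmark output. The helper name and the use of
    raw task counts are assumptions made for illustration; the scheduler's
    real check works on weighted load, not on a simple thread count.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not a kernel function: returns true when the
 * busier node's load exceeds the other node's load by more than
 * threshold_pct percent. Integer math avoids floating point.
 */
static bool load_gap_exceeds(unsigned int busier, unsigned int idler,
                             unsigned int threshold_pct)
{
    /* busier / idler > (100 + threshold_pct) / 100, cross-multiplied */
    return busier * 100u > idler * (100u + threshold_pct);
}
```

    With these numbers, load_gap_exceeds(41, 39, 25) compares 4100 with
    4875 and returns false, so a two-task difference never triggers a
    rebalance on its own.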
    
    With this patch, the NUMA balancing code will simply migrate the
    task, if that does not cause an imbalance.
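
    The decision described above can be sketched roughly as follows. This
    is a simplified model, not the code in fair.c: the struct, the enum,
    both function names, and the use of plain task counts per node are all
    assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical per-node summary: runnable tasks and CPU count. */
struct node_load {
    unsigned int nr_running;
    unsigned int nr_cpus;
};

enum numa_action { NUMA_NONE, NUMA_SWAP, NUMA_MOVE };

/*
 * A move is acceptable when, after moving one task from src to dst,
 * dst is no more loaded per CPU than src. Cross-multiplied to stay
 * in integer arithmetic.
 */
static bool move_keeps_balance(struct node_load src, struct node_load dst)
{
    if (src.nr_running == 0)
        return false; /* nothing to move */
    return (dst.nr_running + 1) * src.nr_cpus <=
           (src.nr_running - 1) * dst.nr_cpus;
}

/*
 * Pick between a swap and a plain move. A candidate is ignored unless
 * its gain beats the best gain seen so far; the gain values stand in
 * for whatever NUMA placement score the caller computes.
 */
static enum numa_action pick_action(long swap_gain, long move_gain,
                                    struct node_load src,
                                    struct node_load dst)
{
    long best = 0;
    enum numa_action action = NUMA_NONE;

    if (swap_gain > best) {
        best = swap_gain;
        action = NUMA_SWAP;
    }
    if (move_gain > best && move_keeps_balance(src, dst))
        action = NUMA_MOVE;

    return action;
}
```

    In the 41 vs 39 case on 20-CPU nodes, the move leaves both nodes at 40
    tasks, so it is allowed. From a 40 vs 40 starting point the same move
    would create a 39 vs 41 split and is rejected.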
    
    Also skip examining a CPU in detail if the improvement on that CPU
    is no more than the best we already have.

    Signed-off-by: Rik van Riel <riel@redhat.com>
    Cc: chegu_vinod@hp.com
    Cc: mgorman@suse.de
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Peter Zijlstra <peterz@infradead.org>
    Link: http://lkml.kernel.org/n/tip-ggthh0rnh0yua6o5o3p6cr1o@git.kernel.org
    
    Signed-off-by: Ingo Molnar <mingo@kernel.org>