Commit 68d1b02a authored by Rik van Riel, committed by Ingo Molnar

sched/numa: Do not set preferred_node on migration to a second choice node

Setting the numa_preferred_node for a task in task_numa_migrate
does nothing on a 2-node system. Either we migrate to the node
that already was our preferred node, or we stay where we were.

On a 4-node system, it can slightly decrease overhead, by not
calling the NUMA code as much. Since every node tends to be
directly connected to every other node, running on the wrong
node for a while does not do much damage.

However, on an 8-node system, there are far more bad nodes
than there are good ones, and pretending that a second choice
is actually the preferred node can greatly delay, or even
prevent, a workload from converging.

The only time we can safely pretend that a second choice
node is the preferred node is when the task is part of a
workload that spans multiple NUMA nodes.

Signed-off-by: Rik van Riel <riel@redhat.com>
Tested-by: Vinod Chegu <chegu_vinod@hp.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1397235629-16328-4-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
parent 5085e2a3
@@ -1301,7 +1301,16 @@ static int task_numa_migrate(struct task_struct *p)
 	if (env.best_cpu == -1)
 		return -EAGAIN;
 
-	sched_setnuma(p, env.dst_nid);
+	/*
+	 * If the task is part of a workload that spans multiple NUMA nodes,
+	 * and is migrating into one of the workload's active nodes, remember
+	 * this node as the task's preferred numa node, so the workload can
+	 * settle down.
+	 * A task that migrated to a second choice node will be better off
+	 * trying for a better one later. Do not set the preferred node here.
+	 */
+	if (p->numa_group && node_isset(env.dst_nid, p->numa_group->active_nodes))
+		sched_setnuma(p, env.dst_nid);
 
 	/*
 	 * Reset the scan period if the task is being rescheduled on an
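
To make the new condition easier to reason about outside the kernel, here is a minimal userspace sketch of the decision the patch introduces. It is an illustration only, not kernel code: struct task_model, struct numa_group_model, node_is_active() and note_migration() are hypothetical stand-ins for task_struct, numa_group, node_isset() and the sched_setnuma() call site, and the active-node set is assumed to fit in a single unsigned long bitmask.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's numa_group; not the real layout. */
struct numa_group_model {
	unsigned long active_nodes;	/* bit n set => node n is active */
};

/* Hypothetical stand-in for the NUMA fields of task_struct. */
struct task_model {
	struct numa_group_model *numa_group;	/* NULL when not in a group */
	int preferred_nid;
};

/* Simplified analogue of node_isset() for a single-word node mask. */
static bool node_is_active(int nid, unsigned long mask)
{
	if (nid < 0 || (size_t)nid >= sizeof(mask) * CHAR_BIT)
		return false;
	return (mask >> nid) & 1UL;
}

/*
 * Models the patched call site: after a migration to dst_nid, record it
 * as the preferred node only if the task belongs to a group and dst_nid
 * is one of that group's active nodes. Otherwise the old preference is
 * kept, so the task can try for a better node later.
 */
static void note_migration(struct task_model *p, int dst_nid)
{
	if (p->numa_group && node_is_active(dst_nid, p->numa_group->active_nodes))
		p->preferred_nid = dst_nid;
}
```

With a group whose active nodes are 2 and 5, a grouped task migrating to node 5 has its preference updated, while the same task migrating to node 7, or an ungrouped task migrating anywhere, keeps its previous preferred node. Before the patch, all three cases would have overwritten the preference.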