Commit e3fe70b1 authored by Rik van Riel's avatar Rik van Riel Committed by Ingo Molnar

sched/numa: Classify the NUMA topology of a system

Smaller NUMA systems tend to have all NUMA nodes directly connected
to each other. This includes the degenerate case of a system with just
one node, ie. a non-NUMA system.

Larger systems can have two kinds of NUMA topology, which affects how
tasks and memory should be placed on the system.

On glueless mesh systems, nodes that are not directly connected to
each other will bounce traffic through intermediary nodes. Task groups
can be run closer to each other by moving tasks from a node to an
intermediary node between it and the task's preferred node.

On NUMA systems with backplane controllers, the intermediary hops
are incapable of running programs. This creates "islands" of nodes
that are at an equal distance to anywhere else in the system.

Each kind of topology requires a slightly different placement
algorithm; this patch provides the mechanism to detect the kind
of NUMA topology of a system.
Signed-off-by: default avatarRik van Riel <riel@redhat.com>
Tested-by: default avatarChegu Vinod <chegu_vinod@hp.com>
[ Changed to use kernel/sched/sched.h ]
Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: mgorman@suse.de
Cc: chegu_vinod@hp.com
Link: http://lkml.kernel.org/r/1413530994-9732-3-git-send-email-riel@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
parent 9942f79b
......@@ -6128,6 +6128,7 @@ static void claim_allocations(int cpu, struct sched_domain *sd)
#ifdef CONFIG_NUMA
static int sched_domains_numa_levels;
enum numa_topology_type sched_numa_topology_type;
static int *sched_domains_numa_distance;
int sched_max_numa_distance;
static struct cpumask ***sched_domains_numa_masks;
......@@ -6316,6 +6317,56 @@ bool find_numa_distance(int distance)
return false;
}
/*
* A system can have three types of NUMA topology:
* NUMA_DIRECT: all nodes are directly connected, or not a NUMA system
* NUMA_GLUELESS_MESH: some nodes reachable through intermediary nodes
* NUMA_BACKPLANE: nodes can reach other nodes through a backplane
*
* The difference between a glueless mesh topology and a backplane
* topology lies in whether communication between not directly
* connected nodes goes through intermediary nodes (where programs
* could run), or through backplane controllers. This affects
* placement of programs.
*
* The type of topology can be discerned with the following tests:
* - If the maximum distance between any nodes is 1 hop, the system
* is directly connected.
* - If for two nodes A and B, located N > 1 hops away from each other,
* there is an intermediary node C, which is < N hops away from both
* nodes A and B, the system is a glueless mesh.
*/
static void init_numa_topology_type(void)
{
int a, b, c, n;
n = sched_max_numa_distance;
if (n <= 1)
sched_numa_topology_type = NUMA_DIRECT;
for_each_online_node(a) {
for_each_online_node(b) {
/* Find two nodes furthest removed from each other. */
if (node_distance(a, b) < n)
continue;
/* Is there an intermediary node between a and b? */
for_each_online_node(c) {
if (node_distance(a, c) < n &&
node_distance(b, c) < n) {
sched_numa_topology_type =
NUMA_GLUELESS_MESH;
return;
}
}
sched_numa_topology_type = NUMA_BACKPLANE;
return;
}
}
}
static void sched_init_numa(void)
{
int next_distance, curr_distance = node_distance(0, 0);
......@@ -6449,6 +6500,8 @@ static void sched_init_numa(void)
sched_domains_numa_levels = level;
sched_max_numa_distance = sched_domains_numa_distance[level - 1];
init_numa_topology_type();
}
static void sched_domains_numa_masks_set(int cpu)
......
......@@ -679,6 +679,12 @@ static inline u64 rq_clock_task(struct rq *rq)
}
#ifdef CONFIG_NUMA
enum numa_topology_type {
NUMA_DIRECT,
NUMA_GLUELESS_MESH,
NUMA_BACKPLANE,
};
extern enum numa_topology_type sched_numa_topology_type;
extern int sched_max_numa_distance;
extern bool find_numa_distance(int distance);
#endif
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment