    sched/topology: Assert non-NUMA topology masks don't (partially) overlap · ccf74128
    Valentin Schneider authored
    topology.c::get_group() relies on the assumption that non-NUMA domains do
    not partially overlap. Zeng Tao pointed out in [1] that such topology
    descriptions, while completely bogus, can end up being exposed to the
    scheduler.
    
    In his example (8 CPUs, 2-node system), we end up with:
      MC span for CPU3 == 3-7
      MC span for CPU4 == 4-7
    
    The first pass through get_group(3, sdd@MC) will result in the following
    sched_group list:
    
      3 -> 4 -> 5 -> 6 -> 7
      ^                  /
       `----------------'
    
    And a later pass through get_group(4, sdd@MC) will "corrupt" that to:
    
      3 -> 4 -> 5 -> 6 -> 7
           ^             /
            `-----------'
    
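    which will completely break loops such as 'while (sg != sd->groups)' when
    walking CPU3's base sched_domain: CPU3's group is no longer part of the
    ring, so such a loop never gets back to its starting point.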
    
    There are already some architecture-specific checks in place, such as
    x86/kernel/smpboot.c::topology_sane(), but this is something we can detect
    in the core scheduler, so it seems worthwhile to do so.
    
    Warn and abort the construction of the sched domains if such a broken
    topology description is detected. Note that this check is somewhat
    expensive (O(t*c²), with 't' the number of non-NUMA topology levels and
    'c' the number of CPUs) and could be gated under SCHED_DEBUG if deemed
    necessary.
    
    Testing
    =======
    
    Dietmar managed to reproduce this using the following qemu incantation:
    
      $ qemu-system-aarch64 -kernel ./Image -hda ./qemu-image-aarch64.img \
      -append 'root=/dev/vda console=ttyAMA0 loglevel=8 sched_debug' -smp \
      cores=8 --nographic -m 512 -cpu cortex-a53 -machine virt -numa \
      node,cpus=0-2,nodeid=0 -numa node,cpus=3-7,nodeid=1
    
    alongside the following drivers/base/arch_topology.c hack (AIUI it wouldn't
    be needed if '-smp cores=X, sockets=Y' worked with qemu):
    
    8<---
    @@ -465,6 +465,9 @@ void update_siblings_masks(unsigned int cpuid)
     		if (cpuid_topo->package_id != cpu_topo->package_id)
     			continue;
    
    +		if ((cpu < 4 && cpuid > 3) || (cpu > 3 && cpuid < 4))
    +			continue;
    +
     		cpumask_set_cpu(cpuid, &cpu_topo->core_sibling);
     		cpumask_set_cpu(cpu, &cpuid_topo->core_sibling);
    
    8<---
    
    [1]: https://lkml.kernel.org/r/1577088979-8545-1-git-send-email-prime.zeng@hisilicon.com

    Reported-by: Zeng Tao <prime.zeng@hisilicon.com>
    Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20200115160915.22575-1-valentin.schneider@arm.com