    sched/topology: Don't try to build empty sched domains · cd1cb335
    Valentin Schneider authored
    Turns out hotplugging CPUs that are in exclusive cpusets can lead to the
    cpuset code feeding empty cpumasks to the sched domain rebuild machinery.
    
    This leads to the following splat:
    
        Internal error: Oops: 96000004 [#1] PREEMPT SMP
        Modules linked in:
        CPU: 0 PID: 235 Comm: kworker/5:2 Not tainted 5.4.0-rc1-00005-g8d495477 #23
        Hardware name: ARM Juno development board (r0) (DT)
        Workqueue: events cpuset_hotplug_workfn
        pstate: 60000005 (nZCv daif -PAN -UAO)
        pc : build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
        lr : build_sched_domains (kernel/sched/topology.c:1966)
        Call trace:
        build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
        partition_sched_domains_locked (kernel/sched/topology.c:2250)
        rebuild_sched_domains_locked (./include/linux/bitmap.h:370 ./include/linux/cpumask.h:538 kernel/cgroup/cpuset.c:955 kernel/cgroup/cpuset.c:978 kernel/cgroup/cpuset.c:1019)
        rebuild_sched_domains (kernel/cgroup/cpuset.c:1032)
        cpuset_hotplug_workfn (kernel/cgroup/cpuset.c:3205 (discriminator 2))
        process_one_work (./arch/arm64/include/asm/jump_label.h:21 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:114 kernel/workqueue.c:2274)
        worker_thread (./include/linux/compiler.h:199 ./include/linux/list.h:268 kernel/workqueue.c:2416)
        kthread (kernel/kthread.c:255)
        ret_from_fork (arch/arm64/kernel/entry.S:1167)
        Code: f860dae2 912802d6 aa1603e1 12800000 (f8616853)
    
    The faulty line in question is:
    
      cap = arch_scale_cpu_capacity(cpumask_first(cpu_map));
    
    and we're not checking the return value of cpumask_first() against
    nr_cpu_ids (we shouldn't have to!). With an empty cpu_map that value is
    not a valid CPU number, which leads to the above.
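
    To illustrate the failure mode, here is a minimal userspace sketch of a
    "first set bit" lookup on an empty mask. It is a model, not kernel code:
    NR_CPU_IDS, model_cpumask_first() and model_cpu_index_valid() are
    hypothetical names, and the mask is assumed to fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for nr_cpu_ids in this sketch. Six is an
 * assumption chosen to match the CPU range 0-5 used in the reproducer.
 */
#define NR_CPU_IDS 6u

/*
 * Model of a cpumask_first()-style lookup: returns the index of the
 * lowest set bit, or NR_CPU_IDS when no bit is set.
 */
static unsigned int model_cpumask_first(uint32_t mask)
{
    for (unsigned int cpu = 0; cpu < NR_CPU_IDS; cpu++)
        if (mask & (1u << cpu))
            return cpu;
    return NR_CPU_IDS;
}

/* Returns 1 if cpu may be used as a per-CPU index, 0 otherwise. */
static int model_cpu_index_valid(unsigned int cpu)
{
    return cpu < NR_CPU_IDS;
}
```

    In this model, a mask of 0x30 (CPUs 4-5) yields 4, while an empty mask
    yields NR_CPU_IDS, for which model_cpu_index_valid() returns 0. Using
    that value to index per-CPU data is what the faulty line ends up doing.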
    
    Prevent generate_sched_domains() from returning empty cpumasks, and add
    an assertion in build_sched_domains() to scream bloody murder if it
    happens again.
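
    The shape of that two-part guard can be sketched in userspace as follows.
    This is only a model under stated assumptions, not the actual change:
    model_collect_domains() and model_build_domain() are hypothetical names,
    and each candidate domain is represented as a plain 32-bit mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical producer-side filter: copies only non-empty masks from
 * in[] to out[] and returns how many were kept.
 */
static size_t model_collect_domains(const uint32_t *in, size_t n,
                                    uint32_t *out)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++)
        if (in[i] != 0)
            out[kept++] = in[i];
    return kept;
}

/*
 * Hypothetical consumer-side check: refuses an empty map instead of
 * going on to use it. Returns 0 on success, -1 on an empty map.
 */
static int model_build_domain(uint32_t cpu_map)
{
    if (cpu_map == 0)
        return -1;
    return 0;
}
```

    With the reproducer's layout after CPUs 4 and 5 go offline, the
    candidates would be { 0x0f, 0x00 }: the filter keeps only 0x0f, and the
    consumer-side check still rejects 0x00 if one ever slips through.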
    
    The above splat was obtained on my Juno r0 with the following reproducer:
    
      $ cgcreate -g cpuset:asym
      $ cgset -r cpuset.cpus=0-3 asym
      $ cgset -r cpuset.mems=0 asym
      $ cgset -r cpuset.cpu_exclusive=1 asym
    
      $ cgcreate -g cpuset:smp
      $ cgset -r cpuset.cpus=4-5 smp
      $ cgset -r cpuset.mems=0 smp
      $ cgset -r cpuset.cpu_exclusive=1 smp
    
      $ cgset -r cpuset.sched_load_balance=0 .
    
      $ echo 0 > /sys/devices/system/cpu/cpu4/online
      $ echo 0 > /sys/devices/system/cpu/cpu5/online
    
    Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Dietmar.Eggemann@arm.com
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: hannes@cmpxchg.org
    Cc: lizefan@huawei.com
    Cc: morten.rasmussen@arm.com
    Cc: qperret@google.com
    Cc: tj@kernel.org
    Cc: vincent.guittot@linaro.org
    Fixes: 05484e09 ("sched/topology: Add SD_ASYM_CPUCAPACITY flag detection")
    Link: https://lkml.kernel.org/r/20191023153745.19515-2-valentin.schneider@arm.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>