    sched/topology: fix the issue groups don't span domain->span for NUMA diameter > 2 · 585b6d27
    Barry Song authored
    As long as the NUMA diameter is greater than 2, building a sched_domain from
    the siblings' child domains will create a sched_domain with a sched_group
    that spans outside the sched_domain:
    
                   +------+         +------+        +-------+       +------+
                   | node |  12     |node  | 20     | node  |  12   |node  |
                   |  0   +---------+1     +--------+ 2     +-------+3     |
                   +------+         +------+        +-------+       +------+
    
    domain0        node0            node1            node2          node3
    
    domain1        node0+1          node0+1          node2+3        node2+3
                                                     +
    domain2        node0+1+2                         |
                 group: node0+1                      |
                   group:node2+3 <-------------------+
    
    When node2 is added to the domain2 of node0, the kernel uses the child
    domain of node2's domain2, which is domain1 (node2+3). Node3 is outside
    the span of the domain containing node0+1+2.
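To make the failure concrete, here is a minimal user-space sketch, not kernel code. It assumes node-granular bitmasks (bit i = node i) and a per-node span table hand-derived from the qemu distance matrix in the test section below; `node_span`, `old_group_span` and `NR_LEVELS` are hypothetical names. Following the pre-patch rule, the group for a remote node is the span of that node's child domain, one level below.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES  4
#define NR_LEVELS 5

/*
 * Hypothetical table: node_span[n][k] is the set of nodes within the k-th
 * distinct NUMA distance (10, 12, 20, 22, 24) of node n, as a bitmask.
 * Hand-derived from the qemu distances used in the test section.
 */
static const unsigned node_span[NR_NODES][NR_LEVELS] = {
	{ 0x1, 0x3, 0x7, 0xF, 0xF },	/* node0 */
	{ 0x2, 0x3, 0x3, 0x7, 0xF },	/* node1 */
	{ 0x4, 0xC, 0xD, 0xF, 0xF },	/* node2 */
	{ 0x8, 0xC, 0xC, 0xD, 0xF },	/* node3 */
};

/* True when every node in @a is also in @b. */
static bool is_subset(unsigned a, unsigned b)
{
	return (a & ~b) == 0;
}

/* Pre-patch rule (sketch): the group for @sibling is its child domain's span. */
static unsigned old_group_span(int sibling, int level)
{
	assert(level > 0 && level < NR_LEVELS);
	return node_span[sibling][level - 1];
}
```

For node0's domain2 (node0+1+2, mask 0x7), `old_group_span(2, 2)` yields 0xC (node2+3), which is not a subset of 0x7: node3 leaks into the group, as in the diagram.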
    
    This makes load_balance() run on skewed avg_load and group_type values
    for a sched_group that spans outside the sched_domain, and it also lets
    select_task_rq_fair() pick an idle CPU outside the sched_domain.
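The avg_load distortion can be illustrated with the capacity-normalized form avg_load = group_load * SCHED_CAPACITY_SCALE / group_capacity. This sketch assumes, as a simplification, that load is only summed over CPUs inside the domain being balanced while the capacity is that of the whole group. The per-CPU loads are hypothetical; the two capacities are copied from the CPU0 domain-2 lines of the before/after logs below, so they come from different boots.

```c
#include <assert.h>

#define SCHED_CAPACITY_SCALE 1024UL

/* Capacity-normalized load of a group (shape of the formula only). */
static unsigned long avg_load(unsigned long group_load,
			      unsigned long group_capacity)
{
	assert(group_capacity > 0);
	return group_load * SCHED_CAPACITY_SCALE / group_capacity;
}

/* Hypothetical: CPUs 4 and 5 each carry a load of 512, and only they lie
 * inside CPU0's domain-2 (span=0-5). */
static const unsigned long in_domain_load = 512 + 512;

/* Group capacities from the logs: span=4-7 (pre-patch), span=4-5 (post-patch). */
static const unsigned long cap_4_7 = 3937;
static const unsigned long cap_4_5 = 1949;
```

Dividing the same in-domain load by the larger capacity roughly halves the apparent avg_load (266 vs 538 here), so the remote group may look less busy than it is.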
    
    Real servers which suffer from this problem include Kunpeng920 and 8-node
    Sun Fire X4600-M2, at least.
    
    Here we move to using the *child* domain of the *child* domain of node2's
    domain2 as the newly added sched_group. At the same time, we re-use the
    lower level sgc directly.

                   +------+         +------+        +-------+       +------+
                   | node |  12     |node  | 20     | node  |  12   |node  |
                   |  0   +---------+1     +--------+ 2     +-------+3     |
                   +------+         +------+        +-------+       +------+
    
    domain0        node0            node1          +- node2          node3
                                                   |
    domain1        node0+1          node0+1        | node2+3        node2+3
                                                   |
    domain2        node0+1+2                       |
                 group: node0+1                    |
                   group:node2 <-------------------+
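The grandchild rule can be sketched in the same hypothetical bitmask model: starting from the sibling's child, keep stepping down while the span still leaks out of the domain being built. This is a simplification under the same assumptions (hand-derived `node_span` table, hypothetical names); the kernel code also re-uses the lower-level sgc, which this model ignores.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES  4
#define NR_LEVELS 5

/* Hand-derived node spans for the qemu test topology
 * (levels: distance <= 10, 12, 20, 22, 24; bit i = node i). */
static const unsigned node_span[NR_NODES][NR_LEVELS] = {
	{ 0x1, 0x3, 0x7, 0xF, 0xF },	/* node0 */
	{ 0x2, 0x3, 0x3, 0x7, 0xF },	/* node1 */
	{ 0x4, 0xC, 0xD, 0xF, 0xF },	/* node2 */
	{ 0x8, 0xC, 0xC, 0xD, 0xF },	/* node3 */
};

static bool is_subset(unsigned a, unsigned b)
{
	return (a & ~b) == 0;
}

/*
 * Patched rule (sketch): begin at the sibling's child level and descend to
 * the grandchild, great-grandchild, ... until the span fits in @sd_span.
 * Level 0 is the sibling node itself, which is always inside @sd_span.
 */
static unsigned new_group_span(int sibling, int level, unsigned sd_span)
{
	int k = level - 1;

	assert(level > 0 && level < NR_LEVELS);
	while (k > 0 && !is_subset(node_span[sibling][k], sd_span))
		k--;
	return node_span[sibling][k];
}
```

In this model `new_group_span(2, 2, 0x7)` returns 0x4 (node2, CPUs 4-5) and `new_group_span(0, 2, 0xD)` returns 0x1 (node0, CPUs 0-1), matching the domain-2 groups of CPU0 and CPU4 in the post-patch log.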
    
    While the lower level sgc is re-used, this patch only changes the remote
    sched_groups of those sched_domains playing the grandchild trick. Therefore,
    sgc->next_update is still safe, since it is only touched by CPUs that have
    the group span as their local group. sgc->imbalance is also safe, because
    sd_parent remains the same in load_balance() and LB only tries other CPUs
    from the local group.
    Moreover, since local groups are not touched, they still get roughly equal
    size within a topology level (TL). And should_we_balance() only matters for
    local groups, so the pull probability of those groups is still roughly
    equal.
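The claim that only remote groups change can be checked in the hypothetical bitmask model: compare the pre-patch and patched group spans for every domain, split by local (the node's own group) and remote groups. As a simplification, every node in a domain's span is visited, without the kernel's skipping of already-covered CPUs.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES  4
#define NR_LEVELS 5

/* Hand-derived node spans for the qemu test topology
 * (levels: distance <= 10, 12, 20, 22, 24; bit i = node i). */
static const unsigned node_span[NR_NODES][NR_LEVELS] = {
	{ 0x1, 0x3, 0x7, 0xF, 0xF },
	{ 0x2, 0x3, 0x3, 0x7, 0xF },
	{ 0x4, 0xC, 0xD, 0xF, 0xF },
	{ 0x8, 0xC, 0xC, 0xD, 0xF },
};

static bool is_subset(unsigned a, unsigned b)
{
	return (a & ~b) == 0;
}

/* Pre-patch rule: the sibling's child span. */
static unsigned old_group(int sib, int level)
{
	return node_span[sib][level - 1];
}

/* Patched rule: descend until the span fits inside @sd_span. */
static unsigned new_group(int sib, int level, unsigned sd_span)
{
	int k = level - 1;

	while (k > 0 && !is_subset(node_span[sib][k], sd_span))
		k--;
	return node_span[sib][k];
}

/* Count groups whose span differs between the rules; @local selects the
 * node's own group (true) or the groups built for other nodes (false). */
static int count_changed(bool local)
{
	int changed = 0;

	for (int n = 0; n < NR_NODES; n++) {
		for (int k = 1; k < NR_LEVELS; k++) {
			unsigned sd = node_span[n][k];

			for (int m = 0; m < NR_NODES; m++) {
				if (!(sd & (1u << m)) || (m == n) != local)
					continue;
				if (old_group(m, k) != new_group(m, k, sd))
					changed++;
			}
		}
	}
	return changed;
}
```

Because a node's child span is always inside its own domain's span, the local group never needs to descend, so only remote groups differ.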
    
    Tested with the following topology:
    qemu-system-aarch64  -M virt -nographic \
     -smp cpus=8 \
     -numa node,cpus=0-1,nodeid=0 \
     -numa node,cpus=2-3,nodeid=1 \
     -numa node,cpus=4-5,nodeid=2 \
     -numa node,cpus=6-7,nodeid=3 \
     -numa dist,src=0,dst=1,val=12 \
     -numa dist,src=0,dst=2,val=20 \
     -numa dist,src=0,dst=3,val=22 \
     -numa dist,src=1,dst=2,val=22 \
     -numa dist,src=2,dst=3,val=12 \
     -numa dist,src=1,dst=3,val=24 \
     -m 4G -cpu cortex-a57 -kernel arch/arm64/boot/Image
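For reference, the domain spans in the logs below follow from this distance matrix. The sketch assumes qemu fills the diagonal with the local distance 10 and treats the listed distances as symmetric; each NUMA level groups the nodes within one distinct distance, and levels whose span repeats the one below (e.g. node1 at distance 20) are expected to be collapsed, which leaves three NUMA levels per CPU. `nodes_within` and `node_mask_to_cpus` are hypothetical helpers.

```c
#include <assert.h>

#define NR_NODES 4

/* Distances from the qemu command line (assumed symmetric, diagonal 10). */
static const int dist[NR_NODES][NR_NODES] = {
	{ 10, 12, 20, 22 },
	{ 12, 10, 22, 24 },
	{ 20, 22, 10, 12 },
	{ 22, 24, 12, 10 },
};

/* Nodes within @max_dist of @node, as a bitmask (bit i = node i). */
static unsigned nodes_within(int node, int max_dist)
{
	unsigned mask = 0;

	assert(node >= 0 && node < NR_NODES);
	for (int i = 0; i < NR_NODES; i++)
		if (dist[node][i] <= max_dist)
			mask |= 1u << i;
	return mask;
}

/* Each node owns two CPUs (node n has CPUs 2n and 2n+1). */
static unsigned node_mask_to_cpus(unsigned nodes)
{
	unsigned cpus = 0;

	for (int i = 0; i < NR_NODES; i++)
		if (nodes & (1u << i))
			cpus |= 3u << (2 * i);
	return cpus;
}
```

For example, node0 at distance 20 gives CPUs 0-5 (CPU0 domain-2), and node2 at distance 20 gives CPUs 0-1,4-7 (CPU4 domain-2).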
    
    w/o patch, we get lots of "groups don't span domain->span":
    [    0.802139] CPU0 attaching sched-domain(s):
    [    0.802193]  domain-0: span=0-1 level=MC
    [    0.802443]   groups: 0:{ span=0 cap=1013 }, 1:{ span=1 cap=979 }
    [    0.802693]   domain-1: span=0-3 level=NUMA
    [    0.802731]    groups: 0:{ span=0-1 cap=1992 }, 2:{ span=2-3 cap=1943 }
    [    0.802811]    domain-2: span=0-5 level=NUMA
    [    0.802829]     groups: 0:{ span=0-3 cap=3935 }, 4:{ span=4-7 cap=3937 }
    [    0.802881] ERROR: groups don't span domain->span
    [    0.803058]     domain-3: span=0-7 level=NUMA
    [    0.803080]      groups: 0:{ span=0-5 mask=0-1 cap=5843 }, 6:{ span=4-7 mask=6-7 cap=4077 }
    [    0.804055] CPU1 attaching sched-domain(s):
    [    0.804072]  domain-0: span=0-1 level=MC
    [    0.804096]   groups: 1:{ span=1 cap=979 }, 0:{ span=0 cap=1013 }
    [    0.804152]   domain-1: span=0-3 level=NUMA
    [    0.804170]    groups: 0:{ span=0-1 cap=1992 }, 2:{ span=2-3 cap=1943 }
    [    0.804219]    domain-2: span=0-5 level=NUMA
    [    0.804236]     groups: 0:{ span=0-3 cap=3935 }, 4:{ span=4-7 cap=3937 }
    [    0.804302] ERROR: groups don't span domain->span
    [    0.804520]     domain-3: span=0-7 level=NUMA
    [    0.804546]      groups: 0:{ span=0-5 mask=0-1 cap=5843 }, 6:{ span=4-7 mask=6-7 cap=4077 }
    [    0.804677] CPU2 attaching sched-domain(s):
    [    0.804687]  domain-0: span=2-3 level=MC
    [    0.804705]   groups: 2:{ span=2 cap=934 }, 3:{ span=3 cap=1009 }
    [    0.804754]   domain-1: span=0-3 level=NUMA
    [    0.804772]    groups: 2:{ span=2-3 cap=1943 }, 0:{ span=0-1 cap=1992 }
    [    0.804820]    domain-2: span=0-5 level=NUMA
    [    0.804836]     groups: 2:{ span=0-3 mask=2-3 cap=3991 }, 4:{ span=0-1,4-7 mask=4-5 cap=5985 }
    [    0.804944] ERROR: groups don't span domain->span
    [    0.805108]     domain-3: span=0-7 level=NUMA
    [    0.805134]      groups: 2:{ span=0-5 mask=2-3 cap=5899 }, 6:{ span=0-1,4-7 mask=6-7 cap=6125 }
    [    0.805223] CPU3 attaching sched-domain(s):
    [    0.805232]  domain-0: span=2-3 level=MC
    [    0.805249]   groups: 3:{ span=3 cap=1009 }, 2:{ span=2 cap=934 }
    [    0.805319]   domain-1: span=0-3 level=NUMA
    [    0.805336]    groups: 2:{ span=2-3 cap=1943 }, 0:{ span=0-1 cap=1992 }
    [    0.805383]    domain-2: span=0-5 level=NUMA
    [    0.805399]     groups: 2:{ span=0-3 mask=2-3 cap=3991 }, 4:{ span=0-1,4-7 mask=4-5 cap=5985 }
    [    0.805458] ERROR: groups don't span domain->span
    [    0.805605]     domain-3: span=0-7 level=NUMA
    [    0.805626]      groups: 2:{ span=0-5 mask=2-3 cap=5899 }, 6:{ span=0-1,4-7 mask=6-7 cap=6125 }
    [    0.805712] CPU4 attaching sched-domain(s):
    [    0.805721]  domain-0: span=4-5 level=MC
    [    0.805738]   groups: 4:{ span=4 cap=984 }, 5:{ span=5 cap=924 }
    [    0.805787]   domain-1: span=4-7 level=NUMA
    [    0.805803]    groups: 4:{ span=4-5 cap=1908 }, 6:{ span=6-7 cap=2029 }
    [    0.805851]    domain-2: span=0-1,4-7 level=NUMA
    [    0.805867]     groups: 4:{ span=4-7 cap=3937 }, 0:{ span=0-3 cap=3935 }
    [    0.805915] ERROR: groups don't span domain->span
    [    0.806108]     domain-3: span=0-7 level=NUMA
    [    0.806130]      groups: 4:{ span=0-1,4-7 mask=4-5 cap=5985 }, 2:{ span=0-3 mask=2-3 cap=3991 }
    [    0.806214] CPU5 attaching sched-domain(s):
    [    0.806222]  domain-0: span=4-5 level=MC
    [    0.806240]   groups: 5:{ span=5 cap=924 }, 4:{ span=4 cap=984 }
    [    0.806841]   domain-1: span=4-7 level=NUMA
    [    0.806866]    groups: 4:{ span=4-5 cap=1908 }, 6:{ span=6-7 cap=2029 }
    [    0.806934]    domain-2: span=0-1,4-7 level=NUMA
    [    0.806953]     groups: 4:{ span=4-7 cap=3937 }, 0:{ span=0-3 cap=3935 }
    [    0.807004] ERROR: groups don't span domain->span
    [    0.807312]     domain-3: span=0-7 level=NUMA
    [    0.807386]      groups: 4:{ span=0-1,4-7 mask=4-5 cap=5985 }, 2:{ span=0-3 mask=2-3 cap=3991 }
    [    0.807686] CPU6 attaching sched-domain(s):
    [    0.807710]  domain-0: span=6-7 level=MC
    [    0.807750]   groups: 6:{ span=6 cap=1017 }, 7:{ span=7 cap=1012 }
    [    0.807840]   domain-1: span=4-7 level=NUMA
    [    0.807870]    groups: 6:{ span=6-7 cap=2029 }, 4:{ span=4-5 cap=1908 }
    [    0.807952]    domain-2: span=0-1,4-7 level=NUMA
    [    0.807985]     groups: 6:{ span=4-7 mask=6-7 cap=4077 }, 0:{ span=0-5 mask=0-1 cap=5843 }
    [    0.808045] ERROR: groups don't span domain->span
    [    0.808257]     domain-3: span=0-7 level=NUMA
    [    0.808571]      groups: 6:{ span=0-1,4-7 mask=6-7 cap=6125 }, 2:{ span=0-5 mask=2-3 cap=5899 }
    [    0.808848] CPU7 attaching sched-domain(s):
    [    0.808860]  domain-0: span=6-7 level=MC
    [    0.808880]   groups: 7:{ span=7 cap=1012 }, 6:{ span=6 cap=1017 }
    [    0.808953]   domain-1: span=4-7 level=NUMA
    [    0.808974]    groups: 6:{ span=6-7 cap=2029 }, 4:{ span=4-5 cap=1908 }
    [    0.809034]    domain-2: span=0-1,4-7 level=NUMA
    [    0.809055]     groups: 6:{ span=4-7 mask=6-7 cap=4077 }, 0:{ span=0-5 mask=0-1 cap=5843 }
    [    0.809128] ERROR: groups don't span domain->span
    [    0.810361]     domain-3: span=0-7 level=NUMA
    [    0.810400]      groups: 6:{ span=0-1,4-7 mask=6-7 cap=5961 }, 2:{ span=0-5 mask=2-3 cap=5903 }
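The error line reflects a consistency check: the union of a domain's group spans should equal the domain's span. A minimal sketch of that check over 32-bit CPU masks, modeled on (not copied from) the scheduler's debug output; `cpu_range` and `groups_span_domain` are hypothetical helpers. With CPU0's domain-2 from the log above (span=0-5, groups 0-3 and 4-7) the union is 0-7, so the check fails; with the post-patch groups 0-3 and 4-5 it passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* CPU bitmask with bits lo..hi set (inclusive). */
static unsigned cpu_range(int lo, int hi)
{
	assert(lo >= 0 && lo <= hi && hi < 31);
	return ((1u << (hi + 1)) - 1) & ~((1u << lo) - 1);
}

/* True when the union of the group spans equals the domain span. */
static bool groups_span_domain(unsigned domain_span,
			       const unsigned *group_spans, size_t nr_groups)
{
	unsigned groupmask = 0;

	for (size_t i = 0; i < nr_groups; i++)
		groupmask |= group_spans[i];
	return groupmask == domain_span;
}
```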
    
    w/ patch, we don't get "groups don't span domain->span" any more:
    [    1.486271] CPU0 attaching sched-domain(s):
    [    1.486820]  domain-0: span=0-1 level=MC
    [    1.500924]   groups: 0:{ span=0 cap=980 }, 1:{ span=1 cap=994 }
    [    1.515717]   domain-1: span=0-3 level=NUMA
    [    1.515903]    groups: 0:{ span=0-1 cap=1974 }, 2:{ span=2-3 cap=1989 }
    [    1.516989]    domain-2: span=0-5 level=NUMA
    [    1.517124]     groups: 0:{ span=0-3 cap=3963 }, 4:{ span=4-5 cap=1949 }
    [    1.517369]     domain-3: span=0-7 level=NUMA
    [    1.517423]      groups: 0:{ span=0-5 mask=0-1 cap=5912 }, 6:{ span=4-7 mask=6-7 cap=4054 }
    [    1.520027] CPU1 attaching sched-domain(s):
    [    1.520097]  domain-0: span=0-1 level=MC
    [    1.520184]   groups: 1:{ span=1 cap=994 }, 0:{ span=0 cap=980 }
    [    1.520429]   domain-1: span=0-3 level=NUMA
    [    1.520487]    groups: 0:{ span=0-1 cap=1974 }, 2:{ span=2-3 cap=1989 }
    [    1.520687]    domain-2: span=0-5 level=NUMA
    [    1.520744]     groups: 0:{ span=0-3 cap=3963 }, 4:{ span=4-5 cap=1949 }
    [    1.520948]     domain-3: span=0-7 level=NUMA
    [    1.521038]      groups: 0:{ span=0-5 mask=0-1 cap=5912 }, 6:{ span=4-7 mask=6-7 cap=4054 }
    [    1.522068] CPU2 attaching sched-domain(s):
    [    1.522348]  domain-0: span=2-3 level=MC
    [    1.522606]   groups: 2:{ span=2 cap=1003 }, 3:{ span=3 cap=986 }
    [    1.522832]   domain-1: span=0-3 level=NUMA
    [    1.522885]    groups: 2:{ span=2-3 cap=1989 }, 0:{ span=0-1 cap=1974 }
    [    1.523043]    domain-2: span=0-5 level=NUMA
    [    1.523092]     groups: 2:{ span=0-3 mask=2-3 cap=4037 }, 4:{ span=4-5 cap=1949 }
    [    1.523302]     domain-3: span=0-7 level=NUMA
    [    1.523352]      groups: 2:{ span=0-5 mask=2-3 cap=5986 }, 6:{ span=0-1,4-7 mask=6-7 cap=6102 }
    [    1.523748] CPU3 attaching sched-domain(s):
    [    1.523774]  domain-0: span=2-3 level=MC
    [    1.523825]   groups: 3:{ span=3 cap=986 }, 2:{ span=2 cap=1003 }
    [    1.524009]   domain-1: span=0-3 level=NUMA
    [    1.524086]    groups: 2:{ span=2-3 cap=1989 }, 0:{ span=0-1 cap=1974 }
    [    1.524281]    domain-2: span=0-5 level=NUMA
    [    1.524331]     groups: 2:{ span=0-3 mask=2-3 cap=4037 }, 4:{ span=4-5 cap=1949 }
    [    1.524534]     domain-3: span=0-7 level=NUMA
    [    1.524586]      groups: 2:{ span=0-5 mask=2-3 cap=5986 }, 6:{ span=0-1,4-7 mask=6-7 cap=6102 }
    [    1.524847] CPU4 attaching sched-domain(s):
    [    1.524873]  domain-0: span=4-5 level=MC
    [    1.524954]   groups: 4:{ span=4 cap=958 }, 5:{ span=5 cap=991 }
    [    1.525105]   domain-1: span=4-7 level=NUMA
    [    1.525153]    groups: 4:{ span=4-5 cap=1949 }, 6:{ span=6-7 cap=2006 }
    [    1.525368]    domain-2: span=0-1,4-7 level=NUMA
    [    1.525428]     groups: 4:{ span=4-7 cap=3955 }, 0:{ span=0-1 cap=1974 }
    [    1.532726]     domain-3: span=0-7 level=NUMA
    [    1.532811]      groups: 4:{ span=0-1,4-7 mask=4-5 cap=6003 }, 2:{ span=0-3 mask=2-3 cap=4037 }
    [    1.534125] CPU5 attaching sched-domain(s):
    [    1.534159]  domain-0: span=4-5 level=MC
    [    1.534303]   groups: 5:{ span=5 cap=991 }, 4:{ span=4 cap=958 }
    [    1.534490]   domain-1: span=4-7 level=NUMA
    [    1.534572]    groups: 4:{ span=4-5 cap=1949 }, 6:{ span=6-7 cap=2006 }
    [    1.534734]    domain-2: span=0-1,4-7 level=NUMA
    [    1.534783]     groups: 4:{ span=4-7 cap=3955 }, 0:{ span=0-1 cap=1974 }
    [    1.536057]     domain-3: span=0-7 level=NUMA
    [    1.536430]      groups: 4:{ span=0-1,4-7 mask=4-5 cap=6003 }, 2:{ span=0-3 mask=2-3 cap=3896 }
    [    1.536815] CPU6 attaching sched-domain(s):
    [    1.536846]  domain-0: span=6-7 level=MC
    [    1.536934]   groups: 6:{ span=6 cap=1005 }, 7:{ span=7 cap=1001 }
    [    1.537144]   domain-1: span=4-7 level=NUMA
    [    1.537262]    groups: 6:{ span=6-7 cap=2006 }, 4:{ span=4-5 cap=1949 }
    [    1.537553]    domain-2: span=0-1,4-7 level=NUMA
    [    1.537613]     groups: 6:{ span=4-7 mask=6-7 cap=4054 }, 0:{ span=0-1 cap=1805 }
    [    1.537872]     domain-3: span=0-7 level=NUMA
    [    1.537998]      groups: 6:{ span=0-1,4-7 mask=6-7 cap=6102 }, 2:{ span=0-5 mask=2-3 cap=5845 }
    [    1.538448] CPU7 attaching sched-domain(s):
    [    1.538505]  domain-0: span=6-7 level=MC
    [    1.538586]   groups: 7:{ span=7 cap=1001 }, 6:{ span=6 cap=1005 }
    [    1.538746]   domain-1: span=4-7 level=NUMA
    [    1.538798]    groups: 6:{ span=6-7 cap=2006 }, 4:{ span=4-5 cap=1949 }
    [    1.539048]    domain-2: span=0-1,4-7 level=NUMA
    [    1.539111]     groups: 6:{ span=4-7 mask=6-7 cap=4054 }, 0:{ span=0-1 cap=1805 }
    [    1.539571]     domain-3: span=0-7 level=NUMA
    [    1.539610]      groups: 6:{ span=0-1,4-7 mask=6-7 cap=6102 }, 2:{ span=0-5 mask=2-3 cap=5845 }
    Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
    Tested-by: Meelis Roos <mroos@linux.ee>
    Link: https://lkml.kernel.org/r/20210224030944.15232-1-song.bao.hua@hisilicon.com