• bob picco's avatar
    sparc64: find_node adjustment · b1f46334
    bob picco authored
    [ Upstream commit 3dee9df5 ]
    
    We have seen an issue with guest boot into LDOM that causes early boot failures
    because of no matching rules for node identitity of the memory. I analyzed this
    on my T4 and concluded there might not be a solution. I saw the issue in
    mainline too when booting into the control/primary domain - with guests
    configured.  Note, this could be a firmware bug on some older machines.
    
    I'll provide a full explanation of the issues below. Should we not find a
    matching BEST latency group for a real address (RA) then we will assume node 0.
    On the T4-2 here with the information provided I can't see an alternative.
    
    Technically the LDOM shown below should match the MBLOCK to the
    favorable latency group. However other factors must be considered too. Were
    the memory controllers configured "fine" grained interleave or "coarse"
    grain interleaved -  T4. Also should a "group" MD node be considered a NUMA
    node?
    
    There has to be at least one Machine Description (MD) "group" and hence one
    NUMA node. The group can have one or more latency groups (lg) - more than one
    memory controller. The current code chooses the smallest latency as the most
    favorable per group. The latency and lg information is in MLGROUP below.
    MBLOCK is the base and size of the RAs for the machine as fetched from OBP
    /memory "available" property. My machine has one MBLOCK but more would be
    possible - with holes?
    
    For a T4-2 the following information has been gathered:
    with LDOM guest
    MEMBLOCK configuration:
     memory size = 0x27f870000
     memory.cnt  = 0x3
     memory[0x0]    [0x00000020400000-0x0000029fc67fff], 0x27f868000 bytes
     memory[0x1]    [0x0000029fd8a000-0x0000029fd8bfff], 0x2000 bytes
     memory[0x2]    [0x0000029fd92000-0x0000029fd97fff], 0x6000 bytes
     reserved.cnt  = 0x2
     reserved[0x0]  [0x00000020800000-0x000000216c15c0], 0xec15c1 bytes
     reserved[0x1]  [0x00000024800000-0x0000002c180c1e], 0x7980c1f bytes
    MBLOCK[0]: base[20000000] size[280000000] offset[0]
    (note: "base" and "size" reported in "MBLOCK" encompass the "memory[X]" values)
    (note: (RA + offset) & mask = val is the formula to detect a match for the
    memory controller. should there be no match for find_node node, a return
    value of -1 resulted for the node - BAD)
    
    There is one group. It has these forward links
    MLGROUP[1]: node[545] latency[1f7e8] match[200000000] mask[200000000]
    MLGROUP[2]: node[54d] latency[2de60] match[0] mask[200000000]
    NUMA NODE[0]: node[545] mask[200000000] val[200000000] (latency[1f7e8])
    (note: "val" is the best lg's (smallest latency) "match")
    
    no LDOM guest - bare metal
    MEMBLOCK configuration:
     memory size = 0xfdf2d0000
     memory.cnt  = 0x3
     memory[0x0]    [0x00000020400000-0x00000fff6adfff], 0xfdf2ae000 bytes
     memory[0x1]    [0x00000fff6d2000-0x00000fff6e7fff], 0x16000 bytes
     memory[0x2]    [0x00000fff766000-0x00000fff771fff], 0xc000 bytes
     reserved.cnt  = 0x2
     reserved[0x0]  [0x00000020800000-0x00000021a04580], 0x1204581 bytes
     reserved[0x1]  [0x00000024800000-0x0000002c7d29fc], 0x7fd29fd bytes
    MBLOCK[0]: base[20000000] size[fe0000000] offset[0]
    
    there are two groups
    group node[16d5]
    MLGROUP[0]: node[1765] latency[1f7e8] match[0] mask[200000000]
    MLGROUP[3]: node[177d] latency[2de60] match[200000000] mask[200000000]
    NUMA NODE[0]: node[1765] mask[200000000] val[0] (latency[1f7e8])
    group node[171d]
    MLGROUP[2]: node[1775] latency[2de60] match[0] mask[200000000]
    MLGROUP[1]: node[176d] latency[1f7e8] match[200000000] mask[200000000]
    NUMA NODE[1]: node[176d] mask[200000000] val[200000000] (latency[1f7e8])
    (note: for this two "group" bare metal machine, 1/2 memory is in group one's
    lg and 1/2 memory is in group two's lg).
    
    Cc: sparclinux@vger.kernel.org
    Signed-off-by: default avatarBob Picco <bob.picco@oracle.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    b1f46334
init_64.c 67.1 KB