    x86,sched: Allow topologies where NUMA nodes share an LLC · 1340ccfa
    Alison Schofield authored
    Intel's Skylake Server CPUs have a different LLC topology than previous
    generations. When in Sub-NUMA-Clustering (SNC) mode, the package is divided
    into two "slices", each containing half the cores, half the LLC, and one
    memory controller, and each slice is enumerated to Linux as a NUMA
    node. This is similar to how the cores and LLC were arranged for the
    Cluster-On-Die (CoD) feature.
    
    CoD allowed the same cache line to be present in each half of the LLC.
    But with SNC, each line is only ever present in *one* slice. This means
    that the portion of the LLC *available* to a CPU depends on the data being
    accessed:
    
        Remote socket:              entire package LLC is shared
        Local socket->local slice:  data goes into local slice LLC
        Local socket->remote slice: data goes into remote-slice LLC.
                                    Slightly higher latency than local
                                    slice LLC.
    
    The biggest implication of this is that a process accessing only
    NUMA-local memory sees just half the LLC capacity.
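
    As a rough illustration of the three cases above, the C helper below
    models how much LLC capacity can hold the lines for a given kind of
    access. It is a minimal sketch under stated assumptions: the enum and
    function names are hypothetical, the package LLC is assumed to split
    into two equal slices, and latency is not modeled at all.

```c
/* Hypothetical model of an SNC-enabled package: two slices, each holding
 * half of the package LLC. Not kernel code. */
enum access_kind {
	ACCESS_REMOTE_SOCKET,   /* memory attached to another package */
	ACCESS_LOCAL_SLICE,     /* memory on this slice's NUMA node */
	ACCESS_REMOTE_SLICE,    /* memory on the other slice's NUMA node */
};

/* Return how many bytes of LLC can cache lines for this kind of access,
 * following the three cases listed in the commit message. */
unsigned long usable_llc_bytes(enum access_kind kind,
			       unsigned long package_llc_bytes)
{
	switch (kind) {
	case ACCESS_REMOTE_SOCKET:
		/* Remote-socket data may land in either slice. */
		return package_llc_bytes;
	case ACCESS_LOCAL_SLICE:
	case ACCESS_REMOTE_SLICE:
		/* Local-socket data only ever lives in one slice. */
		return package_llc_bytes / 2;
	}
	return 0;
}
```

    With a hypothetical 32 MiB package LLC, usable_llc_bytes() returns
    16 MiB for ACCESS_LOCAL_SLICE, which is the "half the LLC capacity"
    seen by a process that touches only NUMA-local memory.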
    
    The CPU describes its cache hierarchy with the CPUID instruction. One of
    the CPUID leaves enumerates the "logical processors sharing this
    cache". This information is used for scheduling decisions so that tasks
    move more freely between CPUs sharing the cache.
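
    On Intel CPUs the relevant leaf is CPUID leaf 4 (deterministic cache
    parameters): EAX bits 4:0 give the cache type, bits 7:5 the level, and
    bits 25:14 the maximum number of addressable IDs for logical processors
    sharing the cache, minus one. The sketch below decodes those fields and
    walks the subleaves to find the highest-level cache. It assumes GCC or
    Clang for <cpuid.h> and __get_cpuid_count(); the struct and function
    names are hypothetical. The sharing field is an upper bound on
    addressable IDs, not an exact count of online CPUs.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header (assumption: x86 toolchain) */
#endif

/* Decoded fields of CPUID leaf 4, EAX (hypothetical helper type). */
struct cache_leaf_info {
	unsigned int type;            /* 0 = no more caches, 1 = data, 2 = insn, 3 = unified */
	unsigned int level;           /* 1 = L1, 2 = L2, 3 = L3, ... */
	unsigned int max_sharing_ids; /* max addressable IDs of logical CPUs sharing it */
};

struct cache_leaf_info decode_leaf4_eax(uint32_t eax)
{
	struct cache_leaf_info info;

	info.type = eax & 0x1f;                           /* bits 4:0 */
	info.level = (eax >> 5) & 0x7;                    /* bits 7:5 */
	info.max_sharing_ids = ((eax >> 14) & 0xfff) + 1; /* bits 25:14, plus one */
	return info;
}

/* Return the sharing value of the highest-level cache, or 0 when leaf 4
 * is unavailable (non-x86 build, or a CPU that reports no leaf-4 caches). */
unsigned int llc_max_sharing_ids(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;
	unsigned int best_level = 0, best = 0;

	for (unsigned int sub = 0; sub < 16; sub++) {
		if (!__get_cpuid_count(4, sub, &eax, &ebx, &ecx, &edx))
			return 0;                 /* leaf 4 not supported */
		struct cache_leaf_info info = decode_leaf4_eax(eax);
		if (info.type == 0)
			break;                    /* end of cache list */
		if (info.level >= best_level) {
			best_level = info.level;
			best = info.max_sharing_ids;
		}
	}
	return best;
#else
	return 0;
#endif
}
```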
    
    But the CPUID for the SNC configuration discussed above enumerates the LLC
    as being shared by the entire package. This is not 100% precise, because
    not every access can use the entire cache. But it *is* the way the
    hardware enumerates itself, and this is not likely to change.
    
    The userspace-visible impact of all the above is that the sysfs info
    reports the entire LLC as being available to the entire package. As noted
    above, this is not true for local-socket accesses. This patch does not
    correct the sysfs info; it is the same before and after the patch.
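
    To check what userspace is shown, a program can read the cache
    description from sysfs. A minimal sketch, assuming the usual
    /sys/devices/system/cpu/cpuN/cache/indexM/ layout with "level" and
    "shared_cpu_list" files; because indexM is not guaranteed to be the L3,
    the helper checks "level". Both function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Count CPUs in a cpulist string such as "0-3,8-11" (the format used by
 * shared_cpu_list). Returns -1 on a malformed string. */
int count_cpu_list(const char *list)
{
	int count = 0;
	const char *p = list;

	while (*p && *p != '\n') {
		char *end;
		long lo = strtol(p, &end, 10);
		long hi = lo;

		if (end == p)
			return -1;                /* not a number */
		if (*end == '-') {                /* a range "lo-hi" */
			p = end + 1;
			hi = strtol(p, &end, 10);
			if (end == p || hi < lo)
				return -1;
		}
		count += (int)(hi - lo + 1);
		p = end;
		if (*p == ',')
			p++;                      /* next element */
	}
	return count;
}

/* Copy the level-3 cache's shared_cpu_list for `cpu` into buf.
 * Returns 0 on success, -1 if no level-3 entry could be read. */
int read_l3_shared_cpu_list(int cpu, char *buf, size_t len)
{
	for (int idx = 0; idx < 8; idx++) {
		char path[128];
		int level = 0;
		FILE *f;

		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/cpu%d/cache/index%d/level", cpu, idx);
		f = fopen(path, "r");
		if (!f)
			break;                    /* no more cache indices */
		if (fscanf(f, "%d", &level) != 1)
			level = 0;
		fclose(f);
		if (level != 3)
			continue;

		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/cpu%d/cache/index%d/shared_cpu_list",
			 cpu, idx);
		f = fopen(path, "r");
		if (!f)
			return -1;
		if (!fgets(buf, (int)len, f)) {
			fclose(f);
			return -1;
		}
		fclose(f);
		return 0;
	}
	return -1;
}
```

    Per the description above, on an SNC system count_cpu_list() applied to
    CPU 0's L3 list would be expected to cover every CPU in the package,
    not just the CPUs on CPU 0's slice.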
    
    The current code emits the following warning:
    
     sched: CPU #3's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
    
    The warning comes from the topology_sane() check in smpboot.c, which
    expects CPUs that share an LLC to be on the same NUMA node. The SNC
    topology violates that expectation by design.
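
    The rule behind that warning can be modeled in a few lines of
    userspace C. This is an illustration only: struct cpu_model and
    topology_sane_model() are hypothetical stand-ins, not the kernel's
    types or code, and the model just formats the message instead of
    emitting a kernel warning.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-CPU record for this model. */
struct cpu_model {
	int index;  /* logical CPU number */
	int node;   /* NUMA node reported by firmware */
};

/* CPUs reported as sharing a resource (named by `name`, e.g. "llc") are
 * expected to be on the same node. On a mismatch, write a warning in the
 * format quoted above into msg and return false ("ignore dependency"). */
bool topology_sane_model(const struct cpu_model *c, const struct cpu_model *o,
			 const char *name, char *msg, size_t len)
{
	if (c->node == o->node) {
		if (len)
			msg[0] = '\0';        /* sane: no message */
		return true;
	}
	snprintf(msg, len,
		 "sched: CPU #%d's %s-sibling CPU #%d is not on the same node! "
		 "[node: %d != %d]. Ignoring dependency.",
		 c->index, name, o->index, c->node, o->node);
	return false;
}
```

    Called with CPU 3 on node 1, CPU 0 on node 0 and name "llc", the model
    reproduces the message quoted above and returns false.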
    
    To fix this, add a vendor- and model-specific check so that
    topology_sane() is never called for these systems. Also, just as for
    Cluster-on-Die, disable the "coregroup" sched_domain_topology_level and
    use NUMA information from the SRAT alone.
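
    The decision described in the previous paragraph can be sketched as a
    userspace model. Everything here is hypothetical (struct cpu_desc,
    enum llc_match, match_llc_model()), and identifying Skylake server
    parts as Intel family 6, model 0x55 is an assumption of the sketch.
    The idea it shows: when two CPUs report the same LLC but sit on
    different nodes, an SNC-capable part is quietly treated as "not LLC
    siblings" instead of reaching the sanity check and its warning.

```c
#include <stdbool.h>

/* Hypothetical, simplified per-CPU record for this model. */
struct cpu_desc {
	int vendor_intel;  /* nonzero for Intel */
	int family;
	int model;
	int llc_id;        /* LLC identity as enumerated by CPUID */
	int node;          /* NUMA node from the SRAT */
};

/* Assumption: Skylake server is family 6, model 0x55. */
static bool is_snc_capable(const struct cpu_desc *c)
{
	return c->vendor_intel && c->family == 6 && c->model == 0x55;
}

enum llc_match {
	LLC_NO_MATCH,  /* not treated as LLC siblings, no warning */
	LLC_MATCH,     /* LLC siblings on the same node */
	LLC_INSANE,    /* node mismatch on other parts: would warn */
};

enum llc_match match_llc_model(const struct cpu_desc *c, const struct cpu_desc *o)
{
	if (c->llc_id != o->llc_id)
		return LLC_NO_MATCH;   /* different caches entirely */
	if (c->node == o->node)
		return LLC_MATCH;      /* the normal case */
	if (is_snc_capable(c))
		return LLC_NO_MATCH;   /* SNC: split at the node boundary quietly */
	return LLC_INSANE;             /* unexpected topology: sanity check fires */
}
```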
    
    This is OK, at least on the hardware we are immediately concerned about,
    because LLC sharing happens at both the slice and the package level, and
    both of those are also NUMA boundaries.

    Signed-off-by: Alison Schofield <alison.schofield@intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Borislav Petkov <bp@suse.de>
    Cc: Prarit Bhargava <prarit@redhat.com>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: brice.goglin@gmail.com
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Igor Mammedov <imammedo@redhat.com>
    Cc: "H. Peter Anvin" <hpa@linux.intel.com>
    Cc: Tim Chen <tim.c.chen@linux.intel.com>
    Link: https://lkml.kernel.org/r/20180407002130.GA18984@alison-desk.jf.intel.com