    x86/tsc: Use topology_max_packages() to get package number · b4bac279
    Feng Tang authored
    Commit b50db709 ("x86/tsc: Disable clocksource watchdog for TSC on
    qualified platorms") was introduced to solve the problem that the TSC
    clocksource is sometimes wrongly judged as unstable by a watchdog
    clocksource such as 'jiffies', HPET, etc.
    
    In it, the hardware package number is a key factor for judging whether to
    disable the watchdog for TSC. 'nr_online_nodes' was chosen because, at
    that time (kernel v5.1x), it was available in the early boot phase before
    the 'tsc-early' clocksource is registered, when none of the non-boot CPUs
    have been brought up yet.
    
    Dave and Rui pointed out that there are many cases in which
    'nr_online_nodes' is misleading and does not reflect the real package
    count, like:
    
     * SNC (Sub-NUMA Cluster) mode enabled
     * NUMA emulation (numa=fake=8 etc.)
     * numa=off
     * platforms with CPU-less HBM nodes or CPU-less Optane memory nodes
     * 'maxcpus=' cmdline setup, where chopped CPUs could be onlined later
     * 'nr_cpus=', 'possible_cpus=' cmdline setup, where chopped CPUs can
       not be onlined after boot
    
    The SNC case is the most user-visible one, as many CSPs (Cloud Service
    Providers) enable this feature in their server fleets. When SNC3 is
    enabled, a 2-socket machine appears to have 6 NUMA nodes and is impacted
    by the issue in practice.
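    
    As a rough illustration of the SNC case, the sketch below compares a
    node-based count with a package-based count. The helper names and the
    limit of 4 used in the note are assumptions made for this example only;
    they are not taken from tsc.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: NUMA nodes exposed when each socket is split
 * into 'snc_clusters' sub-NUMA clusters (CPU nodes only). */
static unsigned int snc_numa_nodes(unsigned int sockets,
                                   unsigned int snc_clusters)
{
    assert(sockets > 0 && snc_clusters > 0);
    return sockets * snc_clusters;
}

/* Hypothetical helper: true if 'count' is small enough for the
 * watchdog to be skipped under an assumed 'limit'. */
static bool within_package_limit(unsigned int count, unsigned int limit)
{
    return count <= limit;
}
```

    With an assumed limit of 4, a 2-socket SNC3 machine passes the check
    when its 2 packages are counted, but fails it when its 6 NUMA nodes are
    counted instead.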
    
    Thomas' recent patchset refactoring the x86 topology code greatly
    improves topology_max_packages(), making it more accurate and available
    in the early boot phase, so it works well in most of the above cases.
    
    The only exceptions are the 'nr_cpus=' and 'possible_cpus=' setups, which
    may under-estimate the package number. This is because, during topology
    setup, the boot CPU iterates through all enumerated APIC IDs and either
    accepts or rejects each APIC ID. For accepted IDs, it figures out which
    bits of the ID map to the package number, and it tracks which package
    numbers have been seen in a bitmap. topology_max_packages() just returns
    the number of bits set in that bitmap.
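    
    The bitmap counting described above can be sketched in user space as
    follows. This is a simplified model, not the kernel's topology code:
    count_packages(), MAX_PACKAGES and the 'pkg_shift' parameter (the
    assumed number of APIC ID bits below the package field) are all
    hypothetical names for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PACKAGES 64 /* hypothetical cap so one 64-bit word suffices */

/* Count distinct package numbers among the accepted APIC IDs. */
static unsigned int count_packages(const uint32_t *apic_ids, size_t n,
                                   unsigned int pkg_shift)
{
    uint64_t pkg_map = 0;
    unsigned int count = 0;

    assert(pkg_shift < 32);

    /* Mark the package of every accepted APIC ID in the bitmap. */
    for (size_t i = 0; i < n; i++) {
        uint32_t pkg = apic_ids[i] >> pkg_shift;

        if (pkg < MAX_PACKAGES)
            pkg_map |= (uint64_t)1 << pkg;
    }

    /* The package count is the number of bits set in the bitmap. */
    for (; pkg_map; pkg_map &= pkg_map - 1)
        count++;

    return count;
}
```

    For example, with a shift of 4, the APIC IDs 0x00, 0x01, 0x10 and 0x11
    fall into packages 0 and 1, so two bits are set.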
    
    'nr_cpus=' and 'possible_cpus=' can cause more APIC IDs to be rejected and
    can artificially lower the number of bits in the package bitmap and thus
    topology_max_packages().  This means that, for example, a system with 8
    physical packages might reject all the CPUs on 6 of those packages and be
    left with only 2 packages and 2 bits set in the package bitmap. Such a
    system needs the TSC watchdog, but would disable it anyway. This isn't
    ideal, but it only happens with debug-oriented options. It could be fixed
    by tracking the package numbers of rejected CPUs, but that is not worth
    the trouble for debugging.
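    
    To make the under-estimate concrete, here is a simplified model of the
    8-package example. It assumes that a CPU limit rejects every APIC ID
    after the first 'cpu_limit' entries in enumeration order, which may not
    match the real acceptance order; scan_apic_ids() and struct pkg_counts
    are hypothetical names. The 'seen' field shows what tracking rejected
    CPUs as well would report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pkg_counts {
    unsigned int accepted; /* packages with at least one accepted CPU */
    unsigned int seen;     /* packages seen at all, rejected CPUs included */
};

static unsigned int popcount64(uint64_t v)
{
    unsigned int n = 0;

    for (; v; v &= v - 1)
        n++;
    return n;
}

/* Walk the enumerated APIC IDs, accepting only the first 'cpu_limit'. */
static struct pkg_counts scan_apic_ids(const uint32_t *ids, size_t n,
                                       unsigned int pkg_shift,
                                       size_t cpu_limit)
{
    uint64_t accepted_map = 0, seen_map = 0;
    struct pkg_counts res;

    assert(pkg_shift < 32);

    for (size_t i = 0; i < n; i++) {
        uint32_t pkg = ids[i] >> pkg_shift;

        if (pkg >= 64)
            continue; /* outside this sketch's single-word bitmap */
        seen_map |= (uint64_t)1 << pkg;
        if (i < cpu_limit)
            accepted_map |= (uint64_t)1 << pkg;
    }

    res.accepted = popcount64(accepted_map);
    res.seen = popcount64(seen_map);
    return res;
}
```

    With 8 packages of 4 CPUs each and a limit of 8 CPUs, only packages 0
    and 1 contribute accepted IDs, so 'accepted' is 2 while 'seen' is 8.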
    
    So use topology_max_packages() to replace 'nr_online_nodes'.
    
    Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
    Signed-off-by: Feng Tang <feng.tang@intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Waiman Long <longman@redhat.com>
    Link: https://lore.kernel.org/all/20240729021202.180955-1-feng.tang@intel.com
    Closes: https://lore.kernel.org/lkml/a4860054-0f16-6513-f121-501048431086@intel.com/
tsc.c 40.5 KB