• Christoph Lameter (Ampere)'s avatar
    ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512 · 3fbd56f0
    Christoph Lameter (Ampere) authored
      [ a.k.a. Revert "Revert "ARM64: Dynamically allocate cpumasks and
        increase supported CPUs to 512""; originally reverted because of a
        bug in the cpufreq-dt code not using zalloc_cpumask_var() ]
    
    Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere
    Computing) are planning to ship systems with 512 CPUs. So that all CPUs on
    these systems can be used with defconfig, we'd like to bump NR_CPUS to 512.
    Therefore this patch increases the default NR_CPUS from 256 to 512.
    
    As increasing NR_CPUS will increase the size of cpumasks, there's a fear that
    this might have a significant impact on stack usage due to code which places
    cpumasks on the stack. To mitigate that concern, we can select
    CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with
    NR_CPUS=256, we only select this when NR_CPUS > 256.
    
    CPUMASK_OFFSTACK configures the cpumasks in the kernel to be
    dynamically allocated. This was used in the X86 architecture in the
    past to enable support for larger CPU configurations up to 8k cpus.
    
    With that is becomes possible to dynamically size the allocation of
    the cpu bitmaps depending on the quantity of processors detected on
    bootup. Memory used for cpumasks will increase if the kernel is
    run on a machine with more cores.
    
    Further increases may be needed if ARM processor vendors start
    supporting more processors. Given the current inflationary trends
    in core counts from multiple processor manufacturers this may occur.
    
    There are minor regressions for hackbench. The kernel data size
    for 512 cpus is smaller with offstack than with onstack.
    
    Benchmark results using hackbench average over 10 runs of
    
     	hackbench -s 512 -l 2000 -g 15 -f 25 -P
    
    on Altra 80 Core
    
    Support for 256 CPUs on stack. Baseline
    
     	7.8564 sec
    
    Support for 512 CUs on stack.
    
     	7.8713 sec + 0.18%
    
    512 CPUS offstack
    
     	7.8916 sec + 0.44%
    
    Kernel size comparison:
    
        text		   data	    filename				Difference to onstack256 baseline
    25755648	9589248	    vmlinuz-6.8.0-rc4-onstack256
    25755648	9607680	    vmlinuz-6.8.0-rc4-onstack512	+0.19%
    25755648	9603584	    vmlinuz-6.8.0-rc4-offstack512	+0.14%
    Tested-by: default avatarEric Mackay <eric.mackay@oracle.com>
    Reviewed-by: default avatarRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
    Signed-off-by: default avatarChristoph Lameter (Ampere) <cl@linux.com>
    Acked-by: default avatarMark Rutland <mark.rutland@arm.com>
    Link: https://lore.kernel.org/r/37099a57-b655-3b3a-56d0-5f7fbd49d7db@gentwo.org
    Link: https://lore.kernel.org/r/20240314125457.186678-1-m.szyprowski@samsung.com
    [catalin.marinas@arm.com: use 'select' instead of duplicating 'config CPUMASK_OFFSTACK']
    Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
    3fbd56f0
Kconfig 80.3 KB