1. 18 Mar, 2024 1 commit
    • ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512 · 3fbd56f0
      Christoph Lameter (Ampere) authored
        [ a.k.a. Revert "Revert "ARM64: Dynamically allocate cpumasks and
          increase supported CPUs to 512""; originally reverted because of a
          bug in the cpufreq-dt code not using zalloc_cpumask_var() ]
      
      Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere
      Computing) are planning to ship systems with 512 CPUs. So that all CPUs on
      these systems can be used with defconfig, we'd like to bump NR_CPUS to 512.
      Therefore this patch increases the default NR_CPUS from 256 to 512.
      
      As increasing NR_CPUS will increase the size of cpumasks, there's a fear that
      this might have a significant impact on stack usage due to code which places
      cpumasks on the stack. To mitigate that concern, we can select
      CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with
      NR_CPUS=256, we only select this when NR_CPUS > 256.
      
      CPUMASK_OFFSTACK configures the cpumasks in the kernel to be
      dynamically allocated. This was used in the X86 architecture in the
      past to enable support for larger CPU configurations up to 8k cpus.
      
      With that it becomes possible to dynamically size the allocation of
      the cpu bitmaps depending on the quantity of processors detected on
      bootup. Memory used for cpumasks will increase if the kernel is
      run on a machine with more cores.
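
As a rough illustration of the sizing argument above, the following Python sketch estimates how many bytes one cpumask occupies. It assumes a mask is a bitmap with one bit per CPU, rounded up to whole 64-bit words. The function names are hypothetical helpers for this note, not kernel APIs, and the 80-CPU case is only an example machine size.

```python
BITS_PER_LONG = 64  # assumption: 64-bit longs, as on arm64


def cpumask_bytes(nbits: int, bits_per_long: int = BITS_PER_LONG) -> int:
    """Bytes for a bitmap of `nbits` bits, rounded up to whole longs."""
    longs = (nbits + bits_per_long - 1) // bits_per_long
    return longs * (bits_per_long // 8)


def onstack_mask_bytes(nr_cpus: int) -> int:
    """Static case: every mask is sized for the compile-time NR_CPUS."""
    return cpumask_bytes(nr_cpus)


def offstack_mask_bytes(detected_cpus: int) -> int:
    """Dynamic case (assumed): each mask is sized for the CPUs found at boot."""
    return cpumask_bytes(detected_cpus)


if __name__ == "__main__":
    for nr_cpus in (256, 512, 8192):
        size = onstack_mask_bytes(nr_cpus)
        print(f"NR_CPUS={nr_cpus}: {size} bytes per on-stack mask")
    print(f"80 CPUs detected: {offstack_mask_bytes(80)} bytes per allocated mask")
```

Under these assumptions an on-stack mask doubles in size when NR_CPUS goes from 256 to 512, while a dynamically sized mask depends only on the CPU count found at boot (plus a pointer and allocator overhead, which the sketch ignores).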
      
      Further increases may be needed if ARM processor vendors start
      supporting more processors. Given the current inflationary trends
      in core counts from multiple processor manufacturers this may occur.
      
      There are minor regressions for hackbench. The kernel data size
      for 512 cpus is smaller with offstack than with onstack.
      
      Benchmark results using hackbench average over 10 runs of
      
       	hackbench -s 512 -l 2000 -g 15 -f 25 -P
      
      on Altra 80 Core
      
      Support for 256 CPUs on stack. Baseline
      
       	7.8564 sec
      
      Support for 512 CPUs on stack.
      
       	7.8713 sec + 0.18%
      
      512 CPUS offstack
      
       	7.8916 sec + 0.44%
      
      Kernel size comparison:
      
      text        data       filename                         Difference to onstack256 baseline
      25755648    9589248    vmlinuz-6.8.0-rc4-onstack256
      25755648    9607680    vmlinuz-6.8.0-rc4-onstack512     +0.19%
      25755648    9603584    vmlinuz-6.8.0-rc4-offstack512    +0.14%
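
The percentages quoted in the benchmark and size comparison can be re-derived from the raw numbers. The sketch below is an editorial cross-check, not part of the patch. It assumes the reported values were truncated rather than rounded to two decimals, since that is the assumption under which all four figures come out as printed. The helper names are hypothetical.

```python
import math


def pct_increase(baseline: float, value: float) -> float:
    """Relative increase of `value` over `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


def truncate(x: float, places: int = 2) -> float:
    """Drop digits beyond `places` decimals without rounding up."""
    factor = 10 ** places
    return math.floor(x * factor) / factor


# hackbench wall-clock times in seconds, copied from the commit message
HACKBENCH = {"onstack256": 7.8564, "onstack512": 7.8713, "offstack512": 7.8916}

# kernel image data sizes in bytes, copied from the size comparison
DATA_SIZE = {"onstack256": 9589248, "onstack512": 9607680, "offstack512": 9603584}

if __name__ == "__main__":
    for name in ("onstack512", "offstack512"):
        slowdown = truncate(pct_increase(HACKBENCH["onstack256"], HACKBENCH[name]))
        growth = truncate(pct_increase(DATA_SIZE["onstack256"], DATA_SIZE[name]))
        print(f"{name}: hackbench +{slowdown}%, data +{growth}%")
```

The text size is identical in all three builds, so only the data column contributes to the difference.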
      Tested-by: Eric Mackay <eric.mackay@oracle.com>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Christoph Lameter (Ampere) <cl@linux.com>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Link: https://lore.kernel.org/r/37099a57-b655-3b3a-56d0-5f7fbd49d7db@gentwo.org
      Link: https://lore.kernel.org/r/20240314125457.186678-1-m.szyprowski@samsung.com
      [catalin.marinas@arm.com: use 'select' instead of duplicating 'config CPUMASK_OFFSTACK']
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  2. 13 Mar, 2024 2 commits
  3. 11 Mar, 2024 1 commit
  4. 07 Mar, 2024 12 commits
    • Merge branch 'for-next/stage1-lpa2' into for-next/core · 88f09122
      Catalin Marinas authored
      * for-next/stage1-lpa2: (48 commits)
        : Add support for LPA2 and WXN and stage 1
        arm64/mm: Avoid ID mapping of kpti flag if it is no longer needed
        arm64/mm: Use generic __pud_free() helper in pud_free() implementation
        arm64: gitignore: ignore relacheck
        arm64: Use Signed/Unsigned enums for TGRAN{4,16,64} and VARange
        arm64: mm: Make PUD folding check in set_pud() a runtime check
        arm64: mm: add support for WXN memory translation attribute
        mm: add arch hook to validate mmap() prot flags
        arm64: defconfig: Enable LPA2 support
        arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs
        arm64: kvm: avoid CONFIG_PGTABLE_LEVELS for runtime levels
        arm64: ptdump: Deal with translation levels folded at runtime
        arm64: ptdump: Disregard unaddressable VA space
        arm64: mm: Add support for folding PUDs at runtime
        arm64: kasan: Reduce minimum shadow alignment and enable 5 level paging
        arm64: mm: Add 5 level paging support to fixmap and swapper handling
        arm64: Enable LPA2 at boot if supported by the system
        arm64: mm: add LPA2 and 5 level paging support to G-to-nG conversion
        arm64: mm: Add definitions to support 5 levels of paging
        arm64: mm: Add LPA2 support to phys<->pte conversion routines
        arm64: mm: Wire up TCR.DS bit to PTE shareability fields
        ...
    • Merge branches 'for-next/reorg-va-space', 'for-next/rust-for-arm64',... · 0c5ade74
      Catalin Marinas authored
      Merge branches 'for-next/reorg-va-space', 'for-next/rust-for-arm64', 'for-next/misc', 'for-next/daif-cleanup', 'for-next/kselftest', 'for-next/documentation', 'for-next/sysreg' and 'for-next/dpisa', remote-tracking branch 'arm64/for-next/perf' into for-next/core
      
      * arm64/for-next/perf: (39 commits)
        docs: perf: Fix build warning of hisi-pcie-pmu.rst
        perf: starfive: Only allow COMPILE_TEST for 64-bit architectures
        MAINTAINERS: Add entry for StarFive StarLink PMU
        docs: perf: Add description for StarFive's StarLink PMU
        dt-bindings: perf: starfive: Add JH8100 StarLink PMU
        perf: starfive: Add StarLink PMU support
        docs: perf: Update usage for target filter of hisi-pcie-pmu
        drivers/perf: hisi_pcie: Merge find_related_event() and get_event_idx()
        drivers/perf: hisi_pcie: Relax the check on related events
        drivers/perf: hisi_pcie: Check the target filter properly
        drivers/perf: hisi_pcie: Add more events for counting TLP bandwidth
        drivers/perf: hisi_pcie: Fix incorrect counting under metric mode
        drivers/perf: hisi_pcie: Introduce hisi_pcie_pmu_get_event_ctrl_val()
        drivers/perf: hisi_pcie: Rename hisi_pcie_pmu_{config,clear}_filter()
        drivers/perf: hisi: Enable HiSilicon Erratum 162700402 quirk for HIP09
        perf/arm_cspmu: Add devicetree support
        dt-bindings/perf: Add Arm CoreSight PMU
        perf/arm_cspmu: Simplify counter reset
        perf/arm_cspmu: Simplify attribute groups
        perf/arm_cspmu: Simplify initialisation
        ...
      
      * for-next/reorg-va-space:
        : Reorganise the arm64 kernel VA space in preparation for LPA2 support
        : (52-bit VA/PA).
        arm64: kaslr: Adjust randomization range dynamically
        arm64: mm: Reclaim unused vmemmap region for vmalloc use
        arm64: vmemmap: Avoid base2 order of struct page size to dimension region
        arm64: ptdump: Discover start of vmemmap region at runtime
        arm64: ptdump: Allow all region boundaries to be defined at boot time
        arm64: mm: Move fixmap region above vmemmap region
        arm64: mm: Move PCI I/O emulation region above the vmemmap region
      
      * for-next/rust-for-arm64:
        : Enable Rust support for arm64
        arm64: rust: Enable Rust support for AArch64
        rust: Refactor the build target to allow the use of builtin targets
      
      * for-next/misc:
        : Miscellaneous arm64 patches
        ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512
        arm64: Remove enable_daif macro
        arm64/hw_breakpoint: Directly use ESR_ELx_WNR for an watchpoint exception
        arm64: cpufeatures: Clean up temporary variable to simplify code
        arm64: Update setup_arch() comment on interrupt masking
        arm64: remove unnecessary ifdefs around is_compat_task()
        arm64: ftrace: Don't forbid CALL_OPS+CC_OPTIMIZE_FOR_SIZE with Clang
        arm64/sme: Ensure that all fields in SMCR_EL1 are set to known values
        arm64/sve: Ensure that all fields in ZCR_EL1 are set to known values
        arm64/sve: Document that __SVE_VQ_MAX is much larger than needed
        arm64: make member of struct pt_regs and it's offset macro in the same order
        arm64: remove unneeded BUILD_BUG_ON assertion
        arm64: kretprobes: acquire the regs via a BRK exception
        arm64: io: permit offset addressing
        arm64: errata: Don't enable workarounds for "rare" errata by default
      
      * for-next/daif-cleanup:
        : Clean up DAIF handling for EL0 returns
        arm64: Unmask Debug + SError in do_notify_resume()
        arm64: Move do_notify_resume() to entry-common.c
        arm64: Simplify do_notify_resume() DAIF masking
      
      * for-next/kselftest:
        : Miscellaneous arm64 kselftest patches
        kselftest/arm64: Test that ptrace takes effect in the target process
      
      * for-next/documentation:
        : arm64 documentation patches
        arm64/sme: Remove spurious 'is' in SME documentation
        arm64/fp: Clarify effect of setting an unsupported system VL
        arm64/sme: Fix cut'n'paste in ABI document
        arm64/sve: Remove bitrotted comment about syscall behaviour
      
      * for-next/sysreg:
        : sysreg updates
        arm64/sysreg: Update ID_AA64DFR0_EL1 register
        arm64/sysreg: Update ID_DFR0_EL1 register fields
        arm64/sysreg: Add register fields for ID_AA64DFR1_EL1
      
      * for-next/dpisa:
        : Support for 2023 dpISA extensions
        kselftest/arm64: Add 2023 DPISA hwcap test coverage
        kselftest/arm64: Add basic FPMR test
        kselftest/arm64: Handle FPMR context in generic signal frame parser
        arm64/hwcap: Define hwcaps for 2023 DPISA features
        arm64/ptrace: Expose FPMR via ptrace
        arm64/signal: Add FPMR signal handling
        arm64/fpsimd: Support FEAT_FPMR
        arm64/fpsimd: Enable host kernel access to FPMR
        arm64/cpufeature: Hook new identification registers up to cpufeature
    • ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512 · 0499a783
      Christoph Lameter (Ampere) authored
      Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere
      Computing) are planning to ship systems with 512 CPUs. So that all CPUs on
      these systems can be used with defconfig, we'd like to bump NR_CPUS to 512.
      Therefore this patch increases the default NR_CPUS from 256 to 512.
      
      As increasing NR_CPUS will increase the size of cpumasks, there's a fear that
      this might have a significant impact on stack usage due to code which places
      cpumasks on the stack. To mitigate that concern, we can select
      CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with
      NR_CPUS=256, we only select this when NR_CPUS > 256.
      
      CPUMASK_OFFSTACK configures the cpumasks in the kernel to be
      dynamically allocated. This was used in the X86 architecture in the
      past to enable support for larger CPU configurations up to 8k cpus.
      
      With that it becomes possible to dynamically size the allocation of
      the cpu bitmaps depending on the quantity of processors detected on
      bootup. Memory used for cpumasks will increase if the kernel is
      run on a machine with more cores.
      
      Further increases may be needed if ARM processor vendors start
      supporting more processors. Given the current inflationary trends
      in core counts from multiple processor manufacturers this may occur.
      
      There are minor regressions for hackbench. The kernel data size
      for 512 cpus is smaller with offstack than with onstack.
      
      Benchmark results using hackbench average over 10 runs of
      
       	hackbench -s 512 -l 2000 -g 15 -f 25 -P
      
      on Altra 80 Core
      
      Support for 256 CPUs on stack. Baseline
      
       	7.8564 sec
      
      Support for 512 CPUs on stack.
      
       	7.8713 sec + 0.18%
      
      512 CPUS offstack
      
       	7.8916 sec + 0.44%
      
      Kernel size comparison:
      
      text        data       filename                         Difference to onstack256 baseline
      25755648    9589248    vmlinuz-6.8.0-rc4-onstack256
      25755648    9607680    vmlinuz-6.8.0-rc4-onstack512     +0.19%
      25755648    9603584    vmlinuz-6.8.0-rc4-offstack512    +0.14%
      Tested-by: Eric Mackay <eric.mackay@oracle.com>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Christoph Lameter (Ampere) <cl@linux.com>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Link: https://lore.kernel.org/r/37099a57-b655-3b3a-56d0-5f7fbd49d7db@gentwo.org
      [catalin.marinas@arm.com: use 'select' instead of duplicating 'config CPUMASK_OFFSTACK']
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • kselftest/arm64: Add 2023 DPISA hwcap test coverage · 44d10c27
      Mark Brown authored
      Add the hwcaps added for the 2023 DPISA extensions to the hwcaps test
      program.
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-9-c568edc8ed7f@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • kselftest/arm64: Add basic FPMR test · 7bcebadd
      Mark Brown authored
      Verify that a FPMR frame is generated on systems that support FPMR and not
      generated otherwise.
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-8-c568edc8ed7f@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • kselftest/arm64: Handle FPMR context in generic signal frame parser · f4dcccdd
      Mark Brown authored
      Teach the generic signal frame parsing code about the newly added FPMR
      frame, avoiding warnings every time one is generated.
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-7-c568edc8ed7f@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64/hwcap: Define hwcaps for 2023 DPISA features · c1932cac
      Mark Brown authored
      The 2023 architecture extensions include a large number of floating point
      features, most of which simply add new instructions. Add hwcaps so that
      userspace can enumerate these features.
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-6-c568edc8ed7f@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64/ptrace: Expose FPMR via ptrace · 4035c22e
      Mark Brown authored
      Add a new regset to expose FPMR via ptrace. It is not added to the FPSIMD
      registers since that structure is exposed elsewhere without any allowance
      for extension.
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-5-c568edc8ed7f@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64/signal: Add FPMR signal handling · 8c46def4
      Mark Brown authored
      Expose FPMR in the signal context on systems where it is supported. The
      kernel validates the exact size of the FPSIMD registers so we can't readily
      add it to fpsimd_context without disruption.
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-4-c568edc8ed7f@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64/fpsimd: Support FEAT_FPMR · 203f2b95
      Mark Brown authored
      FEAT_FPMR defines a new EL0 accessible register FPMR used to configure the
      FP8 related features added to the architecture at the same time. Detect
      support for this register and context switch it for EL0 when present.
      
      Due to the sharing of responsibility for saving floating point state
      between the host kernel and KVM, FP8 support is not yet implemented in KVM
      and a stub similar to that used for SVCR is provided for FPMR in order to
      avoid bisection issues. To make it easier to share host state with the
      hypervisor we store FPMR as a hardened usercopy field in uw (along with
      some padding).
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-3-c568edc8ed7f@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64/fpsimd: Enable host kernel access to FPMR · b6c0b424
      Mark Brown authored
      FEAT_FPMR provides a new generally accessible architectural register FPMR.
      This is only accessible to EL0 and EL1 when HCRX_EL2.EnFPM is set to 1,
      so set it when the host is running. The guest part will be done along with
      context switching the new register and exposing it via guest management.
      Acked-by: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-2-c568edc8ed7f@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64/cpufeature: Hook new identification registers up to cpufeature · cc9f69a3
      Mark Brown authored
      The 2023 architecture extensions have defined several new ID registers,
      hook them up to the cpufeature code so we can add feature checks and hwcaps
      based on their contents.
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-1-c568edc8ed7f@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  5. 05 Mar, 2024 2 commits
  6. 04 Mar, 2024 13 commits
  7. 01 Mar, 2024 5 commits
  8. 28 Feb, 2024 3 commits
  9. 22 Feb, 2024 1 commit