  1. 08 Nov, 2023 1 commit
  2. 07 Nov, 2023 3 commits
  3. 26 Oct, 2023 6 commits
    • Merge branch 'for-next/cpus_have_const_cap' into for-next/core · 14dcf78a
      Catalin Marinas authored
      * for-next/cpus_have_const_cap: (38 commits)
        : cpus_have_const_cap() removal
        arm64: Remove cpus_have_const_cap()
        arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_REPEAT_TLBI
        arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP
        arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_CAVIUM_23154
        arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_2645198
        arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1742098
        arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1542419
        arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419
        arm64: Avoid cpus_have_const_cap() for ARM64_UNMAP_KERNEL_AT_EL0
        arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64}
        arm64: Avoid cpus_have_const_cap() for ARM64_SPECTRE_V2
        arm64: Avoid cpus_have_const_cap() for ARM64_SSBS
        arm64: Avoid cpus_have_const_cap() for ARM64_MTE
        arm64: Avoid cpus_have_const_cap() for ARM64_HAS_TLB_RANGE
        arm64: Avoid cpus_have_const_cap() for ARM64_HAS_WFXT
        arm64: Avoid cpus_have_const_cap() for ARM64_HAS_RNG
        arm64: Avoid cpus_have_const_cap() for ARM64_HAS_EPAN
        arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PAN
        arm64: Avoid cpus_have_const_cap() for ARM64_HAS_GIC_PRIO_MASKING
        arm64: Avoid cpus_have_const_cap() for ARM64_HAS_DIT
        ...
    • Merge branch 'for-next/feat_lse128' into for-next/core · 2baca17e
      Catalin Marinas authored
      * for-next/feat_lse128:
        : HWCAP for FEAT_LSE128
        kselftest/arm64: add FEAT_LSE128 to hwcap test
        arm64: add FEAT_LSE128 HWCAP
    • Merge branch 'for-next/feat_lrcpc3' into for-next/core · 023113fe
      Catalin Marinas authored
      * for-next/feat_lrcpc3:
        : HWCAP for FEAT_LRCPC3
        selftests/arm64: add HWCAP2_LRCPC3 test
        arm64: add FEAT_LRCPC3 HWCAP
    • Merge branch 'for-next/feat_sve_b16b16' into for-next/core · 2a3f8ce3
      Catalin Marinas authored
      * for-next/feat_sve_b16b16:
        : Add support for FEAT_SVE_B16B16 (BFloat16)
        kselftest/arm64: Verify HWCAP2_SVE_B16B16
        arm64/sve: Report FEAT_SVE_B16B16 to userspace
    • Merge branches 'for-next/sve-remove-pseudo-regs', 'for-next/backtrace-ipi',... · 1519018c
      Catalin Marinas authored
      Merge branches 'for-next/sve-remove-pseudo-regs', 'for-next/backtrace-ipi', 'for-next/kselftest', 'for-next/misc' and 'for-next/cpufeat-display-cores', remote-tracking branch 'arm64/for-next/perf' into for-next/core
      
      * arm64/for-next/perf:
        perf: hisi: Fix use-after-free when register pmu fails
        drivers/perf: hisi_pcie: Initialize event->cpu only on success
        drivers/perf: hisi_pcie: Check the type first in pmu::event_init()
        perf/arm-cmn: Enable per-DTC counter allocation
        perf/arm-cmn: Rework DTC counters (again)
        perf/arm-cmn: Fix DTC domain detection
        drivers: perf: arm_pmuv3: Drop some unused arguments from armv8_pmu_init()
        drivers: perf: arm_pmuv3: Read PMMIR_EL1 unconditionally
        drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hisi_hns3_pmu uninit process
        drivers/perf: xgene: Use device_get_match_data()
        perf/amlogic: add missing MODULE_DEVICE_TABLE
        docs/perf: Add ampere_cspmu to toctree to fix a build warning
        perf: arm_cspmu: ampere_cspmu: Add support for Ampere SoC PMU
        perf: arm_cspmu: Support implementation specific validation
        perf: arm_cspmu: Support implementation specific filters
        perf: arm_cspmu: Split 64-bit write to 32-bit writes
        perf: arm_cspmu: Separate Arm and vendor module
      
      * for-next/sve-remove-pseudo-regs:
        : arm64/fpsimd: Remove the vector length pseudo registers
        arm64/sve: Remove SMCR pseudo register from cpufeature code
        arm64/sve: Remove ZCR pseudo register from cpufeature code
      
      * for-next/backtrace-ipi:
        : Add IPI for backtraces/kgdb, use NMI
        arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup
        arm64: smp: avoid NMI IPIs with broken MediaTek FW
        arm64: smp: Mark IPI globals as __ro_after_init
        arm64: kgdb: Implement kgdb_roundup_cpus() to enable pseudo-NMI roundup
        arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI
        arm64: smp: Add arch support for backtrace using pseudo-NMI
        arm64: smp: Remove dedicated wakeup IPI
        arm64: idle: Tag the arm64 idle functions as __cpuidle
        irqchip/gic-v3: Enable support for SGIs to act as NMIs
      
      * for-next/kselftest:
        : Various arm64 kselftest updates
        kselftest/arm64: Validate SVCR in streaming SVE stress test
      
      * for-next/misc:
        : Miscellaneous patches
        arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer
        arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=n
        arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper
        clocksource/drivers/arm_arch_timer: limit XGene-1 workaround
        arm64: Remove system_uses_lse_atomics()
        arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unused
        arm64/mm: Hoist synchronization out of set_ptes() loop
        arm64: swiotlb: Reduce the default size if no ZONE_DMA bouncing needed
      
      * for-next/cpufeat-display-cores:
        : arm64 cpufeature display enabled cores
        arm64: cpufeature: Change DBM to display enabled cores
        arm64: cpufeature: Display the set of cores with a feature
    • arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer · 146a15b8
      Nathan Chancellor authored
      Prior to LLVM 15.0.0, LLVM's integrated assembler would incorrectly
      byte-swap NOP when compiling for big-endian, and the resulting series of
      bytes happened to match the encoding of FNMADD S21, S30, S0, S0.
      
      This went unnoticed until commit:
      
        34f66c4c ("arm64: Use a positive cpucap for FP/SIMD")
      
      Prior to that commit, the kernel would always enable the use of FPSIMD
      early in boot when __cpu_setup() initialized CPACR_EL1, and so usage of
      FNMADD within the kernel was not detected, but could result in the
      corruption of user or kernel FPSIMD state.
      
      After that commit, the instructions happen to trap during boot prior to
      FPSIMD being detected and enabled, e.g.
      
      | Unhandled 64-bit el1h sync exception on CPU0, ESR 0x000000001fe00000 -- ASIMD
      | CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc3-00013-g34f66c4c #1
      | Hardware name: linux,dummy-virt (DT)
      | pstate: 400000c9 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      | pc : __pi_strcmp+0x1c/0x150
      | lr : populate_properties+0xe4/0x254
      | sp : ffffd014173d3ad0
      | x29: ffffd014173d3af0 x28: fffffbfffddffcb8 x27: 0000000000000000
      | x26: 0000000000000058 x25: fffffbfffddfe054 x24: 0000000000000008
      | x23: fffffbfffddfe000 x22: fffffbfffddfe000 x21: fffffbfffddfe044
      | x20: ffffd014173d3b70 x19: 0000000000000001 x18: 0000000000000005
      | x17: 0000000000000010 x16: 0000000000000000 x15: 00000000413e7000
      | x14: 0000000000000000 x13: 0000000000001bcc x12: 0000000000000000
      | x11: 00000000d00dfeed x10: ffffd414193f2cd0 x9 : 0000000000000000
      | x8 : 0101010101010101 x7 : ffffffffffffffc0 x6 : 0000000000000000
      | x5 : 0000000000000000 x4 : 0101010101010101 x3 : 000000000000002a
      | x2 : 0000000000000001 x1 : ffffd014171f2988 x0 : fffffbfffddffcb8
      | Kernel panic - not syncing: Unhandled exception
      | CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc3-00013-g34f66c4c #1
      | Hardware name: linux,dummy-virt (DT)
      | Call trace:
      |  dump_backtrace+0xec/0x108
      |  show_stack+0x18/0x2c
      |  dump_stack_lvl+0x50/0x68
      |  dump_stack+0x18/0x24
      |  panic+0x13c/0x340
      |  el1t_64_irq_handler+0x0/0x1c
      |  el1_abort+0x0/0x5c
      |  el1h_64_sync+0x64/0x68
      |  __pi_strcmp+0x1c/0x150
      |  unflatten_dt_nodes+0x1e8/0x2d8
      |  __unflatten_device_tree+0x5c/0x15c
      |  unflatten_device_tree+0x38/0x50
      |  setup_arch+0x164/0x1e0
      |  start_kernel+0x64/0x38c
      |  __primary_switched+0xbc/0xc4
      
      Restrict CONFIG_CPU_BIG_ENDIAN to a known good assembler, which is
      either GNU as or LLVM's IAS 15.0.0 and newer, which contains the linked
      commit.
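
      As an illustration, the restriction amounts to a one-line Kconfig
      dependency. A minimal sketch, assuming the kernel's existing
      AS_IS_GNU/AS_VERSION assembler-detection symbols (not the verbatim
      hunk):

        config CPU_BIG_ENDIAN
                bool "Build big-endian kernel"
                # Pre-15.0.0 LLVM IAS byte-swaps NOP when targeting
                # big-endian, so require GNU as or a fixed IAS.
                depends on AS_IS_GNU || AS_VERSION >= 150000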
      
      Closes: https://github.com/ClangBuiltLinux/linux/issues/1948
      Link: https://github.com/llvm/llvm-project/commit/1379b150991f70a5782e9a143c2ba5308da1161c
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Cc: stable@vger.kernel.org
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Link: https://lore.kernel.org/r/20231025-disable-arm64-be-ias-b4-llvm-15-v1-1-b25263ed8b23@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  4. 24 Oct, 2023 5 commits
  5. 23 Oct, 2023 5 commits
  6. 19 Oct, 2023 3 commits
  7. 18 Oct, 2023 3 commits
  8. 17 Oct, 2023 2 commits
  9. 16 Oct, 2023 12 commits
    • arm64/mm: Hoist synchronization out of set_ptes() loop · 3425cec4
      Ryan Roberts authored
      set_ptes() sets a physically contiguous block of memory (which all
      belongs to the same folio) to a contiguous block of ptes. The arm64
      implementation of this previously just looped, operating on each
      individual pte. But the __sync_icache_dcache() and mte_sync_tags()
      operations can both be hoisted out of the loop so that they are
      performed once for the contiguous set of pages (which may be less than
      the whole folio). This should result in minor performance gains.
      
      __sync_icache_dcache() already acts on the whole folio, and sets a flag
      in the folio so that it skips duplicate calls. But by hoisting the call,
      all the pte testing is done only once.
      
      mte_sync_tags() operates on each individual page with its own loop. But
      by passing the number of pages explicitly, we can rely solely on its
      loop and do the checks only once. This approach also makes it robust for
      the future: rather than assuming that if the head page of a compound
      page is being mapped then the whole compound page is being mapped, we
      explicitly know how many pages are being mapped. The old assumption may
      not continue to hold once the "anonymous large folios" feature is
      merged.
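
      A condensed sketch of the resulting shape (the helper names here are
      illustrative stand-ins for the combined cache/tag maintenance and the
      per-entry store, not the exact arm64 code):

        static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
                                    pte_t *ptep, pte_t pte, unsigned int nr)
        {
                /* Hoisted: folio-wide icache/dcache sync and MTE tag sync
                 * happen once for the whole range, not once per pte.
                 */
                __sync_cache_and_tags(pte, nr);

                for (;;) {
                        __set_pte(ptep, pte);
                        if (--nr == 0)
                                break;
                        ptep++;
                        pte = __pte(pte_val(pte) + PAGE_SIZE);
                }
        }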
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Reviewed-by: Steven Price <steven.price@arm.com>
      Link: https://lore.kernel.org/r/20231005140730.2191134-1-ryan.roberts@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: Remove cpus_have_const_cap() · e8d4006d
      Mark Rutland authored
      There are no longer any users of cpus_have_const_cap(), and therefore it
      can be removed.
      
      Remove cpus_have_const_cap(). At the same time, remove
      __cpus_have_const_cap(), as this is a trivial wrapper of
      alternative_has_cap_unlikely(), which can be used directly instead.
      
      The comment for __system_matches_cap() is updated to no longer refer to
      cpus_have_const_cap(). As we have a number of ways to check the cpucaps,
      the specific suggestions are removed.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Kristina Martsenko <kristina.martsenko@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_REPEAT_TLBI · 47759eca
      Mark Rutland authored
      In arch_tlbbatch_should_defer() we use cpus_have_const_cap() to check
      for ARM64_WORKAROUND_REPEAT_TLBI, but this is not necessary and
      alternative_has_cap_*() would be preferable.
      
      For historical reasons, cpus_have_const_cap() is more complicated than
      it needs to be. Before cpucaps are finalized, it will perform a bitmap
      test of the system_cpucaps bitmap, and once cpucaps are finalized it
      will use an alternative branch. This used to be necessary to handle some
      race conditions in the window between cpucap detection and the
      subsequent patching of alternatives and static branches, where different
      branches could be out-of-sync with one another (or w.r.t. alternative
      sequences). Now that we use alternative branches instead of static
      branches, these are all patched atomically w.r.t. one another, and there
      are only a handful of cases that need special care in the window between
      cpucap detection and alternative patching.
      
      Due to the above, it would be nice to remove cpus_have_const_cap(), and
      migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
      or cpus_have_cap() depending on their requirements. This will
      remove redundant instructions and improve code generation, and will make
      it easier to determine how each callsite will behave before, during, and
      after alternative patching.
      
      The cpus_have_const_cap() check in arch_tlbbatch_should_defer() is an
      optimization to avoid some redundant work when the
      ARM64_WORKAROUND_REPEAT_TLBI cpucap is detected and forces the immediate
      use of TLBI + DSB ISH. In the window between detecting the
      ARM64_WORKAROUND_REPEAT_TLBI cpucap and patching alternatives this is
      not a big concern and there's no need to optimize this window at the
      expense of subsequent usage at runtime.
      
      This patch replaces the use of cpus_have_const_cap() with
      alternative_has_cap_unlikely(), which will avoid generating code to test
      the system_cpucaps bitmap and should be better for all subsequent calls
      at runtime. The ARM64_WORKAROUND_REPEAT_TLBI cpucap is added to
      cpucap_is_possible() so that code can be elided entirely when this is
      not possible without requiring ifdeffery or IS_ENABLED() checks at each
      usage.
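
      Schematically, the change in arch_tlbbatch_should_defer() swaps the
      check helper; a sketch rather than the verbatim diff:

        /* Before: may fall back to a system_cpucaps bitmap test. */
        if (cpus_have_const_cap(ARM64_WORKAROUND_REPEAT_TLBI))
                return false;

        /* After: one branch patched via alternatives, no bitmap load,
         * and compiled out entirely when cpucap_is_possible() reports
         * the workaround is not configured in.
         */
        if (alternative_has_cap_unlikely(ARM64_WORKAROUND_REPEAT_TLBI))
                return false;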
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP · 0d48058e
      Mark Rutland authored
      In has_useable_cnp() we use cpus_have_const_cap() to check for
      ARM64_WORKAROUND_NVIDIA_CARMEL_CNP, but this is not necessary and
      cpus_have_cap() would be preferable.
      
      For historical reasons, cpus_have_const_cap() is more complicated than
      it needs to be. Before cpucaps are finalized, it will perform a bitmap
      test of the system_cpucaps bitmap, and once cpucaps are finalized it
      will use an alternative branch. This used to be necessary to handle some
      race conditions in the window between cpucap detection and the
      subsequent patching of alternatives and static branches, where different
      branches could be out-of-sync with one another (or w.r.t. alternative
      sequences). Now that we use alternative branches instead of static
      branches, these are all patched atomically w.r.t. one another, and there
      are only a handful of cases that need special care in the window between
      cpucap detection and alternative patching.
      
      Due to the above, it would be nice to remove cpus_have_const_cap(), and
      migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
      or cpus_have_cap() depending on their requirements. This will
      remove redundant instructions and improve code generation, and will make
      it easier to determine how each callsite will behave before, during, and
      after alternative patching.
      
      We use has_useable_cnp() to determine whether we have the system-wide
      ARM64_HAS_CNP cpucap. Due to the structure of the cpufeature code, we
      call has_useable_cnp() in two distinct cases:
      
      1) When finalizing system capabilities, setup_system_capabilities() will
         call has_useable_cnp() with SCOPE_SYSTEM to determine whether all
         CPUs have the feature. This is called after we've detected any local
         cpucaps including ARM64_WORKAROUND_NVIDIA_CARMEL_CNP, but prior to
         patching alternatives.
      
         If the ARM64_WORKAROUND_NVIDIA_CARMEL_CNP was detected, we will not
         detect ARM64_HAS_CNP.
      
      2) After finalizing system capabilities, verify_local_cpu_capabilities()
         will call has_useable_cnp() with SCOPE_LOCAL_CPU to verify that CPUs
         have CNP if we previously detected it.
      
         Note that if ARM64_WORKAROUND_NVIDIA_CARMEL_CNP was detected, we will
         not have detected ARM64_HAS_CNP.
      
      For case 1 we must check the system_cpucaps bitmap as this occurs prior
      to patching the alternatives. For case 2 we'll only call
      has_useable_cnp() once per subsequent onlining of a CPU, and as this
      isn't a fast path it's not necessary to optimize for this case.
      
      This patch replaces the use of cpus_have_const_cap() with
      cpus_have_cap(), which will only generate the bitmap test and avoid
      generating an alternative sequence, resulting in slightly simpler and
      smaller code being generated. The ARM64_WORKAROUND_NVIDIA_CARMEL_CNP
      cpucap is added to cpucap_is_possible() so that code can be elided
      entirely when this is not possible.
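
      A condensed sketch of the resulting check (the surrounding cpufeature
      boilerplate is omitted):

        static bool has_useable_cnp(const struct arm64_cpu_capabilities *entry,
                                    int scope)
        {
                /*
                 * A plain bitmap test is correct both before alternatives
                 * are patched (case 1) and on the slow CPU-onlining path
                 * (case 2).
                 */
                if (cpus_have_cap(ARM64_WORKAROUND_NVIDIA_CARMEL_CNP))
                        return false;

                return has_cpuid_feature(entry, scope);
        }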
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_CAVIUM_23154 · a98a5eac
      Mark Rutland authored
      In gic_read_iar() we use cpus_have_const_cap() to check for
      ARM64_WORKAROUND_CAVIUM_23154 but this is not necessary and
      alternative_has_cap_*() would be preferable.
      
      For historical reasons, cpus_have_const_cap() is more complicated than
      it needs to be. Before cpucaps are finalized, it will perform a bitmap
      test of the system_cpucaps bitmap, and once cpucaps are finalized it
      will use an alternative branch. This used to be necessary to handle some
      race conditions in the window between cpucap detection and the
      subsequent patching of alternatives and static branches, where different
      branches could be out-of-sync with one another (or w.r.t. alternative
      sequences). Now that we use alternative branches instead of static
      branches, these are all patched atomically w.r.t. one another, and there
      are only a handful of cases that need special care in the window between
      cpucap detection and alternative patching.
      
      Due to the above, it would be nice to remove cpus_have_const_cap(), and
      migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
      or cpus_have_cap() depending on their requirements. This will
      remove redundant instructions and improve code generation, and will make
      it easier to determine how each callsite will behave before, during, and
      after alternative patching.
      
      The ARM64_WORKAROUND_CAVIUM_23154 cpucap is detected and patched early
      on the boot CPU before the GICv3 driver is initialized and hence before
      gic_read_iar() is ever called. Thus it is not necessary to use
      cpus_have_const_cap(), and alternative_has_cap() is equivalent.
      
      In addition, arm64's gic_read_iar() lives in irq-gic-v3.c purely for
      historical reasons. It was originally added prior to 32-bit arm support
      in commit:
      
        6d4e11c5 ("irqchip/gicv3: Workaround for Cavium ThunderX erratum 23154")
      
      When support for 32-bit arm was added, 32-bit arm's gic_read_iar()
      implementation was placed in <asm/arch_gicv3.h>, but the arm64 version
      was kept within irq-gic-v3.c as it depended on a static key local to
      irq-gic-v3.c and it was easier to add ifdeffery, which is what we did in
      commit:
      
        7936e914 ("irqchip/gic-v3: Refactor the arm64 specific parts")
      
      Subsequently the static key was replaced with a cpucap in commit:
      
        a4023f68 ("arm64: Add hypervisor safe helper for checking constant capabilities")
      
      Since that commit there has been no need to keep arm64's gic_read_iar()
      in irq-gic-v3.c.
      
      This patch replaces the use of cpus_have_const_cap() with
      alternative_has_cap_unlikely(), which will avoid generating code to test
      the system_cpucaps bitmap and should be better for all subsequent calls
      at runtime. For consistency, move the arm64-specific gic_read_iar()
      implementation over to arm64's <asm/arch_gicv3.h>. The
      ARM64_WORKAROUND_CAVIUM_23154 cpucap is added to cpucap_is_possible() so
      that code can be elided entirely when this is not possible.
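
      The relocated helper then reduces to a patched branch; a sketch of its
      shape in <asm/arch_gicv3.h> (the Cavium fallback name is illustrative):

        static inline u64 gic_read_iar(void)
        {
                if (alternative_has_cap_unlikely(ARM64_WORKAROUND_CAVIUM_23154))
                        return __gic_read_iar_cavium_thunderx();
                else
                        return read_sysreg_s(SYS_ICC_IAR1_EL1);
        }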
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_2645198 · 412cb380
      Mark Rutland authored
      We use cpus_have_const_cap() to check for ARM64_WORKAROUND_2645198 but
      this is not necessary and alternative_has_cap() would be preferable.
      
      For historical reasons, cpus_have_const_cap() is more complicated than
      it needs to be. Before cpucaps are finalized, it will perform a bitmap
      test of the system_cpucaps bitmap, and once cpucaps are finalized it
      will use an alternative branch. This used to be necessary to handle some
      race conditions in the window between cpucap detection and the
      subsequent patching of alternatives and static branches, where different
      branches could be out-of-sync with one another (or w.r.t. alternative
      sequences). Now that we use alternative branches instead of static
      branches, these are all patched atomically w.r.t. one another, and there
      are only a handful of cases that need special care in the window between
      cpucap detection and alternative patching.
      
      Due to the above, it would be nice to remove cpus_have_const_cap(), and
      migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
      or cpus_have_cap() depending on their requirements. This will
      remove redundant instructions and improve code generation, and will make
      it easier to determine how each callsite will behave before, during, and
      after alternative patching.
      
      The ARM64_WORKAROUND_2645198 cpucap is detected and patched before any
      userspace translation tables exist, and the workaround is only necessary
      when manipulating userspace translation tables which are in use. Thus it
      is not necessary to use cpus_have_const_cap(), and alternative_has_cap()
      is equivalent.
      
      This patch replaces the use of cpus_have_const_cap() with
      alternative_has_cap_unlikely(), which will avoid generating code to test
      the system_cpucaps bitmap and should be better for all subsequent calls
      at runtime. The ARM64_WORKAROUND_2645198 cpucap is added to
      cpucap_is_possible() so that code can be elided entirely when this is
      not possible, and redundant IS_ENABLED() checks are removed.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1742098 · 48b57d91
      Mark Rutland authored
      In elf_hwcap_fixup() we use cpus_have_const_cap() to check for
      ARM64_WORKAROUND_1742098, but this is not necessary and cpus_have_cap()
      would be preferable.
      
      For historical reasons, cpus_have_const_cap() is more complicated than
      it needs to be. Before cpucaps are finalized, it will perform a bitmap
      test of the system_cpucaps bitmap, and once cpucaps are finalized it
      will use an alternative branch. This used to be necessary to handle some
      race conditions in the window between cpucap detection and the
      subsequent patching of alternatives and static branches, where different
      branches could be out-of-sync with one another (or w.r.t. alternative
      sequences). Now that we use alternative branches instead of static
      branches, these are all patched atomically w.r.t. one another, and there
      are only a handful of cases that need special care in the window between
      cpucap detection and alternative patching.
      
      Due to the above, it would be nice to remove cpus_have_const_cap(), and
      migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
      or cpus_have_cap() depending on their requirements. This will
      remove redundant instructions and improve code generation, and will make
      it easier to determine how each callsite will behave before, during, and
      after alternative patching.
      
      The ARM64_WORKAROUND_1742098 cpucap is detected and patched before
      elf_hwcap_fixup() can run, and hence it is not necessary to use
      cpus_have_const_cap(). We run elf_hwcap_fixup() at most twice: once
      after finalizing system cpucaps, and potentially once more after
      detecting mismatched CPUs which support AArch32 at EL0. Due to this,
      it's not necessary to optimize for many calls to elf_hwcap_fixup(), and
      it's fine to use cpus_have_cap().
      
      This patch replaces the use of cpus_have_const_cap() with
      cpus_have_cap(), which will only generate the bitmap test and avoid
      generating an alternative sequence, resulting in slightly simpler and
      smaller code being generated. For consistency with other cpucaps, the
      ARM64_WORKAROUND_1742098 cpucap is added to cpucap_is_possible() so that
      code can be elided when this is not possible. However, as we only define
      compat_elf_hwcap2 when CONFIG_COMPAT=y, some ifdeffery is still required
      within user_feature_fixup() to avoid build errors when CONFIG_COMPAT=n.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1542419 · d1e40f82
      Mark Rutland authored
      We use cpus_have_const_cap() to check for ARM64_WORKAROUND_1542419 but
      this is not necessary and cpus_have_final_cap() would be preferable.
      
      For historical reasons, cpus_have_const_cap() is more complicated than
      it needs to be. Before cpucaps are finalized, it will perform a bitmap
      test of the system_cpucaps bitmap, and once cpucaps are finalized it
      will use an alternative branch. This used to be necessary to handle some
      race conditions in the window between cpucap detection and the
      subsequent patching of alternatives and static branches, where different
      branches could be out-of-sync with one another (or w.r.t. alternative
      sequences). Now that we use alternative branches instead of static
      branches, these are all patched atomically w.r.t. one another, and there
      are only a handful of cases that need special care in the window between
      cpucap detection and alternative patching.
      
      Due to the above, it would be nice to remove cpus_have_const_cap(), and
      migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
      or cpus_have_cap() depending on their requirements. This will
      remove redundant instructions and improve code generation, and will make
      it easier to determine how each callsite will behave before, during, and
      after alternative patching.
      
      The ARM64_WORKAROUND_1542419 cpucap is detected and patched before any
      userspace code can run, and both __do_compat_cache_op() and
      ctr_read_handler() are only reachable from exceptions taken from
      userspace. Thus it is not necessary for either to use
      cpus_have_const_cap(), and cpus_have_final_cap() is equivalent.
      
      This patch replaces the use of cpus_have_const_cap() with
      cpus_have_final_cap(), which will avoid generating code to test the
      system_cpucaps bitmap and should be better for all subsequent calls at
      runtime. Using cpus_have_final_cap() clearly documents that we do not
      expect this code to run before cpucaps are finalized, and will make it
      easier to spot issues if code is changed in future to allow these
      functions to be reached earlier.
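
      For reference, cpus_have_final_cap() enforces that expectation by
      refusing to run before finalization; a sketch of its shape:

        static __always_inline bool cpus_have_final_cap(int num)
        {
                if (system_capabilities_finalized())
                        return alternative_has_cap_unlikely(num);
                else
                        BUG();  /* used too early: caps not finalized */
        }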
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419 · 0a285dfe
      Mark Rutland authored
      In count_plts() and is_forbidden_offset_for_adrp() we use
      cpus_have_const_cap() to check for ARM64_WORKAROUND_843419, but this is
      not necessary and cpus_have_final_cap() would be preferable.
      
      For historical reasons, cpus_have_const_cap() is more complicated than
      it needs to be. Before cpucaps are finalized, it will perform a bitmap
      test of the system_cpucaps bitmap, and once cpucaps are finalized it
      will use an alternative branch. This used to be necessary to handle some
      race conditions in the window between cpucap detection and the
      subsequent patching of alternatives and static branches, where different
      branches could be out-of-sync with one another (or w.r.t. alternative
      sequences). Now that we use alternative branches instead of static
      branches, these are all patched atomically w.r.t. one another, and there
      are only a handful of cases that need special care in the window between
      cpucap detection and alternative patching.
      
      Due to the above, it would be nice to remove cpus_have_const_cap(), and
      migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
      or cpus_have_cap() depending on their requirements. This will
      remove redundant instructions and improve code generation, and will make
      it easier to determine how each callsite will behave before, during, and
      after alternative patching.
      
      It's not possible to load a module in the window between detecting the
      ARM64_WORKAROUND_843419 cpucap and patching alternatives. The module VA
      range limits are initialized much later in module_init_limits() which is
      a subsys_initcall, and module loading cannot happen before this. Hence
      it's not necessary for count_plts() or is_forbidden_offset_for_adrp() to
      use cpus_have_const_cap().
      
      This patch replaces the use of cpus_have_const_cap() with
      cpus_have_final_cap() which will avoid generating code to test the
      system_cpucaps bitmap and should be better for all subsequent calls at
      runtime. Using cpus_have_final_cap() clearly documents that we do not
      expect this code to run before cpucaps are finalized, and will make it
      easier to spot issues if code is changed in future to allow modules to
      be loaded earlier. The ARM64_WORKAROUND_843419 cpucap is added to
      cpucap_is_possible() so that code can be elided entirely when this is not
      possible, and redundant IS_ENABLED() checks are removed.
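
      A sketch of the resulting check (erratum 843419 only affects ADRP in
      the last two instruction slots of a 4K page):

        static inline bool is_forbidden_offset_for_adrp(void *place)
        {
                /* Safe: modules cannot load before cpucaps are finalized. */
                return cpus_have_final_cap(ARM64_WORKAROUND_843419) &&
                       ((u64)place & 0xfff) >= 0xff8;
        }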
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: Avoid cpus_have_const_cap() for ARM64_UNMAP_KERNEL_AT_EL0 · c2ef5f1e
      Mark Rutland authored
      In arm64_kernel_unmapped_at_el0() we use cpus_have_const_cap() to check
      for ARM64_UNMAP_KERNEL_AT_EL0, but this is only necessary so that
      arm64_get_bp_hardening_vector() and this_cpu_set_vectors() can run prior
      to alternatives being patched. Otherwise this is not necessary and
      alternative_has_cap_*() would be preferable.
      
      For historical reasons, cpus_have_const_cap() is more complicated than
      it needs to be. Before cpucaps are finalized, it will perform a bitmap
      test of the system_cpucaps bitmap, and once cpucaps are finalized it
      will use an alternative branch. This used to be necessary to handle some
      race conditions in the window between cpucap detection and the
      subsequent patching of alternatives and static branches, where different
      branches could be out-of-sync with one another (or w.r.t. alternative
      sequences). Now that we use alternative branches instead of static
      branches, these are all patched atomically w.r.t. one another, and there
      are only a handful of cases that need special care in the window between
      cpucap detection and alternative patching.
      
      Due to the above, it would be nice to remove cpus_have_const_cap(), and
      migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
      or cpus_have_cap() depending on their requirements. This will
      remove redundant instructions and improve code generation, and will make
      it easier to determine how each callsite will behave before, during, and
      after alternative patching.
      
      The ARM64_UNMAP_KERNEL_AT_EL0 cpucap is a system-wide feature that is
      detected and patched before any translation tables are created for
      userspace. In the window between detecting the ARM64_UNMAP_KERNEL_AT_EL0
      cpucap and patching alternatives, most users of
      arm64_kernel_unmapped_at_el0() do not need to know that the cpucap has
      been detected:
      
      * As KVM is initialized after cpucaps are finalized, no use of
        arm64_kernel_unmapped_at_el0() in the KVM code is reachable during
        this window.
      
      * The arm64_mm_context_get() function in arch/arm64/mm/context.c is only
        called after the SMMU driver is brought up after alternatives have
        been patched. Thus this can safely use cpus_have_final_cap() or
        alternative_has_cap_*().
      
        Similarly the asids_update_limit() function is called after
        alternatives have been patched as an arch_initcall, and this can
        safely use cpus_have_final_cap() or alternative_has_cap_*().
      
        Similarly we do not expect an ASID rollover to occur between cpucaps
        being detected and patching alternatives. Thus
        set_reserved_asid_bits() can safely use cpus_have_final_cap() or
        alternative_has_cap_*().
      
      * The __tlbi_user() and __tlbi_user_level() macros are not used during
        this window, and only need to invalidate additional entries once
        userspace translation tables have been active on a CPU. Thus these can
        safely use alternative_has_cap_*().
      
      * The xen_kernel_unmapped_at_usr() function is not used during this
        window as it is only used in a late_initcall. Thus this can safely use
        cpus_have_final_cap() or alternative_has_cap_*().
      
      * The arm64_get_meltdown_state() function is not used during this
        window. It is only used by cpu_show_meltdown() and KVM code, both
        of which are only used after cpucaps have been finalized. Thus this
        can safely use cpus_have_final_cap() or alternative_has_cap_*().
      
      * The tls_thread_switch() uses arm64_kernel_unmapped_at_el0() as an
        optimization to avoid zeroing tpidrro_el0 when KPTI is enabled
        and this will be trampled by the KPTI trampoline. It doesn't matter if
        this continues to zero the register during the window between
        detecting the cpucap and patching alternatives, so this can safely use
        alternative_has_cap_*().
      
      * The sdei_arch_get_entry_point() and do_sdei_event() functions aren't
        reachable at this time as the SDEI driver is registered later by
        acpi_init() -> acpi_ghes_init() -> sdei_init(), where acpi_init is a
        subsys_initcall. Thus these can safely use cpus_have_final_cap() or
        alternative_has_cap_*().
      
      * The uses under drivers/ aren't reachable at this time as the drivers
        are registered later:
      
        - TRBE is registered via module_init()
        - SMMUv3 is registered via module_driver()
        - SPE is registered via module_init()
      
      * The arm64_get_bp_hardening_vector() and this_cpu_set_vectors()
        functions need to run on boot CPUs prior to patching alternatives.
        As these are only called during the onlining of a CPU, it's fine to
        perform a system_cpucaps bitmap test using cpus_have_cap().
      
      This patch modifies this_cpu_set_vectors() to use cpus_have_cap(), and
      replaces all other uses of cpus_have_const_cap() with
      alternative_has_cap_unlikely(), which will avoid generating code to test
      the system_cpucaps bitmap and should be better for all subsequent calls
      at runtime. The ARM64_UNMAP_KERNEL_AT_EL0 cpucap is added to
      cpucap_is_possible() so that code can be elided entirely when this is
      not possible.
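
      With that, the common helper reduces to a single patched branch; a
      sketch:

        static __always_inline bool arm64_kernel_unmapped_at_el0(void)
        {
                return alternative_has_cap_unlikely(ARM64_UNMAP_KERNEL_AT_EL0);
        }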
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64} · a76521d1
      Mark Rutland authored
      In system_supports_{sve,sme,sme2,fa64}() we use cpus_have_const_cap() to
      check for the relevant cpucaps, but this is only necessary so that
      sve_setup() and sme_setup() can run prior to alternatives being patched,
      and otherwise alternative_has_cap_*() would be preferable.
      
      For historical reasons, cpus_have_const_cap() is more complicated than
      it needs to be. Before cpucaps are finalized, it will perform a bitmap
      test of the system_cpucaps bitmap, and once cpucaps are finalized it
      will use an alternative branch. This used to be necessary to handle some
      race conditions in the window between cpucap detection and the
      subsequent patching of alternatives and static branches, where different
      branches could be out-of-sync with one another (or w.r.t. alternative
      sequences). Now that we use alternative branches instead of static
      branches, these are all patched atomically w.r.t. one another, and there
      are only a handful of cases that need special care in the window between
      cpucap detection and alternative patching.
      
      Due to the above, it would be nice to remove cpus_have_const_cap(), and
      migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
      or cpus_have_cap() depending on their requirements. This will
      remove redundant instructions and improve code generation, and will make
      it easier to determine how each callsite will behave before, during, and
      after alternative patching.
      
      All of system_supports_{sve,sme,sme2,fa64}() will return false prior to
      system cpucaps being detected. In the window between system cpucaps being
      detected and patching alternatives, we need system_supports_sve() and
      system_supports_sme() to run to initialize SVE and SME properties, but
      all other users of system_supports_{sve,sme,sme2,fa64}() don't depend on
      the relevant cpucap becoming true until alternatives are patched:
      
      * No KVM code runs until after alternatives are patched, and so this can
        safely use cpus_have_final_cap() or alternative_has_cap_*().
      
      * The cpuid_cpu_online() callback in arch/arm64/kernel/cpuinfo.c is
        registered later from cpuinfo_regs_init() as a device_initcall, and so
        this can safely use cpus_have_final_cap() or alternative_has_cap_*().
      
      * The entry, signal, and ptrace code isn't reachable until userspace has
        run, and so this can safely use cpus_have_final_cap() or
        alternative_has_cap_*().
      
      * Currently perf_reg_validate() will un-reserve the PERF_REG_ARM64_VG
        pseudo-register before alternatives are patched, and before
        sve_setup() has run. If a sampling event is created early enough, this
        would allow perf_ext_reg_value() to sample (the as-yet uninitialized)
        thread_struct::vl[] prior to alternatives being patched.
      
        It would be preferable to defer this until alternatives are patched,
        and this can safely use alternative_has_cap_*().
      
      * The context-switch code will run during this window as part of
        stop_machine() used during alternatives_patch_all(), and potentially
        for other work if other kernel threads are created early. No threads
        require the use of SVE/SME/SME2/FA64 prior to alternatives being
        patched, and it would be preferable for the related context-switch
        logic to take effect after alternatives are patched so that this
        is guaranteed to see a consistent system-wide state (e.g. anything
        initialized by sve_setup() and sme_setup()).
      
        This can safely use alternative_has_cap_*().
      
      This patch replaces the use of cpus_have_const_cap() with
      alternative_has_cap_unlikely(), which will avoid generating code to test
      the system_cpucaps bitmap and should be better for all subsequent calls
      at runtime. The sve_setup() and sme_setup() functions are modified to
      use cpus_have_cap() directly so that they can observe the cpucaps being
      set prior to alternatives being patched.
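
      A sketch of the resulting split between the fast-path helper and the
      early setup path (the sve_setup() body is elided):

        /* Fast path: a branch patched via alternatives. */
        static inline bool system_supports_sve(void)
        {
                return alternative_has_cap_unlikely(ARM64_SVE);
        }

        void __init sve_setup(void)
        {
                /* Runs before alternatives are patched: test the bitmap. */
                if (!cpus_have_cap(ARM64_SVE))
                        return;
                /* ... probe vector lengths, set defaults, etc ... */
        }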
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: Avoid cpus_have_const_cap() for ARM64_SPECTRE_V2 · af645439
      Mark Rutland authored
      In arm64_apply_bp_hardening() we use cpus_have_const_cap() to check for
      ARM64_SPECTRE_V2, but this is not necessary and alternative_has_cap_*()
      would be preferable.
      
      For historical reasons, cpus_have_const_cap() is more complicated than
      it needs to be. Before cpucaps are finalized, it will perform a bitmap
      test of the system_cpucaps bitmap, and once cpucaps are finalized it
      will use an alternative branch. This used to be necessary to handle some
      race conditions in the window between cpucap detection and the
      subsequent patching of alternatives and static branches, where different
      branches could be out-of-sync with one another (or w.r.t. alternative
      sequences). Now that we use alternative branches instead of static
      branches, these are all patched atomically w.r.t. one another, and there
      are only a handful of cases that need special care in the window between
      cpucap detection and alternative patching.
      
      Due to the above, it would be nice to remove cpus_have_const_cap(), and
      migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
      or cpus_have_cap() depending on their requirements. This will
      remove redundant instructions and improve code generation, and will make
      it easier to determine how each callsite will behave before, during, and
      after alternative patching.
      
      The cpus_have_const_cap() check in arm64_apply_bp_hardening() is
      intended to avoid the overhead of looking up and invoking a per-cpu
      function pointer when no branch predictor hardening is required. The
      arm64_apply_bp_hardening() function itself is called in two distinct
      flows:
      
      1) When handling certain exceptions taken from EL0, where the PC could
         be a TTBR1 address and hence might have trained a branch predictor.
      
         As cpucaps are detected and alternatives are patched long before it
         is possible to execute userspace, it is not necessary to use
         cpus_have_const_cap() for these cases, and cpus_have_final_cap() or
         alternative_has_cap() would be preferable.
      
      2) When switching between tasks in check_and_switch_context().
      
         This can be called before cpucaps are detected and alternatives are
         patched, but this is long before the kernel mounts filesystems or
         accepts any input. At this stage the kernel hasn't loaded any secrets
         and there is no potential for hostile branch predictor training. Once
         cpucaps have been finalized and alternatives have been patched,
         switching tasks will invalidate any prior predictions. Hence it is
         not necessary to use cpus_have_const_cap() for this case.
      
      This patch replaces the use of cpus_have_const_cap() with
      alternative_has_cap_unlikely(), which will avoid generating code to test
      the system_cpucaps bitmap and should be better for all subsequent calls
      at runtime.
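
      A sketch of the resulting fast path, with the per-CPU lookup kept
      behind the patched branch:

        static inline void arm64_apply_bp_hardening(void)
        {
                struct bp_hardening_data *d;

                /* No-op unless ARM64_SPECTRE_V2 was detected. */
                if (!alternative_has_cap_unlikely(ARM64_SPECTRE_V2))
                        return;

                d = this_cpu_ptr(&bp_hardening_data);
                if (d->fn)
                        d->fn();
        }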
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>