- 08 Nov, 2023 1 commit
-
-
Douglas Anderson authored
In commit 44bd78dd ("irqchip/gic-v3: Disable pseudo NMIs on MediaTek devices w/ firmware issues") we added a method for detecting MediaTek devices with broken firmware and disabled pseudo-NMI. While that worked, it didn't address the problem at a deep enough level. The fundamental issue with this broken firmware is that it's not saving and restoring several important GICR registers. The current list is believed to be: * GICR_NUM_IPRIORITYR * GICR_CTLR * GICR_ISPENDR0 * GICR_ISACTIVER0 * GICR_NSACR Pseudo-NMI didn't work because it was the only thing (currently) in the kernel that relied on the broken registers, so forcing pseudo-NMI off was an effective fix. However, it could be observed that calling system_uses_irq_prio_masking() on these systems still returned "true". That caused confusion and led to the need for commit a07a5941 ("arm64: smp: avoid NMI IPIs with broken MediaTek FW"). It's worried that the incorrect value returned by system_uses_irq_prio_masking() on these systems will continue to confuse future developers. Let's fix the issue a little more completely by disabling IRQ priorities at a deeper level in the kernel. Once we do this we can revert some of the other bits of code dealing with this quirk. This includes a partial revert of commit 44bd78dd ("irqchip/gic-v3: Disable pseudo NMIs on MediaTek devices w/ firmware issues"). This isn't a full revert because it leaves some of the changes to the "quirks" structure around in case future code needs it. Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231107072651.v2.1.Ide945748593cffd8ff0feb9ae22b795935b944d6@changeidSigned-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
- 07 Nov, 2023 3 commits
-
-
Ilkka Koskinen authored
The driver used to truncate several 64-bit registers such as PMCEID[n] registers used to describe whether architectural and microarchitectural events in range 0x4000-0x401f exist. Due to discarding the bits, the driver made the events invisible, even if they existed. Moreover, PMCCFILTR and PMCR registers have additional bits in the upper 32 bits. This patch makes them available although they aren't currently used. Finally, functions handling PMXEVCNTR and PMXEVTYPER registers are removed as they not being used at all. Fixes: df29ddf4 ("arm64: perf: Abstract system register accesses away") Reported-by: Carl Worth <carl@os.amperecomputing.com> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Acked-by: Will Deacon <will@kernel.org> Closes: https://lore.kernel.org/.. Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20231102183012.1251410-1-ilkka@os.amperecomputing.comSigned-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Ilkka Koskinen authored
Coresight PMU driver didn't reject events meant for other PMUs. This caused some of the Core PMU events disappearing from the output of "perf list". In addition, trying to run e.g. $ perf stat -e r2 sleep 1 made Coresight PMU driver to handle the event instead of letting Core PMU driver to deal with it. Cc: stable@vger.kernel.org Fixes: e37dfd65 ("perf: arm_cspmu: Add support for ARM CoreSight PMU driver") Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Besar Wicaksono <bwicaksono@nvidia.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20231103001654.35565-1-ilkka@os.amperecomputing.comSigned-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Marielle Novastrider authored
Small typos in register and field names. Signed-off-by: Marielle Novastrider <marielle@novastrider.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20231031200838.55569-1-marielle@novastrider.comSigned-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
- 26 Oct, 2023 6 commits
-
-
Catalin Marinas authored
* for-next/cpus_have_const_cap: (38 commits) : cpus_have_const_cap() removal arm64: Remove cpus_have_const_cap() arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_REPEAT_TLBI arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_CAVIUM_23154 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_2645198 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1742098 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1542419 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419 arm64: Avoid cpus_have_const_cap() for ARM64_UNMAP_KERNEL_AT_EL0 arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64} arm64: Avoid cpus_have_const_cap() for ARM64_SPECTRE_V2 arm64: Avoid cpus_have_const_cap() for ARM64_SSBS arm64: Avoid cpus_have_const_cap() for ARM64_MTE arm64: Avoid cpus_have_const_cap() for ARM64_HAS_TLB_RANGE arm64: Avoid cpus_have_const_cap() for ARM64_HAS_WFXT arm64: Avoid cpus_have_const_cap() for ARM64_HAS_RNG arm64: Avoid cpus_have_const_cap() for ARM64_HAS_EPAN arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PAN arm64: Avoid cpus_have_const_cap() for ARM64_HAS_GIC_PRIO_MASKING arm64: Avoid cpus_have_const_cap() for ARM64_HAS_DIT ...
-
Catalin Marinas authored
* for-next/feat_lse128: : HWCAP for FEAT_LSE128 kselftest/arm64: add FEAT_LSE128 to hwcap test arm64: add FEAT_LSE128 HWCAP
-
Catalin Marinas authored
* for-next/feat_lrcpc3: : HWCAP for FEAT_LRCPC3 selftests/arm64: add HWCAP2_LRCPC3 test arm64: add FEAT_LRCPC3 HWCAP
-
Catalin Marinas authored
* for-next/feat_sve_b16b16: : Add support for FEAT_SVE_B16B16 (BFloat16) kselftest/arm64: Verify HWCAP2_SVE_B16B16 arm64/sve: Report FEAT_SVE_B16B16 to userspace
-
Catalin Marinas authored
Merge branches 'for-next/sve-remove-pseudo-regs', 'for-next/backtrace-ipi', 'for-next/kselftest', 'for-next/misc' and 'for-next/cpufeat-display-cores', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: perf: hisi: Fix use-after-free when register pmu fails drivers/perf: hisi_pcie: Initialize event->cpu only on success drivers/perf: hisi_pcie: Check the type first in pmu::event_init() perf/arm-cmn: Enable per-DTC counter allocation perf/arm-cmn: Rework DTC counters (again) perf/arm-cmn: Fix DTC domain detection drivers: perf: arm_pmuv3: Drop some unused arguments from armv8_pmu_init() drivers: perf: arm_pmuv3: Read PMMIR_EL1 unconditionally drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hisi_hns3_pmu uninit process drivers/perf: xgene: Use device_get_match_data() perf/amlogic: add missing MODULE_DEVICE_TABLE docs/perf: Add ampere_cspmu to toctree to fix a build warning perf: arm_cspmu: ampere_cspmu: Add support for Ampere SoC PMU perf: arm_cspmu: Support implementation specific validation perf: arm_cspmu: Support implementation specific filters perf: arm_cspmu: Split 64-bit write to 32-bit writes perf: arm_cspmu: Separate Arm and vendor module * for-next/sve-remove-pseudo-regs: : arm64/fpsimd: Remove the vector length pseudo registers arm64/sve: Remove SMCR pseudo register from cpufeature code arm64/sve: Remove ZCR pseudo register from cpufeature code * for-next/backtrace-ipi: : Add IPI for backtraces/kgdb, use NMI arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup arm64: smp: avoid NMI IPIs with broken MediaTek FW arm64: smp: Mark IPI globals as __ro_after_init arm64: kgdb: Implement kgdb_roundup_cpus() to enable pseudo-NMI roundup arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI arm64: smp: Add arch support for backtrace using pseudo-NMI arm64: smp: Remove dedicated wakeup IPI arm64: idle: Tag the arm64 idle functions as __cpuidle irqchip/gic-v3: Enable support for SGIs to act as NMIs * for-next/kselftest: : Various arm64 kselftest updates kselftest/arm64: Validate SVCR in streaming SVE stress test * for-next/misc: : Miscellaneous patches arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=n arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper clocksource/drivers/arm_arch_timer: limit XGene-1 workaround arm64: Remove system_uses_lse_atomics() arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unused arm64/mm: Hoist synchronization out of set_ptes() loop arm64: swiotlb: Reduce the default size if no ZONE_DMA bouncing needed * for-next/cpufeat-display-cores: : arm64 cpufeature display enabled cores arm64: cpufeature: Change DBM to display enabled cores arm64: cpufeature: Display the set of cores with a feature
-
Nathan Chancellor authored
Prior to LLVM 15.0.0, LLVM's integrated assembler would incorrectly byte-swap NOP when compiling for big-endian, and the resulting series of bytes happened to match the encoding of FNMADD S21, S30, S0, S0. This went unnoticed until commit: 34f66c4c ("arm64: Use a positive cpucap for FP/SIMD") Prior to that commit, the kernel would always enable the use of FPSIMD early in boot when __cpu_setup() initialized CPACR_EL1, and so usage of FNMADD within the kernel was not detected, but could result in the corruption of user or kernel FPSIMD state. After that commit, the instructions happen to trap during boot prior to FPSIMD being detected and enabled, e.g. | Unhandled 64-bit el1h sync exception on CPU0, ESR 0x000000001fe00000 -- ASIMD | CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc3-00013-g34f66c4c #1 | Hardware name: linux,dummy-virt (DT) | pstate: 400000c9 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : __pi_strcmp+0x1c/0x150 | lr : populate_properties+0xe4/0x254 | sp : ffffd014173d3ad0 | x29: ffffd014173d3af0 x28: fffffbfffddffcb8 x27: 0000000000000000 | x26: 0000000000000058 x25: fffffbfffddfe054 x24: 0000000000000008 | x23: fffffbfffddfe000 x22: fffffbfffddfe000 x21: fffffbfffddfe044 | x20: ffffd014173d3b70 x19: 0000000000000001 x18: 0000000000000005 | x17: 0000000000000010 x16: 0000000000000000 x15: 00000000413e7000 | x14: 0000000000000000 x13: 0000000000001bcc x12: 0000000000000000 | x11: 00000000d00dfeed x10: ffffd414193f2cd0 x9 : 0000000000000000 | x8 : 0101010101010101 x7 : ffffffffffffffc0 x6 : 0000000000000000 | x5 : 0000000000000000 x4 : 0101010101010101 x3 : 000000000000002a | x2 : 0000000000000001 x1 : ffffd014171f2988 x0 : fffffbfffddffcb8 | Kernel panic - not syncing: Unhandled exception | CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc3-00013-g34f66c4c #1 | Hardware name: linux,dummy-virt (DT) | Call trace: | dump_backtrace+0xec/0x108 | show_stack+0x18/0x2c | dump_stack_lvl+0x50/0x68 | dump_stack+0x18/0x24 | panic+0x13c/0x340 | el1t_64_irq_handler+0x0/0x1c | el1_abort+0x0/0x5c | el1h_64_sync+0x64/0x68 | __pi_strcmp+0x1c/0x150 | unflatten_dt_nodes+0x1e8/0x2d8 | __unflatten_device_tree+0x5c/0x15c | unflatten_device_tree+0x38/0x50 | setup_arch+0x164/0x1e0 | start_kernel+0x64/0x38c | __primary_switched+0xbc/0xc4 Restrict CONFIG_CPU_BIG_ENDIAN to a known good assembler, which is either GNU as or LLVM's IAS 15.0.0 and newer, which contains the linked commit. Closes: https://github.com/ClangBuiltLinux/linux/issues/1948 Link: https://github.com/llvm/llvm-project/commit/1379b150991f70a5782e9a143c2ba5308da1161cSigned-off-by: Nathan Chancellor <nathan@kernel.org> Cc: stable@vger.kernel.org Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20231025-disable-arm64-be-ias-b4-llvm-15-v1-1-b25263ed8b23@kernel.orgSigned-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
- 24 Oct, 2023 5 commits
-
-
Maria Yu authored
The counting of module PLTs has been broken when CONFIG_RANDOMIZE_BASE=n since commit: 3e35d303 ("arm64: module: rework module VA range selection") Prior to that commit, when CONFIG_RANDOMIZE_BASE=n, the kernel image and all modules were placed within a 128M region, and no PLTs were necessary for B or BL. Hence count_plts() and partition_branch_plt_relas() skipped handling B and BL when CONFIG_RANDOMIZE_BASE=n. After that commit, modules can be placed anywhere within a 2G window regardless of CONFIG_RANDOMIZE_BASE, and hence PLTs may be necessary for B and BL even when CONFIG_RANDOMIZE_BASE=n. Unfortunately that commit failed to update count_plts() and partition_branch_plt_relas() accordingly. Due to this, module_emit_plt_entry() may fail if an insufficient number of PLT entries have been reserved, resulting in modules failing to load with -ENOEXEC. Fix this by counting PLTs regardless of CONFIG_RANDOMIZE_BASE in count_plts() and partition_branch_plt_relas(). Fixes: 3e35d303 ("arm64: module: rework module VA range selection") Signed-off-by: Maria Yu <quic_aiquny@quicinc.com> Cc: <stable@vger.kernel.org> # 6.5.x Acked-by: Ard Biesheuvel <ardb@kernel.org> Fixes: 3e35d303 ("arm64: module: rework module VA range selection") Reviewed-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20231024010954.6768-1-quic_aiquny@quicinc.comSigned-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
James Morse authored
ACPI, irqchip and the architecture code all inspect the MADT enabled bit for a GICC entry in the MADT. The addition of an 'online capable' bit means all these sites need updating. Move the current checks behind a helper to make future updates easier. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> Acked-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/E1quv5D-00AeNJ-U8@rmk-PC.armlinux.org.ukSigned-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Junhao He authored
When we fail to register the uncore pmu, the pmu context may not been allocated. The error handing will call cpuhp_state_remove_instance() to call uncore pmu offline callback, which migrate the pmu context. Since that's liable to lead to some kind of use-after-free. Use cpuhp_state_remove_instance_nocalls() instead of cpuhp_state_remove_instance() so that the notifiers don't execute after the PMU device has been failed to register. Fixes: a0ab25cd ("drivers/perf: hisi: Add support for HiSilicon PA PMU driver") FIxes: 3bf30882 ("drivers/perf: hisi: Add support for HiSilicon SLLC PMU driver") Signed-off-by: Junhao He <hejunhao3@huawei.com> Link: https://lore.kernel.org/r/20231024113630.13472-1-hejunhao3@huawei.comSigned-off-by: Will Deacon <will@kernel.org>
-
Yicong Yang authored
Initialize the event->cpu only on success. To be more reasonable and keep consistent with other PMUs. Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20231024092954.42297-3-yangyicong@huawei.comSigned-off-by: Will Deacon <will@kernel.org>
-
Yicong Yang authored
Check whether the event type matches the PMU type firstly in pmu::event_init() before touching the event. Otherwise we'll change the events of others and lead to incorrect results. Since in perf_init_event() we may call every pmu's event_init() in a certain case, we should not modify the event if it's not ours. Fixes: 8404b0fb ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU") Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20231024092954.42297-2-yangyicong@huawei.comSigned-off-by: Will Deacon <will@kernel.org>
-
- 23 Oct, 2023 5 commits
-
-
Jeremy Linton authored
Now that we have the ability to display the list of cores with a feature when its selectivly enabled, lets convert DBM to use that as well. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Link: https://lore.kernel.org/r/20231017052322.1211099-3-jeremy.linton@arm.comSigned-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Jeremy Linton authored
The AMU feature can be enabled on a subset of the cores in a system. Because of that, it prints a message for each core as it is detected. This becomes tedious when there are hundreds of cores. Instead, for CPU features which can be enabled on a subset of the present cores, lets wait until update_cpu_capabilities() and print the subset of cores the feature was enabled on. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Tested-by: Ionela Voinescu <ionela.voinescu@arm.com> Reviewed-by: Punit Agrawal <punit.agrawal@bytedance.com> Tested-by: Punit Agrawal <punit.agrawal@bytedance.com> Link: https://lore.kernel.org/r/20231017052322.1211099-2-jeremy.linton@arm.comSigned-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Robin Murphy authored
Finally enable independent per-DTC-domain counter allocation, except on CMN-600 where we still need to cope with not knowing the domain topology and thus keep counter indices sychronised across domains. This allows users to simultaneously count up to 8 targeted events per domain, rather than 8 globally, for up to 4x wider coverage on maximum configurations. Even though this now looks deceptively simple, I stand by my previous assertion that it was a flippin' nightmare to implement; all the real head-scratchers are hidden in the foundations in the previous patch... Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/849f65566582cb102c6d0843d0f26e231180f8ac.1697824215.git.robin.murphy@arm.comSigned-off-by: Will Deacon <will@kernel.org>
-
Robin Murphy authored
The bitmap-based scheme for tracking DTC counter usage turns out to be a complete dead-end for its imagined purpose, since by the time we have to keep track of a per-DTC counter index anyway, we already have enough information to make the bitmap itself redundant. Revert the remains of it back to almost the original scheme, but now expanded to track per-DTC indices, in preparation for making use of them in anger. Note that since cycle count events always use a dedicated counter on a single DTC, we reuse the field to encode their DTC index directly. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Link: https://lore.kernel.org/r/5f6ade76b47f033836d7a36c03555da896dfb4a3.1697824215.git.robin.murphy@arm.comSigned-off-by: Will Deacon <will@kernel.org>
-
Robin Murphy authored
It transpires that dtm_unit_info is another register which got shuffled in CMN-700 without me noticing. Fix that in a way which also proactively fixes the fragile laziness of its consumer, just in case any further fields ever get added alongside dtc_domain. Fixes: 23760a01 ("perf/arm-cmn: Add CMN-700 support") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Link: https://lore.kernel.org/r/3076ee83d0554f6939fbb6ee49ab2bdb28d8c7ee.1697824215.git.robin.murphy@arm.comSigned-off-by: Will Deacon <will@kernel.org>
-
- 19 Oct, 2023 3 commits
-
-
Anshuman Khandual authored
All the PMU init functions want the default sysfs attribute groups, and so these all call armv8_pmu_init_nogroups() helper, with none of them calling armv8_pmu_init() directly. When we introduced armv8_pmu_init_nogroups() in the commit e424b179 ("arm64: perf: Refactor PMU init callbacks") ... we thought that we might need custom attribute groups in future, but as we evidently haven't, we can remove the option. This patch folds armv8_pmu_init_nogroups() into armv8_pmu_init(), removing the ability to use custom attribute groups and simplifying the code. CC: James Clark <james.clark@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20231016025436.1368945-1-anshuman.khandual@arm.comSigned-off-by: Will Deacon <will@kernel.org>
-
Anshuman Khandual authored
Currently the PMUv3 driver only reads PMMIR_EL1 if the PMU implements FEAT_PMUv3p4 and the STALL_SLOT event, but the check for STALL_SLOT event isn't necessary and can be removed. The check for STALL_SLOT event was introduced with the read of PMMIR_EL1 in commit f5be3a61 ("arm64: perf: Add support caps under sysfs") When this logic was written, the ARM ARM said: | If STALL_SLOT is not implemented, it is IMPLEMENTATION DEFINED whether | the PMMIR System registers are implemented. ... and thus the driver had to check for STALL_SLOT event to verify that PMMIR_EL1 was implemented and accesses to PMMIR_EL1 would not be UNDEFINED. Subsequently, the architecture was retrospectively tightened to require that any FEAT_PMUv3p4 implementation implements PMMIR_EL1. Since the G.b release of the ARM ARM, the wording regarding STALL_SLOT event has been removed, and the description of PMMIR_EL1 says: | This register is present only when FEAT_PMUv3p4 is implemented. Drop the unnecessary check for STALL_SLOT event when reading PMMIR_EL1. Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20231013024354.1289070-1-anshuman.khandual@arm.comSigned-off-by: Will Deacon <will@kernel.org>
-
Hao Chen authored
When tearing down a 'hisi_hns3' PMU, we mistakenly run the CPU hotplug callbacks after the device has been unregistered, leading to fireworks when we try to execute empty function callbacks within the driver: | Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 | CPU: 0 PID: 15 Comm: cpuhp/0 Tainted: G W O 5.12.0-rc4+ #1 | Hardware name: , BIOS KpxxxFPGA 1P B600 V143 04/22/2021 | pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--) | pc : perf_pmu_migrate_context+0x98/0x38c | lr : perf_pmu_migrate_context+0x94/0x38c | | Call trace: | perf_pmu_migrate_context+0x98/0x38c | hisi_hns3_pmu_offline_cpu+0x104/0x12c [hisi_hns3_pmu] Use cpuhp_state_remove_instance_nocalls() instead of cpuhp_state_remove_instance() so that the notifiers don't execute after the PMU device has been unregistered. Fixes: 66637ab1 ("drivers/perf: hisi: add driver for HNS3 PMU") Signed-off-by: Hao Chen <chenhao418@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20231019091352.998964-1-shaojijie@huawei.com [will: Rewrote commit message] Signed-off-by: Will Deacon <will@kernel.org>
-
- 18 Oct, 2023 3 commits
-
-
Andre Przywara authored
The AppliedMicro XGene-1 CPU has an erratum where the timer condition would only consider TVAL, not CVAL. We currently apply a workaround when seeing the PartNum field of MIDR_EL1 being 0x000, under the assumption that this would match only the XGene-1 CPU model. However even the Ampere eMAG (aka XGene-3) uses that same part number, and only differs in the "Variant" and "Revision" fields: XGene-1's MIDR is 0x500f0000, our eMAG reports 0x503f0002. Experiments show the latter doesn't show the faulty behaviour. Increase the specificity of the check to only consider partnum 0x000 and variant 0x00, to exclude the Ampere eMAG. Fixes: 012f1885 ("clocksource/drivers/arm_arch_timer: Work around broken CVAL implementations") Reported-by: Ross Burton <ross.burton@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20231016153127.116101-1-andre.przywara@arm.comSigned-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Gavin Shan authored
There are two variants of system_uses_lse_atomics(), depending on CONFIG_ARM64_LSE_ATOMICS. The function isn't called anywhere when CONFIG_ARM64_LSE_ATOMICS is disabled. It can be directly replaced by alternative_has_cap_likely(ARM64_HAS_LSE_ATOMICS) when the kernel option is enabled. No need to keep system_uses_lse_atomics() and just remove it. Signed-off-by: Gavin Shan <gshan@redhat.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20231017005036.334067-1-gshan@redhat.comSigned-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Catalin Marinas authored
This argument is not used by the arm64 implementation. Mark it as __always_unused and also remove the unnecessary 'addr' increment in set_ptes(). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310140531.BQQwt3NQ-lkp@intel.com/ Cc: Will Deacon <will@kernel.org> Tested-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/ZS6EvMiJ0QF5INkv@arm.com
-
- 17 Oct, 2023 2 commits
-
-
Rob Herring authored
Use preferred device_get_match_data() instead of of_match_device() and acpi_match_device() to get the driver match data. With this, adjust the includes to explicitly include the correct headers. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20231009172923.2457844-14-robh@kernel.orgSigned-off-by: Will Deacon <will@kernel.org>
-
Marek Szyprowski authored
Add missing MODULE_DEVICE_TABLE macro to let this driver to be automatically loaded as module. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20231012103543.3381326-1-m.szyprowski@samsung.comSigned-off-by: Will Deacon <will@kernel.org>
-
- 16 Oct, 2023 12 commits
-
-
Ryan Roberts authored
set_ptes() sets a physically contiguous block of memory (which all belongs to the same folio) to a contiguous block of ptes. The arm64 implementation of this previously just looped, operating on each individual pte. But the __sync_icache_dcache() and mte_sync_tags() operations can both be hoisted out of the loop so that they are performed once for the contiguous set of pages (which may be less than the whole folio). This should result in minor performance gains. __sync_icache_dcache() already acts on the whole folio, and sets a flag in the folio so that it skips duplicate calls. But by hoisting the call, all the pte testing is done only once. mte_sync_tags() operates on each individual page with its own loop. But by passing the number of pages explicitly, we can rely solely on its loop and do the checks only once. This approach also makes it robust for the future, rather than assuming if a head page of a compound page is being mapped, then the whole compound page is being mapped, instead we explicitly know how many pages are being mapped. The old assumption may not continue to hold once the "anonymous large folios" feature is merged. Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20231005140730.2191134-1-ryan.roberts@arm.comSigned-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Mark Rutland authored
There are no longer any users of cpus_have_const_cap(), and therefore it can be removed. Remove cpus_have_const_cap(). At the same time, remove __cpus_have_const_cap(), as this is a trivial wrapper of alternative_has_cap_unlikely(), which can be used directly instead. The comment for __system_matches_cap() is updated to no longer refer to cpus_have_const_cap(). As we have a number of ways to check the cpucaps, the specific suggestions are removed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Kristina Martsenko <kristina.martsenko@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Mark Rutland authored
In arch_tlbbatch_should_defer() we use cpus_have_const_cap() to check for ARM64_WORKAROUND_REPEAT_TLBI, but this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The cpus_have_const_cap() check in arch_tlbbatch_should_defer() is an optimization to avoid some redundant work when the ARM64_WORKAROUND_REPEAT_TLBI cpucap is detected and forces the immediate use of TLBI + DSB ISH. In the window between detecting the ARM64_WORKAROUND_REPEAT_TLBI cpucap and patching alternatives this is not a big concern and there's no need to optimize this window at the expsense of subsequent usage at runtime. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. The ARM64_WORKAROUND_REPEAT_TLBI cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible without requiring ifdeffery or IS_ENABLED() checks at each usage. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Mark Rutland authored
In has_useable_cnp() we use cpus_have_const_cap() to check for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP, but this is not necessary and cpus_have_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. We use has_useable_cnp() to determine whether we have the system-wide ARM64_HAS_CNP cpucap. Due to the structure of the cpufeature code, we call has_useable_cnp() in two distinct cases: 1) When finalizing system capabilities, setup_system_capabilities() will call has_useable_cnp() with SCOPE_SYSTEM to determine whether all CPUs have the feature. This is called after we've detected any local cpucaps including ARM64_WORKAROUND_NVIDIA_CARMEL_CNP, but prior to patching alternatives. If the ARM64_WORKAROUND_NVIDIA_CARMEL_CNP was detected, we will not detect ARM64_HAS_CNP. 2) After finalizing system capabilties, verify_local_cpu_capabilities() will call has_useable_cnp() with SCOPE_LOCAL_CPU to verify that CPUs have CNP if we previously detected it. Note that if ARM64_WORKAROUND_NVIDIA_CARMEL_CNP was detected, we will not have detected ARM64_HAS_CNP. For case 1 we must check the system_cpucaps bitmap as this occurs prior to patching the alternatives. For case 2 we'll only call has_useable_cnp() once per subsequent onlining of a CPU, and as this isn't a fast path it's not necessary to optimize for this case. This patch replaces the use of cpus_have_const_cap() with cpus_have_cap(), which will only generate the bitmap test and avoid generating an alternative sequence, resulting in slightly simpler annd smaller code being generated. The ARM64_WORKAROUND_NVIDIA_CARMEL_CNP cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Mark Rutland authored
In gic_read_iar() we use cpus_have_const_cap() to check for ARM64_WORKAROUND_CAVIUM_23154 but this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_WORKAROUND_CAVIUM_23154 cpucap is detected and patched early on the boot CPU before the GICv3 driver is initialized and hence before gic_read_iar() is ever called. Thus it is not necessary to use cpus_have_const_cap(), and alternative_has_cap() is equivalent. In addition, arm64's gic_read_iar() lives in irq-gic-v3.c purely for historical reasons. It was originally added prior to 32-bit arm support in commit: 6d4e11c5 ("irqchip/gicv3: Workaround for Cavium ThunderX erratum 23154") When support for 32-bit arm was added, 32-bit arm's gic_read_iar() implementation was placed in <asm/arch_gicv3.h>, but the arm64 version was kept within irq-gic-v3.c as it depended on a static key local to irq-gic-v3.c and it was easier to add ifdeffery, which is what we did in commit: 7936e914 ("irqchip/gic-v3: Refactor the arm64 specific parts") Subsequently the static key was replaced with a cpucap in commit: a4023f68 ("arm64: Add hypervisor safe helper for checking constant capabilities") Since that commit there has been no need to keep arm64's gic_read_iar() in irq-gic-v3.c. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. For consistency, move the arm64-specific gic_read_iar() implementation over to arm64's <asm/arch_gicv3.h>. The ARM64_WORKAROUND_CAVIUM_23154 cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Mark Rutland authored
We use cpus_have_const_cap() to check for ARM64_WORKAROUND_2645198 but this is not necessary and alternative_has_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_WORKAROUND_2645198 cpucap is detected and patched before any userspace translation table exist, and the workaround is only necessary when manipulating usrspace translation tables which are in use. Thus it is not necessary to use cpus_have_const_cap(), and alternative_has_cap() is equivalent. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. The ARM64_WORKAROUND_2645198 cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible, and redundant IS_ENABLED() checks are removed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Mark Rutland authored
In elf_hwcap_fixup() we use cpus_have_const_cap() to check for ARM64_WORKAROUND_1742098, but this is not necessary and cpus_have_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_WORKAROUND_1742098 cpucap is detected and patched before elf_hwcap_fixup() can run, and hence it is not necessary to use cpus_have_const_cap(). We run cpus_have_const_cap() at most twice: once after finalizing system cpucaps, and potentially once more after detecting mismatched CPUs which support AArch32 at EL0. Due to this, it's not necessary to optimize for many calls to elf_hwcap_fixup(), and it's fine to use cpus_have_cap(). This patch replaces the use of cpus_have_const_cap() with cpus_have_cap(), which will only generate the bitmap test and avoid generating an alternative sequence, resulting in slightly simpler annd smaller code being generated. For consistenct with other cpucaps, the ARM64_WORKAROUND_1742098 cpucap is added to cpucap_is_possible() so that code can be elided when this is not possible. However, as we only define compat_elf_hwcap2 when CONFIG_COMPAT=y, some ifdeffery is still required within user_feature_fixup() to avoid build errors when CONFIG_COMPAT=n. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Mark Rutland authored
We use cpus_have_const_cap() to check for ARM64_WORKAROUND_1542419 but this is not necessary and cpus_have_final_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_WORKAROUND_1542419 cpucap is detected and patched before any userspace code can run, and the both __do_compat_cache_op() and ctr_read_handler() are only reachable from exceptions taken from userspace. Thus it is not necessary for either to use cpus_have_const_cap(), and cpus_have_final_cap() is equivalent. This patch replaces the use of cpus_have_const_cap() with cpus_have_final_cap(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Using cpus_have_final_cap() clearly documents that we do not expect this code to run before cpucaps are finalized, and will make it easier to spot issues if code is changed in future to allow these functions to be reached earlier. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Mark Rutland authored
In count_plts() and is_forbidden_offset_for_adrp() we use cpus_have_const_cap() to check for ARM64_WORKAROUND_843419, but this is not necessary and cpus_have_final_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. It's not possible to load a module in the window between detecting the ARM64_WORKAROUND_843419 cpucap and patching alternatives. The module VA range limits are initialized much later in module_init_limits() which is a subsys_initcall, and module loading cannot happen before this. Hence it's not necessary for count_plts() or is_forbidden_offset_for_adrp() to use cpus_have_const_cap(). This patch replaces the use of cpus_have_const_cap() with cpus_have_final_cap() which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Using cpus_have_final_cap() clearly documents that we do not expect this code to run before cpucaps are finalized, and will make it easier to spot issues if code is changed in future to allow modules to be loaded earlier. The ARM64_WORKAROUND_843419 cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible, and redundant IS_ENABLED() checks are removed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Mark Rutland authored
In arm64_kernel_unmapped_at_el0() we use cpus_have_const_cap() to check for ARM64_UNMAP_KERNEL_AT_EL0, but this is only necessary so that arm64_get_bp_hardening_vector() and this_cpu_set_vectors() can run prior to alternatives being patched. Otherwise this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_UNMAP_KERNEL_AT_EL0 cpucap is a system-wide feature that is detected and patched before any translation tables are created for userspace. In the window between detecting the ARM64_UNMAP_KERNEL_AT_EL0 cpucap and patching alternatives, most users of arm64_kernel_unmapped_at_el0() do not need to know that the cpucap has been detected: * As KVM is initialized after cpucaps are finalized, no usaef of arm64_kernel_unmapped_at_el0() in the KVM code is reachable during this window. * The arm64_mm_context_get() function in arch/arm64/mm/context.c is only called after the SMMU driver is brought up after alternatives have been patched. Thus this can safely use cpus_have_final_cap() or alternative_has_cap_*(). Similarly the asids_update_limit() function is called after alternatives have been patched as an arch_initcall, and this can safely use cpus_have_final_cap() or alternative_has_cap_*(). Similarly we do not expect an ASID rollover to occur between cpucaps being detected and patching alternatives. Thus set_reserved_asid_bits() can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The __tlbi_user() and __tlbi_user_level() macros are not used during this window, and only need to invalidate additional entries once userspace translation tables have been active on a CPU. Thus these can safely use alternative_has_cap_*(). * The xen_kernel_unmapped_at_usr() function is not used during this window as it is only used in a late_initcall. Thus this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The arm64_get_meltdown_state() function is not used during this window. It only used by arm64_get_meltdown_state() and KVM code, both of which are only used after cpucaps have been finalized. Thus this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The tls_thread_switch() uses arm64_kernel_unmapped_at_el0() as an optimization to avoid zeroing tpidrro_el0 when KPTI is enabled and this will be trampled by the KPTI trampoline. It doesn't matter if this continues to zero the register during the window between detecting the cpucap and patching alternatives, so this can safely use alternative_has_cap_*(). * The sdei_arch_get_entry_point() and do_sdei_event() functions aren't reachable at this time as the SDEI driver is registered later by acpi_init() -> acpi_ghes_init() -> sdei_init(), where acpi_init is a subsys_initcall. Thus these can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The uses under drivers/ aren't reachable at this time as the drivers are registered later: - TRBE is registered via module_init() - SMMUv3 is registred via module_driver() - SPE is registred via module_init() * The arm64_get_bp_hardening_vector() and this_cpu_set_vectors() functions need to run on boot CPUs prior to patching alternatives. As these are only called during the onlining of a CPU, it's fine to perform a system_cpucaps bitmap test using cpus_have_cap(). This patch modifies this_cpu_set_vectors() to use cpus_have_cap(), and replaced all other use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. The ARM64_UNMAP_KERNEL_AT_EL0 cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Mark Rutland authored
In system_supports_{sve,sme,sme2,fa64}() we use cpus_have_const_cap() to check for the relevant cpucaps, but this is only necessary so that sve_setup() and sme_setup() can run prior to alternatives being patched, and otherwise alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. All of system_supports_{sve,sme,sme2,fa64}() will return false prior to system cpucaps being detected. In the window between system cpucaps being detected and patching alternatives, we need system_supports_sve() and system_supports_sme() to run to initialize SVE and SME properties, but all other users of system_supports_{sve,sme,sme2,fa64}() don't depend on the relevant cpucap becoming true until alternatives are patched: * No KVM code runs until after alternatives are patched, and so this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The cpuid_cpu_online() callback in arch/arm64/kernel/cpuinfo.c is registered later from cpuinfo_regs_init() as a device_initcall, and so this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The entry, signal, and ptrace code isn't reachable until userspace has run, and so this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * Currently perf_reg_validate() will un-reserve the PERF_REG_ARM64_VG pseudo-register before alternatives are patched, and before sve_setup() has run. If a sampling event is created early enough, this would allow perf_ext_reg_value() to sample (the as-yet uninitialized) thread_struct::vl[] prior to alternatives being patched. It would be preferable to defer this until alternatives are patched, and this can safely use alternative_has_cap_*(). * The context-switch code will run during this window as part of stop_machine() used during alternatives_patch_all(), and potentially for other work if other kernel threads are created early. No threads require the use of SVE/SME/SME2/FA64 prior to alternatives being patched, and it would be preferable for the related context-switch logic to take effect after alternatives are patched so that ths is guaranteed to see a consistent system-wide state (e.g. anything initialized by sve_setup() and sme_setup(). This can safely ues alternative_has_cap_*(). This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. The sve_setup() and sme_setup() functions are modified to use cpus_have_cap() directly so that they can observe the cpucaps being set prior to alternatives being patched. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Mark Rutland authored
In arm64_apply_bp_hardening() we use cpus_have_const_cap() to check for ARM64_SPECTRE_V2 , but this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The cpus_have_const_cap() check in arm64_apply_bp_hardening() is intended to avoid the overhead of looking up and invoking a per-cpu function pointer when no branch predictor hardening is required. The arm64_apply_bp_hardening() function itself is called in two distinct flows: 1) When handling certain exceptions taken from EL0, where the PC could be a TTBR1 address and hence might have trained a branch predictor. As cpucaps are detected and alternatives are patched long before it is possible to execute userspace, it is not necessary to use cpus_have_const_cap() for these cases, and cpus_have_final_cap() or alternative_has_cap() would be preferable. 2) When switching between tasks in check_and_switch_context(). This can be called before cpucaps are detected and alternatives are patched, but this is long before the kernel mounts filesystems or accepts any input. At this stage the kernel hasn't loaded any secrets and there is no potential for hostile branch predictor training. Once cpucaps have been finalized and alternatives have been patched, switching tasks will invalidate any prior predictions. Hence it is not necessary to use cpus_have_const_cap() for this case. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-