Commit 356a1adc authored by Linus Torvalds

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:

 - Support for including MTE tags in ELF coredumps

 - Instruction encoder updates, including fixes to 64-bit immediate
   generation and support for the LSE atomic instructions

 - Improvements to kselftests for MTE and fpsimd

 - Symbol aliasing and linker script cleanups

 - Reduce instruction cache maintenance performed for user mappings
   created using contiguous PTEs

 - Support for the new "asymmetric" MTE mode, where stores are checked
   asynchronously but loads are checked synchronously

 - Support for the latest pointer authentication algorithm ("QARMA3")

 - Support for the DDR PMU present in the Marvell CN10K platform

 - Support for the CPU PMU present in the Apple M1 platform

 - Use the RNDR instruction for arch_get_random_{int,long}()

 - Update our copy of the Arm optimised string routines for str{n}cmp()

 - Fix signal frame generation for CPUs which have foolishly elected to
   avoid building in support for the fpsimd instructions

 - Workaround for Marvell GICv3 erratum #38545

 - Clarification to our Documentation (booting reqs. and MTE prctl())

 - Miscellaneous cleanups and minor fixes

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (90 commits)
  docs: sysfs-devices-system-cpu: document "asymm" value for mte_tcf_preferred
  arm64/mte: Remove asymmetric mode from the prctl() interface
  arm64: Add cavium_erratum_23154_cpus missing sentinel
  perf/marvell: Fix !CONFIG_OF build for CN10K DDR PMU driver
  arm64: mm: Drop 'const' from conditional arm64_dma_phys_limit definition
  Documentation: vmcoreinfo: Fix htmldocs warning
  kasan: fix a missing header include of static_keys.h
  drivers/perf: Add Apple icestorm/firestorm CPU PMU driver
  drivers/perf: arm_pmu: Handle 47 bit counters
  arm64: perf: Consistently make all event numbers as 16-bits
  arm64: perf: Expose some Armv9 common events under sysfs
  perf/marvell: cn10k DDR perf event core ownership
  perf/marvell: cn10k DDR perfmon event overflow handling
  perf/marvell: CN10k DDR performance monitor support
  dt-bindings: perf: marvell: cn10k ddr performance monitor
  arm64: clean up tools Makefile
  perf/arm-cmn: Update watchpoint format
  perf/arm-cmn: Hide XP PUB events for CMN-600
  arm64: drop unused includes of <linux/personality.h>
  arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones
  ...
parents 9d8e7007 641d8041
......@@ -662,6 +662,7 @@ Description: Preferred MTE tag checking mode
================ ==============================================
"sync" Prefer synchronous mode
"asymm" Prefer asymmetric mode
"async" Prefer asynchronous mode
================ ==============================================
......
......@@ -494,6 +494,14 @@ architecture which is used to lookup the page-tables for the Virtual
addresses in the higher VA range (refer to ARMv8 ARM document for
more details).
MODULES_VADDR|MODULES_END|VMALLOC_START|VMALLOC_END|VMEMMAP_START|VMEMMAP_END
-----------------------------------------------------------------------------
Used to get the correct ranges:
MODULES_VADDR ~ MODULES_END-1 : Kernel module space.
VMALLOC_START ~ VMALLOC_END-1 : vmalloc() / ioremap() space.
VMEMMAP_START ~ VMEMMAP_END-1 : vmemmap region, used for struct page array.
arm
===
......
......@@ -10,9 +10,9 @@ This document is based on the ARM booting document by Russell King and
is relevant to all public releases of the AArch64 Linux kernel.
The AArch64 exception model is made up of a number of exception levels
(EL0 - EL3), with EL0 and EL1 having a secure and a non-secure
counterpart. EL2 is the hypervisor level and exists only in non-secure
mode. EL3 is the highest priority level and exists only in secure mode.
(EL0 - EL3), with EL0, EL1 and EL2 having a secure and a non-secure
counterpart. EL2 is the hypervisor level, EL3 is the highest priority
level and exists only in secure mode. Both are architecturally optional.
For the purposes of this document, we will use the term `boot loader`
simply to define all software that executes on the CPU(s) before control
......@@ -167,8 +167,8 @@ Before jumping into the kernel, the following conditions must be met:
All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError,
IRQ and FIQ).
The CPU must be in either EL2 (RECOMMENDED in order to have access to
the virtualisation extensions) or non-secure EL1.
The CPU must be in non-secure state, either in EL2 (RECOMMENDED in order
to have access to the virtualisation extensions), or in EL1.
- Caches, MMUs
......
......@@ -259,6 +259,11 @@ HWCAP2_RPRES
Functionality implied by ID_AA64ISAR2_EL1.RPRES == 0b0001.
HWCAP2_MTE3
Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0011, as described
by Documentation/arm64/memory-tagging-extension.rst.
4. Unused AT_HWCAP bits
-----------------------
......
......@@ -76,6 +76,9 @@ configurable behaviours:
with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting
address is unknown).
- *Asymmetric* - Reads are handled as for synchronous mode while writes
are handled as for asynchronous mode.
The user can select the above modes, per thread, using the
``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where ``flags``
contains any number of the following values in the ``PR_MTE_TCF_MASK``
......@@ -91,8 +94,9 @@ mode is specified, the program will run in that mode. If multiple
modes are specified, the mode is selected as described in the "Per-CPU
preferred tag checking modes" section below.
The current tag check fault mode can be read using the
``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call.
The current tag check fault configuration can be read using the
``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call. If
multiple modes were requested then all will be reported.
Tag checking can also be disabled for a user thread by setting the
``PSTATE.TCO`` bit with ``MSR TCO, #1``.
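For illustration only (this snippet is not part of the merge), a minimal
userspace sketch of the prctl() interface described above, assuming the
``PR_MTE_*`` and ``PR_TAGGED_ADDR_ENABLE`` constants from ``<linux/prctl.h>``::
    #include <stdio.h>
    #include <sys/prctl.h>
    #include <linux/prctl.h>

    int main(void)
    {
            /* Enable the tagged address ABI, synchronous tag check faults
             * and all non-zero tags in the randomly generated set. */
            if (prctl(PR_SET_TAGGED_ADDR_CTRL,
                      PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC |
                      (0xfffe << PR_MTE_TAG_SHIFT),
                      0, 0, 0)) {
                    perror("prctl(PR_SET_TAGGED_ADDR_CTRL)");
                    return 1;
            }

            /* Read the configuration back; several TCF bits may be set. */
            long ctrl = prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0);
            printf("sync=%d async=%d\n",
                   !!(ctrl & PR_MTE_TCF_SYNC), !!(ctrl & PR_MTE_TCF_ASYNC));
            return 0;
    }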
......@@ -139,18 +143,25 @@ tag checking mode as the CPU's preferred tag checking mode.
The preferred tag checking mode for each CPU is controlled by
``/sys/devices/system/cpu/cpu<N>/mte_tcf_preferred``, to which a
privileged user may write the value ``async`` or ``sync``. The default
preferred mode for each CPU is ``async``.
privileged user may write the value ``async``, ``sync`` or ``asymm``. The
default preferred mode for each CPU is ``async``.
To allow a program to potentially run in the CPU's preferred tag
checking mode, the user program may set multiple tag check fault mode
bits in the ``flags`` argument to the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
flags, 0, 0, 0)`` system call. If the CPU's preferred tag checking
mode is in the task's set of provided tag checking modes (this will
always be the case at present because the kernel only supports two
tag checking modes, but future kernels may support more modes), that
mode will be selected. Otherwise, one of the modes in the task's mode
set will be selected in a currently unspecified manner.
flags, 0, 0, 0)`` system call. If both synchronous and asynchronous
modes are requested then asymmetric mode may also be selected by the
kernel. If the CPU's preferred tag checking mode is in the task's set
of provided tag checking modes, that mode will be selected. Otherwise,
the kernel will select one of the modes from the task's mode set using
the preference order:
1. Asynchronous
2. Asymmetric
3. Synchronous
Note that there is no way for userspace to request multiple modes and
also disable asymmetric mode.
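A hypothetical fragment showing the multi-mode request described above
(again assuming the ``<linux/prctl.h>`` constants); the kernel resolves the
effective mode, possibly to asymmetric on CPUs with MTE3 support::
    #include <sys/prctl.h>
    #include <linux/prctl.h>

    /* Request both sync and async and let the kernel pick, following the
     * preference order above (or asymm where the CPU supports it). */
    static int request_any_mte_mode(void)
    {
            return prctl(PR_SET_TAGGED_ADDR_CTRL,
                         PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC |
                         PR_MTE_TCF_ASYNC | (0xfffe << PR_MTE_TAG_SHIFT),
                         0, 0, 0);
    }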
Initial process state
---------------------
......@@ -213,6 +224,29 @@ address ABI control and MTE configuration of a process as per the
Documentation/arm64/tagged-address-abi.rst and above. The corresponding
``regset`` is 1 element of 8 bytes (``sizeof(long)``).
Core dump support
-----------------
The allocation tags for user memory mapped with ``PROT_MTE`` are dumped
in the core file as additional ``PT_ARM_MEMTAG_MTE`` segments. The
program header for such a segment is defined as:
:``p_type``: ``PT_ARM_MEMTAG_MTE``
:``p_flags``: 0
:``p_offset``: segment file offset
:``p_vaddr``: segment virtual address, same as the corresponding
``PT_LOAD`` segment
:``p_paddr``: 0
:``p_filesz``: segment size in file, calculated as ``p_memsz / 32``
(two 4-bit tags cover 32 bytes of memory)
:``p_memsz``: segment size in memory, same as the corresponding
``PT_LOAD`` segment
:``p_align``: 0
The tags are stored in the core file at ``p_offset`` as two 4-bit tags
in a byte. With the tag granule of 16 bytes, a 4K page requires 128
bytes in the core file.
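A quick illustration of the sizing rule, as a hypothetical helper rather
than code from this series::
    #include <stddef.h>

    /* Two 4-bit tags describe one 32-byte block, so the tag dump for a
     * segment is p_memsz / 32 bytes (128 bytes for a 4096-byte page). */
    static size_t mte_tag_dump_bytes(size_t p_memsz)
    {
            return p_memsz / 32;
    }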
Example of correct usage
========================
......
......@@ -136,7 +136,7 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| Cavium | ThunderX ITS | #23144 | CAVIUM_ERRATUM_23144 |
+----------------+-----------------+-----------------+-----------------------------+
| Cavium | ThunderX GICv3 | #23154 | CAVIUM_ERRATUM_23154 |
| Cavium | ThunderX GICv3 | #23154,38545 | CAVIUM_ERRATUM_23154 |
+----------------+-----------------+-----------------+-----------------------------+
| Cavium | ThunderX GICv3 | #38539 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
......
......@@ -130,14 +130,13 @@ denoting a range of code via ``SYM_*_START/END`` annotations.
In fact, this kind of annotation corresponds to the now deprecated ``ENTRY``
and ``ENDPROC`` macros.
* ``SYM_FUNC_START_ALIAS`` and ``SYM_FUNC_START_LOCAL_ALIAS`` serve for those
who decided to have two or more names for one function. The typical use is::
* ``SYM_FUNC_ALIAS``, ``SYM_FUNC_ALIAS_LOCAL``, and ``SYM_FUNC_ALIAS_WEAK`` can
be used to define multiple names for a function. The typical use is::
SYM_FUNC_START_ALIAS(__memset)
SYM_FUNC_START(memset)
SYM_FUNC_START(__memset)
... asm insns ...
SYM_FUNC_END(memset)
SYM_FUNC_END_ALIAS(__memset)
SYM_FUNC_END(__memset)
SYM_FUNC_ALIAS(memset, __memset)
In this example, one can call ``__memset`` or ``memset`` with the same
result, except the debug information for the instructions is generated to
......
......@@ -20,6 +20,8 @@ properties:
items:
- enum:
- apm,potenza-pmu
- apple,firestorm-pmu
- apple,icestorm-pmu
- arm,armv8-pmuv3 # Only for s/w models
- arm,arm1136-pmu
- arm,arm1176-pmu
......
......@@ -56,6 +56,8 @@ properties:
- 1: virtual HV timer
- 2: physical guest timer
- 3: virtual guest timer
- 4: 'efficient' CPU PMU
- 5: 'performance' CPU PMU
The 3rd cell contains the interrupt flags. This is normally
IRQ_TYPE_LEVEL_HIGH (4).
......@@ -68,6 +70,35 @@ properties:
power-domains:
maxItems: 1
affinities:
type: object
additionalProperties: false
description:
FIQ affinity can be expressed as a single "affinities" node,
containing a set of sub-nodes, one per FIQ with a non-default
affinity.
patternProperties:
"^.+-affinity$":
type: object
additionalProperties: false
properties:
apple,fiq-index:
description:
The interrupt number specified as a FIQ, and for which
the affinity is not the default.
$ref: /schemas/types.yaml#/definitions/uint32
maximum: 5
cpus:
$ref: /schemas/types.yaml#/definitions/phandle-array
description:
Should be a list of phandles to CPU nodes (as described in
Documentation/devicetree/bindings/arm/cpus.yaml).
required:
- fiq-index
- cpus
required:
- compatible
- '#interrupt-cells'
......
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/perf/marvell-cn10k-ddr.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Marvell CN10K DDR performance monitor
maintainers:
- Bharat Bhushan <bbhushan2@marvell.com>
properties:
compatible:
items:
- enum:
- marvell,cn10k-ddr-pmu
reg:
maxItems: 1
required:
- compatible
- reg
additionalProperties: false
examples:
- |
bus {
#address-cells = <2>;
#size-cells = <2>;
pmu@87e1c0000000 {
compatible = "marvell,cn10k-ddr-pmu";
reg = <0x87e1 0xc0000000 0x0 0x10000>;
};
};
......@@ -10,6 +10,7 @@ config ARM64
select ACPI_SPCR_TABLE if ACPI
select ACPI_PPTT if ACPI
select ARCH_HAS_DEBUG_WX
select ARCH_BINFMT_ELF_EXTRA_PHDRS
select ARCH_BINFMT_ELF_STATE
select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION
......@@ -891,13 +892,17 @@ config CAVIUM_ERRATUM_23144
If unsure, say Y.
config CAVIUM_ERRATUM_23154
bool "Cavium erratum 23154: Access to ICC_IAR1_EL1 is not sync'ed"
bool "Cavium errata 23154 and 38545: GICv3 lacks HW synchronisation"
default y
help
The gicv3 of ThunderX requires a modified version for
The ThunderX GICv3 implementation requires a modified version for
reading the IAR status to ensure data synchronization
(access to icc_iar1_el1 is not sync'ed before and after).
It also suffers from erratum 38545 (also present on Marvell's
OcteonTX and OcteonTX2), resulting in deactivated interrupts being
spuriously presented to the CPU interface.
If unsure, say Y.
config CAVIUM_ERRATUM_27456
......
......@@ -97,6 +97,18 @@ timer {
<AIC_FIQ AIC_TMR_HV_VIRT IRQ_TYPE_LEVEL_HIGH>;
};
pmu-e {
compatible = "apple,icestorm-pmu";
interrupt-parent = <&aic>;
interrupts = <AIC_FIQ AIC_CPU_PMU_E IRQ_TYPE_LEVEL_HIGH>;
};
pmu-p {
compatible = "apple,firestorm-pmu";
interrupt-parent = <&aic>;
interrupts = <AIC_FIQ AIC_CPU_PMU_P IRQ_TYPE_LEVEL_HIGH>;
};
clkref: clock-ref {
compatible = "fixed-clock";
#clock-cells = <0>;
......@@ -213,6 +225,18 @@ aic: interrupt-controller@23b100000 {
interrupt-controller;
reg = <0x2 0x3b100000 0x0 0x8000>;
power-domains = <&ps_aic>;
affinities {
e-core-pmu-affinity {
apple,fiq-index = <AIC_CPU_PMU_E>;
cpus = <&cpu0 &cpu1 &cpu2 &cpu3>;
};
p-core-pmu-affinity {
apple,fiq-index = <AIC_CPU_PMU_P>;
cpus = <&cpu4 &cpu5 &cpu6 &cpu7>;
};
};
};
pmgr: power-management@23b700000 {
......
// SPDX-License-Identifier: GPL-2.0
#ifndef __ASM_APPLE_M1_PMU_h
#define __ASM_APPLE_M1_PMU_h
#include <linux/bits.h>
#include <asm/sysreg.h>
/* Counters */
#define SYS_IMP_APL_PMC0_EL1 sys_reg(3, 2, 15, 0, 0)
#define SYS_IMP_APL_PMC1_EL1 sys_reg(3, 2, 15, 1, 0)
#define SYS_IMP_APL_PMC2_EL1 sys_reg(3, 2, 15, 2, 0)
#define SYS_IMP_APL_PMC3_EL1 sys_reg(3, 2, 15, 3, 0)
#define SYS_IMP_APL_PMC4_EL1 sys_reg(3, 2, 15, 4, 0)
#define SYS_IMP_APL_PMC5_EL1 sys_reg(3, 2, 15, 5, 0)
#define SYS_IMP_APL_PMC6_EL1 sys_reg(3, 2, 15, 6, 0)
#define SYS_IMP_APL_PMC7_EL1 sys_reg(3, 2, 15, 7, 0)
#define SYS_IMP_APL_PMC8_EL1 sys_reg(3, 2, 15, 9, 0)
#define SYS_IMP_APL_PMC9_EL1 sys_reg(3, 2, 15, 10, 0)
/* Core PMC control register */
#define SYS_IMP_APL_PMCR0_EL1 sys_reg(3, 1, 15, 0, 0)
#define PMCR0_CNT_ENABLE_0_7 GENMASK(7, 0)
#define PMCR0_IMODE GENMASK(10, 8)
#define PMCR0_IMODE_OFF 0
#define PMCR0_IMODE_PMI 1
#define PMCR0_IMODE_AIC 2
#define PMCR0_IMODE_HALT 3
#define PMCR0_IMODE_FIQ 4
#define PMCR0_IACT BIT(11)
#define PMCR0_PMI_ENABLE_0_7 GENMASK(19, 12)
#define PMCR0_STOP_CNT_ON_PMI BIT(20)
#define PMCR0_CNT_GLOB_L2C_EVT BIT(21)
#define PMCR0_DEFER_PMI_TO_ERET BIT(22)
#define PMCR0_ALLOW_CNT_EN_EL0 BIT(30)
#define PMCR0_CNT_ENABLE_8_9 GENMASK(33, 32)
#define PMCR0_PMI_ENABLE_8_9 GENMASK(45, 44)
#define SYS_IMP_APL_PMCR1_EL1 sys_reg(3, 1, 15, 1, 0)
#define PMCR1_COUNT_A64_EL0_0_7 GENMASK(15, 8)
#define PMCR1_COUNT_A64_EL1_0_7 GENMASK(23, 16)
#define PMCR1_COUNT_A64_EL0_8_9 GENMASK(41, 40)
#define PMCR1_COUNT_A64_EL1_8_9 GENMASK(49, 48)
#define SYS_IMP_APL_PMCR2_EL1 sys_reg(3, 1, 15, 2, 0)
#define SYS_IMP_APL_PMCR3_EL1 sys_reg(3, 1, 15, 3, 0)
#define SYS_IMP_APL_PMCR4_EL1 sys_reg(3, 1, 15, 4, 0)
#define SYS_IMP_APL_PMESR0_EL1 sys_reg(3, 1, 15, 5, 0)
#define PMESR0_EVT_CNT_2 GENMASK(7, 0)
#define PMESR0_EVT_CNT_3 GENMASK(15, 8)
#define PMESR0_EVT_CNT_4 GENMASK(23, 16)
#define PMESR0_EVT_CNT_5 GENMASK(31, 24)
#define SYS_IMP_APL_PMESR1_EL1 sys_reg(3, 1, 15, 6, 0)
#define PMESR1_EVT_CNT_6 GENMASK(7, 0)
#define PMESR1_EVT_CNT_7 GENMASK(15, 8)
#define PMESR1_EVT_CNT_8 GENMASK(23, 16)
#define PMESR1_EVT_CNT_9 GENMASK(31, 24)
#define SYS_IMP_APL_PMSR_EL1 sys_reg(3, 1, 15, 13, 0)
#define PMSR_OVERFLOW GENMASK(9, 0)
#endif /* __ASM_APPLE_M1_PMU_h */
......@@ -53,17 +53,36 @@ static inline u64 gic_read_iar_common(void)
* The gicv3 of ThunderX requires a modified version for reading the
* IAR status to ensure data synchronization (access to icc_iar1_el1
* is not sync'ed before and after).
*
* Erratum 38545
*
* When an IAR register read races with a GIC interrupt RELEASE event,
* the GIC-CPU interface could wrongly return a valid INTID to the CPU
* for an interrupt that is already released (not activated) instead of 0x3ff.
*
* To workaround this, return a valid interrupt ID only if there is a change
* in the active priority list after the IAR read.
*
* A common function is used for both workarounds since:
* 1. On ThunderX 88xx 1.x both errata are applicable.
* 2. The extra nops have no side effects on silicon where
* erratum 23154 is not applicable.
*/
static inline u64 gic_read_iar_cavium_thunderx(void)
{
u64 irqstat;
u64 irqstat, apr;
apr = read_sysreg_s(SYS_ICC_AP1R0_EL1);
nops(8);
irqstat = read_sysreg_s(SYS_ICC_IAR1_EL1);
nops(4);
mb();
return irqstat;
/* Max priority groups implemented is only 32 */
if (likely(apr != read_sysreg_s(SYS_ICC_AP1R0_EL1)))
return irqstat;
return 0x3ff;
}
static inline void gic_write_ctlr(u32 val)
......
......@@ -42,13 +42,47 @@ static inline bool __arm64_rndr(unsigned long *v)
return ok;
}
static inline bool __arm64_rndrrs(unsigned long *v)
{
bool ok;
/*
* Reads of RNDRRS set PSTATE.NZCV to 0b0000 on success,
* and set PSTATE.NZCV to 0b0100 otherwise.
*/
asm volatile(
__mrs_s("%0", SYS_RNDRRS_EL0) "\n"
" cset %w1, ne\n"
: "=r" (*v), "=r" (ok)
:
: "cc");
return ok;
}
static inline bool __must_check arch_get_random_long(unsigned long *v)
{
/*
* Only support the generic interface after we have detected
* the system wide capability, avoiding complexity with the
* cpufeature code and with potential scheduling between CPUs
* with and without the feature.
*/
if (cpus_have_const_cap(ARM64_HAS_RNG) && __arm64_rndr(v))
return true;
return false;
}
static inline bool __must_check arch_get_random_int(unsigned int *v)
{
if (cpus_have_const_cap(ARM64_HAS_RNG)) {
unsigned long val;
if (__arm64_rndr(&val)) {
*v = val;
return true;
}
}
return false;
}
......@@ -71,12 +105,11 @@ static inline bool __must_check arch_get_random_seed_long(unsigned long *v)
}
/*
* Only support the generic interface after we have detected
* the system wide capability, avoiding complexity with the
* cpufeature code and with potential scheduling between CPUs
* with and without the feature.
* RNDRRS is not backed by an entropy source but by a DRBG that is
* reseeded after each invocation. This is not a 100% fit but good
* enough to implement this API if no other entropy source exists.
*/
if (cpus_have_const_cap(ARM64_HAS_RNG) && __arm64_rndr(v))
if (cpus_have_const_cap(ARM64_HAS_RNG) && __arm64_rndrrs(v))
return true;
return false;
......@@ -96,7 +129,7 @@ static inline bool __must_check arch_get_random_seed_int(unsigned int *v)
}
if (cpus_have_const_cap(ARM64_HAS_RNG)) {
if (__arm64_rndr(&val)) {
if (__arm64_rndrrs(&val)) {
*v = val;
return true;
}
......
......@@ -60,6 +60,9 @@ alternative_else_nop_endif
.macro __ptrauth_keys_init_cpu tsk, tmp1, tmp2, tmp3
mrs \tmp1, id_aa64isar1_el1
ubfx \tmp1, \tmp1, #ID_AA64ISAR1_APA_SHIFT, #8
mrs_s \tmp2, SYS_ID_AA64ISAR2_EL1
ubfx \tmp2, \tmp2, #ID_AA64ISAR2_APA3_SHIFT, #4
orr \tmp1, \tmp1, \tmp2
cbz \tmp1, .Lno_addr_auth\@
mov_q \tmp1, (SCTLR_ELx_ENIA | SCTLR_ELx_ENIB | \
SCTLR_ELx_ENDA | SCTLR_ELx_ENDB)
......
......@@ -542,11 +542,6 @@ alternative_endif
#define EXPORT_SYMBOL_NOKASAN(name) EXPORT_SYMBOL(name)
#endif
#ifdef CONFIG_KASAN_HW_TAGS
#define EXPORT_SYMBOL_NOHWKASAN(name)
#else
#define EXPORT_SYMBOL_NOHWKASAN(name) EXPORT_SYMBOL_NOKASAN(name)
#endif
/*
* Emit a 64-bit absolute little endian symbol reference in a way that
* ensures that it will be resolved at build time, even when building a
......
......@@ -356,6 +356,7 @@ struct arm64_cpu_capabilities {
struct { /* Feature register checking */
u32 sys_reg;
u8 field_pos;
u8 field_width;
u8 min_field_value;
u8 hwcap_type;
bool sign;
......@@ -576,6 +577,8 @@ static inline u64 arm64_ftr_reg_user_value(const struct arm64_ftr_reg *reg)
static inline int __attribute_const__
cpuid_feature_extract_field_width(u64 features, int field, int width, bool sign)
{
if (WARN_ON_ONCE(!width))
width = 4;
return (sign) ?
cpuid_feature_extract_signed_field_width(features, field, width) :
cpuid_feature_extract_unsigned_field_width(features, field, width);
......@@ -883,6 +886,7 @@ static inline unsigned int get_vmid_bits(u64 mmfr1)
extern struct arm64_ftr_override id_aa64mmfr1_override;
extern struct arm64_ftr_override id_aa64pfr1_override;
extern struct arm64_ftr_override id_aa64isar1_override;
extern struct arm64_ftr_override id_aa64isar2_override;
u32 get_kvm_ipa_limit(void);
void dump_cpu_features(void);
......
......@@ -88,6 +88,13 @@
#define CAVIUM_CPU_PART_THUNDERX_81XX 0x0A2
#define CAVIUM_CPU_PART_THUNDERX_83XX 0x0A3
#define CAVIUM_CPU_PART_THUNDERX2 0x0AF
/* OcteonTx2 series */
#define CAVIUM_CPU_PART_OCTX2_98XX 0x0B1
#define CAVIUM_CPU_PART_OCTX2_96XX 0x0B2
#define CAVIUM_CPU_PART_OCTX2_95XX 0x0B3
#define CAVIUM_CPU_PART_OCTX2_95XXN 0x0B4
#define CAVIUM_CPU_PART_OCTX2_95XXMM 0x0B5
#define CAVIUM_CPU_PART_OCTX2_95XXO 0x0B6
#define BRCM_CPU_PART_BRAHMA_B53 0x100
#define BRCM_CPU_PART_VULCAN 0x516
......@@ -132,6 +139,12 @@
#define MIDR_THUNDERX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX)
#define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX)
#define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_83XX)
#define MIDR_OCTX2_98XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_98XX)
#define MIDR_OCTX2_96XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_96XX)
#define MIDR_OCTX2_95XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_95XX)
#define MIDR_OCTX2_95XXN MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_95XXN)
#define MIDR_OCTX2_95XXMM MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_95XXMM)
#define MIDR_OCTX2_95XXO MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_95XXO)
#define MIDR_CAVIUM_THUNDERX2 MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX2)
#define MIDR_BRAHMA_B53 MIDR_CPU_MODEL(ARM_CPU_IMP_BRCM, BRCM_CPU_PART_BRAHMA_B53)
#define MIDR_BRCM_VULCAN MIDR_CPU_MODEL(ARM_CPU_IMP_BRCM, BRCM_CPU_PART_VULCAN)
......
......@@ -34,18 +34,6 @@
*/
#define BREAK_INSTR_SIZE AARCH64_INSN_SIZE
/*
* BRK instruction encoding
* The #imm16 value should be placed at bits[20:5] within BRK ins
*/
#define AARCH64_BREAK_MON 0xd4200000
/*
* BRK instruction for provoking a fault on purpose
* Unlike kgdb, #imm16 value with unallocated handler is used for faulting.
*/
#define AARCH64_BREAK_FAULT (AARCH64_BREAK_MON | (FAULT_BRK_IMM << 5))
#define AARCH64_BREAK_KGDB_DYN_DBG \
(AARCH64_BREAK_MON | (KGDB_DYN_DBG_BRK_IMM << 5))
......
......@@ -108,6 +108,7 @@
#define KERNEL_HWCAP_ECV __khwcap2_feature(ECV)
#define KERNEL_HWCAP_AFP __khwcap2_feature(AFP)
#define KERNEL_HWCAP_RPRES __khwcap2_feature(RPRES)
#define KERNEL_HWCAP_MTE3 __khwcap2_feature(MTE3)
/*
* This yields a mask that user programs can use to figure out what
......
......@@ -3,7 +3,21 @@
#ifndef __ASM_INSN_DEF_H
#define __ASM_INSN_DEF_H
#include <asm/brk-imm.h>
/* A64 instructions are always 32 bits. */
#define AARCH64_INSN_SIZE 4
/*
* BRK instruction encoding
* The #imm16 value should be placed at bits[20:5] within BRK ins
*/
#define AARCH64_BREAK_MON 0xd4200000
/*
* BRK instruction for provoking a fault on purpose
* Unlike kgdb, #imm16 value with unallocated handler is used for faulting.
*/
#define AARCH64_BREAK_FAULT (AARCH64_BREAK_MON | (FAULT_BRK_IMM << 5))
#endif /* __ASM_INSN_DEF_H */
......@@ -206,7 +206,9 @@ enum aarch64_insn_ldst_type {
AARCH64_INSN_LDST_LOAD_PAIR_POST_INDEX,
AARCH64_INSN_LDST_STORE_PAIR_POST_INDEX,
AARCH64_INSN_LDST_LOAD_EX,
AARCH64_INSN_LDST_LOAD_ACQ_EX,
AARCH64_INSN_LDST_STORE_EX,
AARCH64_INSN_LDST_STORE_REL_EX,
};
enum aarch64_insn_adsb_type {
......@@ -281,6 +283,36 @@ enum aarch64_insn_adr_type {
AARCH64_INSN_ADR_TYPE_ADR,
};
enum aarch64_insn_mem_atomic_op {
AARCH64_INSN_MEM_ATOMIC_ADD,
AARCH64_INSN_MEM_ATOMIC_CLR,
AARCH64_INSN_MEM_ATOMIC_EOR,
AARCH64_INSN_MEM_ATOMIC_SET,
AARCH64_INSN_MEM_ATOMIC_SWP,
};
enum aarch64_insn_mem_order_type {
AARCH64_INSN_MEM_ORDER_NONE,
AARCH64_INSN_MEM_ORDER_ACQ,
AARCH64_INSN_MEM_ORDER_REL,
AARCH64_INSN_MEM_ORDER_ACQREL,
};
enum aarch64_insn_mb_type {
AARCH64_INSN_MB_SY,
AARCH64_INSN_MB_ST,
AARCH64_INSN_MB_LD,
AARCH64_INSN_MB_ISH,
AARCH64_INSN_MB_ISHST,
AARCH64_INSN_MB_ISHLD,
AARCH64_INSN_MB_NSH,
AARCH64_INSN_MB_NSHST,
AARCH64_INSN_MB_NSHLD,
AARCH64_INSN_MB_OSH,
AARCH64_INSN_MB_OSHST,
AARCH64_INSN_MB_OSHLD,
};
#define __AARCH64_INSN_FUNCS(abbr, mask, val) \
static __always_inline bool aarch64_insn_is_##abbr(u32 code) \
{ \
......@@ -304,6 +336,11 @@ __AARCH64_INSN_FUNCS(store_post, 0x3FE00C00, 0x38000400)
__AARCH64_INSN_FUNCS(load_post, 0x3FE00C00, 0x38400400)
__AARCH64_INSN_FUNCS(str_reg, 0x3FE0EC00, 0x38206800)
__AARCH64_INSN_FUNCS(ldadd, 0x3F20FC00, 0x38200000)
__AARCH64_INSN_FUNCS(ldclr, 0x3F20FC00, 0x38201000)
__AARCH64_INSN_FUNCS(ldeor, 0x3F20FC00, 0x38202000)
__AARCH64_INSN_FUNCS(ldset, 0x3F20FC00, 0x38203000)
__AARCH64_INSN_FUNCS(swp, 0x3F20FC00, 0x38208000)
__AARCH64_INSN_FUNCS(cas, 0x3FA07C00, 0x08A07C00)
__AARCH64_INSN_FUNCS(ldr_reg, 0x3FE0EC00, 0x38606800)
__AARCH64_INSN_FUNCS(ldr_lit, 0xBF000000, 0x18000000)
__AARCH64_INSN_FUNCS(ldrsw_lit, 0xFF000000, 0x98000000)
......@@ -475,13 +512,6 @@ u32 aarch64_insn_gen_load_store_ex(enum aarch64_insn_register reg,
enum aarch64_insn_register state,
enum aarch64_insn_size_type size,
enum aarch64_insn_ldst_type type);
u32 aarch64_insn_gen_ldadd(enum aarch64_insn_register result,
enum aarch64_insn_register address,
enum aarch64_insn_register value,
enum aarch64_insn_size_type size);
u32 aarch64_insn_gen_stadd(enum aarch64_insn_register address,
enum aarch64_insn_register value,
enum aarch64_insn_size_type size);
u32 aarch64_insn_gen_add_sub_imm(enum aarch64_insn_register dst,
enum aarch64_insn_register src,
int imm, enum aarch64_insn_variant variant,
......@@ -542,6 +572,42 @@ u32 aarch64_insn_gen_prefetch(enum aarch64_insn_register base,
enum aarch64_insn_prfm_type type,
enum aarch64_insn_prfm_target target,
enum aarch64_insn_prfm_policy policy);
#ifdef CONFIG_ARM64_LSE_ATOMICS
u32 aarch64_insn_gen_atomic_ld_op(enum aarch64_insn_register result,
enum aarch64_insn_register address,
enum aarch64_insn_register value,
enum aarch64_insn_size_type size,
enum aarch64_insn_mem_atomic_op op,
enum aarch64_insn_mem_order_type order);
u32 aarch64_insn_gen_cas(enum aarch64_insn_register result,
enum aarch64_insn_register address,
enum aarch64_insn_register value,
enum aarch64_insn_size_type size,
enum aarch64_insn_mem_order_type order);
#else
static inline
u32 aarch64_insn_gen_atomic_ld_op(enum aarch64_insn_register result,
enum aarch64_insn_register address,
enum aarch64_insn_register value,
enum aarch64_insn_size_type size,
enum aarch64_insn_mem_atomic_op op,
enum aarch64_insn_mem_order_type order)
{
return AARCH64_BREAK_FAULT;
}
static inline
u32 aarch64_insn_gen_cas(enum aarch64_insn_register result,
enum aarch64_insn_register address,
enum aarch64_insn_register value,
enum aarch64_insn_size_type size,
enum aarch64_insn_mem_order_type order)
{
return AARCH64_BREAK_FAULT;
}
#endif
u32 aarch64_insn_gen_dmb(enum aarch64_insn_mb_type type);
s32 aarch64_get_branch_offset(u32 insn);
u32 aarch64_set_branch_offset(u32 insn, s32 offset);
......
......@@ -355,8 +355,8 @@
ECN(SOFTSTP_CUR), ECN(WATCHPT_LOW), ECN(WATCHPT_CUR), \
ECN(BKPT32), ECN(VECTOR32), ECN(BRK64)
#define CPACR_EL1_FPEN (3 << 20)
#define CPACR_EL1_TTA (1 << 28)
#define CPACR_EL1_DEFAULT (CPACR_EL1_FPEN | CPACR_EL1_ZEN_EL1EN)
#define CPACR_EL1_DEFAULT (CPACR_EL1_FPEN_EL0EN | CPACR_EL1_FPEN_EL1EN |\
CPACR_EL1_ZEN_EL1EN)
#endif /* __ARM64_KVM_ARM_H__ */
......@@ -118,6 +118,7 @@ extern u64 kvm_nvhe_sym(id_aa64pfr0_el1_sys_val);
extern u64 kvm_nvhe_sym(id_aa64pfr1_el1_sys_val);
extern u64 kvm_nvhe_sym(id_aa64isar0_el1_sys_val);
extern u64 kvm_nvhe_sym(id_aa64isar1_el1_sys_val);
extern u64 kvm_nvhe_sym(id_aa64isar2_el1_sys_val);
extern u64 kvm_nvhe_sym(id_aa64mmfr0_el1_sys_val);
extern u64 kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val);
extern u64 kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val);
......
......@@ -39,28 +39,4 @@
SYM_START(name, SYM_L_WEAK, SYM_A_NONE) \
bti c ;
/*
* Annotate a function as position independent, i.e., safe to be called before
* the kernel virtual mapping is activated.
*/
#define SYM_FUNC_START_PI(x) \
SYM_FUNC_START_ALIAS(__pi_##x); \
SYM_FUNC_START(x)
#define SYM_FUNC_START_WEAK_PI(x) \
SYM_FUNC_START_ALIAS(__pi_##x); \
SYM_FUNC_START_WEAK(x)
#define SYM_FUNC_START_WEAK_ALIAS_PI(x) \
SYM_FUNC_START_ALIAS(__pi_##x); \
SYM_START(x, SYM_L_WEAK, SYM_A_ALIGN)
#define SYM_FUNC_END_PI(x) \
SYM_FUNC_END(x); \
SYM_FUNC_END_ALIAS(__pi_##x)
#define SYM_FUNC_END_ALIAS_PI(x) \
SYM_FUNC_END_ALIAS(x); \
SYM_FUNC_END_ALIAS(__pi_##x)
#endif
......@@ -17,12 +17,10 @@
#include <asm/cpucaps.h>
extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS];
extern struct static_key_false arm64_const_caps_ready;
static inline bool system_uses_lse_atomics(void)
static __always_inline bool system_uses_lse_atomics(void)
{
return (static_branch_likely(&arm64_const_caps_ready)) &&
static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]);
return static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]);
}
#define __lse_ll_sc_body(op, ...) \
......
SECTIONS {
#ifdef CONFIG_ARM64_MODULE_PLTS
.plt 0 (NOLOAD) : { BYTE(0) }
.init.plt 0 (NOLOAD) : { BYTE(0) }
.text.ftrace_trampoline 0 (NOLOAD) : { BYTE(0) }
.plt 0 : { BYTE(0) }
.init.plt 0 : { BYTE(0) }
.text.ftrace_trampoline 0 : { BYTE(0) }
#endif
#ifdef CONFIG_KASAN_SW_TAGS
......
......@@ -11,6 +11,7 @@
#define MTE_TAG_SHIFT 56
#define MTE_TAG_SIZE 4
#define MTE_TAG_MASK GENMASK((MTE_TAG_SHIFT + (MTE_TAG_SIZE - 1)), MTE_TAG_SHIFT)
#define MTE_PAGE_TAG_STORAGE (MTE_GRANULES_PER_PAGE * MTE_TAG_SIZE / 8)
#define __MTE_PREAMBLE ARM64_ASM_PREAMBLE ".arch_extension memtag\n"
......
......@@ -11,7 +11,9 @@
#ifndef __ASSEMBLY__
#include <linux/bitfield.h>
#include <linux/kasan-enabled.h>
#include <linux/page-flags.h>
#include <linux/sched.h>
#include <linux/types.h>
#include <asm/pgtable-types.h>
......@@ -86,6 +88,26 @@ static inline int mte_ptrace_copy_tags(struct task_struct *child,
#endif /* CONFIG_ARM64_MTE */
static inline void mte_disable_tco_entry(struct task_struct *task)
{
if (!system_supports_mte())
return;
/*
* Re-enable tag checking (TCO set on exception entry). This is only
* necessary if MTE is enabled in either the kernel or the userspace
* task in synchronous or asymmetric mode (SCTLR_EL1.TCF0 bit 0 is set
* for both). With MTE disabled in the kernel and disabled or
* asynchronous in userspace, tag check faults (including in uaccesses)
* are not reported, therefore there is no need to re-enable checking.
* This is beneficial on microarchitectures where re-enabling TCO is
* expensive.
*/
if (kasan_hw_tags_enabled() ||
(task->thread.sctlr_user & (1UL << SCTLR_EL1_TCF0_SHIFT)))
asm volatile(SET_PSTATE_TCO(0));
}
#ifdef CONFIG_KASAN_HW_TAGS
/* Whether the MTE asynchronous mode is enabled. */
DECLARE_STATIC_KEY_FALSE(mte_async_or_asymm_mode);
......
......@@ -273,6 +273,8 @@
#define TCR_NFD1 (UL(1) << 54)
#define TCR_E0PD0 (UL(1) << 55)
#define TCR_E0PD1 (UL(1) << 56)
#define TCR_TCMA0 (UL(1) << 57)
#define TCR_TCMA1 (UL(1) << 58)
/*
* TTBR.
......
......@@ -21,6 +21,7 @@
#define MTE_CTRL_TCF_SYNC (1UL << 16)
#define MTE_CTRL_TCF_ASYNC (1UL << 17)
#define MTE_CTRL_TCF_ASYMM (1UL << 18)
#ifndef __ASSEMBLY__
......
......@@ -67,7 +67,8 @@ struct bp_hardening_data {
DECLARE_PER_CPU_READ_MOSTLY(struct bp_hardening_data, bp_hardening_data);
static inline void arm64_apply_bp_hardening(void)
/* Called during entry so must be __always_inline */
static __always_inline void arm64_apply_bp_hardening(void)
{
struct bp_hardening_data *d;
......
......@@ -12,13 +12,11 @@ extern char *strrchr(const char *, int c);
#define __HAVE_ARCH_STRCHR
extern char *strchr(const char *, int c);
#ifndef CONFIG_KASAN_HW_TAGS
#define __HAVE_ARCH_STRCMP
extern int strcmp(const char *, const char *);
#define __HAVE_ARCH_STRNCMP
extern int strncmp(const char *, const char *, __kernel_size_t);
#endif
#define __HAVE_ARCH_STRLEN
extern __kernel_size_t strlen(const char *);
......
......@@ -774,6 +774,8 @@
/* id_aa64isar2 */
#define ID_AA64ISAR2_CLEARBHB_SHIFT 28
#define ID_AA64ISAR2_APA3_SHIFT 12
#define ID_AA64ISAR2_GPA3_SHIFT 8
#define ID_AA64ISAR2_RPRES_SHIFT 4
#define ID_AA64ISAR2_WFXT_SHIFT 0
......@@ -787,6 +789,16 @@
#define ID_AA64ISAR2_WFXT_NI 0x0
#define ID_AA64ISAR2_WFXT_SUPPORTED 0x2
#define ID_AA64ISAR2_APA3_NI 0x0
#define ID_AA64ISAR2_APA3_ARCHITECTED 0x1
#define ID_AA64ISAR2_APA3_ARCH_EPAC 0x2
#define ID_AA64ISAR2_APA3_ARCH_EPAC2 0x3
#define ID_AA64ISAR2_APA3_ARCH_EPAC2_FPAC 0x4
#define ID_AA64ISAR2_APA3_ARCH_EPAC2_FPAC_CMB 0x5
#define ID_AA64ISAR2_GPA3_NI 0x0
#define ID_AA64ISAR2_GPA3_ARCHITECTED 0x1
/* id_aa64pfr0 */
#define ID_AA64PFR0_CSV3_SHIFT 60
#define ID_AA64PFR0_CSV2_SHIFT 56
......@@ -1099,13 +1111,11 @@
#define ZCR_ELx_LEN_SIZE 9
#define ZCR_ELx_LEN_MASK 0x1ff
#define CPACR_EL1_FPEN_EL1EN (BIT(20)) /* enable EL1 access */
#define CPACR_EL1_FPEN_EL0EN (BIT(21)) /* enable EL0 access, if EL1EN set */
#define CPACR_EL1_ZEN_EL1EN (BIT(16)) /* enable EL1 access */
#define CPACR_EL1_ZEN_EL0EN (BIT(17)) /* enable EL0 access, if EL1EN set */
#define CPACR_EL1_ZEN (CPACR_EL1_ZEN_EL1EN | CPACR_EL1_ZEN_EL0EN)
/* TCR EL1 Bit Definitions */
#define SYS_TCR_EL1_TCMA1 (BIT(58))
#define SYS_TCR_EL1_TCMA0 (BIT(57))
/* GCR_EL1 Definitions */
#define SYS_GCR_EL1_RRND (BIT(16))
......
......@@ -78,5 +78,6 @@
#define HWCAP2_ECV (1 << 19)
#define HWCAP2_AFP (1 << 20)
#define HWCAP2_RPRES (1 << 21)
#define HWCAP2_MTE3 (1 << 22)
#endif /* _UAPI__ASM_HWCAP_H */
......@@ -61,6 +61,7 @@ obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL) += acpi_parking_protocol.o
obj-$(CONFIG_PARAVIRT) += paravirt.o
obj-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
obj-$(CONFIG_ELF_CORE) += elfcore.o
obj-$(CONFIG_KEXEC_CORE) += machine_kexec.o relocate_kernel.o \
cpu-reset.o
obj-$(CONFIG_KEXEC_FILE) += machine_kexec_file.o kexec_image.o
......
......@@ -214,6 +214,21 @@ static const struct arm64_cpu_capabilities arm64_repeat_tlbi_list[] = {
};
#endif
#ifdef CONFIG_CAVIUM_ERRATUM_23154
const struct midr_range cavium_erratum_23154_cpus[] = {
MIDR_ALL_VERSIONS(MIDR_THUNDERX),
MIDR_ALL_VERSIONS(MIDR_THUNDERX_81XX),
MIDR_ALL_VERSIONS(MIDR_THUNDERX_83XX),
MIDR_ALL_VERSIONS(MIDR_OCTX2_98XX),
MIDR_ALL_VERSIONS(MIDR_OCTX2_96XX),
MIDR_ALL_VERSIONS(MIDR_OCTX2_95XX),
MIDR_ALL_VERSIONS(MIDR_OCTX2_95XXN),
MIDR_ALL_VERSIONS(MIDR_OCTX2_95XXMM),
MIDR_ALL_VERSIONS(MIDR_OCTX2_95XXO),
{},
};
#endif
#ifdef CONFIG_CAVIUM_ERRATUM_27456
const struct midr_range cavium_erratum_27456_cpus[] = {
/* Cavium ThunderX, T88 pass 1.x - 2.1 */
......@@ -425,10 +440,10 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
#endif
#ifdef CONFIG_CAVIUM_ERRATUM_23154
{
/* Cavium ThunderX, pass 1.x */
.desc = "Cavium erratum 23154",
.desc = "Cavium errata 23154 and 38545",
.capability = ARM64_WORKAROUND_CAVIUM_23154,
ERRATA_MIDR_REV_RANGE(MIDR_THUNDERX, 0, 0, 1),
.type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM,
ERRATA_MIDR_RANGE_LIST(cavium_erratum_23154_cpus),
},
#endif
#ifdef CONFIG_CAVIUM_ERRATUM_27456
......
......@@ -97,6 +97,7 @@ static const char *const hwcap_str[] = {
[KERNEL_HWCAP_ECV] = "ecv",
[KERNEL_HWCAP_AFP] = "afp",
[KERNEL_HWCAP_RPRES] = "rpres",
[KERNEL_HWCAP_MTE3] = "mte3",
};
#ifdef CONFIG_COMPAT
......
......@@ -20,6 +20,12 @@ void arch_crash_save_vmcoreinfo(void)
{
VMCOREINFO_NUMBER(VA_BITS);
/* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */
vmcoreinfo_append_str("NUMBER(MODULES_VADDR)=0x%lx\n", MODULES_VADDR);
vmcoreinfo_append_str("NUMBER(MODULES_END)=0x%lx\n", MODULES_END);
vmcoreinfo_append_str("NUMBER(VMALLOC_START)=0x%lx\n", VMALLOC_START);
vmcoreinfo_append_str("NUMBER(VMALLOC_END)=0x%lx\n", VMALLOC_END);
vmcoreinfo_append_str("NUMBER(VMEMMAP_START)=0x%lx\n", VMEMMAP_START);
vmcoreinfo_append_str("NUMBER(VMEMMAP_END)=0x%lx\n", VMEMMAP_END);
vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n",
kimage_voffset);
vmcoreinfo_append_str("NUMBER(PHYS_OFFSET)=0x%llx\n",
......
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/coredump.h>
#include <linux/elfcore.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <asm/cpufeature.h>
#include <asm/mte.h>
#ifndef VMA_ITERATOR
#define VMA_ITERATOR(name, mm, addr) \
struct mm_struct *name = mm
#define for_each_vma(vmi, vma) \
for (vma = vmi->mmap; vma; vma = vma->vm_next)
#endif
#define for_each_mte_vma(vmi, vma) \
if (system_supports_mte()) \
for_each_vma(vmi, vma) \
if (vma->vm_flags & VM_MTE)
static unsigned long mte_vma_tag_dump_size(struct vm_area_struct *vma)
{
if (vma->vm_flags & VM_DONTDUMP)
return 0;
return vma_pages(vma) * MTE_PAGE_TAG_STORAGE;
}
/* Derived from dump_user_range(); start/end must be page-aligned */
static int mte_dump_tag_range(struct coredump_params *cprm,
unsigned long start, unsigned long end)
{
unsigned long addr;
for (addr = start; addr < end; addr += PAGE_SIZE) {
char tags[MTE_PAGE_TAG_STORAGE];
struct page *page = get_dump_page(addr);
/*
* get_dump_page() returns NULL when encountering an empty
* page table entry that would otherwise have been filled with
* the zero page. Skip the equivalent tag dump which would
* have been all zeros.
*/
if (!page) {
dump_skip(cprm, MTE_PAGE_TAG_STORAGE);
continue;
}
/*
* Pages mapped in user space as !pte_access_permitted() (e.g.
* PROT_EXEC only) may not have the PG_mte_tagged flag set.
*/
if (!test_bit(PG_mte_tagged, &page->flags)) {
put_page(page);
dump_skip(cprm, MTE_PAGE_TAG_STORAGE);
continue;
}
mte_save_page_tags(page_address(page), tags);
put_page(page);
if (!dump_emit(cprm, tags, MTE_PAGE_TAG_STORAGE))
return 0;
}
return 1;
}
Elf_Half elf_core_extra_phdrs(void)
{
struct vm_area_struct *vma;
int vma_count = 0;
VMA_ITERATOR(vmi, current->mm, 0);
for_each_mte_vma(vmi, vma)
vma_count++;
return vma_count;
}
int elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset)
{
struct vm_area_struct *vma;
VMA_ITERATOR(vmi, current->mm, 0);
for_each_mte_vma(vmi, vma) {
struct elf_phdr phdr;
phdr.p_type = PT_ARM_MEMTAG_MTE;
phdr.p_offset = offset;
phdr.p_vaddr = vma->vm_start;
phdr.p_paddr = 0;
phdr.p_filesz = mte_vma_tag_dump_size(vma);
phdr.p_memsz = vma->vm_end - vma->vm_start;
offset += phdr.p_filesz;
phdr.p_flags = 0;
phdr.p_align = 0;
if (!dump_emit(cprm, &phdr, sizeof(phdr)))
return 0;
}
return 1;
}
size_t elf_core_extra_data_size(void)
{
struct vm_area_struct *vma;
size_t data_size = 0;
VMA_ITERATOR(vmi, current->mm, 0);
for_each_mte_vma(vmi, vma)
data_size += mte_vma_tag_dump_size(vma);
return data_size;
}
int elf_core_write_extra_data(struct coredump_params *cprm)
{
struct vm_area_struct *vma;
VMA_ITERATOR(vmi, current->mm, 0);
for_each_mte_vma(vmi, vma) {
if (vma->vm_flags & VM_DONTDUMP)
continue;
if (!mte_dump_tag_range(cprm, vma->vm_start, vma->vm_end))
return 0;
}
return 1;
}
......@@ -6,6 +6,7 @@
*/
#include <linux/context_tracking.h>
#include <linux/kasan.h>
#include <linux/linkage.h>
#include <linux/lockdep.h>
#include <linux/ptrace.h>
......@@ -56,6 +57,7 @@ static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
{
__enter_from_kernel_mode(regs);
mte_check_tfsr_entry();
mte_disable_tco_entry(current);
}
/*
......@@ -103,6 +105,7 @@ static __always_inline void __enter_from_user_mode(void)
CT_WARN_ON(ct_state() != CONTEXT_USER);
user_exit_irqoff();
trace_hardirqs_off_finish();
mte_disable_tco_entry(current);
}
static __always_inline void enter_from_user_mode(struct pt_regs *regs)
......
......@@ -307,6 +307,7 @@ alternative_else_nop_endif
str w21, [sp, #S_SYSCALLNO]
.endif
#ifdef CONFIG_ARM64_PSEUDO_NMI
/* Save pmr */
alternative_if ARM64_HAS_IRQ_PRIO_MASKING
mrs_s x20, SYS_ICC_PMR_EL1
......@@ -314,12 +315,6 @@ alternative_if ARM64_HAS_IRQ_PRIO_MASKING
mov x20, #GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET
msr_s SYS_ICC_PMR_EL1, x20
alternative_else_nop_endif
/* Re-enable tag checking (TCO set on exception entry) */
#ifdef CONFIG_ARM64_MTE
alternative_if ARM64_MTE
SET_PSTATE_TCO(0)
alternative_else_nop_endif
#endif
/*
......@@ -337,6 +332,7 @@ alternative_else_nop_endif
disable_daif
.endif
#ifdef CONFIG_ARM64_PSEUDO_NMI
/* Restore pmr */
alternative_if ARM64_HAS_IRQ_PRIO_MASKING
ldr x20, [sp, #S_PMR_SAVE]
......@@ -346,6 +342,7 @@ alternative_if ARM64_HAS_IRQ_PRIO_MASKING
dsb sy // Ensure priority change is seen by redistributor
.L__skip_pmr_sync\@:
alternative_else_nop_endif
#endif
ldp x21, x22, [sp, #S_PC] // load ELR, SPSR
......
......@@ -17,7 +17,7 @@
#define FTR_DESC_NAME_LEN 20
#define FTR_DESC_FIELD_LEN 10
#define FTR_ALIAS_NAME_LEN 30
#define FTR_ALIAS_OPTION_LEN 80
#define FTR_ALIAS_OPTION_LEN 116
struct ftr_set_desc {
char name[FTR_DESC_NAME_LEN];
......@@ -71,6 +71,16 @@ static const struct ftr_set_desc isar1 __initconst = {
},
};
static const struct ftr_set_desc isar2 __initconst = {
.name = "id_aa64isar2",
.override = &id_aa64isar2_override,
.fields = {
{ "gpa3", ID_AA64ISAR2_GPA3_SHIFT },
{ "apa3", ID_AA64ISAR2_APA3_SHIFT },
{}
},
};
extern struct arm64_ftr_override kaslr_feature_override;
static const struct ftr_set_desc kaslr __initconst = {
......@@ -88,6 +98,7 @@ static const struct ftr_set_desc * const regs[] __initconst = {
&mmfr1,
&pfr1,
&isar1,
&isar2,
&kaslr,
};
......@@ -100,7 +111,8 @@ static const struct {
{ "arm64.nobti", "id_aa64pfr1.bt=0" },
{ "arm64.nopauth",
"id_aa64isar1.gpi=0 id_aa64isar1.gpa=0 "
"id_aa64isar1.api=0 id_aa64isar1.apa=0" },
"id_aa64isar1.api=0 id_aa64isar1.apa=0 "
"id_aa64isar2.gpa3=0 id_aa64isar2.apa3=0" },
{ "arm64.nomte", "id_aa64pfr1.mte=0" },
{ "nokaslr", "kaslr.disabled=1" },
};
......
......@@ -186,6 +186,11 @@ void mte_check_tfsr_el1(void)
}
#endif
/*
* This is where we actually resolve the system and process MTE mode
* configuration into an actual value in SCTLR_EL1 that affects
* userspace.
*/
static void mte_update_sctlr_user(struct task_struct *task)
{
/*
......@@ -199,9 +204,20 @@ static void mte_update_sctlr_user(struct task_struct *task)
unsigned long pref, resolved_mte_tcf;
pref = __this_cpu_read(mte_tcf_preferred);
/*
* If there is no overlap between the system preferred and
* program requested values go with what was requested.
*/
resolved_mte_tcf = (mte_ctrl & pref) ? pref : mte_ctrl;
sctlr &= ~SCTLR_EL1_TCF0_MASK;
if (resolved_mte_tcf & MTE_CTRL_TCF_ASYNC)
/*
* Pick an actual setting. The order in which we check for
* set bits and map into register values determines our
* default order.
*/
if (resolved_mte_tcf & MTE_CTRL_TCF_ASYMM)
sctlr |= SCTLR_EL1_TCF0_ASYMM;
else if (resolved_mte_tcf & MTE_CTRL_TCF_ASYNC)
sctlr |= SCTLR_EL1_TCF0_ASYNC;
else if (resolved_mte_tcf & MTE_CTRL_TCF_SYNC)
sctlr |= SCTLR_EL1_TCF0_SYNC;
......@@ -253,6 +269,9 @@ void mte_thread_switch(struct task_struct *next)
mte_update_sctlr_user(next);
mte_update_gcr_excl(next);
/* TCO may not have been disabled on exception entry for the current task. */
mte_disable_tco_entry(next);
/*
* Check if an async tag exception occurred at EL1.
*
......@@ -293,6 +312,17 @@ long set_mte_ctrl(struct task_struct *task, unsigned long arg)
if (arg & PR_MTE_TCF_SYNC)
mte_ctrl |= MTE_CTRL_TCF_SYNC;
/*
* If the system supports it and both sync and async modes are
* specified then implicitly enable asymmetric mode.
* Userspace could see a mix of both sync and async anyway due
* to differing or changing defaults on CPUs.
*/
if (cpus_have_cap(ARM64_MTE_ASYMM) &&
(arg & PR_MTE_TCF_ASYNC) &&
(arg & PR_MTE_TCF_SYNC))
mte_ctrl |= MTE_CTRL_TCF_ASYMM;
task->thread.mte_ctrl = mte_ctrl;
if (task == current) {
preempt_disable();
......@@ -467,6 +497,8 @@ static ssize_t mte_tcf_preferred_show(struct device *dev,
return sysfs_emit(buf, "async\n");
case MTE_CTRL_TCF_SYNC:
return sysfs_emit(buf, "sync\n");
case MTE_CTRL_TCF_ASYMM:
return sysfs_emit(buf, "asymm\n");
default:
return sysfs_emit(buf, "???\n");
}
......@@ -482,6 +514,8 @@ static ssize_t mte_tcf_preferred_store(struct device *dev,
tcf = MTE_CTRL_TCF_ASYNC;
else if (sysfs_streq(buf, "sync"))
tcf = MTE_CTRL_TCF_SYNC;
else if (cpus_have_cap(ARM64_MTE_ASYMM) && sysfs_streq(buf, "asymm"))
tcf = MTE_CTRL_TCF_ASYMM;
else
return -EINVAL;
......
......@@ -242,6 +242,16 @@ static struct attribute *armv8_pmuv3_event_attrs[] = {
ARMV8_EVENT_ATTR(l2d_cache_lmiss_rd, ARMV8_PMUV3_PERFCTR_L2D_CACHE_LMISS_RD),
ARMV8_EVENT_ATTR(l2i_cache_lmiss, ARMV8_PMUV3_PERFCTR_L2I_CACHE_LMISS),
ARMV8_EVENT_ATTR(l3d_cache_lmiss_rd, ARMV8_PMUV3_PERFCTR_L3D_CACHE_LMISS_RD),
ARMV8_EVENT_ATTR(trb_wrap, ARMV8_PMUV3_PERFCTR_TRB_WRAP),
ARMV8_EVENT_ATTR(trb_trig, ARMV8_PMUV3_PERFCTR_TRB_TRIG),
ARMV8_EVENT_ATTR(trcextout0, ARMV8_PMUV3_PERFCTR_TRCEXTOUT0),
ARMV8_EVENT_ATTR(trcextout1, ARMV8_PMUV3_PERFCTR_TRCEXTOUT1),
ARMV8_EVENT_ATTR(trcextout2, ARMV8_PMUV3_PERFCTR_TRCEXTOUT2),
ARMV8_EVENT_ATTR(trcextout3, ARMV8_PMUV3_PERFCTR_TRCEXTOUT3),
ARMV8_EVENT_ATTR(cti_trigout4, ARMV8_PMUV3_PERFCTR_CTI_TRIGOUT4),
ARMV8_EVENT_ATTR(cti_trigout5, ARMV8_PMUV3_PERFCTR_CTI_TRIGOUT5),
ARMV8_EVENT_ATTR(cti_trigout6, ARMV8_PMUV3_PERFCTR_CTI_TRIGOUT6),
ARMV8_EVENT_ATTR(cti_trigout7, ARMV8_PMUV3_PERFCTR_CTI_TRIGOUT7),
ARMV8_EVENT_ATTR(ldst_align_lat, ARMV8_PMUV3_PERFCTR_LDST_ALIGN_LAT),
ARMV8_EVENT_ATTR(ld_align_lat, ARMV8_PMUV3_PERFCTR_LD_ALIGN_LAT),
ARMV8_EVENT_ATTR(st_align_lat, ARMV8_PMUV3_PERFCTR_ST_ALIGN_LAT),
......
......@@ -635,7 +635,8 @@ long set_tagged_addr_ctrl(struct task_struct *task, unsigned long arg)
return -EINVAL;
if (system_supports_mte())
valid_mask |= PR_MTE_TCF_MASK | PR_MTE_TAG_MASK;
valid_mask |= PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC \
| PR_MTE_TAG_MASK;
if (arg & ~valid_mask)
return -EINVAL;
......
......@@ -233,17 +233,20 @@ static void install_bp_hardening_cb(bp_hardening_cb_t fn)
__this_cpu_write(bp_hardening_data.slot, HYP_VECTOR_SPECTRE_DIRECT);
}
static void call_smc_arch_workaround_1(void)
/* Called during entry so must be noinstr */
static noinstr void call_smc_arch_workaround_1(void)
{
arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_1, NULL);
}
static void call_hvc_arch_workaround_1(void)
/* Called during entry so must be noinstr */
static noinstr void call_hvc_arch_workaround_1(void)
{
arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_WORKAROUND_1, NULL);
}
static void qcom_link_stack_sanitisation(void)
/* Called during entry so must be noinstr */
static noinstr void qcom_link_stack_sanitisation(void)
{
u64 tmp;
......
......@@ -11,7 +11,6 @@
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/signal.h>
#include <linux/personality.h>
#include <linux/freezer.h>
#include <linux/stddef.h>
#include <linux/uaccess.h>
......@@ -577,10 +576,12 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
{
int err;
err = sigframe_alloc(user, &user->fpsimd_offset,
sizeof(struct fpsimd_context));
if (err)
return err;
if (system_supports_fpsimd()) {
err = sigframe_alloc(user, &user->fpsimd_offset,
sizeof(struct fpsimd_context));
if (err)
return err;
}
/* fault information, if valid */
if (add_all || current->thread.fault_code) {
......
......@@ -9,7 +9,6 @@
#include <linux/compat.h>
#include <linux/cpufeature.h>
#include <linux/personality.h>
#include <linux/sched.h>
#include <linux/sched/signal.h>
#include <linux/slab.h>
......
......@@ -9,7 +9,6 @@
#include <linux/bug.h>
#include <linux/context_tracking.h>
#include <linux/signal.h>
#include <linux/personality.h>
#include <linux/kallsyms.h>
#include <linux/kprobes.h>
#include <linux/spinlock.h>
......
......@@ -1867,6 +1867,7 @@ static int kvm_hyp_init_protection(u32 hyp_va_bits)
kvm_nvhe_sym(id_aa64pfr1_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1);
kvm_nvhe_sym(id_aa64isar0_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64ISAR0_EL1);
kvm_nvhe_sym(id_aa64isar1_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64ISAR1_EL1);
kvm_nvhe_sym(id_aa64isar2_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64ISAR2_EL1);
kvm_nvhe_sym(id_aa64mmfr0_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR2_EL1);
......
......@@ -174,9 +174,9 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
/* Valid trap. Switch the context: */
if (has_vhe()) {
reg = CPACR_EL1_FPEN;
reg = CPACR_EL1_FPEN_EL0EN | CPACR_EL1_FPEN_EL1EN;
if (sve_guest)
reg |= CPACR_EL1_ZEN;
reg |= CPACR_EL1_ZEN_EL0EN | CPACR_EL1_ZEN_EL1EN;
sysreg_clear_set(cpacr_el1, 0, reg);
} else {
......
......@@ -192,6 +192,11 @@
ARM64_FEATURE_MASK(ID_AA64ISAR1_I8MM) \
)
#define PVM_ID_AA64ISAR2_ALLOW (\
ARM64_FEATURE_MASK(ID_AA64ISAR2_GPA3) | \
ARM64_FEATURE_MASK(ID_AA64ISAR2_APA3) \
)
u64 pvm_read_id_reg(const struct kvm_vcpu *vcpu, u32 id);
bool kvm_handle_pvm_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code);
bool kvm_handle_pvm_restricted(struct kvm_vcpu *vcpu, u64 *exit_code);
......
......@@ -7,7 +7,8 @@
#include <asm/assembler.h>
#include <asm/alternative.h>
SYM_FUNC_START_PI(dcache_clean_inval_poc)
SYM_FUNC_START(__pi_dcache_clean_inval_poc)
dcache_by_line_op civac, sy, x0, x1, x2, x3
ret
SYM_FUNC_END_PI(dcache_clean_inval_poc)
SYM_FUNC_END(__pi_dcache_clean_inval_poc)
SYM_FUNC_ALIAS(dcache_clean_inval_poc, __pi_dcache_clean_inval_poc)
......@@ -22,6 +22,7 @@ u64 id_aa64pfr0_el1_sys_val;
u64 id_aa64pfr1_el1_sys_val;
u64 id_aa64isar0_el1_sys_val;
u64 id_aa64isar1_el1_sys_val;
u64 id_aa64isar2_el1_sys_val;
u64 id_aa64mmfr0_el1_sys_val;
u64 id_aa64mmfr1_el1_sys_val;
u64 id_aa64mmfr2_el1_sys_val;
......@@ -183,6 +184,17 @@ static u64 get_pvm_id_aa64isar1(const struct kvm_vcpu *vcpu)
return id_aa64isar1_el1_sys_val & allow_mask;
}
static u64 get_pvm_id_aa64isar2(const struct kvm_vcpu *vcpu)
{
u64 allow_mask = PVM_ID_AA64ISAR2_ALLOW;
if (!vcpu_has_ptrauth(vcpu))
allow_mask &= ~(ARM64_FEATURE_MASK(ID_AA64ISAR2_APA3) |
ARM64_FEATURE_MASK(ID_AA64ISAR2_GPA3));
return id_aa64isar2_el1_sys_val & allow_mask;
}
static u64 get_pvm_id_aa64mmfr0(const struct kvm_vcpu *vcpu)
{
u64 set_mask;
......@@ -225,6 +237,8 @@ u64 pvm_read_id_reg(const struct kvm_vcpu *vcpu, u32 id)
return get_pvm_id_aa64isar0(vcpu);
case SYS_ID_AA64ISAR1_EL1:
return get_pvm_id_aa64isar1(vcpu);
case SYS_ID_AA64ISAR2_EL1:
return get_pvm_id_aa64isar2(vcpu);
case SYS_ID_AA64MMFR0_EL1:
return get_pvm_id_aa64mmfr0(vcpu);
case SYS_ID_AA64MMFR1_EL1:
......
......@@ -41,7 +41,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
val = read_sysreg(cpacr_el1);
val |= CPACR_EL1_TTA;
val &= ~CPACR_EL1_ZEN;
val &= ~(CPACR_EL1_ZEN_EL0EN | CPACR_EL1_ZEN_EL1EN);
/*
* With VHE (HCR.E2H == 1), accesses to CPACR_EL1 are routed to
......@@ -56,9 +56,9 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
if (update_fp_enabled(vcpu)) {
if (vcpu_has_sve(vcpu))
val |= CPACR_EL1_ZEN;
val |= CPACR_EL1_ZEN_EL0EN | CPACR_EL1_ZEN_EL1EN;
} else {
val &= ~CPACR_EL1_FPEN;
val &= ~(CPACR_EL1_FPEN_EL0EN | CPACR_EL1_FPEN_EL1EN);
__activate_traps_fpsimd32(vcpu);
}
......
......@@ -1097,6 +1097,11 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
ARM64_FEATURE_MASK(ID_AA64ISAR1_GPA) |
ARM64_FEATURE_MASK(ID_AA64ISAR1_GPI));
break;
case SYS_ID_AA64ISAR2_EL1:
if (!vcpu_has_ptrauth(vcpu))
val &= ~(ARM64_FEATURE_MASK(ID_AA64ISAR2_APA3) |
ARM64_FEATURE_MASK(ID_AA64ISAR2_GPA3));
break;
case SYS_ID_AA64DFR0_EL1:
/* Limit debug to ARMv8.0 */
val &= ~ARM64_FEATURE_MASK(ID_AA64DFR0_DEBUGVER);
......
......@@ -14,7 +14,7 @@
* Parameters:
* x0 - dest
*/
SYM_FUNC_START_PI(clear_page)
SYM_FUNC_START(__pi_clear_page)
mrs x1, dczid_el0
tbnz x1, #4, 2f /* Branch if DC ZVA is prohibited */
and w1, w1, #0xf
......@@ -35,5 +35,6 @@ SYM_FUNC_START_PI(clear_page)
tst x0, #(PAGE_SIZE - 1)
b.ne 2b
ret
SYM_FUNC_END_PI(clear_page)
SYM_FUNC_END(__pi_clear_page)
SYM_FUNC_ALIAS(clear_page, __pi_clear_page)
EXPORT_SYMBOL(clear_page)
......@@ -17,7 +17,7 @@
* x0 - dest
* x1 - src
*/
SYM_FUNC_START_PI(copy_page)
SYM_FUNC_START(__pi_copy_page)
alternative_if ARM64_HAS_NO_HW_PREFETCH
// Prefetch three cache lines ahead.
prfm pldl1strm, [x1, #128]
......@@ -75,5 +75,6 @@ alternative_else_nop_endif
stnp x16, x17, [x0, #112 - 256]
ret
SYM_FUNC_END_PI(copy_page)
SYM_FUNC_END(__pi_copy_page)
SYM_FUNC_ALIAS(copy_page, __pi_copy_page)
EXPORT_SYMBOL(copy_page)
......@@ -578,10 +578,16 @@ u32 aarch64_insn_gen_load_store_ex(enum aarch64_insn_register reg,
switch (type) {
case AARCH64_INSN_LDST_LOAD_EX:
case AARCH64_INSN_LDST_LOAD_ACQ_EX:
insn = aarch64_insn_get_load_ex_value();
if (type == AARCH64_INSN_LDST_LOAD_ACQ_EX)
insn |= BIT(15);
break;
case AARCH64_INSN_LDST_STORE_EX:
case AARCH64_INSN_LDST_STORE_REL_EX:
insn = aarch64_insn_get_store_ex_value();
if (type == AARCH64_INSN_LDST_STORE_REL_EX)
insn |= BIT(15);
break;
default:
pr_err("%s: unknown load/store exclusive encoding %d\n", __func__, type);
......@@ -603,12 +609,65 @@ u32 aarch64_insn_gen_load_store_ex(enum aarch64_insn_register reg,
state);
}
u32 aarch64_insn_gen_ldadd(enum aarch64_insn_register result,
enum aarch64_insn_register address,
enum aarch64_insn_register value,
enum aarch64_insn_size_type size)
#ifdef CONFIG_ARM64_LSE_ATOMICS
static u32 aarch64_insn_encode_ldst_order(enum aarch64_insn_mem_order_type type,
u32 insn)
{
u32 insn = aarch64_insn_get_ldadd_value();
u32 order;
switch (type) {
case AARCH64_INSN_MEM_ORDER_NONE:
order = 0;
break;
case AARCH64_INSN_MEM_ORDER_ACQ:
order = 2;
break;
case AARCH64_INSN_MEM_ORDER_REL:
order = 1;
break;
case AARCH64_INSN_MEM_ORDER_ACQREL:
order = 3;
break;
default:
pr_err("%s: unknown mem order %d\n", __func__, type);
return AARCH64_BREAK_FAULT;
}
insn &= ~GENMASK(23, 22);
insn |= order << 22;
return insn;
}
u32 aarch64_insn_gen_atomic_ld_op(enum aarch64_insn_register result,
enum aarch64_insn_register address,
enum aarch64_insn_register value,
enum aarch64_insn_size_type size,
enum aarch64_insn_mem_atomic_op op,
enum aarch64_insn_mem_order_type order)
{
u32 insn;
switch (op) {
case AARCH64_INSN_MEM_ATOMIC_ADD:
insn = aarch64_insn_get_ldadd_value();
break;
case AARCH64_INSN_MEM_ATOMIC_CLR:
insn = aarch64_insn_get_ldclr_value();
break;
case AARCH64_INSN_MEM_ATOMIC_EOR:
insn = aarch64_insn_get_ldeor_value();
break;
case AARCH64_INSN_MEM_ATOMIC_SET:
insn = aarch64_insn_get_ldset_value();
break;
case AARCH64_INSN_MEM_ATOMIC_SWP:
insn = aarch64_insn_get_swp_value();
break;
default:
pr_err("%s: unimplemented mem atomic op %d\n", __func__, op);
return AARCH64_BREAK_FAULT;
}
switch (size) {
case AARCH64_INSN_SIZE_32:
......@@ -621,6 +680,8 @@ u32 aarch64_insn_gen_ldadd(enum aarch64_insn_register result,
insn = aarch64_insn_encode_ldst_size(size, insn);
insn = aarch64_insn_encode_ldst_order(order, insn);
insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RT, insn,
result);
......@@ -631,17 +692,68 @@ u32 aarch64_insn_gen_ldadd(enum aarch64_insn_register result,
value);
}
u32 aarch64_insn_gen_stadd(enum aarch64_insn_register address,
enum aarch64_insn_register value,
enum aarch64_insn_size_type size)
static u32 aarch64_insn_encode_cas_order(enum aarch64_insn_mem_order_type type,
u32 insn)
{
/*
* STADD is simply encoded as an alias for LDADD with XZR as
* the destination register.
*/
return aarch64_insn_gen_ldadd(AARCH64_INSN_REG_ZR, address,
value, size);
u32 order;
switch (type) {
case AARCH64_INSN_MEM_ORDER_NONE:
order = 0;
break;
case AARCH64_INSN_MEM_ORDER_ACQ:
order = BIT(22);
break;
case AARCH64_INSN_MEM_ORDER_REL:
order = BIT(15);
break;
case AARCH64_INSN_MEM_ORDER_ACQREL:
order = BIT(15) | BIT(22);
break;
default:
pr_err("%s: unknown mem order %d\n", __func__, type);
return AARCH64_BREAK_FAULT;
}
insn &= ~(BIT(15) | BIT(22));
insn |= order;
return insn;
}
u32 aarch64_insn_gen_cas(enum aarch64_insn_register result,
enum aarch64_insn_register address,
enum aarch64_insn_register value,
enum aarch64_insn_size_type size,
enum aarch64_insn_mem_order_type order)
{
u32 insn;
switch (size) {
case AARCH64_INSN_SIZE_32:
case AARCH64_INSN_SIZE_64:
break;
default:
pr_err("%s: unimplemented size encoding %d\n", __func__, size);
return AARCH64_BREAK_FAULT;
}
insn = aarch64_insn_get_cas_value();
insn = aarch64_insn_encode_ldst_size(size, insn);
insn = aarch64_insn_encode_cas_order(order, insn);
insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RT, insn,
result);
insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RN, insn,
address);
return aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RS, insn,
value);
}
#endif
static u32 aarch64_insn_encode_prfm_imm(enum aarch64_insn_prfm_type type,
enum aarch64_insn_prfm_target target,
......@@ -1379,7 +1491,7 @@ static u32 aarch64_encode_immediate(u64 imm,
* Compute the rotation to get a continuous set of
* ones, with the first bit set at position 0
*/
ror = fls(~imm);
ror = fls64(~imm);
}
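The switch from fls() to fls64() matters because ~imm is a 64-bit quantity: the 32-bit fls() only sees the low word and therefore computes the wrong rotation whenever the inverted immediate has bits set above bit 31. A user-space illustration, using compiler builtins as rough stand-ins for the kernel helpers (an assumption on my part):

#include <stdint.h>
#include <stdio.h>

/* Rough user-space stand-ins for the kernel's fls()/fls64(). */
static int my_fls(uint32_t x)   { return x ? 32 - __builtin_clz(x)   : 0; }
static int my_fls64(uint64_t x) { return x ? 64 - __builtin_clzll(x) : 0; }

int main(void)
{
	/* A 64-bit logical immediate: ones in the high half only. */
	uint64_t imm = 0xffff000000000000ull;

	/* A 32-bit fls() sees only the low word of ~imm. */
	printf("fls(~imm)   = %d\n", my_fls((uint32_t)~imm));	/* 32 */
	printf("fls64(~imm) = %d\n", my_fls64(~imm));		/* 48 */
	return 0;
}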
/*
......@@ -1456,3 +1568,48 @@ u32 aarch64_insn_gen_extr(enum aarch64_insn_variant variant,
insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RN, insn, Rn);
return aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RM, insn, Rm);
}
u32 aarch64_insn_gen_dmb(enum aarch64_insn_mb_type type)
{
u32 opt;
u32 insn;
switch (type) {
case AARCH64_INSN_MB_SY:
opt = 0xf;
break;
case AARCH64_INSN_MB_ST:
opt = 0xe;
break;
case AARCH64_INSN_MB_LD:
opt = 0xd;
break;
case AARCH64_INSN_MB_ISH:
opt = 0xb;
break;
case AARCH64_INSN_MB_ISHST:
opt = 0xa;
break;
case AARCH64_INSN_MB_ISHLD:
opt = 0x9;
break;
case AARCH64_INSN_MB_NSH:
opt = 0x7;
break;
case AARCH64_INSN_MB_NSHST:
opt = 0x6;
break;
case AARCH64_INSN_MB_NSHLD:
opt = 0x5;
break;
default:
pr_err("%s: unknown dmb type %d\n", __func__, type);
return AARCH64_BREAK_FAULT;
}
insn = aarch64_insn_get_dmb_value();
insn &= ~GENMASK(11, 8);
insn |= (opt << 8);
return insn;
}
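The option values in the switch above follow the architectural split of the DMB CRm field into a two-bit shareability domain and a two-bit access-type selector (e.g. 0xf = full system/all accesses, 0xa = inner shareable/stores). Assuming that reading, the table can be reproduced with a tiny user-space sketch:

#include <stdint.h>
#include <stdio.h>

/* Assumed decomposition of the DMB option (CRm) field. Illustration only. */
enum dmb_domain { DMB_OSH = 0, DMB_NSH = 1, DMB_ISH = 2, DMB_SY = 3 };
enum dmb_type   { DMB_LOADS = 1, DMB_STORES = 2, DMB_ALL = 3 };

static uint32_t dmb_opt(enum dmb_domain d, enum dmb_type t)
{
	return ((uint32_t)d << 2) | (uint32_t)t;
}

int main(void)
{
	/* Matches the switch above: SY = 0xf, ISHST = 0xa, NSHLD = 0x5. */
	printf("%x %x %x\n",
	       dmb_opt(DMB_SY, DMB_ALL),
	       dmb_opt(DMB_ISH, DMB_STORES),
	       dmb_opt(DMB_NSH, DMB_LOADS));
	return 0;
}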
......@@ -38,7 +38,7 @@
.p2align 4
nop
SYM_FUNC_START_WEAK_PI(memchr)
SYM_FUNC_START(__pi_memchr)
and chrin, chrin, #0xff
lsr wordcnt, cntin, #3
cbz wordcnt, L(byte_loop)
......@@ -71,5 +71,6 @@ CPU_LE( rev tmp, tmp)
L(not_found):
mov result, #0
ret
SYM_FUNC_END_PI(memchr)
SYM_FUNC_END(__pi_memchr)
SYM_FUNC_ALIAS_WEAK(memchr, __pi_memchr)
EXPORT_SYMBOL_NOKASAN(memchr)
......@@ -32,7 +32,7 @@
#define tmp1 x7
#define tmp2 x8
SYM_FUNC_START_WEAK_PI(memcmp)
SYM_FUNC_START(__pi_memcmp)
subs limit, limit, 8
b.lo L(less8)
......@@ -134,6 +134,6 @@ L(byte_loop):
b.eq L(byte_loop)
sub result, data1w, data2w
ret
SYM_FUNC_END_PI(memcmp)
SYM_FUNC_END(__pi_memcmp)
SYM_FUNC_ALIAS_WEAK(memcmp, __pi_memcmp)
EXPORT_SYMBOL_NOKASAN(memcmp)
......@@ -57,10 +57,7 @@
The loop tail is handled by always copying 64 bytes from the end.
*/
SYM_FUNC_START_ALIAS(__memmove)
SYM_FUNC_START_WEAK_ALIAS_PI(memmove)
SYM_FUNC_START_ALIAS(__memcpy)
SYM_FUNC_START_WEAK_PI(memcpy)
SYM_FUNC_START(__pi_memcpy)
add srcend, src, count
add dstend, dstin, count
cmp count, 128
......@@ -241,12 +238,16 @@ L(copy64_from_start):
stp B_l, B_h, [dstin, 16]
stp C_l, C_h, [dstin]
ret
SYM_FUNC_END(__pi_memcpy)
SYM_FUNC_END_PI(memcpy)
EXPORT_SYMBOL(memcpy)
SYM_FUNC_END_ALIAS(__memcpy)
SYM_FUNC_ALIAS(__memcpy, __pi_memcpy)
EXPORT_SYMBOL(__memcpy)
SYM_FUNC_END_ALIAS_PI(memmove)
EXPORT_SYMBOL(memmove)
SYM_FUNC_END_ALIAS(__memmove)
SYM_FUNC_ALIAS_WEAK(memcpy, __memcpy)
EXPORT_SYMBOL(memcpy)
SYM_FUNC_ALIAS(__pi_memmove, __pi_memcpy)
SYM_FUNC_ALIAS(__memmove, __pi_memmove)
EXPORT_SYMBOL(__memmove)
SYM_FUNC_ALIAS_WEAK(memmove, __memmove)
EXPORT_SYMBOL(memmove)
......@@ -42,8 +42,7 @@ dst .req x8
tmp3w .req w9
tmp3 .req x9
SYM_FUNC_START_ALIAS(__memset)
SYM_FUNC_START_WEAK_PI(memset)
SYM_FUNC_START(__pi_memset)
mov dst, dstin /* Preserve return value. */
and A_lw, val, #255
orr A_lw, A_lw, A_lw, lsl #8
......@@ -202,7 +201,10 @@ SYM_FUNC_START_WEAK_PI(memset)
ands count, count, zva_bits_x
b.ne .Ltail_maybe_long
ret
SYM_FUNC_END_PI(memset)
EXPORT_SYMBOL(memset)
SYM_FUNC_END_ALIAS(__memset)
SYM_FUNC_END(__pi_memset)
SYM_FUNC_ALIAS(__memset, __pi_memset)
EXPORT_SYMBOL(__memset)
SYM_FUNC_ALIAS_WEAK(memset, __pi_memset)
EXPORT_SYMBOL(memset)
......@@ -134,7 +134,7 @@ SYM_FUNC_END(mte_copy_tags_to_user)
/*
* Save the tags in a page
* x0 - page address
* x1 - tag storage
* x1 - tag storage, MTE_PAGE_TAG_STORAGE bytes
*/
SYM_FUNC_START(mte_save_page_tags)
multitag_transfer_size x7, x5
......@@ -158,7 +158,7 @@ SYM_FUNC_END(mte_save_page_tags)
/*
* Restore the tags in a page
* x0 - page address
* x1 - tag storage
* x1 - tag storage, MTE_PAGE_TAG_STORAGE bytes
*/
SYM_FUNC_START(mte_restore_page_tags)
multitag_transfer_size x7, x5
......
......@@ -18,7 +18,7 @@
* Returns:
* x0 - address of first occurrence of 'c' or 0
*/
SYM_FUNC_START_WEAK(strchr)
SYM_FUNC_START(__pi_strchr)
and w1, w1, #0xff
1: ldrb w2, [x0], #1
cmp w2, w1
......@@ -28,5 +28,7 @@ SYM_FUNC_START_WEAK(strchr)
cmp w2, w1
csel x0, x0, xzr, eq
ret
SYM_FUNC_END(strchr)
SYM_FUNC_END(__pi_strchr)
SYM_FUNC_ALIAS_WEAK(strchr, __pi_strchr)
EXPORT_SYMBOL_NOKASAN(strchr)
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright (c) 2012-2021, Arm Limited.
* Copyright (c) 2012-2022, Arm Limited.
*
* Adapted from the original at:
* https://github.com/ARM-software/optimized-routines/blob/afd6244a1f8d9229/string/aarch64/strcmp.S
* https://github.com/ARM-software/optimized-routines/blob/189dfefe37d54c5b/string/aarch64/strcmp.S
*/
#include <linux/linkage.h>
......@@ -11,166 +11,180 @@
/* Assumptions:
*
* ARMv8-a, AArch64
* ARMv8-a, AArch64.
* MTE compatible.
*/
#define L(label) .L ## label
#define REP8_01 0x0101010101010101
#define REP8_7f 0x7f7f7f7f7f7f7f7f
#define REP8_80 0x8080808080808080
/* Parameters and result. */
#define src1 x0
#define src2 x1
#define result x0
/* Internal variables. */
#define data1 x2
#define data1w w2
#define data2 x3
#define data2w w3
#define has_nul x4
#define diff x5
#define off1 x5
#define syndrome x6
#define tmp1 x7
#define tmp2 x8
#define tmp3 x9
#define zeroones x10
#define pos x11
/* Start of performance-critical section -- one 64B cache line. */
.align 6
SYM_FUNC_START_WEAK_PI(strcmp)
eor tmp1, src1, src2
mov zeroones, #REP8_01
tst tmp1, #7
#define tmp x6
#define data3 x7
#define zeroones x8
#define shift x9
#define off2 x10
/* On big-endian early bytes are at MSB and on little-endian LSB.
LS_FW means shifting towards early bytes. */
#ifdef __AARCH64EB__
# define LS_FW lsl
#else
# define LS_FW lsr
#endif
/* NUL detection works on the principle that (X - 1) & (~X) & 0x80
(=> (X - 1) & ~(X | 0x7f)) is non-zero iff a byte is zero, and
can be done in parallel across the entire word.
Since carry propagation makes 0x1 bytes before a NUL byte appear
NUL too in big-endian, byte-reverse the data before the NUL check. */
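The zero-byte test this comment describes is easy to sanity-check in plain C; a minimal sketch that only tests for the presence of a NUL byte in a 64-bit word (the REP8_* constants mirror the ones defined above):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define REP8_01 0x0101010101010101ull
#define REP8_7f 0x7f7f7f7f7f7f7f7full

/* Non-zero iff any byte of x is zero: (X - 1) & ~X & 0x80, per byte. */
static uint64_t has_nul(uint64_t x)
{
	return (x - REP8_01) & ~(x | REP8_7f);
}

int main(void)
{
	uint64_t a, b;

	memcpy(&a, "abcdefgh", 8);	/* no NUL byte   */
	memcpy(&b, "abc\0efgh", 8);	/* NUL at byte 3 */
	printf("%d %d\n", has_nul(a) != 0, has_nul(b) != 0);	/* 0 1 */
	return 0;
}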
SYM_FUNC_START(__pi_strcmp)
sub off2, src2, src1
mov zeroones, REP8_01
and tmp, src1, 7
tst off2, 7
b.ne L(misaligned8)
ands tmp1, src1, #7
b.ne L(mutual_align)
/* NUL detection works on the principle that (X - 1) & (~X) & 0x80
(=> (X - 1) & ~(X | 0x7f)) is non-zero iff a byte is zero, and
can be done in parallel across the entire word. */
cbnz tmp, L(mutual_align)
.p2align 4
L(loop_aligned):
ldr data1, [src1], #8
ldr data2, [src2], #8
ldr data2, [src1, off2]
ldr data1, [src1], 8
L(start_realigned):
sub tmp1, data1, zeroones
orr tmp2, data1, #REP8_7f
eor diff, data1, data2 /* Non-zero if differences found. */
bic has_nul, tmp1, tmp2 /* Non-zero if NUL terminator. */
#ifdef __AARCH64EB__
rev tmp, data1
sub has_nul, tmp, zeroones
orr tmp, tmp, REP8_7f
#else
sub has_nul, data1, zeroones
orr tmp, data1, REP8_7f
#endif
bics has_nul, has_nul, tmp /* Non-zero if NUL terminator. */
ccmp data1, data2, 0, eq
b.eq L(loop_aligned)
#ifdef __AARCH64EB__
rev has_nul, has_nul
#endif
eor diff, data1, data2
orr syndrome, diff, has_nul
cbz syndrome, L(loop_aligned)
/* End of performance-critical section -- one 64B cache line. */
L(end):
#ifndef __AARCH64EB__
#ifndef __AARCH64EB__
rev syndrome, syndrome
rev data1, data1
/* The MS-non-zero bit of the syndrome marks either the first bit
that is different, or the top bit of the first zero byte.
Shifting left now will bring the critical information into the
top bits. */
clz pos, syndrome
rev data2, data2
lsl data1, data1, pos
lsl data2, data2, pos
/* But we need to zero-extend (char is unsigned) the value and then
perform a signed 32-bit subtraction. */
lsr data1, data1, #56
sub result, data1, data2, lsr #56
ret
#else
/* For big-endian we cannot use the trick with the syndrome value
as carry-propagation can corrupt the upper bits if the trailing
bytes in the string contain 0x01. */
/* However, if there is no NUL byte in the dword, we can generate
the result directly. We can't just subtract the bytes as the
MSB might be significant. */
cbnz has_nul, 1f
cmp data1, data2
cset result, ne
cneg result, result, lo
ret
1:
/* Re-compute the NUL-byte detection, using a byte-reversed value. */
rev tmp3, data1
sub tmp1, tmp3, zeroones
orr tmp2, tmp3, #REP8_7f
bic has_nul, tmp1, tmp2
rev has_nul, has_nul
orr syndrome, diff, has_nul
clz pos, syndrome
/* The MS-non-zero bit of the syndrome marks either the first bit
that is different, or the top bit of the first zero byte.
#endif
clz shift, syndrome
/* The most-significant-non-zero bit of the syndrome marks either the
first bit that is different, or the top bit of the first zero byte.
Shifting left now will bring the critical information into the
top bits. */
lsl data1, data1, pos
lsl data2, data2, pos
lsl data1, data1, shift
lsl data2, data2, shift
/* But we need to zero-extend (char is unsigned) the value and then
perform a signed 32-bit subtraction. */
lsr data1, data1, #56
sub result, data1, data2, lsr #56
lsr data1, data1, 56
sub result, data1, data2, lsr 56
ret
#endif
.p2align 4
L(mutual_align):
/* Sources are mutually aligned, but are not currently at an
alignment boundary. Round down the addresses and then mask off
the bytes that preceed the start point. */
bic src1, src1, #7
bic src2, src2, #7
lsl tmp1, tmp1, #3 /* Bytes beyond alignment -> bits. */
ldr data1, [src1], #8
neg tmp1, tmp1 /* Bits to alignment -64. */
ldr data2, [src2], #8
mov tmp2, #~0
#ifdef __AARCH64EB__
/* Big-endian. Early bytes are at MSB. */
lsl tmp2, tmp2, tmp1 /* Shift (tmp1 & 63). */
#else
/* Little-endian. Early bytes are at LSB. */
lsr tmp2, tmp2, tmp1 /* Shift (tmp1 & 63). */
#endif
orr data1, data1, tmp2
orr data2, data2, tmp2
the bytes that precede the start point. */
bic src1, src1, 7
ldr data2, [src1, off2]
ldr data1, [src1], 8
neg shift, src2, lsl 3 /* Bits to alignment -64. */
mov tmp, -1
LS_FW tmp, tmp, shift
orr data1, data1, tmp
orr data2, data2, tmp
b L(start_realigned)
L(misaligned8):
/* Align SRC1 to 8 bytes and then compare 8 bytes at a time, always
checking to make sure that we don't access beyond page boundary in
SRC2. */
tst src1, #7
b.eq L(loop_misaligned)
checking to make sure that we don't access beyond the end of SRC2. */
cbz tmp, L(src1_aligned)
L(do_misaligned):
ldrb data1w, [src1], #1
ldrb data2w, [src2], #1
cmp data1w, #1
ccmp data1w, data2w, #0, cs /* NZCV = 0b0000. */
ldrb data1w, [src1], 1
ldrb data2w, [src2], 1
cmp data1w, 0
ccmp data1w, data2w, 0, ne /* NZCV = 0b0000. */
b.ne L(done)
tst src1, #7
tst src1, 7
b.ne L(do_misaligned)
L(loop_misaligned):
/* Test if we are within the last dword of the end of a 4K page. If
yes then jump back to the misaligned loop to copy a byte at a time. */
and tmp1, src2, #0xff8
eor tmp1, tmp1, #0xff8
cbz tmp1, L(do_misaligned)
ldr data1, [src1], #8
ldr data2, [src2], #8
sub tmp1, data1, zeroones
orr tmp2, data1, #REP8_7f
eor diff, data1, data2 /* Non-zero if differences found. */
bic has_nul, tmp1, tmp2 /* Non-zero if NUL terminator. */
L(src1_aligned):
neg shift, src2, lsl 3
bic src2, src2, 7
ldr data3, [src2], 8
#ifdef __AARCH64EB__
rev data3, data3
#endif
lsr tmp, zeroones, shift
orr data3, data3, tmp
sub has_nul, data3, zeroones
orr tmp, data3, REP8_7f
bics has_nul, has_nul, tmp
b.ne L(tail)
sub off1, src2, src1
.p2align 4
L(loop_unaligned):
ldr data3, [src1, off1]
ldr data2, [src1, off2]
#ifdef __AARCH64EB__
rev data3, data3
#endif
sub has_nul, data3, zeroones
orr tmp, data3, REP8_7f
ldr data1, [src1], 8
bics has_nul, has_nul, tmp
ccmp data1, data2, 0, eq
b.eq L(loop_unaligned)
lsl tmp, has_nul, shift
#ifdef __AARCH64EB__
rev tmp, tmp
#endif
eor diff, data1, data2
orr syndrome, diff, tmp
cbnz syndrome, L(end)
L(tail):
ldr data1, [src1]
neg shift, shift
lsr data2, data3, shift
lsr has_nul, has_nul, shift
#ifdef __AARCH64EB__
rev data2, data2
rev has_nul, has_nul
#endif
eor diff, data1, data2
orr syndrome, diff, has_nul
cbz syndrome, L(loop_misaligned)
b L(end)
L(done):
sub result, data1, data2
ret
SYM_FUNC_END_PI(strcmp)
EXPORT_SYMBOL_NOHWKASAN(strcmp)
SYM_FUNC_END(__pi_strcmp)
SYM_FUNC_ALIAS_WEAK(strcmp, __pi_strcmp)
EXPORT_SYMBOL_NOKASAN(strcmp)
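Putting the new pieces together — the per-word NUL check, the diff-or-NUL syndrome, and the clz/shift trick described in the L(end) comments — the aligned fast path can be modelled in C. This is only a little-endian toy under strong assumptions (both inputs 8-byte aligned and NUL-padded to a full word, GCC/Clang builtins standing in for rev/clz); it is not the routine itself:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define REP8_01 0x0101010101010101ull
#define REP8_7f 0x7f7f7f7f7f7f7f7full

/* Toy model of the aligned fast path: little-endian only, both inputs
 * 8-byte aligned and padded with NULs to a full word. Not strcmp(). */
static int toy_strcmp_aligned(const char *s1, const char *s2)
{
	for (;; s1 += 8, s2 += 8) {
		uint64_t d1, d2, has_nul, syndrome;
		int shift;

		memcpy(&d1, s1, 8);
		memcpy(&d2, s2, 8);

		/* Per-byte NUL check and difference check, as in the loop. */
		has_nul  = (d1 - REP8_01) & ~(d1 | REP8_7f);
		syndrome = (d1 ^ d2) | has_nul;
		if (!syndrome)
			continue;

		/* L(end): byte-reverse so the first special byte becomes
		 * the most significant, locate it with clz, shift it to the
		 * top and compare the resulting top bytes. */
		d1 = __builtin_bswap64(d1);
		d2 = __builtin_bswap64(d2);
		shift = __builtin_clzll(__builtin_bswap64(syndrome));
		d1 <<= shift;
		d2 <<= shift;
		return (int)(d1 >> 56) - (int)(d2 >> 56);
	}
}

int main(void)
{
	/* Aligned, NUL-padded buffers keep the whole-word loads in bounds. */
	static const char a[16] __attribute__((aligned(8))) = "hello, world";
	static const char b[16] __attribute__((aligned(8))) = "hello, there";

	printf("%d %d %d\n", toy_strcmp_aligned(a, a),
	       toy_strcmp_aligned(a, b) > 0, toy_strcmp_aligned(b, a) < 0);
	return 0;
}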
......@@ -79,7 +79,7 @@
whether the first fetch, which may be misaligned, crosses a page
boundary. */
SYM_FUNC_START_WEAK_PI(strlen)
SYM_FUNC_START(__pi_strlen)
and tmp1, srcin, MIN_PAGE_SIZE - 1
mov zeroones, REP8_01
cmp tmp1, MIN_PAGE_SIZE - 16
......@@ -208,6 +208,6 @@ L(page_cross):
csel data1, data1, tmp4, eq
csel data2, data2, tmp2, eq
b L(page_cross_entry)
SYM_FUNC_END_PI(strlen)
SYM_FUNC_END(__pi_strlen)
SYM_FUNC_ALIAS_WEAK(strlen, __pi_strlen)
EXPORT_SYMBOL_NOKASAN(strlen)
......@@ -47,7 +47,7 @@ limit_wd .req x14
#define REP8_7f 0x7f7f7f7f7f7f7f7f
#define REP8_80 0x8080808080808080
SYM_FUNC_START_WEAK_PI(strnlen)
SYM_FUNC_START(__pi_strnlen)
cbz limit, .Lhit_limit
mov zeroones, #REP8_01
bic src, srcin, #15
......@@ -156,5 +156,7 @@ CPU_LE( lsr tmp2, tmp2, tmp4 ) /* Shift (tmp1 & 63). */
.Lhit_limit:
mov len, limit
ret
SYM_FUNC_END_PI(strnlen)
SYM_FUNC_END(__pi_strnlen)
SYM_FUNC_ALIAS_WEAK(strnlen, __pi_strnlen)
EXPORT_SYMBOL_NOKASAN(strnlen)
......@@ -18,7 +18,7 @@
* Returns:
* x0 - address of last occurrence of 'c' or 0
*/
SYM_FUNC_START_WEAK_PI(strrchr)
SYM_FUNC_START(__pi_strrchr)
mov x3, #0
and w1, w1, #0xff
1: ldrb w2, [x0], #1
......@@ -29,5 +29,6 @@ SYM_FUNC_START_WEAK_PI(strrchr)
b 1b
2: mov x0, x3
ret
SYM_FUNC_END_PI(strrchr)
SYM_FUNC_END(__pi_strrchr)
SYM_FUNC_ALIAS_WEAK(strrchr, __pi_strrchr)
EXPORT_SYMBOL_NOKASAN(strrchr)
......@@ -107,10 +107,11 @@ SYM_FUNC_END(icache_inval_pou)
* - start - virtual start address of region
* - end - virtual end address of region
*/
SYM_FUNC_START_PI(dcache_clean_inval_poc)
SYM_FUNC_START(__pi_dcache_clean_inval_poc)
dcache_by_line_op civac, sy, x0, x1, x2, x3
ret
SYM_FUNC_END_PI(dcache_clean_inval_poc)
SYM_FUNC_END(__pi_dcache_clean_inval_poc)
SYM_FUNC_ALIAS(dcache_clean_inval_poc, __pi_dcache_clean_inval_poc)
/*
* dcache_clean_pou(start, end)
......@@ -140,7 +141,7 @@ SYM_FUNC_END(dcache_clean_pou)
* - start - kernel start address of region
* - end - kernel end address of region
*/
SYM_FUNC_START_PI(dcache_inval_poc)
SYM_FUNC_START(__pi_dcache_inval_poc)
dcache_line_size x2, x3
sub x3, x2, #1
tst x1, x3 // end cache line aligned?
......@@ -158,7 +159,8 @@ SYM_FUNC_START_PI(dcache_inval_poc)
b.lo 2b
dsb sy
ret
SYM_FUNC_END_PI(dcache_inval_poc)
SYM_FUNC_END(__pi_dcache_inval_poc)
SYM_FUNC_ALIAS(dcache_inval_poc, __pi_dcache_inval_poc)
/*
* dcache_clean_poc(start, end)
......@@ -169,10 +171,11 @@ SYM_FUNC_END_PI(dcache_inval_poc)
* - start - virtual start address of region
* - end - virtual end address of region
*/
SYM_FUNC_START_PI(dcache_clean_poc)
SYM_FUNC_START(__pi_dcache_clean_poc)
dcache_by_line_op cvac, sy, x0, x1, x2, x3
ret
SYM_FUNC_END_PI(dcache_clean_poc)
SYM_FUNC_END(__pi_dcache_clean_poc)
SYM_FUNC_ALIAS(dcache_clean_poc, __pi_dcache_clean_poc)
/*
* dcache_clean_pop(start, end)
......@@ -183,13 +186,14 @@ SYM_FUNC_END_PI(dcache_clean_poc)
* - start - virtual start address of region
* - end - virtual end address of region
*/
SYM_FUNC_START_PI(dcache_clean_pop)
SYM_FUNC_START(__pi_dcache_clean_pop)
alternative_if_not ARM64_HAS_DCPOP
b dcache_clean_poc
alternative_else_nop_endif
dcache_by_line_op cvap, sy, x0, x1, x2, x3
ret
SYM_FUNC_END_PI(dcache_clean_pop)
SYM_FUNC_END(__pi_dcache_clean_pop)
SYM_FUNC_ALIAS(dcache_clean_pop, __pi_dcache_clean_pop)
/*
* __dma_flush_area(start, size)
......@@ -199,11 +203,12 @@ SYM_FUNC_END_PI(dcache_clean_pop)
* - start - virtual start address of region
* - size - size in question
*/
SYM_FUNC_START_PI(__dma_flush_area)
SYM_FUNC_START(__pi___dma_flush_area)
add x1, x0, x1
dcache_by_line_op civac, sy, x0, x1, x2, x3
ret
SYM_FUNC_END_PI(__dma_flush_area)
SYM_FUNC_END(__pi___dma_flush_area)
SYM_FUNC_ALIAS(__dma_flush_area, __pi___dma_flush_area)
/*
* __dma_map_area(start, size, dir)
......@@ -211,12 +216,13 @@ SYM_FUNC_END_PI(__dma_flush_area)
* - size - size of region
* - dir - DMA direction
*/
SYM_FUNC_START_PI(__dma_map_area)
SYM_FUNC_START(__pi___dma_map_area)
add x1, x0, x1
cmp w2, #DMA_FROM_DEVICE
b.eq __pi_dcache_inval_poc
b __pi_dcache_clean_poc
SYM_FUNC_END_PI(__dma_map_area)
SYM_FUNC_END(__pi___dma_map_area)
SYM_FUNC_ALIAS(__dma_map_area, __pi___dma_map_area)
/*
* __dma_unmap_area(start, size, dir)
......@@ -224,9 +230,10 @@ SYM_FUNC_END_PI(__dma_map_area)
* - size - size of region
* - dir - DMA direction
*/
SYM_FUNC_START_PI(__dma_unmap_area)
SYM_FUNC_START(__pi___dma_unmap_area)
add x1, x0, x1
cmp w2, #DMA_TO_DEVICE
b.ne __pi_dcache_inval_poc
ret
SYM_FUNC_END_PI(__dma_unmap_area)
SYM_FUNC_END(__pi___dma_unmap_area)
SYM_FUNC_ALIAS(__dma_unmap_area, __pi___dma_unmap_area)
......@@ -52,6 +52,13 @@ void __sync_icache_dcache(pte_t pte)
{
struct page *page = pte_page(pte);
/*
* HugeTLB pages are always fully mapped, so only setting head page's
* PG_dcache_clean flag is enough.
*/
if (PageHuge(page))
page = compound_head(page);
if (!test_bit(PG_dcache_clean, &page->flags)) {
sync_icache_aliases((unsigned long)page_address(page),
(unsigned long)page_address(page) +
......
......@@ -56,25 +56,34 @@ void __init arm64_hugetlb_cma_reserve(void)
}
#endif /* CONFIG_CMA */
#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
bool arch_hugetlb_migration_supported(struct hstate *h)
static bool __hugetlb_valid_size(unsigned long size)
{
size_t pagesize = huge_page_size(h);
switch (pagesize) {
switch (size) {
#ifndef __PAGETABLE_PMD_FOLDED
case PUD_SIZE:
return pud_sect_supported();
#endif
case PMD_SIZE:
case CONT_PMD_SIZE:
case PMD_SIZE:
case CONT_PTE_SIZE:
return true;
}
pr_warn("%s: unrecognized huge page size 0x%lx\n",
__func__, pagesize);
return false;
}
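For orientation, the sizes accepted above correspond, with a 4K translation granule, to 1G (PUD), 32M (contiguous PMD), 2M (PMD) and 64K (contiguous PTE) huge pages. A quick back-of-the-envelope check under that granule assumption:

#include <stdio.h>

int main(void)
{
	/* Assuming a 4K translation granule (PAGE_SHIFT = 12) and the usual
	 * 16-entry contiguous ranges at the PTE and PMD levels. */
	unsigned long page_size     = 1UL << 12;
	unsigned long pmd_size      = 1UL << 21;
	unsigned long pud_size      = 1UL << 30;
	unsigned long cont_pte_size = 16 * page_size;
	unsigned long cont_pmd_size = 16 * pmd_size;

	printf("CONT_PTE=%luK PMD=%luM CONT_PMD=%luM PUD=%luG\n",
	       cont_pte_size >> 10, pmd_size >> 20,
	       cont_pmd_size >> 20, pud_size >> 30);
	return 0;
}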
#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
bool arch_hugetlb_migration_supported(struct hstate *h)
{
size_t pagesize = huge_page_size(h);
if (!__hugetlb_valid_size(pagesize)) {
pr_warn("%s: unrecognized huge page size 0x%lx\n",
__func__, pagesize);
return false;
}
return true;
}
#endif
int pmd_huge(pmd_t pmd)
......@@ -506,16 +515,5 @@ arch_initcall(hugetlbpage_init);
bool __init arch_hugetlb_valid_size(unsigned long size)
{
switch (size) {
#ifndef __PAGETABLE_PMD_FOLDED
case PUD_SIZE:
return pud_sect_supported();
#endif
case CONT_PMD_SIZE:
case PMD_SIZE:
case CONT_PTE_SIZE:
return true;
}
return false;
return __hugetlb_valid_size(size);
}
......@@ -61,8 +61,34 @@ EXPORT_SYMBOL(memstart_addr);
* unless restricted on specific platforms (e.g. 30-bit on Raspberry Pi 4).
* In such case, ZONE_DMA32 covers the rest of the 32-bit addressable memory,
* otherwise it is empty.
*
* Memory reservation for the crash kernel is either done early or deferred,
* depending on the DMA memory zone configs (ZONE_DMA) --
*
* In the absence of ZONE_DMA configs, arm64_dma_phys_limit is initialized
* here instead of in max_zone_phys(). This allows early reservation of
* crash kernel memory, which has a dependency on arm64_dma_phys_limit.
* Reserving memory early for the crash kernel allows linear creation of
* block mappings (greater than page-granularity) for all the memory bank
* ranges. In this scheme a comparatively quicker boot is observed.
*
* If ZONE_DMA configs are defined, crash kernel memory reservation
* is delayed until the DMA zone memory range size initialization performed
* in zone_sizes_init(). The deferral is necessary to steer clear of the DMA
* zone memory range and avoid overlapping allocations. The crash kernel
* boundaries are therefore not known when mapping all the bank memory
* ranges, which makes it impossible to exclude the crash kernel range from
* creating block mappings, so page-granularity mappings are created for the
* entire memory range. Hence a slightly slower boot is observed.
*
* Note: Page-granularity mappings are necessary for the crash kernel memory
* range for shrinking its size via the /sys/kernel/kexec_crash_size interface.
*/
phys_addr_t arm64_dma_phys_limit __ro_after_init;
#if IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32)
phys_addr_t __ro_after_init arm64_dma_phys_limit;
#else
phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1;
#endif
#ifdef CONFIG_KEXEC_CORE
/*
......@@ -153,8 +179,6 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
if (!arm64_dma_phys_limit)
arm64_dma_phys_limit = dma32_phys_limit;
#endif
if (!arm64_dma_phys_limit)
arm64_dma_phys_limit = PHYS_MASK + 1;
max_zone_pfns[ZONE_NORMAL] = max;
free_area_init(max_zone_pfns);
......@@ -315,6 +339,9 @@ void __init arm64_memblock_init(void)
early_init_fdt_scan_reserved_mem();
if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
reserve_crashkernel();
high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
}
......@@ -361,7 +388,8 @@ void __init bootmem_init(void)
* request_standard_resources() depends on crashkernel's memory being
* reserved, so do it here.
*/
reserve_crashkernel();
if (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32))
reserve_crashkernel();
memblock_dump_all();
}
......
......@@ -63,6 +63,7 @@ static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss __maybe_unused;
static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
static DEFINE_SPINLOCK(swapper_pgdir_lock);
static DEFINE_MUTEX(fixmap_lock);
void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
{
......@@ -294,18 +295,6 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
} while (addr = next, addr != end);
}
static inline bool use_1G_block(unsigned long addr, unsigned long next,
unsigned long phys)
{
if (PAGE_SHIFT != 12)
return false;
if (((addr | next | phys) & ~PUD_MASK) != 0)
return false;
return true;
}
static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
phys_addr_t phys, pgprot_t prot,
phys_addr_t (*pgtable_alloc)(int),
......@@ -329,6 +318,12 @@ static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
}
BUG_ON(p4d_bad(p4d));
/*
* No need for locking during early boot. And it doesn't work as
* expected with KASLR enabled.
*/
if (system_state != SYSTEM_BOOTING)
mutex_lock(&fixmap_lock);
pudp = pud_set_fixmap_offset(p4dp, addr);
do {
pud_t old_pud = READ_ONCE(*pudp);
......@@ -338,7 +333,8 @@ static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
/*
* For 4K granule only, attempt to put down a 1GB block
*/
if (use_1G_block(addr, next, phys) &&
if (pud_sect_supported() &&
((addr | next | phys) & ~PUD_MASK) == 0 &&
(flags & NO_BLOCK_MAPPINGS) == 0) {
pud_set_huge(pudp, phys, prot);
......@@ -359,6 +355,8 @@ static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
} while (pudp++, addr = next, addr != end);
pud_clear_fixmap();
if (system_state != SYSTEM_BOOTING)
mutex_unlock(&fixmap_lock);
}
static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
......@@ -517,7 +515,7 @@ static void __init map_mem(pgd_t *pgdp)
*/
BUILD_BUG_ON(pgd_index(direct_map_end - 1) == pgd_index(direct_map_end));
if (can_set_direct_map() || crash_mem_map || IS_ENABLED(CONFIG_KFENCE))
if (can_set_direct_map() || IS_ENABLED(CONFIG_KFENCE))
flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
/*
......@@ -528,6 +526,17 @@ static void __init map_mem(pgd_t *pgdp)
*/
memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
#ifdef CONFIG_KEXEC_CORE
if (crash_mem_map) {
if (IS_ENABLED(CONFIG_ZONE_DMA) ||
IS_ENABLED(CONFIG_ZONE_DMA32))
flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
else if (crashk_res.end)
memblock_mark_nomap(crashk_res.start,
resource_size(&crashk_res));
}
#endif
/* map all the memory banks */
for_each_mem_range(i, &start, &end) {
if (start >= end)
......@@ -554,6 +563,25 @@ static void __init map_mem(pgd_t *pgdp)
__map_memblock(pgdp, kernel_start, kernel_end,
PAGE_KERNEL, NO_CONT_MAPPINGS);
memblock_clear_nomap(kernel_start, kernel_end - kernel_start);
/*
* Use page-level mappings here so that we can shrink the region
* at page granularity and give unused memory back to the buddy
* system through the /sys/kernel/kexec_crash_size interface.
*/
#ifdef CONFIG_KEXEC_CORE
if (crash_mem_map &&
!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) {
if (crashk_res.end) {
__map_memblock(pgdp, crashk_res.start,
crashk_res.end + 1,
PAGE_KERNEL,
NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
memblock_clear_nomap(crashk_res.start,
resource_size(&crashk_res));
}
}
#endif
}
void mark_rodata_ro(void)
......
......@@ -12,7 +12,7 @@ static DEFINE_XARRAY(mte_pages);
void *mte_allocate_tag_storage(void)
{
/* tags granule is 16 bytes, 2 tags stored per byte */
return kmalloc(PAGE_SIZE / 16 / 2, GFP_KERNEL);
return kmalloc(MTE_PAGE_TAG_STORAGE, GFP_KERNEL);
}
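The constant substituted here follows directly from the comment: a 16-byte tag granule with two 4-bit tags packed per byte means PAGE_SIZE / 16 / 2 bytes of tag storage per page. A trivial user-space check, assuming 4K pages for illustration:

#include <stdio.h>

int main(void)
{
	unsigned long page_size     = 4096;	/* assumption: 4K pages      */
	unsigned long granule       = 16;	/* MTE tag granule, in bytes */
	unsigned long tags_per_byte = 2;	/* two 4-bit tags per byte   */

	/* 4096 / 16 / 2 = 128 bytes of tag storage per page. */
	printf("%lu\n", page_size / granule / tags_per_byte);
	return 0;
}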
void mte_free_tag_storage(char *storage)
......
......@@ -46,7 +46,7 @@
#endif
#ifdef CONFIG_KASAN_HW_TAGS
#define TCR_MTE_FLAGS SYS_TCR_EL1_TCMA1 | TCR_TBI1 | TCR_TBID1
#define TCR_MTE_FLAGS TCR_TCMA1 | TCR_TBI1 | TCR_TBID1
#else
/*
* The mte_zero_clear_page_tags() implementation uses DC GZVA, which relies on
......
......@@ -89,9 +89,16 @@
#define A64_STXR(sf, Rt, Rn, Rs) \
A64_LSX(sf, Rt, Rn, Rs, STORE_EX)
/* LSE atomics */
/*
* LSE atomics
*
* STADD is simply encoded as an alias for LDADD with XZR as
* the destination register.
*/
#define A64_STADD(sf, Rn, Rs) \
aarch64_insn_gen_stadd(Rn, Rs, A64_SIZE(sf))
aarch64_insn_gen_atomic_ld_op(A64_ZR, Rn, Rs, \
A64_SIZE(sf), AARCH64_INSN_MEM_ATOMIC_ADD, \
AARCH64_INSN_MEM_ORDER_NONE)
/* Add/subtract (immediate) */
#define A64_ADDSUB_IMM(sf, Rd, Rn, imm12, type) \
......
......@@ -5,18 +5,14 @@ kapi := $(gen)/asm
kapi-hdrs-y := $(kapi)/cpucaps.h
targets += $(addprefix ../../../,$(gen-y) $(kapi-hdrs-y))
targets += $(addprefix ../../../, $(kapi-hdrs-y))
PHONY += kapi
kapi: $(kapi-hdrs-y) $(gen-y)
# Create output directory if not already present
_dummy := $(shell [ -d '$(kapi)' ] || mkdir -p '$(kapi)')
kapi: $(kapi-hdrs-y)
quiet_cmd_gen_cpucaps = GEN $@
cmd_gen_cpucaps = mkdir -p $(dir $@) && \
$(AWK) -f $(filter-out $(PHONY),$^) > $@
cmd_gen_cpucaps = mkdir -p $(dir $@); $(AWK) -f $(real-prereqs) > $@
$(kapi)/cpucaps.h: $(src)/gen-cpucaps.awk $(src)/cpucaps FORCE
$(call if_changed,gen_cpucaps)
......@@ -7,7 +7,8 @@ BTI
HAS_32BIT_EL0_DO_NOT_USE
HAS_32BIT_EL1
HAS_ADDRESS_AUTH
HAS_ADDRESS_AUTH_ARCH
HAS_ADDRESS_AUTH_ARCH_QARMA3
HAS_ADDRESS_AUTH_ARCH_QARMA5
HAS_ADDRESS_AUTH_IMP_DEF
HAS_AMU_EXTN
HAS_ARMv8_4_TTL
......@@ -21,7 +22,8 @@ HAS_E0PD
HAS_ECV
HAS_EPAN
HAS_GENERIC_AUTH
HAS_GENERIC_AUTH_ARCH
HAS_GENERIC_AUTH_ARCH_QARMA3
HAS_GENERIC_AUTH_ARCH_QARMA5
HAS_GENERIC_AUTH_IMP_DEF
HAS_IRQ_PRIO_MASKING
HAS_LDAPR
......
......@@ -8,6 +8,7 @@ menu "Processor type and features"
config IA64
bool
select ARCH_BINFMT_ELF_EXTRA_PHDRS
select ARCH_HAS_DMA_MARK_CLEAN
select ARCH_HAS_STRNCPY_FROM_USER
select ARCH_HAS_STRNLEN_USER
......
......@@ -152,14 +152,13 @@ SYM_FUNC_END(startup_32)
#ifdef CONFIG_EFI_STUB
SYM_FUNC_START(efi32_stub_entry)
SYM_FUNC_START_ALIAS(efi_stub_entry)
add $0x4, %esp
movl 8(%esp), %esi /* save boot_params pointer */
call efi_main
/* efi_main returns the possibly relocated address of startup_32 */
jmp *%eax
SYM_FUNC_END(efi32_stub_entry)
SYM_FUNC_END_ALIAS(efi_stub_entry)
SYM_FUNC_ALIAS(efi_stub_entry, efi32_stub_entry)
#endif
.text
......
......@@ -535,7 +535,6 @@ SYM_CODE_END(startup_64)
#ifdef CONFIG_EFI_STUB
.org 0x390
SYM_FUNC_START(efi64_stub_entry)
SYM_FUNC_START_ALIAS(efi_stub_entry)
and $~0xf, %rsp /* realign the stack */
movq %rdx, %rbx /* save boot_params pointer */
call efi_main
......@@ -543,7 +542,7 @@ SYM_FUNC_START_ALIAS(efi_stub_entry)
leaq rva(startup_64)(%rax), %rax
jmp *%rax
SYM_FUNC_END(efi64_stub_entry)
SYM_FUNC_END_ALIAS(efi_stub_entry)
SYM_FUNC_ALIAS(efi_stub_entry, efi64_stub_entry)
#endif
.text
......
......@@ -1751,8 +1751,6 @@ SYM_FUNC_END(aesni_gcm_finalize)
#endif
SYM_FUNC_START_LOCAL_ALIAS(_key_expansion_128)
SYM_FUNC_START_LOCAL(_key_expansion_256a)
pshufd $0b11111111, %xmm1, %xmm1
shufps $0b00010000, %xmm0, %xmm4
......@@ -1764,7 +1762,7 @@ SYM_FUNC_START_LOCAL(_key_expansion_256a)
add $0x10, TKEYP
RET
SYM_FUNC_END(_key_expansion_256a)
SYM_FUNC_END_ALIAS(_key_expansion_128)
SYM_FUNC_ALIAS_LOCAL(_key_expansion_128, _key_expansion_256a)
SYM_FUNC_START_LOCAL(_key_expansion_192a)
pshufd $0b01010101, %xmm1, %xmm1
......
......@@ -27,8 +27,7 @@
* Output:
* rax original destination
*/
SYM_FUNC_START_ALIAS(__memcpy)
SYM_FUNC_START_WEAK(memcpy)
SYM_FUNC_START(__memcpy)
ALTERNATIVE_2 "jmp memcpy_orig", "", X86_FEATURE_REP_GOOD, \
"jmp memcpy_erms", X86_FEATURE_ERMS
......@@ -40,11 +39,12 @@ SYM_FUNC_START_WEAK(memcpy)
movl %edx, %ecx
rep movsb
RET
SYM_FUNC_END(memcpy)
SYM_FUNC_END_ALIAS(__memcpy)
EXPORT_SYMBOL(memcpy)
SYM_FUNC_END(__memcpy)
EXPORT_SYMBOL(__memcpy)
SYM_FUNC_ALIAS_WEAK(memcpy, __memcpy)
EXPORT_SYMBOL(memcpy)
/*
* memcpy_erms() - enhanced fast string memcpy. This is faster and
* simpler than memcpy. Use memcpy_erms when possible.
......
......@@ -24,7 +24,6 @@
* Output:
* rax: dest
*/
SYM_FUNC_START_WEAK(memmove)
SYM_FUNC_START(__memmove)
mov %rdi, %rax
......@@ -207,6 +206,7 @@ SYM_FUNC_START(__memmove)
13:
RET
SYM_FUNC_END(__memmove)
SYM_FUNC_END_ALIAS(memmove)
EXPORT_SYMBOL(__memmove)
SYM_FUNC_ALIAS_WEAK(memmove, __memmove)
EXPORT_SYMBOL(memmove)
......@@ -17,7 +17,6 @@
*
* rax original destination
*/
SYM_FUNC_START_WEAK(memset)
SYM_FUNC_START(__memset)
/*
* Some CPUs support enhanced REP MOVSB/STOSB feature. It is recommended
......@@ -42,10 +41,11 @@ SYM_FUNC_START(__memset)
movq %r9,%rax
RET
SYM_FUNC_END(__memset)
SYM_FUNC_END_ALIAS(memset)
EXPORT_SYMBOL(memset)
EXPORT_SYMBOL(__memset)
SYM_FUNC_ALIAS_WEAK(memset, __memset)
EXPORT_SYMBOL(memset)
/*
* ISO C memset - set a memory block to a byte value. This function uses
* enhanced rep stosb to override the fast string function.
......
......@@ -8,6 +8,7 @@ endmenu
config UML_X86
def_bool y
select ARCH_BINFMT_ELF_EXTRA_PHDRS if X86_32
config 64BIT
bool "64-bit kernel" if "$(SUBARCH)" = "x86"
......
......@@ -55,6 +55,7 @@
#include <linux/limits.h>
#include <linux/of_address.h>
#include <linux/slab.h>
#include <asm/apple_m1_pmu.h>
#include <asm/exception.h>
#include <asm/sysreg.h>
#include <asm/virt.h>
......@@ -109,16 +110,6 @@
* Note: sysreg-based IPIs are not supported yet.
*/
/* Core PMC control register */
#define SYS_IMP_APL_PMCR0_EL1 sys_reg(3, 1, 15, 0, 0)
#define PMCR0_IMODE GENMASK(10, 8)
#define PMCR0_IMODE_OFF 0
#define PMCR0_IMODE_PMI 1
#define PMCR0_IMODE_AIC 2
#define PMCR0_IMODE_HALT 3
#define PMCR0_IMODE_FIQ 4
#define PMCR0_IACT BIT(11)
/* IPI request registers */
#define SYS_IMP_APL_IPI_RR_LOCAL_EL1 sys_reg(3, 5, 15, 0, 0)
#define SYS_IMP_APL_IPI_RR_GLOBAL_EL1 sys_reg(3, 5, 15, 0, 1)
......@@ -155,7 +146,7 @@
#define SYS_IMP_APL_UPMSR_EL1 sys_reg(3, 7, 15, 6, 4)
#define UPMSR_IACT BIT(0)
#define AIC_NR_FIQ 4
#define AIC_NR_FIQ 6
#define AIC_NR_SWIPI 32
/*
......@@ -177,6 +168,9 @@ struct aic_irq_chip {
void __iomem *base;
struct irq_domain *hw_domain;
struct irq_domain *ipi_domain;
struct {
cpumask_t aff;
} *fiq_aff[AIC_NR_FIQ];
int nr_hw;
};
......@@ -412,16 +406,15 @@ static void __exception_irq_entry aic_handle_fiq(struct pt_regs *regs)
aic_irqc->nr_hw + AIC_TMR_EL02_VIRT);
}
if ((read_sysreg_s(SYS_IMP_APL_PMCR0_EL1) & (PMCR0_IMODE | PMCR0_IACT)) ==
(FIELD_PREP(PMCR0_IMODE, PMCR0_IMODE_FIQ) | PMCR0_IACT)) {
/*
* Not supported yet, let's figure out how to handle this when
* we implement these proprietary performance counters. For now,
* just mask it and move on.
*/
pr_err_ratelimited("PMC FIQ fired. Masking.\n");
sysreg_clear_set_s(SYS_IMP_APL_PMCR0_EL1, PMCR0_IMODE | PMCR0_IACT,
FIELD_PREP(PMCR0_IMODE, PMCR0_IMODE_OFF));
if (read_sysreg_s(SYS_IMP_APL_PMCR0_EL1) & PMCR0_IACT) {
int irq;
if (cpumask_test_cpu(smp_processor_id(),
&aic_irqc->fiq_aff[AIC_CPU_PMU_P]->aff))
irq = AIC_CPU_PMU_P;
else
irq = AIC_CPU_PMU_E;
generic_handle_domain_irq(aic_irqc->hw_domain,
aic_irqc->nr_hw + irq);
}
if (FIELD_GET(UPMCR0_IMODE, read_sysreg_s(SYS_IMP_APL_UPMCR0_EL1)) == UPMCR0_IMODE_FIQ &&
......@@ -461,7 +454,18 @@ static int aic_irq_domain_map(struct irq_domain *id, unsigned int irq,
handle_fasteoi_irq, NULL, NULL);
irqd_set_single_target(irq_desc_get_irq_data(irq_to_desc(irq)));
} else {
irq_set_percpu_devid(irq);
int fiq = hw - ic->nr_hw;
switch (fiq) {
case AIC_CPU_PMU_P:
case AIC_CPU_PMU_E:
irq_set_percpu_devid_partition(irq, &ic->fiq_aff[fiq]->aff);
break;
default:
irq_set_percpu_devid(irq);
break;
}
irq_domain_set_info(id, irq, hw, &fiq_chip, id->host_data,
handle_percpu_devid_irq, NULL, NULL);
}
......@@ -793,12 +797,50 @@ static struct gic_kvm_info vgic_info __initdata = {
.no_hw_deactivation = true,
};
static void build_fiq_affinity(struct aic_irq_chip *ic, struct device_node *aff)
{
int i, n;
u32 fiq;
if (of_property_read_u32(aff, "apple,fiq-index", &fiq) ||
WARN_ON(fiq >= AIC_NR_FIQ) || ic->fiq_aff[fiq])
return;
n = of_property_count_elems_of_size(aff, "cpus", sizeof(u32));
if (WARN_ON(n < 0))
return;
ic->fiq_aff[fiq] = kzalloc(sizeof(*ic->fiq_aff[fiq]), GFP_KERNEL);
if (!ic->fiq_aff[fiq])
return;
for (i = 0; i < n; i++) {
struct device_node *cpu_node;
u32 cpu_phandle;
int cpu;
if (of_property_read_u32_index(aff, "cpus", i, &cpu_phandle))
continue;
cpu_node = of_find_node_by_phandle(cpu_phandle);
if (WARN_ON(!cpu_node))
continue;
cpu = of_cpu_node_to_id(cpu_node);
if (WARN_ON(cpu < 0))
continue;
cpumask_set_cpu(cpu, &ic->fiq_aff[fiq]->aff);
}
}
static int __init aic_of_ic_init(struct device_node *node, struct device_node *parent)
{
int i;
void __iomem *regs;
u32 info;
struct aic_irq_chip *irqc;
struct device_node *affs;
regs = of_iomap(node, 0);
if (WARN_ON(!regs))
......@@ -832,6 +874,14 @@ static int __init aic_of_ic_init(struct device_node *node, struct device_node *p
return -ENODEV;
}
affs = of_get_child_by_name(node, "affinities");
if (affs) {
struct device_node *chld;
for_each_child_of_node(affs, chld)
build_fiq_affinity(irqc, chld);
}
set_handle_irq(aic_handle_irq);
set_handle_fiq(aic_handle_fiq);
......
......@@ -141,11 +141,25 @@ config ARM_DMC620_PMU
config MARVELL_CN10K_TAD_PMU
tristate "Marvell CN10K LLC-TAD PMU"
depends on ARM64 || (COMPILE_TEST && 64BIT)
depends on ARCH_THUNDER || (COMPILE_TEST && 64BIT)
help
Provides support for Last-Level cache Tag-and-data Units (LLC-TAD)
performance monitors on CN10K family silicons.
config APPLE_M1_CPU_PMU
bool "Apple M1 CPU PMU support"
depends on ARM_PMU && ARCH_APPLE
help
Provides support for the non-architectural CPU PMUs present on
the Apple M1 SoCs and derivatives.
source "drivers/perf/hisilicon/Kconfig"
config MARVELL_CN10K_DDR_PMU
tristate "Enable MARVELL CN10K DRAM Subsystem(DSS) PMU Support"
depends on ARM64 || (COMPILE_TEST && 64BIT)
help
Enable perf support for Marvell DDR Performance monitoring
event on CN10K platform.
endmenu
......@@ -15,3 +15,5 @@ obj-$(CONFIG_XGENE_PMU) += xgene_pmu.o
obj-$(CONFIG_ARM_SPE_PMU) += arm_spe_pmu.o
obj-$(CONFIG_ARM_DMC620_PMU) += arm_dmc620_pmu.o
obj-$(CONFIG_MARVELL_CN10K_TAD_PMU) += marvell_cn10k_tad_pmu.o
obj-$(CONFIG_MARVELL_CN10K_DDR_PMU) += marvell_cn10k_ddr_pmu.o
obj-$(CONFIG_APPLE_M1_CPU_PMU) += apple_m1_cpu_pmu.o
......@@ -1096,7 +1096,7 @@ static void cci_pmu_enable(struct pmu *pmu)
{
struct cci_pmu *cci_pmu = to_cci_pmu(pmu);
struct cci_pmu_hw_events *hw_events = &cci_pmu->hw_events;
int enabled = bitmap_weight(hw_events->used_mask, cci_pmu->num_cntrs);
bool enabled = !bitmap_empty(hw_events->used_mask, cci_pmu->num_cntrs);
unsigned long flags;
if (!enabled)
......
......@@ -1460,8 +1460,7 @@ static irqreturn_t arm_ccn_irq_handler(int irq, void *dev_id)
static int arm_ccn_probe(struct platform_device *pdev)
{
struct arm_ccn *ccn;
struct resource *res;
unsigned int irq;
int irq;
int err;
ccn = devm_kzalloc(&pdev->dev, sizeof(*ccn), GFP_KERNEL);
......@@ -1474,10 +1473,9 @@ static int arm_ccn_probe(struct platform_device *pdev)
if (IS_ERR(ccn->base))
return PTR_ERR(ccn->base);
res = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
if (!res)
return -EINVAL;
irq = res->start;
irq = platform_get_irq(pdev, 0);
if (irq < 0)
return irq;
/* Check if we can use the interrupt */
writel(CCN_MN_ERRINT_STATUS__PMU_EVENTS__DISABLE,
......
......@@ -71,9 +71,11 @@
#define CMN_DTM_WPn(n) (0x1A0 + (n) * 0x18)
#define CMN_DTM_WPn_CONFIG(n) (CMN_DTM_WPn(n) + 0x00)
#define CMN_DTM_WPn_CONFIG_WP_DEV_SEL2 GENMASK_ULL(18,17)
#define CMN_DTM_WPn_CONFIG_WP_COMBINE BIT(6)
#define CMN_DTM_WPn_CONFIG_WP_EXCLUSIVE BIT(5)
#define CMN_DTM_WPn_CONFIG_WP_GRP BIT(4)
#define CMN_DTM_WPn_CONFIG_WP_COMBINE BIT(9)
#define CMN_DTM_WPn_CONFIG_WP_EXCLUSIVE BIT(8)
#define CMN600_WPn_CONFIG_WP_COMBINE BIT(6)
#define CMN600_WPn_CONFIG_WP_EXCLUSIVE BIT(5)
#define CMN_DTM_WPn_CONFIG_WP_GRP GENMASK_ULL(5, 4)
#define CMN_DTM_WPn_CONFIG_WP_CHN_SEL GENMASK_ULL(3, 1)
#define CMN_DTM_WPn_CONFIG_WP_DEV_SEL BIT(0)
#define CMN_DTM_WPn_VAL(n) (CMN_DTM_WPn(n) + 0x08)
......@@ -155,6 +157,7 @@
#define CMN_CONFIG_WP_COMBINE GENMASK_ULL(27, 24)
#define CMN_CONFIG_WP_DEV_SEL GENMASK_ULL(50, 48)
#define CMN_CONFIG_WP_CHN_SEL GENMASK_ULL(55, 51)
/* Note that we don't yet support the tertiary match group on newer IPs */
#define CMN_CONFIG_WP_GRP BIT_ULL(56)
#define CMN_CONFIG_WP_EXCLUSIVE BIT_ULL(57)
#define CMN_CONFIG1_WP_VAL GENMASK_ULL(63, 0)
......@@ -353,7 +356,7 @@ static struct arm_cmn_node *arm_cmn_node(const struct arm_cmn *cmn,
return NULL;
}
struct dentry *arm_cmn_debugfs;
static struct dentry *arm_cmn_debugfs;
#ifdef CONFIG_DEBUG_FS
static const char *arm_cmn_device_type(u8 type)
......@@ -595,6 +598,9 @@ static umode_t arm_cmn_event_attr_is_visible(struct kobject *kobj,
if ((intf & 4) && !(cmn->ports_used & BIT(intf & 3)))
return 0;
if (chan == 4 && cmn->model == CMN600)
return 0;
if ((chan == 5 && cmn->rsp_vc_num < 2) ||
(chan == 6 && cmn->dat_vc_num < 2))
return 0;
......@@ -905,15 +911,18 @@ static u32 arm_cmn_wp_config(struct perf_event *event)
u32 grp = CMN_EVENT_WP_GRP(event);
u32 exc = CMN_EVENT_WP_EXCLUSIVE(event);
u32 combine = CMN_EVENT_WP_COMBINE(event);
bool is_cmn600 = to_cmn(event->pmu)->model == CMN600;
config = FIELD_PREP(CMN_DTM_WPn_CONFIG_WP_DEV_SEL, dev) |
FIELD_PREP(CMN_DTM_WPn_CONFIG_WP_CHN_SEL, chn) |
FIELD_PREP(CMN_DTM_WPn_CONFIG_WP_GRP, grp) |
FIELD_PREP(CMN_DTM_WPn_CONFIG_WP_EXCLUSIVE, exc) |
FIELD_PREP(CMN_DTM_WPn_CONFIG_WP_DEV_SEL2, dev >> 1);
if (exc)
config |= is_cmn600 ? CMN600_WPn_CONFIG_WP_EXCLUSIVE :
CMN_DTM_WPn_CONFIG_WP_EXCLUSIVE;
if (combine && !grp)
config |= CMN_DTM_WPn_CONFIG_WP_COMBINE;
config |= is_cmn600 ? CMN600_WPn_CONFIG_WP_COMBINE :
CMN_DTM_WPn_CONFIG_WP_COMBINE;
return config;
}
......
......@@ -109,6 +109,8 @@ static inline u64 arm_pmu_event_max_period(struct perf_event *event)
{
if (event->hw.flags & ARMPMU_EVT_64BIT)
return GENMASK_ULL(63, 0);
else if (event->hw.flags & ARMPMU_EVT_47BIT)
return GENMASK_ULL(46, 0);
else
return GENMASK_ULL(31, 0);
}
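The new 47-bit case caps the event period at 2^47 - 1 rather than 2^32 - 1 or 2^64 - 1. A quick user-space equivalent of the GENMASK_ULL() arithmetic (the macro below is a simplified stand-in for the kernel's version):

#include <stdint.h>
#include <stdio.h>

/* Simplified user-space stand-in for the kernel's GENMASK_ULL(h, l). */
#define GENMASK_ULL(h, l) \
	(((~0ull) >> (63 - (h))) & ((~0ull) << (l)))

int main(void)
{
	printf("63:0 -> %016llx\n", (unsigned long long)GENMASK_ULL(63, 0));
	printf("46:0 -> %016llx\n", (unsigned long long)GENMASK_ULL(46, 0));
	printf("31:0 -> %016llx\n", (unsigned long long)GENMASK_ULL(31, 0));
	return 0;
}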
......@@ -524,7 +526,7 @@ static void armpmu_enable(struct pmu *pmu)
{
struct arm_pmu *armpmu = to_arm_pmu(pmu);
struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
int enabled = bitmap_weight(hw_events->used_mask, armpmu->num_events);
bool enabled = !bitmap_empty(hw_events->used_mask, armpmu->num_events);
/* For task-bound events we may be called on other CPUs */
if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
......@@ -785,7 +787,7 @@ static int cpu_pm_pmu_notify(struct notifier_block *b, unsigned long cmd,
{
struct arm_pmu *armpmu = container_of(b, struct arm_pmu, cpu_pm_nb);
struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
int enabled = bitmap_weight(hw_events->used_mask, armpmu->num_events);
bool enabled = !bitmap_empty(hw_events->used_mask, armpmu->num_events);
if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus))
return NOTIFY_DONE;
......