1. 05 Oct, 2022 1 commit
    • arm64: alternatives: Use vdso/bits.h instead of linux/bits.h · d2995249
      Nathan Chancellor authored
      When building with CONFIG_LTO after commit ba00c2a0 ("arm64: fix the
      build with binutils 2.27"), the following build error occurs:
      
        In file included from arch/arm64/kernel/module-plts.c:6:
        In file included from include/linux/elf.h:6:
        In file included from arch/arm64/include/asm/elf.h:8:
        In file included from arch/arm64/include/asm/hwcap.h:9:
        In file included from arch/arm64/include/asm/cpufeature.h:9:
        In file included from arch/arm64/include/asm/alternative-macros.h:5:
        In file included from include/linux/bits.h:22:
        In file included from include/linux/build_bug.h:5:
        In file included from include/linux/compiler.h:248:
        In file included from arch/arm64/include/asm/rwonce.h:71:
        include/asm-generic/rwonce.h:67:9: error: expected string literal in 'asm'
                return __READ_ONCE(*(unsigned long *)addr);
                      ^
        arch/arm64/include/asm/rwonce.h:43:16: note: expanded from macro '__READ_ONCE'
                        asm volatile(__LOAD_RCPC(b, %w0, %1)                    \
                                    ^
        arch/arm64/include/asm/rwonce.h:17:2: note: expanded from macro '__LOAD_RCPC'
                ALTERNATIVE(                                                    \
                ^
      
      Similar to the issue resolved by commit 0072dc1b ("arm64: avoid
      BUILD_BUG_ON() in alternative-macros"), there is a circular include
      dependency through <linux/bits.h> when CONFIG_LTO is enabled due to
      <asm/rwonce.h> appearing in the include chain before the contents of
      <asm/alternative-macros.h>, which results in ALTERNATIVE() not getting
      expanded properly because it has not been defined yet.
      
      Avoid this issue by including <vdso/bits.h>, which includes the
      definition of the BIT() macro, instead of <linux/bits.h>, as BIT() is the
      only macro from bits.h that is relevant to this header.
      
      Fixes: ba00c2a0 ("arm64: fix the build with binutils 2.27")
      Link: https://github.com/ClangBuiltLinux/linux/issues/1728
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Tested-by: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20221003193759.1141709-1-nathan@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  2. 30 Sep, 2022 5 commits
    • Merge branch 'for-next/misc' into for-next/core · 53630a1f
      Catalin Marinas authored
      * for-next/misc:
        : Miscellaneous patches
        arm64/kprobe: Optimize the performance of patching single-step slot
        ARM64: reloc_test: add __init/__exit annotations to module init/exit funcs
        arm64/mm: fold check for KFENCE into can_set_direct_map()
        arm64: uaccess: simplify uaccess_mask_ptr()
        arm64: mte: move register initialization to C
        arm64: mm: handle ARM64_KERNEL_USES_PMD_MAPS in vmemmap_populate()
        arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()
        arm64: support huge vmalloc mappings
        arm64: spectre: increase parameters that can be used to turn off bhb mitigation individually
        arm64: run softirqs on the per-CPU IRQ stack
        arm64: compat: Implement misalignment fixups for multiword loads
    • Merge branch 'for-next/alternatives' into for-next/core · c704cf27
      Catalin Marinas authored
      * for-next/alternatives:
        : Alternatives (code patching) improvements
        arm64: fix the build with binutils 2.27
        arm64: avoid BUILD_BUG_ON() in alternative-macros
        arm64: alternatives: add shared NOP callback
        arm64: alternatives: add alternative_has_feature_*()
        arm64: alternatives: have callbacks take a cap
        arm64: alternatives: make alt_region const
        arm64: alternatives: hoist print out of __apply_alternatives()
        arm64: alternatives: proton-pack: prepare for cap changes
        arm64: alternatives: kvm: prepare for cap changes
        arm64: cpufeature: make cpus_have_cap() noinstr-safe
    • Merge branch 'for-next/kselftest' into for-next/core · c3976232
      Catalin Marinas authored
      * for-next/kselftest: (28 commits)
        : Kselftest updates for arm64
        kselftest/arm64: Handle EINTR while reading data from children
        kselftest/arm64: Flag fp-stress as exiting when we begin finishing up
        kselftest/arm64: Don't repeat termination handler for fp-stress
        kselftest/arm64: Don't enable v8.5 for MTE selftest builds
        kselftest/arm64: Fix typo in hwcap check
        kselftest/arm64: Add hwcap test for RNG
        kselftest/arm64: Add SVE 2 to the tested hwcaps
        kselftest/arm64: Add missing newline in hwcap output
        kselftest/arm64: Fix spelling misakes of signal names
        kselftest/arm64: Enforce actual ABI for SVE syscalls
        kselftest/arm64: Correct buffer allocation for SVE Z registers
        kselftest/arm64: Include larger SVE and SME VLs in signal tests
        kselftest/arm64: Allow larger buffers in get_signal_context()
        kselftest/arm64: Preserve any EXTRA_CONTEXT in handle_signal_copyctx()
        kselftest/arm64: Validate contents of EXTRA_CONTEXT blocks
        kselftest/arm64: Only validate each signal context once
        kselftest/arm64: Remove unneeded protype for validate_extra_context()
        kselftest/arm64: Fix validation of EXTRA_CONTEXT signal context location
        kselftest/arm64: Fix validatation termination record after EXTRA_CONTEXT
        kselftest/arm64: Validate signal ucontext in place
        ...
    • Merge branches 'for-next/doc', 'for-next/sve', 'for-next/sysreg',... · b23ec74c
      Catalin Marinas authored
      Merge branches 'for-next/doc', 'for-next/sve', 'for-next/sysreg', 'for-next/gettimeofday', 'for-next/stacktrace', 'for-next/atomics', 'for-next/el1-exceptions', 'for-next/a510-erratum-2658417', 'for-next/defconfig', 'for-next/tpidr2_el0' and 'for-next/ftrace', remote-tracking branch 'arm64/for-next/perf' into for-next/core
      
      * arm64/for-next/perf:
        arm64: asm/perf_regs.h: Avoid C++-style comment in UAPI header
        arm64/sve: Add Perf extensions documentation
        perf: arm64: Add SVE vector granule register to user regs
        MAINTAINERS: add maintainers for Alibaba' T-Head PMU driver
        drivers/perf: add DDR Sub-System Driveway PMU driver for Yitian 710 SoC
        docs: perf: Add description for Alibaba's T-Head PMU driver
      
      * for-next/doc:
        : Documentation/arm64 updates
        arm64/sve: Document our actual ABI for clearing registers on syscall
      
      * for-next/sve:
        : SVE updates
        arm64/sysreg: Add hwcap for SVE EBF16
      
      * for-next/sysreg: (35 commits)
        : arm64 system registers generation (more conversions)
        arm64/sysreg: Fix a few missed conversions
        arm64/sysreg: Convert ID_AA64AFRn_EL1 to automatic generation
        arm64/sysreg: Convert ID_AA64DFR1_EL1 to automatic generation
        arm64/sysreg: Convert ID_AA64FDR0_EL1 to automatic generation
        arm64/sysreg: Use feature numbering for PMU and SPE revisions
        arm64/sysreg: Add _EL1 into ID_AA64DFR0_EL1 definition names
        arm64/sysreg: Align field names in ID_AA64DFR0_EL1 with architecture
        arm64/sysreg: Add defintion for ALLINT
        arm64/sysreg: Convert SCXTNUM_EL1 to automatic generation
        arm64/sysreg: Convert TIPDR_EL1 to automatic generation
        arm64/sysreg: Convert ID_AA64PFR1_EL1 to automatic generation
        arm64/sysreg: Convert ID_AA64PFR0_EL1 to automatic generation
        arm64/sysreg: Convert ID_AA64MMFR2_EL1 to automatic generation
        arm64/sysreg: Convert ID_AA64MMFR1_EL1 to automatic generation
        arm64/sysreg: Convert ID_AA64MMFR0_EL1 to automatic generation
        arm64/sysreg: Convert HCRX_EL2 to automatic generation
        arm64/sysreg: Standardise naming of ID_AA64PFR1_EL1 SME enumeration
        arm64/sysreg: Standardise naming of ID_AA64PFR1_EL1 BTI enumeration
        arm64/sysreg: Standardise naming of ID_AA64PFR1_EL1 fractional version fields
        arm64/sysreg: Standardise naming for MTE feature enumeration
        ...
      
      * for-next/gettimeofday:
        : Use self-synchronising counter access in gettimeofday() (if FEAT_ECV)
        arm64: vdso: use SYS_CNTVCTSS_EL0 for gettimeofday
        arm64: alternative: patch alternatives in the vDSO
        arm64: module: move find_section to header
      
      * for-next/stacktrace:
        : arm64 stacktrace cleanups and improvements
        arm64: stacktrace: track hyp stacks in unwinder's address space
        arm64: stacktrace: track all stack boundaries explicitly
        arm64: stacktrace: remove stack type from fp translator
        arm64: stacktrace: rework stack boundary discovery
        arm64: stacktrace: add stackinfo_on_stack() helper
        arm64: stacktrace: move SDEI stack helpers to stacktrace code
        arm64: stacktrace: rename unwind_next_common() -> unwind_next_frame_record()
        arm64: stacktrace: simplify unwind_next_common()
        arm64: stacktrace: fix kerneldoc comments
      
      * for-next/atomics:
        : arm64 atomics improvements
        arm64: atomic: always inline the assembly
        arm64: atomics: remove LL/SC trampolines
      
      * for-next/el1-exceptions:
        : Improve the reporting of EL1 exceptions
        arm64: rework BTI exception handling
        arm64: rework FPAC exception handling
        arm64: consistently pass ESR_ELx to die()
        arm64: die(): pass 'err' as long
        arm64: report EL1 UNDEFs better
      
      * for-next/a510-erratum-2658417:
        : Cortex-A510: 2658417: remove BF16 support due to incorrect result
        arm64: errata: remove BF16 HWCAP due to incorrect result on Cortex-A510
        arm64: cpufeature: Expose get_arm64_ftr_reg() outside cpufeature.c
        arm64: cpufeature: Force HWCAP to be based on the sysreg visible to user-space
      
      * for-next/defconfig:
        : arm64 defconfig updates
        arm64: defconfig: Add Coresight as module
        arm64: Enable docker support in defconfig
        arm64: defconfig: Enable memory hotplug and hotremove config
        arm64: configs: Enable all PMUs provided by Arm
      
      * for-next/tpidr2_el0:
        : arm64 ptrace() support for TPIDR2_EL0
        kselftest/arm64: Add coverage of TPIDR2_EL0 ptrace interface
        arm64/ptrace: Support access to TPIDR2_EL0
        arm64/ptrace: Document extension of NT_ARM_TLS to cover TPIDR2_EL0
        kselftest/arm64: Add test coverage for NT_ARM_TLS
      
      * for-next/ftrace:
        : arm64 ftraces updates/fixes
        arm64: ftrace: fix module PLTs with mcount
        arm64: module: Remove unused plt_entry_is_initialized()
        arm64: module: Make plt_equals_entry() static
    • arm64/kprobe: Optimize the performance of patching single-step slot · a0caebbd
      Liao Chang authored
      The single-step slot is not used until the kprobe is enabled, so no
      race condition can occur on it under SMP, and it is safe to patch the
      ss slot without stopping the machine.
      
      Since aarch64_insn_patch_text_nosync() already makes the I and D caches
      coherent for the single-step slot, there is no need to do so again via
      flush_icache_range().
      Acked-by: Will Deacon <will@kernel.org>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Liao Chang <liaochang1@huawei.com>
      Link: https://lore.kernel.org/r/20220927022435.129965-4-liaochang1@huawei.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  3. 29 Sep, 2022 11 commits
  4. 23 Sep, 2022 1 commit
    • arm64: uaccess: simplify uaccess_mask_ptr() · 2305b809
      Mark Rutland authored
      We introduced uaccess pointer masking for arm64 in commit:
      
        4d8efc2d ("arm64: Use pointer masking to limit uaccess speculation")
      
      Which was intended to prevent speculative uaccesses to kernel memory on
      CPUs where access permissions were not respected under speculation.
      
      At the time, the uaccess primitives were occasionally used to access
      kernel memory, with the maximum permitted address held in
      thread_info::addr_limit. Consequently, the address masking needed to
      take this dynamic limit into account.
      
      Subsequently the uaccess primitives were reworked such that they are
      only used for user memory, and as of commit:
      
        3d2403fd ("arm64: uaccess: remove set_fs()")
      
      ... the address limit was made a compile-time constant, but the logic
      was otherwise unchanged.
      
      Regardless of the configured VA size or whether TBI is in use, the
      address space can be divided into three ranges:
      
      * The TTBR0 VA range, for which any valid pointer has bit 55 *clear*,
        and any non-tag bits [63-56] must match bit 55 (i.e. must be clear).
      
      * The TTBR1 VA range, for which any valid pointer has bit 55 *set*, and
        any non-tag bits [63-56] must match bit 55 (i.e. must be set).
      
      * The gap between the TTBR0 and TTBR1 ranges, where bit 55 may be set or
        clear, but any access will result in a fault.
      
      As the uaccess primitives are now only used for user memory in the TTBR0
      VA range, we can prevent generation of TTBR1 addresses by clearing bit
      55, which will either result in a TTBR0 address or a faulting address
      between the TTBR VA ranges.
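
The bit-55 masking described above can be modelled in portable C. This is only a sketch: `mask_user_ptr()` and `VA_SELECT_BIT` are hypothetical names chosen for illustration, and the code operates on plain 64-bit integers rather than on `__user` pointers or inline assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 55 distinguishes the TTBR0 range (clear) from the TTBR1 range (set). */
#define VA_SELECT_BIT 55

/*
 * Hypothetical stand-in for the masking step: unconditionally clear bit 55.
 * A TTBR0 address is returned unchanged; a TTBR1 address is turned into an
 * address that lies in the faulting gap between the two ranges.
 */
static inline uint64_t mask_user_ptr(uint64_t addr)
{
    return addr & ~(UINT64_C(1) << VA_SELECT_BIT);
}
```

Because the mask is a compile-time constant and the operation is a single AND, no comparison against a limit is involved, which is consistent with the code-generation benefits listed in the commit message.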
      
      This is beneficial for code generation as:
      
      * We no longer clobber the condition codes.
      
      * We no longer burn a register on (TASK_SIZE_MAX - 1).
      
      * We no longer need to consume the untagged pointer.
      
      When building a defconfig v6.0-rc3 with GCC 12.1.0, this change makes
      the resulting Image 64KiB smaller.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Reviewed-by: Robin Murphy <robin.murphy@arm.com>
      Link: https://lore.kernel.org/r/20220922151053.3520750-1-mark.rutland@arm.com
      [catalin.marinas@arm.com: remove csdb() as the bit clearing is unconditional]
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  5. 22 Sep, 2022 10 commits
  6. 21 Sep, 2022 9 commits
  7. 16 Sep, 2022 3 commits
    • arm64: alternatives: add shared NOP callback · d926079f
      Mark Rutland authored
      For each instance of an alternative, the compiler outputs a distinct
      copy of the alternative instructions into a subsection. As the compiler
      doesn't have special knowledge of alternatives, it cannot coalesce these
      to save space.
      
      In a defconfig kernel built with GCC 12.1.0, there are approximately
      10,000 instances of alternative_has_feature_likely(), where the
      replacement instruction is always a NOP. As NOPs are
      position-independent, we don't need a unique copy per alternative
      sequence.
      
      This patch adds a callback to patch an alternative sequence with NOPs,
      and makes use of this in alternative_has_feature_likely(). So that this
      can be used for other sites in future, this is written to patch multiple
      instructions up to the original sequence length.
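
A minimal sketch of the idea of a shared NOP-filling routine, assuming a plain array of 32-bit instruction words. The name `fill_with_nops()` is hypothetical, and the sketch ignores the endianness conversion and the callback signature that real patching code would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A64 encoding of the NOP instruction. */
#define AARCH64_NOP UINT32_C(0xd503201f)

/*
 * Hypothetical helper: overwrite nr_insns instruction slots with NOPs.
 * NOPs are position-independent, so one routine can serve every patch
 * site instead of each site carrying its own replacement copy.
 */
static void fill_with_nops(uint32_t *dst, size_t nr_insns)
{
    for (size_t i = 0; i < nr_insns; i++)
        dst[i] = AARCH64_NOP;
}
```

Taking the sequence length as a parameter mirrors the point made above: the same routine covers single-instruction sites and longer sequences.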
      
      For NVHE, an alias is added to image-vars.h.
      
      For modules, the callback is exported. Note that as modules are loaded
      within 2GiB of the kernel, an alt_instr entry in a module can always
      refer directly to the callback, and no special handling is necessary.
      
      When building with GCC 12.1.0, the vmlinux is ~158KiB smaller, though
      the resulting Image size is unchanged due to alignment constraints and
      padding:
      
      | % ls -al vmlinux-*
      | -rwxr-xr-x 1 mark mark 134644592 Sep  1 14:52 vmlinux-after
      | -rwxr-xr-x 1 mark mark 134486232 Sep  1 14:50 vmlinux-before
      | % ls -al Image-*
      | -rw-r--r-- 1 mark mark 37108224 Sep  1 14:52 Image-after
      | -rw-r--r-- 1 mark mark 37108224 Sep  1 14:50 Image-before
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Joey Gouly <joey.gouly@arm.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Link: https://lore.kernel.org/r/20220912162210.3626215-9-mark.rutland@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: alternatives: add alternative_has_feature_*() · 21fb26bf
      Mark Rutland authored
      Currently we use a mixture of alternative sequences and static branches
      to handle features detected at boot time. For ease of maintenance we
      generally prefer to use static branches in C code, but this has a few
      downsides:
      
      * Each static branch has metadata in the __jump_table section, which is
        not discarded after features are finalized. This wastes some space,
        and slows down the patching of other static branches.
      
      * The static branches are patched at a different point in time from the
        alternatives, so changes are not atomic. This leaves a transient
        period where there could be a mismatch between the behaviour of
        alternatives and static branches, which could be problematic for some
        features (e.g. pseudo-NMI).
      
      * More (instrumentable) kernel code is executed to patch each static
        branch, which can be risky when patching certain features (e.g.
        irqflags management for pseudo-NMI).
      
      * When CONFIG_JUMP_LABEL=n, static branches are turned into a load of a
        flag and a conditional branch. This means it isn't safe to use such
        static branches in an alternative address space (e.g. the NVHE/PKVM
        hyp code), where the generated address isn't safe to access.
      
      To deal with these issues, this patch introduces new
      alternative_has_feature_*() helpers, which work like static branches but
      are patched using alternatives. This ensures the patching is performed
      at the same time as other alternative patching, allows the metadata to
      be freed after patching, and is safe for use in alternative address
      spaces.
      
      Note that all supported toolchains have asm goto support, and since
      commit:
      
        a0a12c3e ("asm goto: eradicate CC_HAS_ASM_GOTO")
      
      ... the CC_HAS_ASM_GOTO Kconfig symbol has been removed, so no feature
      check is necessary, and we can always make use of asm goto.
      
      Additionally, note that:
      
      * This has no impact on cpus_have_cap(), which is a dynamic check.
      
      * This has no functional impact on cpus_have_const_cap(). The branches
        are patched slightly later than before this patch, but these branches
        are not reachable until caps have been finalised.
      
      * It is now invalid to use cpus_have_final_cap() in the window between
        feature detection and patching. All existing uses are only expected
        after patching anyway, so this should not be a problem.
      
      * The LSE atomics will now be enabled during alternatives patching
        rather than immediately before. As the LL/SC and LSE atomics are
        functionally equivalent this should not be problematic.
      
      When building defconfig with GCC 12.1.0, the resulting Image is 64KiB
      smaller:
      
      | % ls -al Image-*
      | -rw-r--r-- 1 mark mark 37108224 Aug 23 09:56 Image-after
      | -rw-r--r-- 1 mark mark 37173760 Aug 23 09:54 Image-before
      
      According to bloat-o-meter.pl:
      
      | add/remove: 44/34 grow/shrink: 602/1294 up/down: 39692/-61108 (-21416)
      | Function                                     old     new   delta
      | [...]
      | Total: Before=16618336, After=16596920, chg -0.13%
      | add/remove: 0/2 grow/shrink: 0/0 up/down: 0/-1296 (-1296)
      | Data                                         old     new   delta
      | arm64_const_caps_ready                        16       -     -16
      | cpu_hwcap_keys                              1280       -   -1280
      | Total: Before=8987120, After=8985824, chg -0.01%
      | add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
      | RO Data                                      old     new   delta
      | Total: Before=18408, After=18408, chg +0.00%
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Joey Gouly <joey.gouly@arm.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Link: https://lore.kernel.org/r/20220912162210.3626215-8-mark.rutland@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: alternatives: have callbacks take a cap · 4c0bd995
      Mark Rutland authored
      Today, callback alternatives are special-cased within
      __apply_alternatives(), and are applied alongside patching for system
      capabilities as ARM64_NCAPS is not part of the boot_capabilities feature
      mask.
      
      This special-casing is less than ideal. Giving special meaning to
      ARM64_NCAPS for this requires some structures and loops to use
      ARM64_NCAPS + 1 (AKA ARM64_NPATCHABLE), while others use ARM64_NCAPS.
      It's also not immediately clear callback alternatives are only applied
      when applying alternatives for system-wide features.
      
      To make this a bit clearer, this patch changes the way that callback
      alternatives are identified to remove the special-casing of ARM64_NCAPS,
      and to allow callback alternatives to be associated with a cpucap as
      with all other alternatives.
      
      New cpucaps, ARM64_ALWAYS_BOOT and ARM64_ALWAYS_SYSTEM, are added, which
      are always detected alongside boot cpu capabilities and system
      capabilities respectively. All existing callback alternatives are made
      to use ARM64_ALWAYS_SYSTEM, and so will be patched at the same point
      during the boot flow as before.
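
One way to picture a callback alternative that also carries a cpucap is a small field holding a cap number plus a flag bit. The sketch below is illustrative only: the flag position, the cap number given to `CAP_ALWAYS_SYSTEM`, and every function name are assumptions, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the top bit of a 16-bit field marks a callback alternative. */
#define ALT_CB_BIT ((uint16_t)(1u << 15))

/* Hypothetical cap number standing in for ARM64_ALWAYS_SYSTEM. */
#define CAP_ALWAYS_SYSTEM 1

/* Pack a cap number and the callback flag into one field. */
static inline uint16_t alt_encode(uint16_t cap, bool is_callback)
{
    return (uint16_t)(cap | (is_callback ? ALT_CB_BIT : 0));
}

/* Recover the cap number by masking off the callback flag. */
static inline uint16_t alt_cap(uint16_t field)
{
    return (uint16_t)(field & (uint16_t)~ALT_CB_BIT);
}

static inline bool alt_is_callback(uint16_t field)
{
    return (field & ALT_CB_BIT) != 0;
}

/*
 * An entry is patched only if its cap is set in the detected-caps bitmap.
 * This toy bitmap holds 64 caps, so cap numbers must be below 64.
 */
static inline bool alt_should_patch(uint16_t field, uint64_t detected)
{
    return ((detected >> alt_cap(field)) & 1) != 0;
}
```

With this shape, callback and non-callback entries go through the same cap check, and an "always" cap simply appears as a bit that is set whenever the corresponding class of capabilities is detected.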
      
      Subsequent patches will make more use of these new cpucaps.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Joey Gouly <joey.gouly@arm.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Link: https://lore.kernel.org/r/20220912162210.3626215-7-mark.rutland@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>