1. 16 Jul, 2020 5 commits
    • Will Deacon's avatar
      arm64: ptrace: Add a comment describing our syscall entry/exit trap ABI · 59ee987e
      Will Deacon authored
      Our tracehook logic for syscall entry/exit raises a SIGTRAP back to the
      tracer following a ptrace request such as PTRACE_SYSCALL. As part of this
      procedure, we clobber the reported value of one of the tracee's general
      purpose registers (x7 for native tasks, r12 for compat) to indicate
      whether the stop occurred on syscall entry or exit. This is a slightly
      unfortunate ABI, as it prevents the tracer from accessing the real
      register value and is at odds with other similar stops such as seccomp
      traps.
      
      Since we're stuck with this ABI, expand the comment in our tracehook
      logic to acknowledge the issue and describe the behaviour in more detail.
      
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Luis Machado <luis.machado@linaro.org>
      Reported-by: default avatarKeno Fischer <keno@juliacomputing.com>
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      59ee987e
    • Will Deacon's avatar
      arm64: compat: Ensure upper 32 bits of x0 are zero on syscall return · 15956689
      Will Deacon authored
      Although we zero the upper bits of x0 on entry to the kernel from an
      AArch32 task, we do not clear them on the exception return path and can
      therefore expose 64-bit sign extended syscall return values to userspace
      via interfaces such as the 'perf_regs' ABI, which deal exclusively with
      64-bit registers.
      
      Explicitly clear the upper 32 bits of x0 on return from a compat system
      call.
      
      Cc: <stable@vger.kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Keno Fischer <keno@juliacomputing.com>
      Cc: Luis Machado <luis.machado@linaro.org>
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      15956689
    • Will Deacon's avatar
      arm64: ptrace: Override SPSR.SS when single-stepping is enabled · 3a5a4366
      Will Deacon authored
      Luis reports that, when reverse debugging with GDB, single-step does not
      function as expected on arm64:
      
        | I've noticed, under very specific conditions, that a PTRACE_SINGLESTEP
        | request by GDB won't execute the underlying instruction. As a consequence,
        | the PC doesn't move, but we return a SIGTRAP just like we would for a
        | regular successful PTRACE_SINGLESTEP request.
      
      The underlying problem is that when the CPU register state is restored
      as part of a reverse step, the SPSR.SS bit is cleared and so the hardware
      single-step state can transition to the "active-pending" state, causing
      an unexpected step exception to be taken immediately if a step operation
      is attempted.
      
      In hindsight, we probably shouldn't have exposed SPSR.SS in the pstate
      accessible by the GPR regset, but it's a bit late for that now. Instead,
      simply prevent userspace from configuring the bit to a value which is
      inconsistent with the TIF_SINGLESTEP state for the task being traced.
      
      Cc: <stable@vger.kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Keno Fischer <keno@juliacomputing.com>
      Link: https://lore.kernel.org/r/1eed6d69-d53d-9657-1fc9-c089be07f98c@linaro.orgReported-by: default avatarLuis Machado <luis.machado@linaro.org>
      Tested-by: default avatarLuis Machado <luis.machado@linaro.org>
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      3a5a4366
    • Will Deacon's avatar
      arm64: ptrace: Consistently use pseudo-singlestep exceptions · ac2081cd
      Will Deacon authored
      Although the arm64 single-step state machine can be fast-forwarded in
      cases where we wish to generate a SIGTRAP without actually executing an
      instruction, this has two major limitations outside of simply skipping
      an instruction due to emulation.
      
      1. Stepping out of a ptrace signal stop into a signal handler where
         SIGTRAP is blocked. Fast-forwarding the stepping state machine in
         this case will result in a forced SIGTRAP, with the handler reset to
         SIG_DFL.
      
      2. The hardware implicitly fast-forwards the state machine when executing
         an SVC instruction for issuing a system call. This can interact badly
         with subsequent ptrace stops signalled during the execution of the
         system call (e.g. SYSCALL_EXIT or seccomp traps), as they may corrupt
         the stepping state by updating the PSTATE for the tracee.
      
      Resolve both of these issues by injecting a pseudo-singlestep exception
      on entry to a signal handler and also on return to userspace following a
      system call.
      
      Cc: <stable@vger.kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Tested-by: default avatarLuis Machado <luis.machado@linaro.org>
      Reported-by: default avatarKeno Fischer <keno@juliacomputing.com>
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      ac2081cd
    • Qi Liu's avatar
      drivers/perf: Fix kernel panic when rmmod PMU modules during perf sampling · bdc5c744
      Qi Liu authored
      When users try to remove PMU modules during perf sampling, kernel panic
      will happen because the pmu->read() is a NULL pointer here.
      
      INFO on HiSilicon hip08 platform as follow:
      pc : hisi_uncore_pmu_event_update+0x30/0xa4 [hisi_uncore_pmu]
      lr : hisi_uncore_pmu_read+0x20/0x2c [hisi_uncore_pmu]
      sp : ffff800010103e90
      x29: ffff800010103e90 x28: ffff0027db0c0e40
      x27: ffffa29a76f129d8 x26: ffffa29a77ceb000
      x25: ffffa29a773a5000 x24: ffffa29a77392000
      x23: ffffddffe5943f08 x22: ffff002784285960
      x21: ffff002784285800 x20: ffff0027d2e76c80
      x19: ffff0027842859e0 x18: ffff80003498bcc8
      x17: ffffa29a76afe910 x16: ffffa29a7583f530
      x15: 16151a1512061a1e x14: 0000000000000000
      x13: ffffa29a76f1e238 x12: 0000000000000001
      x11: 0000000000000400 x10: 00000000000009f0
      x9 : ffff8000107b3e70 x8 : ffff0027db0c1890
      x7 : ffffa29a773a7000 x6 : 00000007f5131013
      x5 : 00000007f5131013 x4 : 09f257d417c00000
      x3 : 00000002187bd7ce x2 : ffffa29a38f0f0d8
      x1 : ffffa29a38eae268 x0 : ffff0027d2e76c80
      Call trace:
      hisi_uncore_pmu_event_update+0x30/0xa4 [hisi_uncore_pmu]
      hisi_uncore_pmu_read+0x20/0x2c [hisi_uncore_pmu]
      __perf_event_read+0x1a0/0x1f8
      flush_smp_call_function_queue+0xa0/0x160
      generic_smp_call_function_single_interrupt+0x18/0x20
      handle_IPI+0x31c/0x4dc
      gic_handle_irq+0x2c8/0x310
      el1_irq+0xcc/0x180
      arch_cpu_idle+0x4c/0x20c
      default_idle_call+0x20/0x30
      do_idle+0x1b4/0x270
      cpu_startup_entry+0x28/0x30
      secondary_start_kernel+0x1a4/0x1fc
      
      To solve the above issue, current module should be registered to kernel,
      so that try_module_get() can be invoked when perf sampling starts. This
      adds the reference counting of module and could prevent users from removing
      modules during sampling.
      Reported-by: default avatarHaifeng Wang <wang.wanghaifeng@huawei.com>
      Signed-off-by: default avatarQi Liu <liuqi115@huawei.com>
      Reviewed-by: default avatarJohn Garry <john.garry@huawei.com>
      Link: https://lore.kernel.org/r/1594891165-8228-1-git-send-email-liuqi115@huawei.comSigned-off-by: default avatarWill Deacon <will@kernel.org>
      bdc5c744
  2. 13 Jul, 2020 1 commit
    • Will Deacon's avatar
      efi/libstub/arm64: Retain 2MB kernel Image alignment if !KASLR · 7c116db2
      Will Deacon authored
      Since commit 82046702 ("efi/libstub/arm64: Replace 'preferred' offset
      with alignment check"), loading a relocatable arm64 kernel at a physical
      address which is not 2MB aligned and subsequently booting with EFI will
      leave the Image in-place, relying on the kernel to relocate itself early
      during boot. In conjunction with commit dd4bc607 ("arm64: warn on
      incorrect placement of the kernel by the bootloader"), which enables
      CONFIG_RELOCATABLE by default, this effectively means that entering an
      arm64 kernel loaded at an alignment smaller than 2MB with EFI (e.g. using
      QEMU) will result in silent relocation at runtime.
      
      Unfortunately, this has a subtle but confusing affect for developers
      trying to inspect the PC value during a crash and comparing it to the
      symbol addresses in vmlinux using tools such as 'nm' or 'addr2line';
      all text addresses will be displaced by a sub-2MB offset, resulting in
      the wrong symbol being identified in many cases. Passing "nokaslr" on
      the command line or disabling "CONFIG_RANDOMIZE_BASE" does not help,
      since the EFI stub only copies the kernel Image to a 2MB boundary if it
      is not relocatable.
      
      Adjust the EFI stub for arm64 so that the minimum Image alignment is 2MB
      unless KASLR is in use.
      
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: David Brazdil <dbrazdil@google.com>
      Acked-by: default avatarArd Biesheuvel <ardb@kernel.org>
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      7c116db2
  3. 09 Jul, 2020 2 commits
  4. 08 Jul, 2020 11 commits
  5. 03 Jul, 2020 3 commits
  6. 02 Jul, 2020 1 commit
    • Ard Biesheuvel's avatar
      arm64/alternatives: use subsections for replacement sequences · f7b93d42
      Ard Biesheuvel authored
      When building very large kernels, the logic that emits replacement
      sequences for alternatives fails when relative branches are present
      in the code that is emitted into the .altinstr_replacement section
      and patched in at the original site and fixed up. The reason is that
      the linker will insert veneers if relative branches go out of range,
      and due to the relative distance of the .altinstr_replacement from
      the .text section where its branch targets usually live, veneers
      may be emitted at the end of the .altinstr_replacement section, with
      the relative branches in the sequence pointed at the veneers instead
      of the actual target.
      
      The alternatives patching logic will attempt to fix up the branch to
      point to its original target, which will be the veneer in this case,
      but given that the patch site is likely to be far away as well, it
      will be out of range and so patching will fail. There are other cases
      where these veneers are problematic, e.g., when the target of the
      branch is in .text while the patch site is in .init.text, in which
      case putting the replacement sequence inside .text may not help either.
      
      So let's use subsections to emit the replacement code as closely as
      possible to the patch site, to ensure that veneers are only likely to
      be emitted if they are required at the patch site as well, in which
      case they will be in range for the replacement sequence both before
      and after it is transported to the patch site.
      
      This will prevent alternative sequences in non-init code from being
      released from memory after boot, but this is tolerable given that the
      entire section is only 512 KB on an allyesconfig build (which weighs in
      at 500+ MB for the entire Image). Also, note that modules today carry
      the replacement sequences in non-init sections as well, and any of
      those that target init code will be emitted into init sections after
      this change.
      
      This fixes an early crash when booting an allyesconfig kernel on a
      system where any of the alternatives sequences containing relative
      branches are activated at boot (e.g., ARM64_HAS_PAN on TX2)
      Signed-off-by: default avatarArd Biesheuvel <ardb@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Andre Przywara <andre.przywara@arm.com>
      Cc: Dave P Martin <dave.martin@arm.com>
      Link: https://lore.kernel.org/r/20200630081921.13443-1-ardb@kernel.orgSigned-off-by: default avatarWill Deacon <will@kernel.org>
      f7b93d42
  7. 25 Jun, 2020 2 commits
  8. 24 Jun, 2020 5 commits
  9. 23 Jun, 2020 5 commits
    • Mark Brown's avatar
      arm64: Depend on newer binutils when building PAC · 4dc9b282
      Mark Brown authored
      Versions of binutils prior to 2.33.1 don't understand the ELF notes that
      are added by modern compilers to indicate the PAC and BTI options used
      to build the code. This causes them to emit large numbers of warnings in
      the form:
      
      aarch64-linux-gnu-nm: warning: .tmp_vmlinux.kallsyms2: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0000000
      
      during the kernel build which is currently causing quite a bit of
      disruption for automated build testing using clang.
      
      In commit 15cd0e67 (arm64: Kconfig: ptrauth: Add binutils version
      check to fix mismatch) we added a dependency on binutils to avoid this
      issue when building with versions of GCC that emit the notes but did not
      do so for clang as it was believed that the existing check for
      .cfi_negate_ra_state was already requiring a new enough binutils. This
      does not appear to be the case for some versions of binutils (eg, the
      binutils in Debian 10) so instead refactor so we require a new enough
      GNU binutils in all cases other than when we are using an old GCC
      version that does not emit notes.
      
      Other, more exotic, combinations of tools are possible such as using
      clang, lld and gas together are possible and may have further problems
      but rather than adding further version checks it looks like the most
      robust thing will be to just test that we can build cleanly with the
      configured tools but that will require more review and discussion so do
      this for now to address the immediate problem disrupting build testing.
      Reported-by: default avatarKernelCI <bot@kernelci.org>
      Reported-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Reviewed-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Link: https://github.com/ClangBuiltLinux/linux/issues/1054
      Link: https://lore.kernel.org/r/20200619123550.48098-1-broonie@kernel.orgSigned-off-by: default avatarWill Deacon <will@kernel.org>
      4dc9b282
    • Will Deacon's avatar
      arm64: compat: Remove 32-bit sigreturn code from the vDSO · 2d071968
      Will Deacon authored
      The sigreturn code in the compat vDSO is unused. Remove it.
      Reviewed-by: default avatarVincenzo Frascino <vincenzo.frascino@arm.com>
      Reviewed-by: default avatarArd Biesheuvel <ardb@kernel.org>
      Reviewed-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      2d071968
    • Will Deacon's avatar
      arm64: compat: Always use sigpage for sigreturn trampoline · 8e411be6
      Will Deacon authored
      The 32-bit sigreturn trampoline in the compat sigpage matches the binary
      representation of the arch/arm/ sigpage exactly. This is important for
      debuggers (e.g. GDB) and unwinders (e.g. libunwind) since they rely
      on matching the instruction sequence in order to identify that they are
      unwinding through a signal. The same cannot be said for the sigreturn
      trampoline in the compat vDSO, which defeats the unwinder heuristics and
      instead attempts to use unwind directives for the unwinding. This is in
      contrast to arch/arm/, which never uses the vDSO for sigreturn.
      
      Ensure compatibility with arch/arm/ and existing unwinders by always
      using the sigpage for the sigreturn trampoline, regardless of the
      presence of the compat vDSO.
      Reviewed-by: default avatarVincenzo Frascino <vincenzo.frascino@arm.com>
      Reviewed-by: default avatarArd Biesheuvel <ardb@kernel.org>
      Reviewed-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      8e411be6
    • Will Deacon's avatar
      arm64: compat: Allow 32-bit vdso and sigpage to co-exist · a39060b0
      Will Deacon authored
      In preparation for removing the signal trampoline from the compat vDSO,
      allow the sigpage and the compat vDSO to co-exist.
      
      For the moment the vDSO signal trampoline will still be used when built.
      Subsequent patches will move to the sigpage consistently.
      Acked-by: default avatarDave Martin <Dave.Martin@arm.com>
      Reviewed-by: default avatarVincenzo Frascino <vincenzo.frascino@arm.com>
      Reviewed-by: default avatarArd Biesheuvel <ardb@kernel.org>
      Reviewed-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      a39060b0
    • Will Deacon's avatar
      arm64: vdso: Disable dwarf unwinding through the sigreturn trampoline · 87676cfc
      Will Deacon authored
      Commit 7e9f5e66 ("arm64: vdso: Add --eh-frame-hdr to ldflags") results
      in a .eh_frame_hdr section for the vDSO, which in turn causes the libgcc
      unwinder to unwind out of signal handlers using the .eh_frame information
      populated by our .cfi directives. In conjunction with a4eb355a
      ("arm64: vdso: Fix CFI directives in sigreturn trampoline"), this has
      been shown to cause segmentation faults originating from within the
      unwinder during thread cancellation:
      
       | Thread 14 "virtio-net-rx" received signal SIGSEGV, Segmentation fault.
       | 0x0000000000435e24 in uw_frame_state_for ()
       | (gdb) bt
       | #0  0x0000000000435e24 in uw_frame_state_for ()
       | #1  0x0000000000436e88 in _Unwind_ForcedUnwind_Phase2 ()
       | #2  0x00000000004374d8 in _Unwind_ForcedUnwind ()
       | #3  0x0000000000428400 in __pthread_unwind (buf=<optimized out>) at unwind.c:121
       | #4  0x0000000000429808 in __do_cancel () at ./pthreadP.h:304
       | #5  sigcancel_handler (sig=32, si=0xffff33c743f0, ctx=<optimized out>) at nptl-init.c:200
       | #6  sigcancel_handler (sig=<optimized out>, si=0xffff33c743f0, ctx=<optimized out>) at nptl-init.c:165
       | #7  <signal handler called>
       | #8  futex_wait_cancelable (private=0, expected=0, futex_word=0x3890b708) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
      
      After considerable bashing of heads, it appears that our CFI directives
      for unwinding out of the sigreturn trampoline are only processed by libgcc
      when both a .eh_frame_hdr section is present *and* the mysterious NOP is
      covered by an entry in .eh_frame. With both of these now in place, it has
      highlighted that our CFI directives are not comprehensive enough to
      restore the stack pointer of the interrupted context. This results in libgcc
      falling back to an arm64-specific unwinder after computing a bogus PC value
      from the unwind tables. The unwinder promptly dereferences this bogus address
      in an attempt to see if the pointed-to instruction sequence looks like
      the sigreturn trampoline.
      
      Restore the old unwind behaviour, which relied solely on heuristics in
      the unwinder, by removing the .eh_frame_hdr section from the vDSO and
      commenting out the insufficient CFI directives for now. Add comments to
      explain the current, miserable state of affairs.
      
      Cc: Tamas Zsoldos <tamas.zsoldos@arm.com>
      Cc: Szabolcs Nagy <szabolcs.nagy@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Daniel Kiss <daniel.kiss@arm.com>
      Acked-by: default avatarDave Martin <Dave.Martin@arm.com>
      Reviewed-by: default avatarVincenzo Frascino <vincenzo.frascino@arm.com>
      Reviewed-by: default avatarArd Biesheuvel <ardb@kernel.org>
      Reported-by: default avatarArd Biesheuvel <ardb@kernel.org>
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      87676cfc
  10. 18 Jun, 2020 3 commits
  11. 17 Jun, 2020 1 commit
    • Will Deacon's avatar
      arm64: bti: Require clang >= 10.0.1 for in-kernel BTI support · b9249cba
      Will Deacon authored
      Unfortunately, most versions of clang that support BTI are capable of
      miscompiling the kernel when converting a switch statement into a jump
      table. As an example, attempting to spawn a KVM guest results in a panic:
      
      [   56.253312] Kernel panic - not syncing: bad mode
      [   56.253834] CPU: 0 PID: 279 Comm: lkvm Not tainted 5.8.0-rc1 #2
      [   56.254225] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
      [   56.254712] Call trace:
      [   56.254952]  dump_backtrace+0x0/0x1d4
      [   56.255305]  show_stack+0x1c/0x28
      [   56.255647]  dump_stack+0xc4/0x128
      [   56.255905]  panic+0x16c/0x35c
      [   56.256146]  bad_el0_sync+0x0/0x58
      [   56.256403]  el1_sync_handler+0xb4/0xe0
      [   56.256674]  el1_sync+0x7c/0x100
      [   56.256928]  kvm_vm_ioctl_check_extension_generic+0x74/0x98
      [   56.257286]  __arm64_sys_ioctl+0x94/0xcc
      [   56.257569]  el0_svc_common+0x9c/0x150
      [   56.257836]  do_el0_svc+0x84/0x90
      [   56.258083]  el0_sync_handler+0xf8/0x298
      [   56.258361]  el0_sync+0x158/0x180
      
      This is because the switch in kvm_vm_ioctl_check_extension_generic()
      is executed as an indirect branch to tail-call through a jump table:
      
      ffff800010032dc8:       3869694c        ldrb    w12, [x10, x9]
      ffff800010032dcc:       8b0c096b        add     x11, x11, x12, lsl #2
      ffff800010032dd0:       d61f0160        br      x11
      
      However, where the target case uses the stack, the landing pad is elided
      due to the presence of a paciasp instruction:
      
      ffff800010032e14:       d503233f        paciasp
      ffff800010032e18:       a9bf7bfd        stp     x29, x30, [sp, #-16]!
      ffff800010032e1c:       910003fd        mov     x29, sp
      ffff800010032e20:       aa0803e0        mov     x0, x8
      ffff800010032e24:       940017c0        bl      ffff800010038d24 <kvm_vm_ioctl_check_extension>
      ffff800010032e28:       93407c00        sxtw    x0, w0
      ffff800010032e2c:       a8c17bfd        ldp     x29, x30, [sp], #16
      ffff800010032e30:       d50323bf        autiasp
      ffff800010032e34:       d65f03c0        ret
      
      Unfortunately, this results in a fatal exception because paciasp is
      compatible only with branch-and-link (call) instructions and not simple
      indirect branches.
      
      A fix is being merged into Clang 10.0.1 so that a 'bti j' instruction is
      emitted as an explicit landing pad in this situation. Make in-kernel
      BTI depend on that compiler version when building with clang.
      
      Cc: Tom Stellard <tstellar@redhat.com>
      Cc: Daniel Kiss <daniel.kiss@arm.com>
      Reviewed-by: default avatarMark Brown <broonie@kernel.org>
      Acked-by: default avatarDave Martin <Dave.Martin@arm.com>
      Reviewed-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Acked-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Link: https://lore.kernel.org/r/20200615105524.GA2694@willie-the-truck
      Link: https://lore.kernel.org/r/20200616183630.2445-1-will@kernel.orgSigned-off-by: default avatarWill Deacon <will@kernel.org>
      b9249cba
  12. 16 Jun, 2020 1 commit