Commits · eaa46a28d59655aa89a8fb885affa6fc0de44376 · Kirill Smelkov / linux

09 May, 2024 8 commits

Merge branch kvm-arm64/mpidr-reset into kvmarm-master/next · eaa46a28

Marc Zyngier authored May 09, 2024

* kvm-arm64/mpidr-reset:
  : .
  : Fixes for CLIDR_EL1 and MPIDR_EL1 being accidentally mutable across
  : a vcpu reset, courtesy of Oliver. From the cover letter:
  :
  : "For VM-wide feature ID registers we ensure they get initialized once for
  : the lifetime of a VM. On the other hand, vCPU-local feature ID registers
  : get re-initialized on every vCPU reset, potentially clobbering the
  : values userspace set up.
  :
  : MPIDR_EL1 and CLIDR_EL1 are the only registers in this space that we
  : allow userspace to modify for now. Clobbering the value of MPIDR_EL1 has
  : some disastrous side effects as the compressed index used by the
  : MPIDR-to-vCPU lookup table assumes MPIDR_EL1 is immutable after KVM_RUN.
  :
  : Series + reproducer test case to address the problem of KVM wiping out
  : userspace changes to these registers. Note that there are still some
  : differences between VM and vCPU scoped feature ID registers from the
  : perspective of userspace. We do not allow the value of VM-scope
  : registers to change after KVM_RUN, but vCPU registers remain mutable."
  : .
  KVM: selftests: arm64: Test vCPU-scoped feature ID registers
  KVM: selftests: arm64: Test that feature ID regs survive a reset
  KVM: selftests: arm64: Store expected register value in set_id_regs
  KVM: selftests: arm64: Rename helper in set_id_regs to imply VM scope
  KVM: arm64: Only reset vCPU-scoped feature ID regs once
  KVM: arm64: Reset VM feature ID regs from kvm_reset_sys_regs()
  KVM: arm64: Rename is_id_reg() to imply VM scope
Signed-off-by: Marc Zyngier <maz@kernel.org>

eaa46a28

KVM: selftests: arm64: Test vCPU-scoped feature ID registers · 606af829

Oliver Upton authored May 02, 2024

Test that CLIDR_EL1 and MPIDR_EL1 are modifiable from userspace and that
the values are preserved across a vCPU reset like the other feature ID
registers.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240502233529.1958459-8-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>

606af829

KVM: selftests: arm64: Test that feature ID regs survive a reset · 07eabd8a

Oliver Upton authored May 02, 2024

One of the expectations with feature ID registers is that their values
survive a vCPU reset. Start testing that.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240502233529.1958459-7-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>

07eabd8a

KVM: selftests: arm64: Store expected register value in set_id_regs · 46247a31

Oliver Upton authored May 02, 2024

Rather than comparing against what is returned by the ioctl, store
expected values for the feature ID registers in a table and compare with
that instead.

This will prove useful for subsequent tests involving vCPU reset.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240502233529.1958459-6-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>

46247a31

KVM: selftests: arm64: Rename helper in set_id_regs to imply VM scope · 41ee9b33

Oliver Upton authored May 02, 2024

Prepare for a later change that'll cram in per-vCPU feature ID test
cases by renaming the current test case.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240502233529.1958459-5-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>

41ee9b33

KVM: arm64: Only reset vCPU-scoped feature ID regs once · e0163337

Oliver Upton authored May 02, 2024

The general expectation with feature ID registers is that they're 'reset'
exactly once by KVM for the lifetime of a vCPU/VM, such that any
userspace changes to the CPU features / identity are honored after a
vCPU gets reset (e.g. PSCI_ON).

KVM handles what it calls VM-scoped feature ID registers correctly, but
feature ID registers local to a vCPU (CLIDR_EL1, MPIDR_EL1) get wiped
after every reset. What's especially concerning is that a
potentially-changing MPIDR_EL1 breaks MPIDR compression for indexing
mpidr_data, as the mask of useful bits to build the index could change.

This is absolutely no good. Avoid resetting vCPU feature ID registers
more than once.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240502233529.1958459-4-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>

e0163337
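
The "reset exactly once" rule described in this commit can be sketched in a few lines of C. This is only an illustration under stated assumptions: struct toy_vcpu, its id_regs_initialized flag and toy_vcpu_reset() are hypothetical names, not the kernel's data structures or the way the patch tracks this state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified vCPU state (not the kernel's struct kvm_vcpu). */
struct toy_vcpu {
	uint64_t mpidr;            /* stands in for a vCPU-scoped feature ID register */
	uint64_t pc;               /* stands in for ordinary state, reset every time */
	bool id_regs_initialized;  /* assumed flag: ID registers already given defaults */
};

/*
 * Reset a vCPU. Ordinary state is reinitialised on every call, while the
 * feature ID register only receives its default value the first time, so
 * that a value written by userspace in between survives later resets.
 */
static void toy_vcpu_reset(struct toy_vcpu *vcpu, uint64_t default_mpidr)
{
	vcpu->pc = 0;

	if (!vcpu->id_regs_initialized) {
		vcpu->mpidr = default_mpidr;
		vcpu->id_regs_initialized = true;
	}
}
```

With this shape, a second reset (for example one triggered by a PSCI call) clears pc again but leaves a userspace-provided mpidr untouched.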

KVM: arm64: Reset VM feature ID regs from kvm_reset_sys_regs() · 44cbe80b

Oliver Upton authored May 02, 2024

A subsequent change to KVM will expand the range of feature ID registers
that get special treatment at reset. Fold the existing ones back in to
kvm_reset_sys_regs() to avoid the need for an additional table walk.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240502233529.1958459-3-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>

44cbe80b

KVM: arm64: Rename is_id_reg() to imply VM scope · 592efc60

Oliver Upton authored May 02, 2024

The naming of some of the feature ID checks is ambiguous. Rephrase the
is_id_reg() helper to make its purpose slightly clearer.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240502233529.1958459-2-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>

592efc60

08 May, 2024 4 commits

Merge branch kvm-arm64/misc-6.10 into kvmarm-master/next · e2815706

Marc Zyngier authored May 08, 2024

* kvm-arm64/misc-6.10:
  : .
  : Misc fixes and updates targeting 6.10
  :
  : - Improve boot-time diagnostics when the sysreg tables
  :   are not correctly sorted
  :
  : - Allow FFA_MSG_SEND_DIRECT_REQ in the FFA proxy
  :
  : - Fix duplicate XNX field in the ID_AA64MMFR1_EL1
  :   writeable mask
  :
  : - Allocate PPIs and SGIs outside of the vcpu structure, allowing
  :   for smaller EL2 mapping and some flexibility in implementing
  :   more or less than 32 private IRQs.
  :
  : - Use bitmap_gather() instead of its open-coded equivalent
  :
  : - Make protected mode use hVHE if available
  :
  : - Purge stale mpidr_data if a vcpu is created after the MPIDR
  :   map has been created
  : .
  KVM: arm64: Destroy mpidr_data for 'late' vCPU creation
  KVM: arm64: Use hVHE in pKVM by default on CPUs with VHE support
  KVM: arm64: Fix hvhe/nvhe early alias parsing
  KVM: arm64: Convert kvm_mpidr_index() to bitmap_gather()
  KVM: arm64: vgic: Allocate private interrupts on demand
  KVM: arm64: Remove duplicated AA64MMFR1_EL1 XNX
  KVM: arm64: Remove FFA_MSG_SEND_DIRECT_REQ from the denylist
  KVM: arm64: Improve out-of-order sysreg table diagnostics
Signed-off-by: Marc Zyngier <maz@kernel.org>

e2815706

KVM: arm64: Destroy mpidr_data for 'late' vCPU creation · ce5d2448

Oliver Upton authored May 08, 2024

A particularly annoying userspace could create a vCPU after KVM has
computed mpidr_data for the VM, either by racing against VGIC
initialization or having a userspace irqchip.

In any case, this means mpidr_data no longer fully describes the VM, and
attempts to find the new vCPU with kvm_mpidr_to_vcpu() will fail. The
fix is to discard mpidr_data altogether, as it is only a performance
optimization and not required for correctness. In all likelihood KVM
will recompute the mappings when KVM_RUN is called on the new vCPU.

Note that reads of mpidr_data are not guarded by a lock; promote to RCU
to cope with the possibility of mpidr_data being invalidated at runtime.

Fixes: 54a8006d ("KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is available")
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240508071952.2035422-1-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>

ce5d2448

KVM: arm64: Use hVHE in pKVM by default on CPUs with VHE support · 5053c3f0

Will Deacon authored May 01, 2024

The early command line parsing treats "kvm-arm.mode=protected" as an
alias for "id_aa64mmfr1.vh=0", forcing the use of nVHE so that the host
kernel runs at EL1 with the pKVM hypervisor at EL2.

With the introduction of hVHE support in ad744e8c ("arm64: Allow
arm64_sw.hvhe on command line"), the hypervisor can run using the EL2+0
translation regime. This is interesting for unusual CPUs that have VH
stuck to 1, but also because it opens the possibility of a hypervisor
"userspace" in the distant future which could be used to isolate vCPU
contexts in the hypervisor (see Marc's talk from KVM Forum 2022 [1]).

Repaint the "kvm-arm.mode=protected" alias to map to "arm64_sw.hvhe=1",
which will use hVHE on CPUs that support it and remain with nVHE
otherwise.

[1] https://www.youtube.com/watch?v=1F_Mf2j9eIo
Signed-off-by: Will Deacon <will@kernel.org>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240501163400.15838-3-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

5053c3f0

KVM: arm64: Fix hvhe/nvhe early alias parsing · 3c142f9d

Will Deacon authored May 01, 2024

Booting a kernel with "arm64_sw.hvhe=1 kvm-arm.mode=nvhe" on the
command-line results in KVM initialising using hVHE, whereas one might
expect the latter option to override the former.

Fix this by adding "arm64_sw.hvhe=0" to the alias expansion for
"kvm-arm.mode=nvhe".
Signed-off-by: Will Deacon <will@kernel.org>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240501163400.15838-2-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

3c142f9d
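
The alias behaviour described in the two commits above can be pictured as a small lookup table. The sketch below is not the kernel's early command-line parser: struct toy_alias, toy_aliases and toy_expand_alias() are hypothetical names, and the "id_aa64mmfr1.vh=0" part of the nvhe expansion is an assumption (the commit messages only state that "arm64_sw.hvhe=0" is added to it).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table entry: a user-facing option and what it expands to. */
struct toy_alias {
	const char *alias;
	const char *expansion;
};

static const struct toy_alias toy_aliases[] = {
	/* nvhe now also switches hVHE off; the vh=0 part is an assumption */
	{ "kvm-arm.mode=nvhe",      "arm64_sw.hvhe=0 id_aa64mmfr1.vh=0" },
	/* protected mode asks for hVHE, falling back to nVHE if unsupported */
	{ "kvm-arm.mode=protected", "arm64_sw.hvhe=1" },
};

/* Return the expansion for a known alias, or the option itself otherwise. */
static const char *toy_expand_alias(const char *opt)
{
	for (size_t i = 0; i < sizeof(toy_aliases) / sizeof(toy_aliases[0]); i++) {
		if (strcmp(opt, toy_aliases[i].alias) == 0)
			return toy_aliases[i].expansion;
	}
	return opt;
}
```

Because the nvhe expansion carries an explicit "arm64_sw.hvhe=0", the mode option overrides an earlier "arm64_sw.hvhe=1" on the same command line, which is the behaviour the fix is after.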

03 May, 2024 7 commits

Merge branch kvm-arm64/pkvm-6.10 into kvmarm-master/next · 8540bd1b

Marc Zyngier authored May 03, 2024

* kvm-arm64/pkvm-6.10: (25 commits)
  : .
  : At last, a bunch of pKVM patches, courtesy of Fuad Tabba.
  : From the cover letter:
  :
  : "This series is a bit of a bombay-mix of patches we've been
  : carrying. There's no one overarching theme, but they do improve
  : the code by fixing existing bugs in pKVM, refactoring code to
  : make it more readable and easier to re-use for pKVM, or adding
  : functionality to the existing pKVM code upstream."
  : .
  KVM: arm64: Force injection of a data abort on NISV MMIO exit
  KVM: arm64: Restrict supported capabilities for protected VMs
  KVM: arm64: Refactor setting the return value in kvm_vm_ioctl_enable_cap()
  KVM: arm64: Document the KVM/arm64-specific calls in hypercalls.rst
  KVM: arm64: Rename firmware pseudo-register documentation file
  KVM: arm64: Reformat/beautify PTP hypercall documentation
  KVM: arm64: Clarify rationale for ZCR_EL1 value restored on guest exit
  KVM: arm64: Introduce and use predicates that check for protected VMs
  KVM: arm64: Add is_pkvm_initialized() helper
  KVM: arm64: Simplify vgic-v3 hypercalls
  KVM: arm64: Move setting the page as dirty out of the critical section
  KVM: arm64: Change kvm_handle_mmio_return() return polarity
  KVM: arm64: Fix comment for __pkvm_vcpu_init_traps()
  KVM: arm64: Prevent kmemleak from accessing .hyp.data
  KVM: arm64: Do not map the host fpsimd state to hyp in pKVM
  KVM: arm64: Rename __tlb_switch_to_{guest,host}() in VHE
  KVM: arm64: Support TLB invalidation in guest context
  KVM: arm64: Avoid BBM when changing only s/w bits in Stage-2 PTE
  KVM: arm64: Check for PTE validity when checking for executable/cacheable
  KVM: arm64: Avoid BUG-ing from the host abort path
  ...
Signed-off-by: Marc Zyngier <maz@kernel.org>

8540bd1b

Merge branch kvm-arm64/lpi-xa-cache into kvmarm-master/next · 3d5689e0

Marc Zyngier authored May 03, 2024

* kvm-arm64/lpi-xa-cache:
  : .
  : New and improved LPI translation cache from Oliver Upton.
  :
  : From the cover letter:
  :
  : "As discussed [*], here is the new take on the LPI translation cache,
  : migrating to an xarray indexed by (devid, eventid) per ITS.
  :
  : The end result is quite satisfying, as it becomes possible to rip out
  : other nasties such as the lpi_list_lock. To that end, patches 2-6 aren't
  : _directly_ related to the translation cache cleanup, but instead are
  : done to enable the cleanups at the end of the series.
  :
  : I changed out my test machine from the last time so the baseline has
  : moved a bit, but here are the results from the vgic_lpi_stress test:
  :
  : +----------------------------+------------+-------------------+
  : |       Configuration        |  v6.8-rc1  | v6.8-rc1 + series |
  : +----------------------------+------------+-------------------+
  : | -v 1 -d 1 -e 1 -i 1000000  | 2063296.81 |        1362602.35 |
  : | -v 16 -d 16 -e 16 -i 10000 |  610678.33 |        5200910.01 |
  : | -v 16 -d 16 -e 17 -i 10000 |  678361.53 |        5890675.51 |
  : | -v 32 -d 32 -e 1 -i 100000 |  580918.96 |        8304552.67 |
  : | -v 1 -d 1 -e 17 -i 1000    | 1512443.94 |         1425953.8 |
  : +----------------------------+------------+-------------------+
  :
  : Unlike last time, no dramatic regressions at any performance point. The
  : regression on a single interrupt stream is to be expected, as the
  : overheads of SRCU and two tree traversals (kvm_io_bus_get_dev(),
  : translation cache xarray) are likely greater than that of a linked-list
  : with a single node."
  : .
  KVM: selftests: Add stress test for LPI injection
  KVM: selftests: Use MPIDR_HWID_BITMASK from cputype.h
  KVM: selftests: Add helper for enabling LPIs on a redistributor
  KVM: selftests: Add a minimal library for interacting with an ITS
  KVM: selftests: Add quadword MMIO accessors
  KVM: selftests: Standardise layout of GIC frames
  KVM: selftests: Align with kernel's GIC definitions
  KVM: arm64: vgic-its: Get rid of the lpi_list_lock
  KVM: arm64: vgic-its: Rip out the global translation cache
  KVM: arm64: vgic-its: Use the per-ITS translation cache for injection
  KVM: arm64: vgic-its: Spin off helper for finding ITS by doorbell addr
  KVM: arm64: vgic-its: Maintain a translation cache per ITS
  KVM: arm64: vgic-its: Scope translation cache invalidations to an ITS
  KVM: arm64: vgic-its: Get rid of vgic_copy_lpi_list()
  KVM: arm64: vgic-debug: Use an xarray mark for debug iterator
  KVM: arm64: vgic-its: Walk LPI xarray in vgic_its_cmd_handle_movall()
  KVM: arm64: vgic-its: Walk LPI xarray in vgic_its_invall()
  KVM: arm64: vgic-its: Walk LPI xarray in its_sync_lpi_pending_table()
  KVM: Treat the device list as an rculist
Signed-off-by: Marc Zyngier <maz@kernel.org>

3d5689e0

Merge branch kvm-arm64/nv-eret-pauth into kvmarm-master/next · 2d38f439

Marc Zyngier authored May 03, 2024

* kvm-arm64/nv-eret-pauth:
  : .
  : Add NV support for the ERETAA/ERETAB instructions. From the cover letter:
  :
  : "Although the current upstream NV support has *some* support for
  : correctly emulating ERET, that support is only partial as it doesn't
  : support the ERETAA and ERETAB variants.
  :
  : Supporting these instructions was cast aside for a long time as it
  : involves implementing some form of PAuth emulation, something I wasn't
  : overly keen on. But I have reached a point where enough of the
  : infrastructure is there that it actually makes sense. So here it is!"
  : .
  KVM: arm64: nv: Work around lack of pauth support in old toolchains
  KVM: arm64: Drop trapping of PAuth instructions/keys
  KVM: arm64: nv: Advertise support for PAuth
  KVM: arm64: nv: Handle ERETA[AB] instructions
  KVM: arm64: nv: Add emulation for ERETAx instructions
  KVM: arm64: nv: Add kvm_has_pauth() helper
  KVM: arm64: nv: Reinject PAC exceptions caused by HCR_EL2.API==0
  KVM: arm64: nv: Handle HCR_EL2.{API,APK} independently
  KVM: arm64: nv: Honor HFGITR_EL2.ERET being set
  KVM: arm64: nv: Fast-track 'InHost' exception returns
  KVM: arm64: nv: Add trap forwarding for ERET and SMC
  KVM: arm64: nv: Configure HCR_EL2 for FEAT_NV2
  KVM: arm64: nv: Drop VCPU_HYP_CONTEXT flag
  KVM: arm64: Constraint PAuth support to consistent implementations
  KVM: arm64: Add helpers for ESR_ELx_ERET_ISS_ERET*
  KVM: arm64: Harden __ctxt_sys_reg() against out-of-range values
Signed-off-by: Marc Zyngier <maz@kernel.org>

2d38f439

Merge branch kvm-arm64/host_data into kvmarm-master/next · 34c0d5a6

Marc Zyngier authored May 03, 2024

* kvm-arm64/host_data:
  : .
  : Rationalise the host-specific data to live as part of the per-CPU state.
  :
  : From the cover letter:
  :
  : "It appears that over the years, we have accumulated a lot of cruft in
  : the kvm_vcpu_arch structure. Part of the gunk is data that is strictly
  : host CPU specific, and this results in two main problems:
  :
  : - the structure itself is stupidly large, over 8kB. With the
  :   arch-agnostic kvm_vcpu, we're above 10kB, which is insane. This has
  :   some ripple effects, as we need physically contiguous allocation to
  :   be able to map it at EL2 for !VHE. There is more to it though, as
  :   some data structures, although per-vcpu, could be allocated
  :   separately.
  :
  : - We lose track of the life-cycle of this data, because we're
  :   guaranteed that it will be around forever and we start relying on
  :   wrong assumptions. This is becoming a maintenance burden.
  :
  : This series rectifies some of these things, starting with the two main
  : offenders: debug and FP, a lot of which gets pushed out to the per-CPU
  : host structure. Indeed, their lifetime really isn't that of the vcpu,
  : but tied to the physical CPU the vcpu runs on.
  :
  : This results in a small reduction of the vcpu size, but mainly a much
  : clearer understanding of the life-cycle of these structures."
  : .
  KVM: arm64: Move management of __hyp_running_vcpu to load/put on VHE
  KVM: arm64: Exclude FP ownership from kvm_vcpu_arch
  KVM: arm64: Exclude host_fpsimd_state pointer from kvm_vcpu_arch
  KVM: arm64: Exclude mdcr_el2_host from kvm_vcpu_arch
  KVM: arm64: Exclude host_debug_data from vcpu_arch
  KVM: arm64: Add accessor for per-CPU state
Signed-off-by: Marc Zyngier <maz@kernel.org>

34c0d5a6

KVM: arm64: Move management of __hyp_running_vcpu to load/put on VHE · 9a393599

Marc Zyngier authored May 02, 2024

The per-CPU host context structure contains a __hyp_running_vcpu that
serves as a replacement for kvm_get_running_vcpu() in contexts where
we cannot make direct use of it (such as in the nVHE hypervisor).
Since there is a lot of common code between nVHE and VHE, the latter
also populates this field even if kvm_get_running_vcpu() always works.

We are currently pretty inconsistent when populating __hyp_running_vcpu
to point to the currently running vcpu:

- on {n,h}VHE, we set __hyp_running_vcpu on entry to __kvm_vcpu_run
  and clear it on exit.

- on VHE, we set __hyp_running_vcpu on entry to __kvm_vcpu_run_vhe
  and never clear it, effectively leaving a dangling pointer...

VHE is obviously the odd one here. Although we could make it behave
just like nVHE, this wouldn't match the behaviour of KVM with VHE,
where the load phase is where most of the context-switch gets done.

So move all the __hyp_running_vcpu management to the VHE-specific
load/put phases, giving us a bit more sanity and matching the
behaviour of kvm_get_running_vcpu().
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240502154030.3011995-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

9a393599

KVM: arm64: Convert kvm_mpidr_index() to bitmap_gather() · 838d992b

Marc Zyngier authored May 02, 2024

Linux 6.9 has introduced new bitmap manipulation helpers, with
bitmap_gather() being of special interest, as it does exactly
what kvm_mpidr_index() is already doing.

Make the latter a wrapper around the former.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240502154247.3012042-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

838d992b
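
For readers unfamiliar with the operation, a "gather" collects the bits of a value selected by a mask and packs them into the low-order bits of the result. The userspace sketch below illustrates that idea on a single 64-bit word; gather_bits() is a hypothetical helper written for this illustration and is neither the kernel's bitmap_gather() nor kvm_mpidr_index().

```c
#include <stdint.h>

/*
 * Pack the bits of `value` that are selected by `mask` into the low-order
 * bits of the result, preserving their relative order.
 */
static uint64_t gather_bits(uint64_t value, uint64_t mask)
{
	uint64_t out = 0;
	unsigned int n = 0;	/* next free bit position in the output */

	while (mask) {
		uint64_t lowest = mask & (~mask + 1);	/* isolate lowest set bit */

		if (value & lowest)
			out |= UINT64_C(1) << n;
		n++;
		mask &= mask - 1;	/* clear the bit just handled */
	}
	return out;
}
```

For example, with a mask of 0x0103 (two low bits of one affinity field and one bit of the next), the value 0x0102 gathers to 6, which is the kind of dense index an MPIDR lookup table needs.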

KVM: arm64: vgic: Allocate private interrupts on demand · 03b3d00a

Marc Zyngier authored May 02, 2024

Private interrupts are currently part of the CPU interface structure
that is part of each and every vcpu we create.

Currently, we have 32 of them per vcpu, resulting in a per-vcpu array
that is just shy of 4kB. On its own, that's no big deal, but it gets
in the way of other things:

- each vcpu gets mapped at EL2 on nVHE/hVHE configurations. This
  requires memory that is physically contiguous. However, the EL2
  code has no purpose looking at the interrupt structures and
  could do without them being mapped.

- supporting features such as EPPIs, which extend the number of
  private interrupts past the 32 limit would make the array
  even larger, even for VMs that do not use the EPPI feature.

Address these issues by moving the private interrupt array outside
of the vcpu, and replace it with a simple pointer. We take this
opportunity to make it obvious what gets initialised when, as
that path was remarkably opaque, and tighten the locking.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240502154545.3012089-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

03b3d00a

01 May, 2024 21 commits

KVM: arm64: Force injection of a data abort on NISV MMIO exit · 3b467b16

Marc Zyngier authored Apr 23, 2024

If a vcpu exits for a data abort with an invalid syndrome, the
expectations are that userspace has a chance to save the day if
it has requested to see such exits.

However, this is completely futile in the case of a protected VM,
as none of the state is available. In this particular case, inject
a data abort directly into the vcpu, consistent with what userspace
could do.

This also helps with pKVM, which discards all syndrome information when
forwarding data aborts that are not known to be MMIO.

Finally, document this tweak to the API.
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-31-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

3b467b16

KVM: arm64: Restrict supported capabilities for protected VMs · 92536992

Fuad Tabba authored Apr 23, 2024

For practical reasons as well as security related ones, not all
capabilities are supported for protected VMs in pKVM.

Add a function that restricts the capabilities for protected VMs.
This behaves as an allow-list to ensure that future capabilities
are checked for compatibility and security before being allowed
for protected VMs.
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-30-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

92536992

KVM: arm64: Refactor setting the return value in kvm_vm_ioctl_enable_cap() · 97a3dee1

Fuad Tabba authored Apr 23, 2024

Initialize r = -EINVAL to get rid of the error-path
initializations in kvm_vm_ioctl_enable_cap().

No functional change intended.
Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-29-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

97a3dee1
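
The refactoring pattern named in this commit (one default error value up front instead of an assignment on every failing path) looks roughly like the sketch below. The enum toy_cap values and toy_enable_cap() are hypothetical and only mirror the shape of the change, not the contents of kvm_vm_ioctl_enable_cap().

```c
#include <errno.h>

/* Hypothetical capabilities, for illustration only. */
enum toy_cap { TOY_CAP_A, TOY_CAP_B, TOY_CAP_UNKNOWN };

static int toy_enable_cap(enum toy_cap cap, int arg_ok)
{
	int r = -EINVAL;	/* default result for every path that does not succeed */

	switch (cap) {
	case TOY_CAP_A:
		r = 0;
		break;
	case TOY_CAP_B:
		/* No "else r = -EINVAL;" needed: the default already covers it. */
		if (arg_ok)
			r = 0;
		break;
	default:
		break;
	}

	return r;
}
```

Only the success paths assign to r, so adding a new capability cannot accidentally return an uninitialised or stale value.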

KVM: arm64: Document the KVM/arm64-specific calls in hypercalls.rst · 4dc8c9de

Will Deacon authored Apr 23, 2024

KVM/arm64 makes use of the SMCCC "Vendor Specific Hypervisor Service
Call Range" to expose KVM-specific hypercalls to guests in a
discoverable and extensible fashion.

Document the existence of this interface and the discovery hypercall.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-28-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

4dc8c9de

KVM: arm64: Rename firmware pseudo-register documentation file · af725804

Will Deacon authored Apr 23, 2024

In preparation for describing the guest view of KVM/arm64 hypercalls in
hypercalls.rst, move the existing contents of the file concerning the
firmware pseudo-registers elsewhere.

Cc: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-27-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

af725804

KVM: arm64: Reformat/beautify PTP hypercall documentation · 5a08146d

Will Deacon authored Apr 23, 2024

The PTP hypercall documentation doesn't produce the best-looking table
when formatting in HTML as all of the return value definitions end up
on the same line.

Reformat the PTP hypercall documentation to follow the formatting used
by hypercalls.rst.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-26-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

5a08146d

KVM: arm64: Clarify rationale for ZCR_EL1 value restored on guest exit · eef4ce63

Fuad Tabba authored Apr 23, 2024

Expand comment clarifying why the host value representing SVE
vector length being restored for ZCR_EL1 on guest exit isn't the
same as it was on guest entry.
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-21-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

eef4ce63

KVM: arm64: Introduce and use predicates that check for protected VMs · b6ed4fa9

Fuad Tabba authored Apr 23, 2024

In order to determine whether or not a VM or vcpu are protected,
introduce helpers to query this state. While at it, use the vcpu
helper to check a vcpu's protected state instead of the kvm one.
Co-authored-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-19-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

b6ed4fa9

KVM: arm64: Add is_pkvm_initialized() helper · d81a91af

Quentin Perret authored Apr 23, 2024

Add a helper to check whether the pkvm static key is enabled, to
ease the introduction of pkvm hooks in other parts of the code.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-18-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

d81a91af

KVM: arm64: Simplify vgic-v3 hypercalls · 948e1a53

Marc Zyngier authored Apr 23, 2024

Consolidate the GICv3 VMCR accessor hypercalls into the APR save/restore
hypercalls so that all of the EL2 GICv3 state is covered by a single pair
of hypercalls.
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-17-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

948e1a53

KVM: arm64: Move setting the page as dirty out of the critical section · 9c30fc61

Fuad Tabba authored Apr 23, 2024

Move the unlock earlier in user_mem_abort() to shorten the
critical section. This also helps for future refactoring and
reuse of similar code.

This moves marking the page as dirty outside of the critical
section. That code does not interact with the stage-2 page
tables, which the read lock in the critical section protects.
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-16-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

9c30fc61

KVM: arm64: Change kvm_handle_mmio_return() return polarity · cc81b6df

Fuad Tabba authored Apr 23, 2024

Most exit handlers return <= 0 to indicate that the host needs to
handle the exit. Make kvm_handle_mmio_return() consistent with
the exit handlers in handle_exit(). This makes the code easier to
reason about, and makes it easier to add other handlers in future
patches.

No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-15-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

cc81b6df

KVM: arm64: Fix comment for __pkvm_vcpu_init_traps() · 40458a66

Fuad Tabba authored Apr 23, 2024

Fix the comment to clarify that __pkvm_vcpu_init_traps()
initializes traps for all VMs in protected mode, and not only
for protected VMs.
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-14-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

40458a66

KVM: arm64: Prevent kmemleak from accessing .hyp.data · 06cacc9d

Quentin Perret authored Apr 23, 2024

We've added a .data section for the hypervisor, which kmemleak is
eager to parse. This clearly doesn't go well, so add the section
to kmemleak's block list.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-13-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

06cacc9d

KVM: arm64: Do not map the host fpsimd state to hyp in pKVM · d48965bc

Fuad Tabba authored Apr 23, 2024

pKVM maintains its own state at EL2 for tracking the host fpsimd
state. Therefore, there is no need to map and share the host's view with
it.
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-12-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

d48965bc

KVM: arm64: Rename __tlb_switch_to_{guest,host}() in VHE · cfbdc546

Fuad Tabba authored Apr 23, 2024

Rename __tlb_switch_to_{guest,host}() to
{enter,exit}_vmid_context() in VHE code to maintain symmetry
between the nVHE and VHE TLB invalidations.

No functional change intended.
Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-11-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

cfbdc546

KVM: arm64: Support TLB invalidation in guest context · 58f3b0fc

Will Deacon authored Apr 23, 2024

Typically, TLB invalidation of guest stage-2 mappings using nVHE is
performed by a hypercall originating from the host. For the invalidation
instruction to be effective, therefore, __tlb_switch_to_{guest,host}()
swizzle the active stage-2 context around the TLBI instruction.

With guest-to-host memory sharing and unsharing hypercalls
originating from the guest under pKVM, there is a need to support
both guest and host VMID invalidations issued from guest context.

Replace the __tlb_switch_to_{guest,host}() functions with a more general
{enter,exit}_vmid_context() implementation which supports being invoked
from guest context and acts as a no-op if the target context matches the
running context.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-10-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

58f3b0fc

KVM: arm64: Avoid BBM when changing only s/w bits in Stage-2 PTE · 7cc1d214

Will Deacon authored Apr 23, 2024

Break-before-make (BBM) can be expensive, as transitioning via an
invalid mapping (i.e. the "break" step) requires the completion of TLB
invalidation and can also cause other agents to fault concurrently on
the invalid mapping.

Since BBM is not required when changing only the software bits of a PTE,
avoid the sequence in this case and just update the PTE directly.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-9-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

7cc1d214
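
The check implied by this commit (do the old and new PTE differ only in their software bits?) can be sketched as a mask-and-compare. The position of the software bits used here, [58:55], is an assumption made for the illustration, and TOY_PTE_SW_MASK and toy_can_skip_bbm() are hypothetical names rather than the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: software-defined PTE bits live in [58:55]. */
#define TOY_PTE_SW_MASK	(UINT64_C(0xF) << 55)

/*
 * Return true when the two PTE values are identical once the software
 * bits are ignored, i.e. when the hardware-visible part of the mapping is
 * unchanged and the new value could simply be written in place.
 */
static bool toy_can_skip_bbm(uint64_t old_pte, uint64_t new_pte)
{
	return ((old_pte ^ new_pte) & ~TOY_PTE_SW_MASK) == 0;
}
```

Any difference outside the masked bits (an output address, a permission, the valid bit) makes the function return false, in which case the full break-before-make sequence still applies.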

KVM: arm64: Check for PTE validity when checking for executable/cacheable · 96171cfa

Marc Zyngier authored Apr 23, 2024

Don't just assume that the PTE is valid when checking whether it
describes an executable or cacheable mapping.

This makes sure that we don't issue CMOs for invalid mappings.
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-8-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

96171cfa

KVM: arm64: Avoid BUG-ing from the host abort path · 02949f36

Quentin Perret authored Apr 23, 2024

Under certain circumstances __get_fault_info() may resolve the faulting
address using the AT instruction. Given that this is being done outside
of the host lock critical section, it is racy and the resolution via AT
may fail. We currently BUG() in this situation, which is obviously less
than ideal. Moving the address resolution to the critical section may
have a performance impact, so let's keep it where it is, but bail out
and return to the host to try a second time.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-7-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

02949f36

KVM: arm64: Issue CMOs when tearing down guest s2 pages · cb163016

Quentin Perret authored Apr 23, 2024

On the guest teardown path, pKVM will zero the pages used to back
the guest data structures before returning them to the host as
they may contain secrets (e.g. in the vCPU registers). However,
the zeroing is done using a cacheable alias, and CMOs are
missing, hence giving the host a potential opportunity to read
the original content of the guest structs from memory.

Fix this by issuing CMOs after zeroing the pages.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240423150538.2103045-6-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

cb163016