Commits · 9bd8d7df1971c2ebdcaf4526cf7a3f4ea38d0ede · Kirill Smelkov / linux

07 Mar, 2024 5 commits

Merge branch kvm-arm64/vfio-normal-nc into kvmarm/next · 9bd8d7df

Oliver Upton authored Mar 07, 2024

* kvm-arm64/vfio-normal-nc:
  : Normal-NC support for vfio-pci @ stage-2, courtesy of Ankit Agrawal
  :
  : KVM's policy to date has been that any and all MMIO mapping at stage-2
  : is treated as Device-nGnRE. This is primarily done due to concerns of
  : the guest triggering uncontainable failures in the system if they manage
  : to tickle the device / memory system the wrong way, though this is
  : unnecessarily restrictive for devices that can be reasoned as 'safe'.
  :
  : Unsurprisingly, the Device-* mapping can really hurt the performance of
  : assigned devices that can handle Gathering, and can be an outright
  : correctness issue if the guest driver does unaligned accesses.
  :
  : Rather than opening the floodgates to the full ecosystem of devices that
  : can be exposed to VMs, take the conservative approach and allow PCI
  : devices to be mapped as Normal-NC since it has been determined to be
  : 'safe'.
  vfio: Convey kvm that the vfio-pci device is wc safe
  KVM: arm64: Set io memory s2 pte as normalnc for vfio pci device
  mm: Introduce new flag to indicate wc safe
  KVM: arm64: Introduce new flag for non-cacheable IO memory
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

9bd8d7df

Merge branch kvm-arm64/lpi-xarray into kvmarm/next · 8dbc4110

Oliver Upton authored Mar 07, 2024

* kvm-arm64/lpi-xarray:
  : xarray-based representation of vgic LPIs
  :
  : KVM's linked-list of LPI state has proven to be a bottleneck in LPI
  : injection paths, due to lock serialization when acquiring / releasing a
  : reference on an IRQ.
  :
  : Start the tedious process of reworking KVM's LPI injection by replacing
  : the LPI linked-list with an xarray, leveraging this to allow RCU readers
  : to walk it outside of the spinlock.
  KVM: arm64: vgic: Don't acquire the lpi_list_lock in vgic_put_irq()
  KVM: arm64: vgic: Ensure the irq refcount is nonzero when taking a ref
  KVM: arm64: vgic: Rely on RCU protection in vgic_get_lpi()
  KVM: arm64: vgic: Free LPI vgic_irq structs in an RCU-safe manner
  KVM: arm64: vgic: Use atomics to count LPIs
  KVM: arm64: vgic: Get rid of the LPI linked-list
  KVM: arm64: vgic-its: Walk the LPI xarray in vgic_copy_lpi_list()
  KVM: arm64: vgic-v3: Iterate the xarray to find pending LPIs
  KVM: arm64: vgic: Use xarray to find LPI in vgic_get_lpi()
  KVM: arm64: vgic: Store LPIs in an xarray
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

8dbc4110

Merge branch kvm-arm64/vm-configuration into kvmarm/next · 0d874858

Oliver Upton authored Mar 07, 2024

* kvm-arm64/vm-configuration: (29 commits)
  : VM configuration enforcement, courtesy of Marc Zyngier
  :
  : Userspace has gained the ability to control the features visible
  : through the ID registers, yet KVM didn't take this into account as the
  : effective feature set when determing trap / emulation behavior. This
  : series adds:
  :
  :  - Mechanism for testing the presence of a particular CPU feature in the
  :    guest's ID registers
  :
  :  - Infrastructure for computing the effective value of VNCR-backed
  :    registers, taking into account the RES0 / RES1 bits for a particular
  :    VM configuration
  :
  :  - Implementation of 'fine-grained UNDEF' controls that shadow the FGT
  :    register definitions.
  KVM: arm64: Don't initialize idreg debugfs w/ preemption disabled
  KVM: arm64: Fail the idreg iterator if idregs aren't initialized
  KVM: arm64: Make build-time check of RES0/RES1 bits optional
  KVM: arm64: Add debugfs file for guest's ID registers
  KVM: arm64: Snapshot all non-zero RES0/RES1 sysreg fields for later checking
  KVM: arm64: Make FEAT_MOPS UNDEF if not advertised to the guest
  KVM: arm64: Make AMU sysreg UNDEF if FEAT_AMU is not advertised to the guest
  KVM: arm64: Make PIR{,E0}_EL1 UNDEF if S1PIE is not advertised to the guest
  KVM: arm64: Make TLBI OS/Range UNDEF if not advertised to the guest
  KVM: arm64: Streamline save/restore of HFG[RW]TR_EL2
  KVM: arm64: Move existing feature disabling over to FGU infrastructure
  KVM: arm64: Propagate and handle Fine-Grained UNDEF bits
  KVM: arm64: Add Fine-Grained UNDEF tracking information
  KVM: arm64: Rename __check_nv_sr_forward() to triage_sysreg_trap()
  KVM: arm64: Use the xarray as the primary sysreg/sysinsn walker
  KVM: arm64: Register AArch64 system register entries with the sysreg xarray
  KVM: arm64: Always populate the trap configuration xarray
  KVM: arm64: nv: Move system instructions to their own sys_reg_desc array
  KVM: arm64: Drop the requirement for XARRAY_MULTI
  KVM: arm64: nv: Turn encoding ranges into discrete XArray stores
  ...
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

0d874858

Merge branch kvm-arm64/misc into kvmarm/next · a040adfb

Oliver Upton authored Mar 07, 2024

* kvm-arm64/misc:
  : Miscellaneous updates
  :
  :  - Fix handling of features w/ nonzero safe values in set_id_regs
  :    selftest
  :
  :  - Cleanup the unused kern_hyp_va() asm macro
  :
  :  - Differentiate nVHE and hVHE in boot-time message
  :
  :  - Several selftests cleanups
  :
  :  - Drop bogus return value from kvm_arch_create_vm_debugfs()
  :
  :  - Make save/restore of SPE and TRBE control registers affect EL1 state
  :    in hVHE mode
  :
  :  - Typos
  KVM: arm64: Fix TRFCR_EL1/PMSCR_EL1 access in hVHE mode
  KVM: selftests: aarch64: Remove unused functions from vpmu test
  KVM: arm64: Fix typos
  KVM: Get rid of return value from kvm_arch_create_vm_debugfs()
  KVM: selftests: Print timer ctl register in ISTATUS assertion
  KVM: selftests: Fix GUEST_PRINTF() format warnings in ARM code
  KVM: arm64: removed unused kern_hyp_va asm macro
  KVM: arm64: add comments to __kern_hyp_va
  KVM: arm64: print Hyp mode
  KVM: arm64: selftests: Handle feature fields with nonzero minimum value correctly
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

a040adfb

Merge branch kvm-arm64/feat_e2h0 into kvmarm/next · 262cd16e

Oliver Upton authored Mar 07, 2024

* kvm-arm64/feat_e2h0:
  : Support for FEAT_E2H0, courtesy of Marc Zyngier
  :
  : As described in the cover letter:
  :
  :   Since ARMv8.1, the architecture has grown the VHE feature, which makes
  :   EL2 a superset of EL1. With ARMv9.5 (and retroactively allowed from
  :   ARMv8.1), the architecture allows implementations to have VHE as the
  :   *only* implemented behaviour, meaning that HCR_EL2.E2H can be
  :   implemented as RES1. As a follow-up, HCR_EL2.NV1 can also be
  :   implemented as RES0, making the VHE-ness of the architecture
  :   recursive.
  :
  : This series adds support for detecting the architectural feature of E2H
  : being RES1, leveraging the existing infrastructure for handling
  : out-of-spec CPUs that are VHE-only. Additionally, the (incomplete) NV
  : infrastructure in KVM is updated to enforce E2H=1 for guest hypervisors
  : on implementations that do not support NV1.
  arm64: cpufeatures: Fix FEAT_NV check when checking for FEAT_NV1
  arm64: cpufeatures: Only check for NV1 if NV is present
  arm64: cpufeatures: Add missing ID_AA64MMFR4_EL1 to __read_sysreg_by_encoding()
  KVM: arm64: Handle Apple M2 as not having HCR_EL2.NV1 implemented
  KVM: arm64: Force guest's HCR_EL2.E2H RES1 when NV1 is not implemented
  KVM: arm64: Expose ID_AA64MMFR4_EL1 to guests
  arm64: Treat HCR_EL2.E2H as RES1 when ID_AA64MMFR4_EL1.E2H0 is negative
  arm64: cpufeature: Detect HCR_EL2.NV1 being RES0
  arm64: cpufeature: Add ID_AA64MMFR4_EL1 handling
  arm64: sysreg: Add layout for ID_AA64MMFR4_EL1
  arm64: cpufeature: Correctly display signed override values
  arm64: cpufeatures: Correctly handle signed values
  arm64: Add macro to compose a sysreg field value
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

262cd16e

01 Mar, 2024 1 commit

KVM: arm64: Fix TRFCR_EL1/PMSCR_EL1 access in hVHE mode · 9a3bfb27

Marc Zyngier authored Feb 29, 2024

When running in hVHE mode, EL1 accesses are performed with the EL12
accessor, as we run with HCR_EL2.E2H=1.

Unfortunately, both PMSCR_EL1 and TRFCR_EL1 are used with the
EL1 accessor, meaning that we actually affect the EL2 state. Duh.

Switch to using the {read,write}_sysreg_el1() helpers that will do
the right thing in all circumstances.

Note that the 'Fixes:' tag doesn't represent the point where the bug
was introduced (there is no such point), but the first practical point
where the hVHE feature is usable.

Cc: James Clark <james.clark@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Fixes: 38cba550 ("KVM: arm64: Force HCR_E2H in guest context when ARM64_KVM_HVHE is set")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240229145417.3606279-1-maz@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

9a3bfb27

29 Feb, 2024 1 commit

KVM: selftests: aarch64: Remove unused functions from vpmu test · 43b3bedb

Raghavendra Rao Ananta authored Nov 22, 2023

vpmu_counter_access's disable_counter() carries a bug that disables
all the counters that are enabled, instead of just the requested one.
Fortunately, it's not an issue as there are no callers of it. Hence,
instead of fixing it, remove the definition entirely.

Remove enable_counter() as it's unused as well.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20231122221526.2750966-1-rananta@google.comSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

43b3bedb

27 Feb, 2024 2 commits

KVM: arm64: Don't initialize idreg debugfs w/ preemption disabled · 5c1ebe9a

Oliver Upton authored Feb 27, 2024

Testing KVM with DEBUG_ATOMIC_SLEEP enabled doesn't get far before hitting the
first splat:

  BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
  in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 13062, name: vgic_lpi_stress
  preempt_count: 1, expected: 0
  2 locks held by vgic_lpi_stress/13062:
   #0: ffff080084553240 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0xc0/0x13f0
   #1: ffff800080485f08 (&kvm->arch.config_lock){+.+.}-{3:3}, at: kvm_arch_vcpu_ioctl+0xd60/0x1788
  CPU: 19 PID: 13062 Comm: vgic_lpi_stress Tainted: G        W  O       6.8.0-dbg-DEV #1
  Call trace:
   dump_backtrace+0xf8/0x148
   show_stack+0x20/0x38
   dump_stack_lvl+0xb4/0xf8
   dump_stack+0x18/0x40
   __might_resched+0x248/0x2a0
   __might_sleep+0x50/0x88
   down_write+0x30/0x150
   start_creating+0x90/0x1a0
   __debugfs_create_file+0x5c/0x1b0
   debugfs_create_file+0x34/0x48
   kvm_reset_sys_regs+0x120/0x1e8
   kvm_reset_vcpu+0x148/0x270
   kvm_arch_vcpu_ioctl+0xddc/0x1788
   kvm_vcpu_ioctl+0xb6c/0x13f0
   __arm64_sys_ioctl+0x98/0xd8
   invoke_syscall+0x48/0x108
   el0_svc_common+0xb4/0xf0
   do_el0_svc+0x24/0x38
   el0_svc+0x54/0x128
   el0t_64_sync_handler+0x68/0xc0
   el0t_64_sync+0x1a8/0x1b0

kvm_reset_vcpu() disables preemption as it needs to unload vCPU state
from the CPU to twiddle with it, which subsequently explodes when
taking the parent inode's rwsem while creating the idreg debugfs file.

Fix it by moving the initialization to kvm_arch_create_vm_debugfs().

Fixes: 89176658 ("KVM: arm64: Add debugfs file for guest's ID registers")
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240227094115.1723330-3-oliver.upton@linux.devSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

5c1ebe9a

KVM: arm64: Fail the idreg iterator if idregs aren't initialized · 29ef55ce

Oliver Upton authored Feb 27, 2024

Return an error to userspace if the VM's ID register values haven't been
initialized in preparation for changing the debugfs file initialization
order.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240227094115.1723330-2-oliver.upton@linux.devSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

29ef55ce

24 Feb, 2024 5 commits

vfio: Convey kvm that the vfio-pci device is wc safe · a39d3a96

Ankit Agrawal authored Feb 24, 2024

The VM_ALLOW_ANY_UNCACHED flag is implemented for ARM64,
allowing KVM stage 2 device mapping attributes to use Normal-NC
rather than DEVICE_nGnRE, which allows guest mappings supporting
write-combining attributes (WC). ARM does not architecturally
guarantee this is safe, and indeed some MMIO regions like the GICv2
VCPU interface can trigger uncontained faults if Normal-NC is used.

To safely use VFIO in KVM the platform must guarantee full safety
in the guest where no action taken against a MMIO mapping can
trigger an uncontained failure. The expectation is that most VFIO PCI
platforms support this for both mapping types, at least in common
flows, based on some expectations of how PCI IP is integrated. So
make vfio-pci set the VM_ALLOW_ANY_UNCACHED flag.
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20240224150546.368-5-ankita@nvidia.comSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

a39d3a96

KVM: arm64: Set io memory s2 pte as normalnc for vfio pci device · 8c47ce3e

Ankit Agrawal authored Feb 24, 2024

To provide VM with the ability to get device IO memory with NormalNC
property, map device MMIO in KVM for ARM64 at stage2 as NormalNC.
Having NormalNC S2 default puts guests in control (based on [1],
"Combining stage 1 and stage 2 memory type attributes") of device
MMIO regions memory mappings. The rules are summarized below:
([(S1) - stage1], [(S2) - stage 2])

S1           |  S2           | Result
NORMAL-WB    |  NORMAL-NC    | NORMAL-NC
NORMAL-WT    |  NORMAL-NC    | NORMAL-NC
NORMAL-NC    |  NORMAL-NC    | NORMAL-NC
DEVICE<attr> |  NORMAL-NC    | DEVICE<attr>

Still this cannot be generalized to non PCI devices such as GICv2.
There is insufficient information and uncertainity in the behavior
of non PCI driver. A driver must indicate support using the
new flag VM_ALLOW_ANY_UNCACHED.

Adapt KVM to make use of the flag VM_ALLOW_ANY_UNCACHED as indicator to
activate the S2 setting to NormalNc.

[1] section D8.5.5 of DDI0487J_a_a-profile_architecture_reference_manual.pdf
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20240224150546.368-4-ankita@nvidia.comSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

8c47ce3e

mm: Introduce new flag to indicate wc safe · 5c656fcd

Ankit Agrawal authored Feb 24, 2024

The VM_ALLOW_ANY_UNCACHED flag is implemented for ARM64, allowing KVM
stage 2 device mapping attributes to use NormalNC rather than
DEVICE_nGnRE, which allows guest mappings supporting write-combining
attributes (WC). ARM does not architecturally guarantee this is safe,
and indeed some MMIO regions like the GICv2 VCPU interface can trigger
uncontained faults if NormalNC is used.

Even worse, the expectation is that there are platforms where even
DEVICE_nGnRE can allow uncontained faults in corner cases. Unfortunately
existing ARM IP requires platform integration to take responsibility to
prevent this.

To safely use VFIO in KVM the platform must guarantee full safety in the
guest where no action taken against a MMIO mapping can trigger an
uncontained failure. The assumption is that most VFIO PCI platforms
support this for both mapping types, at least in common flows, based
on some expectations of how PCI IP is integrated. This can be enabled
more broadly, for instance into vfio-platform drivers, but only after
the platform vendor completes auditing for safety.

The VMA flag VM_ALLOW_ANY_UNCACHED was found to be the simplest and
cleanest way to communicate the information from VFIO to KVM that
mapping the region in S2 as NormalNC is safe. KVM consumes it to
activate the code that does the S2 mapping as NormalNC.
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20240224150546.368-3-ankita@nvidia.comSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

5c656fcd

KVM: arm64: Introduce new flag for non-cacheable IO memory · c034ec84

Ankit Agrawal authored Feb 24, 2024

Currently, KVM for ARM64 maps at stage 2 memory that is considered device
(i.e. it is not RAM) with DEVICE_nGnRE memory attributes; this setting
overrides (as per the ARM architecture [1]) any device MMIO mapping
present at stage 1, resulting in a set-up whereby a guest operating
system cannot determine device MMIO mapping memory attributes on its
own but it is always overridden by the KVM stage 2 default.

This set-up does not allow guest operating systems to select device
memory attributes independently from KVM stage-2 mappings
(refer to [1], "Combining stage 1 and stage 2 memory type attributes"),
which turns out to be an issue in that guest operating systems
(e.g. Linux) may request to map devices MMIO regions with memory
attributes that guarantee better performance (e.g. gathering
attribute - that for some devices can generate larger PCIe memory
writes TLPs) and specific operations (e.g. unaligned transactions)
such as the NormalNC memory type.

The default device stage 2 mapping was chosen in KVM for ARM64 since
it was considered safer (i.e. it would not allow guests to trigger
uncontained failures ultimately crashing the machine) but this
turned out to be asynchronous (SError) defeating the purpose.

Failures containability is a property of the platform and is independent
from the memory type used for MMIO device memory mappings.

Actually, DEVICE_nGnRE memory type is even more problematic than
Normal-NC memory type in terms of faults containability in that e.g.
aborts triggered on DEVICE_nGnRE loads cannot be made, architecturally,
synchronous (i.e. that would imply that the processor should issue at
most 1 load transaction at a time - it cannot pipeline them - otherwise
the synchronous abort semantics would break the no-speculation attribute
attached to DEVICE_XXX memory).

This means that regardless of the combined stage1+stage2 mappings a
platform is safe if and only if device transactions cannot trigger
uncontained failures and that in turn relies on platform capabilities
and the device type being assigned (i.e. PCIe AER/DPC error containment
and RAS architecture[3]); therefore the default KVM device stage 2
memory attributes play no role in making device assignment safer
for a given platform (if the platform design adheres to design
guidelines outlined in [3]) and therefore can be relaxed.

For all these reasons, relax the KVM stage 2 device memory attributes
from DEVICE_nGnRE to Normal-NC.

The NormalNC was chosen over a different Normal memory type default
at stage-2 (e.g. Normal Write-through) to avoid cache allocation/snooping.

Relaxing S2 KVM device MMIO mappings to Normal-NC is not expected to
trigger any issue on guest device reclaim use cases either (i.e. device
MMIO unmap followed by a device reset) at least for PCIe devices, in that
in PCIe a device reset is architected and carried out through PCI config
space transactions that are naturally ordered with respect to MMIO
transactions according to the PCI ordering rules.

Having Normal-NC S2 default puts guests in control (thanks to
stage1+stage2 combined memory attributes rules [1]) of device MMIO
regions memory mappings, according to the rules described in [1]
and summarized here ([(S1) - stage1], [(S2) - stage 2]):

S1           |  S2           | Result
NORMAL-WB    |  NORMAL-NC    | NORMAL-NC
NORMAL-WT    |  NORMAL-NC    | NORMAL-NC
NORMAL-NC    |  NORMAL-NC    | NORMAL-NC
DEVICE<attr> |  NORMAL-NC    | DEVICE<attr>

It is worth noting that currently, to map devices MMIO space to user
space in a device pass-through use case the VFIO framework applies memory
attributes derived from pgprot_noncached() settings applied to VMAs, which
result in device-nGnRnE memory attributes for the stage-1 VMM mappings.

This means that a userspace mapping for device MMIO space carried
out with the current VFIO framework and a guest OS mapping for the same
MMIO space may result in a mismatched alias as described in [2].

Defaulting KVM device stage-2 mappings to Normal-NC attributes does not
change anything in this respect, in that the mismatched aliases would
only affect (refer to [2] for a detailed explanation) ordering between
the userspace and GuestOS mappings resulting stream of transactions
(i.e. it does not cause loss of property for either stream of
transactions on its own), which is harmless given that the userspace
and GuestOS access to the device is carried out through independent
transactions streams.

A Normal-NC flag is not present today. So add a new kvm_pgtable_prot
(KVM_PGTABLE_PROT_NORMAL_NC) flag for it, along with its
corresponding PTE value 0x5 (0b101) determined from [1].

Lastly, adapt the stage2 PTE property setter function
(stage2_set_prot_attr) to handle the NormalNC attribute.

The entire discussion leading to this patch series may be followed through
the following links.
Link: https://lore.kernel.org/all/20230907181459.18145-3-ankita@nvidia.com
Link: https://lore.kernel.org/r/20231205033015.10044-1-ankita@nvidia.com

[1] section D8.5.5 - DDI0487J_a_a-profile_architecture_reference_manual.pdf
[2] section B2.8 - DDI0487J_a_a-profile_architecture_reference_manual.pdf
[3] sections 1.7.7.3/1.8.5.2/appendix C - DEN0029H_SBSA_7.1.pdf
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20240224150546.368-2-ankita@nvidia.comSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

c034ec84

KVM: arm64: Fix typos · 75841d89

Bjorn Helgaas authored Jan 03, 2024

Fix typos, most reported by "codespell arch/arm64".  Only touches comments,
no code changes.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: kvmarm@lists.linux.dev
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20240103231605.1801364-6-helgaas@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

75841d89

23 Feb, 2024 11 commits

KVM: arm64: vgic: Don't acquire the lpi_list_lock in vgic_put_irq() · e27f2d56

Oliver Upton authored Feb 21, 2024

The LPI xarray's xa_lock is sufficient for synchronizing writers when
freeing a given LPI. Furthermore, readers can only take a new reference
on an IRQ if it was already nonzero.

Stop taking the lpi_list_lock unnecessarily and get rid of
__vgic_put_lpi_locked().
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240221054253.3848076-11-oliver.upton@linux.devSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

e27f2d56

KVM: arm64: vgic: Ensure the irq refcount is nonzero when taking a ref · 50ac89bb

Oliver Upton authored Feb 21, 2024

It will soon be possible for get() and put() calls to happen in
parallel, which means in most cases we must ensure the refcount is
nonzero when taking a new reference. Switch to using
vgic_try_get_irq_kref() where necessary, and document the few conditions
where an IRQ's refcount is guaranteed to be nonzero.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240221054253.3848076-10-oliver.upton@linux.devSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

50ac89bb

KVM: arm64: vgic: Rely on RCU protection in vgic_get_lpi() · 864d4304

Oliver Upton authored Feb 21, 2024

Stop acquiring the lpi_list_lock in favor of RCU for protecting
the read-side critical section in vgic_get_lpi(). In order for this to
be safe, we also need to be careful not to take a reference on an irq
with a refcount of 0, as it is about to be freed.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240221054253.3848076-9-oliver.upton@linux.devSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

864d4304

KVM: arm64: vgic: Free LPI vgic_irq structs in an RCU-safe manner · a5c7f011

Oliver Upton authored Feb 21, 2024

Free the vgic_irq structs in an RCU-safe manner to allow reads of the
LPI configuration data to happen in parallel with the release of LPIs.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240221054253.3848076-8-oliver.upton@linux.devSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

a5c7f011

KVM: arm64: vgic: Use atomics to count LPIs · 05f4d4f5

Oliver Upton authored Feb 21, 2024

Switch to using atomics for LPI accounting, allowing vgic_irq references
to be dropped in parallel.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240221054253.3848076-7-oliver.upton@linux.devSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

05f4d4f5

KVM: arm64: vgic: Get rid of the LPI linked-list · 9880835a

Oliver Upton authored Feb 21, 2024

All readers of LPI configuration have been transitioned to use the LPI
xarray. Get rid of the linked-list altogether.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240221054253.3848076-6-oliver.upton@linux.devSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

9880835a

KVM: arm64: vgic-its: Walk the LPI xarray in vgic_copy_lpi_list() · 2798683b

Oliver Upton authored Feb 21, 2024

Start iterating the LPI xarray in anticipation of removing the LPI
linked-list.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240221054253.3848076-5-oliver.upton@linux.devSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

2798683b

KVM: arm64: vgic-v3: Iterate the xarray to find pending LPIs · 49f0a468

Oliver Upton authored Feb 21, 2024

Start walking the LPI xarray to find pending LPIs in preparation for
the removal of the LPI linked-list. Note that the 'basic' iterator
is chosen here as each iteration needs to drop the xarray read lock
(RCU) as reads/writes to guest memory can potentially block.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240221054253.3848076-4-oliver.upton@linux.devSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

49f0a468

KVM: arm64: vgic: Use xarray to find LPI in vgic_get_lpi() · 5a021df7

Oliver Upton authored Feb 21, 2024

Iterating over the LPI linked-list is less than ideal when the desired
index is already known. Use the INTID to index the LPI xarray instead.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240221054253.3848076-3-oliver.upton@linux.devSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

5a021df7

KVM: arm64: vgic: Store LPIs in an xarray · 1d6f83f6

Oliver Upton authored Feb 21, 2024

Using a linked-list for LPIs is less than ideal as it of course requires
iterative searches to find a particular entry. An xarray is a better
data structure for this use case, as it provides faster searches and can
still handle a potentially sparse range of INTID allocations.

Start by storing LPIs in an xarray, punting usage of the xarray to a
subsequent change. The observant among you will notice that we added yet
another lock to the chain of locking order rules; document the ordering
of the xa_lock. Don't worry, we'll get rid of the lpi_list_lock one
day...
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240221054253.3848076-2-oliver.upton@linux.devSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

1d6f83f6

KVM: Get rid of return value from kvm_arch_create_vm_debugfs() · 284851ee

Oliver Upton authored Feb 16, 2024

The general expectation with debugfs is that any initialization failure
is nonfatal. Nevertheless, kvm_arch_create_vm_debugfs() allows
implementations to return an error and kvm_create_vm_debugfs() allows
that to fail VM creation.

Change to a void return to discourage architectures from making debugfs
failures fatal for the VM. Seems like everyone already had the right
idea, as all implementations already return 0 unconditionally.
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20240216155941.2029458-1-oliver.upton@linux.devSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

284851ee

22 Feb, 2024 1 commit

KVM: arm64: Make build-time check of RES0/RES1 bits optional · 99101dda

Marc Zyngier authored Feb 22, 2024

In order to ease the transition towards a state of absolute
paranoia where all RES0/RES1 bits gets checked against what
KVM know of them, make the checks optional and guarded by a
config symbol (CONFIG_KVM_ARM64_RES_BITS_PARANOIA) default to n.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/kvm/87frxka7ud.wl-maz@kernel.org/Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

99101dda

19 Feb, 2024 14 commits

KVM: arm64: Add debugfs file for guest's ID registers · 89176658

Marc Zyngier authored Feb 14, 2024

Debugging ID register setup can be a complicated affair. Give the
kernel hacker a way to dump that state in an easy to parse way.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240214131827.2856277-27-maz@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

89176658

KVM: arm64: Snapshot all non-zero RES0/RES1 sysreg fields for later checking · b80b701d

Marc Zyngier authored Feb 14, 2024

As KVM now strongly relies on accurately handling the RES0/RES1 bits
on a number of paths, add a compile-time checker that will blow in
the face of the innocent bystander, should they try to sneak in an
update that changes any of these RES0/RES1 fields.

It is expected that such an update will come with the relevant
KVM update if needed.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240214131827.2856277-26-maz@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

b80b701d

KVM: arm64: Make FEAT_MOPS UNDEF if not advertised to the guest · 84de212d

Marc Zyngier authored Feb 14, 2024

We unconditionally enable FEAT_MOPS, which is obviously wrong.

So let's only do that when it is advertised to the guest.
Which means we need to rely on a per-vcpu HCRX_EL2 shadow register.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20240214131827.2856277-25-maz@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

84de212d

KVM: arm64: Make AMU sysreg UNDEF if FEAT_AMU is not advertised to the guest · b03e8bb5

Marc Zyngier authored Feb 14, 2024

No AMU? No AMU! IF we see an AMU-related trap, let's turn it into
an UNDEF!
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240214131827.2856277-24-maz@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

b03e8bb5

KVM: arm64: Make PIR{,E0}_EL1 UNDEF if S1PIE is not advertised to the guest · 58627b72

Marc Zyngier authored Feb 14, 2024

As part of the ongoing effort to honor the guest configuration,
add the necessary checks to make PIR_EL1 and co UNDEF if not
advertised to the guest, and avoid context switching them.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20240214131827.2856277-23-maz@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

58627b72

KVM: arm64: Make TLBI OS/Range UNDEF if not advertised to the guest · 8ecdccb9

Marc Zyngier authored Feb 14, 2024

Outer Shareable and Range TLBI instructions shouldn't be made available
to the guest if they are not advertised. Use FGU to disable those,
and set HCR_EL2.TLBIOS in the case the host doesn't have FGT. Note
that in that later case, we cannot efficiently disable TLBI Range
instructions, as this would require to trap all TLBIs.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20240214131827.2856277-22-maz@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

8ecdccb9

KVM: arm64: Streamline save/restore of HFG[RW]TR_EL2 · d196c20c

Marc Zyngier authored Feb 14, 2024

The way we save/restore HFG[RW]TR_EL2 can now be simplified, and
the Ampere erratum hack is the only thing that still stands out.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240214131827.2856277-21-maz@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

d196c20c

KVM: arm64: Move existing feature disabling over to FGU infrastructure · c5bac1ef

Marc Zyngier authored Feb 14, 2024

We already trap a bunch of existing features for the purpose of
disabling them (MAIR2, POR, ACCDATA, SME...).

Let's move them over to our brand new FGU infrastructure.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240214131827.2856277-20-maz@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

c5bac1ef

KVM: arm64: Propagate and handle Fine-Grained UNDEF bits · f5a5a406

Marc Zyngier authored Feb 14, 2024

In order to correctly honor our FGU bits, they must be converted
into a set of FGT bits. They get merged as part of the existing
FGT setting.

Similarly, the UNDEF injection phase takes place when handling
the trap.

This results in a bit of rework in the FGT macros in order to
help with the code generation, as burying per-CPU accesses in
macros results in a lot of expansion, not to mention the vcpu->kvm
access on nvhe (kern_hyp_va() is not optimisation-friendly).
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240214131827.2856277-19-maz@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

f5a5a406

KVM: arm64: Add Fine-Grained UNDEF tracking information · 2fd8f31c

Marc Zyngier authored Feb 14, 2024

In order to efficiently handle system register access being disabled,
and this resulting in an UNDEF exception being injected, we introduce
the (slightly dubious) concept of Fine-Grained UNDEF, modeled after
the architectural Fine-Grained Traps.

For each FGT group, we keep a 64 bit word that has the exact same
bit assignment as the corresponding FGT register, where a 1 indicates
that trapping this register should result in an UNDEF exception being
reinjected.

So far, nothing populates this information, nor sets the corresponding
trap bits.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240214131827.2856277-18-maz@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

2fd8f31c

KVM: arm64: Rename __check_nv_sr_forward() to triage_sysreg_trap() · 085eabaa

Marc Zyngier authored Feb 14, 2024

__check_nv_sr_forward() is not specific to NV anymore, and does
a lot more. Rename it to triage_sysreg_trap(), making it plain
that its role is to handle where an exception is to be handled.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240214131827.2856277-17-maz@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

085eabaa

KVM: arm64: Use the xarray as the primary sysreg/sysinsn walker · cc5f84fb

Marc Zyngier authored Feb 14, 2024

Since we always start sysreg/sysinsn handling by searching the
xarray, use it as the source of the index in the correct sys_reg_desc
array.

This allows some cleanup, such as moving the handling of unknown
sysregs in a single location.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240214131827.2856277-16-maz@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

cc5f84fb

KVM: arm64: Register AArch64 system register entries with the sysreg xarray · 19f3e7ea

Marc Zyngier authored Feb 14, 2024

In order to reduce the number of lookups that we have to perform
when handling a sysreg, register each AArch64 sysreg descriptor
with the global xarray. The index of the descriptor is stored
as a 10 bit field in the data word.

Subsequent patches will retrieve and use the stored index.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240214131827.2856277-15-maz@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

19f3e7ea

KVM: arm64: Always populate the trap configuration xarray · 7fd498f4

Marc Zyngier authored Feb 14, 2024

As we are going to rely more and more on the global xarray that
contains the trap configuration, always populate it, even in the
non-NV case.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240214131827.2856277-14-maz@kernel.orgSigned-off-by: Oliver Upton <oliver.upton@linux.dev>

7fd498f4