Commits · 4b5f67120a88c713b82907d55a767693382e9e9d · Kirill Smelkov / linux

26 Jul, 2024 14 commits

KVM: extend kvm_range_has_memory_attributes() to check subset of attributes · 4b5f6712

Paolo Bonzini authored Jul 11, 2024

While currently there is no other attribute than KVM_MEMORY_ATTRIBUTE_PRIVATE,
KVM code such as kvm_mem_is_private() is written to expect their existence.
Allow using kvm_range_has_memory_attributes() as a multi-page version of
kvm_mem_is_private(), without it breaking later when more attributes are
introduced.
Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4b5f6712

KVM: cleanup and add shortcuts to kvm_range_has_memory_attributes() · e300614f

Paolo Bonzini authored Jul 11, 2024

Use a guard to simplify early returns, and add two more easy
shortcuts.  If the requested attributes are invalid, the attributes
xarray will never show them as set.  And if testing a single page,
kvm_get_memory_attributes() is more efficient.
Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e300614f

KVM: guest_memfd: move check for already-populated page to common code · de802524

Paolo Bonzini authored Jul 11, 2024

Do not allow populating the same page twice with startup data.  In the
case of SEV-SNP, for example, the firmware does not allow it anyway,
since the launch-update operation is only possible on pages that are
still shared in the RMP.

Even if it worked, kvm_gmem_populate()'s callback is meant to have side
effects such as updating launch measurements, and updating the same
page twice is unlikely to have the desired results.

Races between calls to the ioctl are not possible because
kvm_gmem_populate() holds slots_lock and the VM should not be running.
But again, even if this worked on other confidential computing technology,
it doesn't matter to guest_memfd.c whether this is something fishy
such as missing synchronization in userspace, or rather something
intentional.  One of the racers wins, and the page is initialized by
either kvm_gmem_prepare_folio() or kvm_gmem_populate().

Anyway, out of paranoia, adjust sev_gmem_post_populate() anyway to use
the same errno that kvm_gmem_populate() is using.
Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

de802524

KVM: remove kvm_arch_gmem_prepare_needed() · 7239ed74

Paolo Bonzini authored Jul 11, 2024

It is enough to return 0 if a guest need not do any preparation.
This is in fact how sev_gmem_prepare() works for non-SNP guests,
and it extends naturally to Intel hosts: the x86 callback for
gmem_prepare is optional and returns 0 if not defined.
Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7239ed74

KVM: guest_memfd: make kvm_gmem_prepare_folio() operate on a single struct kvm · 6dd761d9

Paolo Bonzini authored Jul 11, 2024

This is now possible because preparation is done by kvm_gmem_get_pfn()
instead of fallocate().  In practice this is not a limitation, because
even though guest_memfd can be bound to multiple struct kvm, for
hardware implementations of confidential computing only one guest
(identified by an ASID on SEV-SNP, or an HKID on TDX) will be able
to access it.

In the case of intra-host migration (not implemented yet for SEV-SNP,
but we can use SEV-ES as an idea of how it will work), the new struct
kvm inherits the same ASID and preparation need not be repeated.
Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6dd761d9

KVM: guest_memfd: delay kvm_gmem_prepare_folio() until the memory is passed to the guest · b8552431

Paolo Bonzini authored Jul 11, 2024

Initializing the contents of the folio on fallocate() is unnecessarily
restrictive. It means that the page is registered with the firmware and
then it cannot be touched anymore. In particular, this loses the
possibility of using fallocate() to pre-allocate the page for SEV-SNP
guests, because kvm_arch_gmem_prepare() then fails.

It's only when the guest actually accesses the page (and therefore
kvm_gmem_get_pfn() is called) that the page must be cleared from any
stale host data and registered with the firmware. The up-to-date flag
is clear if this has to be done (i.e. it is the first access and
kvm_gmem_populate() has not been called).

All in all, there are enough differences between kvm_gmem_get_pfn() and
kvm_gmem_populate(), that it's better to separate the two flows completely.
Extract the bulk of kvm_gmem_get_folio(), which take a folio and end up
setting its up-to-date flag, to a new function kvm_gmem_prepare_folio();
these are now done only by the non-__-prefixed kvm_gmem_get_pfn().
As a bonus, __kvm_gmem_get_pfn() loses its ugly "bool prepare" argument.

One difference is that fallocate(PUNCH_HOLE) can now race with a
page fault. Potentially this causes a page to be prepared and into the
filemap even after fallocate(PUNCH_HOLE). This is harmless, as it can be
fixed by another hole punching operation, and can be avoided by clearing
the private-page attribute prior to invoking fallocate(PUNCH_HOLE).
This way, the page fault will cause an exit to user space.

The previous semantics, where fallocate() could be used to prepare
the pages in advance of running the guest, can be accessed with
KVM_PRE_FAULT_MEMORY.

For now, accessing a page in one VM will attempt to call
kvm_arch_gmem_prepare() in all of those that have bound the guest_memfd.
Cleaning this up is left to a separate patch.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b8552431

KVM: guest_memfd: return locked folio from __kvm_gmem_get_pfn · 78c42933

Paolo Bonzini authored Jul 11, 2024

Allow testing the up-to-date flag in the caller without taking the
lock again.
Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

78c42933

KVM: rename CONFIG_HAVE_KVM_GMEM_* to CONFIG_HAVE_KVM_ARCH_GMEM_* · 564429a6

Paolo Bonzini authored Jul 11, 2024

Add "ARCH" to the symbols; shortly, the "prepare" phase will include both
the arch-independent step to clear out contents left in the page by the
host, and the arch-dependent step enabled by CONFIG_HAVE_KVM_GMEM_PREPARE.
For consistency do the same for CONFIG_HAVE_KVM_GMEM_INVALIDATE as well.
Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

564429a6

KVM: guest_memfd: do not go through struct page · 7fbdda31

Paolo Bonzini authored Jul 11, 2024

We have a perfectly usable folio, use it to retrieve the pfn and order.
All that's needed is a version of folio_file_page that returns a pfn.
Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7fbdda31

KVM: guest_memfd: delay folio_mark_uptodate() until after successful preparation · d04c77d2

Paolo Bonzini authored Jul 11, 2024

The up-to-date flag as is now is not too useful; it tells guest_memfd not
to overwrite the contents of a folio, but it doesn't say that the page
is ready to be mapped into the guest. For encrypted guests, mapping
a private page requires that the "preparation" phase has succeeded,
and at the same time the same page cannot be prepared twice.

So, ensure that folio_mark_uptodate() is only called on a prepared page. If
kvm_gmem_prepare_folio() or the post_populate callback fail, the folio
will not be marked up-to-date; it's not a problem to call clear_highpage()
again on such a page prior to the next preparation attempt.
Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d04c77d2

KVM: guest_memfd: return folio from __kvm_gmem_get_pfn() · d0d87226

Paolo Bonzini authored Jul 11, 2024

Right now this is simply more consistent and avoids use of pfn_to_page()
and put_page().  It will be put to more use in upcoming patches, to
ensure that the up-to-date flag is set at the very end of both the
kvm_gmem_get_pfn() and kvm_gmem_populate() flows.
Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d0d87226

KVM: x86: disallow pre-fault for SNP VMs before initialization · 5932ca41

Paolo Bonzini authored Jul 17, 2024

KVM_PRE_FAULT_MEMORY for an SNP guest can race with
sev_gmem_post_populate() in bad ways. The following sequence for
instance can potentially trigger an RMP fault:

  thread A, sev_gmem_post_populate: called
  thread B, sev_gmem_prepare: places below 'pfn' in a private state in RMP
  thread A, sev_gmem_post_populate: *vaddr = kmap_local_pfn(pfn + i);
  thread A, sev_gmem_post_populate: copy_from_user(vaddr, src + i * PAGE_SIZE, PAGE_SIZE);
  RMP #PF

Fix this by only allowing KVM_PRE_FAULT_MEMORY to run after a guest's
initial private memory contents have been finalized via
KVM_SEV_SNP_LAUNCH_FINISH.

Beyond fixing this issue, it just sort of makes sense to enforce this,
since the KVM_PRE_FAULT_MEMORY documentation states:

  "KVM maps memory as if the vCPU generated a stage-2 read page fault"

which sort of implies we should be acting on the same guest state that a
vCPU would see post-launch after the initial guest memory is all set up.
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5932ca41

KVM: Documentation: Fix title underline too short warning · c2adcf05

Chang Yu authored Jul 23, 2024

Fix "WARNING: Title underline too short" by extending title line to the
proper length.
Signed-off-by: Chang Yu <marcus.yu.56@gmail.com>
Message-ID: <ZqB3lofbzMQh5Q-5@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c2adcf05

KVM: x86: Eliminate log spam from limited APIC timer periods · 0005ca20

Jim Mattson authored Jul 24, 2024

SAP's vSMP MemoryONE continuously requests a local APIC timer period
less than 500 us, resulting in the following kernel log spam:

  kvm: vcpu 15: requested 70240 ns lapic timer period limited to 500000 ns
  kvm: vcpu 19: requested 52848 ns lapic timer period limited to 500000 ns
  kvm: vcpu 15: requested 70256 ns lapic timer period limited to 500000 ns
  kvm: vcpu 9: requested 70256 ns lapic timer period limited to 500000 ns
  kvm: vcpu 9: requested 70208 ns lapic timer period limited to 500000 ns
  kvm: vcpu 9: requested 387520 ns lapic timer period limited to 500000 ns
  kvm: vcpu 9: requested 70160 ns lapic timer period limited to 500000 ns
  kvm: vcpu 66: requested 205744 ns lapic timer period limited to 500000 ns
  kvm: vcpu 9: requested 70224 ns lapic timer period limited to 500000 ns
  kvm: vcpu 9: requested 70256 ns lapic timer period limited to 500000 ns
  limit_periodic_timer_frequency: 7569 callbacks suppressed
  ...

To eliminate this spam, change the pr_info_ratelimited() in
limit_periodic_timer_frequency() to pr_info_once().
Reported-by: James Houghton <jthoughton@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Message-ID: <20240724190640.2449291-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0005ca20

17 Jul, 2024 1 commit

crypto: ccp: Add the SNP_VLEK_LOAD command · 332d2c1d

Michael Roth authored May 01, 2024

When requesting an attestation report a guest is able to specify whether
it wants SNP firmware to sign the report using either a Versioned Chip
Endorsement Key (VCEK), which is derived from chip-unique secrets, or a
Versioned Loaded Endorsement Key (VLEK) which is obtained from an AMD
Key Derivation Service (KDS) and derived from seeds allocated to
enrolled cloud service providers (CSPs).

For VLEK keys, an SNP_VLEK_LOAD SNP firmware command is used to load
them into the system after obtaining them from the KDS. Add a
corresponding userspace interface so to allow the loading of VLEK keys
into the system.

See SEV-SNP Firmware ABI 1.54, SNP_VLEK_LOAD for more details.
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240501085210.2213060-21-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

332d2c1d

16 Jul, 2024 20 commits

KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops · 5d766508

Wei Wang authored May 07, 2024

Similar to kvm_x86_call(), kvm_pmu_call() is added to streamline the usage
of static calls of kvm_pmu_ops, which improves code readability.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Link: https://lore.kernel.org/r/20240507133103.15052-4-wei.w.wang@intel.comSigned-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5d766508

KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops · 89604647

Wei Wang authored May 07, 2024

Introduces kvm_x86_call(), to streamline the usage of static calls of
kvm_x86_ops. The current implementation of these calls is verbose and
could lead to alignment challenges. This makes the code susceptible to
exceeding the "80 columns per single line of code" limit as defined in
the coding-style document. Another issue with the existing implementation
is that the addition of kvm_x86_ prefix to hooks at the static_call sites
hinders code readability and navigation. kvm_x86_call() is added to
improve code readability and maintainability, while adhering to the coding
style guidelines.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Link: https://lore.kernel.org/r/20240507133103.15052-3-wei.w.wang@intel.comSigned-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

89604647

KVM: x86: Replace static_call_cond() with static_call() · f4854bf7

Wei Wang authored May 07, 2024

The use of static_call_cond() is essentially the same as static_call() on
x86 (e.g. static_call() now handles a NULL pointer as a NOP), so replace
it with static_call() to simplify the code.

Link: https://lore.kernel.org/all/3916caa1dcd114301a49beafa5030eca396745c1.1679456900.git.jpoimboe@kernel.org/Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Link: https://lore.kernel.org/r/20240507133103.15052-2-wei.w.wang@intel.comSigned-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f4854bf7

Merge branch 'kvm-6.11-sev-attestation' into HEAD · bc9cd5a2

Paolo Bonzini authored Jul 16, 2024

The GHCB 2.0 specification defines 2 GHCB request types to allow SNP guests
to send encrypted messages/requests to firmware: SNP Guest Requests and SNP
Extended Guest Requests. These encrypted messages are used for things like
servicing attestation requests issued by the guest. Implementing support for
these is required to be fully GHCB-compliant.

For the most part, KVM only needs to handle forwarding these requests to
firmware (to be issued via the SNP_GUEST_REQUEST firmware command defined
in the SEV-SNP Firmware ABI), and then forwarding the encrypted response to
the guest.

However, in the case of SNP Extended Guest Requests, the host is also
able to provide the certificate data corresponding to the endorsement key
used by firmware to sign attestation report requests. This certificate data
is provided by userspace because:

1) It allows for different keys/key types to be used for each particular
guest with requiring any sort of KVM API to configure the certificate
table in advance on a per-guest basis.

2) It provides additional flexibility with how attestation requests might
be handled during live migration where the certificate data for
source/dest might be different.

3) It allows all synchronization between certificates and firmware/signing
key updates to be handled purely by userspace rather than requiring
some in-kernel mechanism to facilitate it. [1]

To support fetching certificate data from userspace, a new KVM exit type will
be needed to handle fetching the certificate from userspace. An attempt to
define a new KVM_EXIT_COCO/KVM_EXIT_COCO_REQ_CERTS exit type to handle this
was introduced in v1 of this patchset, but is still being discussed by
community, so for now this patchset only implements a stub version of SNP
Extended Guest Requests that does not provide certificate data, but is still
enough to provide compliance with the GHCB 2.0 spec.

bc9cd5a2

KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event · 74458e48

Michael Roth authored Jul 01, 2024

Version 2 of GHCB specification added support for the SNP Extended Guest
Request Message NAE event. This event serves a nearly identical purpose
to the previously-added SNP_GUEST_REQUEST event, but for certain message
types it allows the guest to supply a buffer to be used for additional
information in some cases.

Currently the GHCB spec only defines extended handling of this sort in
the case of attestation requests, where the additional buffer is used to
supply a table of certificate data corresponding to the attestion
report's signing key. Support for this extended handling will require
additional KVM APIs to handle coordinating with userspace.

Whether or not the hypervisor opts to provide this certificate data is
optional. However, support for processing SNP_EXTENDED_GUEST_REQUEST
GHCB requests is required by the GHCB 2.0 specification for SNP guests,
so for now implement a stub implementation that provides an empty
certificate table to the guest if it supplies an additional buffer, but
otherwise behaves identically to SNP_GUEST_REQUEST.
Reviewed-by: Carlos Bilbao <carlos.bilbao.osdev@gmail.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240701223148.3798365-4-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

74458e48

x86/sev: Move sev_guest.h into common SEV header · f55f3c3a

Michael Roth authored Jul 01, 2024

sev_guest.h currently contains various definitions relating to the
format of SNP_GUEST_REQUEST commands to SNP firmware. Currently only the
sev-guest driver makes use of them, but when the KVM side of this is
implemented there's a need to parse the SNP_GUEST_REQUEST header to
determine whether additional information needs to be provided to the
guest. Prepare for this by moving those definitions to a common header
that's shared by host/guest code so that KVM can also make use of them.
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240701223148.3798365-3-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f55f3c3a

KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event · 88caf544

Brijesh Singh authored Jul 01, 2024

Version 2 of GHCB specification added support for the SNP Guest Request
Message NAE event. The event allows for an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.

This is used by guests primarily to request attestation reports from
firmware. There are other request types are available as well, but the
specifics of what guest requests are being made generally does not
affect how they are handled by the hypervisor, which only serves as a
proxy for the guest requests and firmware responses.

Implement handling for these events.

When an SNP Guest Request is issued, the guest will provide its own
request/response pages, which could in theory be passed along directly
to firmware. However, these pages would need special care:

  - Both pages are from shared guest memory, so they need to be
    protected from migration/etc. occurring while firmware reads/writes
    to them. At a minimum, this requires elevating the ref counts and
    potentially needing an explicit pinning of the memory. This places
    additional restrictions on what type of memory backends userspace
    can use for shared guest memory since there would be some reliance
    on using refcounted pages.

  - The response page needs to be switched to Firmware-owned state
    before the firmware can write to it, which can lead to potential
    host RMP #PFs if the guest is misbehaved and hands the host a
    guest page that KVM is writing to for other reasons (e.g. virtio
    buffers).

Both of these issues can be avoided completely by using
separately-allocated bounce pages for both the request/response pages
and passing those to firmware instead. So that's the approach taken
here.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
[mdr: ensure FW command failures are indicated to guest, drop extended
 request handling to be re-written as separate patch, massage commit]
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240701223148.3798365-2-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

88caf544

KVM: x86: Suppress MMIO that is triggered during task switch emulation · 2a1fc7dc

Sean Christopherson authored Jul 12, 2024

Explicitly suppress userspace emulated MMIO exits that are triggered when
emulating a task switch as KVM doesn't support userspace MMIO during
complex (multi-step) emulation. Silently ignoring the exit request can
result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to
userspace for some other reason prior to purging mmio_needed.

See commit 0dc90226 ("KVM: x86: Suppress pending MMIO write exits if
emulator detects exception") for more details on KVM's limitations with
respect to emulated MMIO during complex emulator flows.

Reported-by: syzbot+2fb9f8ed752c01bc9a3f@syzkaller.appspotmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240712144841.1230591-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2a1fc7dc

KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro · 9fe17d2a

Sean Christopherson authored Jul 12, 2024

Tweak the definition of make_huge_page_split_spte() to eliminate an
unnecessarily long line, and opportunistically initialize child_spte to
make it more obvious that the child is directly derived from the huge
parent.

No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240712151335.1242633-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9fe17d2a

KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE · 3d4415ed

Sean Christopherson authored Jul 12, 2024

Bug the VM instead of simply warning if KVM tries to split a SPTE that is
non-present or not-huge.  KVM is guaranteed to end up in a broken state as
the callers fully expect a valid SPTE, e.g. the shadow MMU will add an
rmap entry, and all MMUs will account the expected small page.  Returning
'0' is also technically wrong now that SHADOW_NONPRESENT_VALUE exists,
i.e. would cause KVM to create a potential #VE SPTE.

While it would be possible to have the callers gracefully handle failure,
doing so would provide no practical value as the scenario really should be
impossible, while the error handling would add a non-trivial amount of
noise.

Fixes: a3fe5dbd ("KVM: x86/mmu: Split huge pages mapped by the TDP MMU when dirty logging is enabled")
Cc: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240712151335.1242633-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3d4415ed

Merge tag 'kvm-x86-vmx-6.11' of https://github.com/kvm-x86/linux into HEAD · 208a352a

Paolo Bonzini authored Jul 16, 2024

KVM VMX changes for 6.11

 - Remove an unnecessary EPT TLB flush when enabling hardware.

 - Fix a series of bugs that cause KVM to fail to detect nested pending posted
   interrupts as valid wake eents for a vCPU executing HLT in L2 (with
   HLT-exiting disable by L1).

 - Misc cleanups

208a352a

Merge tag 'kvm-x86-svm-6.11' of https://github.com/kvm-x86/linux into HEAD · 1229cbef

Paolo Bonzini authored Jul 16, 2024

KVM SVM changes for 6.11

 - Make per-CPU save_area allocations NUMA-aware.

 - Force sev_es_host_save_area() to be inlined to avoid calling into an
   instrumentable function from noinstr code.

1229cbef

Merge tag 'kvm-x86-selftests-6.11' of https://github.com/kvm-x86/linux into HEAD · dbfd50cb

Paolo Bonzini authored Jul 16, 2024

KVM selftests for 6.11

 - Remove dead code in the memslot modification stress test.

 - Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs.

 - Print the guest pseudo-RNG seed only when it changes, to avoid spamming the
   log for tests that create lots of VMs.

 - Make the PMU counters test less flaky when counting LLC cache misses by
   doing CLFLUSH{OPT} in every loop iteration.

dbfd50cb

Merge tag 'kvm-x86-pmu-6.11' of https://github.com/kvm-x86/linux into HEAD · cda231cd

Paolo Bonzini authored Jul 16, 2024

KVM x86/pmu changes for 6.11

 - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads
   '0' and writes from userspace are ignored.

 - Update to the newfangled Intel CPU FMS infrastructure.

 - Use macros instead of open-coded literals to clean up KVM's manipulation of
   FIXED_CTR_CTRL MSRs.

cda231cd

Merge tag 'kvm-x86-mtrrs-6.11' of https://github.com/kvm-x86/linux into HEAD · 5c5ddf71

Paolo Bonzini authored Jul 16, 2024

KVM x86 MTRR virtualization removal

Remove support for virtualizing MTRRs on Intel CPUs, along with a nasty CR0.CD
hack, and instead always honor guest PAT on CPUs that support self-snoop.

5c5ddf71

Merge tag 'kvm-x86-mmu-6.11' of https://github.com/kvm-x86/linux into HEAD · 34b69ede

Paolo Bonzini authored Jul 16, 2024

KVM x86 MMU changes for 6.11

 - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't
   hold leafs SPTEs.

 - Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager
   page splitting to avoid stalling vCPUs when splitting huge pages.

 - Misc cleanups

34b69ede

Merge tag 'kvm-x86-misc-6.11' of https://github.com/kvm-x86/linux into HEAD · 5dcc1e76

Paolo Bonzini authored Jul 16, 2024

KVM x86 misc changes for 6.11

 - Add a global struct to consolidate tracking of host values, e.g. EFER, and
   move "shadow_phys_bits" into the structure as "maxphyaddr".

 - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC
   bus frequency, because TDX.

 - Print the name of the APICv/AVIC inhibits in the relevant tracepoint.

 - Clean up KVM's handling of vendor specific emulation to consistently act on
   "compatible with Intel/AMD", versus checking for a specific vendor.

 - Misc cleanups

5dcc1e76

Merge tag 'kvm-x86-generic-6.11' of https://github.com/kvm-x86/linux into HEAD · 86014c1e

Paolo Bonzini authored Jul 16, 2024

KVM generic changes for 6.11

 - Enable halt poll shrinking by default, as Intel found it to be a clear win.

 - Setup empty IRQ routing when creating a VM to avoid having to synchronize
   SRCU when creating a split IRQCHIP on x86.

 - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag
   that arch code can use for hooking both sched_in() and sched_out().

 - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
   truncating a bogus value from userspace, e.g. to help userspace detect bugs.

 - Mark a vCPU as preempted if and only if it's scheduled out while in the
   KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest
   memory when retrieving guest state during live migration blackout.

 - A few minor cleanups

86014c1e

Merge tag 'kvm-x86-fixes-6.10-11' of https://github.com/kvm-x86/linux into HEAD · f4501e8b

Paolo Bonzini authored Jul 16, 2024

KVM Xen:

Fix a bug where KVM fails to check the validity of an incoming userspace
virtual address and tries to activate a gfn_to_pfn_cache with a kernel address.

f4501e8b

Merge tag 'kvmarm-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD · 1c5a0b55

Paolo Bonzini authored Jul 16, 2024

KVM/arm64 changes for 6.11

 - Initial infrastructure for shadow stage-2 MMUs, as part of nested
   virtualization enablement

 - Support for userspace changes to the guest CTR_EL0 value, enabling
   (in part) migration of VMs between heterogenous hardware

 - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of
   the protocol

 - FPSIMD/SVE support for nested, including merged trap configuration
   and exception routing

 - New command-line parameter to control the WFx trap behavior under KVM

 - Introduce kCFI hardening in the EL2 hypervisor

 - Fixes + cleanups for handling presence/absence of FEAT_TCRX

 - Miscellaneous fixes + documentation updates

1c5a0b55

14 Jul, 2024 5 commits

Merge branch kvm-arm64/docs into kvmarm/next · bb032b23

Oliver Upton authored Jul 14, 2024

* kvm-arm64/docs:
  : KVM Documentation fixes, courtesy of Changyuan Lyu
  :
  : Small set of typo fixes / corrections to the KVM API documentation
  : relating to MSIs and arm64 VGIC UAPI.
  MAINTAINERS: Include documentation in KVM/arm64 entry
  KVM: Documentation: Correct the VGIC V2 CPU interface addr space size
  KVM: Documentation: Enumerate allowed value macros of `irq_type`
  KVM: Documentation: Fix typo `BFD`
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

bb032b23

Merge branch kvm-arm64/nv-tcr2 into kvmarm/next · bc2e3253

Oliver Upton authored Jul 14, 2024

* kvm-arm64/nv-tcr2:
  : Fixes to the handling of TCR_EL1, courtesy of Marc Zyngier
  :
  : Series addresses a couple gaps that are present in KVM (from cover
  : letter):
  :
  :   - VM configuration: HCRX_EL2.TCR2En is forced to 1, and we blindly
  :     save/restore stuff.
  :
  :   - trap bit description and routing: none, obviously, since we make a
  :     point in not trapping.
  KVM: arm64: Honor trap routing for TCR2_EL1
  KVM: arm64: Make PIR{,E0}_EL1 save/restore conditional on FEAT_TCRX
  KVM: arm64: Make TCR2_EL1 save/restore dependent on the VM features
  KVM: arm64: Get rid of HCRX_GUEST_FLAGS
  KVM: arm64: Correctly honor the presence of FEAT_TCRX
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

bc2e3253

Merge branch kvm-arm64/nv-sve into kvmarm/next · 8c2899e7

Oliver Upton authored Jul 14, 2024

* kvm-arm64/nv-sve:
  : CPTR_EL2, FPSIMD/SVE support for nested
  :
  : This series brings support for honoring the guest hypervisor's CPTR_EL2
  : trap configuration when running a nested guest, along with support for
  : FPSIMD/SVE usage at L1 and L2.
  KVM: arm64: Allow the use of SVE+NV
  KVM: arm64: nv: Add additional trap setup for CPTR_EL2
  KVM: arm64: nv: Add trap description for CPTR_EL2
  KVM: arm64: nv: Add TCPAC/TTA to CPTR->CPACR conversion helper
  KVM: arm64: nv: Honor guest hypervisor's FP/SVE traps in CPTR_EL2
  KVM: arm64: nv: Load guest FP state for ZCR_EL2 trap
  KVM: arm64: nv: Handle CPACR_EL1 traps
  KVM: arm64: Spin off helper for programming CPTR traps
  KVM: arm64: nv: Ensure correct VL is loaded before saving SVE state
  KVM: arm64: nv: Use guest hypervisor's max VL when running nested guest
  KVM: arm64: nv: Save guest's ZCR_EL2 when in hyp context
  KVM: arm64: nv: Load guest hyp's ZCR into EL1 state
  KVM: arm64: nv: Handle ZCR_EL2 traps
  KVM: arm64: nv: Forward SVE traps to guest hypervisor
  KVM: arm64: nv: Forward FP/ASIMD traps to guest hypervisor
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

8c2899e7

Merge branch kvm-arm64/el2-kcfi into kvmarm/next · 1270dad3

Oliver Upton authored Jul 14, 2024

* kvm-arm64/el2-kcfi:
  : kCFI support in the EL2 hypervisor, courtesy of Pierre-Clément Tosi
  :
  : Enable the usage fo CONFIG_CFI_CLANG (kCFI) for hardening indirect
  : branches in the EL2 hypervisor. Unlike kernel support for the feature,
  : CFI failures at EL2 are always fatal.
  KVM: arm64: nVHE: Support CONFIG_CFI_CLANG at EL2
  KVM: arm64: Introduce print_nvhe_hyp_panic helper
  arm64: Introduce esr_brk_comment, esr_is_cfi_brk
  KVM: arm64: VHE: Mark __hyp_call_panic __noreturn
  KVM: arm64: nVHE: gen-hyprel: Skip R_AARCH64_ABS32
  KVM: arm64: nVHE: Simplify invalid_host_el2_vect
  KVM: arm64: Fix __pkvm_init_switch_pgd call ABI
  KVM: arm64: Fix clobbered ELR in sync abort/SError
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

1270dad3

Merge branch kvm-arm64/ctr-el0 into kvmarm/next · 377d0e5d

Oliver Upton authored Jul 14, 2024

* kvm-arm64/ctr-el0:
  : Support for user changes to CTR_EL0, courtesy of Sebastian Ott
  :
  : Allow userspace to change the guest-visible value of CTR_EL0 for a VM,
  : so long as the requested value represents a subset of features supported
  : by hardware. In other words, prevent the VMM from over-promising the
  : capabilities of hardware.
  :
  : Make this happen by fitting CTR_EL0 into the existing infrastructure for
  : feature ID registers.
  KVM: selftests: Assert that MPIDR_EL1 is unchanged across vCPU reset
  KVM: arm64: nv: Unfudge ID_AA64PFR0_EL1 masking
  KVM: selftests: arm64: Test writes to CTR_EL0
  KVM: arm64: rename functions for invariant sys regs
  KVM: arm64: show writable masks for feature registers
  KVM: arm64: Treat CTR_EL0 as a VM feature ID register
  KVM: arm64: unify code to prepare traps
  KVM: arm64: nv: Use accessors for modifying ID registers
  KVM: arm64: Add helper for writing ID regs
  KVM: arm64: Use read-only helper for reading VM ID registers
  KVM: arm64: Make idregs debugfs iterator search sysreg table directly
  KVM: arm64: Get sys_reg encoding from descriptor in idregs_debug_show()
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

377d0e5d