Commits · 1bbc60d0c7e5728aced352e528ef936ebe2344c0 · Kirill Smelkov / linux

18 Feb, 2022 9 commits

KVM: x86/mmu: Remove MMU auditing · 1bbc60d0

Sean Christopherson authored Feb 18, 2022

Remove mmu_audit.c and all its collateral, the auditing code has suffered
severe bitrot, ironically partly due to shadow paging being more stable
and thus not benefiting as much from auditing, but mostly due to TDP
supplanting shadow paging for non-nested guests and shadowing of nested
TDP not heavily stressing the logic that is being audited.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1bbc60d0

KVM: x86: allow defining return-0 static calls · 5be2226f

Paolo Bonzini authored Feb 15, 2022

A few vendor callbacks are only used by VMX, but they return an integer
or bool value.  Introduce KVM_X86_OP_OPTIONAL_RET0 for them: if a func is
NULL in struct kvm_x86_ops, it will be changed to __static_call_return0
when updating static calls.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5be2226f

KVM: x86: make several APIC virtualization callbacks optional · abb6d479

Paolo Bonzini authored Feb 08, 2022

All their invocations are conditional on vcpu->arch.apicv_active,
meaning that they need not be implemented by vendor code: even
though at the moment both vendors implement APIC virtualization,
all of them can be optional.  In fact SVM does not need many of
them, and their implementation can be deleted now.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

abb6d479

KVM: x86: warn on incorrectly NULL members of kvm_x86_ops · dd2319c6

Paolo Bonzini authored Dec 09, 2021

Use the newly corrected KVM_X86_OP annotations to warn about possible
NULL pointer dereferences as soon as the vendor module is loaded.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

dd2319c6

KVM: x86: remove KVM_X86_OP_NULL and mark optional kvm_x86_ops · e4fc23ba

Paolo Bonzini authored Dec 09, 2021

The original use of KVM_X86_OP_NULL, which was to mark calls
that do not follow a specific naming convention, is not in use
anymore.  Instead, let's mark calls that are optional because
they are always invoked within conditionals or with static_call_cond.
Those that are _not_, i.e. those that are defined with KVM_X86_OP,
must be defined by both vendor modules or some kind of NULL pointer
dereference is bound to happen at runtime.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e4fc23ba

KVM: x86: use static_call_cond for optional callbacks · 2a890614

Paolo Bonzini authored Feb 01, 2022

SVM implements neither update_emulated_instruction nor
set_apic_access_page_addr.  Remove an "if" by calling them
with static_call_cond().
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2a890614

KVM: x86: return 1 unconditionally for availability of KVM_CAP_VAPIC · 8a289785

Paolo Bonzini authored Feb 15, 2022

The two ioctls used to implement userspace-accelerated TPR,
KVM_TPR_ACCESS_REPORTING and KVM_SET_VAPIC_ADDR, are available
even if hardware-accelerated TPR can be used.  So there is
no reason not to report KVM_CAP_VAPIC.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8a289785

selftests: KVM: allow sev_migrate_tests on machines without SEV-ES · 1e8ff29f

Paolo Bonzini authored Feb 18, 2022

I managed to get hold of a machine that has SEV but not SEV-ES, and
sev_migrate_tests fails because sev_vm_create(true) returns ENOTTY.
Fix this, and while at it also return KSFT_SKIP on machines that do
not have SEV at all, instead of returning 0.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1e8ff29f

KVM: SEV: Allow SEV intra-host migration of VM with mirrors · b2125513

Peter Gonda authored Feb 11, 2022

For SEV-ES VMs with mirrors to be intra-host migrated they need to be
able to migrate with the mirror. This is due to that fact that all VMSAs
need to be added into the VM with LAUNCH_UPDATE_VMSA before
lAUNCH_FINISH. Allowing migration with mirrors allows users of SEV-ES to
keep the mirror VMs VMSAs during migration.

Adds a list of mirror VMs for the original VM iterate through during its
migration. During the iteration the owner pointers can be updated from
the source to the destination. This fixes the ASID leaking issue which
caused the blocking of migration of VMs with mirrors.
Signed-off-by: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <20220211193634.3183388-1-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b2125513

14 Feb, 2022 4 commits

KVM: SVM: Rename AVIC helpers to use "avic" prefix instead of "svm" · db6e7adf

Sean Christopherson authored Jan 28, 2022

Use "avic" instead of "svm" for SVM's all of APICv hooks and make a few
additional funciton name tweaks so that the AVIC functions conform to
their associated kvm_x86_ops hooks.

No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-19-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

db6e7adf

Merge remote-tracking branch 'kvm/master' into HEAD · 4e71cad3
Paolo Bonzini authored Feb 14, 2022
```
Merge bugfix patches from Linux 5.17-rc.
```
4e71cad3

KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW · 710c4765

Jim Mattson authored Feb 02, 2022

AMD's event select is 3 nybbles, with the high nybble in bits 35:32 of
a PerfEvtSeln MSR. Don't mask off the high nybble when configuring a
RAW perf event.

Fixes: ca724305 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM")
Signed-off-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220203014813.2130559-2-jmattson@google.com>
Reviewed-by: David Dunn <daviddunn@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

710c4765

KVM: x86/pmu: Don't truncate the PerfEvtSeln MSR when creating a perf event · b8bfee85

Jim Mattson authored Feb 02, 2022

AMD's event select is 3 nybbles, with the high nybble in bits 35:32 of
a PerfEvtSeln MSR. Don't drop the high nybble when setting up the
config field of a perf_event_attr structure for a call to
perf_event_create_kernel_counter().

Fixes: ca724305 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM")
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220203014813.2130559-1-jmattson@google.com>
Reviewed-by: David Dunn <daviddunn@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b8bfee85

11 Feb, 2022 6 commits

KVM: SVM: fix race between interrupt delivery and AVIC inhibition · 66fa226c

Maxim Levitsky authored Feb 08, 2022

If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
inhibited, it might read a stale value of vcpu->arch.apicv_active
which can lead to the target vCPU not noticing the interrupt.

To fix this use load-acquire/store-release so that, if the target vCPU
is IN_GUEST_MODE, we're guaranteed to see a previous disabling of the
AVIC.  If AVIC has been disabled in the meanwhile, proceed with the
KVM_REQ_EVENT-based delivery.

Incomplete IPI vmexit has the same races as svm_deliver_avic_intr, and
in fact it can be handled in exactly the same way; the only difference
lies in who has set IRR, whether svm_deliver_interrupt or the processor.
Therefore, svm_complete_interrupt_delivery can be used to fix incomplete
IPI vmexits as well.
Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

66fa226c

KVM: SVM: set IRR in svm_deliver_interrupt · 30811174

Paolo Bonzini authored Feb 08, 2022

SVM has to set IRR for both the AVIC and the software-LAPIC case,
so pull it up to the common function that handles both configurations.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

30811174

KVM: SVM: extract avic_ring_doorbell · 0a5f7842

Maxim Levitsky authored Feb 08, 2022

The check on the current CPU adds an extra level of indentation to
svm_deliver_avic_intr and conflates documentation on what happens
if the vCPU exits (of interest to svm_deliver_avic_intr) and migrates
(only of interest to avic_ring_doorbell, which calls get/put_cpu()).
Extract the wrmsr to a separate function and rewrite the
comment in svm_deliver_avic_intr().
Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0a5f7842

selftests: kvm: Remove absent target file · 0316dbb9

Muhammad Usama Anjum authored Feb 10, 2022

There is no vmx_pi_mmio_test file. Remove it to get rid of error while
creation of selftest archive:

rsync: [sender] link_stat "/kselftest/kvm/x86_64/vmx_pi_mmio_test" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]

Fixes: 6a581508 ("selftest: KVM: Add intra host migration tests")
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Message-Id: <20220210172352.1317554-1-usama.anjum@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0316dbb9

Merge tag 'kvmarm-fixes-5.17-3' of... · ed343aa8

Paolo Bonzini authored Feb 11, 2022

Merge tag 'kvmarm-fixes-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 5.17, take #3

- Fix pending state read of a HW interrupt

ed343aa8

KVM: arm64: vgic: Read HW interrupt pending state from the HW · 5bfa685e

Marc Zyngier authored Feb 03, 2022

It appears that a read access to GIC[DR]_I[CS]PENDRn doesn't always
result in the pending interrupts being accurately reported if they are
mapped to a HW interrupt. This is particularily visible when acking
the timer interrupt and reading the GICR_ISPENDR1 register immediately
after, for example (the interrupt appears as not-pending while it really
is...).

This is because a HW interrupt has its 'active and pending state' kept
in the *physical* distributor, and not in the virtual one, as mandated
by the spec (this is what allows the direct deactivation). The virtual
distributor only caries the pending and active *states* (note the
plural, as these are two independent and non-overlapping states).

Fix it by reading the HW state back, either from the timer itself or
from the distributor if necessary.
Reported-by: Ricardo Koller <ricarkol@google.com>
Tested-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220208123726.3604198-1-maz@kernel.org

5bfa685e

10 Feb, 2022 21 commits

KVM: VMX: Use local pointer to vcpu_vmx in vmx_vcpu_after_set_cpuid() · 48ebd0cf

Oliver Upton authored Feb 04, 2022

There is a local that contains a pointer to vcpu_vmx already. Just use
that instead to get at the structure directly instead of doing pointer
arithmetic.

No functional change intended.
Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20220204204705.3538240-8-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

48ebd0cf

KVM: selftests: nSVM: Add enlightened MSR-Bitmap selftest · e67bd7df

Vitaly Kuznetsov authored Feb 03, 2022

Introduce a new test for Hyper-V nSVM extensions (Hyper-V on KVM) and add
a test for enlightened MSR-Bitmap feature:

- Intercept access to MSR_FS_BASE in L1 and check that this works
  with enlightened MSR-Bitmap disabled.
- Enabled enlightened MSR-Bitmap and check that the intercept still works
  as expected.
- Intercept access to MSR_GS_BASE but don't clear the corresponding bit
  from clean fields mask, KVM is supposed to skip updating MSR-Bitmap02 and
  thus the consequent access to the MSR from L2 will not get intercepted.
- Finally, clear the corresponding bit from clean fields mask and check
  that access to MSR_GS_BASE is now intercepted.

The test works with the assumption, that access to MSR_FS_BASE/MSR_GS_BASE
is not intercepted for L1. If this ever becomes not true the test will
fail as nested_svm_exit_handled_msr() always checks L1's MSR-Bitmap for
L2 irrespective of clean fields. The behavior is correct as enlightened
MSR-Bitmap feature is just an optimization, KVM is not obliged to ignore
updates when the corresponding bit in clean fields stays clear.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220203104620.277031-7-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e67bd7df

KVM: selftests: nSVM: Update 'struct vmcb_control_area' definition · 29f557d5

Vitaly Kuznetsov authored Feb 03, 2022

There's a copy of 'struct vmcb_control_area' definition in KVM selftests,
update it to allow testing of the newly introduced features.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220203104620.277031-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

29f557d5

KVM: selftests: nSVM: Set up MSR-Bitmap for SVM guests · 0b815117

Vitaly Kuznetsov authored Feb 03, 2022

Similar to VMX, allocate memory for MSR-Bitmap and fill in 'msrpm_base_pa'
in VMCB. To use it, tests will need to set INTERCEPT_MSR_PROT interception
along with the required bits in the MSR-Bitmap.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220203104620.277031-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0b815117

KVM: selftests: nVMX: Add enlightened MSR-Bitmap selftest · 70e477d9

Vitaly Kuznetsov authored Feb 03, 2022

Introduce a test for enlightened MSR-Bitmap feature (Hyper-V on KVM):
- Intercept access to MSR_FS_BASE in L1 and check that this works
 with enlightened MSR-Bitmap disabled.
- Enabled enlightened MSR-Bitmap and check that the intercept still works
as expected.
- Intercept access to MSR_GS_BASE but don't clear the corresponding bit
from 'hv_clean_fields', KVM is supposed to skip updating MSR-Bitmap02 and
thus the consequent access to the MSR from L2 will not get intercepted.
- Finally, clear the corresponding bit from 'hv_clean_fields' and check
that access to MSR_GS_BASE is now intercepted.

The test works with the assumption, that access to MSR_FS_BASE/MSR_GS_BASE
is not intercepted for L1. If this ever becomes not true the test will
fail as nested_vmx_exit_handled_msr() always checks L1's MSR-Bitmap for
L2 irrespective of 'hv_clean_fields'. The behavior is correct as
enlightened MSR-Bitmap feature is just an optimization, KVM is not obliged
to ignore updates when the corresponding bit in 'hv_clean_fields' stays
clear.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220203104620.277031-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

70e477d9

KVM: selftests: nVMX: Properly deal with 'hv_clean_fields' · 761b5eba

Vitaly Kuznetsov authored Feb 03, 2022

Instead of just resetting 'hv_clean_fields' to 0 on every enlightened
vmresume, do the expected cleaning of the corresponding bit on enlightened
vmwrite. Avoid direct access to 'current_evmcs' from evmcs_test to support
the change.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220203104620.277031-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

761b5eba

KVM: selftests: Adapt hyperv_cpuid test to the newly introduced Enlightened MSR-Bitmap · 6081f9c7

Vitaly Kuznetsov authored Feb 03, 2022

CPUID 0x40000000.EAX is now always present as it has Enlightened
MSR-Bitmap feature bit set. Adapt the test accordingly. Opportunistically
add a check for the supported eVMCS version range.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220203104620.277031-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6081f9c7

KVM: nSVM: Implement Enlightened MSR-Bitmap feature · 66c03a92

Vitaly Kuznetsov authored Feb 02, 2022

Similar to nVMX commit 502d2bf5 ("KVM: nVMX: Implement Enlightened MSR
Bitmap feature"), add support for the feature for nSVM (Hyper-V on KVM).

Notable differences from nVMX implementation:
- As the feature uses SW reserved fields in VMCB control, KVM needs to
make sure it's dealing with a Hyper-V guest (kvm_hv_hypercall_enabled()).

- 'msrpm_base_pa' needs to be always be overwritten in
nested_svm_vmrun_msrpm(), even when the update is skipped. As an
optimization, nested_vmcb02_prepare_control() copies it from VMCB01
so when MSR-Bitmap feature for L2 is disabled nothing needs to be done.

- 'struct vmcb_ctrl_area_cached' needs to be extended with clean
fields/sw reserved data and __nested_copy_vmcb_control_to_cache() needs to
copy it so nested_svm_vmrun_msrpm() can use it later.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220202095100.129834-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

66c03a92

KVM: nSVM: Split off common definitions for Hyper-V on KVM and KVM on Hyper-V · 9e083ec7

Vitaly Kuznetsov authored Feb 02, 2022

In preparation to implementing Enlightened MSR-Bitmap feature for Hyper-V
on KVM, split off the required definitions into common 'svm/hyperv.h'
header.

No functional change intended.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220202095100.129834-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9e083ec7

KVM: x86: Make kvm_hv_hypercall_enabled() static inline · ce385917

Vitaly Kuznetsov authored Feb 02, 2022

In preparation for using kvm_hv_hypercall_enabled() from SVM code, make
it static inline to avoid the need to export it. The function is a
simple check with only two call sites currently.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220202095100.129834-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ce385917

KVM: nSVM: Track whether changes in L0 require MSR bitmap for L2 to be rebuilt · 73c25546

Vitaly Kuznetsov authored Feb 02, 2022

Similar to nVMX commit ed2a4800 ("KVM: nVMX: Track whether changes in
L0 require MSR bitmap for L2 to be rebuilt"), introduce a flag to keep
track of whether MSR bitmap for L2 needs to be rebuilt due to changes in
MSR bitmap for L1 or switching to a different L2.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220202095100.129834-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

73c25546

KVM: selftests: Add an option to disable MANUAL_PROTECT_ENABLE and INITIALLY_SET · 951cb0a3

David Matlack authored Jan 19, 2022

Add an option to dirty_log_perf_test.c to disable
KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE and KVM_DIRTY_LOG_INITIALLY_SET so
the legacy dirty logging code path can be tested.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-19-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

951cb0a3

KVM: x86/mmu: Add tracepoint for splitting huge pages · e0b728b1

David Matlack authored Jan 19, 2022

Add a tracepoint that records whenever KVM eagerly splits a huge page
and the error status of the split to indicate if it succeeded or failed
and why.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-18-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e0b728b1

KVM: x86/mmu: Split huge pages mapped by the TDP MMU during KVM_CLEAR_DIRTY_LOG · cb00a70b

David Matlack authored Jan 19, 2022

When using KVM_DIRTY_LOG_INITIALLY_SET, huge pages are not
write-protected when dirty logging is enabled on the memslot. Instead
they are write-protected once userspace invokes KVM_CLEAR_DIRTY_LOG for
the first time and only for the specific sub-region being cleared.

Enhance KVM_CLEAR_DIRTY_LOG to also try to split huge pages prior to
write-protecting to avoid causing write-protection faults on vCPU
threads. This also allows userspace to smear the cost of huge page
splitting across multiple ioctls, rather than splitting the entire
memslot as is the case when initially-all-set is not used.
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-17-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cb00a70b

KVM: x86/mmu: Split huge pages mapped by the TDP MMU when dirty logging is enabled · a3fe5dbd

David Matlack authored Jan 19, 2022

When dirty logging is enabled without initially-all-set, try to split
all huge pages in the memslot down to 4KB pages so that vCPUs do not
have to take expensive write-protection faults to split huge pages.

Eager page splitting is best-effort only. This commit only adds the
support for the TDP MMU, and even there splitting may fail due to out
of memory conditions. Failures to split a huge page is fine from a
correctness standpoint because KVM will always follow up splitting by
write-protecting any remaining huge pages.

Eager page splitting moves the cost of splitting huge pages off of the
vCPU threads and onto the thread enabling dirty logging on the memslot.
This is useful because:

 1. Splitting on the vCPU thread interrupts vCPUs execution and is
    disruptive to customers whereas splitting on VM ioctl threads can
    run in parallel with vCPU execution.

 2. Splitting all huge pages at once is more efficient because it does
    not require performing VM-exit handling or walking the page table for
    every 4KiB page in the memslot, and greatly reduces the amount of
    contention on the mmu_lock.

For example, when running dirty_log_perf_test with 96 virtual CPUs, 1GiB
per vCPU, and 1GiB HugeTLB memory, the time it takes vCPUs to write to
all of their memory after dirty logging is enabled decreased by 95% from
2.94s to 0.14s.

Eager Page Splitting is over 100x more efficient than the current
implementation of splitting on fault under the read lock. For example,
taking the same workload as above, Eager Page Splitting reduced the CPU
required to split all huge pages from ~270 CPU-seconds ((2.94s - 0.14s)
* 96 vCPU threads) to only 1.55 CPU-seconds.

Eager page splitting does increase the amount of time it takes to enable
dirty logging since it has split all huge pages. For example, the time
it took to enable dirty logging in the 96GiB region of the
aforementioned test increased from 0.001s to 1.55s.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-16-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a3fe5dbd

KVM: x86/mmu: Separate TDP MMU shadow page allocation and initialization · a82070b6

David Matlack authored Jan 19, 2022

Separate the allocation of shadow pages from their initialization.  This
is in preparation for splitting huge pages outside of the vCPU fault
context, which requires a different allocation mechanism.

No functional changed intended.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-15-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a82070b6

KVM: x86/mmu: Derive page role for TDP MMU shadow pages from parent · a3aca4de

David Matlack authored Jan 19, 2022

Derive the page role from the parent shadow page, since the only thing
that changes is the level. This is in preparation for splitting huge
pages during VM-ioctls which do not have access to the vCPU MMU context.

No functional change intended.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-14-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a3aca4de

KVM: x86/mmu: Remove redundant role overrides for TDP MMU shadow pages · a81399a5

David Matlack authored Jan 19, 2022

The vCPU's mmu_role already has the correct values for direct,
has_4_byte_gpte, access, and ad_disabled. Remove the code that was
redundantly overwriting these fields with the same values.

No functional change intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-13-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a81399a5

KVM: x86/mmu: Refactor TDP MMU iterators to take kvm_mmu_page root · 77aa6075

David Matlack authored Jan 19, 2022

Instead of passing a pointer to the root page table and the root level
separately, pass in a pointer to the root kvm_mmu_page struct.  This
reduces the number of arguments by 1, cutting down on line lengths.

No functional change intended.
Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-12-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

77aa6075

KVM: x86/mmu: Move restore_acc_track_spte() to spte.h · 315d86da

David Matlack authored Jan 19, 2022

restore_acc_track_spte() is pure SPTE bit manipulation, making it a good
fit for spte.h. And now that the WARN_ON_ONCE() calls have been removed,
there isn't any good reason to not inline it.

This move also prepares for a follow-up commit that will need to call
restore_acc_track_spte() from spte.c

No functional change intended.
Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-11-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

315d86da

KVM: x86/mmu: Drop new_spte local variable from restore_acc_track_spte() · 77c23c77

David Matlack authored Jan 19, 2022

The new_spte local variable is unnecessary. Deleting it can save a line
of code and simplify the remaining lines a bit.

No functional change intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-10-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

77c23c77