Commits · bdedff263132c862924f5cad96f0e82eeeb4e2e6 · Kirill Smelkov / linux

23 Mar, 2023 2 commits

KVM: x86: Route pending NMIs from userspace through process_nmi() · bdedff26

Sean Christopherson authored Mar 22, 2023

Use the asynchronous NMI queue to handle pending NMIs coming in from
userspace during KVM_SET_VCPU_EVENTS so that all of KVM's logic for
handling multiple NMIs goes through process_nmi().  This will simplify
supporting SVM's upcoming "virtual NMI" functionality, which will need
changes KVM manages pending NMIs.
Signed-off-by: Sean Christopherson <seanjc@google.com>

bdedff26

KVM: SVM: Add definitions for new bits in VMCB::int_ctrl related to vNMI · 1c4522ab

Santosh Shukla authored Feb 27, 2023

Add defines for three new bits in VMVC::int_ctrl that are part of SVM's
Virtual NMI (vNMI) support:

  V_NMI_PENDING_MASK(11)  - Virtual NMI is pending
  V_NMI_BLOCKING_MASK(12) - Virtual NMI is masked
  V_NMI_ENABLE_MASK(26)   - Enable NMI virtualization

To "inject" an NMI, the hypervisor (KVM) sets V_NMI_PENDING.  When the
CPU services the pending vNMI, hardware clears V_NMI_PENDING and sets
V_NMI_BLOCKING, e.g. to indicate that the vCPU is handling an NMI.
Hardware clears V_NMI_BLOCKING upon successful execution of IRET, or if a
VM-Exit occurs while delivering the virtual NMI.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Link: https://lore.kernel.org/r/20230227084016.3368-10-santosh.shukla@amd.com
[sean: massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>

1c4522ab

22 Mar, 2023 8 commits

x86/cpufeatures: Redefine synthetic virtual NMI bit as AMD's "real" vNMI · 3763bf58

Sean Christopherson authored Mar 22, 2023

The existing X86_FEATURE_VNMI is a synthetic feature flag that exists
purely to maintain /proc/cpuinfo's ABI, the "real" Intel vNMI feature flag
is tracked as VMX_FEATURE_VIRTUAL_NMIS, as the feature is enumerated
through VMX MSRs, not CPUID.

AMD is also gaining virtual NMI support, but in true VMX vs. SVM form,
enumerates support through CPUID, i.e. wants to add real feature flag for
vNMI.

Redefine the syntheic X86_FEATURE_VNMI to AMD's real CPUID bit to avoid
having both X86_FEATURE_VNMI and e.g. X86_FEATURE_AMD_VNMI.
Signed-off-by: Sean Christopherson <seanjc@google.com>

3763bf58

KVM: x86: Save/restore all NMIs when multiple NMIs are pending · ab2ee212

Sean Christopherson authored Feb 27, 2023

Save all pending NMIs in KVM_GET_VCPU_EVENTS, and queue KVM_REQ_NMI if one
or more NMIs are pending after KVM_SET_VCPU_EVENTS in order to re-evaluate
pending NMIs with respect to NMI blocking.

KVM allows multiple NMIs to be pending in order to faithfully emulate bare
metal handling of simultaneous NMIs (on bare metal, truly simultaneous
NMIs are impossible, i.e. one will always arrive first and be consumed).
Support for simultaneous NMIs botched the save/restore though. KVM only
saves one pending NMI, but allows userspace to restore 255 pending NMIs
as kvm_vcpu_events.nmi.pending is a u8, and KVM's internal state is stored
in an unsigned int.

Fixes: 7460fb4a ("KVM: Fix simultaneous NMIs")
Signed-off-by: Santosh Shukla <Santosh.Shukla@amd.com>
Link: https://lore.kernel.org/r/20230227084016.3368-8-santosh.shukla@amd.comSigned-off-by: Sean Christopherson <seanjc@google.com>

ab2ee212

KVM: x86: Tweak the code and comment related to handling concurrent NMIs · 400fee8c

Sean Christopherson authored Feb 27, 2023

Tweak the code and comment that deals with concurrent NMIs to explicitly
call out that x86 allows exactly one pending NMI, but that KVM needs to
temporarily allow two pending NMIs in order to workaround the fact that
the target vCPU cannot immediately recognize an incoming NMI, unlike bare
metal.

No functional change intended.
Signed-off-by: Santosh Shukla <Santosh.Shukla@amd.com>
Link: https://lore.kernel.org/r/20230227084016.3368-7-santosh.shukla@amd.comSigned-off-by: Sean Christopherson <seanjc@google.com>

400fee8c

KVM: x86: Raise an event request when processing NMIs if an NMI is pending · 2cb93173

Sean Christopherson authored Feb 27, 2023

Don't raise KVM_REQ_EVENT if no NMIs are pending at the end of
process_nmi(). Finishing process_nmi() without a pending NMI will become
much more likely when KVM gains support for AMD's vNMI, which allows
pending vNMIs in hardware, i.e. doesn't require explicit injection.
Signed-off-by: Santosh Shukla <Santosh.Shukla@amd.com>
Link: https://lore.kernel.org/r/20230227084016.3368-6-santosh.shukla@amd.comSigned-off-by: Sean Christopherson <seanjc@google.com>

2cb93173

KVM: SVM: add wrappers to enable/disable IRET interception · 772f254d

Maxim Levitsky authored Feb 27, 2023

SEV-ES guests don't use IRET interception for the detection of
an end of a NMI.

Therefore it makes sense to create a wrapper to avoid repeating
the check for the SEV-ES.

No functional change is intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
[Renamed iret intercept API of style svm_{clr,set}_iret_intercept()]
Signed-off-by: Santosh Shukla <Santosh.Shukla@amd.com>
Link: https://lore.kernel.org/r/20230227084016.3368-5-santosh.shukla@amd.comSigned-off-by: Sean Christopherson <seanjc@google.com>

772f254d

KVM: nSVM: Raise event on nested VM exit if L1 doesn't intercept IRQs · 5d1ec456

Maxim Levitsky authored Feb 27, 2023

If L1 doesn't intercept interrupts, then KVM will use vmcb02's V_IRQ
to detect an interrupt window for L1 IRQs.  On a subsequent nested
VM-Exit, KVM might need to copy the current V_IRQ from vmcb02 to vmcb01
to continue waiting for an interrupt window, i.e. if there is still a
pending IRQ for L1.

Raise KVM_REQ_EVENT on nested exit if L1 isn't intercepting IRQs to ensure
that KVM will re-enable interrupt window detection if needed.

Note that this is a theoretical bug because KVM already raises
KVM_REQ_EVENT on each nested VM exit, because the nested VM exit resets
RFLAGS and kvm_set_rflags() raises the KVM_REQ_EVENT unconditionally.

Explicitly raise KVM_REQ_EVENT for the interrupt window case to avoid
having an unnecessary dependency on kvm_set_rflags(), and to document
the scenario.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
[santosh: reworded description as per Sean's v2 comment]
Signed-off-by: Santosh Shukla <Santosh.Shukla@amd.com>
Link: https://lore.kernel.org/r/20230227084016.3368-4-santosh.shukla@amd.com
[sean: further massage changelog and comment]
Signed-off-by: Sean Christopherson <seanjc@google.com>

5d1ec456

KVM: nSVM: Disable intercept of VINTR if saved L1 host RFLAGS.IF is 0 · 7334ede4

Santosh Shukla authored Feb 27, 2023

Disable intercept of virtual interrupts (used to detect interrupt windows)
if the saved host (L1) RFLAGS.IF is '0', as the effective RFLAGS.IF for L1
interrupts will never be set while L2 is running (L2's RFLAGS.IF doesn't
affect L1 IRQs when virtual interrupts are enabled).
Suggested-by: Sean Christopherson <seanjc@google.com>
Link: https://lkml.kernel.org/r/Y9hybI65So5X2LFg%40google.comSigned-off-by: Santosh Shukla <Santosh.Shukla@amd.com>
Link: https://lore.kernel.org/r/20230227084016.3368-3-santosh.shukla@amd.comSigned-off-by: Sean Christopherson <seanjc@google.com>

7334ede4

KVM: nSVM: Don't sync vmcb02 V_IRQ back to vmcb12 if KVM (L0) is intercepting VINTR · 5faaffab

Santosh Shukla authored Feb 27, 2023

Don't sync vmcb02 V_IRQ back to vmcb12 if KVM (L0) is intercepting
virtual interrupts in order to request an interrupt window, as KVM
has usurped vmcb02's int_ctl. If an interrupt window opens before
the next VM-Exit, svm_clear_vintr() will restore vmcb12's int_ctl.
If no window opens, V_IRQ will be correctly preserved in vmcb12's
int_ctl (because it was never recognized while L2 was running).
Suggested-by: Sean Christopherson <seanjc@google.com>
Link: https://lkml.kernel.org/r/Y9hybI65So5X2LFg%40google.comSigned-off-by: Santosh Shukla <Santosh.Shukla@amd.com>
Link: https://lore.kernel.org/r/20230227084016.3368-2-santosh.shukla@amd.comSigned-off-by: Sean Christopherson <seanjc@google.com>

5faaffab

16 Mar, 2023 9 commits

KVM: Change return type of kvm_arch_vm_ioctl() to "int" · d8708b80

Thomas Huth authored Feb 08, 2023

All kvm_arch_vm_ioctl() implementations now only deal with "int"
types as return values, so we can change the return type of these
functions to use "int" instead of "long".
Signed-off-by: Thomas Huth <thuth@redhat.com>
Acked-by: Anup Patel <anup@brainfault.org>
Message-Id: <20230208140105.655814-7-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d8708b80

KVM: Standardize on "int" return types instead of "long" in kvm_main.c · f15ba52b

Thomas Huth authored Feb 08, 2023

KVM functions use "long" return values for functions that are wired up
to "struct file_operations", but otherwise use "int" return values for
functions that can return 0/-errno in order to avoid unintentional
divergences between 32-bit and 64-bit kernels.
Some code still uses "long" in unnecessary spots, though, which can
cause a little bit of confusion and unnecessary size casts. Let's
change these spots to use "int" types, too.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230208140105.655814-6-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f15ba52b

KVM: arm64: Limit length in kvm_vm_ioctl_mte_copy_tags() to INT_MAX · 2def950c

Thomas Huth authored Feb 08, 2023

In case of success, this function returns the amount of handled bytes.
However, this does not work for large values: The function is called
from kvm_arch_vm_ioctl() (which still returns a long), which in turn
is called from kvm_vm_ioctl() in virt/kvm/kvm_main.c. And that function
stores the return value in an "int r" variable. So the upper 32-bits
of the "long" return value are lost there.

KVM ioctl functions should only return "int" values, so let's limit
the amount of bytes that can be requested here to INT_MAX to avoid
the problem with the truncated return value. We can then also change
the return type of the function to "int" to make it clearer that it
is not possible to return a "long" here.

Fixes: f0376edb ("KVM: arm64: Add ioctl to fetch/store tags in a guest")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Message-Id: <20230208140105.655814-5-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2def950c

KVM: x86: Remove the KVM_GET_NR_MMU_PAGES ioctl · c5edd753

Thomas Huth authored Feb 08, 2023

The KVM_GET_NR_MMU_PAGES ioctl is quite questionable on 64-bit hosts
since it fails to return the full 64 bits of the value that can be
set with the corresponding KVM_SET_NR_MMU_PAGES call. Its "long" return
value is truncated into an "int" in the kvm_arch_vm_ioctl() function.

Since this ioctl also never has been used by userspace applications
(QEMU, Google's internal VMM, kvmtool and CrosVM have been checked),
it's likely the best if we remove this badly designed ioctl before
anybody really tries to use it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230208140105.655814-4-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c5edd753

KVM: s390: Use "int" as return type for kvm_s390_get/set_skeys() · 71fb165e

Thomas Huth authored Feb 08, 2023

These two functions only return normal integers, so it does not
make sense to declare the return type as "long" here.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230208140105.655814-3-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

71fb165e

KVM: PPC: Standardize on "int" return types in the powerpc KVM code · 67c48662

Thomas Huth authored Feb 08, 2023

Most functions that are related to kvm_arch_vm_ioctl() already use
"int" as return type to pass error values back to the caller. Some
outlier functions use "long" instead for no good reason (they do not
really require long values here). Let's standardize on "int" here to
avoid casting the values back and forth between the two types.
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230208140105.655814-2-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

67c48662

kvm: x86: Advertise FLUSH_L1D to user space · 45cf86f2

Emanuele Giuseppe Esposito authored Feb 01, 2023

FLUSH_L1D was already added in 11e34e64, but the feature is not
visible to userspace yet.

The bit definition:
CPUID.(EAX=7,ECX=0):EDX[bit 28]

If the feature is supported by the host, kvm should support it too so
that userspace can choose whether to expose it to the guest or not.
One disadvantage of not exposing it is that the guest will report
a non existing vulnerability in
/sys/devices/system/cpu/vulnerabilities/mmio_stale_data
because the mitigation is present only if the guest supports
(FLUSH_L1D and MD_CLEAR) or FB_CLEAR.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20230201132905.549148-4-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

45cf86f2

kvm: svm: Add IA32_FLUSH_CMD guest support · 723d5fb0

Emanuele Giuseppe Esposito authored Feb 01, 2023

Expose IA32_FLUSH_CMD to the guest if the guest CPUID enumerates
support for this MSR. As with IA32_PRED_CMD, permission for
unintercepted writes to this MSR will be granted to the guest after
the first non-zero write.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20230201132905.549148-3-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

723d5fb0

kvm: vmx: Add IA32_FLUSH_CMD guest support · a807b78a

Emanuele Giuseppe Esposito authored Feb 01, 2023

Expose IA32_FLUSH_CMD to the guest if the guest CPUID enumerates
support for this MSR. As with IA32_PRED_CMD, permission for
unintercepted writes to this MSR will be granted to the guest after
the first non-zero write.
Co-developed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20230201132905.549148-2-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a807b78a

14 Mar, 2023 21 commits

KVM: VMX: Rename "KVM is using eVMCS" static key to match its wrapper · fbc722aa

Sean Christopherson authored Feb 11, 2023

Rename enable_evmcs to __kvm_is_using_evmcs to match its wrapper, and to
avoid confusion with enabling eVMCS for nested virtualization, i.e. have
"enable eVMCS" be reserved for "enable eVMCS support for L1".

No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230211003534.564198-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fbc722aa

KVM: VMX: Stub out enable_evmcs static key for CONFIG_HYPERV=n · 19f10315

Sean Christopherson authored Feb 11, 2023

Wrap enable_evmcs in a helper and stub it out when CONFIG_HYPERV=n in
order to eliminate the static branch nop placeholders.  clang-14 is clever
enough to elide the nop, but gcc-12 is not.  Stubbing out the key reduces
the size of kvm-intel.ko by ~7.5% (200KiB) when compiled with gcc-12
(there are a _lot_ of VMCS accesses throughout KVM).
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230211003534.564198-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

19f10315

KVM: nVMX: Move EVMCS1_SUPPORT_* macros to hyperv.c · 68ac4221

Sean Christopherson authored Feb 11, 2023

Move the macros that define the set of VMCS controls that are supported
by eVMCS1 from hyperv.h to hyperv.c, i.e. make them "private".   The
macros should never be consumed directly by KVM at-large since the "final"
set of supported controls depends on guest CPUID.

No functional change intended.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230211003534.564198-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

68ac4221

KVM: x86/mmu: Remove FNAME(is_self_change_mapping) · 9a967700

Lai Jiangshan authored Feb 02, 2023

Drop FNAME(is_self_change_mapping) and instead rely on
kvm_mmu_hugepage_adjust() to adjust the hugepage accordingly. Prior to
commit 4cd071d1 ("KVM: x86/mmu: Move calls to thp_adjust() down a
level"), the hugepage adjustment was done before allocating new shadow
pages, i.e. failed to restrict the hugepage sizes if a new shadow page
resulted in account_shadowed() changing the disallowed hugepage tracking.

Removing FNAME(is_self_change_mapping) fixes a bug reported by Huang Hang
where KVM unnecessarily forces a 4KiB page. FNAME(is_self_change_mapping)
has a defect in that it blindly disables _all_ hugepage mappings rather
than trying to reduce the size of the hugepage. If the guest is writing
to a 1GiB page and the 1GiB is self-referential but a 2MiB page is not,
then KVM can and should create a 2MiB mapping.

Add a comment above the call to kvm_mmu_hugepage_adjust() to call out the
new dependency on adjusting the hugepage size after walking indirect PTEs.
Reported-by: Huang Hang <hhuang@linux.alibaba.com>
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Link: https://lore.kernel.org/r/20221213125538.81209-1-jiangshanlai@gmail.com
[sean: rework changelog after separating out the emulator change]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230202182817.407394-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9a967700

KVM: x86/mmu: Detect write #PF to shadow pages during FNAME(fetch) walk · 39fda5d8

Lai Jiangshan authored Feb 02, 2023

Move the detection of write #PF to shadow pages, i.e. a fault on a write
to a page table that is being shadowed by KVM that is used to translate
the write itself, from FNAME(is_self_change_mapping) to FNAME(fetch).
There is no need to detect the self-referential write before
kvm_faultin_pfn() as KVM does not consume EMULTYPE_WRITE_PF_TO_SP for
accesses that resolve to "error or no-slot" pfns, i.e. KVM doesn't allow
retrying MMIO accesses or writes to read-only memslots.

Detecting the EMULTYPE_WRITE_PF_TO_SP scenario in FNAME(fetch) will allow
dropping FNAME(is_self_change_mapping) entirely, as the hugepage
interaction can be deferred to kvm_mmu_hugepage_adjust().

Cc: Huang Hang <hhuang@linux.alibaba.com>
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Link: https://lore.kernel.org/r/20221213125538.81209-1-jiangshanlai@gmail.com
[sean: split to separate patch, write changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230202182817.407394-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

39fda5d8

KVM: x86/mmu: Use EMULTYPE flag to track write #PFs to shadow pages · 258d985f

Sean Christopherson authored Feb 02, 2023

Use a new EMULTYPE flag, EMULTYPE_WRITE_PF_TO_SP, to track page faults
on self-changing writes to shadowed page tables instead of propagating
that information to the emulator via a semi-persistent vCPU flag. Using
a flag in "struct kvm_vcpu_arch" is confusing, especially as implemented,
as it's not at all obvious that clearing the flag only when emulation
actually occurs is correct.

E.g. if KVM sets the flag and then retries the fault without ever getting
to the emulator, the flag will be left set for future calls into the
emulator. But because the flag is consumed if and only if both
EMULTYPE_PF and EMULTYPE_ALLOW_RETRY_PF are set, and because
EMULTYPE_ALLOW_RETRY_PF is deliberately not set for direct MMUs, emulated
MMIO, or while L2 is active, KVM avoids false positives on a stale flag
since FNAME(page_fault) is guaranteed to be run and refresh the flag
before it's ultimately consumed by the tail end of reexecute_instruction().
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230202182817.407394-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

258d985f

KVM: selftests: Sync KVM exit reasons in selftests · f3e70741

Vipin Sharma authored Feb 03, 2023

Add missing KVM_EXIT_* reasons in KVM selftests from
include/uapi/linux/kvm.h
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Message-Id: <20230204014547.583711-5-vipinsh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f3e70741

KVM: selftests: Add macro to generate KVM exit reason strings · 1b3d660e

Sean Christopherson authored Feb 03, 2023

Add and use a macro to generate the KVM exit reason strings array
instead of relying on developers to correctly copy+paste+edit each
string.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230204014547.583711-4-vipinsh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1b3d660e

KVM: selftests: Print expected and actual exit reason in KVM exit reason assert · 6f974494

Vipin Sharma authored Feb 03, 2023

Print what KVM exit reason a test was expecting and what it actually
got int TEST_ASSERT_KVM_EXIT_REASON().
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Message-Id: <20230204014547.583711-3-vipinsh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6f974494

KVM: selftests: Make vCPU exit reason test assertion common · c96f57b0

Vipin Sharma authored Feb 03, 2023

Make TEST_ASSERT_KVM_EXIT_REASON() macro and replace all exit reason
test assert statements with it.

No functional changes intended.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Message-Id: <20230204014547.583711-2-vipinsh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c96f57b0

KVM: selftests: Add EVTCHNOP_send slow path test to xen_shinfo_test · e6239a4e

David Woodhouse authored Feb 04, 2023

When kvm_xen_evtchn_send() takes the slow path because the shinfo GPC
needs to be revalidated, it used to violate the SRCU vs. kvm->lock
locking rules and potentially cause a deadlock.

Now that lockdep is learning to catch such things, make sure that code
path is exercised by the selftest.

Link: https://lore.kernel.org/all/20230113124606.10221-2-dwmw2@infradead.orgSigned-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230204024151.1373296-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e6239a4e

KVM: selftests: Use enum for test numbers in xen_shinfo_test · e7062a98

David Woodhouse authored Feb 04, 2023

The xen_shinfo_test started off with very few iterations, and the numbers
we used in GUEST_SYNC() were precisely mapped to the RUNSTATE_xxx values
anyway to start with.

It has since grown quite a few more tests, and it's kind of awful to be
handling them all as bare numbers. Especially when I want to add a new
test in the middle. Define an enum for the test stages, and use it both
in the guest code and the host switch statement.

No functional change, if I can count to 24.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230204024151.1373296-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e7062a98

KVM: selftests: Add helpers to make Xen-style VMCALL/VMMCALL hypercalls · c0c76d99

Sean Christopherson authored Feb 04, 2023

Add wrappers to do hypercalls using VMCALL/VMMCALL and Xen's register ABI
(as opposed to full Xen-style hypercalls through a hypervisor provided
page).  Using the common helpers dedups a pile of code, and uses the
native hypercall instruction when running on AMD.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230204024151.1373296-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c0c76d99

KVM: selftests: Move the guts of kvm_hypercall() to a separate macro · 4009e0bb

Sean Christopherson authored Feb 04, 2023

Extract the guts of kvm_hypercall() to a macro so that Xen hypercalls,
which have a different register ABI, can reuse the VMCALL vs. VMMCALL
logic.

No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230204024151.1373296-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4009e0bb

KVM: SVM: WARN if GATag generation drops VM or vCPU ID information · c281794e

Sean Christopherson authored Feb 07, 2023

WARN if generating a GATag given a VM ID and vCPU ID doesn't yield the
same IDs when pulling the IDs back out of the tag. Don't bother adding
error handling to callers, this is very much a paranoid sanity check as
KVM fully controls the VM ID and is supposed to reject too-big vCPU IDs.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20230207002156.521736-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c281794e

KVM: SVM: Modify AVIC GATag to support max number of 512 vCPUs · 59997159

Suravee Suthikulpanit authored Feb 07, 2023

Define AVIC_VCPU_ID_MASK based on AVIC_PHYSICAL_MAX_INDEX, i.e. the mask
that effectively controls the largest guest physical APIC ID supported by
x2AVIC, instead of hardcoding the number of bits to 8 (and the number of
VM bits to 24).

The AVIC GATag is programmed into the AMD IOMMU IRTE to provide a
reference back to KVM in case the IOMMU cannot inject an interrupt into a
non-running vCPU. In such a case, the IOMMU notifies software by creating
a GALog entry with the corresponded GATag, and KVM then uses the GATag to
find the correct VM+vCPU to kick. Dropping bit 8 from the GATag results
in kicking the wrong vCPU when targeting vCPUs with x2APIC ID > 255.

Fixes: 4d1d7942 ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Cc: stable@vger.kernel.org
Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20230207002156.521736-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

59997159

KVM: SVM: Fix a benign off-by-one bug in AVIC physical table mask · 3ec7a1b2

Sean Christopherson authored Feb 07, 2023

Define the "physical table max index mask" as bits 8:0, not 9:0.  x2AVIC
currently supports a max of 512 entries, i.e. the max index is 511, and
the inputs to GENMASK_ULL() are inclusive.  The bug is benign as bit 9 is
reserved and never set by KVM, i.e. KVM is just clearing bits that are
guaranteed to be zero.

Note, as of this writing, APM "Rev. 3.39-October 2022" incorrectly states
that bits 11:8 are reserved in Table B-1. VMCB Layout, Control Area.  I.e.
that table wasn't updated when x2AVIC support was added.

Opportunistically fix the comment for the max AVIC ID to align with the
code, and clean up comment formatting too.

Fixes: 4d1d7942 ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Cc: stable@vger.kernel.org
Cc: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20230207002156.521736-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3ec7a1b2

selftests: KVM: skip hugetlb tests if huge pages are not available · 3dc40cf8

Paolo Bonzini authored Mar 14, 2023

Right now, if KVM memory stress tests are run with hugetlb sources but hugetlb is
not available (either in the kernel or because /proc/sys/vm/nr_hugepages is 0)
the test will fail with a memory allocation error.

This makes it impossible to add tests that default to hugetlb-backed memory,
because on a machine with a default configuration they will fail. Therefore,
check HugePages_Total as well and, if zero, direct the user to enable hugepages
in procfs. Furthermore, return KSFT_SKIP whenever hugetlb is not available.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3dc40cf8

KVM: VMX: Use tabs instead of spaces for indentation · 53293cb8

Rong Tao authored Dec 21, 2022

Code indentation should use tabs where possible and miss a '*'.
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Message-Id: <tencent_A492CB3F9592578451154442830EA1B02C07@qq.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

53293cb8

KVM: VMX: Fix indentation coding style issue · 06e18547

Rong Tao authored Dec 21, 2022

Code indentation should use tabs where possible.
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Message-Id: <tencent_31E6ACADCB6915E157CF5113C41803212107@qq.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

06e18547

KVM: nVMX: remove unnecessary #ifdef · 77900bff

Paolo Bonzini authored Mar 14, 2023

nested_vmx_check_controls() has already run by the time KVM checks host state,
so the "host address space size" exit control can only be set on x86-64 hosts.
Simplify the condition at the cost of adding some dead code to 32-bit kernels.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

77900bff