- 09 Nov, 2022 40 commits
-
Peter Xu authored
Add a new "interruptible" flag indicating that the caller is willing to be interrupted by signals during the __gfn_to_pfn_memslot() request. Wire it up with the FOLL_INTERRUPTIBLE flag that we've just introduced. This prepares KVM to be able to respond to SIGUSR1 (for QEMU, that's the SIGIPI) even while e.g. handling a userfaultfd page fault. No functional change intended. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221011195809.557016-4-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
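Roughly, the wiring looks like this inside the slow path used by __gfn_to_pfn_memslot() (a fragment for illustration only; variable names are approximate, not the exact diff):

    unsigned int flags = FOLL_HWPOISON;

    if (write_fault)
            flags |= FOLL_WRITE;
    if (interruptible)
            flags |= FOLL_INTERRUPTIBLE;

    npages = get_user_pages_unlocked(addr, 1, &page, flags);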
-
Peter Xu authored
Add a new pfn error to indicate that a signal is pending during the hva_to_pfn_slow() procedure (i.e. the slow path returned -EINTR). Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221011195809.557016-3-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
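For illustration, such an error value would extend the existing KVM_PFN_ERR_* family along these lines (the encoding and helper shown here are assumptions, not necessarily the literal patch):

    /* Assumed encoding, following the existing KVM_PFN_ERR_* pattern. */
    #define KVM_PFN_ERR_SIGPENDING  (KVM_PFN_ERR_MASK + 3)

    static inline bool is_sigpending_pfn(kvm_pfn_t pfn)
    {
            return pfn == KVM_PFN_ERR_SIGPENDING;
    }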
-
Peter Xu authored
We have had FAULT_FLAG_INTERRUPTIBLE for a while, but it was never applied to GUPs. One issue with it is that not all GUP paths are able to handle signal delivery besides SIGKILL. That's not ideal for GUP users that are actually able to handle these cases, like KVM. KVM uses GUP extensively when faulting in guest pages, and it already has the infrastructure to retry a page fault at a later time. Allowing GUP to be interrupted by generic signals can make KVM-related threads much more responsive. For example: (1) SIGUSR1, which QEMU/KVM uses to deliver an inter-process IPI: e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be generated to kick the vcpus out of kernel context immediately; (2) SIGINT, which lets interactive hypervisor users stop a virtual machine with Ctrl-C without any delays/hangs; (3) SIGTRAP, which keeps GDB usable even during page faults that are stuck for a long time. Normally the hypervisor is able to receive these signals properly, but not if it is stuck in a GUP for a long time for whatever reason. That happens easily with a stuck postcopy migration, e.g. after a transient network failure, where some vcpu threads can hang forever waiting for the pages. With the new FOLL_INTERRUPTIBLE, GUP users like KVM can selectively enable the ability to trap these signals. Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20221011195809.557016-2-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
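A minimal sketch of a GUP user opting in (illustrative only; the helper below is hypothetical, not KVM code):

    static long pin_guest_page(unsigned long hva, struct page **page)
    {
            unsigned int flags = FOLL_WRITE | FOLL_INTERRUPTIBLE;
            long npages;

            npages = get_user_pages_unlocked(hva, 1, page, flags);
            if (npages == -EINTR)
                    return -EINTR;  /* non-fatal signal pending: let the caller retry later */

            return npages;
    }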
-
Maxim Levitsky authored
When #SMI is asserted, the CPU can be in an interrupt shadow due to sti or mov ss. Neither the Intel nor the AMD PRM mandates that #SMI be blocked during the shadow, and on top of that, since neither SVM nor VMX has true support for an SMI window, waiting for one instruction would mean single-stepping the guest. Instead, allow #SMI in this case, but both reset the interrupt shadow and stash its value in SMRAM so it can be restored on exit from SMM. This fixes rare failures seen mostly on Windows guests on VMX, when #SMI falls on the sti instruction, which manifests as a VM-entry failure because EFLAGS.IF is not set but the STI interrupt window is still set in the VMCS. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221025124741.228045-24-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
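The mechanism is roughly the following (a sketch; the SMRAM field name is an assumption, the static calls are the existing interrupt-shadow hooks):

    /* On SMM entry: remember the shadow in SMRAM, then clear it. */
    smram->int_shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
    static_call(kvm_x86_set_interrupt_shadow)(vcpu, 0);

    /* On RSM: put the saved shadow back. */
    static_call(kvm_x86_set_interrupt_shadow)(vcpu, smram->int_shadow);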
-
Maxim Levitsky authored
When the guest CPUID doesn't have support for long mode, the 32 bit SMRAM layout is used and it has no support for preserving EFER and/or SVM state. Note that this isn't relevant to running 32 bit guests on a VM which is long mode capable - such a VM can still run 32 bit guests in compatibility mode. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221025124741.228045-23-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Maxim Levitsky authored
Use the SMM structs in the SVM code as well; this removes the last user of put_smstate/GET_SMSTATE, so remove those macros as well. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221025124741.228045-22-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Maxim Levitsky authored
If kvm_vcpu_map() returns a non-zero value, the error path should be triggered regardless of the exact error value returned. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221025124741.228045-21-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
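In code, the intended pattern is simply the following (a sketch; "gpa" and the error label are placeholders, not the exact call site):

    struct kvm_host_map map;

    /* Any non-zero return from kvm_vcpu_map() is a failure. */
    if (kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), &map))
            goto error;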
-
Maxim Levitsky authored
Use kvm_smram_state_64 struct to save/restore the 64 bit SMM state (used when X86_FEATURE_LM is present in the guest CPUID, regardless of 32-bitness of the guest). Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221025124741.228045-20-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Maxim Levitsky authored
Use kvm_smram_state_32 struct to save/restore 32 bit SMM state (used when X86_FEATURE_LM is not present in the guest CPUID). Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221025124741.228045-19-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Maxim Levitsky authored
Use the kvm_smram union instead of raw arrays in the common SMM code. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221025124741.228045-18-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Maxim Levitsky authored
Add structs that will be used to define and read/write KVM's SMRAM layout, instead of reading/writing to raw offsets. Also document the differences between KVM's SMRAM layout and the SMRAM layout used by real Intel/AMD CPUs. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221025124741.228045-17-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
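The resulting top-level type is, roughly, a 512-byte union over the two layouts (a sketch; the two state structs are the ones introduced here, with their field lists omitted):

    union kvm_smram {
            struct kvm_smram_state_64 smram64;
            struct kvm_smram_state_32 smram32;
            u8 bytes[512];
    };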
-
Maxim Levitsky authored
In the rare case of a failure on SMM entry, KVM should at least terminate the VM instead of going south. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221025124741.228045-16-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
The hidden processor flags HF_SMM_MASK and HF_SMM_INSIDE_NMI_MASK are not needed if CONFIG_KVM_SMM is turned off. Remove the definitions altogether and the code that uses them. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
This allows making some fields optional, as will be the case soon for SMM-related data. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
This ensures that all the relevant code is compiled out; in fact, the process_smi stub can be removed too. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220929172016.319443-9-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
If CONFIG_KVM_SMM is not defined HF_SMM_MASK will always be zero, and we can spare userspace the hassle of setting up the SMRAM address space simply by reporting that only one address space is supported. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220929172016.319443-8-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Vendor-specific code that deals with SMI injection and saving/restoring SMM state is not needed if CONFIG_KVM_SMM is disabled, so remove the four callbacks smi_allowed, enter_smm, leave_smm and enable_smi_window. The users in svm/nested.c and x86.c also have to be compiled out; the amount of #ifdef'ed code is small and it's not worth moving it to smm.c. enter_smm is now used only within #ifdef CONFIG_KVM_SMM, and the stub can therefore be removed. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220929172016.319443-7-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
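The resulting pattern is roughly the one below (a sketch; signatures are approximate and the real kvm_x86_ops has many more members):

    struct kvm_x86_ops {
            /* ... */
    #ifdef CONFIG_KVM_SMM
            int (*smi_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
            int (*enter_smm)(struct kvm_vcpu *vcpu, char *smstate);
            int (*leave_smm)(struct kvm_vcpu *vcpu, const char *smstate);
            void (*enable_smi_window)(struct kvm_vcpu *vcpu);
    #endif
            /* ... */
    };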
-
Paolo Bonzini authored
Some users of KVM implement the UEFI variable store through a paravirtual device that does not require the "SMM lockbox" component of edk2; allow them to compile out system management mode, which is not a full implementation especially in how it interacts with nested virtualization. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220929172016.319443-6-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Now that RSM is implemented in a single emulator callback, there is no point in going through other callbacks for the sake of modifying processor state. Just invoke KVM's own internal functions directly, and remove the callbacks that were only used by em_rsm; the only substantial difference is in the handling of the segment registers and descriptor cache, which have to be parsed into a struct kvm_segment instead of a struct desc_struct. This also fixes a bug where emulator_set_segment was shifting the limit left by 12 if the G bit was set, even though the limit had not been shifted right upon entry to SMM. The emulator context is still used to restore EIP and the general purpose registers. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220929172016.319443-5-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Some users of KVM implement the UEFI variable store through a paravirtual device that does not require the "SMM lockbox" component of edk2, and would like to compile out system management mode. In preparation for that, move the SMM exit code out of emulate.c and into a new file. The code is still written as a series of invocations of the emulator callbacks, but the two exiting_smm and leave_smm callbacks are merged into one, and all the code from em_rsm is now part of the callback. This removes all knowledge of the format of the SMM save state area from the emulator. Further patches will clean up the code and invoke KVM's own functions to access control registers, descriptor caches, etc. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220929172016.319443-4-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Some users of KVM implement the UEFI variable store through a paravirtual device that does not require the "SMM lockbox" component of edk2, and would like to compile out system management mode. In preparation for that, move the SMM entry code out of x86.c and into a new file. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220929172016.319443-3-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Create a new header and source with code related to system management mode emulation. Entry and exit will move there too; for now, opportunistically rename put_smstate to PUT_SMSTATE while moving it to smm.h, and adjust the SMM state saving code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220929172016.319443-2-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
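For context, the macros being renamed/moved are roughly of this shape (approximate; the 0x7e00 bias reflects where the state-save fields start within the SMRAM page):

    #define GET_SMSTATE(type, buf, offset)          \
            (*(type *)((buf) + (offset) - 0x7e00))

    #define PUT_SMSTATE(type, buf, offset, val)     \
            *(type *)((buf) + (offset) - 0x7e00) = (val)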
-
Carlos Bilbao authored
Rename the reserved fields of all structs in arch/x86/include/asm/svm.h after their offset within the structs. Add compile-time checks for this in the same place where the other BUILD_BUG_ON checks for these structs are. This also addresses the fact that the fields of struct sev_es_save_area are named by their order of appearance, yet currently jump from reserved_5 to reserved_7. Link: https://lkml.org/lkml/2022/10/22/376 Signed-off-by: Carlos Bilbao <carlos.bilbao@amd.com> Message-Id: <20221024164448.203351-1-carlos.bilbao@amd.com> [Use ASSERT_STRUCT_OFFSET + fix a couple wrong offsets. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Maxim Levitsky authored
ASSERT_STRUCT_OFFSET allows asserting at build time that a field in a struct has the expected offset. KVM already has such a macro, but there is almost nothing KVM-specific in it, so move it to build_bug.h so that it can be used in other places in KVM. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221025124741.228045-10-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
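Something along these lines (a sketch of the macro; the real one may use BUILD_BUG_ON_MSG with a custom error string):

    /* Fail the build if @field is not at @offset bytes from the start of @type. */
    #define ASSERT_STRUCT_OFFSET(type, field, offset)       \
            BUILD_BUG_ON(offsetof(type, field) != (offset))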
-
Rafael Mendonca authored
Presumably, this was introduced due to a conflict resolution with commit ef68017e ("x86/kvm: Handle async page faults directly through do_page_fault()"), given that the last posted version [1] of the blamed commit was not based on the aforementioned commit. [1] https://lore.kernel.org/kvm/20200525144125.143875-9-vkuznets@redhat.com/ Fixes: b1d40575 ("KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery") Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> Message-Id: <20221021020113.922027-1-rafaelmendsr@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jim Mattson authored
Intel and AMD have separate CPUID bits for each SPEC_CTRL bit. In the case of every bit other than PSFD, the Intel CPUID bit has no vendor name qualifier, but the AMD CPUID bit does. For consistency, rename KVM_X86_FEATURE_PSFD to KVM_X86_FEATURE_AMD_PSFD. No functional change intended. Signed-off-by: Jim Mattson <jmattson@google.com> Cc: Babu Moger <Babu.Moger@amd.com> Message-Id: <20220830225210.2381310-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Miaohe Lin authored
Use helper macro SPTE_ENT_PER_PAGE to get the number of spte entries per page. Minor readability improvement. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220913085452.25561-1-linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Miaohe Lin authored
Fix some typos in comments. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220913091725.35953-1-linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Miaohe Lin authored
There's no caller. Remove it. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220913090537.25195-1-linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Use kvm_caps.supported_perf_cap directly instead of bouncing through kvm_get_msr_feature() when checking the incoming value for writes to PERF_CAPABILITIES. Note, kvm_get_msr_feature() is guaranteed to succeed when getting PERF_CAPABILITIES, i.e. dropping that check is a nop. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221006000314.73240-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
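As a rough illustration of the write-side check (a sketch, not the literal diff; the surrounding MSR-write switch is omitted):

    /* Reject PERF_CAPABILITIES bits that KVM does not support. */
    if (msr == MSR_IA32_PERF_CAPABILITIES &&
        (data & ~kvm_caps.supported_perf_cap))
            return 1;       /* -> #GP for the guest */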
-
Sean Christopherson authored
Handle PERF_CAPABILITIES directly in kvm_get_msr_feature() now that the supported value is available in kvm_caps. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221006000314.73240-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Initialize vcpu->arch.perf_capabilities in x86's kvm_arch_vcpu_create() instead of deferring initialization to vendor code. For better or worse, common x86 handles reads and writes to the MSR, and so common x86 should also handle initializing the MSR. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221006000314.73240-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
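Conceptually this amounts to a single assignment in kvm_arch_vcpu_create() (a sketch of the idea, not the exact hunk):

    /* Default to everything KVM supports; userspace can narrow this later. */
    vcpu->arch.perf_capabilities = kvm_caps.supported_perf_cap;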
-
Sean Christopherson authored
Track KVM's supported PERF_CAPABILITIES in kvm_caps instead of computing the supported capabilities on the fly every time. Using kvm_caps will also allow for future cleanups as the kvm_caps values can be used directly in common x86 code. Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Like Xu <likexu@tencent.com> Message-Id: <20221006000314.73240-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Drop the return value from x86_perf_get_lbr() and have the stub zero out the @lbr structure instead of returning -1 to indicate "no LBR support". KVM doesn't actually check the return value, and instead subtly relies on zeroing the number of LBRs in intel_pmu_init(). Formalize "nr=0 means unsupported" so that KVM doesn't need to add a pointless check on the return value to fix KVM's benign bug. Note, the stub is necessary even though KVM x86 selects PERF_EVENTS and the caller exists only when CONFIG_KVM_INTEL=y. Despite the name, KVM_INTEL doesn't strictly require CPU_SUP_INTEL; it can be built with any of INTEL || CENTAUR || ZHAOXIN CPUs. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221006000314.73240-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
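A sketch of what the stub looks like once the return value is gone (zeroing is enough because nr == 0 already encodes "no LBRs"):

    /* Stub used when the perf LBR code is not available. */
    static inline void x86_perf_get_lbr(struct x86_pmu_lbr *lbr)
    {
            memset(lbr, 0, sizeof(*lbr));   /* lbr->nr == 0 means "no LBR support" */
    }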
-
Paolo Bonzini authored
Merge tag 'kvm-s390-master-6.1-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD A PCI allocation fix and a PV clock fix.
-
Like Xu authored
The AMD PerfMonV2 specification allows for a maximum of 16 GP counters, but currently only 6 pairs of MSRs are accepted by KVM. While AMD64_NUM_COUNTERS_CORE is already equal to 6, increasing it without adjusting msrs_to_save_all[] could result in out-of-bounds accesses. Therefore introduce a macro (named KVM_AMD_PMC_MAX_GENERIC) to refer to the number of counters supported by KVM. Signed-off-by: Like Xu <likexu@tencent.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20220919091008.60695-3-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Like Xu authored
The Intel architectural IA32_PMCx MSR address range allows for a maximum of 8 GP counters, and KVM cannot address any more. Introduce a local macro (named KVM_INTEL_PMC_MAX_GENERIC) and use it consistently to refer to the number of counters supported by KVM, thus avoiding possible out-of-bounds accesses. Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Like Xu <likexu@tencent.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20220919091008.60695-2-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
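Together with the AMD change above, the new bounds boil down to two small macros used as array and loop limits (a sketch; the exact definitions and their home headers are whatever the patches introduce):

    #define KVM_INTEL_PMC_MAX_GENERIC       8       /* IA32_PMC0..IA32_PMC7, MSRs starting at 0xC1 */
    #define KVM_AMD_PMC_MAX_GENERIC         6       /* core PerfCtl/PerfCtr pairs KVM enumerates today */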
-
Like Xu authored
The SDM lists an architectural MSR IA32_CORE_CAPABILITIES (0xCF) that limits the theoretical maximum value of the Intel GP PMC MSRs allocated at 0xC1 to 14; likewise, the Intel April 2022 SDM adds IA32_OVERCLOCKING_STATUS at 0x195, which limits the number of event selection MSRs to 15 (0x186-0x194). Limiting the maximum number of counters to 14 or 18 based on the currently allocated MSRs is clearly fragile, and it seems likely that Intel will eventually place PMCs 8-15 at a completely different range of MSR indices. So stop at the maximum number of GP PMCs supported today on Intel processors. There are some machines, like Intel P4 with a non-architectural PMU, that may indeed have 18 counters, but those counters are in a completely different MSR address range and are not supported by KVM. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: stable@vger.kernel.org Fixes: cf05a67b ("KVM: x86: omit "impossible" pmu MSRs from MSR list") Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Like Xu <likexu@tencent.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20220919091008.60695-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Peter Gonda authored
Explicitly print the VMSA dump at the KERN_DEBUG log level. KERN_CONT uses KERN_DEFAULT if the previous log line ends with a newline, i.e. if there is nothing to continue, and as a result the VMSA gets dumped when it shouldn't be. The KERN_CONT documentation says it defaults back to KERN_DEFAULT if the previous log line has a newline. So switch from KERN_CONT to print_hex_dump_debug(). Jarkko pointed this out in reference to the original patch. See: https://lore.kernel.org/all/YuPMeWX4uuR1Tz3M@kernel.org/ print_hex_dump(KERN_DEBUG, ...) was suggested there, but print_hex_dump_debug() should behave similarly. Fixes: 6fac42f1 ("KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klog") Signed-off-by: Peter Gonda <pgonda@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Harald Hoyer <harald@profian.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Message-Id: <20221104142220.469452-1-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
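For reference, the replacement call is along these lines (a sketch; "save" stands for the VMSA pointer being dumped, and the prefix/row parameters are illustrative):

    /* Dump the decrypted VMSA at debug log level, 16 bytes per line. */
    print_hex_dump_debug("vmsa: ", DUMP_PREFIX_OFFSET, 16, 1,
                         save, sizeof(*save), false);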
-
Rong Tao authored
Update EXIT_REASONS from source, including VMX_EXIT_REASONS, SVM_EXIT_REASONS, AARCH64_EXIT_REASONS, USERSPACE_EXIT_REASONS. Signed-off-by: Rong Tao <rongtao@cestc.cn> Message-Id: <tencent_00082C8BFA925A65E11570F417F1CD404505@qq.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-