- 24 Jan, 2020 20 commits
-
-
Sean Christopherson authored
Remove the superfluous kvm_arch_vcpu_free() as it is no longer called from commmon KVM code. Note, kvm_arch_vcpu_destroy() *is* called from common code, i.e. choosing which function to whack is not completely arbitrary. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Remove the superfluous kvm_arch_vcpu_free() as it is no longer called from commmon KVM code. Note, kvm_arch_vcpu_destroy() *is* called from common code, i.e. choosing which function to whack is not completely arbitrary. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Remove the superfluous kvm_arch_vcpu_free() as it is no longer called from commmon KVM code. Note, kvm_arch_vcpu_destroy() *is* called from common code, i.e. choosing which function to whack is not completely arbitrary. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
For reasons unknown, MIPS configures the vCPU allocation cache but allocates vCPUs via kzalloc(). Allocate from the vCPU cache in preparation for moving vCPU allocation to common KVM code. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Move the kvm_cpu_{un}init() calls to common PPC code as an intermediate step towards removing kvm_cpu_{un}init() altogether. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Move the initialization of oldpir so that the call to kvm_vcpu_init() is at the top of kvmppc_core_vcpu_create_e500mc(). oldpir is only use when loading/putting a vCPU, which currently cannot be done until after kvm_arch_vcpu_create() completes. Reording the call to kvm_vcpu_init() paves the way for moving the invocation to common PPC code. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Call kvm_vcpu_init() in kvmppc_core_vcpu_create_pr() prior to allocating the book3s and shadow_vcpu objects in preparation of moving said call to common PPC code. Although kvm_vcpu_init() has an arch callback, the callback is empty for Book3S PR, i.e. barring unseen black magic, moving the allocation has no real functional impact. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Move allocation of all flavors of PPC vCPUs to common PPC code. All variants either allocate 'struct kvm_vcpu' directly, or require that the embedded 'struct kvm_vcpu' member be located at offset 0, i.e. guarantee that the allocation can be directly interpreted as a 'struct kvm_vcpu' object. Remove the message from the build-time assertion regarding placement of the struct, as compatibility with the arch usercopy region is no longer the sole dependent on 'struct kvm_vcpu' being at offset zero. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
In preparation for moving vcpu allocation to common PPC code, add an explicit, albeit redundant, build-time assert to ensure the vcpu member is located at offset 0. The assert is redundant in the sense that kvmppc_core_vcpu_create_e500() contains a functionally identical assert. The motiviation for adding the extra assert is to provide visual confirmation of the correctness of moving vcpu allocation to common code. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Move the kvm_cpu_{un}init() calls to common x86 code as an intermediate step to removing kvm_cpu_{un}init() altogether. Note, VMX'x alloc_apic_access_page() and init_rmode_identity_map() are per-VM allocations and are intentionally kept if vCPU creation fails. They are freed by kvm_arch_destroy_vm(). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Allocate the pio_data page after creating the MMU and local APIC so that all direct memory allocations are grouped together. This allows setting the return value to -ENOMEM prior to starting the allocations instead of setting it in the fail path for every allocation. The pio_data page is only consumed when KVM_RUN is invoked, i.e. moving its allocation has no real functional impact. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
The allocation of FPU structs is identical across VMX and SVM, move it to common x86 code. Somewhat arbitrarily place the allocation so that it resides directly above the associated initialization via fx_init(), e.g. instead of retaining its position with respect to the overall vcpu creation flow. Although the names names kvm_arch_vcpu_create() and kvm_arch_vcpu_init() might suggest otherwise, x86 does not have a clean split between 'create' and 'init'. Allocating the struct immediately prior to the first use arguably improves readability *now*, and will yield even bigger improvements when kvm_arch_vcpu_init() is removed in a future patch. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Move allocation of VMX and SVM vcpus to common x86. Although the struct being allocated is technically a VMX/SVM struct, it can be interpreted directly as a 'struct kvm_vcpu' because of the pre-existing requirement that 'struct kvm_vcpu' be located at offset zero of the arch/vendor vcpu struct. Remove the message from the build-time assertions regarding placement of the struct, as compatibility with the arch usercopy region is no longer the sole dependent on 'struct kvm_vcpu' being at offset zero. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Capture the vcpu pointer in a local varaible and replace '&svm->vcpu' references with a direct reference to the pointer in anticipation of moving bits of the code to common x86 and passing the vcpu pointer into svm_create_vcpu(), i.e. eliminate unnecessary noise from future patches. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Capture the vcpu pointer in a local varaible and replace '&vmx->vcpu' references with a direct reference to the pointer in anticipation of moving bits of the code to common x86 and passing the vcpu pointer into vmx_create_vcpu(), i.e. eliminate unnecessary noise from future patches. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Do VPID allocation after calling the common kvm_vcpu_init() as a step towards doing vCPU allocation (via kmem_cache_zalloc()) and calling kvm_vcpu_init() back-to-back. Squishing allocation and initialization together will eventually allow the sequence to be moved to arch-agnostic creation code. Note, the VPID is not consumed until KVM_RUN, slightly delaying its allocation should have no real function impact. VPID allocation was arbitrarily placed in the original patch, commit 2384d2b3 ("KVM: VMX: Enable Virtual Processor Identification (VPID)"). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Free the vCPU's wbinvd_dirty_mask if vCPU creation fails after kvm_arch_vcpu_init(), e.g. when installing the vCPU's file descriptor. Do the freeing by calling kvm_arch_vcpu_free() instead of open coding the freeing. This adds a likely superfluous, but ultimately harmless, call to kvmclock_reset(), which only clears vcpu->arch.pv_time_enabled. Using kvm_arch_vcpu_free() allows for additional cleanup in the future. Fixes: f5f48ee1 ("KVM: VMX: Execute WBINVD to keep data consistency with assigned devices") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Explicitly free the shared page if kvmppc_mmu_init() fails during kvmppc_core_vcpu_create(), as the page is freed only in kvmppc_core_vcpu_free(), which is not reached via kvm_vcpu_uninit(). Fixes: 96bc451a ("KVM: PPC: Introduce shared page") Cc: stable@vger.kernel.org Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Call kvm_vcpu_uninit() if vcore creation fails to avoid leaking any resources allocated by kvm_vcpu_init(), i.e. the vcpu->run page. Fixes: 371fefd6 ("KVM: PPC: Allow book3s_hv guests to use SMT processor modes") Cc: stable@vger.kernel.org Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
If the guest is configured to have SPEC_CTRL but the host does not (which is a nonsensical configuration but these are not explicitly forbidden) then a host-initiated MSR write can write vmx->spec_ctrl (respectively svm->spec_ctrl) and trigger a #GP when KVM tries to restore the host value of the MSR. Add a more comprehensive check for valid bits of SPEC_CTRL, covering host CPUID flags and, since we are at it and it is more correct that way, guest CPUID flags too. For AMD, remove the unnecessary is_guest_mode check around setting the MSR interception bitmap, so that the code looks the same as for Intel. Cc: Jim Mattson <jmattson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 23 Jan, 2020 4 commits
-
-
Paolo Bonzini authored
The wrappers make it less clear that the position of the call to kvm_arch_async_page_present depends on the architecture, and that only one of the two call sites will actually be active. Remove them. Cc: Andy Lutomirski <luto@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Even if it's read-only, it can still be written to by userspace. Let them know by adding it to KVM_GET_MSR_INDEX_LIST. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Gavin Shan authored
The filter name is fixed to "exit_reason" for some kvm_exit events, no matter what architect we have. Actually, the filter name ("exit_reason") is only applicable to x86, meaning it's broken on other architects including aarch64. This fixes the issue by providing various kvm_exit filter names, depending on architect we're on. Afterwards, the variable filter name is picked and applied through ioctl(fd, SET_FILTER). Reported-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Gavin Shan <gshan@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
The SPTE_MMIO_MASK overlaps with the bits used to track MMIO generation number. A high enough generation number would overwrite the SPTE_SPECIAL_MASK region and cause the MMIO SPTE to be misinterpreted. Likewise, setting bits 52 and 53 would also cause an incorrect generation number to be read from the PTE, though this was partially mitigated by the (useless if it weren't for the bug) removal of SPTE_SPECIAL_MASK from the spte in get_mmio_spte_generation. Drop that removal, and replace it with a compile-time assertion. Fixes: 6eeb4ef0 ("KVM: x86: assign two bits to track SPTE kinds") Reported-by: Ben Gardon <bgardon@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 21 Jan, 2020 16 commits
-
-
Milan Pandurov authored
We can store reference to kvm_stats_debugfs_item instead of copying its values to kvm_stat_data. This allows us to remove duplicated code and usage of temporary kvm_stat_data inside vm_stat_get et al. Signed-off-by: Milan Pandurov <milanpa@amazon.de> Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Remove the bogus 64-bit only condition from the check that disables MMIO spte optimization when the system supports the max PA, i.e. doesn't have any reserved PA bits. 32-bit KVM always uses PAE paging for the shadow MMU, and per Intel's SDM: PAE paging translates 32-bit linear addresses to 52-bit physical addresses. The kernel's restrictions on max physical addresses are limits on how much memory the kernel can reasonably use, not what physical addresses are supported by hardware. Fixes: ce88decf ("KVM: MMU: mmio page fault support") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Miaohe Lin authored
In case writing to vmread destination operand result in a #PF, vmread should not call nested_vmx_succeed() to set rflags to specify success. Similar to as done in VMPTRST (See handle_vmptrst()). Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: stable@vger.kernel.org Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Rework the handling of nEPT's bad memtype/XWR checks to micro-optimize the checks as much as possible. Move the check to a separate helper, __is_bad_mt_xwr(), which allows the guest_rsvd_check usage in paging_tmpl.h to omit the check entirely for paging32/64 (bad_mt_xwr is always zero for non-nEPT) while retaining the bitwise-OR of the current code for the shadow_zero_check in walk_shadow_page_get_mmio_spte(). Add a comment for the bitwise-OR usage in the mmio spte walk to avoid future attempts to "fix" the code, which is what prompted this optimization in the first place[*]. Opportunistically remove the superfluous '!= 0' and parantheses, and use BIT_ULL() instead of open coding its equivalent. The net effect is that code generation is largely unchanged for walk_shadow_page_get_mmio_spte(), marginally better for ept_prefetch_invalid_gpte(), and significantly improved for paging32/64_prefetch_invalid_gpte(). Note, walk_shadow_page_get_mmio_spte() can't use a templated version of the memtype/XRW as it works on the host's shadow PTEs, e.g. checks that KVM hasn't borked its EPT tables. Even if it could be templated, the benefits of having a single implementation far outweight the few uops that would be saved for NPT or non-TDP paging, e.g. most compilers inline it all the way to up kvm_mmu_page_fault(). [*] https://lkml.kernel.org/r/20200108001859.25254-1-sean.j.christopherson@intel.com Cc: Jim Mattson <jmattson@google.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Move the !PRESENT and !ACCESSED checks in FNAME(prefetch_invalid_gpte) above the call to is_rsvd_bits_set(). For a well behaved guest, the !PRESENT and !ACCESSED are far more likely to evaluate true than the reserved bit checks, and they do not require additional memory accesses. Before: Dump of assembler code for function paging32_prefetch_invalid_gpte: 0x0000000000044240 <+0>: callq 0x44245 <paging32_prefetch_invalid_gpte+5> 0x0000000000044245 <+5>: mov %rcx,%rax 0x0000000000044248 <+8>: shr $0x7,%rax 0x000000000004424c <+12>: and $0x1,%eax 0x000000000004424f <+15>: lea 0x0(,%rax,4),%r8 0x0000000000044257 <+23>: add %r8,%rax 0x000000000004425a <+26>: mov %rcx,%r8 0x000000000004425d <+29>: and 0x120(%rsi,%rax,8),%r8 0x0000000000044265 <+37>: mov 0x170(%rsi),%rax 0x000000000004426c <+44>: shr %cl,%rax 0x000000000004426f <+47>: and $0x1,%eax 0x0000000000044272 <+50>: or %rax,%r8 0x0000000000044275 <+53>: jne 0x4427c <paging32_prefetch_invalid_gpte+60> 0x0000000000044277 <+55>: test $0x1,%cl 0x000000000004427a <+58>: jne 0x4428a <paging32_prefetch_invalid_gpte+74> 0x000000000004427c <+60>: mov %rdx,%rsi 0x000000000004427f <+63>: callq 0x44080 <drop_spte> 0x0000000000044284 <+68>: mov $0x1,%eax 0x0000000000044289 <+73>: retq 0x000000000004428a <+74>: xor %eax,%eax 0x000000000004428c <+76>: and $0x20,%ecx 0x000000000004428f <+79>: jne 0x44289 <paging32_prefetch_invalid_gpte+73> 0x0000000000044291 <+81>: mov %rdx,%rsi 0x0000000000044294 <+84>: callq 0x44080 <drop_spte> 0x0000000000044299 <+89>: mov $0x1,%eax 0x000000000004429e <+94>: jmp 0x44289 <paging32_prefetch_invalid_gpte+73> End of assembler dump. After: Dump of assembler code for function paging32_prefetch_invalid_gpte: 0x0000000000044240 <+0>: callq 0x44245 <paging32_prefetch_invalid_gpte+5> 0x0000000000044245 <+5>: test $0x1,%cl 0x0000000000044248 <+8>: je 0x4424f <paging32_prefetch_invalid_gpte+15> 0x000000000004424a <+10>: test $0x20,%cl 0x000000000004424d <+13>: jne 0x4425d <paging32_prefetch_invalid_gpte+29> 0x000000000004424f <+15>: mov %rdx,%rsi 0x0000000000044252 <+18>: callq 0x44080 <drop_spte> 0x0000000000044257 <+23>: mov $0x1,%eax 0x000000000004425c <+28>: retq 0x000000000004425d <+29>: mov %rcx,%rax 0x0000000000044260 <+32>: mov (%rsi),%rsi 0x0000000000044263 <+35>: shr $0x7,%rax 0x0000000000044267 <+39>: and $0x1,%eax 0x000000000004426a <+42>: lea 0x0(,%rax,4),%r8 0x0000000000044272 <+50>: add %r8,%rax 0x0000000000044275 <+53>: mov %rcx,%r8 0x0000000000044278 <+56>: and 0x120(%rsi,%rax,8),%r8 0x0000000000044280 <+64>: mov 0x170(%rsi),%rax 0x0000000000044287 <+71>: shr %cl,%rax 0x000000000004428a <+74>: and $0x1,%eax 0x000000000004428d <+77>: mov %rax,%rcx 0x0000000000044290 <+80>: xor %eax,%eax 0x0000000000044292 <+82>: or %rcx,%r8 0x0000000000044295 <+85>: je 0x4425c <paging32_prefetch_invalid_gpte+28> 0x0000000000044297 <+87>: mov %rdx,%rsi 0x000000000004429a <+90>: callq 0x44080 <drop_spte> 0x000000000004429f <+95>: mov $0x1,%eax 0x00000000000442a4 <+100>: jmp 0x4425c <paging32_prefetch_invalid_gpte+28> End of assembler dump. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Tom Lendacky authored
The KVM MMIO support uses bit 51 as the reserved bit to cause nested page faults when a guest performs MMIO. The AMD memory encryption support uses a CPUID function to define the encryption bit position. Given this, it is possible that these bits can conflict. Use svm_hardware_setup() to override the MMIO mask if memory encryption support is enabled. Various checks are performed to ensure that the mask is properly defined and rsvd_bits() is used to generate the new mask (as was done prior to the change that necessitated this patch). Fixes: 28a1f3ac ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs") Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Miaohe Lin authored
The function nested_vmx_prepare_msr_bitmap() declaration is below its implementation. So this is meaningless and should be removed. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Rename bit() to __feature_bit() to give it a more descriptive name, and add a macro, feature_bit(), to stuff the X68_FEATURE_ prefix to keep line lengths manageable for code that hardcodes the bit to be retrieved. No functional change intended. Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Add build-time checks to ensure KVM isn't trying to do a reverse CPUID lookup on Linux-defined feature bits, along with comments to explain the gory details of X86_FEATUREs and bit(). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Add an entry for CPUID_7_1_EAX in the reserve_cpuid array in preparation for incorporating the array in bit() build-time assertions, specifically to avoid an assertion on F(AVX512_BF16) in do_cpuid_7_mask(). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Move bit() to cpuid.h in preparation for incorporating the reverse_cpuid array in bit() build-time assertions. Opportunistically use the BIT() macro instead of open-coding the shift. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Add feature-specific helpers for querying guest CPUID support from the emulator instead of having the emulator do a full CPUID and perform its own bit tests. The primary motivation is to eliminate the emulator's usage of bit() so that future patches can add more extensive build-time assertions on the usage of bit() without having to expose yet more code to the emulator. Note, providing a generic guest_cpuid_has() to the emulator doesn't work due to the existing built-time assertions in guest_cpuid_has(), which require the feature being checked to be a compile-time constant. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Add a helper macro to generate the set of reserved cr4 bits for both host and guest to ensure that adding a check on guest capabilities is also added for host capabilities, and vice versa. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Now that KVM prevents setting host-reserved CR4 bits, drop the dedicated XSAVE check in guest_cpuid_has() in favor of open coding similar checks in the SVM/VMX XSAVES enabling flows. Note, checking boot_cpu_has(X86_FEATURE_XSAVE) in the XSAVES flows is technically redundant with respect to the CR4 reserved bit checks, e.g. XSAVES #UDs if CR4.OSXSAVE=0 and arch.xsaves_enabled is consumed if and only if CR4.OXSAVE=1 in guest. Keep (add?) the explicit boot_cpu_has() checks to help document KVM's usage of arch.xsaves_enabled. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Check the current CPU's reserved cr4 bits against the mask calculated for the boot CPU to ensure consistent behavior across all CPUs. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Calculate the host-reserved cr4 bits at runtime based on the system's capabilities (using logic similar to __do_cpuid_func()), and use the dynamically generated mask for the reserved bit check in kvm_set_cr4() instead using of the static CR4_RESERVED_BITS define. This prevents userspace from "enabling" features in cr4 that are not supported by the system, e.g. by ignoring KVM_GET_SUPPORTED_CPUID and specifying a bogus CPUID for the vCPU. Allowing userspace to set unsupported bits in cr4 can lead to a variety of undesirable behavior, e.g. failed VM-Enter, and in general increases KVM's attack surface. A crafty userspace can even abuse CR4.LA57 to induce an unchecked #GP on a WRMSR. On a platform without LA57 support: KVM_SET_CPUID2 // CPUID_7_0_ECX.LA57 = 1 KVM_SET_SREGS // CR4.LA57 = 1 KVM_SET_MSRS // KERNEL_GS_BASE = 0x0004000000000000 KVM_RUN leads to a #GP when writing KERNEL_GS_BASE into hardware: unchecked MSR access error: WRMSR to 0xc0000102 (tried to write 0x0004000000000000) at rIP: 0xffffffffa00f239a (vmx_prepare_switch_to_guest+0x10a/0x1d0 [kvm_intel]) Call Trace: kvm_arch_vcpu_ioctl_run+0x671/0x1c70 [kvm] kvm_vcpu_ioctl+0x36b/0x5d0 [kvm] do_vfs_ioctl+0xa1/0x620 ksys_ioctl+0x66/0x70 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x4c/0x170 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fc08133bf47 Note, the above sequence fails VM-Enter due to invalid guest state. Userspace can allow VM-Enter to succeed (after the WRMSR #GP) by adding a KVM_SET_SREGS w/ CR4.LA57=0 after KVM_SET_MSRS, in which case KVM will technically leak the host's KERNEL_GS_BASE into the guest. But, as KERNEL_GS_BASE is a userspace-defined value/address, the leak is largely benign as a malicious userspace would simply be exposing its own data to the guest, and attacking a benevolent userspace would require multiple bugs in the userspace VMM. Cc: stable@vger.kernel.org Cc: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-