- 30 Apr, 2019 19 commits
-
-
KarimAllah Ahmed authored
In KVM, specially for nested guests, there is a dominant pattern of: => map guest memory -> do_something -> unmap guest memory In addition to all this unnecessarily noise in the code due to boiler plate code, most of the time the mapping function does not properly handle memory that is not backed by "struct page". This new guest mapping API encapsulate most of this boiler plate code and also handles guest memory that is not backed by "struct page". The current implementation of this API is using memremap for memory that is not backed by a "struct page" which would lead to a huge slow-down if it was used for high-frequency mapping operations. The API does not have any effect on current setups where guest memory is backed by a "struct page". Further patches are going to also introduce a pfn-cache which would significantly improve the performance of the memremap case. Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Filippo Sironi authored
cmpxchg_gpte() calls get_user_pages_fast() to retrieve the number of pages and the respective struct page to map in the kernel virtual address space. This doesn't work if get_user_pages_fast() is invoked with a userspace virtual address that's backed by PFNs outside of kernel reach (e.g., when limiting the kernel memory with mem= in the command line and using /dev/mem to map memory). If get_user_pages_fast() fails, look up the VMA that back the userspace virtual address, compute the PFN and the physical address, and map it in the kernel virtual address space with memremap(). Signed-off-by: Filippo Sironi <sironi@amazon.de> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KarimAllah Ahmed authored
Update the PML table without mapping and unmapping the page. This also avoids using kvm_vcpu_gpa_to_page(..) which assumes that there is a "struct page" for guest memory. As a side-effect of using kvm_write_guest_page the page is also properly marked as dirty. Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KarimAllah Ahmed authored
Read the data directly from guest memory instead of the map->read->unmap sequence. This also avoids using kvm_vcpu_gpa_to_page() and kmap() which assumes that there is a "struct page" for guest memory. Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Borislav Petkov authored
The hardware configuration register has some useful bits which can be used by guests. Implement McStatusWrEn which can be used by guests when injecting MCEs with the in-kernel mce-inject module. For that, we need to set bit 18 - McStatusWrEn - first, before writing the MCi_STATUS registers (otherwise we #GP). Add the required machinery to do so. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: KVM <kvm@vger.kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
The capabilities header depends on asm/vmx.h but doesn't explicitly include said file. This currently doesn't cause problems as all users of capbilities.h first include asm/vmx.h, but the issue often results in build errors if someone starts moving things around the VMX files. Fixes: 3077c191 ("KVM: VMX: Move capabilities structs and helpers to dedicated file") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Dan Carpenter authored
Smatch complains about this: arch/x86/kvm/vmx/vmx.c:5730 dump_vmcs() warn: KERN_* level not at start of string The code should be using pr_cont() instead of pr_err(). Fixes: 9d609649 ("KVM: vmx: print more APICv fields in dump_vmcs") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jiang Biao authored
is_dirty has been renamed to flush, but the comment for it is outdated. And the description about @flush parameter for kvm_clear_dirty_log_protect() is missing, add it in this patch as well. Signed-off-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
If a memory slot's size is not a multiple of 64 pages (256K), then the KVM_CLEAR_DIRTY_LOG API is unusable: clearing the final 64 pages either requires the requested page range to go beyond memslot->npages, or requires log->num_pages to be unaligned, and kvm_clear_dirty_log_protect requires log->num_pages to be both in range and aligned. To allow this case, allow log->num_pages not to be a multiple of 64 if it ends exactly on the last page of the slot. Reported-by: Peter Xu <peterx@redhat.com> Fixes: 98938aa8 ("KVM: validate userspace input in kvm_clear_dirty_log_protect()", 2019-01-02) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Ten percent of nothin' is... let me do the math here. Nothin' into nothin', carry the nothin'... Cc: Wanpeng Li <wanpengli@tencent.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Checking for a pending non-periodic interrupt in start_hv_timer() leads to restart_apic_timer() making an unnecessary call to start_sw_timer() due to start_hv_timer() returning false. Alternatively, start_hv_timer() could return %true when there is a pending non-periodic interrupt, but that approach is less intuitive, i.e. would require a beefy comment to explain an otherwise simple check. Cc: Liran Alon <liran.alon@oracle.com> Cc: Wanpeng Li <wanpengli@tencent.com> Suggested-by: Liran Alon <liran.alon@oracle.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Refactor kvm_x86_ops->set_hv_timer to use an explicit parameter for stating that the timer has expired. Overloading the return value is unnecessarily clever, e.g. can lead to confusion over the proper return value from start_hv_timer() when r==1. Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Explicitly call cancel_hv_timer() instead of returning %false to coerce restart_apic_timer() into canceling it by way of start_sw_timer(). Functionally, the existing code is correct in the sense that it doesn't doing anything visibily wrong, e.g. generate spurious interrupts or miss an interrupt. But it's extremely confusing and inefficient, e.g. there are multiple extraneous calls to apic_timer_expired() that effectively get dropped due to @timer_pending being %true. Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
...now that VMX's preemption timer, i.e. the hv_timer, also adjusts its programmed time based on lapic_timer_advance_ns. Without the delay, a guest can see a timer interrupt arrive before the requested time when KVM is using the hv_timer to emulate the guest's interrupt. Fixes: c5ce8235 ("KVM: VMX: Optimize tscdeadline timer latency") Cc: <stable@vger.kernel.org> Cc: Wanpeng Li <wanpengli@tencent.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Liran Alon authored
Since commits 668fffa3 ("kvm: better MWAIT emulation for guestsâ€) and 4d5422ce ("KVM: X86: Provide a capability to disable MWAIT interceptsâ€), KVM was modified to allow an admin to configure certain guests to execute MONITOR/MWAIT inside guest without being intercepted by host. This is useful in case admin wishes to allocate a dedicated logical processor for each vCPU thread. Thus, making it safe for guest to completely control the power-state of the logical processor. The ability to use this new KVM capability was introduced to QEMU by commits 6f131f13e68d ("kvm: support -overcommit cpu-pm=on|offâ€) and 2266d4431132 ("i386/cpu: make -cpu host support monitor/mwaitâ€). However, exposing MONITOR/MWAIT to a Linux guest may cause it's intel_idle kernel module to execute c1e_promotion_disable() which will attempt to RDMSR/WRMSR from/to MSR_IA32_POWER_CTL to manipulate the "C1E Enable" bit. This behaviour was introduced by commit 32e95180 ("intel_idle: export both C1 and C1Eâ€). Becuase KVM doesn't emulate this MSR, running KVM with ignore_msrs=0 will cause the above guest behaviour to raise a #GP which will cause guest to kernel panic. Therefore, add support for nop emulation of MSR_IA32_POWER_CTL to avoid #GP in guest in this scenario. Future commits can optimise emulation further by reflecting guest MSR changes to host MSR to provide guest with the ability to fine-tune the dedicated logical processor power-state. Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Luwei Kang authored
Let guests clear the Intel PT ToPA PMI status (bit 55 of MSR_CORE_PERF_GLOBAL_OVF_CTRL). Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Luwei Kang authored
Inject a PMI for KVM guest when Intel PT working in Host-Guest mode and Guest ToPA entry memory buffer was completely filled. Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Radim Krčmář authored
This reverts commit 919f6cd8. The patch was applied twice. The first commit is eca6be56. Reported-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Merge tag 'kvm-s390-next-5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Features and fixes for 5.2 - VSIE crypto fixes - new guest features for gen15 - disable halt polling for nested virtualization with overcommit
-
- 29 Apr, 2019 2 commits
-
-
Pierre Morel authored
Let's use the correct validity number. Fixes: 56019f9a ("KVM: s390: vsie: Allow CRYCB FORMAT-2") Signed-off-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-Id: <1556269201-22918-1-git-send-email-pmorel@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Pierre Morel authored
When the guest do not have AP instructions nor Key management we should return without shadowing the CRYCB. We did not check correctly in the past. Fixes: b10bd9a2 ("s390: vsie: Use effective CRYCBD.31 to check CRYCBD validity") Fixes: 6ee74098 ("KVM: s390: vsie: allow CRYCB FORMAT-0") Signed-off-by: Pierre Morel <pmorel@linux.ibm.com> Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-Id: <1556269010-22258-1-git-send-email-pmorel@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
- 26 Apr, 2019 2 commits
-
-
Christian Borntraeger authored
We do track the current steal time of the host CPUs. Let us use this value to disable halt polling if the steal time goes beyond a configured value. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Christian Borntraeger authored
There are cases where halt polling is unwanted. For example when running KVM on an over committed LPAR we rather want to give back the CPU to neighbour LPARs instead of polling. Let us provide a callback that allows architectures to disable polling. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
- 25 Apr, 2019 2 commits
-
-
Christian Borntraeger authored
Instead of adding a new machine option to disable/enable the keywrapping options of pckmo (like for AES and DEA) we can now use the CPU model to decide. As ECC is also wrapped with the AES key we need that to be enabled. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com>
-
Christian Borntraeger authored
This enables stfle.151 and adds the subfunctions for DFLTCC. Bit 151 is added to the list of facilities that will be enabled when there is no cpu model involved as DFLTCC requires no additional handling from userspace, e.g. for migration. Please note that a cpu model enabled user space can and will have the final decision on the facility bits for a guests. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Collin Walling <walling@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com>
-
- 18 Apr, 2019 6 commits
-
-
Christian Borntraeger authored
This enables stfle.150 and adds the subfunctions for SORTL. Bit 150 is added to the list of facilities that will be enabled when there is no cpu model involved as sortl requires no additional handling from userspace, e.g. for migration. Please note that a cpu model enabled user space can and will have the final decision on the facility bits for a guests. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Collin Walling <walling@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com>
-
Christian Borntraeger authored
Some of the new features have a 32byte response for the query function. Provide a new wrapper similar to __cpacf_query. We might want to factor this out if other users come up, as of today there is none. So let us keep the function within KVM. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Collin Walling <walling@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com>
-
Christian Borntraeger authored
This enables stfle.155 and adds the subfunctions for KDSA. Bit 155 is added to the list of facilities that will be enabled when there is no cpu model involved as MSA9 requires no additional handling from userspace, e.g. for migration. Please note that a cpu model enabled user space can and will have the final decision on the facility bits for a guests. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Collin Walling <walling@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com>
-
Christian Borntraeger authored
If vector support is enabled, the vector BCD enhancements facility might also be enabled. We can directly forward this facility to the guest if available and VX is requested by user space. Please note that user space can and will have the final decision on the facility bits for a guests. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Collin Walling <walling@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com>
-
Christian Borntraeger authored
If vector support is enabled, the vector enhancements facility 2 might also be enabled. We can directly forward this facility to the guest if available and VX is requested by user space. Please note that user space can and will have the final decision on the facility bits for a guests. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Collin Walling <walling@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com>
-
Eric Farman authored
Fix some warnings from smatch: arch/s390/kvm/interrupt.c:2310 get_io_adapter() warn: potential spectre issue 'kvm->arch.adapters' [r] (local cap) arch/s390/kvm/interrupt.c:2341 register_io_adapter() warn: potential spectre issue 'dev->kvm->arch.adapters' [w] Signed-off-by: Eric Farman <farman@linux.ibm.com> Message-Id: <20190417005414.47801-1-farman@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
- 16 Apr, 2019 9 commits
-
-
Paolo Bonzini authored
All architectures except MIPS were defining it in the same way, and memory slots are handled entirely by common code so there is no point in keeping the definition per-architecture. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
EFER.LME and EFER.NX are considered reserved if their respective feature bits are not advertised to the guest. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
KVM allows userspace to violate consistency checks related to the guest's CPUID model to some degree. Generally speaking, userspace has carte blanche when it comes to guest state so long as jamming invalid state won't negatively affect the host. Currently this is seems to be a non-issue as most of the interesting EFER checks are missing, e.g. NX and LME, but those will be added shortly. Proactively exempt userspace from the CPUID checks so as not to break userspace. Note, the efer_reserved_bits check still applies to userspace writes as that mask reflects the host's capabilities, e.g. KVM shouldn't allow a guest to run with NX=1 if it has been disabled in the host. Fixes: d8017474 ("KVM: SVM: Only allow setting of EFER_SVME when CPUID SVM is set") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Most, but not all, helpers that are related to emulating consistency checks for nested VM-Entry return -EINVAL when a check fails. Convert the holdouts to have consistency throughout and to make it clear that the functions are signaling pass/fail as opposed to "resume guest" vs. "exit to userspace". Opportunistically fix bad indentation in nested_vmx_check_guest_state(). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Convert all top-level nested VM-Enter consistency check functions to return 0/-EINVAL instead of failure codes, since now they can only ever return one failure code. This also does not give the false impression that failure information is always consumed and/or relevant, e.g. vmx_set_nested_state() only cares whether or not the checks were successful. nested_check_host_control_regs() can also now be inlined into its caller, nested_vmx_check_host_state, since the two have effectively become the same function. Based on a patch by Sean Christopherson. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Rename the top-level consistency check functions to (loosely) align with the SDM. Historically, KVM has used the terms "prereq" and "postreq" to differentiate between consistency checks that lead to VM-Fail and those that lead to VM-Exit. The terms are vague and potentially misleading, e.g. "postreq" might be interpreted as occurring after VM-Entry. Note, while the SDM lumps controls and host state into a single section, "Checks on VMX Controls and Host-State Area", split them into separate top-level functions as the two categories of checks result in different VM instruction errors. This split will allow for additional cleanup. Note #2, "vmentry" is intentionally dropped from the new function names to avoid confusion with nested_check_vm_entry_controls(), and to keep the length of the functions names somewhat manageable. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Per Intel's SDM, volume 3, section Checking and Loading Guest State: Because the checking and the loading occur concurrently, a failure may be discovered only after some state has been loaded. For this reason, the logical processor responds to such failures by loading state from the host-state area, as it would for a VM exit. In other words, a failed non-register state consistency check results in a VM-Exit, not VM-Fail. Moving the non-reg state checks also paves the way for renaming nested_vmx_check_vmentry_postreqs() to align with the SDM, i.e. nested_vmx_check_vmentry_guest_state(). Fixes: 26539bd0 ("KVM: nVMX: check vmcs12 for valid activity state") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Krish Sadhukhan authored
According to section "Checking and Loading Guest State" in Intel SDM vol 3C, the following check is performed on vmentry: If the "load IA32_PAT" VM-entry control is 1, the value of the field for the IA32_PAT MSR must be one that could be written by WRMSR without fault at CPL 0. Specifically, each of the 8 bytes in the field must have one of the values 0 (UC), 1 (WC), 4 (WT), 5 (WP), 6 (WB), or 7 (UC-). Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com> Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Krish Sadhukhan authored
According to section "Checks on Host Control Registers and MSRs" in Intel SDM vol 3C, the following check is performed on vmentry: If the "load IA32_PAT" VM-exit control is 1, the value of the field for the IA32_PAT MSR must be one that could be written by WRMSR without fault at CPL 0. Specifically, each of the 8 bytes in the field must have one of the values 0 (UC), 1 (WC), 4 (WT), 5 (WP), 6 (WB), or 7 (UC-). Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com> Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-