An error occurred fetching the project authors.
- 05 Aug, 2018 5 commits
-
-
Nicolai Stange authored
For VMEXITs caused by external interrupts, vmx_handle_external_intr() indirectly calls into the interrupt handlers through the host's IDT. It follows that these interrupts get accounted for in the kvm_cpu_l1tf_flush_l1d per-cpu flag. The subsequently executed vmx_l1d_flush() will thus be aware that some interrupts have happened and conduct a L1d flush anyway. Setting l1tf_flush_l1d from vmx_handle_external_intr() isn't needed anymore. Drop it. Signed-off-by:
Nicolai Stange <nstange@suse.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Nicolai Stange authored
Part of the L1TF mitigation for vmx includes flushing the L1D cache upon VMENTRY. L1D flushes are costly and two modes of operations are provided to users: "always" and the more selective "conditional" mode. If operating in the latter, the cache would get flushed only if a host side code path considered unconfined had been traversed. "Unconfined" in this context means that it might have pulled in sensitive data like user data or kernel crypto keys. The need for L1D flushes is tracked by means of the per-vcpu flag l1tf_flush_l1d. KVM exit handlers considered unconfined set it. A vmx_l1d_flush() subsequently invoked before the next VMENTER will conduct a L1d flush based on its value and reset that flag again. Currently, interrupts delivered "normally" while in root operation between VMEXIT and VMENTER are not taken into account. Part of the reason is that these don't leave any traces and thus, the vmx code is unable to tell if any such has happened. As proposed by Paolo Bonzini, prepare for tracking all interrupts by introducing a new per-cpu flag, "kvm_cpu_l1tf_flush_l1d". It will be in strong analogy to the per-vcpu ->l1tf_flush_l1d. A later patch will make interrupt handlers set it. For the sake of cache locality, group kvm_cpu_l1tf_flush_l1d into x86' per-cpu irq_cpustat_t as suggested by Peter Zijlstra. Provide the helpers kvm_set_cpu_l1tf_flush_l1d(), kvm_clear_cpu_l1tf_flush_l1d() and kvm_get_cpu_l1tf_flush_l1d(). Make them trivial resp. non-existent for !CONFIG_KVM_INTEL as appropriate. Let vmx_l1d_flush() handle kvm_cpu_l1tf_flush_l1d in the same way as l1tf_flush_l1d. Suggested-by:
Paolo Bonzini <pbonzini@redhat.com> Suggested-by:
Peter Zijlstra <peterz@infradead.org> Signed-off-by:
Nicolai Stange <nstange@suse.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Nicolai Stange authored
Currently, vmx_vcpu_run() checks if l1tf_flush_l1d is set and invokes vmx_l1d_flush() if so. This test is unncessary for the "always flush L1D" mode. Move the check to vmx_l1d_flush()'s conditional mode code path. Notes: - vmx_l1d_flush() is likely to get inlined anyway and thus, there's no extra function call. - This inverts the (static) branch prediction, but there hadn't been any explicit likely()/unlikely() annotations before and so it stays as is. Signed-off-by:
Nicolai Stange <nstange@suse.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Nicolai Stange authored
The vmx_l1d_flush_always static key is only ever evaluated if vmx_l1d_should_flush is enabled. In that case however, there are only two L1d flushing modes possible: "always" and "conditional". The "conditional" mode's implementation tends to require more sophisticated logic than the "always" mode. Avoid inverted logic by replacing the 'vmx_l1d_flush_always' static key with a 'vmx_l1d_flush_cond' one. There is no change in functionality. Signed-off-by:
Nicolai Stange <nstange@suse.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Nicolai Stange authored
vmx_l1d_flush() gets invoked only if l1tf_flush_l1d is true. There's no point in setting l1tf_flush_l1d to true from there again. Signed-off-by:
Nicolai Stange <nstange@suse.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
- 19 Jul, 2018 1 commit
-
-
Nicolai Stange authored
The slow path in vmx_l1d_flush() reads from vmx_l1d_flush_pages in order to evict the L1d cache. However, these pages are never cleared and, in theory, their data could be leaked. More importantly, KSM could merge a nested hypervisor's vmx_l1d_flush_pages to fewer than 1 << L1D_CACHE_ORDER host physical pages and this would break the L1d flushing algorithm: L1D on x86_64 is tagged by physical addresses. Fix this by initializing the individual vmx_l1d_flush_pages with a different pattern each. Rename the "empty_zp" asm constraint identifier in vmx_l1d_flush() to "flush_pages" to reflect this change. Fixes: a47dd5f0 ("x86/KVM/VMX: Add L1D flush algorithm") Signed-off-by:
Nicolai Stange <nstange@suse.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
- 13 Jul, 2018 8 commits
-
-
Jiri Kosina authored
Introduce the 'l1tf=' kernel command line option to allow for boot-time switching of mitigation that is used on processors affected by L1TF. The possible values are: full Provides all available mitigations for the L1TF vulnerability. Disables SMT and enables all mitigations in the hypervisors. SMT control via /sys/devices/system/cpu/smt/control is still possible after boot. Hypervisors will issue a warning when the first VM is started in a potentially insecure configuration, i.e. SMT enabled or L1D flush disabled. full,force Same as 'full', but disables SMT control. Implies the 'nosmt=force' command line option. sysfs control of SMT and the hypervisor flush control is disabled. flush Leaves SMT enabled and enables the conditional hypervisor mitigation. Hypervisors will issue a warning when the first VM is started in a potentially insecure configuration, i.e. SMT enabled or L1D flush disabled. flush,nosmt Disables SMT and enables the conditional hypervisor mitigation. SMT control via /sys/devices/system/cpu/smt/control is still possible after boot. If SMT is reenabled or flushing disabled at runtime hypervisors will issue a warning. flush,nowarn Same as 'flush', but hypervisors will not warn when a VM is started in a potentially insecure configuration. off Disables hypervisor mitigations and doesn't emit any warnings. Default is 'flush'. Let KVM adhere to these semantics, which means: - 'lt1f=full,force' : Performe L1D flushes. No runtime control possible. - 'l1tf=full' - 'l1tf-flush' - 'l1tf=flush,nosmt' : Perform L1D flushes and warn on VM start if SMT has been runtime enabled or L1D flushing has been run-time enabled - 'l1tf=flush,nowarn' : Perform L1D flushes and no warnings are emitted. - 'l1tf=off' : L1D flushes are not performed and no warnings are emitted. KVM can always override the L1D flushing behavior using its 'vmentry_l1d_flush' module parameter except when lt1f=full,force is set. This makes KVM's private 'nosmt' option redundant, and as it is a bit non-systematic anyway (this is something to control globally, not on hypervisor level), remove that option. Add the missing Documentation entry for the l1tf vulnerability sysfs file while at it. Signed-off-by:
Jiri Kosina <jkosina@suse.cz> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Jiri Kosina <jkosina@suse.cz> Reviewed-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by:
Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142323.202758176@linutronix.de
-
Thomas Gleixner authored
All mitigation modes can be switched at run time with a static key now: - Use sysfs_streq() instead of strcmp() to handle the trailing new line from sysfs writes correctly. - Make the static key management handle multiple invocations properly. - Set the module parameter file to RW Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Jiri Kosina <jkosina@suse.cz> Reviewed-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by:
Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142322.954525119@linutronix.de
-
Thomas Gleixner authored
Writes to the parameter files are not serialized at the sysfs core level, so local serialization is required. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Jiri Kosina <jkosina@suse.cz> Reviewed-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by:
Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142322.873642605@linutronix.de
-
Thomas Gleixner authored
Avoid the conditional in the L1D flush control path. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Jiri Kosina <jkosina@suse.cz> Reviewed-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by:
Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142322.790914912@linutronix.de
-
Thomas Gleixner authored
In preparation of allowing run time control for L1D flushing, move the setup code to the module parameter handler. In case of pre module init parsing, just store the value and let vmx_init() do the actual setup after running kvm_init() so that enable_ept is having the correct state. During run-time invoke it directly from the parameter setter to prepare for run-time control. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Jiri Kosina <jkosina@suse.cz> Reviewed-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by:
Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142322.694063239@linutronix.de
-
Thomas Gleixner authored
If Extended Page Tables (EPT) are disabled or not supported, no L1D flushing is required. The setup function can just avoid setting up the L1D flush for the EPT=n case. Invoke it after the hardware setup has be done and enable_ept has the correct state and expose the EPT disabled state in the mitigation status as well. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Jiri Kosina <jkosina@suse.cz> Reviewed-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by:
Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142322.612160168@linutronix.de
-
Thomas Gleixner authored
The VMX module parameter to control the L1D flush should become writeable. The MSR list is set up at VM init per guest VCPU, but the run time switching is based on a static key which is global. Toggling the MSR list at run time might be feasible, but for now drop this optimization and use the regular MSR write to make run-time switching possible. The default mitigation is the conditional flush anyway, so for extra paranoid setups this will add some small overhead, but the extra code executed is in the noise compared to the flush itself. Aside of that the EPT disabled case is not handled correctly at the moment and the MSR list magic is in the way for fixing that as well. If it's really providing a significant advantage, then this needs to be revisited after the code is correct and the control is writable. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Jiri Kosina <jkosina@suse.cz> Reviewed-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by:
Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142322.516940445@linutronix.de
-
Thomas Gleixner authored
Store the effective mitigation of VMX in a status variable and use it to report the VMX state in the l1tf sysfs file. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Jiri Kosina <jkosina@suse.cz> Reviewed-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by:
Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142322.433098358@linutronix.de
-
- 04 Jul, 2018 10 commits
-
-
Konrad Rzeszutek Wilk authored
If the L1D flush module parameter is set to 'always' and the IA32_FLUSH_CMD MSR is available, optimize the VMENTER code with the MSR save list. Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Konrad Rzeszutek Wilk authored
The IA32_FLUSH_CMD MSR needs only to be written on VMENTER. Extend add_atomic_switch_msr() with an entry_only parameter to allow storing the MSR only in the guest (ENTRY) MSR array. Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Konrad Rzeszutek Wilk authored
This allows to load a different number of MSRs depending on the context: VMEXIT or VMENTER. Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Konrad Rzeszutek Wilk authored
.. to help find the MSR on either the guest or host MSR list. Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Konrad Rzeszutek Wilk authored
There is no semantic change but this change allows an unbalanced amount of MSRs to be loaded on VMEXIT and VMENTER, i.e. the number of MSRs to save or restore on VMEXIT or VMENTER may be different. Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Paolo Bonzini authored
Add the logic for flushing L1D on VMENTER. The flush depends on the static key being enabled and the new l1tf_flush_l1d flag being set. The flags is set: - Always, if the flush module parameter is 'always' - Conditionally at: - Entry to vcpu_run(), i.e. after executing user space - From the sched_in notifier, i.e. when switching to a vCPU thread. - From vmexit handlers which are considered unsafe, i.e. where sensitive data can be brought into L1D: - The emulator, which could be a good target for other speculative execution-based threats, - The MMU, which can bring host page tables in the L1 cache. - External interrupts - Nested operations that require the MMU (see above). That is vmptrld, vmptrst, vmclear,vmwrite,vmread. - When handling invept,invvpid [ tglx: Split out from combo patch and reduced to a single flag ] Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Paolo Bonzini authored
336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines a new MSR (IA32_FLUSH_CMD aka 0x10B) which has similar write-only semantics to other MSRs defined in the document. The semantics of this MSR is to allow "finer granularity invalidation of caching structures than existing mechanisms like WBINVD. It will writeback and invalidate the L1 data cache, including all cachelines brought in by preceding instructions, without invalidating all caches (eg. L2 or LLC). Some processors may also invalidate the first level level instruction cache on a L1D_FLUSH command. The L1 data and instruction caches may be shared across the logical processors of a core." Use it instead of the loop based L1 flush algorithm. A copy of this document is available at https://bugzilla.kernel.org/show_bug.cgi?id=199511 [ tglx: Avoid allocating pages when the MSR is available ] Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Paolo Bonzini authored
To mitigate the L1 Terminal Fault vulnerability it's required to flush L1D on VMENTER to prevent rogue guests from snooping host memory. CPUs will have a new control MSR via a microcode update to flush L1D with a single MSR write, but in the absence of microcode a fallback to a software based flush algorithm is required. Add a software flush loop which is based on code from Intel. [ tglx: Split out from combo patch ] [ bpetkov: Polish the asm code ] Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Konrad Rzeszutek Wilk authored
Add a mitigation mode parameter "vmentry_l1d_flush" for CVE-2018-3620, aka L1 terminal fault. The valid arguments are: - "always" L1D cache flush on every VMENTER. - "cond" Conditional L1D cache flush, explained below - "never" Disable the L1D cache flush mitigation "cond" is trying to avoid L1D cache flushes on VMENTER if the code executed between VMEXIT and VMENTER is considered safe, i.e. is not bringing any interesting information into L1D which might exploited. [ tglx: Split out from a larger patch ] Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Konrad Rzeszutek Wilk authored
If the L1TF CPU bug is present we allow the KVM module to be loaded as the major of users that use Linux and KVM have trusted guests and do not want a broken setup. Cloud vendors are the ones that are uncomfortable with CVE 2018-3620 and as such they are the ones that should set nosmt to one. Setting 'nosmt' means that the system administrator also needs to disable SMT (Hyper-threading) in the BIOS, or via the 'nosmt' command line parameter, or via the /sys/devices/system/cpu/smt/control. See commit 05736e4a ("cpu/hotplug: Provide knobs to control SMT"). Other mitigations are to use task affinity, cpu sets, interrupt binding, etc - anything to make sure that _only_ the same guests vCPUs are running on sibling threads. Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
- 14 Jun, 2018 1 commit
-
-
Arnd Bergmann authored
Arnd had sent this patch to the KVM mailing list, but it slipped through the cracks of maintainers hand-off, and therefore wasn't included in the pull request. The same issue had been fixed by Linus in commit dbee3d02 ("KVM: x86: VMX: fix build without hyper-v", 2018-06-12) as a self-described "quick-and-hacky build fix". However, checking the compile-time configuration symbol with IS_ENABLED is cleaner and it is enough to avoid the link error, so switch to Arnd's solution. Signed-off-by:
Arnd Bergmann <arnd@arndb.de> [Rewritten commit message. - Paolo] Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 13 Jun, 2018 1 commit
-
-
Linus Torvalds authored
Commit ceef7d10 ("KVM: x86: VMX: hyper-v: Enlightened MSR-Bitmap support") broke the build with Hyper-V disabled, because it accesses ms_hyperv.nested_features without checking if that exists. This is the quick-and-hacky build fix. I suspect the proper fix is to replace the static_branch_unlikely(&enable_evmcs) tests with an inline helper function that also checks that CONFIG_HYPERV is enabled, since without that, enable_evmcs makes no sense. But I want a working build environment first and foremost, and I'm upset this slipped through in the first place. My primary build tests missed it because I tend to build with everything enabled, but it should have been caught in the kvm tree. Fixes: ceef7d10 ("KVM: x86: VMX: hyper-v: Enlightened MSR-Bitmap support") Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 12 Jun, 2018 2 commits
-
-
Paolo Bonzini authored
Int the next patch the emulator's .read_std and .write_std callbacks will grow another argument, which is not needed in kvm_read_guest_virt and kvm_write_guest_virt_system's callers. Since we have to make separate functions, let's give the currently existing names a nicer interface, too. Fixes: 129a72a0 ("KVM: x86: Introduce segmented_write_std", 2017-01-12) Cc: stable@vger.kernel.org Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Felix Wilhelm authored
VMX instructions executed inside a L1 VM will always trigger a VM exit even when executed with cpl 3. This means we must perform the privilege check in software. Fixes: 70f3aac9("kvm: nVMX: Remove superfluous VMX instruction fault checks") Cc: stable@vger.kernel.org Signed-off-by:
Felix Wilhelm <fwilhelm@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 04 Jun, 2018 3 commits
-
-
Jim Mattson authored
Add support for "VMWRITE to any supported field in the VMCS" and enable this feature by default in L1's IA32_VMX_MISC MSR. If userspace clears the VMX capability bit, the old behavior will be restored. Note that this feature is a prerequisite for kvm in L1 to use VMCS shadowing, once that feature is available. Signed-off-by:
Jim Mattson <jmattson@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Jim Mattson authored
Disallow changes to the VMX capability MSRs while the vCPU is in VMX operation. Although this does break the existing API, it helps to avoid some potentially tricky situations for which there is no architected behavior. Signed-off-by:
Jim Mattson <jmattson@google.com> Reviewed-by:
Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Wanpeng Li authored
'Commit d0659d94 ("KVM: x86: add option to advance tscdeadline hrtimer expiration")' advances the tscdeadline (the timer is emulated by hrtimer) expiration in order that the latency which is incurred by hypervisor (apic_timer_fn -> vmentry) can be avoided. This patch adds the advance tscdeadline expiration support to which the tscdeadline timer is emulated by VMX preemption timer to reduce the hypervisor lantency (handle_preemption_timer -> vmentry). The guest can also set an expiration that is very small (for example in Linux if an hrtimer feeds a expiration in the past); in that case we set delta_tsc to 0, leading to an immediately vmexit when delta_tsc is not bigger than advance ns. This patch can reduce ~63% latency (~4450 cycles to ~1660 cycles on a haswell desktop) for kvm-unit-tests/tscdeadline_latency when testing busy waits. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by:
Wanpeng Li <wanpengli@tencent.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 01 Jun, 2018 1 commit
-
-
Marc Orr authored
The kvm struct has been bloating. For example, it's tens of kilo-bytes for x86, which turns out to be a large amount of memory to allocate contiguously via kzalloc. Thus, this patch does the following: 1. Uses architecture-specific routines to allocate the kvm struct via vzalloc for x86. 2. Switches arm to __KVM_HAVE_ARCH_VM_ALLOC so that it can use vzalloc when has_vhe() is true. Other architectures continue to default to kalloc, as they have a dependency on kalloc or have a small-enough struct kvm. Signed-off-by:
Marc Orr <marcorr@google.com> Reviewed-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 24 May, 2018 3 commits
-
-
Liran Alon authored
When vmcs12 uses VPID, all TLB entries populated by L2 are tagged with vmx->nested.vpid02. Currently, INVVPID executed by L1 is emulated by L0 by using INVVPID single/global-context to flush all TLB entries tagged with vmx->nested.vpid02 regardless of INVVPID type executed by L1. However, we can easily optimize the case of L1 INVVPID on an individual-address. Just INVVPID given individual-address tagged with vmx->nested.vpid02. Reviewed-by:
Liam Merwick <liam.merwick@oracle.com> Signed-off-by:
Liran Alon <liran.alon@oracle.com> Reviewed-by:
Jim Mattson <jmattson@google.com> [Squashed with a preparatory patch that added the !operand.vpid line.] Signed-off-by:
Radim Krčmář <rkrcmar@redhat.com>
-
Liran Alon authored
Since commit 5c614b35 ("KVM: nVMX: nested VPID emulation"), vmcs01 and vmcs02 don't share the same VPID. vmcs01 uses vmx->vpid while vmcs02 uses vmx->nested.vpid02. This was done such that TLB flush could be avoided when switching between L1 and L2. However, the above mentioned commit only changed L2 VMEntry logic to not flush TLB when switching from L1 to L2. It forgot to also remove the TLB flush which is done when simulating a VMExit from L2 to L1. To fix this issue, on VMExit from L2 to L1 we flush TLB only in case vmcs01 enables VPID and vmcs01->vpid==vmcs02->vpid. This happens when vmcs01 enables VPID and vmcs12 does not. Fixes: 5c614b35 ("KVM: nVMX: nested VPID emulation") Reviewed-by:
Liam Merwick <liam.merwick@oracle.com> Reviewed-by:
Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by:
Liran Alon <liran.alon@oracle.com> Reviewed-by:
Jim Mattson <jmattson@google.com> Signed-off-by:
Radim Krčmář <rkrcmar@redhat.com>
-
Liran Alon authored
Reviewed-by:
Liam Merwick <liam.merwick@oracle.com> Signed-off-by:
Liran Alon <liran.alon@oracle.com> Reviewed-by:
Jim Mattson <jmattson@google.com> Signed-off-by:
Radim Krčmář <rkrcmar@redhat.com>
-
- 23 May, 2018 3 commits
-
-
Jim Mattson authored
Enforce the invariant that existing VMCS12 field offsets must not change. Experience has shown that without strict enforcement, this invariant will not be maintained. Signed-off-by:
Jim Mattson <jmattson@google.com> Reviewed-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [Changed the code to use BUILD_BUG_ON_MSG instead of better, but GCC 4.6 requiring _Static_assert. - Radim.] Signed-off-by:
Radim Krčmář <rkrcmar@redhat.com>
-
Jim Mattson authored
Changing the VMCS12 layout will break save/restore compatibility with older kvm releases once the KVM_{GET,SET}_NESTED_STATE ioctls are accepted upstream. Google has already been using these ioctls for some time, and we implore the community not to disturb the existing layout. Move the four most recently added fields to preserve the offsets of the previously defined fields and reserve locations for the vmread and vmwrite bitmaps, which will be used in the virtualization of VMCS shadowing (to improve the performance of double-nesting). Signed-off-by:
Jim Mattson <jmattson@google.com> Reviewed-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [Kept the SDM order in vmcs_field_to_offset_table. - Radim] Signed-off-by:
Radim Krčmář <rkrcmar@redhat.com>
-
Jim Mattson authored
When saving a vCPU's nested state, the vmcs02 is discarded. Only the shadow vmcs12 is saved. The shadow vmcs12 contains all of the information needed to reconstruct an equivalent vmcs02 on restore, but we have to be able to deal with two contexts: 1. The nested state was saved immediately after an emulated VM-entry, before the vmcs02 was ever launched. 2. The nested state was saved some time after the first successful launch of the vmcs02. Though it's an implementation detail rather than an architected bit, vmx->nested_run_pending serves to distinguish between these two cases. Hence, we save it as part of the vCPU's nested state. (Yes, this is ugly.) Even when restoring from a checkpoint, it may be necessary to build the vmcs02 as if prepare_vmcs02 was called from nested_vmx_run. So, the 'from_vmentry' argument should be dropped, and vmx->nested_run_pending should be consulted instead. The nested state restoration code then has to set vmx->nested_run_pending prior to calling prepare_vmcs02. It's important that the restoration code set vmx->nested_run_pending anyway, since the flag impacts things like interrupt delivery as well. Fixes: cf8b84f4 ("kvm: nVMX: Prepare for checkpointing L2 state") Signed-off-by:
Jim Mattson <jmattson@google.com> Signed-off-by:
Radim Krčmář <rkrcmar@redhat.com>
-
- 17 May, 2018 2 commits
-
-
Tom Lendacky authored
Expose the new virtualized architectural mechanism, VIRT_SSBD, for using speculative store bypass disable (SSBD) under SVM. This will allow guests to use SSBD on hardware that uses non-architectural mechanisms for enabling SSBD. [ tglx: Folded the migration fixup from Paolo Bonzini ] Signed-off-by:
Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
AMD is proposing a VIRT_SPEC_CTRL MSR to handle the Speculative Store Bypass Disable via MSR_AMD64_LS_CFG so that guests do not have to care about the bit position of the SSBD bit and thus facilitate migration. Also, the sibling coordination on Family 17H CPUs can only be done on the host. Extend x86_spec_ctrl_set_guest() and x86_spec_ctrl_restore_host() with an extra argument for the VIRT_SPEC_CTRL MSR. Hand in 0 from VMX and in SVM add a new virt_spec_ctrl member to the CPU data structure which is going to be used in later patches for the actual implementation. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-