Commits · 87796555d48ca1cc681515e065b824ecab9c4282 · Kirill Smelkov / linux

24 Apr, 2020 1 commit

KVM: nVMX: Store vmcs.EXIT_QUALIFICATION as an unsigned long, not u32 · 87796555

Sean Christopherson authored Apr 22, 2020

Use an unsigned long for 'exit_qual' in nested_vmx_reflect_vmexit(), the
EXIT_QUALIFICATION field is naturally sized, not a 32-bit field.

The bug is most easily observed by doing VMXON (or any VMX instruction)
in L2 with a negative displacement, in which case dropping the upper
bits on nested VM-Exit results in L1 calculating the wrong virtual
address for the memory operand, e.g. "vmxon -0x8(%rbp)" yields:

Unhandled cpu exception 14 #PF at ip 0000000000400553
rbp=0000000000537000 cr2=0000000100536ff8

Fixes: fbdd5025 ("KVM: nVMX: Move VM-Fail check out of nested_vmx_exit_reflected()")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200423001127.13490-1-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

87796555

23 Apr, 2020 4 commits

KVM: nVMX: Drop a redundant call to vmx_get_intr_info() · 9bd4af24

Sean Christopherson authored Apr 21, 2020

Drop nested_vmx_l1_wants_exit()'s initialization of intr_info from
vmx_get_intr_info() that was inadvertantly introduced along with the
caching mechanism. EXIT_REASON_EXCEPTION_NMI, the only consumer of
intr_info, populates the variable before using it.

Fixes: bb53120d67cd ("KVM: VMX: Cache vmcs.EXIT_INTR_INFO using arch avail_reg flags")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200421075328.14458-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9bd4af24

KVM: x86: move nested-related kvm_x86_ops to a separate struct · 33b22172

Paolo Bonzini authored Apr 17, 2020

Clean up some of the patching of kvm_x86_ops, by moving kvm_x86_ops related to
nested virtualization into a separate struct.

As a result, these ops will always be non-NULL on VMX.  This is not a problem:

* check_nested_events is only called if is_guest_mode(vcpu) returns true

* get_nested_state treats VMXOFF state the same as nested being disabled

* set_nested_state fails if you attempt to set nested state while
  nesting is disabled

* nested_enable_evmcs could already be called on a CPU without VMX enabled
  in CPUID.

* nested_get_evmcs_version was fixed in the previous patch
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

33b22172

KVM: eVMCS: check if nesting is enabled · 25091990

Paolo Bonzini authored Apr 17, 2020

In the next patch nested_get_evmcs_version will be always set in kvm_x86_ops for
VMX, even if nesting is disabled. Therefore, check whether VMX (aka nesting)
is available in the function, the caller will not do the check anymore.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

25091990

KVM: x86: check_nested_events is never NULL · 56083bdf

Paolo Bonzini authored Apr 17, 2020

Both Intel and AMD now implement it, so there is no need to check if the
callback is implemented.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

56083bdf

21 Apr, 2020 35 commits

selftests: kvm/set_memory_region_test: do not check RIP if the guest shuts down · 1d2c6c9b

Paolo Bonzini authored Apr 17, 2020

On AMD, the state of the VMCB is undefined after a shutdown VMEXIT. KVM
takes a very conservative approach to that and resets the guest altogether
when that happens. This causes the set_memory_region_test to fail
because the RIP is 0xfff0 (the reset vector). Restrict the RIP test
to KVM_EXIT_INTERNAL_ERROR in order to fix this.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1d2c6c9b

KVM: SVM: avoid infinite loop on NPF from bad address · e72436bc

Paolo Bonzini authored Apr 17, 2020

When a nested page fault is taken from an address that does not have
a memslot associated to it, kvm_mmu_do_page_fault returns RET_PF_EMULATE
(via mmu_set_spte) and kvm_mmu_page_fault then invokes svm_need_emulation_on_page_fault.

The default answer there is to return false, but in this case this just
causes the page fault to be retried ad libitum.  Since this is not a
fast path, and the only other case where it is taken is an erratum,
just stick a kvm_vcpu_gfn_to_memslot check in there to detect the
common case where the erratum is not happening.

This fixes an infinite loop in the new set_memory_region_test.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e72436bc

tools/kvm_stat: add sample systemd unit file · 997b7e98

Stefan Raspl authored Apr 02, 2020

Add a sample unit file as a basis for systemd integration of kvm_stat
logs.
Signed-off-by: Stefan Raspl <raspl@de.ibm.com>
Message-Id: <20200402085705.61155-4-raspl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

997b7e98

tools/kvm_stat: Add command line switch '-L' to log to file · 3754afe7

Stefan Raspl authored Apr 02, 2020

To integrate with logrotate, we have a signal handler that will re-open
the logfile.
Assuming we have a systemd unit file with
     ExecStart=kvm_stat -dtc -s 10 -L /var/log/kvm_stat.csv
     ExecReload=/bin/kill -HUP $MAINPID
and a logrotate config featuring
     postrotate
        /bin/systemctl reload kvm_stat.service
     endscript
Then the overall flow will look like this:
(1) systemd starts kvm_stat, logging to A.
(2) At some point, logrotate runs, moving A to B.
    kvm_stat continues to write to B at this point.
(3) After rotating, logrotate restarts the kvm_stat unit via systemctl.
(4) The kvm_stat unit sends a SIGHUP to kvm_stat, finally making it
    switch over to writing to A again.
Note that in order to keep the structure of the cvs output in tact, we
make sure to, in contrast to the standard log format, only write the
header once at the beginning of a file. This implies that the header is
suppressed when appending to an existing file. Unlike with the standard
format, where we append to an existing file by starting out with a
header.
Signed-off-by: Stefan Raspl <raspl@de.ibm.com>
Message-Id: <20200402085705.61155-3-raspl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3754afe7

tools/kvm_stat: add command line switch '-z' to skip zero records · da1fda28

Stefan Raspl authored Apr 02, 2020

When running in logging mode, skip records with all zeros (=empty records)
to preserve space when logging to files.
Signed-off-by: Stefan Raspl <raspl@de.ibm.com>
Message-Id: <20200402085705.61155-2-raspl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

da1fda28

KVM: Remove redundant argument to kvm_arch_vcpu_ioctl_run · 1b94f6f8

Tianjia Zhang authored Apr 16, 2020

In earlier versions of kvm, 'kvm_run' was an independent structure
and was not included in the vcpu structure. At present, 'kvm_run'
is already included in the vcpu structure, so the parameter
'kvm_run' is redundant.

This patch simplifies the function definition, removes the extra
'kvm_run' parameter, and extracts it from the 'kvm_vcpu' structure
if necessary.
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Message-Id: <20200416051057.26526-1-tianjia.zhang@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1b94f6f8

KVM: nSVM: Check for CR0.CD and CR0.NW on VMRUN of nested guests · 4f233371

Krish Sadhukhan authored Apr 09, 2020

According to section "Canonicalization and Consistency Checks" in APM vol. 2,
the following guest state combination is illegal:

	"CR0.CD is zero and CR0.NW is set"
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Message-Id: <20200409205035.16830-2-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4f233371

KVM: X86: Improve latency for single target IPI fastpath · a9ab13ff

Wanpeng Li authored Apr 10, 2020

IPI and Timer cause the main MSRs write vmexits in cloud environment
observation, let's optimize virtual IPI latency more aggressively to
inject target IPI as soon as possible.

Running kvm-unit-tests/vmexit.flat IPI testing on SKX server, disable
adaptive advance lapic timer and adaptive halt-polling to avoid the
interference, this patch can give another 7% improvement.

w/o fastpath   -> x86.c fastpath      4238 -> 3543  16.4%
x86.c fastpath -> vmx.c fastpath      3543 -> 3293     7%
w/o fastpath   -> vmx.c fastpath      4238 -> 3293  22.3%

Cc: Haiwei Li <lihaiwei@tencent.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200410174703.1138-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a9ab13ff

KVM: VMX: Optimize handling of VM-Entry failures in vmx_vcpu_run() · 873e1da1

Sean Christopherson authored Apr 10, 2020

Mark the VM-Fail, VM-Exit on VM-Enter, and #MC on VM-Enter paths as
'unlikely' so as to improve code generation so that it favors successful
VM-Enter.  The performance of successful VM-Enter is for more important,
irrespective of whether or not success is actually likely.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200410174703.1138-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

873e1da1

KVM: nVMX: Remove non-functional "support" for CR3 target values · b8d295f9

Sean Christopherson authored Apr 15, 2020

Remove all references to cr3_target_value[0-3] and replace the fields
in vmcs12 with "dead_space" to preserve the vmcs12 layout. KVM doesn't
support emulating CR3-target values, despite a variety of code that
implies otherwise, as KVM unconditionally reports '0' for the number of
supported CR3-target values.

This technically fixes a bug where KVM would incorrectly allow VMREAD
and VMWRITE to nonexistent fields, i.e. cr3_target_value[0-3]. Per
Intel's SDM, the number of supported CR3-target values reported in
VMX_MISC also enumerates the existence of the associated VMCS fields:

If a future implementation supports more than 4 CR3-target values, they
will be encoded consecutively following the 4 encodings given here.

Alternatively, the "bug" could be fixed by actually advertisting support
for 4 CR3-target values, but that'd likely just enable kvm-unit-tests
given that no one has complained about lack of support for going on ten
years, e.g. KVM, Xen and HyperV don't use CR3-target values.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200416000739.9012-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b8d295f9

KVM: x86/mmu: Avoid an extra memslot lookup in try_async_pf() for L2 · c36b7150

Paolo Bonzini authored Apr 16, 2020

Create a new function kvm_is_visible_memslot() and use it from
kvm_is_visible_gfn(); use the new function in try_async_pf() too,
to avoid an extra memslot lookup.

Opportunistically squish a multi-line comment into a single-line comment.

Note, the end result, KVM_PFN_NOSLOT, is unchanged.

Cc: Jim Mattson <jmattson@google.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c36b7150

KVM: x86/mmu: Set @writable to false for non-visible accesses by L2 · c583eed6

Sean Christopherson authored Apr 15, 2020

Explicitly set @writable to false in try_async_pf() if the GFN->PFN
translation is short-circuited due to the requested GFN not being
visible to L2.

Leaving @writable ('map_writable' in the callers) uninitialized is ok
in that it's never actually consumed, but one has to track it all the
way through set_spte() being short-circuited by set_mmio_spte() to
understand that the uninitialized variable is benign, and relying on
@writable being ignored is an unnecessary risk.  Explicitly setting
@writable also aligns try_async_pf() with __gfn_to_pfn_memslot().

Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200415214414.10194-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c583eed6

KVM: VMX: Cache vmcs.EXIT_INTR_INFO using arch avail_reg flags · 87915858

Sean Christopherson authored Apr 15, 2020

Introduce a new "extended register" type, EXIT_INFO_2 (to pair with the
nomenclature in .get_exit_info()), and use it to cache VMX's
vmcs.EXIT_INTR_INFO. Drop a comment in vmx_recover_nmi_blocking() that
is obsoleted by the generic caching mechanism.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200415203454.8296-6-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

87915858

KVM: VMX: Cache vmcs.EXIT_QUALIFICATION using arch avail_reg flags · 5addc235

Sean Christopherson authored Apr 15, 2020

Introduce a new "extended register" type, EXIT_INFO_1 (to pair with the
nomenclature in .get_exit_info()), and use it to cache VMX's
vmcs.EXIT_QUALIFICATION.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200415203454.8296-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5addc235

KVM: nVMX: Drop manual clearing of segment cache on nested VMCS switch · ec0241f3

Sean Christopherson authored Apr 15, 2020

Drop the call to vmx_segment_cache_clear() in vmx_switch_vmcs() now that
the entire register cache is reset when switching the active VMCS, e.g.
vmx_segment_cache_test_set() will reset the segment cache due to
VCPU_EXREG_SEGMENTS being unavailable.

Move vmx_segment_cache_clear() to vmx.c now that it's no longer invoked
by the nested code.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200415203454.8296-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ec0241f3

KVM: nVMX: Reset register cache (available and dirty masks) on VMCS switch · e5d03de5

Sean Christopherson authored Apr 15, 2020

Reset the per-vCPU available and dirty register masks when switching
between vmcs01 and vmcs02, as the masks track state relative to the
current VMCS. The stale masks don't cause problems in the current code
base because the registers are either unconditionally written on nested
transitions or, in the case of segment registers, have an additional
tracker that is manually reset.

Note, by dropping (previously implicitly, now explicitly) the dirty mask
when switching the active VMCS, KVM is technically losing writes to the
associated fields. But, the only regs that can be dirtied (RIP, RSP and
PDPTRs) are unconditionally written on nested transitions, e.g. explicit
writeback is a waste of cycles, and a WARN_ON would be rather pointless.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200415203454.8296-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e5d03de5

KVM: nVMX: Invoke ept_save_pdptrs() if and only if PAE paging is enabled · 9932b49e

Sean Christopherson authored Apr 15, 2020

Invoke ept_save_pdptrs() when restoring L1's host state on a "late"
VM-Fail if and only if PAE paging is enabled. This saves a CALL in the
common case where L1 is a 64-bit host, and avoids incorrectly marking
the PDPTRs as dirty.

WARN if ept_save_pdptrs() is called with PAE disabled now that the
nested usage pre-checks is_pae_paging(). Barring a bug in KVM's MMU,
attempting to read the PDPTRs with PAE disabled is now impossible.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200415203454.8296-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9932b49e

KVM: nVMX: Rename exit_reason to vm_exit_reason for nested VM-Exit · 4dcefa31

Sean Christopherson authored Apr 15, 2020

Use "vm_exit_reason" for code related to injecting a nested VM-Exit to
VM-Exits to make it clear that nested_vmx_vmexit() expects the full exit
eason, not just the basic exit reason. The basic exit reason (bits 15:0
of vmcs.VM_EXIT_REASON) is colloquially referred to as simply "exit
reason".

Note, other flows, e.g. vmx_handle_exit(), are intentionally left as is.
A future patch will convert vmx->exit_reason to a union + bit-field, and
the exempted flows will interact with the unionized of "exit_reason".
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200415175519.14230-10-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4dcefa31

KVM: nVMX: Cast exit_reason to u16 to check for nested EXTERNAL_INTERRUPT · 2a783389

Sean Christopherson authored Apr 15, 2020

Explicitly check only the basic exit reason when emulating an external
interrupt VM-Exit in nested_vmx_vmexit(). Checking the full exit reason
doesn't currently cause problems, but only because the only exit reason
modifier support by KVM is FAILED_VMENTRY, which is mutually exclusive
with EXTERNAL_INTERRUPT. Future modifiers, e.g. ENCLAVE_MODE, will
coexist with EXTERNAL_INTERRUPT.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200415175519.14230-9-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2a783389

KVM: nVMX: Pull exit_reason from vcpu_vmx in nested_vmx_reflect_vmexit() · f47baaed

Sean Christopherson authored Apr 15, 2020

Grab the exit reason from the vcpu struct in nested_vmx_reflect_vmexit()
instead of having the exit reason explicitly passed from the caller.
This fixes a discrepancy between VM-Fail and VM-Exit handling, as the
VM-Fail case is already handled by checking vcpu_vmx, e.g. the exit
reason previously passed on the stack is bogus if vmx->fail is set.

Not taking the exit reason on the stack also avoids having to document
that nested_vmx_reflect_vmexit() requires the full exit reason, as
opposed to just the basic exit reason, which is not at all obvious since
the only usages of the full exit reason are for tracing and way down in
prepare_vmcs12() where it's propagated to vmcs12.

No functional change intended.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200415175519.14230-8-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f47baaed

KVM: nVMX: Drop a superfluous WARN on reflecting EXTERNAL_INTERRUPT · 1d283062

Sean Christopherson authored Apr 15, 2020

Drop the WARN in nested_vmx_reflect_vmexit() that fires if KVM attempts
to reflect an external interrupt.  The WARN is blatantly impossible to
hit now that nested_vmx_l0_wants_exit() is called from
nested_vmx_reflect_vmexit() unconditionally returns true for
EXTERNAL_INTERRUPT.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200415175519.14230-7-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1d283062

KVM: nVMX: Split VM-Exit reflection logic into L0 vs. L1 wants · 2c1f3323

Sean Christopherson authored Apr 15, 2020

Split the logic that determines whether a nested VM-Exit is reflected
into L1 into "L0 wants" and "L1 wants" to document the core control flow
at a high level. If L0 wants the VM-Exit, e.g. because the exit is due
to a hardware event that isn't passed through to L1, then KVM should
handle the exit in L0 without considering L1's configuration. Then, if
L0 doesn't want the exit, KVM needs to query L1's wants to determine
whether or not L1 "caused" the exit, e.g. by setting an exiting control,
versus the exit occurring due to an L0 setting, e.g. when L0 intercepts
an action that L1 chose to pass-through.

Note, this adds an extra read on vmcs.VM_EXIT_INTR_INFO for exception.
This will be addressed in a future patch via a VMX-wide enhancement,
rather than pile on another case where vmx->exit_intr_info is
conditionally available.
Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200415175519.14230-6-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2c1f3323

KVM: nVMX: Move nested VM-Exit tracepoint into nested_vmx_reflect_vmexit() · 236871b6

Sean Christopherson authored Apr 15, 2020

Move the tracepoint for nested VM-Exits in preparation of splitting the
reflection logic into L1 wants the exit vs. L0 always handles the exit.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200415175519.14230-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

236871b6

KVM: nVMX: Move VM-Fail check out of nested_vmx_exit_reflected() · fbdd5025

Sean Christopherson authored Apr 15, 2020

Check for VM-Fail on nested VM-Enter in nested_vmx_reflect_vmexit() in
preparation for separating nested_vmx_exit_reflected() into separate "L0
wants exit exit" and "L1 wants the exit" helpers.

Explicitly set exit_intr_info and exit_qual to zero instead of reading
them from vmcs02, as they are invalid on VM-Fail (and thankfully ignored
by nested_vmx_vmexit() for nested VM-Fail).
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200415175519.14230-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fbdd5025

KVM: nVMX: Uninline nested_vmx_reflect_vmexit(), i.e. move it to nested.c · 7b7bd87d

Sean Christopherson authored Apr 15, 2020

Uninline nested_vmx_reflect_vmexit() in preparation of refactoring
nested_vmx_exit_reflected() to split up the reflection logic into more
consumable chunks, e.g. VM-Fail vs. L1 wants the exit vs. L0 always
handles the exit.

No functional change intended.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200415175519.14230-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7b7bd87d

KVM: nVMX: Move reflection check into nested_vmx_reflect_vmexit() · 789afc5c

Sean Christopherson authored Apr 15, 2020

Move the call to nested_vmx_exit_reflected() from vmx_handle_exit() into
nested_vmx_reflect_vmexit() and change the semantics of the return value
for nested_vmx_reflect_vmexit() to indicate whether or not the exit was
reflected into L1.  nested_vmx_exit_reflected() and
nested_vmx_reflect_vmexit() are intrinsically tied together, calling one
without simultaneously calling the other makes little sense.

No functional change intended.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200415175519.14230-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

789afc5c

kvm_host: unify VM_STAT and VCPU_STAT definitions in a single place · 812756a8

Emanuele Giuseppe Esposito authored Apr 14, 2020

The macros VM_STAT and VCPU_STAT are redundantly implemented in multiple
files, each used by a different architecure to initialize the debugfs
entries for statistics. Since they all have the same purpose, they can be
unified in a single common definition in include/linux/kvm_host.h
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20200414155625.20559-1-eesposit@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

812756a8

KVM: x86: move kvm_create_vcpu_debugfs after last failure point · 63d04348

Paolo Bonzini authored Apr 01, 2020

The placement of kvm_create_vcpu_debugfs is more or less irrelevant, since
it cannot fail and userspace should not care about the debugfs entries until
it knows the vcpu has been created. Moving it after the last failure
point removes the need to remove the directory when unwinding the creation.
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20200331224222.393439-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

63d04348

KVM: SVM: Use do_machine_check to pass MCE to the host · 1c164cb3

Uros Bizjak authored Apr 11, 2020

Use do_machine_check instead of INT $12 to pass MCE to the host,
the same approach VMX uses.

On a related note, there is no reason to limit the use of do_machine_check
to 64 bit targets, as is currently done for VMX. MCE handling works
for both target families.

The patch is only compile tested, for both, 64 and 32 bit targets,
someone should test the passing of the exception by injecting
some MCEs into the guest.

For future non-RFC patch, kvm_machine_check should be moved to some
appropriate header file.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Message-Id: <20200411153627.3474710-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1c164cb3

KVM: VMX: Clean cr3/pgd handling in vmx_load_mmu_pgd() · be100ef1

Sean Christopherson authored Mar 20, 2020

Rename @cr3 to @pgd in vmx_load_mmu_pgd() to reflect that it will be
loaded into vmcs.EPT_POINTER and not vmcs.GUEST_CR3 when EPT is enabled.
Similarly, load guest_cr3 with @pgd if and only if EPT is disabled.

This fixes one of the last, if not _the_ last, cases in KVM where a
variable that is not strictly a cr3 value uses "cr3" instead of "pgd".
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-38-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

be100ef1

KVM: x86: Replace "cr3" with "pgd" in "new cr3/pgd" related code · be01e8e2

Sean Christopherson authored Mar 20, 2020

Rename functions and variables in kvm_mmu_new_cr3() and related code to
replace "cr3" with "pgd", i.e. continue the work started by commit
727a7e27 ("KVM: x86: rename set_cr3 callback and related flags to
load_mmu_pgd"). kvm_mmu_new_cr3() and company are not always loading a
new CR3, e.g. when nested EPT is enabled "cr3" is actually an EPTP.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-37-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

be01e8e2

KVM: nVMX: Free only the affected contexts when emulating INVEPT · ce8fe7b7

Sean Christopherson authored Mar 20, 2020

Add logic to handle_invept() to free only those roots that match the
target EPT context when emulating a single-context INVEPT.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-36-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ce8fe7b7

KVM: nVMX: Don't flush TLB on nested VMX transition · 9805c5f7

Sean Christopherson authored Mar 20, 2020

Unconditionally skip the TLB flush triggered when reusing a root for a
nested transition as nested_vmx_transition_tlb_flush() ensures the TLB
is flushed when needed, regardless of whether the MMU can reuse a cached
root (or the last root).
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-35-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9805c5f7

KVM: nVMX: Skip MMU sync on nested VMX transition when possible · 41fab65e

Sean Christopherson authored Mar 20, 2020

Skip the MMU sync when reusing a cached root if EPT is enabled or L1
enabled VPID for L2.

If EPT is enabled, guest-physical mappings aren't flushed even if VPID
is disabled, i.e. L1 can't expect stale TLB entries to be flushed if it
has enabled EPT and L0 isn't shadowing PTEs (for L1 or L2) if L1 has
EPT disabled.

If VPID is enabled (and EPT is disabled), then L1 can't expect stale TLB
entries to be flushed (for itself or L2).
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-34-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

41fab65e

KVM: x86/mmu: Add module param to force TLB flush on root reuse · 71fe7013

Sean Christopherson authored Mar 20, 2020

Add a module param, flush_on_reuse, to override skip_tlb_flush and
skip_mmu_sync when performing a so called "fast cr3 switch", i.e. when
reusing a cached root.  The primary motiviation for the control is to
provide a fallback mechanism in the event that TLB flushing and/or MMU
sync bugs are exposed/introduced by upcoming changes to stop
unconditionally flushing on nested VMX transitions.
Suggested-by: Jim Mattson <jmattson@google.com>
Suggested-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-33-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

71fe7013