Commits · 516f7898ae20d9dd902a85522676055a4de9dc9b · nexedi / linux

01 Nov, 2017 9 commits

KVM: PPC: Book3S HV: Allow for running POWER9 host in single-threaded mode · 516f7898

Paul Mackerras authored Oct 16, 2017

This patch allows for a mode on POWER9 hosts where we control all the
threads of a core, much as we do on POWER8. The mode is controlled by
a module parameter on the kvm_hv module, called "indep_threads_mode".
The normal mode on POWER9 is the "independent threads" mode, with
indep_threads_mode=Y, where the host is in SMT4 mode (or in fact any
desired SMT mode) and each thread independently enters and exits from
KVM guests without reference to what other threads in the core are
doing.

If indep_threads_mode is set to N at the point when a VM is started,
KVM will expect every core that the guest runs on to be in single
threaded mode (that is, threads 1, 2 and 3 offline), and will set the
flag that prevents secondary threads from coming online. We can still
use all four threads; the code that implements dynamic micro-threading
on POWER8 will become active in over-commit situations and will allow
up to three other VCPUs to be run on the secondary threads of the core
whenever a VCPU is run.

The reason for wanting this mode is that this will allow us to run HPT
guests on a radix host on a POWER9 machine that does not support
"mixed mode", that is, having some threads in a core be in HPT mode
while other threads are in radix mode. It will also make it possible
to implement a "strict threads" mode in future, if desired.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

516f7898

KVM: PPC: Book3S HV: Add infrastructure for running HPT guests on radix host · 18c3640c

Paul Mackerras authored Sep 13, 2017

This sets up the machinery for switching a guest between HPT (hashed
page table) and radix MMU modes, so that in future we can run a HPT
guest on a radix host on POWER9 machines.

* The KVM_PPC_CONFIGURE_V3_MMU ioctl can now specify either HPT or
  radix mode, on a radix host.

* The KVM_CAP_PPC_MMU_HASH_V3 capability now returns 1 on POWER9
  with HV KVM on a radix host.

* The KVM_PPC_GET_SMMU_INFO returns information about the HPT MMU on a
  radix host.

* The KVM_PPC_ALLOCATE_HTAB ioctl on a radix host will switch the
  guest to HPT mode and allocate a HPT.

* For simplicity, we now allocate the rmap array for each memslot,
  even on a radix host, since it will be needed if the guest switches
  to HPT mode.

* Since we cannot yet run a HPT guest on a radix host, the KVM_RUN
  ioctl will return an EINVAL error in that case.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

18c3640c

KVM: PPC: Book3S HV: Unify dirty page map between HPT and radix · e641a317

Paul Mackerras authored Oct 26, 2017

Currently, the HPT code in HV KVM maintains a dirty bit per guest page
in the rmap array, whether or not dirty page tracking has been enabled
for the memory slot. In contrast, the radix code maintains a dirty
bit per guest page in memslot->dirty_bitmap, and only does so when
dirty page tracking has been enabled.

This changes the HPT code to maintain the dirty bits in the memslot
dirty_bitmap like radix does. This results in slightly less code
overall, and will mean that we do not lose the dirty bits when
transitioning between HPT and radix mode in future.

There is one minor change to behaviour as a result. With HPT, when
dirty tracking was enabled for a memslot, we would previously clear
all the dirty bits at that point (both in the HPT entries and in the
rmap arrays), meaning that a KVM_GET_DIRTY_LOG ioctl immediately
following would show no pages as dirty (assuming no vcpus have run
in the meantime). With this change, the dirty bits on HPT entries
are not cleared at the point where dirty tracking is enabled, so
KVM_GET_DIRTY_LOG would show as dirty any guest pages that are
resident in the HPT and dirty. This is consistent with what happens
on radix.

This also fixes a bug in the mark_pages_dirty() function for radix
(in the sense that the function no longer exists). In the case where
a large page of 64 normal pages or more is marked dirty, the
addressing of the dirty bitmap was incorrect and could write past
the end of the bitmap. Fortunately this case was never hit in
practice because a 2MB large page is only 32 x 64kB pages, and we
don't support backing the guest with 1GB huge pages at this point.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

e641a317

KVM: PPC: Book3S HV: Rename hpte_setup_done to mmu_ready · 1b151ce4

Paul Mackerras authored Sep 13, 2017

This renames the kvm->arch.hpte_setup_done field to mmu_ready because
we will want to use it for radix guests too -- both for setting things
up before vcpu execution, and for excluding vcpus from executing while
MMU-related things get changed, such as in future switching the MMU
from radix to HPT mode or vice-versa.

This also moves the call to kvmppc_setup_partition_table() that was
done in kvmppc_hv_setup_htab_rma() for HPT guests, and the setting
of mmu_ready, into the caller in kvmppc_vcpu_run_hv().
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

1b151ce4

KVM: PPC: Book3S HV: Don't rely on host's page size information · 8dc6cca5

Paul Mackerras authored Sep 11, 2017

This removes the dependence of KVM on the mmu_psize_defs array (which
stores information about hardware support for various page sizes) and
the things derived from it, chiefly hpte_page_sizes[], hpte_page_size(),
hpte_actual_page_size() and get_sllp_encoding().  We also no longer
rely on the mmu_slb_size variable or the MMU_FTR_1T_SEGMENTS feature
bit.

The reason for doing this is so we can support a HPT guest on a radix
host.  In a radix host, the mmu_psize_defs array contains information
about page sizes supported by the MMU in radix mode rather than the
page sizes supported by the MMU in HPT mode.  Similarly, mmu_slb_size
and the MMU_FTR_1T_SEGMENTS bit are not set.

Instead we hard-code knowledge of the behaviour of the HPT MMU in the
POWER7, POWER8 and POWER9 processors (which are the only processors
supported by HV KVM) - specifically the encoding of the LP fields in
the HPT and SLB entries, and the fact that they have 32 SLB entries
and support 1TB segments.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

8dc6cca5

Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next · 3e8f150a

Paul Mackerras authored Nov 01, 2017

This merges in the ppc-kvm topic branch of the powerpc tree to get the
commit that reverts the patch "KVM: PPC: Book3S HV: POWER9 does not
require secondary thread management".  This is needed for subsequent
patches which will be applied on this branch.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

3e8f150a

KVM: PPC: Book3S: Fix gas warning due to using r0 as immediate 0 · 93897a1f

Nicholas Piggin authored Oct 21, 2017

This fixes the message:

arch/powerpc/kvm/book3s_segment.S: Assembler messages:
arch/powerpc/kvm/book3s_segment.S:330: Warning: invalid register expression
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

93897a1f

KVM: PPC: Book3S PR: Only install valid SLBs during KVM_SET_SREGS · f4093ee9

Greg Kurz authored Oct 16, 2017

Userland passes an array of 64 SLB descriptors to KVM_SET_SREGS,
some of which are valid (ie, SLB_ESID_V is set) and the rest are
likely all-zeroes (with QEMU at least).

Each of them is then passed to kvmppc_mmu_book3s_64_slbmte(), which
assumes to find the SLB index in the 3 lower bits of its rb argument.
When passed zeroed arguments, it happily overwrites the 0th SLB entry
with zeroes. This is exactly what happens while doing live migration
with QEMU when the destination pushes the incoming SLB descriptors to
KVM PR. When reloading the SLBs at the next synchronization, QEMU first
clears its SLB array and only restore valid ones, but the 0th one is
now gone and we cannot access the corresponding memory anymore:

(qemu) x/x $pc
c0000000000b742c: Cannot access memory

To avoid this, let's filter out non-valid SLB entries. While here, we
also force a full SLB flush before installing new entries. Since SLB
is for 64-bit only, we now build this path conditionally to avoid a
build break on 32-bit, which doesn't define SLB_ESID_V.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

f4093ee9

KVM: PPC: Book3S HV: Don't call real-mode XICS hypercall handlers if not enabled · 00bb6ae5

Paul Mackerras authored Oct 26, 2017

When running a guest on a POWER9 system with the in-kernel XICS
emulation disabled (for example by running QEMU with the parameter
"-machine pseries,kernel_irqchip=off"), the kernel does not pass
the XICS-related hypercalls such as H_CPPR up to userspace for
emulation there as it should.

The reason for this is that the real-mode handlers for these
hypercalls don't check whether a XICS device has been instantiated
before calling the xics-on-xive code.  That code doesn't check
either, leading to potential NULL pointer dereferences because
vcpu->arch.xive_vcpu is NULL.  Those dereferences won't cause an
exception in real mode but will lead to kernel memory corruption.

This fixes it by adding kvmppc_xics_enabled() checks before calling
the XICS functions.

Cc: stable@vger.kernel.org # v4.11+
Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

00bb6ae5

20 Oct, 2017 1 commit

KVM: PPC: Tie KVM_CAP_PPC_HTM to the user-visible TM feature · 2a3d6553

Michael Ellerman authored Oct 12, 2017

Currently we use CPU_FTR_TM to decide if the CPU/kernel can support
TM (Transactional Memory), and if it's true we advertise that to
Qemu (or similar) via KVM_CAP_PPC_HTM.

PPC_FEATURE2_HTM is the user-visible feature bit, which indicates that
the CPU and kernel can support TM. Currently CPU_FTR_TM and
PPC_FEATURE2_HTM always have the same value, either true or false, so
using the former for KVM_CAP_PPC_HTM is correct.

However some Power9 CPUs can operate in a mode where TM is enabled but
TM suspended state is disabled. In this mode CPU_FTR_TM is true, but
PPC_FEATURE2_HTM is false. Instead a different PPC_FEATURE2 bit is
set, to indicate that this different mode of TM is available.

It is not safe to let guests use TM as-is, when the CPU is in this
mode. So to prevent that from happening, use PPC_FEATURE2_HTM to
determine the value of KVM_CAP_PPC_HTM.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

2a3d6553

19 Oct, 2017 1 commit

Revert "KVM: PPC: Book3S HV: POWER9 does not require secondary thread management" · 31a4d448

Paul Mackerras authored Oct 19, 2017

This reverts commit 94a04bc2.

In order to run HPT guests on a radix POWER9 host, we will have to run
the host in single-threaded mode, because POWER9 processors do not
currently support running some threads of a core in HPT mode while
others are in radix mode ("mixed mode").

That means that we will need the same mechanisms that are used on
POWER8 to make the secondary threads available to KVM, which were
disabled on POWER9 by commit 94a04bc2.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

31a4d448

15 Oct, 2017 1 commit

KVM: PPC: Book3S HV: Explicitly disable HPT operations on radix guests · 891f1ebf

Paul Mackerras authored Sep 13, 2017

This adds code to make sure that we don't try to access the
non-existent HPT for a radix guest using the htab file for the VM
in debugfs, a file descriptor obtained using the KVM_PPC_GET_HTAB_FD
ioctl, or via the KVM_PPC_RESIZE_HPT_{PREPARE,COMMIT} ioctls.

At present nothing bad happens if userspace does access these
interfaces on a radix guest, mostly because kvmppc_hpt_npte()
gives 0 for a radix guest, which in turn is because 1 << -4
comes out as 0 on POWER processors.  However, that relies on
undefined behaviour, so it is better to be explicit about not
accessing the HPT for a radix guest.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

891f1ebf

14 Oct, 2017 5 commits

KVM: PPC: Book3S PR: Enable in-kernel TCE handlers for PR KVM · 3f2bb764

Alexey Kardashevskiy authored Oct 11, 2017

The handlers support PR KVM from the day one; however the PR KVM's
enable/disable hcalls handler missed these ones.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

3f2bb764

KVM: PPC: Book3S HV: Delete an error message for a failed memory allocation in... · 9c7e53dc

Markus Elfring authored Oct 05, 2017

KVM: PPC: Book3S HV: Delete an error message for a failed memory allocation in kvmppc_allocate_hpt()

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

9c7e53dc

KVM: PPC: BookE: Use vma_pages function · 4bdcb701

Thomas Meyer authored Sep 21, 2017

Use vma_pages function on vma object instead of explicit computation.
Found by coccinelle spatch "api/vma_pages.cocci"
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

4bdcb701

KVM: PPC: Book3S HV: Use ARRAY_SIZE macro · 4bb817ed

Thomas Meyer authored Sep 03, 2017

Use ARRAY_SIZE macro, rather than explicitly coding some variant of it
yourself.
Found with: find -type f -name "*.c" -o -name "*.h" | xargs perl -p -i -e
's/\bsizeof\s*\(\s*(\w+)\s*\)\s*\ /\s*sizeof\s*\(\s*\1\s*\[\s*0\s*\]\s*\)
/ARRAY_SIZE(\1)/g' and manual check/verification.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

4bb817ed

KVM: PPC: Book3S HV: Handle unexpected interrupts better · 857b99e1

Paul Mackerras authored Sep 01, 2017

At present, if an interrupt (i.e. an exception or trap) occurs in the
code where KVM is switching the MMU to or from guest context, we jump
to kvmppc_bad_host_intr, where we simply spin with interrupts disabled.
In this situation, it is hard to debug what happened because we get no
indication as to which interrupt occurred or where. Typically we get
a cascade of stall and soft lockup warnings from other CPUs.

In order to get more information for debugging, this adds code to
create a stack frame on the emergency stack and save register values
to it. We start half-way down the emergency stack in order to give
ourselves some chance of being able to do a stack trace on secondary
threads that are already on the emergency stack.

On POWER7 or POWER8, we then just spin, as before, because we don't
know what state the MMU context is in or what other threads are doing,
and we can't switch back to host context without coordinating with
other threads. On POWER9 we can do better; there we load up the host
MMU context and jump to C code, which prints an oops message to the
console and panics.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

857b99e1

12 Oct, 2017 23 commits

KVM: x86: extend usage of RET_MMIO_PF_* constants · 9b8ebbdb

Paolo Bonzini authored Aug 17, 2017

The x86 MMU if full of code that returns 0 and 1 for retry/emulate.  Use
the existing RET_MMIO_PF_RETRY/RET_MMIO_PF_EMULATE enum, renaming it to
drop the MMIO part.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9b8ebbdb

KVM: nSVM: fix SMI injection in guest mode · 05cade71

Ladi Prosek authored Oct 11, 2017

Entering SMM while running in guest mode wasn't working very well because several
pieces of the vcpu state were left set up for nested operation.

Some of the issues observed:

* L1 was getting unexpected VM exits (using L1 interception controls but running
  in SMM execution environment)
* MMU was confused (walk_mmu was still set to nested_mmu)
* INTERCEPT_SMI was not emulated for L1 (KVM never injected SVM_EXIT_SMI)

Intel SDM actually prescribes the logical processor to "leave VMX operation" upon
entering SMM in 34.14.1 Default Treatment of SMI Delivery. AMD doesn't seem to
document this but they provide fields in the SMM state-save area to stash the
current state of SVM. What we need to do is basically get out of guest mode for
the duration of SMM. All this completely transparent to L1, i.e. L1 is not given
control and no L1 observable state changes.

To avoid code duplication this commit takes advantage of the existing nested
vmexit and run functionality, perhaps at the cost of efficiency. To get out of
guest mode, nested_svm_vmexit is called, unchanged. Re-entering is performed using
enter_svm_guest_mode.

This commit fixes running Windows Server 2016 with Hyper-V enabled in a VM with
OVMF firmware (OVMF_CODE-need-smm.fd).
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

05cade71

KVM: nSVM: refactor nested_svm_vmrun · c2634065

Ladi Prosek authored Oct 11, 2017

Analogous to 858e25c0 ("kvm: nVMX: Refactor nested_vmx_run()"), this commit splits
nested_svm_vmrun into two parts. The newly introduced enter_svm_guest_mode modifies the
vcpu state to transition from L1 to L2, while the code left in nested_svm_vmrun handles
the VMRUN instruction.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c2634065

KVM: nVMX: fix SMI injection in guest mode · 72e9cbdb

Ladi Prosek authored Oct 11, 2017

Entering SMM while running in guest mode wasn't working very well because several
pieces of the vcpu state were left set up for nested operation.

Some of the issues observed:

* L1 was getting unexpected VM exits (using L1 interception controls but running
  in SMM execution environment)
* SMM handler couldn't write to vmx_set_cr4 because of incorrect validity checks
  predicated on nested.vmxon
* MMU was confused (walk_mmu was still set to nested_mmu)

Intel SDM actually prescribes the logical processor to "leave VMX operation" upon
entering SMM in 34.14.1 Default Treatment of SMI Delivery. What we need to do is
basically get out of guest mode and set nested.vmxon to false for the duration of
SMM. All this completely transparent to L1, i.e. L1 is not given control and no
L1 observable state changes.

To avoid code duplication this commit takes advantage of the existing nested
vmexit and run functionality, perhaps at the cost of efficiency. To get out of
guest mode, nested_vmx_vmexit with exit_reason == -1 is called, a trick already
used in vmx_leave_nested. Re-entering is cleaner, using enter_vmx_non_root_mode.

This commit fixes running Windows Server 2016 with Hyper-V enabled in a VM with
OVMF firmware (OVMF_CODE-need-smm.fd).
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

72e9cbdb

KVM: nVMX: set IDTR and GDTR limits when loading L1 host state · 21f2d551

Ladi Prosek authored Oct 11, 2017

Intel SDM 27.5.2 Loading Host Segment and Descriptor-Table Registers:

"The GDTR and IDTR limits are each set to FFFFH."
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

21f2d551

KVM: x86: introduce ISA specific smi_allowed callback · 72d7b374

Ladi Prosek authored Oct 11, 2017

Similar to NMI, there may be ISA specific reasons why an SMI cannot be
injected into the guest. This commit adds a new smi_allowed callback to
be implemented in following commits.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

72d7b374

KVM: x86: introduce ISA specific SMM entry/exit callbacks · 0234bf88

Ladi Prosek authored Oct 11, 2017

Entering and exiting SMM may require ISA specific handling under certain
circumstances. This commit adds two new callbacks with empty implementations.
Actual functionality will be added in following commits.

* pre_enter_smm() is to be called when injecting an SMM, before any
  SMM related vcpu state has been changed
* pre_leave_smm() is to be called when emulating the RSM instruction,
  when the vcpu is in real mode and before any SMM related vcpu state
  has been restored
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0234bf88

KVM: SVM: limit kvm_handle_page_fault to #PF handling · d0006530

Paolo Bonzini authored Aug 11, 2017

It has always annoyed me a bit how SVM_EXIT_NPF is handled by
pf_interception.  This is also the only reason behind the
under-documented need_unprotect argument to kvm_handle_page_fault.
Let NPF go straight to kvm_mmu_page_fault, just like VMX
does in handle_ept_violation and handle_ept_misconfig.
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d0006530

KVM: SVM: unconditionally wake up VCPU on IOMMU interrupt · 1cf53587

Paolo Bonzini authored Oct 10, 2017

Checking the mode is unnecessary, and is done without a memory barrier
separating the LAPIC write from the vcpu->mode read; in addition,
kvm_vcpu_wake_up is already doing a check for waiters on the wait queue
that has the same effect.

In practice it's safe because spin_lock has full-barrier semantics on x86,
but don't be too clever.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1cf53587

arch/x86: remove redundant null checks before kmem_cache_destroy · c1bd743e

Tim Hansen authored Oct 07, 2017

Remove redundant null checks before calling kmem_cache_destroy.

Found with make coccicheck M=arch/x86/kvm on linux-next tag
next-20170929.
Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c1bd743e

KVM: VMX: Don't expose unrestricted_guest is enabled if ept is disabled · 8ad8182e

Wanpeng Li authored Oct 09, 2017

SDM mentioned:

 "If either the â€œunrestricted guestâ€ VM-execution control or the â€œmode-based
  execute control for EPTâ€ VM- execution control is 1, the â€œenable EPTâ€
  VM-execution control must also be 1."

However, we can still observe unrestricted_guest is Y after inserting the kvm-intel.ko
w/ ept=N. It depends on later starts a guest in order that the function
vmx_compute_secondary_exec_control() can be executed, then both the module parameter
and exec control fields will be amended.

This patch fixes it by amending module parameter immediately during vmcs data setup.
Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8ad8182e

KVM: X86: Processor States following Reset or INIT · a554d207

Wanpeng Li authored Oct 11, 2017

- XCR0 is reset to 1 by RESET but not INIT
- XSS is zeroed by both RESET and INIT
- BNDCFGU, BND0-BND3, BNDCFGS, BNDSTATUS are zeroed by both RESET and INIT

This patch does this according to SDM.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a554d207

KVM: x86: thoroughly disarm LAPIC timer around TSC deadline switch · 44275932

Radim Krčmář authored Oct 06, 2017

Our routines look at tscdeadline and period when deciding state of a
timer.  The timer is disarmed when switching between TSC deadline and
other modes, so we should set everything to disarmed state.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

44275932

KVM: x86: really disarm lapic timer when clearing TMICT · 5d74a699

Radim Krčmář authored Oct 06, 2017

preemption timer only looks at tscdeadline and could inject already
disarmed timer.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5d74a699

KVM: x86: handle 0 write to TSC_DEADLINE MSR · 86bbc1e6

Radim Krčmář authored Oct 06, 2017

0 should disable the timer, but start_hv_timer will recognize it as an
expired timer instead.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

86bbc1e6

kvm, mm: account kvm related kmem slabs to kmemcg · 46bea48a

Shakeel Butt authored Oct 05, 2017

The kvm slabs can consume a significant amount of system memory
and indeed in our production environment we have observed that
a lot of machines are spending significant amount of memory that
can not be left as system memory overhead. Also the allocations
from these slabs can be triggered directly by user space applications
which has access to kvm and thus a buggy application can leak
such memory. So, these caches should be accounted to kmemcg.
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

46bea48a

KVM: VMX: rename RDSEED and RDRAND vmx ctrls to reflect exiting · 736fdf72

David Hildenbrand authored Aug 24, 2017

Let's just name these according to the SDM. This should make it clearer
that the are used to enable exiting and not the feature itself.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

736fdf72

KVM: x86: allow setting identity map addr with no vcpus only · 1af1ac91

David Hildenbrand authored Aug 24, 2017

Changing it afterwards doesn't make too much sense and will only result
in inconsistencies.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

1af1ac91

KVM: x86: document special identity map address value · 726b99c4

David Hildenbrand authored Aug 24, 2017

Setting it to 0 leads to setting it to the default value, let's document
this.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

726b99c4

KVM: VMX: cleanup init_rmode_identity_map() · d8a6e365

David Hildenbrand authored Aug 24, 2017

No need for another enable_ept check. kvm->arch.ept_identity_map_addr
only has to be inititalized once. Having alloc_identity_pagetable() is
overkill and dropping BUG_ONs is always nice.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

d8a6e365

KVM: nVMX: no need to set ept/vpid caps to 0 · 1c13bffd

David Hildenbrand authored Aug 24, 2017

They are inititally 0, so no need to reset them to 0.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

1c13bffd

KVM: nVMX: no need to set vcpu->cpu when switching vmcs · 0ee096d0

David Hildenbrand authored Aug 24, 2017

vcpu->cpu is not cleared when doing a vmx_vcpu_put/load, so this can be
dropped.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

0ee096d0

KVM: VMX: drop unnecessary function declarations · 9522ea9e

David Hildenbrand authored Aug 24, 2017

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

9522ea9e