Commits · 27469d29b3caf889ddf81c7d89f0676e45eb551d · Kirill Smelkov / linux

22 Apr, 2013 14 commits

KVM: x86: Fix memory leak in vmx.c · 27469d29

Andrew Honig authored Apr 18, 2013

If userspace creates and destroys multiple VMs within the same process
we leak 20k of memory in the userspace process context per VM.  This
patch frees the memory in kvm_arch_destroy_vm.  If the process exits
without closing the VM file descriptor or the file descriptor has been
shared with another process then we don't free the memory.

It's still possible for a user space process to leak memory if the last
process to close the fd for the VM is not the process that created it.
However, this is an unexpected case that's only caused by a user space
process that's misbehaving.
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

27469d29

KVM: x86: fix error return code in kvm_arch_vcpu_init() · f1797359

Wei Yongjun authored Apr 18, 2013

Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

f1797359

KVM: nVMX: Enable and disable shadow vmcs functionality · 8a1b9dd0

Abel Gordon authored Apr 18, 2013

Once L1 loads VMCS12 we enable shadow-vmcs capability and copy all the VMCS12
shadowed fields to the shadow vmcs.  When we release the VMCS12, we also
disable shadow-vmcs capability.
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

8a1b9dd0

KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs · 012f83cb

Abel Gordon authored Apr 18, 2013

Synchronize between the VMCS12 software controlled structure and the
processor-specific shadow vmcs
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

012f83cb

KVM: nVMX: Copy VMCS12 to processor-specific shadow vmcs · c3114420

Abel Gordon authored Apr 18, 2013

Introduce a function used to copy fields from the software controlled VMCS12
to the processor-specific shadow vmcs
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

c3114420

KVM: nVMX: Copy processor-specific shadow-vmcs to VMCS12 · 16f5b903

Abel Gordon authored Apr 18, 2013

Introduce a function used to copy fields from the processor-specific shadow
vmcs to the software controlled VMCS12
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

16f5b903

KVM: nVMX: Release shadow vmcs · e7953d7f

Abel Gordon authored Apr 18, 2013

Unmap vmcs12 and release the corresponding shadow vmcs
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

e7953d7f

KVM: nVMX: Allocate shadow vmcs · 8de48833

Abel Gordon authored Apr 18, 2013

Allocate a shadow vmcs used by the processor to shadow part of the fields
stored in the software defined VMCS12 (let L1 access fields without causing
exits). Note we keep a shadow vmcs only for the current vmcs12.  Once a vmcs12
becomes non-current, its shadow vmcs is released.
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

8de48833

KVM: nVMX: Fix VMXON emulation · 145c28dd

Abel Gordon authored Apr 18, 2013

handle_vmon doesn't check if L1 is already in root mode (VMXON
was previously called). This patch adds this missing check and calls
nested_vmx_failValid if VMX is already ON.
We need this check because L0 will allocate the shadow vmcs when L1
executes VMXON and we want to avoid host leaks (due to shadow vmcs
allocation) if L1 executes VMXON repeatedly.
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

145c28dd

KVM: nVMX: Refactor handle_vmwrite · 20b97fea

Abel Gordon authored Apr 18, 2013

Refactor existent code so we re-use vmcs12_write_any to copy fields from the
shadow vmcs specified by the link pointer (used by the processor,
implementation-specific) to the VMCS12 software format used by L0 to hold
the fields in L1 memory address space.
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

20b97fea

KVM: nVMX: Introduce vmread and vmwrite bitmaps · 4607c2d7

Abel Gordon authored Apr 18, 2013

Prepare vmread and vmwrite bitmaps according to a pre-specified list of fields.
These lists are intended to specifiy most frequent accessed fields so we can
minimize the number of fields that are copied from/to the software controlled
VMCS12 format to/from to processor-specific shadow vmcs. The lists were built
measuring the VMCS fields access rate after L2 Ubuntu 12.04 booted when it was
running on top of L1 KVM, also Ubuntu 12.04. Note that during boot there were
additional fields which were frequently modified but they were not added to
these lists because after boot these fields were not longer accessed by L1.
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

4607c2d7

KVM: nVMX: Detect shadow-vmcs capability · abc4fc58

Abel Gordon authored Apr 18, 2013

Add logic required to detect if shadow-vmcs is supported by the
processor. Introduce a new kernel module parameter to specify if L0 should use
shadow vmcs (or not) to run L1.
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

abc4fc58

KVM: nVMX: Shadow-vmcs control fields/bits · 89662e56

Abel Gordon authored Apr 18, 2013

Add definitions for all the vmcs control fields/bits
required to enable vmcs-shadowing
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

89662e56

Merge git://github.com/agraf/linux-2.6.git kvm-ppc-next into queue · d10ab869
Gleb Natapov authored Apr 22, 2013

d10ab869

18 Apr, 2013 2 commits

KVM: ia64: Fix kvm_vm_ioctl_irq_line · e6062647

Yang Zhang authored Apr 17, 2013

Fix the compile error with kvm_vm_ioctl_irq_line.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e6062647

KVM: x86: Fix posted interrupt with CONFIG_SMP=n · 6ffbbbba

Zhang, Yang Z authored Apr 17, 2013

->send_IPI_mask is not defined on UP.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

6ffbbbba

17 Apr, 2013 4 commits

kvm/ppc: don't call complete_mmio_load when it's a store · be28a27c

Scott Wood authored Apr 15, 2013

complete_mmio_load writes back the mmio result into the
destination register.  Doing this on a store results in
register corruption.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

be28a27c

KVM: PPC: emulate dcbst · c32498ee

Stuart Yoder authored Apr 09, 2013

Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

c32498ee

Added ONE_REG interface for debug instruction · 8c32a2ea

Bharat Bhushan authored Mar 20, 2013

This patch adds the one_reg interface to get the special instruction
to be used for setting software breakpoint from userspace.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

8c32a2ea

Merge commit 'origin/next' into kvm-ppc-next · fca7567c
Alexander Graf authored Apr 17, 2013

fca7567c

16 Apr, 2013 17 commits

KVM: VMX: Fix check guest state validity if a guest is in VM86 mode · f13882d8

Gleb Natapov authored Apr 14, 2013

If guest vcpu is in VM86 mode the vcpu state should be checked as if in
real mode.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f13882d8

KVM: nVMX: check vmcs12 for valid activity state · 26539bd0

Paolo Bonzini authored Apr 15, 2013

KVM does not use the activity state VMCS field, and does not support
it in nested VMX either (the corresponding bits in the misc VMX feature
MSR are zero).  Fail entry if the activity state is set to anything but
"active".

Since the value will always be the same for L1 and L2, we do not need
to read and write the corresponding VMCS field on L1/L2 transitions,
either.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

26539bd0

KVM: ARM: Fix kvm_vm_ioctl_irq_line · 79558f11

Alexander Graf authored Apr 16, 2013

Commit aa2fbe6d broke the ARM KVM target by introducing a new parameter
to irq handling functions.

Fix the function prototype to get things compiling again and ignore the
parameter just like we did before
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Christoffer Dall <cdall@cs.columbia.edu>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

79558f11

KVM: VMX: Use posted interrupt to deliver virtual interrupt · 5a71785d

Yang Zhang authored Apr 11, 2013

If posted interrupt is avaliable, then uses it to inject virtual
interrupt to guest.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

5a71785d

KVM: VMX: Add the deliver posted interrupt algorithm · a20ed54d

Yang Zhang authored Apr 11, 2013

Only deliver the posted interrupt when target vcpu is running
and there is no previous interrupt pending in pir.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

a20ed54d

KVM: Set TMR when programming ioapic entry · cf9e65b7

Yang Zhang authored Apr 11, 2013

We already know the trigger mode of a given interrupt when programming
the ioapice entry. So it's not necessary to set it in each interrupt
delivery.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

cf9e65b7

KVM: Call common update function when ioapic entry changed. · 3d81bc7e

Yang Zhang authored Apr 11, 2013

Both TMR and EOI exit bitmap need to be updated when ioapic changed
or vcpu's id/ldr/dfr changed. So use common function instead eoi exit
bitmap specific function.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

3d81bc7e

KVM: VMX: Check the posted interrupt capability · 01e439be

Yang Zhang authored Apr 11, 2013

Detect the posted interrupt feature. If it exists, then set it in vmcs_config.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

01e439be

KVM: VMX: Register a new IPI for posted interrupt · d78f2664

Yang Zhang authored Apr 11, 2013

Posted Interrupt feature requires a special IPI to deliver posted interrupt
to guest. And it should has a high priority so the interrupt will not be
blocked by others.
Normally, the posted interrupt will be consumed by vcpu if target vcpu is
running and transparent to OS. But in some cases, the interrupt will arrive
when target vcpu is scheduled out. And host will see it. So we need to
register a dump handler to handle it.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d78f2664

KVM: VMX: Enable acknowledge interupt on vmexit · a547c6db

Yang Zhang authored Apr 11, 2013

The "acknowledge interrupt on exit" feature controls processor behavior
for external interrupt acknowledgement. When this control is set, the
processor acknowledges the interrupt controller to acquire the
interrupt vector on VM exit.

After enabling this feature, an interrupt which arrived when target cpu is
running in vmx non-root mode will be handled by vmx handler instead of handler
in idt. Currently, vmx handler only fakes an interrupt stack and jump to idt
table to let real handler to handle it. Further, we will recognize the interrupt
and only delivery the interrupt which not belong to current vcpu through idt table.
The interrupt which belonged to current vcpu will be handled inside vmx handler.
This will reduce the interrupt handle cost of KVM.

Also, interrupt enable logic is changed if this feature is turnning on:
Before this patch, hypervior call local_irq_enable() to enable it directly.
Now IF bit is set on interrupt stack frame, and will be enabled on a return from
interrupt handler if exterrupt interrupt exists. If no external interrupt, still
call local_irq_enable() to enable it.

Refer to Intel SDM volum 3, chapter 33.2.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

a547c6db

KVM: Use eoi to track RTC interrupt delivery status · 2c2bf011

Yang Zhang authored Apr 11, 2013

Current interrupt coalescing logci which only used by RTC has conflict
with Posted Interrupt.
This patch introduces a new mechinism to use eoi to track interrupt:
When delivering an interrupt to vcpu, the pending_eoi set to number of
vcpu that received the interrupt. And decrease it when each vcpu writing
eoi. No subsequent RTC interrupt can deliver to vcpu until all vcpus
write eoi.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

2c2bf011

KVM: Let ioapic know the irq line status · aa2fbe6d

Yang Zhang authored Apr 11, 2013

Userspace may deliver RTC interrupt without query the status. So we
want to track RTC EOI for this case.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

aa2fbe6d

KVM: Force vmexit with virtual interrupt delivery · f3bff631

Yang Zhang authored Apr 11, 2013

Need the EOI to track interrupt deliver status, so force vmexit
on EOI for rtc interrupt when enabling virtual interrupt delivery.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f3bff631

KVM: Add reset/restore rtc_status support · 10606919

Yang Zhang authored Apr 11, 2013

restore rtc_status from migration or save/restore
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

10606919

KVM: Return destination vcpu on interrupt injection · b4f2225c

Yang Zhang authored Apr 11, 2013

Add a new parameter to know vcpus who received the interrupt.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

b4f2225c

KVM: Introduce struct rtc_status · 8dc6aade

Yang Zhang authored Apr 11, 2013

rtc_status is used to track RTC interrupt delivery status. The pending_eoi
will be increased by vcpu who received RTC interrupt and will be decreased
when EOI to this interrupt.
Also, we use dest_map to record the destination vcpu to avoid the case that
vcpu who didn't get the RTC interupt, but issued EOI with same vector of RTC
and descreased pending_eoi by mistake.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

8dc6aade

KVM: Add vcpu info to ioapic_update_eoi() · 1fcc7890

Yang Zhang authored Apr 11, 2013

Add vcpu info to ioapic_update_eoi, so we can know which vcpu
issued this EOI.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

1fcc7890

14 Apr, 2013 3 commits

KVM: nVMX: Avoid reading VM_EXIT_INTR_ERROR_CODE needlessly on nested exits · c0d1c770

Jan Kiszka authored Apr 14, 2013

We only need to update vm_exit_intr_error_code if there is a valid exit
interruption information and it comes with a valid error code.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

c0d1c770

KVM: nVMX: Fix conditions for interrupt injection · e8457c67

Jan Kiszka authored Apr 14, 2013

If we are entering guest mode, we do not want L0 to interrupt this
vmentry with all its side effects on the vmcs. Therefore, injection
shall be disallowed during L1->L2 transitions, as in the previous
version. However, this check is conceptually independent of
nested_exit_on_intr, so decouple it.

If L1 traps external interrupts, we can kick the guest from L2 to L1,
also just like the previous code worked. But we no longer need to
consider L1's idt_vectoring_info_field. It will always be empty at this
point. Instead, if L2 has pending events, those are now found in the
architectural queues and will, thus, prevent vmx_interrupt_allowed from
being called at all.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

e8457c67

KVM: nVMX: Rework event injection and recovery · 5f3d5799

Jan Kiszka authored Apr 14, 2013

The basic idea is to always transfer the pending event injection on
vmexit into the architectural state of the VCPU and then drop it from
there if it turns out that we left L2 to enter L1, i.e. if we enter
prepare_vmcs12.

vmcs12_save_pending_events takes care to transfer pending L0 events into
the queue of L1. That is mandatory as L1 may decide to switch the guest
state completely, invalidating or preserving the pending events for
later injection (including on a different node, once we support
migration).

This concept is based on the rule that a pending vmlaunch/vmresume is
not canceled. Otherwise, we would risk to lose injected events or leak
them into the wrong queues. Encode this rule via a WARN_ON_ONCE at the
entry of nested_vmx_vmexit.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

5f3d5799