Commits · 60395224d94945544f1f9dce5566981844bf0e77 · Kirill Smelkov / linux

30 Jan, 2008 40 commits

KVM: Add a might_sleep() annotation to gfn_to_page() · 60395224

Avi Kivity authored Oct 21, 2007

This will help trap accesses to guest memory in atomic context.
Signed-off-by: Avi Kivity <avi@qumranet.com>

60395224

KVM: Move vmx_vcpu_reset() out of vmx_vcpu_setup() · e00c8cf2

Avi Kivity authored Oct 21, 2007

Split guest reset code out of vmx_vcpu_setup().  Besides being cleaner, this
moves the realmode tss setup (which can sleep) outside vmx_vcpu_setup()
(which is executed with preemption disabled).

[izik: remove unused variable]
Signed-off-by: Avi Kivity <avi@qumranet.com>

e00c8cf2

KVM: Portability: Split kvm_vcpu into arch dependent and independent parts (part 1) · 34c16eec

Zhang Xiantao authored Oct 20, 2007

First step to split kvm_vcpu.  Currently, we just use a macro to define
the common fields in kvm_vcpu for all archs, and each arch needs to define
its own kvm_vcpu struct.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

34c16eec

KVM: Allocate userspace memory for older userspace · 8d4e1288

Anthony Liguori authored Oct 18, 2007

Allocate a userspace buffer for older userspaces.  Also eliminate phys_mem
buffer.  The memset() in kvmctl really kills initial memory usage but swapping
works even with old userspaces.

A side effect is that the maximum guest size is reduced for older userspace on
i386.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

8d4e1288

KVM: Use virtual cpu accounting if available for guest times. · e56a7a28

Christian Borntraeger authored Oct 18, 2007

ppc and s390 offer the possibility to track process times precisely
by looking at cpu timer on every context switch, irq, softirq etc.
We can use that infrastructure as well for guest time accounting.
We need to account the used time before we change the state.
This patch adds a call to account_system_vtime to kvm_guest_enter
and kvm_guest_exit. If CONFIG_VIRT_CPU_ACCOUNTING is not set,
account_system_vtime is defined in hardirq.h as an empty function,
which means this patch does not change the behaviour on other
platforms.

I compile tested this patch on x86 and function tested the patch on
s390.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

e56a7a28

KVM: MMU: Partial swapping of guest memory · 8a7ae055

Izik Eidus authored Oct 18, 2007

This allows guest memory to be swapped.  Pages which are currently mapped
via shadow page tables are pinned into memory, but all other pages can
be freely swapped.

The patch makes gfn_to_page() elevate the page's reference count, and
introduces kvm_release_page() that pairs with it.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

8a7ae055
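The get/release pairing described in the swapping commit above can be sketched in userspace C. This is a simplified illustration, not the kernel code: `struct page_sketch`, `get_guest_page()`, `release_guest_page()` and `can_swap()` are hypothetical stand-ins, and real pages carry a baseline reference count that this model ignores.

```c
#include <assert.h>

/* Hypothetical, simplified page descriptor; not the kernel's struct page. */
struct page_sketch {
    int refcount;   /* outstanding references held by users of the page */
};

/* Stand-in for gfn_to_page(): hand out the page with its count elevated. */
static struct page_sketch *get_guest_page(struct page_sketch *p)
{
    p->refcount++;
    return p;
}

/* Stand-in for kvm_release_page(): drop the reference taken above. */
static void release_guest_page(struct page_sketch *p)
{
    assert(p->refcount > 0);   /* a release without a matching get is a bug */
    p->refcount--;
}

/* In this model, a page with no outstanding references may be swapped. */
static int can_swap(const struct page_sketch *p)
{
    return p->refcount == 0;
}
```

Every call that obtains a page must be matched by exactly one release; a missed release would keep the page pinned indefinitely in this model.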

KVM: MMU: Make gfn_to_page() always safe · cea7bb21

Izik Eidus authored Oct 17, 2007

In case the page is not present in the guest memory map, return a dummy
page the guest can scribble on.

This simplifies error checking in its users.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

cea7bb21
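A minimal sketch of the "always safe" lookup idea from the commit above, assuming a flat array of memory slots. All names here (`slot_sketch`, `lookup_page()`, `dummy_page`) are hypothetical and chosen for illustration; the point is only that the lookup never returns NULL, so callers need no error path.

```c
#include <assert.h>
#include <stddef.h>

struct page_sketch { unsigned char data[16]; };

/* Hypothetical memory slot: a run of guest frames backed by an array. */
struct slot_sketch {
    unsigned long base_gfn;
    unsigned long npages;
    struct page_sketch *pages;
};

/* Stand-in page returned for unmapped frames; the guest may scribble on it. */
static struct page_sketch dummy_page;

static struct page_sketch *lookup_page(struct slot_sketch *slots,
                                       size_t nslots, unsigned long gfn)
{
    assert(slots != NULL || nslots == 0);
    for (size_t i = 0; i < nslots; i++) {
        struct slot_sketch *s = &slots[i];
        if (gfn >= s->base_gfn && gfn - s->base_gfn < s->npages)
            return &s->pages[gfn - s->base_gfn];
    }
    return &dummy_page;   /* not in the guest memory map: never NULL */
}
```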

KVM: MMU: Keep a reverse mapping of non-writable translations · 9647c14c

Izik Eidus authored Oct 16, 2007

The current kvm mmu only reverse maps writable translations.  This is used
to write-protect a page in case it becomes a pagetable.

But with swapping support, we need a reverse mapping of read-only pages as
well:  when we evict a page, we need to remove any mapping to it, whether
writable or not.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

9647c14c

KVM: MMU: Add rmap_next(), a helper for walking kvm rmaps · 98348e95

Izik Eidus authored Oct 16, 2007

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

98348e95

KVM: x86 emulator: cmc, clc, cli, sti · b284be57

Nitin A Kamble authored Oct 16, 2007

Instruction: cmc, clc, cli, sti
opcodes: 0xf5, 0xf8, 0xfa, 0xfb respectively.

[avi: fix reference to EFLG_IF which is not defined anywhere]
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

b284be57
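The effect of the four opcodes in the commit above on the flags register can be sketched as follows. This is an illustration, not the emulator's code: `emulate_flag_op()` is a hypothetical helper, while the bit positions (CF is bit 0, IF is bit 9 of EFLAGS) are architectural.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural EFLAGS bit positions. */
#define FLAG_CF (1u << 0)   /* carry flag */
#define FLAG_IF (1u << 9)   /* interrupt enable flag */

/*
 * Hypothetical helper: apply one of the four single-byte flag
 * instructions to an EFLAGS value. Returns 0 on success, or -1 if the
 * opcode is not one of the four.
 */
static int emulate_flag_op(uint8_t opcode, uint32_t *eflags)
{
    assert(eflags != NULL);
    switch (opcode) {
    case 0xf5: *eflags ^= FLAG_CF;  break;   /* cmc: complement carry */
    case 0xf8: *eflags &= ~FLAG_CF; break;   /* clc: clear carry */
    case 0xfa: *eflags &= ~FLAG_IF; break;   /* cli: disable interrupts */
    case 0xfb: *eflags |= FLAG_IF;  break;   /* sti: enable interrupts */
    default:
        return -1;
    }
    return 0;
}
```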

KVM: MMU: Simplify page table walker · 42bf3f0a

Avi Kivity authored Oct 17, 2007

Simplify the walker level loop not to carry so much information from one
loop to the next. In addition to being complex, this made kmap_atomic()
critical sections difficult to manage.

As a result of this change, kmap_atomic() sections are limited to actually
touching the guest pte, which allows the other functions called from the
walker to do sleepy operations. This will happen when we enable swapping.
Signed-off-by: Avi Kivity <avi@qumranet.com>

42bf3f0a

KVM: x86 emulator: Implement emulation of instruction: inc & dec · d77a2507

Nitin A Kamble authored Oct 12, 2007

Instructions:
	inc r16/r32 (opcode 0x40-0x47)
	dec r16/r32 (opcode 0x48-0x4f)
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

d77a2507
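For the short-form inc/dec opcodes in the commit above, the register number sits in the low three bits of the opcode and bit 3 selects inc or dec. A minimal sketch, with hypothetical names (`struct incdec`, `decode_incdec()`, `apply_incdec()`); it ignores flag updates, and note that in 64-bit mode these byte values are REX prefixes rather than inc/dec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decode result; the names are illustrative only. */
struct incdec {
    int is_dec;   /* 0 = inc (0x40-0x47), 1 = dec (0x48-0x4f) */
    int reg;      /* register number 0..7 */
};

static int decode_incdec(uint8_t opcode, struct incdec *out)
{
    assert(out != NULL);
    if (opcode < 0x40 || opcode > 0x4f)
        return -1;
    out->is_dec = (opcode >> 3) & 1;
    out->reg = opcode & 7;
    return 0;
}

/*
 * Apply the operation with a 2- or 4-byte operand size. With a 16-bit
 * operand the upper half of the register is left untouched.
 */
static uint32_t apply_incdec(const struct incdec *op, uint32_t val,
                             int op_bytes)
{
    uint32_t mask = (op_bytes == 2) ? 0xffffu : 0xffffffffu;
    uint32_t res = op->is_dec ? val - 1 : val + 1;

    return (val & ~mask) | (res & mask);
}
```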

KVM: Rename KVM_TLB_FLUSH to KVM_REQ_TLB_FLUSH · 3176bc3e

Avi Kivity authored Oct 16, 2007

We now have a new namespace, KVM_REQ_*, for bits in vcpu->requests.
Signed-off-by: Avi Kivity <avi@qumranet.com>

3176bc3e
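The KVM_REQ_* namespace names bit numbers in a per-vcpu request word. A rough userspace sketch of that pattern follows; `struct vcpu_sketch`, `make_request()` and `test_and_clear_request()` are hypothetical, the bit number is assumed for illustration, and the kernel would use atomic bit operations where this sketch uses plain ones.

```c
#include <assert.h>
#include <stddef.h>

/* Bit number within the request word; the value is assumed here. */
#define KVM_REQ_TLB_FLUSH 0

/* Hypothetical stand-in for the relevant part of struct kvm_vcpu. */
struct vcpu_sketch {
    unsigned long requests;
};

/* Post a request for the vcpu to act on later. */
static void make_request(struct vcpu_sketch *v, int req)
{
    assert(v != NULL && req >= 0 && req < 32);
    v->requests |= 1UL << req;
}

/* Consume a request: report whether it was pending, and clear it. */
static int test_and_clear_request(struct vcpu_sketch *v, int req)
{
    assert(v != NULL && req >= 0 && req < 32);
    unsigned long bit = 1UL << req;
    int was_set = (v->requests & bit) != 0;

    v->requests &= ~bit;
    return was_set;
}
```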

KVM: Move apic timer interrupt backlog processing to common code · ab6ef34b

Avi Kivity authored Oct 16, 2007

Besides the obvious goodness of making code more common, this prevents
a livelock with the next patch which moves interrupt injection out of the
critical section.
Signed-off-by: Avi Kivity <avi@qumranet.com>

ab6ef34b

KVM: Add some \n in ioapic_debug() · e25e3ed5

Laurent Vivier authored Oct 12, 2007

Add new-line at end of debug strings.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

e25e3ed5

KVM: apic round robin cleanup · e4d47f40

Qing He authored Sep 24, 2007

If no apic is enabled in the bitmap of an interrupt delivery with delivery
mode of lowest priority, a warning should be reported rather than selecting
a fallback vcpu.
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Eddie (Yaozu) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

e4d47f40

KVM: Portability: split kvm_vcpu_ioctl · 313a3dc7

Carsten Otte authored Oct 11, 2007

This patch splits kvm_vcpu_ioctl into architecture independent parts, and
x86 specific parts which go to kvm_arch_vcpu_ioctl in x86.c.

Common ioctls for all architectures are:
KVM_RUN, KVM_GET/SET_(S-)REGS, KVM_TRANSLATE, KVM_INTERRUPT,
KVM_DEBUG_GUEST, KVM_SET_SIGNAL_MASK, KVM_GET/SET_FPU
Note that some PPC chips don't have an FPU, so we might need an #ifdef
around KVM_GET/SET_FPU one day.

x86 specific ioctls are:
KVM_GET/SET_LAPIC, KVM_SET_CPUID, KVM_GET/SET_MSRS

An interesting aspect is vcpu_load/vcpu_put. We now have a common
vcpu_load/put which does the preemption stuff, and an architecture
specific kvm_arch_vcpu_load/put. In the x86 case, this one calls the
vmx/svm function defined in kvm_x86_ops.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

313a3dc7
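The common/arch split described in the commit above amounts to a two-level dispatch: the common handler deals with requests it knows and passes everything else to the architecture hook. A minimal sketch, assuming made-up request numbers and return values; only the shape of the fallthrough is the point, and the function names are hypothetical stand-ins for kvm_vcpu_ioctl and kvm_arch_vcpu_ioctl.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical request numbers; the real KVM_* ioctl values differ. */
enum { REQ_RUN = 1, REQ_GET_REGS, REQ_SET_CPUID, REQ_GET_MSRS };

/* Illustrative return values showing which layer handled the request. */
enum handled_by { BY_COMMON = 1, BY_ARCH = 2 };

/* Stand-in for the arch hook: x86-only requests end up here. */
static long arch_vcpu_ioctl_sketch(unsigned int req)
{
    switch (req) {
    case REQ_SET_CPUID:
    case REQ_GET_MSRS:
        return BY_ARCH;
    default:
        return -EINVAL;   /* unknown to both layers */
    }
}

/* Stand-in for the common handler: anything unknown goes to the arch. */
static long vcpu_ioctl_sketch(unsigned int req)
{
    switch (req) {
    case REQ_RUN:
    case REQ_GET_REGS:
        return BY_COMMON;
    default:
        return arch_vcpu_ioctl_sketch(req);
    }
}
```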

KVM: MMU: When updating the dirty bit, inform the mmu about it · c4fcc272

Avi Kivity authored Oct 11, 2007

Since the mmu uses different shadow pages for dirty large pages and clean
large pages, this allows the mmu to drop ptes that are now invalid.
Signed-off-by: Avi Kivity <avi@qumranet.com>

c4fcc272

KVM: MMU: Move dirty bit updates to a separate function · 5df34a86

Avi Kivity authored Oct 11, 2007

Signed-off-by: Avi Kivity <avi@qumranet.com>

5df34a86

KVM: MMU: Instantiate real-mode shadows as user writable shadows · 6bfccdc9

Avi Kivity authored Oct 11, 2007

This is consistent with real-mode permissions.
Signed-off-by: Avi Kivity <avi@qumranet.com>

6bfccdc9

KVM: MMU: Disable write access on clean large pages · cc70e737

Avi Kivity authored Oct 11, 2007

By forcing clean huge pages to be read-only, we have separate roles
for the shadow of a clean large page and the shadow of a dirty large
page.  This is necessary because different ptes will be instantiated
for the two cases, even for read faults.
Signed-off-by: Avi Kivity <avi@qumranet.com>

cc70e737

KVM: MMU: Fix nx access bit for huge pages · c22e3514

Avi Kivity authored Oct 11, 2007

We must set the bit before the shift, otherwise the wrong bit gets set.
Signed-off-by: Avi Kivity <avi@qumranet.com>

c22e3514

KVM: Move guest pte dirty bit management to the guest pagetable walker · e3c5e7ec

Avi Kivity authored Oct 11, 2007

This is more consistent with the accessed bit management, and makes the dirty
bit available earlier for other purposes.
Signed-off-by: Avi Kivity <avi@qumranet.com>

e3c5e7ec

KVM: MMU: More struct kvm_vcpu -> struct kvm cleanups · 4a4c9924

Anthony Liguori authored Oct 10, 2007

This time, the biggest change is gpa_to_hpa. The translation of GPA to HPA does
not depend on the VCPU state unlike GVA to GPA so there's no need to pass in
the kvm_vcpu.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

4a4c9924

KVM: MMU: Clean up MMU functions to take struct kvm when appropriate · f67a46f4

Anthony Liguori authored Oct 10, 2007

Some of the MMU functions take a struct kvm_vcpu even though they affect all
VCPUs.  This patch cleans up some of them to instead take a struct kvm.  This
makes things a bit more clear.

The main thing that was confusing me was whether certain functions need to be
called on all VCPUs.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

f67a46f4

KVM: Move x86 msr handling to new files x86.[ch] · 043405e1

Carsten Otte authored Oct 10, 2007

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

043405e1

KVM: Support assigning userspace memory to the guest · 6fc138d2

Izik Eidus authored Oct 09, 2007

Instead of having the kernel allocate memory to the guest, let userspace
allocate it and pass the address to the kernel.

This is required for s390 support, but also enables features like memory
sharing and using hugetlbfs backed memory.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

6fc138d2

KVM: CodingStyle cleanup · d77c26fc

Mike Day authored Oct 08, 2007

Signed-off-by: Mike D. Day <ncmike@ncultra.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>

d77c26fc

KVM: Remove gratuitous casts from lapic.c · 7e620d16

Rusty Russell authored Oct 08, 2007

Since vcpu->apic is of the correct type, there's no need to cast.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

7e620d16

KVM: Hoist kvm_create_lapic() into kvm_vcpu_init() · 76fafa5e

Rusty Russell authored Oct 08, 2007

Move kvm_create_lapic() into kvm_vcpu_init(), rather than having svm
and vmx do it.  And make it return the error rather than a fairly
random -ENOMEM.

This also solves the problem that neither svm.c nor vmx.c actually
handles the error path properly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

76fafa5e

KVM: Add kvm_free_lapic() to pair with kvm_create_lapic() · d589444e

Rusty Russell authored Oct 08, 2007

Instead of the asymmetry of kvm_free_apic, implement kvm_free_lapic().
And guess what?  I found a minor bug: we don't need to hrtimer_cancel()
from kvm_main.c, because we do that in kvm_free_apic().

Also:
1) kvm_vcpu_uninit should be the reverse order from kvm_vcpu_init.
2) Don't set apic->regs_page to zero before freeing apic.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

d589444e

KVM: Allow dynamic allocation of the mmu shadow cache size · 82ce2c96

Izik Eidus authored Oct 02, 2007

The user is now able to set how many mmu pages will be allocated to the guest.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

82ce2c96

KVM: Add general accessors to read and write guest memory · 195aefde

Izik Eidus authored Oct 01, 2007

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

195aefde

KVM: Remove the usage of page->private field by rmap · 290fc38d

Izik Eidus authored Sep 27, 2007

When kvm uses user-allocated pages in the future for the guest, we won't
be able to use page->private for rmap, since page->private is reserved for
the filesystem.  So we move the rmap base pointers to the memory slot.

A side effect of this is that we need to store the gfn of each gpte in
the shadow pages, since the memory slot is addressed by gfn, instead of
hfn like struct page.
Signed-off-by: Izik Eidus <izik@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

290fc38d
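Moving the rmap base pointers into the memory slot means they are indexed by guest frame number relative to the slot's base. A rough sketch of that lookup, assuming a simplified slot layout; `struct memslot_sketch` and `gfn_to_rmap_sketch()` are hypothetical names, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified memory slot. */
struct memslot_sketch {
    unsigned long base_gfn;   /* first guest frame covered by the slot */
    unsigned long npages;     /* number of frames in the slot */
    unsigned long *rmap;      /* one rmap base word per frame */
};

/*
 * Return the rmap entry for a guest frame, or NULL if the frame is
 * outside the slot. The index is the gfn relative to the slot base,
 * which is why the gfn of each guest pte has to be kept at hand.
 */
static unsigned long *gfn_to_rmap_sketch(struct memslot_sketch *slot,
                                         unsigned long gfn)
{
    assert(slot != NULL);
    if (gfn < slot->base_gfn || gfn - slot->base_gfn >= slot->npages)
        return NULL;
    return &slot->rmap[gfn - slot->base_gfn];
}
```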

KVM: VMX: Simplify vcpu_clear() · f566e09f

Avi Kivity authored Sep 30, 2007

Now that smp_call_function_single() knows how to call a function on the
current cpu, there's no need to check explicitly.
Signed-off-by: Avi Kivity <avi@qumranet.com>

f566e09f

KVM: VMX: Don't clear the vmcs if the vcpu is not loaded on any processor · eae5ecb5

Avi Kivity authored Sep 30, 2007

Noted by Eddie Dong.
Signed-off-by: Avi Kivity <avi@qumranet.com>

eae5ecb5

KVM: x86 emulator: Any legacy prefix after a REX prefix nullifies its effect · b4c6abfe

Laurent Vivier authored Sep 25, 2007

This patch modifies the management of the REX prefix according to the behavior
I saw in Xen 3.1. In Xen, this modification was introduced by
Jan Beulich.

http://lists.xensource.com/archives/html/xen-changelog/2007-01/msg00081.html
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

b4c6abfe
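The rule named in the commit title above can be sketched as a prefix scan for 64-bit mode: a REX byte only takes effect if no legacy prefix follows it. This is an illustration rather than the emulator's decoder; `is_legacy_prefix()` and `scan_prefixes()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy prefixes: size overrides, segment overrides, lock and rep. */
static int is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0x66: case 0x67:                        /* operand/address size */
    case 0x26: case 0x2e: case 0x36: case 0x3e:  /* es, cs, ss, ds */
    case 0x64: case 0x65:                        /* fs, gs */
    case 0xf0: case 0xf2: case 0xf3:             /* lock, repne, rep */
        return 1;
    default:
        return 0;
    }
}

/*
 * Scan the prefix bytes of a 64-bit mode instruction. Returns the REX
 * byte that is still in effect at the opcode (0 if none) and stores the
 * offset of the first non-prefix byte in *opcode_off.
 */
static uint8_t scan_prefixes(const uint8_t *insn, size_t len,
                             size_t *opcode_off)
{
    uint8_t rex = 0;
    size_t i;

    assert(insn != NULL && opcode_off != NULL);
    for (i = 0; i < len; i++) {
        uint8_t b = insn[i];

        if (is_legacy_prefix(b))
            rex = 0;                  /* cancels any REX seen earlier */
        else if ((b & 0xf0) == 0x40)
            rex = b;                  /* REX prefix, 0x40-0x4f */
        else
            break;                    /* first opcode byte */
    }
    *opcode_off = i;
    return rex;
}
```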

KVM: Purify x86_decode_insn() error case management · a22436b7

Laurent Vivier authored Sep 24, 2007

The only valid case is on protected page access, other cases are errors.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

a22436b7

KVM: x86_emulator: no writeback for bt · e4f8e039

Qing He authored Sep 24, 2007

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

e4f8e039

KVM: x86 emulator: Remove no_wb, use dst.type = OP_NONE instead · a01af5ec

Laurent Vivier authored Sep 24, 2007

Remove no_wb and use dst.type = OP_NONE instead; idea stolen from xen-3.1.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

a01af5ec
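The idea behind replacing a separate no_wb flag with dst.type = OP_NONE is that the destination's type alone decides whether anything is written back. A simplified sketch under that assumption; the operand layout and `writeback_sketch()` are hypothetical and much reduced compared with the emulator.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified operand kinds; OP_NONE means "nothing to write back". */
enum op_type { OP_NONE, OP_REG, OP_MEM };

/* Hypothetical, reduced operand descriptor. */
struct operand_sketch {
    enum op_type type;
    unsigned long val;    /* computed result */
    unsigned long *ptr;   /* where the result would be stored */
};

/*
 * Commit the result to its destination. Returns 1 if a store happened,
 * 0 if the instruction has no destination to update (for example a
 * bit-test instruction that only affects flags).
 */
static int writeback_sketch(const struct operand_sketch *dst)
{
    assert(dst != NULL);
    switch (dst->type) {
    case OP_REG:
    case OP_MEM:
        *dst->ptr = dst->val;
        return 1;
    case OP_NONE:
    default:
        return 0;
    }
}
```

An instruction handler that must not modify its destination then only needs to set the type to OP_NONE, with no extra flag to keep in sync.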