Commits · 1fe779f8eccd16e527315e1bafd2b3a876ff2489 · nexedi / linux

30 Jan, 2008 40 commits

KVM: Portability: Split kvm_vm_ioctl v3 · 1fe779f8

Carsten Otte authored Oct 29, 2007

This patch splits kvm_vm_ioctl into architecture independent parts, and
x86 specific parts which go to kvm_arch_vm_ioctl in x86.c.
The patch is unchanged since last submission.

Common ioctls for all architectures are:
KVM_CREATE_VCPU, KVM_GET_DIRTY_LOG, KVM_SET_USER_MEMORY_REGION

x86 specific ioctls are:
KVM_SET_MEMORY_REGION,
KVM_GET/SET_NR_MMU_PAGES, KVM_SET_MEMORY_ALIAS, KVM_CREATE_IRQCHIP,
KVM_CREATE_IRQ_LINE, KVM_GET/SET_IRQCHIP,
KVM_SET_TSS_ADDR
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
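
The split described above can be pictured as a two-level dispatch: the common handler serves the requests every architecture understands and forwards anything else to an architecture hook. The following is only a minimal user-space sketch of that pattern. The request names, `vm_ioctl()` and `arch_vm_ioctl()` are hypothetical stand-ins, not the kernel's actual definitions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical request numbers; the real values live in <linux/kvm.h>. */
enum vm_request {
    REQ_CREATE_VCPU,
    REQ_GET_DIRTY_LOG,
    REQ_SET_USER_MEMORY_REGION,
    REQ_SET_TSS_ADDR,       /* x86 only */
    REQ_CREATE_IRQCHIP,     /* x86 only */
    REQ_UNKNOWN
};

/* Stand-in for the architecture hook (x86.c in the patch). */
static long arch_vm_ioctl(enum vm_request req)
{
    switch (req) {
    case REQ_SET_TSS_ADDR:
    case REQ_CREATE_IRQCHIP:
        return 1;           /* handled by architecture code */
    default:
        return -ENOTTY;     /* no handler knows this request */
    }
}

/* Stand-in for the common entry point. */
static long vm_ioctl(enum vm_request req)
{
    assert(req <= REQ_UNKNOWN);

    switch (req) {
    case REQ_CREATE_VCPU:
    case REQ_GET_DIRTY_LOG:
    case REQ_SET_USER_MEMORY_REGION:
        return 0;           /* handled by common code */
    default:
        return arch_vm_ioctl(req);
    }
}
```

In this sketch the return values 0 and 1 only mark which layer handled the request; real handlers would return the result of the operation.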


KVM: MMU: Topup the mmu memory preallocation caches before emulating an insn · b733bfb5

Avi Kivity authored Oct 28, 2007

Emulation may cause a shadow pte to be instantiated, which requires
memory resources.  Make sure the caches are filled to avoid an oops.
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: Move page fault processing to common code · 3067714c

Avi Kivity authored Oct 28, 2007

The code that dispatches the page fault and emulates if we failed to map
is duplicated across vmx and svm. Merge it to simplify further bugfixing.
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: x86 emulator: don't depend on cr2 for mov abs emulation · c7e75a3d

Avi Kivity authored Oct 28, 2007

The 'mov abs' instruction family (opcodes 0xa0 - 0xa3) still depends on cr2
provided by the page fault handler.  This is wrong for several reasons:

- if an instruction accessed misaligned data that crosses a page boundary,
  and if the fault happened on the second page, cr2 will point at the
  second page, not the data itself.

- if we're emulating in real mode, or due to a FlexPriority exit, there
  is no cr2 generated.

So, this change adds decoding for this instruction form and drops reliance
on cr2.
Signed-off-by: Avi Kivity <avi@qumranet.com>
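
To illustrate what "decoding this instruction form" means: for opcodes 0xa0 - 0xa3 the absolute address is stored in the instruction bytes right after the opcode, so it can be read from there instead of from cr2. The sketch below is an assumption-laden model, not the emulator's code. `decode_mov_abs()` is a hypothetical helper, prefixes are ignored, and the address size is passed in by the caller.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 and stores the absolute address if insn starts with a
 * 'mov abs' opcode (0xa0 - 0xa3), 0 otherwise.  ad_bytes is the
 * effective address size in bytes (2, 4 or 8).
 */
static int decode_mov_abs(const uint8_t *insn, unsigned ad_bytes,
                          uint64_t *addr)
{
    uint64_t value = 0;

    assert(ad_bytes == 2 || ad_bytes == 4 || ad_bytes == 8);

    if (insn[0] < 0xa0 || insn[0] > 0xa3)
        return 0;

    /* The offset follows the opcode in little-endian byte order. */
    for (unsigned i = 0; i < ad_bytes; i++)
        value |= (uint64_t)insn[1 + i] << (8 * i);

    *addr = value;
    return 1;
}
```

Because the address comes from the instruction itself, the result is the same whether or not a page fault (and therefore a cr2 value) triggered the emulation.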


KVM: SVM: Let gcc to choose which registers to save (i386) · fe7935d4

Laurent Vivier authored Oct 25, 2007

This patch lets GCC determine which registers to save when we
switch to/from a VCPU in the case of AMD i386.

* Original code saves the following registers:

    ebx, ecx, edx, esi, edi, ebp

* Patched code:

  - informs GCC that we modify the following registers
    using the clobber description:

    ebx, ecx, edx, esi, edi

  - ebp is saved (pop/push) because GCC seems to ignore its use in the clobber
    description.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: SVM: Let gcc to choose which registers to save (x86_64) · 54a08c04

Laurent Vivier authored Oct 25, 2007

This patch lets GCC determine which registers to save when we
switch to/from a VCPU in the case of AMD x86_64.

* Original code saves the following registers:

    rbx, rcx, rdx, rsi, rdi, rbp,
    r8, r9, r10, r11, r12, r13, r14, r15

* Patched code:

  - informs GCC that we modify the following registers
    using the clobber description:

    rbx, rcx, rdx, rsi, rdi,
    r8, r9, r10, r11, r12, r13, r14, r15

  - rbp is saved (pop/push) because GCC seems to ignore its use in the clobber
    description.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
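
As a rough illustration of the clobber-description technique, the sketch below uses rbx as scratch space inside an inline asm statement and lists it as clobbered, leaving it to GCC to decide whether a save/restore is needed. This is not the SVM switch code. `add_one_via_rbx()` is a hypothetical function, and the non-x86_64 branch is a plain C fallback added so the example builds elsewhere.

```c
#include <assert.h>

/*
 * Adds one to 'in' using rbx as scratch space.  The clobber list tells
 * GCC that rbx is overwritten, so the compiler saves and restores it
 * only if it is live around the asm statement.
 */
static unsigned long add_one_via_rbx(unsigned long in)
{
    unsigned long out = in;

#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile(
        "mov %0, %%rbx \n\t"
        "add $1, %%rbx \n\t"
        "mov %%rbx, %0 \n\t"
        : "+r"(out)         /* read-write operand, GCC picks the register */
        :                   /* no pure inputs */
        : "rbx", "cc");     /* rbx and the flags are clobbered */
#else
    out += 1;               /* portable fallback for other targets */
#endif

    assert(out == in + 1);  /* both paths must agree */
    return out;
}
```

Registers that cannot appear in the clobber list, such as the frame pointer in the patches above, still need an explicit push/pop inside the asm body.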


KVM: VMX: Let gcc to choose which registers to save (i386) · ff593e5a

Laurent Vivier authored Oct 25, 2007

This patch lets GCC determine which registers to save when we
switch to/from a VCPU in the case of Intel i386.

* Original code saves the following registers:

    eax, ebx, ecx, edx, edi, esi, ebp (using popa)

* Patched code:

  - informs GCC that we modify the following registers
    using the clobber description:

    ebx, edi, esi

  - doesn't save eax because it is an output operand (vmx->fail)

  - cannot put ecx in the clobber description because it is an input operand,
    but as we modify it and we want to keep its value (vcpu), we must
    save it (pop/push)

  - ebp is saved (pop/push) because GCC seems to ignore its use in the clobber
    description.

  - edx is saved (pop/push) because it is reserved by GCC (REGPARM) and
    cannot be put in the clobber description.

  - line "mov (%%esp), %3 \n\t" has been removed because %3
    is ecx and ecx is restored just after.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: VMX: Let gcc to choose which registers to save (x86_64) · c2036300

Laurent Vivier authored Oct 25, 2007

This patch lets GCC determine which registers to save when we
switch to/from a VCPU in the case of Intel x86_64.

* Original code saves the following registers:

    rax, rbx, rcx, rdx, rsi, rdi, rbp,
    r8, r9, r10, r11, r12, r13, r14, r15

* Patched code:

  - informs GCC that we modify the following registers
    using the clobber description:

    rbx, rdi, rsi,
    r8, r9, r10, r11, r12, r13, r14, r15

  - doesn't save rax because it is an output operand (vmx->fail)

  - cannot put rcx in the clobber description because it is an input operand,
    but as we modify it and we want to keep its value (vcpu), we must
    save it (pop/push)

  - rbp is saved (pop/push) because GCC seems to ignore its use in the clobber
    description.

  - rdx is saved (pop/push) because it is reserved by GCC (REGPARM) and
    cannot be put in the clobber description.

  - line "mov (%%rsp), %3 \n\t" has been removed because %3
    is rcx and rcx is restored just after.

  - line ASM_VMX_VMWRITE_RSP_RDX() is moved out of the ifdef/else/endif
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: Add ioctl to tss address from userspace, · cbc94022

Izik Eidus authored Oct 25, 2007

Currently kvm has a wart in that it requires three extra pages for use
as a tss when emulating real mode on Intel.  This patch moves the allocation
internally, only requiring userspace to tell us where in the physical address
space we can place the tss.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: Add kernel-internal memory slots · e0d62c7f

Izik Eidus authored Oct 24, 2007

Reserve a few memory slots for kernel-internal use.  This is useful when
you have to register a memory region and want to be sure it was not
registered from userspace, and when you want to register a memory region
that won't be seen from userspace.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: Export memory slot allocation mechanism · 210c7c4d

Izik Eidus authored Oct 24, 2007

Move the kvm memory slot allocation mechanism out of the ioctl
and into an exported function.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: Unmap kernel-allocated memory on slot destruction · 80b14b5b

Izik Eidus authored Oct 25, 2007

kvm_vm_ioctl_set_memory_region() is able to remove memory in addition to
adding it.  Therefore, when using kernel swapping support for old userspaces,
we need to munmap the memory if the user requests its removal.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: Per-architecture hypercall definitions · 5f43238d

Christian Borntraeger authored Oct 11, 2007

Currently kvm provides hypercalls only for x86* architectures. To
provide hypercall infrastructure for other kvm architectures I split
kvm_para.h into a generic header file and architecture specific
definitions.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: Split IOAPIC reset function and export for kernel RESET · 8c392696

Eddie Dong authored Oct 10, 2007

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Export PIC reset for kernel device reset · 2fcceae1

Eddie Dong authored Oct 10, 2007

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: Add a might_sleep() annotation to gfn_to_page() · 60395224

Avi Kivity authored Oct 21, 2007

This will help trap accesses to guest memory in atomic context.
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: Move vmx_vcpu_reset() out of vmx_vcpu_setup() · e00c8cf2

Avi Kivity authored Oct 21, 2007

Split guest reset code out of vmx_vcpu_setup().  Besides being cleaner, this
moves the realmode tss setup (which can sleep) outside vmx_vcpu_setup()
(which is executed with preemption disabled).

[izik: remove unused variable]
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: Portability: Split kvm_vcpu into arch dependent and independent parts (part 1) · 34c16eec

Zhang Xiantao authored Oct 20, 2007

First step to split kvm_vcpu.  Currently, we just use a macro to define
the common fields in kvm_vcpu for all archs, and each arch needs to define
its own kvm_vcpu struct.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
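
One way to picture the macro approach: the shared fields are written once in a macro, and every architecture expands that macro at the top of its own struct. The sketch below is illustrative only. `VCPU_COMMON_FIELDS`, the field names and both structs are hypothetical, not the names used by the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro listing the fields shared by every architecture. */
#define VCPU_COMMON_FIELDS      \
    int vcpu_id;                \
    int cpu;                    \
    unsigned long requests

/* Each architecture defines its own struct and expands the macro first. */
struct x86_vcpu {
    VCPU_COMMON_FIELDS;
    unsigned long cr3;          /* x86-only state */
};

struct s390_vcpu {
    VCPU_COMMON_FIELDS;
    unsigned long psw_addr;     /* s390-only state */
};

static void x86_vcpu_init(struct x86_vcpu *vcpu, int id)
{
    assert(id >= 0);
    vcpu->vcpu_id = id;
    vcpu->cpu = -1;             /* not loaded on any physical cpu yet */
    vcpu->requests = 0;
    vcpu->cr3 = 0;
}
```

Since both structs start with the same expansion, the common fields carry the same names and types on every architecture, which is what lets common code be written against them.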


KVM: Allocate userspace memory for older userspace · 8d4e1288

Anthony Liguori authored Oct 18, 2007

Allocate a userspace buffer for older userspaces.  Also eliminate phys_mem
buffer.  The memset() in kvmctl really kills initial memory usage but swapping
works even with old userspaces.

A side effect is that the maximum guest size is reduced for older userspace
on i386.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: Use virtual cpu accounting if available for guest times. · e56a7a28

Christian Borntraeger authored Oct 18, 2007

ppc and s390 offer the possibility to track process times precisely
by looking at the cpu timer on every context switch, irq, softirq etc.
We can use that infrastructure as well for guest time accounting.
We need to account the used time before we change the state.
This patch adds a call to account_system_vtime to kvm_guest_enter
and kvm_guest_exit. If CONFIG_VIRT_CPU_ACCOUNTING is not set,
account_system_vtime is defined in hardirq.h as an empty function,
which means this patch does not change the behaviour on other
platforms.

I compile tested this patch on x86 and function tested the patch on
s390.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
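
The "empty function when the option is off" pattern mentioned above can be sketched as follows. This is a user-space model with hypothetical names (`DEMO_VIRT_CPU_ACCOUNTING`, `account_vtime()`, `guest_enter()`, `guest_exit()`), not the kernel's code. It only shows that the accounting call comes before the state change and compiles to nothing when the option is not defined.

```c
#include <assert.h>
#include <stddef.h>

#ifdef DEMO_VIRT_CPU_ACCOUNTING
static unsigned long accounted_events;

/* With precise accounting enabled, charge the time used so far. */
static void account_vtime(void)
{
    accounted_events++;
}
#else
/* Without it, the hook is an empty function and costs nothing. */
static inline void account_vtime(void)
{
}
#endif

/* Account the time used so far, then flip the state to "in guest". */
static void guest_enter(int *in_guest)
{
    assert(in_guest != NULL);
    account_vtime();
    *in_guest = 1;
}

/* Account the guest time, then flip the state back to "in host". */
static void guest_exit(int *in_guest)
{
    assert(in_guest != NULL);
    account_vtime();
    *in_guest = 0;
}
```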


KVM: MMU: Partial swapping of guest memory · 8a7ae055

Izik Eidus authored Oct 18, 2007

This allows guest memory to be swapped.  Pages which are currently mapped
via shadow page tables are pinned into memory, but all other pages can
be freely swapped.

The patch makes gfn_to_page() elevate the page's reference count, and
introduces kvm_release_page() that pairs with it.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
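
The get/release pairing can be modelled with a plain counter: a page that somebody still holds a reference to must stay resident, and it becomes swappable again once every reference is dropped. The sketch below is a simplified model under that assumption. `page_get()`, `page_release()` and `page_can_be_swapped()` are hypothetical names standing in for the behaviour of gfn_to_page() and kvm_release_page().

```c
#include <assert.h>

struct page {
    int refcount;
};

/* Model of gfn_to_page(): every lookup takes a reference. */
static struct page *page_get(struct page *p)
{
    p->refcount++;
    return p;
}

/* Model of kvm_release_page(): drops the reference taken by page_get(). */
static void page_release(struct page *p)
{
    assert(p->refcount > 0);    /* release without a matching get is a bug */
    p->refcount--;
}

/* In this model a page may be swapped only while nobody references it. */
static int page_can_be_swapped(const struct page *p)
{
    return p->refcount == 0;
}
```

A caller that forgets the release keeps the page pinned forever, which is why the two functions are introduced as a pair.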


KVM: MMU: Make gfn_to_page() always safe · cea7bb21

Izik Eidus authored Oct 17, 2007

In case the page is not present in the guest memory map, return a dummy
page the guest can scribble on.

This simplifies error checking in its users.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
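
A small sketch of the "always return something usable" idea: when a guest frame number falls outside every memory slot, the lookup hands back a shared scratch page instead of NULL, so callers need no error path. The types and names here (`struct memslot`, `lookup_page()`, `bad_page`) are assumptions made for the illustration and are much simpler than the kernel's structures.

```c
#include <assert.h>
#include <stddef.h>

struct page {
    unsigned char data[16];     /* tiny stand-in for a real page */
};

struct memslot {
    unsigned long base_gfn;     /* first guest frame number in the slot */
    unsigned long npages;       /* number of frames in the slot */
    struct page *pages;         /* backing pages, one per frame */
};

/* Shared scratch page returned for unmapped frames; never NULL. */
static struct page bad_page;

static struct page *lookup_page(struct memslot *slots, size_t nslots,
                                unsigned long gfn)
{
    assert(slots != NULL || nslots == 0);

    for (size_t i = 0; i < nslots; i++) {
        struct memslot *s = &slots[i];

        if (gfn >= s->base_gfn && gfn - s->base_gfn < s->npages)
            return &s->pages[gfn - s->base_gfn];
    }
    return &bad_page;           /* not mapped: hand out the dummy page */
}
```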


KVM: MMU: Keep a reverse mapping of non-writable translations · 9647c14c

Izik Eidus authored Oct 16, 2007

The current kvm mmu only reverse maps writable translations.  This is used
to write-protect a page in case it becomes a pagetable.

But with swapping support, we need a reverse mapping of read-only pages as
well:  when we evict a page, we need to remove any mapping to it, whether
writable or not.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
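
To make the eviction argument concrete, here is a toy reverse map that records every shadow pte pointing at a page, writable or read-only, so that all of them can be cleared when the page is evicted. It is a sketch under simplifying assumptions: the fixed-size array, `rmap_add()` and `rmap_zap_all()` are hypothetical and do not mirror the kernel's rmap layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RMAP_MAX     8              /* hypothetical fixed capacity */
#define PTE_PRESENT  (1ull << 0)
#define PTE_WRITABLE (1ull << 1)

/* All shadow ptes that currently map one guest page. */
struct rmap {
    uint64_t *sptes[RMAP_MAX];
    size_t count;
};

/* Record a mapping, regardless of whether it is writable. */
static void rmap_add(struct rmap *r, uint64_t *spte)
{
    assert(r->count < RMAP_MAX);
    r->sptes[r->count++] = spte;
}

/* On eviction, clear every pte that still points at the page. */
static void rmap_zap_all(struct rmap *r)
{
    for (size_t i = 0; i < r->count; i++)
        *r->sptes[i] = 0;
    r->count = 0;
}
```

If only writable ptes were tracked, the read-only mapping in a case like this would survive the eviction and keep pointing at a page that is no longer there.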


KVM: MMU: Add rmap_next(), a helper for walking kvm rmaps · 98348e95

Izik Eidus authored Oct 16, 2007

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: x86 emulator: cmc, clc, cli, sti · b284be57

Nitin A Kamble authored Oct 16, 2007

Instruction: cmc, clc, cli, sti
opcodes: 0xf5, 0xf8, 0xfa, 0xfb respectively.

[avi: fix reference to EFLG_IF which is not defined anywhere]
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
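
For reference, the architectural effect of these four instructions on EFLAGS can be sketched in a few lines: cmc complements the carry flag, clc clears it, cli clears the interrupt flag and sti sets it. The function below is a hypothetical illustration, not the emulator's code, and it leaves out details such as privilege checks and the one-instruction interrupt delay after sti.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_CF (1u << 0)   /* carry flag, EFLAGS bit 0 */
#define FLAG_IF (1u << 9)   /* interrupt enable flag, EFLAGS bit 9 */

/* Returns 1 if the opcode was one of cmc/clc/cli/sti, 0 otherwise. */
static int emulate_flag_op(uint8_t opcode, uint32_t *eflags)
{
    assert(eflags != NULL);

    switch (opcode) {
    case 0xf5:              /* cmc: complement carry */
        *eflags ^= FLAG_CF;
        break;
    case 0xf8:              /* clc: clear carry */
        *eflags &= ~FLAG_CF;
        break;
    case 0xfa:              /* cli: disable interrupts */
        *eflags &= ~FLAG_IF;
        break;
    case 0xfb:              /* sti: enable interrupts */
        *eflags |= FLAG_IF;
        break;
    default:
        return 0;
    }
    return 1;
}
```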


KVM: MMU: Simplify page table walker · 42bf3f0a

Avi Kivity authored Oct 17, 2007

Simplify the walker level loop not to carry so much information from one
loop to the next. In addition to being complex, this made kmap_atomic()
critical sections difficult to manage.

As a result of this change, kmap_atomic() sections are limited to actually
touching the guest pte, which allows the other functions called from the
walker to do sleepy operations. This will happen when we enable swapping.
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: x86 emulator: Implement emulation of instruction: inc & dec · d77a2507

Nitin A Kamble authored Oct 12, 2007

Instructions:
	inc r16/r32 (opcode 0x40-0x47)
	dec r16/r32 (opcode 0x48-0x4f)
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
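
In these one-byte forms the register is encoded in the low three bits of the opcode, and bit 3 selects between inc and dec. The sketch below shows that decoding for a 32-bit operand size only. `emulate_inc_dec()` is a hypothetical helper, the flag updates are not modelled, and the register array simply follows the hardware numbering (0 = eax, 1 = ecx, 2 = edx, 3 = ebx, ...).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Handles inc r32 (0x40-0x47) and dec r32 (0x48-0x4f).
 * Returns 1 if the opcode was handled, 0 otherwise.
 */
static int emulate_inc_dec(uint8_t opcode, uint32_t regs[8])
{
    unsigned reg;

    assert(regs != NULL);

    if (opcode < 0x40 || opcode > 0x4f)
        return 0;

    reg = opcode & 7;       /* low three bits select the register */
    if (opcode & 8)
        regs[reg]--;        /* 0x48-0x4f: dec */
    else
        regs[reg]++;        /* 0x40-0x47: inc */
    return 1;
}
```

Note that in 64-bit mode these byte values are REX prefixes rather than inc/dec, so a real decoder has to take the cpu mode into account.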


KVM: Rename KVM_TLB_FLUSH to KVM_REQ_TLB_FLUSH · 3176bc3e

Avi Kivity authored Oct 16, 2007

We now have a new namespace, KVM_REQ_*, for bits in vcpu->requests.
Signed-off-by: Avi Kivity <avi@qumranet.com>
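
The request-bit idea can be sketched as a bitmask with one bit per pending request: one side sets a bit, and the vcpu tests and clears it before entering the guest. This is a single-threaded model with hypothetical names (`REQ_TLB_FLUSH`, `make_request()`, `test_and_clear_request()`). Kernel code would use atomic bit operations instead of the plain read-modify-write shown here.

```c
#include <assert.h>
#include <limits.h>

#define REQ_TLB_FLUSH 0     /* hypothetical bit number in 'requests' */

struct vcpu {
    unsigned long requests; /* one bit per pending request */
};

/* Mark a request as pending for this vcpu. */
static void make_request(struct vcpu *v, unsigned bit)
{
    assert(bit < sizeof(v->requests) * CHAR_BIT);
    v->requests |= 1ul << bit;
}

/* Returns 1 if the request was pending, and clears it either way. */
static int test_and_clear_request(struct vcpu *v, unsigned bit)
{
    unsigned long mask = 1ul << bit;
    int was_set = (v->requests & mask) != 0;

    v->requests &= ~mask;
    return was_set;
}
```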


KVM: Move apic timer interrupt backlog processing to common code · ab6ef34b

Avi Kivity authored Oct 16, 2007

Besides the obvious goodness of making code more common, this prevents
a livelock with the next patch, which moves interrupt injection out of the
critical section.
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: Add some \n in ioapic_debug() · e25e3ed5

Laurent Vivier authored Oct 12, 2007

Add new-line at end of debug strings.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: apic round robin cleanup · e4d47f40

Qing He authored Sep 24, 2007

If no apic is enabled in the bitmap of an interrupt delivery with delivery
mode of lowest priority, a warning should be reported rather than selecting
a fallback vcpu.
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Eddie (Yaozu) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: Portability: split kvm_vcpu_ioctl · 313a3dc7

Carsten Otte authored Oct 11, 2007

This patch splits kvm_vcpu_ioctl into architecture independent parts, and
x86 specific parts which go to kvm_arch_vcpu_ioctl in x86.c.

Common ioctls for all architectures are:
KVM_RUN, KVM_GET/SET_(S-)REGS, KVM_TRANSLATE, KVM_INTERRUPT,
KVM_DEBUG_GUEST, KVM_SET_SIGNAL_MASK, KVM_GET/SET_FPU
Note that some PPC chips don't have an FPU, so we might need an #ifdef
around KVM_GET/SET_FPU one day.

x86 specific ioctls are:
KVM_GET/SET_LAPIC, KVM_SET_CPUID, KVM_GET/SET_MSRS

An interesting aspect is vcpu_load/vcpu_put. We now have a common
vcpu_load/put which does the preemption stuff, and an architecture
specific kvm_arch_vcpu_load/put. In the x86 case, this one calls the
vmx/svm function defined in kvm_x86_ops.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: MMU: When updating the dirty bit, inform the mmu about it · c4fcc272

Avi Kivity authored Oct 11, 2007

Since the mmu uses different shadow pages for dirty large pages and clean
large pages, this allows the mmu to drop ptes that are now invalid.
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: MMU: Move dirty bit updates to a separate function · 5df34a86

Avi Kivity authored Oct 11, 2007

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Instantiate real-mode shadows as user writable shadows · 6bfccdc9

Avi Kivity authored Oct 11, 2007

This is consistent with real-mode permissions.
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Disable write access on clean large pages · cc70e737

Avi Kivity authored Oct 11, 2007

By forcing clean huge pages to be read-only, we have separate roles
for the shadow of a clean large page and the shadow of a dirty large
page.  This is necessary because different ptes will be instantiated
for the two cases, even for read faults.
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: MMU: Fix nx access bit for huge pages · c22e3514

Avi Kivity authored Oct 11, 2007

We must set the bit before the shift, otherwise the wrong bit gets set.
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: Move guest pte dirty bit management to the guest pagetable walker · e3c5e7ec

Avi Kivity authored Oct 11, 2007

This is more consistent with the accessed bit management, and makes the dirty
bit available earlier for other purposes.
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: MMU: More struct kvm_vcpu -> struct kvm cleanups · 4a4c9924

Anthony Liguori authored Oct 10, 2007

This time, the biggest change is gpa_to_hpa. Unlike GVA to GPA, the
translation of GPA to HPA does not depend on the VCPU state, so there's no
need to pass in the kvm_vcpu.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>


KVM: MMU: Clean up MMU functions to take struct kvm when appropriate · f67a46f4

Anthony Liguori authored Oct 10, 2007

Some of the MMU functions take a struct kvm_vcpu even though they affect all
VCPUs.  This patch cleans up some of them to instead take a struct kvm.  This
makes things a bit more clear.

The main thing that was confusing me was whether certain functions need to be
called on all VCPUs.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
