Commits · 02f59dc9f1f51d2148d87d48f84adb455a4fd697 · nexedi / linux

24 Oct, 2010 40 commits

KVM: MMU: Introduce init_kvm_nested_mmu() · 02f59dc9

Joerg Roedel authored Sep 10, 2010

This patch introduces the init_kvm_nested_mmu() function
which is used to re-initialize the nested mmu when the l2
guest changes its paging mode.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Introduce kvm_read_nested_guest_page() · 3d06b8bf

Joerg Roedel authored Sep 10, 2010

This patch introduces the kvm_read_nested_guest_page() function
which reads from the physical memory of the guest. If the
guest is running in guest-mode itself with nested paging
enabled it will read from the guest's guest physical memory
instead.
The patch also changes the code to use this function
where it is necessary.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Make walk_addr_generic capable for two-level walking · 2329d46d

Joerg Roedel authored Sep 10, 2010

This patch uses kvm_read_guest_page_mmu to make the
walk_addr_generic functions suitable for two-level page
table walking.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: X86: Add kvm_read_guest_page_mmu function · ec92fe44

Joerg Roedel authored Sep 10, 2010

This patch adds a function which can read from the guest's
physical memory or from the guest's guest physical memory.
This will be used in the two-dimensional page table walker.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Implement nested gva_to_gpa functions · 6539e738

Joerg Roedel authored Sep 10, 2010

This patch adds the functions to do a nested l2_gva to
l1_gpa page table walk.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: X86: Introduce pointer to mmu context used for gva_to_gpa · 14dfe855

Joerg Roedel authored Sep 10, 2010

This patch introduces the walk_mmu pointer which points to
the mmu-context currently used for gva_to_gpa translations.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
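
A minimal sketch of the idea, assuming a vcpu that holds two MMU
contexts. Only the walk_mmu pointer comes from the commit message; the
struct layouts, the toy translation and the helpers vcpu_init(),
enter_nested(), leave_nested() and translate() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long gva_t;
typedef unsigned long long gpa_t;

/* Simplified stand-in for an MMU context (not the kernel's struct kvm_mmu). */
struct mmu_ctx {
	gpa_t (*gva_to_gpa)(struct mmu_ctx *mmu, gva_t gva);
	gpa_t offset;			/* toy translation: gpa = gva + offset */
};

struct vcpu {
	struct mmu_ctx mmu;		/* ordinary context */
	struct mmu_ctx nested_mmu;	/* context for the l2 guest */
	struct mmu_ctx *walk_mmu;	/* context currently used for gva_to_gpa */
};

static gpa_t toy_gva_to_gpa(struct mmu_ctx *mmu, gva_t gva)
{
	return (gpa_t)gva + mmu->offset;
}

static void vcpu_init(struct vcpu *v, gpa_t l1_offset, gpa_t l2_offset)
{
	v->mmu.gva_to_gpa = toy_gva_to_gpa;
	v->mmu.offset = l1_offset;
	v->nested_mmu.gva_to_gpa = toy_gva_to_gpa;
	v->nested_mmu.offset = l2_offset;
	v->walk_mmu = &v->mmu;		/* default: walk the ordinary context */
}

/* Hypothetical helpers: switch the context that translations go through. */
static void enter_nested(struct vcpu *v) { v->walk_mmu = &v->nested_mmu; }
static void leave_nested(struct vcpu *v) { v->walk_mmu = &v->mmu; }

/* Callers never pick a context themselves; they always follow walk_mmu. */
static gpa_t translate(struct vcpu *v, gva_t gva)
{
	assert(v->walk_mmu != NULL);
	return v->walk_mmu->gva_to_gpa(v->walk_mmu, gva);
}
```

With this shape, code that translates guest virtual addresses does not
need to know whether the vcpu is currently running a nested guest.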

KVM: MMU: Add infrastructure for two-level page walker · c30a358d

Joerg Roedel authored Sep 10, 2010

This patch introduces an mmu-callback to translate gpa
addresses in the walk_addr code. This is later used to
translate l2_gpa addresses into l1_gpa addresses.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
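
The callback can be sketched as follows. This is an illustration under
assumptions, not the kernel code: the translate_gpa member name, the
UNMAPPED_GPA marker, the linear toy mapping and pte_address() are all
made up for the example.

```c
#include <stdint.h>

typedef uint64_t gpa_t;

#define UNMAPPED_GPA (~(gpa_t)0)	/* hypothetical "no translation" marker */

struct mmu_ctx {
	/* Called by the walker before it touches any guest physical address. */
	gpa_t (*translate_gpa)(struct mmu_ctx *mmu, gpa_t gpa);
	gpa_t nested_base;		/* toy mapping: l1_gpa = l2_gpa + nested_base */
	gpa_t nested_limit;		/* l2_gpa values at or above this are unmapped */
};

/* Non-nested case: the address already is an l1_gpa. */
static gpa_t translate_identity(struct mmu_ctx *mmu, gpa_t gpa)
{
	(void)mmu;
	return gpa;
}

/* Nested case: turn an l2_gpa into an l1_gpa. */
static gpa_t translate_nested(struct mmu_ctx *mmu, gpa_t l2_gpa)
{
	if (l2_gpa >= mmu->nested_limit)
		return UNMAPPED_GPA;
	return l2_gpa + mmu->nested_base;
}

/* One step of a page walk: locate entry 'index' of the table at table_gpa. */
static gpa_t pte_address(struct mmu_ctx *mmu, gpa_t table_gpa, unsigned index)
{
	gpa_t real = mmu->translate_gpa(mmu, table_gpa);

	if (real == UNMAPPED_GPA)
		return UNMAPPED_GPA;
	return real + (gpa_t)index * sizeof(uint64_t);
}
```

The walker itself stays the same in both cases; only the callback
installed in the context differs.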

KVM: MMU: Introduce generic walk_addr function · 1e301feb

Joerg Roedel authored Sep 10, 2010

This is the first patch in the series towards a generic
walk_addr implementation which could walk two-dimensional
page tables in the end. In this first step the walk_addr
function is renamed to walk_addr_generic, which takes an
mmu context as an additional parameter.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Track page fault data in struct vcpu · 8df25a32

Joerg Roedel authored Sep 10, 2010

This patch introduces a struct with two new fields in
vcpu_arch for x86:

	* fault.address
	* fault.error_code

This will be used to correctly propagate page faults back
into the guest when we could have either an ordinary page
fault or a nested page fault. In the case of a nested page
fault the fault-address is different from the original
address that should be walked. So we need to keep track
of the real fault-address.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
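
A rough sketch of such a struct, assuming 64-bit addresses and a 32-bit
error code. The field types, the vcpu_arch_sketch name and the
record_fault() helper are assumptions for illustration; only the two
field names come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* The two fields named in the commit message; the types are assumptions. */
struct fault_info {
	uint64_t address;
	uint32_t error_code;
};

/* Stand-in for the x86 vcpu_arch, reduced to the new member. */
struct vcpu_arch_sketch {
	struct fault_info fault;
};

/*
 * Hypothetical helper: remember which address really faulted. For a
 * nested page fault this is not the address the walk started from.
 */
static void record_fault(struct vcpu_arch_sketch *arch, uint64_t walked_addr,
			 uint64_t nested_fault_addr, bool nested,
			 uint32_t error_code)
{
	arch->fault.address = nested ? nested_fault_addr : walked_addr;
	arch->fault.error_code = error_code;
}
```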

KVM: MMU: Let is_rsvd_bits_set take mmu context instead of vcpu · 3241f22d

Joerg Roedel authored Sep 10, 2010

This patch changes is_rsvd_bits_set() function prototype to
take only a kvm_mmu context instead of a full vcpu.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Introduce kvm_init_shadow_mmu helper function · 52fde8df

Joerg Roedel authored Sep 10, 2010

Some logic of the init_kvm_softmmu function is required to
build the Nested Nested Paging context. So factor the
required logic into a separate function and export it.
Also make the whole init path suitable for more than one mmu
context.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Introduce inject_page_fault function pointer · cb659db8

Joerg Roedel authored Sep 10, 2010

This patch introduces an inject_page_fault function pointer
into struct kvm_mmu which will be used to inject a page
fault. This will be used later when Nested Nested Paging is
implemented.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Introduce get_cr3 function pointer · 5777ed34

Joerg Roedel authored Sep 10, 2010

This function pointer in the MMU context is required to
implement Nested Nested Paging.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: X86: Introduce a tdp_set_cr3 function · 1c97f0a0

Joerg Roedel authored Sep 10, 2010

This patch introduces a special set_tdp_cr3 function pointer
in kvm_x86_ops which is only used for tdp-enabled mmu
contexts. This allows removing some hacks from the svm code.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Make set_cr3 a function pointer in kvm_mmu · f43addd4

Joerg Roedel authored Sep 10, 2010

This is necessary to implement Nested Nested Paging. As a
side effect this allows some cleanups in the SVM nested
paging code.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Make tdp_enabled a mmu-context parameter · c5a78f2b

Joerg Roedel authored Sep 10, 2010

This patch changes the tdp_enabled flag from its global
meaning to the mmu-context and renames it to direct_map
there. This is necessary for Nested SVM with emulation of
Nested Paging where we need an extra MMU context to shadow
the Nested Nested Page Table.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Check for root_level instead of long mode · 957446af

Joerg Roedel authored Sep 10, 2010

The walk_addr function checks for !is_long_mode in its 64
bit version. But what is meant here is a check for pae
paging. Change the condition to really check for pae paging
so that it also works with nested nested paging.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
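
The difference between the two checks can be illustrated like this. The
root-level constants mirror the usual KVM naming, but the struct and
both helper functions are simplified assumptions rather than the code
the patch touches.

```c
#include <stdbool.h>

enum {
	PT32_ROOT_LEVEL  = 2,	/* 32 bit legacy paging */
	PT32E_ROOT_LEVEL = 3,	/* pae paging */
	PT64_ROOT_LEVEL  = 4	/* long mode paging */
};

/* Simplified MMU context: only the level the walk starts at. */
struct mmu_ctx {
	int root_level;
};

/* Old condition: derived from the mode of the vcpu as a whole. */
static bool is_pae_walk_old(bool vcpu_long_mode)
{
	return !vcpu_long_mode;
}

/* New condition: derived from the context that is actually walked. */
static bool is_pae_walk_new(const struct mmu_ctx *mmu)
{
	return mmu->root_level == PT32E_ROOT_LEVEL;
}
```

With nested nested paging the vcpu can be in long mode while the walked
table uses pae paging; the old condition then gives the wrong answer,
while the new one depends only on the context.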

KVM: x86: Emulate MSR_EBC_FREQUENCY_ID · 7b914098

Jes Sorensen authored Sep 09, 2010

Some operating systems store data about the host processor at the
time of installation and, when booted on a more up-to-date CPU, try
to read MSR_EBC_FREQUENCY_ID. This has been found with XP.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

x86: Define MSR_EBC_FREQUENCY_ID · b9a52c4b

Jes Sorensen authored Sep 09, 2010

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: SVM: Clean up rip handling in vmrun emulation · b75f4eb3

Roedel, Joerg authored Sep 03, 2010

This patch changes the rip handling in the vmrun emulation
path from using next_rip to the generic kvm register access
functions.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: SVM: Restore correct registers after sel_cr0 intercept emulation · cda00082

Joerg Roedel authored Sep 02, 2010

This patch implements restoring of the correct rip, rsp, and
rax after the svm emulation in KVM injected a selective_cr0
write intercept into the guest hypervisor. The problem was
that the vmexit is emulated in the instruction emulation
which later commits the registers right after the write-cr0
instruction. So the l1 guest will continue to run with the
l2 rip, rsp and rax resulting in unpredictable behavior.

This patch is not the final word, it is just an easy patch
to fix the issue. The real fix will be done when the
instruction emulator is made aware of nested virtualization.
Until this is done this patch fixes the issue and provides
an easy way to fix this in -stable too.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: Fix 32 bit legacy paging with NPT · f87f9288

Joerg Roedel authored Sep 02, 2010

This patch fixes 32 bit legacy paging with NPT enabled. The
mmu_check_root call on the top-level of the loop causes
root_gfn to take values (in the tdp_enabled path) which are
outside of guest memory. So the mmu_check_root call fails at
some point in the loop iteration causing the guest to
triple-fault.
This patch changes the mmu_check_root calls to the places
where they are really necessary. As a side-effect it
introduces a check for the root of a pae page table too.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: PPC: Move of include to __KERNEL__ section · 26e673c3

Alexander Graf authored Sep 03, 2010

We have to protect the include for linux/of.h by __KERNEL__ so it doesn't
accidentally get referenced outside.

This patch fixes this and makes the tree compile again.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Add documentation for magic page enhancements · d1e87c7e

Alexander Graf authored Aug 31, 2010

This documents how to detect additional features inside the magic
page when a guest maps it.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Fix compile error in e500_tlb.c · 344941be

Alexander Graf authored Aug 31, 2010

The e500_tlb.c file didn't compile for me due to the following error:

arch/powerpc/kvm/e500_tlb.c: In function ‘kvmppc_e500_shadow_map’:
arch/powerpc/kvm/e500_tlb.c:300: error: format ‘%lx’ expects type ‘long unsigned int’, but argument 2 has type ‘gfn_t’

So let's explicitly cast the argument to make printk happy.
Signed-off-by: Alexander Graf <agraf@suse.de>
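
The kind of cast described above, shown in user space with snprintf
instead of printk. The 64-bit gfn_t typedef and the format_gfn() helper
are assumptions for the example; the exact cast used in the patch is
not shown in this log.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: gfn_t is a 64-bit type that need not be 'unsigned long'. */
typedef uint64_t gfn_t;

/*
 * "%lx" expects exactly 'unsigned long', so cast the argument explicitly.
 * Where 'unsigned long' is 32 bits wide this prints only the low 32 bits,
 * which is acceptable for a debug message.
 */
static int format_gfn(char *buf, size_t len, gfn_t gfn)
{
	return snprintf(buf, len, "gfn=%lx", (unsigned long)gfn);
}
```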

KVM: PPC: e500_tlb: Fix a minor copy-paste tracing bug · 21e537ba

Kyle Moffett authored Aug 30, 2010

The kvmppc_e500_stlbe_invalidate() function was trying to pass too many
parameters to trace_kvm_stlb_inval().  This appears to be a bad
copy-paste from a call to trace_kvm_stlb_write().
Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Document KVM_INTERRUPT ioctl · 6f7a2bd4

Alexander Graf authored Aug 31, 2010

This adds some documentation for the KVM_INTERRUPT special cases that
PowerPC now implements.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Implement level interrupts for BookE · c5335f17

Alexander Graf authored Aug 30, 2010

BookE also wants to support level based interrupts, so let's implement
all the necessary logic there. We need to trick a bit here because the
irqprios are 1:1 assigned to architecture defined values. But since there
is some space left there, we can just pick a random one and move it later
on - it's internal anyways.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Expose level based interrupt cap · 7b4203e8

Alexander Graf authored Aug 30, 2010

Now that we have all the level interrupt magic in place, let's
expose the capability to user space, so it can make use of it!
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Implement Level interrupts on Book3S · 17bd1580

Alexander Graf authored Aug 30, 2010

The current interrupt logic is just completely broken. We get a notification
from user space, telling us that an interrupt is there. But then user space
expects us to acknowledge the interrupt as soon as we deliver it to the
guest.

This is not how real hardware works though. On real hardware, the interrupt
controller pulls the external interrupt line until it gets notified that the
interrupt was received.

So in reality we have two events: pulling and letting go of the interrupt line.

To maintain backwards compatibility, I added a new request for the pulling
part. The letting go part was implemented earlier already.

With this in place, we can now finally start guests that do not randomly
stall or stop working.

This patch implements the above logic for Book3S.
Signed-off-by: Alexander Graf <agraf@suse.de>
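
The two-event model can be sketched as a tiny state machine. The struct
and the irq_raise(), irq_lower() and irq_try_deliver() names are
hypothetical and do not correspond to the actual KVM request names.

```c
#include <stdbool.h>

/* Toy model of a level-triggered external interrupt line. */
struct irq_line {
	bool asserted;		/* is the line currently pulled? */
	unsigned deliveries;	/* how often the guest took the interrupt */
};

/* Event 1: the interrupt controller pulls the line. */
static void irq_raise(struct irq_line *l) { l->asserted = true; }

/* Event 2: the interrupt controller lets go of the line. */
static void irq_lower(struct irq_line *l) { l->asserted = false; }

/*
 * Called whenever the guest could take an external interrupt.
 * Delivering does NOT acknowledge: the line stays asserted until
 * irq_lower() is called, so the guest is interrupted again if needed.
 */
static bool irq_try_deliver(struct irq_line *l, bool guest_irqs_enabled)
{
	if (!l->asserted || !guest_irqs_enabled)
		return false;
	l->deliveries++;
	return true;
}
```

In the edge-style model criticized above, a delivery would clear the
pending state itself, so a guest that had not yet serviced the device
would never be interrupted again.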

KVM: PPC: Enable napping only for Book3s_64 · 591bd8e7

Alexander Graf authored Aug 17, 2010

Previously I incorrectly enabled napping for BookE as well, which would result in
needless dcache flushes. Since we only need to force enable napping on
Book3s_64 because it doesn't go into MSR_POW otherwise, we can just #ifdef
that code to this particular platform.
Reported-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: allow ppc440gp to pass the compatibility check · ebc65874

Hollis Blanchard authored Aug 07, 2010

Match only the first part of cur_cpu_spec->platform.

440GP (the first 440 processor) is identified by the string "ppc440gp", while
all later 440 processors use simply "ppc440".
Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
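
A prefix match of this kind could look as follows. The function names
are made up for the example, and the platform string is passed in as a
parameter instead of being read from cur_cpu_spec.

```c
#include <stdbool.h>
#include <string.h>

/* Exact comparison: rejects "ppc440gp". */
static bool is_440_exact(const char *platform)
{
	return strcmp(platform, "ppc440") == 0;
}

/* Prefix comparison: only the first 6 characters ("ppc440") must match. */
static bool is_440_prefix(const char *platform)
{
	return strncmp(platform, "ppc440", 6) == 0;
}
```

The prefix form accepts both "ppc440" and "ppc440gp" while still
rejecting unrelated platforms such as "ppc405".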

KVM: PPC: fix compilation of "dump tlbs" debug function · 0b3bafc8

Hollis Blanchard authored Aug 07, 2010

Missing local variable.
Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: initialize IVORs in addition to IVPR · 082decf2

Hollis Blanchard authored Aug 07, 2010

Developers can now tell at a glance the exact type of the premature interrupt,
instead of just knowing that there was some premature interrupt.
Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Don't put MSR_POW in MSR · 296c19d0

Alexander Graf authored Aug 15, 2010

On Book3S a mtmsr with the MSR_POW bit set indicates that the OS is in
idle and only needs to be woken up on the next interrupt.

Now, unfortunately we let that bit slip into the stored MSR value which
is not what the real CPU does, so that we ended up executing code like
this:

	r = mfmsr();
	/* r contains MSR_POW */
	mtmsr(r | MSR_EE);

This obviously breaks, as we're going into idle mode in code sections that
don't expect to be idling.

This patch masks MSR_POW out of the stored MSR value on wakeup, making
guests happy again.
Signed-off-by: Alexander Graf <agraf@suse.de>
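
A simplified model of the fix, assuming the usual PowerPC bit positions
for MSR_EE and MSR_POW. The vcpu_sketch struct and set_msr() are
hypothetical, and blocking until the next interrupt is replaced by a
counter.

```c
/* Bit positions as commonly defined for PowerPC; treat as illustrative. */
#define MSR_EE  (1UL << 15)
#define MSR_POW (1UL << 18)

struct vcpu_sketch {
	unsigned long msr;	/* MSR value the guest reads back via mfmsr */
	int naps;		/* how often the vcpu went idle */
};

/* Emulated mtmsr. */
static void set_msr(struct vcpu_sketch *v, unsigned long msr)
{
	v->msr = msr;
	if (msr & MSR_POW) {
		v->naps++;	/* stand-in for blocking until an interrupt */
		/* After wakeup: do not leave MSR_POW in the stored value. */
		v->msr = msr & ~MSR_POW;
	}
}
```

With the bit masked out, the mfmsr/mtmsr sequence quoted above no longer
sends the vcpu back into idle, because the value the guest reads back
does not contain MSR_POW.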

KVM: PPC: Implement correct SID mapping on Book3s_32 · 8b6db3bc

Alexander Graf authored Aug 15, 2010

Up until now we were doing segment mappings wrong on Book3s_32. For Book3s_64
we were using a trick where we know that a single mmu_context gives us 16 bits
of context ids.

The mm system on Book3s_32 instead uses a clever algorithm to distribute VSIDs
across the available range, so a context id really only gives us 16 available
VSIDs.

To keep at least a few guest processes in the SID shadow, let's map a number of
contexts that we can use as VSID pool. This makes the code actually correct
and shouldn't hurt performance too much.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Force enable nap on KVM · ad087376

Alexander Graf authored Aug 17, 2010

There are some heuristics in the PPC power management code that try to find
out if the particular hardware we're running on supports proper power management
or just hangs the machine when going into nap mode.

Since we know that KVM is safe with nap, let's force enable it in the PV code
once we're certain that we are on a KVM VM.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Make PV mtmsrd L=1 work with r30 and r31 · df08bd10

Alexander Graf authored Aug 05, 2010

We had an arbitrary limitation in mtmsrd L=1 that kept us from using r30 and
r31 as input registers. Let's get rid of that and get more potential speedups!
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Update int_pending also on dequeue · 9ee18b1e

Alexander Graf authored Aug 05, 2010

When having a decrementer interrupt pending, the dequeuing happens manually
through an mtdec instruction. This instruction simply calls dequeue on that
interrupt, so the int_pending hint doesn't get updated.

This patch enables updating the int_pending hint also on dequeue, thus
correctly enabling guests to stay in guest contexts more often.
Signed-off-by: Alexander Graf <agraf@suse.de>
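
The change amounts to refreshing the hint on both paths. In this sketch
the struct, the bitmask representation and the function names are
assumptions, not the actual KVM PPC code.

```c
#include <stdbool.h>

struct vcpu_sketch {
	unsigned long pending_exceptions;	/* one bit per interrupt priority */
	bool int_pending;			/* hint that the guest can read */
};

/* Recompute the hint from the current set of pending interrupts. */
static void update_int_pending(struct vcpu_sketch *v)
{
	v->int_pending = v->pending_exceptions != 0;
}

static void queue_irq(struct vcpu_sketch *v, unsigned prio)
{
	v->pending_exceptions |= 1UL << prio;
	update_int_pending(v);
}

/* Before the fix, this path left int_pending stale. */
static void dequeue_irq(struct vcpu_sketch *v, unsigned prio)
{
	v->pending_exceptions &= ~(1UL << prio);
	update_int_pending(v);
}
```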

KVM: PPC: Make PV mtmsr work with r30 and r31 · 512ba59e

Alexander Graf authored Aug 05, 2010

So far we've been restricting ourselves to r0-r29 as registers an mtmsr
instruction could use. This was bad, as there are some code paths in
Linux actually using r30.

So let's instead handle all registers gracefully and get rid of that
stupid limitation.
Signed-off-by: Alexander Graf <agraf@suse.de>
