Commits · d5e528136cda31a32ff7d1eaa8d06220eb443781 · nexedi / linux

01 Mar, 2010 40 commits

KVM: PPC: Add helper functions to call real mode loaders · d5e52813

Alexander Graf authored Jan 15, 2010

Linux contains quite a bit of code to load FPU, Altivec and VSX lazily for
a task. It calls that code in real mode, coming from an interrupt handler.

For KVM we had better reuse it, so let's wrap a bit of trampoline magic around
it so that we can call it from normal module code.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

d5e52813

KVM: PPC: Export __giveup_vsx · fbad5f1d

Alexander Graf authored Jan 15, 2010

We need to explicitly give up only VSX in KVM, so let's export that
specific function to module space.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

fbad5f1d

KVM: ia64: remove redundant kvm_get_exit_data() NULL tests · 0f0412c1

Roel Kluin authored Jan 14, 2010

kvm_get_exit_data() cannot return a NULL pointer.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

0f0412c1

KVM: SVM: Lazy fpu with npt · 4610c83c

Avi Kivity authored Jan 10, 2010

Now that we can allow the guest to play with cr0 when the fpu is loaded,
we can enable lazy fpu when npt is in use.
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

4610c83c

KVM: SVM: Selective cr0 intercept · d225157b

Avi Kivity authored Jan 06, 2010

If two conditions apply:
 - no bits outside TS and EM differ between the host and guest cr0
 - the fpu is active

then we can activate the selective cr0 write intercept and drop the
unconditional cr0 read and write intercept, and allow the guest to run
with the host fpu state.  This reduces cr0 exits due to guest fpu management
while the guest fpu is loaded.
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

d225157b
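
The two conditions listed in the selective cr0 intercept commit above can be written as a single predicate. This is a minimal sketch, not the code from svm.c: the function name and parameters are hypothetical, and only the architectural CR0 bit positions are taken as given.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR0 bit positions. */
#define CR0_EM (1ULL << 2) /* emulation */
#define CR0_TS (1ULL << 3) /* task switched */

/*
 * Hypothetical helper (not the kernel's function): returns true when the
 * selective cr0 write intercept would be enough, following the two
 * conditions from the commit message.
 */
bool can_use_selective_cr0(uint64_t host_cr0, uint64_t guest_cr0,
                           bool fpu_active)
{
    uint64_t differing = host_cr0 ^ guest_cr0;

    /* Condition 1: no bits outside TS and EM differ.
     * Condition 2: the fpu is active. */
    return (differing & ~(CR0_TS | CR0_EM)) == 0 && fpu_active;
}
```

If either condition fails, the sketch falls back to the unconditional read and write intercepts, which matches the behaviour the message describes.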

KVM: SVM: Restore unconditional cr0 intercept under npt · 888f9f3e

Avi Kivity authored Jan 10, 2010

Currently we don't intercept cr0 at all when npt is enabled.  This improves
performance but requires us to activate the fpu at all times.

Remove this behaviour in preparation for adding selective cr0 intercepts.
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

888f9f3e

KVM: SVM: Initialize fpu_active in init_vmcb() · bff78274

Avi Kivity authored Jan 07, 2010

init_vmcb() sets up the intercepts as if the fpu is active, so initialize it
there.  This avoids an INIT from setting up intercepts inconsistent with
fpu_active.
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

bff78274

KVM: SVM: Fix SVM_CR0_SELECTIVE_MASK · dc77270f

Avi Kivity authored Jan 06, 2010

Instead of selecting TS and MP as the comments say, the macro included TS and
PE.  Luckily the macro is unused now, but fix in order to save a few hours of
debugging from anyone who attempts to use it.
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

dc77270f
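
To make the difference between the intended and the broken mask concrete, here is a small sketch. The macro and function names are illustrative assumptions; only the CR0 bit positions are architectural.

```c
#include <stdint.h>

#define CR0_PE (1ULL << 0) /* protection enable */
#define CR0_MP (1ULL << 1) /* monitor coprocessor */
#define CR0_TS (1ULL << 3) /* task switched */

/* What the comments describe: TS and MP. */
#define SELECTIVE_MASK_FIXED  (CR0_TS | CR0_MP)
/* What the macro contained before the fix: TS and PE. */
#define SELECTIVE_MASK_BROKEN (CR0_TS | CR0_PE)

/* Returns 1 when a cr0 write changes only bits covered by the mask. */
int write_stays_within_mask(uint64_t old_cr0, uint64_t new_cr0, uint64_t mask)
{
    return ((old_cr0 ^ new_cr0) & ~mask) == 0;
}
```

With the broken value, a write that only toggles MP is classified as touching bits outside the mask, which is the kind of surprise the fix is meant to prevent.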

KVM: Set cr0.et when the guest writes cr0 · f9a48e6a

Avi Kivity authored Jan 06, 2010

Follow the hardware.
Signed-off-by: Avi Kivity <avi@redhat.com>

f9a48e6a

KVM: VMX: Give the guest ownership of cr0.ts when the fpu is active · edcafe3c

Avi Kivity authored Dec 30, 2009

If the guest fpu is loaded, there is nothing interesting about cr0.ts; let
the guest play with it as it will.  This makes context switches between fpu
intensive guest processes faster, as we won't trap the clts and cr0 write
instructions.

[marcelo: fix cr0 read shadow update on fpu deactivation; kills F8 install]
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

edcafe3c

KVM: Lazify fpu activation and deactivation · 02daab21

Avi Kivity authored Dec 30, 2009

Defer fpu deactivation as much as possible - if the guest fpu is loaded, keep
it loaded until the next heavyweight exit (where we are forced to unload it).
This reduces unnecessary exits.

We also defer fpu activation on clts; while clts signals the intent to use the
fpu, we can't be sure the guest will actually use it.
Signed-off-by: Avi Kivity <avi@redhat.com>

02daab21

KVM: VMX: Allow the guest to own some cr0 bits · e8467fda

Avi Kivity authored Dec 29, 2009

We will use this later to give the guest ownership of cr0.ts.
Signed-off-by: Avi Kivity <avi@redhat.com>

e8467fda
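
As background for guest-owned cr0 bits: under VMX, a guest read of cr0 returns the real register for bits the guest owns and the read shadow for bits the host owns. The sketch below models that rule; the function names are hypothetical and the code is not taken from vmx.c.

```c
#include <stdint.h>

#define CR0_TS (1ULL << 3) /* task switched */

/*
 * Value a guest observes when it reads cr0.
 * host_owned_mask: bits set here belong to the host, so the guest sees
 * the read shadow for them; all other bits come from the real register.
 */
uint64_t guest_visible_cr0(uint64_t real_cr0, uint64_t read_shadow,
                           uint64_t host_owned_mask)
{
    return (real_cr0 & ~host_owned_mask) | (read_shadow & host_owned_mask);
}

/* Hypothetical helper: the host owns every bit the guest does not. */
uint64_t host_owned_mask_for(uint64_t guest_owned_bits)
{
    return ~guest_owned_bits;
}
```

Handing a bit such as cr0.ts to the guest then amounts to clearing it in the host-owned mask, after which guest accesses to that bit no longer need to exit.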

KVM: Replace read accesses of vcpu->arch.cr0 by an accessor · 4d4ec087

Avi Kivity authored Dec 29, 2009

Since we'd like to allow the guest to own a few bits of cr0 at times, we need
to know when we access those bits.
Signed-off-by: Avi Kivity <avi@redhat.com>

4d4ec087
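
The point of routing reads through an accessor is that the software copy of cr0 can be refreshed exactly when a guest-owned bit is requested. A minimal sketch of that idea follows; the struct and function are simplified stand-ins, not the kernel's definitions.

```c
#include <stdint.h>

#define CR0_TS (1ULL << 3) /* task switched */

/* Hypothetical, cut-down vcpu; the real structure is much larger. */
struct vcpu {
    uint64_t cached_cr0;       /* software copy, may be stale for owned bits */
    uint64_t hardware_cr0;     /* stands in for the value held by hardware */
    uint64_t guest_owned_bits; /* bits the guest may change without an exit */
    unsigned refreshes;        /* how often the cache had to be resynced */
};

/* Read selected cr0 bits, resyncing only if a guest-owned bit is requested. */
uint64_t read_cr0_bits(struct vcpu *v, uint64_t mask)
{
    if (mask & v->guest_owned_bits) {
        uint64_t owned = v->guest_owned_bits;

        v->cached_cr0 = (v->cached_cr0 & ~owned) | (v->hardware_cr0 & owned);
        v->refreshes++;
    }
    return v->cached_cr0 & mask;
}
```

Callers that only look at host-owned bits never pay for the refresh, which is why every read site needs to say which bits it wants.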

KVM: VMX: trace clts and lmsw instructions as cr accesses · a1f83a74

Avi Kivity authored Dec 29, 2009

clts writes cr0.ts; lmsw writes cr0[0:15] - record that in ftrace.
Signed-off-by: Avi Kivity <avi@redhat.com>

a1f83a74

KVM: PPC: Make large pages work · 4b5c9b7f

Alexander Graf authored Jan 10, 2010

An SLB entry contains two pieces of information related to size:

  1) PTE size
  2) SLB size

The L bit defines the PTE to be "large" (usually meaning 16MB), while
SLB_VSID_B_1T defines that the SLB entry should span 1TB instead of the
default 256MB.

Apparently I messed things up and just put those two in one box,
shook it heavily and came up with the current code which handles
large pages incorrectly, because it also treats large page SLB entries
as "1TB" segment entries.

This patch splits those two features apart, making Linux guests boot
even when they have > 256MB.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

4b5c9b7f
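
The separation described in the large pages commit above can be illustrated by decoding the two properties independently. This is a sketch under assumptions: the bit values are written as I recall them from the powerpc headers and should be checked against the tree, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; verify against the powerpc headers before relying on them. */
#define SLB_VSID_B    0xc000000000000000ULL /* segment size field */
#define SLB_VSID_B_1T 0x4000000000000000ULL /* segment size: 1TB */
#define SLB_VSID_L    0x0000000000000100ULL /* large page flag */

struct slb_sizes {
    bool large_page; /* PTE size: large instead of the base page size */
    bool tb_segment; /* segment size: 1TB instead of 256MB */
};

/* Decode page size and segment size without letting one imply the other. */
struct slb_sizes decode_slb_vsid(uint64_t vsid_word)
{
    struct slb_sizes s;

    s.large_page = (vsid_word & SLB_VSID_L) != 0;
    s.tb_segment = (vsid_word & SLB_VSID_B) == SLB_VSID_B_1T;
    return s;
}
```

An entry with only the L bit set decodes as a large page inside an ordinary 256MB segment, which is the case the earlier code got wrong.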

KVM: PPC: Pass through program interrupts · 5f2b105a

Alexander Graf authored Jan 10, 2010

When we get a program interrupt in guest kernel mode, we try to emulate the
instruction.

If that fails, we report to the user and try again - at the exact same
instruction pointer. So if the guest kernel really does trigger an invalid
instruction, we loop forever.

So let's forward program exceptions to the guest when we don't know the
instruction we're supposed to emulate.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

5f2b105a
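
The control flow after this change can be sketched as follows. All names and the placeholder opcode are hypothetical; the stand-in emulator exists only so the example is self-contained.

```c
enum emulate_result { EMU_DONE, EMU_FAIL };
enum next_step { STEP_RESUME_GUEST, STEP_INJECT_PROGRAM_INTERRUPT };

/* Stand-in emulator: 0x12345678 is an arbitrary placeholder value that
 * represents "an instruction we know how to emulate". */
enum emulate_result emulate_instruction(unsigned int inst)
{
    return inst == 0x12345678u ? EMU_DONE : EMU_FAIL;
}

enum next_step handle_program_interrupt(unsigned int inst)
{
    if (emulate_instruction(inst) == EMU_DONE)
        return STEP_RESUME_GUEST;

    /* Unknown instruction: hand the exception to the guest instead of
     * retrying at the same instruction pointer forever. */
    return STEP_INJECT_PROGRAM_INTERRUPT;
}
```

The important property is that a failed emulation never leads back to the same instruction without the guest having been told about it.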

KVM: PPC: Pass program interrupt flags to the guest · ff1ca3f9

Alexander Graf authored Jan 08, 2010

When we need to reinject a program interrupt into the guest, we also need to
reinject the corresponding flags into the guest.
Signed-off-by: Alexander Graf <agraf@suse.de>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>

ff1ca3f9

KVM: PPC: Fix HID5 setting code · d35feb26

Alexander Graf authored Jan 08, 2010

The code to unset HID5.dcbz32 is broken.
This patch makes it do the right rotate magic.
Signed-off-by: Alexander Graf <agraf@suse.de>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>

d35feb26

KVM: PPC: Emulate trap SRR1 flags properly · 25a8a02d

Alexander Graf authored Jan 08, 2010

Book3S needs some flags in SRR1 to get to know details about an interrupt.

One such example is the trap instruction. It tells the guest kernel that
a program interrupt is due to a trap using a bit in SRR1.

This patch implements the above behavior, making WARN_ON behave like WARN_ON.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

25a8a02d

KVM: PPC: Call SLB patching code in interrupt safe manner · 021ec9c6

Alexander Graf authored Jan 08, 2010

Currently we're racy when doing the transition from IR=1 to IR=0, from
the module memory entry code to the real mode SLB switching code.

To work around that I took a look at the RTAS entry code which is faced
with a similar problem and did the same thing:

  A small helper in linear mapped memory that does mtmsr with IR=0 and
  then RFIs into the actual handler.

Thanks to that trick we can safely take page faults in the entry code
and only need to be really wary of what to do as of the SLB switching
part.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

021ec9c6

KVM: PPC: Get rid of unnecessary RFI · bc90923e

Alexander Graf authored Jan 08, 2010

Using an RFI in IR=1 is dangerous. We need to set two SRRs and then do an RFI
without getting interrupted at all, because every interrupt could potentially
overwrite the SRR values.

Fortunately, we don't need to RFI in at least this particular case of the code,
so we can just replace it with an mtmsr and b.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

bc90923e

KVM: PPC: Implement 'skip instruction' mode · b4433a7c

Alexander Graf authored Jan 08, 2010

To fetch the last instruction we were interrupted on, we enable DR in early
exit code, where we are still in a very transitional phase between guest
and host state.

Most of the time this seemed to work, but another CPU can easily flush our
TLB and HTAB, which sends us into the Linux page fault handler. That breaks
completely because we are still using the guest's SLB entries.

To work around that, let's introduce a second KVM guest mode that defines
that whenever we get a trap, we don't call the Linux handler or go into
the KVM exit code, but just jump over the faulting instruction.

That way a potentially bad lwz doesn't trigger any faults and we can later
on interpret the invalid instruction we fetched as "fetch didn't work".
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

b4433a7c
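
The 'skip instruction' mode can be modelled in plain C to show the idea. This is a simulation under assumptions, not the assembly from the patch: the mode names, the marker value and the `load_faults` flag (which simulates a faulting lwz) are all hypothetical.

```c
#include <stdint.h>

enum guest_mode { MODE_NONE, MODE_GUEST, MODE_SKIP };
enum trap_action { CALL_LINUX_HANDLER, ENTER_KVM_EXIT, SKIP_INSTRUCTION };

#define INST_FETCH_FAILED 0xffffffffu /* hypothetical "fetch didn't work" marker */

struct cpu {
    enum guest_mode mode;
    uint64_t pc; /* address of the instruction that trapped */
};

/* Decide what a trap does, depending on the current mode. */
enum trap_action on_trap(struct cpu *c)
{
    switch (c->mode) {
    case MODE_SKIP:
        c->pc += 4; /* PowerPC instructions are 4 bytes long */
        return SKIP_INSTRUCTION;
    case MODE_GUEST:
        return ENTER_KVM_EXIT;
    default:
        return CALL_LINUX_HANDLER;
    }
}

/* Fetch the last guest instruction; a faulting load is simply skipped. */
uint32_t fetch_last_inst(struct cpu *c, int load_faults, uint32_t inst_in_memory)
{
    uint32_t inst = INST_FETCH_FAILED; /* preset, survives a skipped load */
    enum guest_mode saved = c->mode;

    c->mode = MODE_SKIP;
    if (load_faults)
        on_trap(c); /* load is jumped over, inst keeps the marker */
    else
        inst = inst_in_memory;
    c->mode = saved;
    return inst;
}
```

Because the result is preset to an invalid value, the caller can tell a skipped load apart from a successful fetch without any fault handler having run.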

KVM: PPC: Use PACA backed shadow vcpu · 7e57cba0

Alexander Graf authored Jan 08, 2010

We're being horribly racy right now. All the entry and exit code hijacks
random fields from the PACA that could easily be used by different code in
case we get interrupted, for example by a #MC or even page fault.

After discussing this with Ben, we figured it's best to reserve some more
space in the PACA and just shove off some vcpu state to there.

That way we can drastically improve the readability of the code, make it
less racy and less complex.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

7e57cba0

KVM: PPC: Add helpers for CR, XER · 992b5b29

Alexander Graf authored Jan 08, 2010

We now have helpers for the GPRs, so let's also add some for CR and XER.

Having them in the PACA simplifies code a lot, as we don't need to care
about where to store CC or not to overflow any integers.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

992b5b29

KVM: PPC: Use accessor functions for GPR access · 8e5b26b5

Alexander Graf authored Jan 08, 2010

All code in PPC KVM currently accesses gprs in the vcpu struct directly.

While there's nothing wrong with that wrt the current way gprs are stored
and loaded, it doesn't suffice for the PACA acceleration that will follow
in this patchset.

So let's just create little wrapper inline functions that we call whenever
a GPR needs to be read from or written to. The compiled code shouldn't really
change at all for now.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

8e5b26b5
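
A minimal sketch of such wrapper functions is shown below. The struct is a cut-down stand-in and the function names are illustrative, so treat them as assumptions rather than the names used in the patch.

```c
#include <stdint.h>

/* Hypothetical, cut-down vcpu; only the register file is modelled. */
struct vcpu {
    uint64_t gpr[32];
};

/* Write general purpose register `num`. */
static inline void set_gpr(struct vcpu *v, int num, uint64_t val)
{
    v->gpr[num] = val;
}

/* Read general purpose register `num`. */
static inline uint64_t get_gpr(struct vcpu *v, int num)
{
    return v->gpr[num];
}
```

Once every access goes through these two functions, the storage behind them can move (for example into the PACA) without touching the callers.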

KVM: Fix the explanation of write_emulated · 0d178975

Takuya Yoshikawa authored Jan 06, 2010

The explanation of write_emulated is confused with
that of read_emulated. This patch fixes it.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0d178975

KVM: VMX: Enable EPT 1GB page support · 878403b7

Sheng Yang authored Jan 05, 2010

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

878403b7

KVM: x86: Rename gb_page_enable() to get_lpage_level() in kvm_x86_ops · 17cc3935

Sheng Yang authored Jan 05, 2010

Then the callback can provide the maximum supported large page level, which
is more flexible.

Also make the gb page support x86_64 specific.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

17cc3935
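
The benefit of returning a level instead of a boolean can be sketched like this. The level names and values, the ops struct and the clamp helper are assumptions made for illustration; they are not the definitions from kvm_x86_ops.

```c
/* Illustrative page table levels: 1 = 4KB, 2 = 2MB, 3 = 1GB. */
enum { LEVEL_4K = 1, LEVEL_2M = 2, LEVEL_1G = 3 };

/* Hypothetical ops table with a single callback. */
struct x86_ops {
    int (*get_lpage_level)(void); /* maximum supported large page level */
};

int lpage_level_2m_only(void) { return LEVEL_2M; }
int lpage_level_with_1g(void) { return LEVEL_1G; }

/* Limit the level wanted for a mapping to what the backend supports. */
int clamp_mapping_level(const struct x86_ops *ops, int wanted)
{
    int max = ops->get_lpage_level();

    return wanted > max ? max : wanted;
}
```

A backend that later gains a larger page size only has to return a higher level; callers such as the clamp above stay unchanged.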

KVM: x86: Moving PT_*_LEVEL to mmu.h · c9c54174

Sheng Yang authored Jan 05, 2010

We can use them in x86.c and vmx.c now...
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

c9c54174

KVM: PPC: Enable lightweight exits again · 97c4cfbe

Alexander Graf authored Jan 04, 2010

The PowerPC C ABI defines that registers r14-r31 need to be preserved across
function calls. Since our exit handler is written in C, we can make use of that
and don't need to reload r14-r31 on every entry/exit cycle.

This technique is also used in the BookE code and is called "lightweight exits"
there. To follow the tradition, it's called the same in Book3S.

So far this optimization was disabled, though, because the code did not do
what it was expected to do and failed to work.

This patch fixes and enables lightweight exits again.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

97c4cfbe

KVM: PPC: Fix typo in rebolting code · b480f780

Alexander Graf authored Jan 04, 2010

When we're loading bolted entries into the SLB again, we're checking if an
entry is in use and only slbmte it when it is.

Unfortunately, the check always goes to the skip label of the first entry,
resulting in an endless loop when it actually gets triggered.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

b480f780

KVM: avoid taking ioapic mutex for non-ioapic EOIs · 46a929bc

Avi Kivity authored Dec 28, 2009

When the guest acknowledges an interrupt, it sends an EOI message to the local
apic, which broadcasts it to the ioapic.  To handle the EOI, we need to take
the ioapic mutex.

On large guests, this causes a lot of contention on this mutex.  Since large
guests usually don't route interrupts via the ioapic (they use msi instead),
this is completely unnecessary.

Avoid taking the mutex by introducing a handled_vectors bitmap.  Before taking
the mutex, check if the ioapic was actually responsible for the acked vector.
If not, we can return early.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

46a929bc
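
The early-return check can be sketched with a plain bitmap. This is a single-threaded model under assumptions: a counter stands in for taking the mutex, and the struct and function names are hypothetical rather than the ones in the ioapic code.

```c
#include <limits.h>
#include <stdbool.h>

#define NR_VECTORS    256
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

struct ioapic {
    unsigned long handled_vectors[NR_VECTORS / BITS_PER_WORD];
    unsigned lock_acquisitions; /* stands in for taking the ioapic mutex */
};

/* Record that the ioapic routes this vector. */
void mark_handled(struct ioapic *io, unsigned vector)
{
    io->handled_vectors[vector / BITS_PER_WORD] |=
        1UL << (vector % BITS_PER_WORD);
}

bool vector_is_handled(const struct ioapic *io, unsigned vector)
{
    return (io->handled_vectors[vector / BITS_PER_WORD] >>
            (vector % BITS_PER_WORD)) & 1UL;
}

/* Returns true when the EOI was processed by the ioapic. */
bool ioapic_eoi(struct ioapic *io, unsigned vector)
{
    if (!vector_is_handled(io, vector))
        return false;        /* early return, the lock is never touched */

    io->lock_acquisitions++; /* the mutex would be taken here */
    /* ... update the matching redirection entries ... */
    return true;
}
```

For a guest that delivers its interrupts via msi, almost every EOI takes the early return, so the contended lock drops out of the common path.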

KVM: Fill out ftrace exit reason strings · f4c9e87c

Avi Kivity authored Dec 28, 2009

Some exit reasons missed their strings; fill out the table.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f4c9e87c

KVM: Bump maximum vcpu count to 64 · 0680fe52

Avi Kivity authored Dec 27, 2009

With slots_lock converted to rcu, the entire kvm hotpath on modern processors
(with npt or ept) now scales beautifully.  Increase the maximum vcpu count to
64 to reflect this.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0680fe52

KVM: convert slots_lock to a mutex · 79fac95e

Marcelo Tosatti authored Dec 23, 2009

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

79fac95e

KVM: switch vcpu context to use SRCU · f656ce01

Marcelo Tosatti authored Dec 23, 2009

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f656ce01

KVM: convert io_bus to SRCU · e93f8a0f

Marcelo Tosatti authored Dec 23, 2009

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e93f8a0f

KVM: x86: switch kvm_set_memory_alias to SRCU update · a983fb23

Marcelo Tosatti authored Dec 23, 2009

Using a similar two-step procedure as for memslots.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

a983fb23

KVM: use SRCU for dirty log · b050b015

Marcelo Tosatti authored Dec 23, 2009

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

b050b015

KVM: introduce kvm->srcu and convert kvm_set_memory_region to SRCU update · bc6678a3

Marcelo Tosatti authored Dec 23, 2009

Use two steps for memslot deletion: mark the slot invalid (which stops
instantiation of new shadow pages for that slot, but allows destruction),
then instantiate the new empty slot.

Also simplifies kvm_handle_hva locking.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

bc6678a3