Commits · dda96d8f1b3de692cce09969ce28fe22e58e5acf · Kirill Smelkov / linux

31 Dec, 2008 40 commits

KVM: x86 emulator: reduce duplication in one operand emulation thunks · dda96d8f
Avi Kivity authored Nov 26, 2008
```
Signed-off-by: Avi Kivity <avi@redhat.com>
```
dda96d8f

KVM: MMU: optimize set_spte for page sync · ecc5589f

Marcelo Tosatti authored Nov 25, 2008

The write protect verification in set_spte is unnecessary for page sync.

Its guaranteed that, if the unsync spte was writable, the target page
does not have a write protected shadow (if it had, the spte would have
been write protected under mmu_lock by rmap_write_protect before).

Same reasoning applies to mark_page_dirty: the gfn has been marked as
dirty via the pagefault path.

The cost of hash table and memslot lookups are quite significant if the
workload is pagetable write intensive resulting in increased mmu_lock
contention.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

ecc5589f

KVM: MSI to INTx translate · 5319c662

Sheng Yang authored Nov 24, 2008

Now we use MSI as default one, and translate MSI to INTx when guest need
INTx rather than MSI. For legacy device, we provide support for non-sharing
host IRQ.

Provide a parameter msi2intx for this method. The value is true by default in
x86 architecture.

We can't guarantee this mode can work on every device, but for most of us
tested, it works. If your device encounter some trouble with this mode, you can
try set msi2intx modules parameter to 0. If the device is OK with msi2intx=0,
then please report it to KVM mailing list or me. We may prepare a blacklist for
the device that can't work in this mode.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

5319c662

KVM: Enable MSI for device assignment · 6b9cc7fd

Sheng Yang authored Nov 24, 2008

We enable guest MSI and host MSI support in this patch. The userspace want to
enable MSI should set KVM_DEV_IRQ_ASSIGN_ENABLE_MSI in the assigned_irq's flag.
Function would return -ENOTTY if can't enable MSI, userspace shouldn't set MSI
Enable bit when KVM_ASSIGN_IRQ return -ENOTTY with
KVM_DEV_IRQ_ASSIGN_ENABLE_MSI.

Userspace can tell the support of MSI device from #ifdef KVM_CAP_DEVICE_MSI.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

6b9cc7fd

KVM: Add assigned_device_msi_dispatch() · f64769eb

Sheng Yang authored Nov 24, 2008

The function is used to dispatch MSI to lapic according to MSI message
address and message data.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

f64769eb

KVM: Export ioapic_get_delivery_bitmask · 68b76f51

Sheng Yang authored Nov 24, 2008

It would be used for MSI in device assignment, for MSI dispatch.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

68b76f51

KVM: Add fields for MSI device assignment · 0937c48d

Sheng Yang authored Nov 24, 2008

Prepared for kvm_arch_assigned_device_msi_dispatch().
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

0937c48d

KVM: Clean up assigned_device_update_irq · fbac7818

Sheng Yang authored Nov 24, 2008

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

fbac7818

KVM: Replace irq_requested with more generic irq_requested_type · 4f906c19

Sheng Yang authored Nov 24, 2008

Separate guest irq type and host irq type, for we can support guest using INTx
with host using MSI (but not opposite combination).
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

4f906c19

KVM: Separate update irq to a single function · 00e3ed39

Sheng Yang authored Nov 24, 2008

Separate INTx enabling part to a independence function, so that we can add MSI
enabling part easily.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

00e3ed39

KVM: Move ack notifier register and IRQ sourcd ID request · 342ffb93

Sheng Yang authored Nov 24, 2008

Distinguish common part for device assignment and INTx part, perparing for
refactor later.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

342ffb93

x86: KVM guest: sign kvmclock as paravirt · 423cd25a

Glauber Costa authored Nov 24, 2008

Currently, we only set the KVM paravirt signature in case
of CONFIG_KVM_GUEST. However, it is possible to have it turned
off, while CONFIG_KVM_CLOCK is turned on. This is also a paravirt
case, and should be shown accordingly.
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

423cd25a

KVM: VMX: Conditionally request interrupt window after injecting irq · df203ec9

Avi Kivity authored Nov 23, 2008

If we're injecting an interrupt, and another one is pending, request
an interrupt window notification so we don't have excess latency on the
second interrupt.

This shouldn't happen in practice since an EOI will be issued, giving a second
chance to request an interrupt window, but...
Signed-off-by: Avi Kivity <avi@redhat.com>

df203ec9

KVM: ia64: Clean up vmm_ivt.S using tab to indent every line · 8fe07367

Xiantao Zhang authored Nov 21, 2008

Using tab for indentation for vmm_ivt.S.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

8fe07367

KVM: ia64: Add handler for crashed vmm · 9f7d5bb5

Xiantao Zhang authored Nov 21, 2008

Since vmm runs in an isolated address space and it is just a copy
of host's kvm-intel module, so once vmm crashes, we just crash all guests
running on it instead of crashing whole kernel.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

9f7d5bb5

KVM: ia64: Add some debug points to provide crash infomation · 5e2be198

Xiantao Zhang authored Nov 21, 2008

Use printk infrastructure to print out some debug info once VM crashes.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

5e2be198

KVM: ia64: Define printk function for kvm-intel module · 7d637978

Xiantao Zhang authored Nov 21, 2008

kvm-intel module is relocated to an isolated address space
with kernel, so it can't call host kernel's printk for debug
purpose. In the module, we implement the printk to output debug
info of vmm.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

7d637978

x86: disable VMX on all CPUs on reboot · d176720d

Eduardo Habkost authored Nov 17, 2008

On emergency_restart, we may need to use an NMI to disable virtualization
on all CPUs. We do that using nmi_shootdown_cpus() if VMX is enabled.

Note: With this patch, we will run the NMI stuff only when the CPU where
emergency_restart() was called has VMX enabled. This should work on most
cases because KVM enables VMX on all CPUs, but we may miss the small
window where KVM is doing that. Also, I don't know if all code using
VMX out there always enable VMX on all CPUs like KVM does. We have two
other alternatives for that:

a) Have an API that all code that enables VMX on any CPU should use
to tell the kernel core that it is going to enable VMX on the CPUs.
b) Always call nmi_shootdown_cpus() if the CPU supports VMX. This is
a bit intrusive and more risky, as it would run nmi_shootdown_cpus()
on emergency_reboot() even on systems where virtualization is never
enabled.

Finding a proper point to hook the nmi_shootdown_cpus() call isn't
trivial, as the non-emergency machine_restart() (that doesn't need the
NMI tricks) uses machine_emergency_restart() directly.

The solution to make this work without adding a new function or argument
to machine_ops was setting a 'reboot_emergency' flag that tells if
native_machine_emergency_restart() needs to do the virt cleanup or not.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

d176720d

kdump: forcibly disable VMX and SVM on machine_crash_shutdown() · 2340b62f

Eduardo Habkost authored Nov 17, 2008

We need to disable virtualization extensions on all CPUs before booting
the kdump kernel, otherwise the kdump kernel booting will fail, and
rebooting after the kdump kernel did its task may also fail.

We do it using cpu_emergency_vmxoff() and cpu_emergency_svm_disable(),
that should always work, because those functions check if the CPUs
support SVM or VMX before doing their tasks.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2340b62f

x86: cpu_emergency_svm_disable() function · 0f3e9eeb

Eduardo Habkost authored Nov 17, 2008

This function can be used by the reboot or kdump code to forcibly
disable SVM on the CPU.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

0f3e9eeb

KVM: SVM: move svm_hardware_disable() code to asm/virtext.h · 2c8dceeb

Eduardo Habkost authored Nov 17, 2008

Create cpu_svm_disable() function.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2c8dceeb

KVM: SVM: move has_svm() code to asm/virtext.h · 63d1142f

Eduardo Habkost authored Nov 17, 2008

Use a trick to keep the printk()s on has_svm() working as before. gcc
will take care of not generating code for the 'msg' stuff when the
function is called with a NULL msg argument.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

63d1142f

x86: cpu_emergency_vmxoff() function · 6aa07a0d

Eduardo Habkost authored Nov 17, 2008

Add cpu_emergency_vmxoff() and its friends: cpu_vmx_enabled() and
__cpu_emergency_vmxoff().
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

6aa07a0d

KVM: VMX: extract kvm_cpu_vmxoff() from hardware_disable() · 710ff4a8

Eduardo Habkost authored Nov 17, 2008

Along with some comments on why it is different from the core cpu_vmxoff()
function.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

710ff4a8

x86: asm/virtext.h: add cpu_vmxoff() inline function · 1e993114

Eduardo Habkost authored Nov 17, 2008

Unfortunately we can't use exactly the same code from vmx
hardware_disable(), because the KVM function uses the
__kvm_handle_fault_on_reboot() tricks.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

1e993114

KVM: VMX: move cpu_has_kvm_support() to an inline on asm/virtext.h · 6210e37b

Eduardo Habkost authored Nov 17, 2008

It will be used by core code on kdump and reboot, to disable
vmx if needed.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

6210e37b

KVM: VMX: move ASM_VMX_* definitions from asm/kvm_host.h to asm/vmx.h · eca70fc5

Eduardo Habkost authored Nov 17, 2008

Those definitions will be used by code outside KVM, so move it outside
of a KVM-specific source file.

Those definitions are used only on kvm/vmx.c, that already includes
asm/vmx.h, so they can be moved safely.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

eca70fc5

KVM: SVM: move svm.h to include/asm · c2cedf7b

Eduardo Habkost authored Nov 17, 2008

svm.h will be used by core code that is independent of KVM, so I am
moving it outside the arch/x86/kvm directory.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

c2cedf7b

KVM: VMX: move vmx.h to include/asm · 13673a90

Eduardo Habkost authored Nov 17, 2008

vmx.h will be used by core code that is independent of KVM, so I am
moving it outside the arch/x86/kvm directory.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

13673a90

KVM: ppc: fix userspace mapping invalidation on context switch · fe4e771d

Hollis Blanchard authored Nov 10, 2008

We used to defer invalidating userspace TLB entries until jumping out of the
kernel. This was causing MMU weirdness most easily triggered by using a pipe in
the guest, e.g. "dmesg | tail". I believe the problem was that after the guest
kernel changed the PID (part of context switch), the old process's mappings
were still present, and so copy_to_user() on the "return to new process" path
ended up using stale mappings.

Testing with large pages (64K) exposed the problem, probably because with 4K
pages, pressure on the TLB faulted all process A's mappings out before the
guest kernel could insert any for process B.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

fe4e771d

KVM: ppc: use prefetchable mappings for guest memory · df9b856c

Hollis Blanchard authored Nov 10, 2008

Bare metal Linux on 440 can "overmap" RAM in the kernel linear map, so that it
can use large (256MB) mappings even if memory isn't a multiple of 256MB. To
prevent the hardware prefetcher from loading from an invalid physical address
through that mapping, it's marked Guarded.

However, KVM must ensure that all guest mappings are backed by real physical
RAM (since a deliberate access through a guarded mapping could still cause a
machine check). Accordingly, we don't need to make our mappings guarded, so
let's allow prefetching as the designers intended.

Curiously this patch didn't affect performance at all on the quick test I
tried, but it's clearly the right thing to do anyways and may improve other
workloads.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

df9b856c

KVM: ppc: use MMUCR accessor to obtain TID · bf5d4025

Hollis Blanchard authored Nov 10, 2008

We have an accessor; might as well use it.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

bf5d4025

KVM: Fix kernel allocated memory slot · e7cacd40

Sheng Yang authored Nov 11, 2008

Commit 7fd49de9773fdcb7b75e823b21c1c5dc1e218c14 "KVM: ensure that memslot
userspace addresses are page-aligned" broke kernel space allocated memory
slot, for the userspace_addr is invalid.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

e7cacd40

KVM: ia64: Remove some macro definitions in asm-offsets.c. · 30ed5bb6

Xiantao Zhang authored Oct 24, 2008

Use kernel's corresponding macro instead.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

30ed5bb6

KVM: ppc: fix Kconfig constraints · 74ef740d

Hollis Blanchard authored Nov 07, 2008

Make sure that CONFIG_KVM cannot be selected without processor support
(currently, 440 is the only processor implementation available).
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

74ef740d

KVM: ensure that memslot userspace addresses are page-aligned · 78749809

Hollis Blanchard authored Nov 07, 2008

Bad page translation and silent guest failure ensue if the userspace address is
not page-aligned.  I hit this problem using large (host) pages with qemu,
because qemu currently has a hardcoded 4096-byte alignment for guest memory
allocations.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

78749809

KVM: Fix cpuid iteration on multiple leaves per eac · 0fdf8e59

Nitin A Kamble authored Nov 05, 2008

The code to traverse the cpuid data array list for counting type of leaves is
currently broken.

This patches fixes the 2 things in it.

 1. Set the 1st counting entry's flag KVM_CPUID_FLAG_STATE_READ_NEXT. Without
    it the code will never find a valid entry.

 2. Also the stop condition in the for loop while looking for the next unflaged
    entry is broken. It needs to stop when it find one matching entry;
    and in the case of count of 1, it will be the same entry found in this
    iteration.
Signed-Off-By: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

0fdf8e59

KVM: Fix cpuid leaf 0xb loop termination · 0853d2c1

Nitin A Kamble authored Nov 05, 2008

For cpuid leaf 0xb the bits 8-15 in ECX register define the end of counting
leaf.      The previous code was using bits 0-7 for this purpose, which is
a bug.
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

0853d2c1

KVM: ppc: improve trap emulation · fcfdbd26

Hollis Blanchard authored Nov 05, 2008

set ESR[PTR] when emulating a guest trap. This allows Linux guests to
properly handle WARN_ON() (i.e. detect that it's a non-fatal trap).

Also remove debugging printk in trap emulation.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

fcfdbd26

KVM: ppc: optimize irq delivery path · d4cf3892

Hollis Blanchard authored Nov 05, 2008

In kvmppc_deliver_interrupt is just one case left in the switch and it is a
rare one (less than 8%) when looking at the exit numbers. Therefore we can
at least drop the switch/case and if an if. I inserted an unlikely too, but
that's open for discussion.

In kvmppc_can_deliver_interrupt all frequent cases are in the default case.
I know compilers are smart but we can make it easier for them. By writing
down all options and removing the default case combined with the fact that
ithe values are constants 0..15 should allow the compiler to write an easy
jump table.
Modifying kvmppc_can_deliver_interrupt pointed me to the fact that gcc seems
to be unable to reduce priority_exception[x] to a build time constant.
Therefore I changed the usage of the translation arrays in the interrupt
delivery path completely. It is now using priority without translation to irq
on the full irq delivery path.
To be able to do that ivpr regs are stored by their priority now.

Additionally the decision made in kvmppc_can_deliver_interrupt is already
sufficient to get the value of interrupt_msr_mask[x]. Therefore we can replace
the 16x4byte array used here with a single 4byte variable (might still be one
miss, but the chance to find this in cache should be better than the right
entry of the whole array).
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

d4cf3892