Commits · e730b63cc083852551b092e1c93f0ef22c25f220 · Kirill Smelkov / linux

12 Jan, 2011 40 commits

KVM: MMU: don't mark spte notrap if reserved bit set · e730b63c

Xiao Guangrong authored Nov 17, 2010

If reserved bit is set, we need inject the #PF with PFEC.RSVD=1,
but shadow_notrap_nonpresent_pte injects #PF with PFEC.RSVD=0 only
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e730b63c

KVM: Document device assigment API · 49f48172

Jan Kiszka authored Nov 16, 2010

Adds API documentation for KVM_[DE]ASSIGN_PCI_DEVICE,
KVM_[DE]ASSIGN_DEV_IRQ, KVM_SET_GSI_ROUTING, KVM_ASSIGN_SET_MSIX_NR, and
KVM_ASSIGN_SET_MSIX_ENTRY.
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

49f48172

KVM: Clean up kvm_vm_ioctl_assigned_device · 51de271d

Jan Kiszka authored Nov 16, 2010

Any arch not supporting device assigment will also not build
assigned-dev.c. So testing for KVM_CAP_DEVICE_DEASSIGNMENT is pointless.
KVM_CAP_ASSIGN_DEV_IRQ is unconditinally set. Moreover, add a default
case for dispatching the ioctl.
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

51de271d

KVM: Save/restore state of assigned PCI device · ed78661f

Jan Kiszka authored Nov 16, 2010

The guest may change states that pci_reset_function does not touch. So
we better save/restore the assigned device across guest usage.
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

ed78661f

KVM: Refactor IRQ names of assigned devices · 1e001d49

Jan Kiszka authored Nov 16, 2010

Cosmetic change, but it helps to correlate IRQs with PCI devices.
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

1e001d49

KVM: Switch assigned device IRQ forwarding to threaded handler · 0645211c

Jan Kiszka authored Nov 16, 2010

This improves the IRQ forwarding for assigned devices: By using the
kernel's threaded IRQ scheme, we can get rid of the latency-prone work
queue and simplify the code in the same run.

Moreover, we no longer have to hold assigned_dev_lock while raising the
guest IRQ, which can be a lenghty operation as we may have to iterate
over all VCPUs. The lock is now only used for synchronizing masking vs.
unmasking of INTx-type IRQs, thus is renames to intx_lock.
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0645211c

KVM: Clear assigned guest IRQ on release · 0c106b5a

Jan Kiszka authored Nov 16, 2010

When we deassign a guest IRQ, clear the potentially asserted guest line.
There might be no chance for the guest to do this, specifically if we
switch from INTx to MSI mode.
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0c106b5a

KVM: Mask KVM_GET_SUPPORTED_CPUID data with Linux cpuid info · 945ee35e

Avi Kivity authored Nov 09, 2010

This allows Linux to mask cpuid bits if, for example, nx is enabled on only
some cpus.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

945ee35e

KVM: SVM: Replace svm_has() by standard Linux cpuid accessors · 2a6b20b8

Avi Kivity authored Nov 09, 2010

Instead of querying cpuid directly, use the Linux accessors (boot_cpu_has,
etc.).  This allows the things like the clearcpuid kernel command line to
work (when it's fixed wrt scattered cpuid bits).
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

2a6b20b8

KVM: MMU: fix apf prefault if nested guest is enabled · c4806acd

Xiao Guangrong authored Nov 12, 2010

If apf is generated in L2 guest and is completed in L1 guest, it will
prefault this apf in L1 guest's mmu context.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

c4806acd

KVM: MMU: support apf for nonpaing guest · 060c2abe

Xiao Guangrong authored Nov 12, 2010

Let's support apf for nonpaing guest
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

060c2abe

KVM: MMU: clear apfs if page state is changed · e5f3f027

Xiao Guangrong authored Nov 12, 2010

If CR0.PG is changed, the page fault cann't be avoid when the prefault address
is accessed later

And it also fix a bug: it can retry a page enabled #PF in page disabled context
if mmu is shadow page

This idear is from Gleb Natapov
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

e5f3f027

KVM: MMU: fix missing post sync audit · 5054c0de

Xiao Guangrong authored Nov 12, 2010

Add AUDIT_POST_SYNC audit for long mode shadow page
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

5054c0de

KVM: Clean up vm creation and release · d89f5eff

Jan Kiszka authored Nov 09, 2010

IA64 support forces us to abstract the allocation of the kvm structure.
But instead of mixing this up with arch-specific initialization and
doing the same on destruction, split both steps. This allows to move
generic destruction calls into generic code.

It also fixes error clean-up on failures of kvm_create_vm for IA64.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

d89f5eff

KVM: x86: Makefile clean up · 9d893c6b

Tracey Dent authored Nov 06, 2010

Changed makefile to use the ccflags-y option instead of EXTRA_CFLAGS.
Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

9d893c6b

KVM: remove unused function declaration · 2a126faa

Xiao Guangrong authored Nov 04, 2010

Remove the declaration of kvm_mmu_set_base_ptes()
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2a126faa

KVM: Refactor srcu struct release on early errors · 57e7fbee

Jan Kiszka authored Nov 09, 2010

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

57e7fbee

KVM: VMX: Disallow NMI while blocked by STI · 30bd0c4c

Avi Kivity authored Nov 01, 2010

While not mandated by the spec, Linux relies on NMI being blocked by an
IF-enabling STI.  VMX also refuses to enter a guest in this state, at
least on some implementations.

Disallow NMI while blocked by STI by checking for the condition, and
requesting an interrupt window exit if it occurs.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

30bd0c4c

KVM: fix the race while wakeup all pv guest · 64f638c7

Xiao Guangrong authored Nov 01, 2010

In kvm_async_pf_wakeup_all(), we add a dummy apf to vcpu->async_pf.done
without holding vcpu->async_pf.lock, it will break if we are handling apfs
at this time.

Also use 'list_empty_careful()' instead of 'list_empty()'
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

64f638c7

KVM: handle more completed apfs if possible · 15096ffc

Xiao Guangrong authored Nov 02, 2010

If it's no need to inject async #PF to PV guest we can handle
more completed apfs at one time, so we can retry guest #PF
as early as possible
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

15096ffc

KVM: avoid unnecessary wait for a async pf · e6d53e3b

Xiao Guangrong authored Nov 01, 2010

In current code, it checks async pf completion out of the wait context,
like this:

if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
		    !vcpu->arch.apf.halted)
			r = vcpu_enter_guest(vcpu);
		else {
			......
			kvm_vcpu_block(vcpu)
			 ^- waiting until 'async_pf.done' is not empty
}

kvm_check_async_pf_completion(vcpu)
 ^- delete list from async_pf.done

So, if we check aysnc pf completion first, it can be blocked at
kvm_vcpu_block

Fixed by mark the vcpu is unhalted in kvm_check_async_pf_completion()
path
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e6d53e3b

KVM: fix searching async gfn in kvm_async_pf_gfn_slot · c7d28c24

Xiao Guangrong authored Nov 01, 2010

Don't search later slots if the slot is empty
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

c7d28c24

KVM: cleanup async_pf tracepoints · 0730388b

Xiao Guangrong authored Nov 01, 2010

Use 'DECLARE_EVENT_CLASS' to cleanup async_pf tracepoints
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0730388b

KVM: fix tracing kvm_try_async_get_page · c9b263d2

Xiao Guangrong authored Nov 01, 2010

Tracing 'async' and *pfn is useless, since 'async' is always true,
and '*pfn' is always "fault_pfn'

We can trace 'gva' and 'gfn' instead, it can help us to see the
life-cycle of an async_pf
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

c9b263d2

KVM: replace vmalloc and memset with vzalloc · 26535037

Takuya Yoshikawa authored Nov 02, 2010

Let's use newly introduced vzalloc().
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

26535037

KVM: handle exit due to INVD in VMX · ec25d5e6

Gleb Natapov authored Nov 01, 2010

Currently the exit is unhandled, so guest halts with error if it tries
to execute INVD instruction. Call into emulator when INVD instruction
is executed by a guest instead. This instruction is not needed by ordinary
guests, but firmware (like OpenBIOS) use it and fail.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

ec25d5e6

KVM: x86: Avoid issuing wbinvd twice · 2eec7343

Jan Kiszka authored Nov 01, 2010

Micro optimization to avoid calling wbinvd twice on the CPU that has to
emulate it. As we might be preempted between smp_call_function_many and
the local wbinvd, the cache might be filled again so that real work
could be done uselessly.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

2eec7343

KVM: get rid of warning within kvm_dev_ioctl_create_vm · aac87636

Heiko Carstens authored Oct 27, 2010

Fixes this:

  CC      arch/s390/kvm/../../../virt/kvm/kvm_main.o
arch/s390/kvm/../../../virt/kvm/kvm_main.c: In function 'kvm_dev_ioctl_create_vm':
arch/s390/kvm/../../../virt/kvm/kvm_main.c:1828:10: warning: unused variable 'r'
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

aac87636

KVM: add cast within kvm_clear_guest_page to fix warning · 3bcc8a8c

Heiko Carstens authored Oct 27, 2010

Fixes this:

CC arch/s390/kvm/../../../virt/kvm/kvm_main.o
arch/s390/kvm/../../../virt/kvm/kvm_main.c: In function 'kvm_clear_guest_page':
arch/s390/kvm/../../../virt/kvm/kvm_main.c:1224:2: warning: passing argument 3 of 'kvm_write_guest_page' makes pointer from integer without a cast
arch/s390/kvm/../../../virt/kvm/kvm_main.c:1185:5: note: expected 'const void *' but argument is of type 'long unsigned int'
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

3bcc8a8c

KVM: use kmalloc() for small dirty bitmaps · 6f9e5c17

Takuya Yoshikawa authored Nov 01, 2010

Currently we are using vmalloc() for all dirty bitmaps even if
they are small enough, say less than K bytes.

We use kmalloc() if dirty bitmap size is less than or equal to
PAGE_SIZE so that we can avoid vmalloc area usage for VGA.

This will also make the logging start/stop faster.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

6f9e5c17

KVM: pre-allocate one more dirty bitmap to avoid vmalloc() · 515a0127

Takuya Yoshikawa authored Oct 27, 2010

Currently x86's kvm_vm_ioctl_get_dirty_log() needs to allocate a bitmap by
vmalloc() which will be used in the next logging and this has been causing
bad effect to VGA and live-migration: vmalloc() consumes extra systime,
triggers tlb flush, etc.

This patch resolves this issue by pre-allocating one more bitmap and switching
between two bitmaps during dirty logging.

Performance improvement:
  I measured performance for the case of VGA update by trace-cmd.
  The result was 1.5 times faster than the original one.

  In the case of live migration, the improvement ratio depends on the workload
  and the guest memory size. In general, the larger the memory size is the more
  benefits we get.

Note:
  This does not change other architectures's logic but the allocation size
  becomes twice. This will increase the actual memory consumption only when
  the new size changes the number of pages allocated by vmalloc().
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

515a0127

KVM: introduce wrapper functions for creating/destroying dirty bitmaps · a36a57b1

Takuya Yoshikawa authored Oct 27, 2010

This makes it easy to change the way of allocating/freeing dirty bitmaps.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

a36a57b1

KVM: x86: trace "exit to userspace" event · 64be5007

Gleb Natapov authored Oct 24, 2010

Add tracepoint for userspace exit.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

64be5007

KVM: propagate fault r/w information to gup(), allow read-only memory · 612819c3

Marcelo Tosatti authored Oct 22, 2010

As suggested by Andrea, pass r/w error code to gup(), upgrading read fault
to writable if host pte allows it.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

612819c3

KVM: MMU: flush TLBs on writable -> read-only spte overwrite · 7905d9a5

Marcelo Tosatti authored Oct 22, 2010

This can happen in the following scenario:

vcpu0			vcpu1
read fault
gup(.write=0)
			gup(.write=1)
			reuse swap cache, no COW
			set writable spte
			use writable spte
set read-only spte
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

7905d9a5

KVM: MMU: remove kvm_mmu_set_base_ptes · 982c2565

Marcelo Tosatti authored Oct 22, 2010

Unused.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

982c2565

KVM: VMX: remove setting of shadow_base_ptes for EPT · ff1fcb9e

Marcelo Tosatti authored Oct 22, 2010

The EPT present/writable bits use the same position as normal
pagetable bits.

Since direct_map passes ACC_ALL to mmu_set_spte, thus always setting
the writable bit on sptes, use the generic PT_PRESENT shadow_base_pte.

Also pass present/writable error code information from EPT violation
to generic pagefault handler.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

ff1fcb9e

KVM: Avoid double interrupt injection with vapic · 83bcacb1

Avi Kivity authored Oct 25, 2010

After an interrupt injection, the PPR changes, and we have to reflect that
into the vapic. This causes a KVM_REQ_EVENT to be set, which causes the
whole interrupt injection routine to be run again (harmlessly).

Optimize by only setting KVM_REQ_EVENT if the ppr was lowered; otherwise
there is no chance that a new injection is needed.
Signed-off-by: Avi Kivity <avi@redhat.com>

83bcacb1

KVM: SVM: Fold save_host_msrs() and load_host_msrs() into their callers · 82ca2d10

Avi Kivity authored Oct 21, 2010

This abstraction only serves to obfuscate.  Remove.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

82ca2d10

KVM: SVM: Move fs/gs/ldt save/restore to heavyweight exit path · dacccfdd

Avi Kivity authored Oct 21, 2010

ldt is never used in the kernel context; same goes for fs (x86_64) and gs
(i386).  So save/restore them in the heavyweight exit path instead
of the lightweight path.

By itself, this doesn't buy us much, but it paves the way for moving vmload
and vmsave to the heavyweight exit path, since they modify the same registers.

[jan: fix copy/pase mistake on i386]
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

dacccfdd