Commits · ec04b2604c3707a46db1d26d98f82b11d0844669 · nexedi / linux

10 Sep, 2009 40 commits

KVM: Prepare memslot data structures for multiple hugepage sizes · ec04b260

Joerg Roedel authored Jun 19, 2009

[avi: fix build on non-x86]
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

ec04b260

hugetlbfs: export vma_kernel_pagsize to modules · f340ca0f

Joerg Roedel authored Jun 19, 2009

This function is required by KVM.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

f340ca0f

KVM: s390: Fix memslot initialization for userspace_addr != 0 · 3eea8437

Christian Borntraeger authored Jun 23, 2009

Since
commit 854b5338196b1175706e99d63be43a4f8d8ab607
Author: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
    KVM: s390: streamline memslot handling

s390 uses the values of the memslot instead of doing everything in the arch
ioctl handler of the KVM_SET_USER_MEMORY_REGION. Unfortunately we missed to
set the userspace_addr of our memslot due to our s390 ifdef in
__kvm_set_memory_region.
Old s390 userspace launchers did not notice, since they started the guest at
userspace address 0.
Because of CONFIG_DEFAULT_MMAP_MIN_ADDR we now put the guest at 1M userspace,
which does not work. This patch makes sure that new.userspace_addr is set
on s390.
This fix should go in quickly. Nevertheless, looking at the code we should
clean up that ifdef in the long term. Any kernel janitors?
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

3eea8437

KVM: x86 emulator: Add sysexit emulation · 4668f050

Andre Przywara authored Jun 18, 2009

Handle #UD intercept of the sysexit instruction in 64bit mode returning to
32bit compat mode on an AMD host.
Setup the segment descriptors for CS and SS and the EIP/ESP registers
according to the manual.
Signed-off-by: Christoph Egger <christoph.egger@amd.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

4668f050

KVM: x86 emulator: Add sysenter emulation · 8c604352

Andre Przywara authored Jun 18, 2009

Handle #UD intercept of the sysenter instruction in 32bit compat mode on
an AMD host.
Setup the segment descriptors for CS and SS and the EIP/ESP registers
according to the manual.
Signed-off-by: Christoph Egger <christoph.egger@amd.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

8c604352

KVM: x86 emulator: add syscall emulation · e66bb2cc

Andre Przywara authored Jun 18, 2009

Handle #UD intercept of the syscall instruction in 32bit compat mode on
an Intel host.
Setup the segment descriptors for CS and SS and the EIP/ESP registers
according to the manual. Save the RIP and EFLAGS to the correct registers.

[avi: fix build on i386 due to missing R11]
Signed-off-by: Christoph Egger <christoph.egger@amd.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

e66bb2cc

KVM: x86 emulator: Prepare for emulation of syscall instructions · e99f0507

Andre Przywara authored Jun 17, 2009

Add the flags needed for syscall, sysenter and sysexit to the opcode table.
Catch (but for now ignore) the opcodes in the emulation switch/case.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Christoph Egger <christoph.egger@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

e99f0507

KVM: x86 emulator: Add missing EFLAGS bit definitions · b1d86143

Andre Przywara authored Jun 17, 2009

Signed-off-by: Christoph Egger <christoph.egger@amd.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

b1d86143

KVM: Allow emulation of syscalls instructions on #UD · 0cb5762e

Andre Przywara authored Jun 17, 2009

Add the opcodes for syscall, sysenter and sysexit to the list of instructions
handled by the undefined opcode handler.
Signed-off-by: Christoph Egger <christoph.egger@amd.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

0cb5762e

KVM: convert custom marker based tracing to event traces · 229456fc

Marcelo Tosatti authored Jun 17, 2009

This allows use of the powerful ftrace infrastructure.

See Documentation/trace/ for usage information.

[avi, stephen: various build fixes]
[sheng: fix control register breakage]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

229456fc

KVM: SVM: Improve nested interrupt injection · 219b65dc

Alexander Graf authored Jun 15, 2009

While trying to get Hyper-V running, I realized that the interrupt injection
mechanisms that are in place right now are not 100% correct.

This patch makes nested SVM's interrupt injection behave more like on a
real machine.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

219b65dc

KVM: SVM: Implement INVLPGA · ff092385

Alexander Graf authored Jun 15, 2009

SVM adds another way to do INVLPG by ASID which Hyper-V makes use of,
so let's implement it!

For now we just do the same thing invlpg does, as asid switching
means we flush the mmu anyways. That might change one day though.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

ff092385

KVM: Implement MSRs used by Hyper-V · 3c5d0a44

Alexander Graf authored Jun 15, 2009

Hyper-V uses some MSRs, some of which are actually reserved for BIOS usage.

But let's be nice today and have it its way, because otherwise it fails
terribly.

[jaswinder: fix build for linux-next changes]
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

3c5d0a44

x86: Add definition for IGNNE MSR · 0367b433

Alexander Graf authored Jun 15, 2009

Hyper-V accesses MSR_IGNNE while running under KVM.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

0367b433

KVM: SVM: Don't save/restore host cr2 · b3dbf89e

Avi Kivity authored Jun 16, 2009

The host never reads cr2 in process context, so are free to clobber it. The
vmx code does this, so we can safely remove the save/restore code.
Signed-off-by: Avi Kivity <avi@redhat.com>

b3dbf89e

KVM: VMX: Only reload guest cr2 if different from host cr2 · d3edefc0

Avi Kivity authored Jun 16, 2009

cr2 changes only rarely, and writing it is expensive.  Avoid the costly cr2
writes by checking if it does not already hold the desired value.

Shaves 70 cycles off the vmexit latency.
Signed-off-by: Avi Kivity <avi@redhat.com>

d3edefc0

KVM: Drop useless atomic test from timer function · 681405bf

Jan Kiszka authored Jun 09, 2009

The current code tries to optimize the setting of
KVM_REQ_PENDING_TIMER but used atomic_inc_and_test - which always
returns true unless pending had the invalid value of -1 on entry. This
patch drops the test part preserving the original semantic but
expressing it less confusingly.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

681405bf

KVM: Fix racy event propagation in timer · f7104db2

Jan Kiszka authored Jun 09, 2009

Minor issue that likely had no practical relevance: the kvm timer
function so far incremented the pending counter and then may reset it
again to 1 in case reinjection was disabled. This opened a small racy
window with the corresponding VCPU loop that may have happened to run
on another (real) CPU and already consumed the value.

Fix it by skipping the incrementation in case pending is already > 0.
This opens a different race windows, but may only rarely cause lost
events in case we do not care about them anyway (!reinject).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

f7104db2

KVM: Optimize searching for highest IRR · 33e4c686

Gleb Natapov authored Jun 11, 2009

Most of the time IRR is empty, so instead of scanning the whole IRR on
each VM entry keep a variable that tells us if IRR is not empty. IRR
will have to be scanned twice on each IRQ delivery, but this is much
more rare than VM entry.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

33e4c686

KVM: Replace pending exception by PF if it happens serially · 6edf14d8

Gleb Natapov authored Jun 11, 2009

Replace previous exception with a new one in a hope that instruction
re-execution will regenerate lost exception.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

6edf14d8

KVM: VMX: conditionally disable 2M pages · 54dee993

Marcelo Tosatti authored Jun 11, 2009

Disable usage of 2M pages if VMX_EPT_2MB_PAGE_BIT (bit 16) is clear
in MSR_IA32_VMX_EPT_VPID_CAP and EPT is enabled.

[avi: s/largepages_disabled/largepages_enabled/ to avoid negative logic]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

54dee993

KVM: VMX: EPT misconfiguration handler · 68f89400

Marcelo Tosatti authored Jun 11, 2009

Handler for EPT misconfiguration which checks for valid state
in the shadow pagetables, printing the spte on each level.

The separate WARN_ONs are useful for kerneloops.org.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

68f89400

KVM: MMU: add kvm_mmu_get_spte_hierarchy helper · 94d8b056

Marcelo Tosatti authored Jun 11, 2009

Required by EPT misconfiguration handler.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

94d8b056

KVM: MMU: make for_each_shadow_entry aware of largepages · 4d88954d

Marcelo Tosatti authored Jun 11, 2009

This way there is no need to add explicit checks in every
for_each_shadow_entry user.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

4d88954d

KVM: VMX: more MSR_IA32_VMX_EPT_VPID_CAP capability bits · e799794e

Marcelo Tosatti authored Jun 11, 2009

Required for EPT misconfiguration handler.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

e799794e

KVM: Move performance counter MSR access interception to generic x86 path · 71db6023

Andre Przywara authored Jun 12, 2009

The performance counter MSRs are different for AMD and Intel CPUs and they
are chosen mainly by the CPUID vendor string. This patch catches writes to
all addresses (regardless of VMX/SVM path) and handles them in the generic
MSR handler routine. Writing a 0 into the event select register is something
we perfectly emulate ;-), so don't print out a warning to dmesg in this
case.
This fixes booting a 64bit Windows guest with an AMD CPUID on an Intel host.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

71db6023

KVM: MMU audit: largepage handling · 2920d728

Marcelo Tosatti authored Jun 10, 2009

Make the audit code aware of largepages.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2920d728

KVM: MMU audit: audit_mappings tweaks · 2aaf65e8

Marcelo Tosatti authored Jun 10, 2009

- Fail early in case gfn_to_pfn returns is_error_pfn.
- For the pre pte write case, avoid spurious "gva is valid but spte is notrap"
  messages (the emulation code does the guest write first, so this particular
  case is OK).
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2aaf65e8

KVM: MMU audit: nontrapping ptes in nonleaf level · 48fc0317

Marcelo Tosatti authored Jun 10, 2009

It is valid to set non leaf sptes as notrap.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

48fc0317

KVM: MMU audit: update audit_write_protection · e58b0f9e

Marcelo Tosatti authored Jun 10, 2009

- Unsync pages contain writable sptes in the rmap.
- rmaps do not exclusively contain writable sptes anymore.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

e58b0f9e

KVM: MMU audit: update count_writable_mappings / count_rmaps · 08a3732b

Marcelo Tosatti authored Jun 10, 2009

Under testing, count_writable_mappings returns a value that is 2 integers
larger than what count_rmaps returns.

Suspicion is that either of the two functions is counting a duplicate (either
positively or negatively).

Modifying check_writable_mappings_rmap to check for rmap existance on
all present MMU pages fails to trigger an error, which should keep Avi
happy.

Also introduce mmu_spte_walk to invoke a callback on all present sptes visible
to the current vcpu, might be useful in the future.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

08a3732b

KVM: MMU: introduce is_last_spte helper · 776e6633

Marcelo Tosatti authored Jun 10, 2009

Hiding some of the last largepage / level interaction (which is useful
for gbpages and for zero based levels).

Also merge the PT_PAGE_TABLE_LEVEL clearing loop in unlink_children.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

776e6633

KVM: Return to userspace on emulation failure · 3f5d18a9

Avi Kivity authored Jun 11, 2009

Instead of mindlessly retrying to execute the instruction, report the
failure to userspace.
Signed-off-by: Avi Kivity <avi@redhat.com>

3f5d18a9

KVM: Use macro to iterate over vcpus. · 988a2cae

Gleb Natapov authored Jun 09, 2009

[christian: remove unused variables on s390]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

988a2cae

KVM: Break dependency between vcpu index in vcpus array and vcpu_id. · 73880c80

Gleb Natapov authored Jun 09, 2009

Archs are free to use vcpu_id as they see fit. For x86 it is used as
vcpu's apic id. New ioctl is added to configure boot vcpu id that was
assumed to be 0 till now.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

73880c80

KVM: Use pointer to vcpu instead of vcpu_id in timer code. · 1ed0ce00
Gleb Natapov authored Jun 09, 2009
```
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
```
1ed0ce00

KVM: Introduce kvm_vcpu_is_bsp() function. · c5af89b6

Gleb Natapov authored Jun 09, 2009

Use it instead of open code "vcpu_id zero is BSP" assumption.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

c5af89b6

KVM: MMU: s/shadow_pte/spte/ · d555c333

Avi Kivity authored Jun 10, 2009

We use shadow_pte and spte inconsistently, switch to the shorter spelling.

Rename set_shadow_pte() to __set_spte() to avoid a conflict with the
existing set_spte(), and to indicate its lowlevelness.
Signed-off-by: Avi Kivity <avi@redhat.com>

d555c333

KVM: MMU: Adjust pte accessors to explicitly indicate guest or shadow pte · 43a3795a

Avi Kivity authored Jun 10, 2009

Since the guest and host ptes can have wildly different format, adjust
the pte accessor names to indicate on which type of pte they operate on.

No functional changes.
Signed-off-by: Avi Kivity <avi@redhat.com>

43a3795a

KVM: MMU: Fix is_dirty_pte() · 439e218a

Avi Kivity authored Jun 10, 2009

is_dirty_pte() is used on guest ptes, not shadow ptes, so it needs to avoid
shadow_dirty_mask and use PT_DIRTY_MASK instead.

Misdetecting dirty pages could lead to unnecessarily setting the dirty bit
under EPT.
Signed-off-by: Avi Kivity <avi@redhat.com>

439e218a