Commits · 134291bf3cb434a9039298ba6b15ef33e65ba542 · nexedi / linux

12 Jul, 2011 40 commits

KVM: MMU: Clean up the error handling of walk_addr_generic() · 134291bf

Takuya Yoshikawa authored Jul 01, 2011

Avoid a two-step jump to the error-handling part.  This eliminates the use
of the variables present and rsvd_fault.

We also use the const type qualifier to show that write/user/fetch_fault
do not change in the function.

Both of these were suggested by Ingo Molnar.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
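
Below is a minimal sketch of the pattern described above. It is not the actual walk_addr_generic() code. The hypothetical check_pte() decodes the access type into const flags once, and every failure reaches a single error label in one jump. The PFERR_* values are the architectural x86 page-fault error-code bits. The PTE_* masks and the one-level "walk" are simplifications assumed for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page-fault error-code bits. */
#define PFERR_PRESENT_MASK (1u << 0)
#define PFERR_WRITE_MASK   (1u << 1)
#define PFERR_USER_MASK    (1u << 2)
#define PFERR_RSVD_MASK    (1u << 3)
#define PFERR_FETCH_MASK   (1u << 4)

/* Simplified PTE bits assumed for this sketch. */
#define PTE_PRESENT (1ull << 0)
#define PTE_WRITE   (1ull << 1)
#define PTE_USER    (1ull << 2)
#define PTE_NX      (1ull << 63)

/* Hypothetical helper: returns 0 on success, else a page-fault error code. */
uint32_t check_pte(uint64_t pte, uint64_t rsvd_mask, uint32_t access)
{
	/* Decoded once; const documents that they never change below. */
	const bool write_fault = access & PFERR_WRITE_MASK;
	const bool user_fault  = access & PFERR_USER_MASK;
	const bool fetch_fault = access & PFERR_FETCH_MASK;
	uint32_t errcode = 0;

	if (!(pte & PTE_PRESENT))
		goto error;               /* PFERR_PRESENT stays clear */

	errcode |= PFERR_PRESENT_MASK;

	if (pte & rsvd_mask) {
		errcode |= PFERR_RSVD_MASK;
		goto error;
	}
	if ((write_fault && !(pte & PTE_WRITE)) ||
	    (user_fault && !(pte & PTE_USER)) ||
	    (fetch_fault && (pte & PTE_NX)))
		goto error;

	return 0;

error:
	/* Single exit path: fold the access bits into the error code once. */
	errcode |= access & (PFERR_WRITE_MASK | PFERR_USER_MASK | PFERR_FETCH_MASK);
	return errcode;
}
```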

Revert "KVM: MMU: make kvm_mmu_reset_context() flush the guest TLB" · f8f7e5ee

Marcelo Tosatti authored Jun 21, 2011

This reverts commit bee931d31e588b8eb86b7edee32fac2d16930cd7.

TLB flush should be done lazily during guest entry, in
kvm_mmu_load().
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: PPC: e500: Don't search over the entire TLB0. · 1aee47a0

Scott Wood authored Jun 14, 2011

Only look in the 4 entries that could possibly contain the
entry we're looking for.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
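
The following sketches the set-associative lookup this commit relies on. Only the ways of one set are searched, not every TLB0 entry. The geometry (4 ways, 128 sets, 4 KiB pages) and the set-index function (low bits of the effective page number) are assumptions for illustration; the real e500 index function may differ. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB0_WAYS  4      /* assumed: 4-way set associative */
#define TLB0_SETS  128    /* assumed: 512 entries / 4 ways */
#define PAGE_SHIFT 12     /* 4 KiB pages */

struct tlb_entry {
	bool     valid;
	uint32_t pid;
	uint64_t epn;         /* effective page number: EA >> PAGE_SHIFT */
};

static struct tlb_entry tlb0[TLB0_SETS][TLB0_WAYS];

/* Assumed set index: low bits of the effective page number. */
static unsigned int tlb0_set(uint64_t ea)
{
	return (unsigned int)((ea >> PAGE_SHIFT) & (TLB0_SETS - 1));
}

/* Returns the way holding (ea, pid), or -1. Only one set is examined. */
int tlb0_lookup(uint64_t ea, uint32_t pid)
{
	unsigned int set = tlb0_set(ea);
	uint64_t epn = ea >> PAGE_SHIFT;

	for (int way = 0; way < TLB0_WAYS; way++) {
		const struct tlb_entry *e = &tlb0[set][way];

		if (e->valid && e->epn == epn && e->pid == pid)
			return way;
	}
	return -1;
}

/* Install an entry in a given way of the set that ea maps to. */
void tlb0_write(uint64_t ea, uint32_t pid, int way)
{
	struct tlb_entry *e = &tlb0[tlb0_set(ea)][way];

	e->valid = true;
	e->pid = pid;
	e->epn = ea >> PAGE_SHIFT;
}
```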

KVM: PPC: e500: Add shadow PID support · dd9ebf1f

Liu Yu authored Jun 14, 2011

Dynamically assign host PIDs to guest PIDs, splitting each guest PID into
multiple host (shadow) PIDs based on kernel/user and MSR[IS/DS].  Use
both PID0 and PID1 so that, for the current mode, we can select the shadow
PIDs corresponding to both guest TID = zero and guest TID = guest PID.

This allows us to significantly reduce the frequency of needing to
invalidate the entire TLB.  When the guest mode or PID changes, we just
update the host PID0/PID1.  And since the allocation of shadow PIDs is
global, multiple guests can share the TLB without conflict.

Note that KVM does not yet support the guest setting PID1 or PID2 to
a value other than zero.  This will need to be fixed for nested KVM
to work.  Until then, we enforce the requirement for guest PID1/PID2
to stay zero by failing the emulation if the guest tries to set them
to something else.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
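
Here is a minimal sketch of a global shadow-PID table. It assumes a hypothetical key of (owning guest, guest PID, MSR[PR], MSR[IS/DS]) and a pool of 255 host PIDs. shadow_pid_get() returns an existing mapping or allocates a free one. In this sketch, load_host_pids() puts the shadow PID for guest TID = guest PID in PID0 and the one for guest TID = 0 in PID1; that assignment is an assumption. An exhausted pool returns 0, and a caller would then invalidate the TLB and reset the table. That policy is also assumed, not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SHADOW_PIDS 255   /* assumed pool: host PIDs 1..255, 0 unused */

/* Hypothetical key for one guest translation context. */
struct pid_key {
	uint32_t owner;       /* which guest owns this context */
	uint32_t guest_pid;
	bool     user;        /* guest MSR[PR] */
	bool     as;          /* guest MSR[IS]/MSR[DS], folded into one flag */
};

struct shadow_slot {
	bool           used;
	struct pid_key key;
};

/* One global table, so different guests never share a host PID. */
static struct shadow_slot shadow_pids[NUM_SHADOW_PIDS + 1];

static bool key_eq(const struct pid_key *a, const struct pid_key *b)
{
	return a->owner == b->owner && a->guest_pid == b->guest_pid &&
	       a->user == b->user && a->as == b->as;
}

/* Look up or allocate a host PID; 0 means the pool is exhausted. */
unsigned int shadow_pid_get(const struct pid_key *key)
{
	unsigned int free_slot = 0;

	for (unsigned int i = 1; i <= NUM_SHADOW_PIDS; i++) {
		if (shadow_pids[i].used) {
			if (key_eq(&shadow_pids[i].key, key))
				return i;
		} else if (free_slot == 0) {
			free_slot = i;
		}
	}
	if (free_slot != 0) {
		shadow_pids[free_slot].used = true;
		shadow_pids[free_slot].key = *key;
	}
	return free_slot;
}

/* When the pool runs out: invalidate the host TLB (not modelled), start over. */
void shadow_pid_flush_all(void)
{
	for (unsigned int i = 0; i <= NUM_SHADOW_PIDS; i++)
		shadow_pids[i].used = false;
}

/* On a guest mode or PID change, only the host PID registers change. */
void load_host_pids(uint32_t owner, uint32_t guest_pid, bool user, bool as,
                    unsigned int *pid0, unsigned int *pid1)
{
	const struct pid_key k_pid  = { owner, guest_pid, user, as };
	const struct pid_key k_zero = { owner, 0, user, as };

	*pid0 = shadow_pid_get(&k_pid);   /* guest TID == guest PID */
	*pid1 = shadow_pid_get(&k_zero);  /* guest TID == 0 */
}
```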

KVM: PPC: e500: Stop keeping shadow TLB · 08b7fa92

Liu Yu authored Jun 14, 2011

Instead of a fully separate set of TLB entries, keep just the
pfn and dirty status.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: e500: enable magic page · a4cd8b23

Scott Wood authored Jun 14, 2011

This is a shared page used for paravirtualization.  It is always present
in the guest kernel's effective address space at the address indicated
by the hypercall that enables it.

The physical address specified by the hypercall is not used, as
e500 does not have real mode.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: e500: Support large page mappings of PFNMAP vmas. · 9973d54e

Scott Wood authored Jun 14, 2011

This allows large pages to be used on guest mappings backed by things like
/dev/mem, resulting in a significant speedup when guest memory
is mapped this way (it's useful for directly-assigned MMIO, too).

This is not a substitute for hugetlbfs integration, but is useful for
configurations where devices are directly assigned on chips without an
IOMMU -- in these cases, we need guest physical and true physical to
match, and be contiguous, so static reservation and mapping via /dev/mem
is the most straightforward way to set things up.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
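
The core decision for a PFNMAP mapping is how large a TLB1 page to use. The sketch below makes two assumptions:

- Page sizes are powers of 4, from 4 KiB up to 4 GiB.
- The guest region maps linearly onto contiguous host memory.

Under these assumptions, it picks the largest size for which the guest and host addresses share the same in-page offset and the whole page stays inside the region. pick_page_shift() and its parameters are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT      12   /* 4 KiB */
#define MAX_TSIZE_SHIFT 32   /* assumed upper bound: 4 GiB */

/*
 * Largest page size (as a shift) that can map gpa -> hpa without leaving
 * the contiguous guest-physical region [region_start, region_end).
 * Both checks only get stricter as the size grows, so stop at the first
 * failure.
 */
unsigned int pick_page_shift(uint64_t gpa, uint64_t hpa,
                             uint64_t region_start, uint64_t region_end)
{
	unsigned int best = PAGE_SHIFT;

	for (unsigned int shift = PAGE_SHIFT + 2; shift <= MAX_TSIZE_SHIFT; shift += 2) {
		uint64_t size = 1ull << shift;
		uint64_t mask = size - 1;
		uint64_t start = gpa & ~mask;   /* start of the page holding gpa */

		/* Guest and host must sit at the same offset within the page. */
		if ((gpa & mask) != (hpa & mask))
			break;
		/* The whole page must lie inside the contiguous region. */
		if (start < region_start || start + size > region_end)
			break;
		best = shift;
	}
	return best;
}
```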

KVM: PPC: e500: Eliminate shadow_pages[], and use pfns instead. · 59c1f4e3

Scott Wood authored Jun 14, 2011

This is in line with what other architectures do, and will allow us to
map things other than ordinary, unreserved kernel pages -- such as
dedicated devices, or large contiguous reserved regions.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: e500: don't use MAS0 as intermediate storage. · 0ef30995

Scott Wood authored Jun 14, 2011

This avoids races.  It also means that we use the shadow TLB way,
rather than the hardware hint -- if this is a problem, we could do
a tlbsx before inserting a TLB0 entry.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: e500: Disable preloading TLB1 in tlb_load(). · 6fc4d1eb

Scott Wood authored Jun 14, 2011

Since TLB1 loading doesn't check the shadow TLB before allocating another
entry, you can get duplicates.

Once shadow PIDs are enabled in a later patch, we won't need to
invalidate the TLB on every switch, so this optimization won't be
needed anyway.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: e500: Save/restore SPE state · 4cd35f67

Scott Wood authored Jun 14, 2011

This is done lazily.  The SPE save will be done only if the guest has
used SPE since the last preemption or heavyweight exit.  Restore will be
done only on demand, when enabling MSR_SPE in the shadow MSR, in response
to an SPE fault or mtmsr emulation.

For SPEFSCR, Linux already switches it on context switch (non-lazily), so
the only remaining bit is to save it between qemu and the guest.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
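
The following minimal sketch models the lazy scheme described above, using hypothetical names. Counters stand in for the real register save/restore. State is restored only when the guest first touches SPE (an SPE fault, or mtmsr emulation setting MSR[SPE]). It is saved on preemption or heavyweight exit only if it was loaded since the last save.

```c
#include <stdbool.h>

/* Hypothetical per-vCPU SPE bookkeeping. */
struct vcpu_spe {
	bool spe_loaded;          /* guest SPE registers are live in hardware */
	bool shadow_msr_spe;      /* MSR_SPE set in the real (shadow) MSR */
	unsigned int saves;       /* counters standing in for real work */
	unsigned int restores;
};

static void hw_save_spe(struct vcpu_spe *v)    { v->saves++; }
static void hw_restore_spe(struct vcpu_spe *v) { v->restores++; }

/* SPE-unavailable fault, or mtmsr emulation that sets guest MSR[SPE]. */
void spe_enable(struct vcpu_spe *v)
{
	if (!v->spe_loaded) {
		hw_restore_spe(v);        /* restore only on demand */
		v->spe_loaded = true;
	}
	v->shadow_msr_spe = true;
}

/* Preemption or heavyweight exit. */
void spe_put(struct vcpu_spe *v)
{
	if (v->spe_loaded) {          /* save only if used since last time */
		hw_save_spe(v);
		v->spe_loaded = false;
	}
	v->shadow_msr_spe = false;    /* the next SPE use faults again */
}
```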

KVM: PPC: booke: use shadow_msr · ecee273f

Scott Wood authored Jun 14, 2011

Keep the guest MSR and the guest-mode true MSR separate, rather than
modifying the guest MSR on each guest entry to produce a true MSR.

Any bits which should be modified based on guest MSR must be explicitly
propagated from vcpu->arch.shared->msr to vcpu->arch.shadow_msr in
kvmppc_set_msr().

While we're modifying the guest entry code, reorder a few instructions
to bury some load latencies.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

powerpc/e500: SPE register saving: take arbitrary struct offset · c51584d5

Scott Wood authored Jun 14, 2011

Previously, these macros hardcoded THREAD_EVR0 as the base of the save
area, relative to the base register passed.  This base offset is now
passed as a separate macro parameter, allowing reuse with other SPE
save areas, such as the one used by KVM.
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

powerpc/e500: Save SPEFSCR in flush_spe_to_thread() · 685659ee

yu liu authored Jun 14, 2011

giveup_spe() saves the SPE state that is protected by MSR[SPE].
However, modifying SPEFSCR does not trap when MSR[SPE]=0.
And since SPEFSCR is already saved/restored in _switch(),
not all callers want to save SPEFSCR again.
Thus, saving SPEFSCR does not belong in giveup_spe().

This patch moves SPEFSCR saving to flush_spe_to_thread(),
and cleans up the caller that needs to save SPEFSCR accordingly.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Resolve real-mode handlers through function exports · a22a2dac

Alexander Graf authored Jun 07, 2011

Up until now, Book3S KVM kept variables in the kernel that a kernel module
or the in-kernel KVM code could read to find out where some real-mode
helper functions are located.

This is all unnecessary. The high bits of the EA are ignored in real mode, so we
can just use the pointer as is. It is also much easier on relocations to
resolve a function's address the normal way instead of jumping
through hoops.

This patch fixes compilation with CONFIG_RELOCATABLE=y.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: fix partial application of "exit timing in ticks" · 24294b9a

Stuart Yoder authored May 17, 2011

When http://www.spinics.net/lists/kvm-ppc/msg02664.html
was applied to produce commit b51e7aa7ed6d8d134d02df78300ab0f91cfff4d2,
the removal of the conversion in add_exit_timing was left out.
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: MMU: make kvm_mmu_reset_context() flush the guest TLB · 45bd07b9

Avi Kivity authored Jun 12, 2011

kvm_set_cr0() and kvm_set_cr4(), and possibly other functions,
assume that kvm_mmu_reset_context() flushes the guest TLB.  However,
it does not.

Fix by flushing the TLB (and syncing the new root as well).
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Adjust shadow paging to work when SMEP=1 and CR0.WP=0 · 411c588d

Avi Kivity authored Jun 06, 2011

When CR0.WP=0, we sometimes map user pages as kernel pages (to allow
the kernel to write to them).  Unfortunately this also allows the kernel
to fetch from these pages, even if CR4.SMEP is set.

Adjust for this by also setting NX on the spte in these circumstances.
Signed-off-by: Avi Kivity <avi@redhat.com>
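
The following sketches the permission adjustment, using the architectural x86 PTE bits (R/W = bit 1, U/S = bit 2, NX = bit 63). user_page_spte() is a hypothetical helper. It only models a guest user page touched by the guest kernel; real shadow-paging code handles many more cases.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPTE_WRITABLE (1ull << 1)    /* R/W */
#define SPTE_USER     (1ull << 2)    /* U/S */
#define SPTE_NX       (1ull << 63)   /* execute disable */

/* Hypothetical: spte permissions for a guest *user* page. */
uint64_t user_page_spte(bool gpte_writable, bool gpte_nx, bool kernel_write,
                        bool cr0_wp, bool cr4_smep)
{
	uint64_t spte = SPTE_USER;

	if (gpte_writable)
		spte |= SPTE_WRITABLE;
	if (gpte_nx)
		spte |= SPTE_NX;

	if (kernel_write && !cr0_wp && !gpte_writable) {
		/* WP=0: the kernel may write a read-only user page, so map it
		 * as a writable kernel page (user code still cannot write). */
		spte |= SPTE_WRITABLE;
		spte &= ~SPTE_USER;
		/* As a kernel page, SMEP no longer blocks kernel fetches
		 * from it, so forbid execution instead. */
		if (cr4_smep)
			spte |= SPTE_NX;
	}
	return spte;
}
```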

KVM: Enable ERMS feature support for KVM · a01c8f9b

Yang, Wei authored Jun 14, 2011

This patch exposes the ERMS (Enhanced REP MOVSB/STOSB) feature to KVM
guests.

With ERMS, the fast-string implementation of REP MOVSB/STOSB attempts
to move as much of the data as possible using larger loads and stores.
Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Expose RDWRGSFS bit to KVM guests · 176f61da

Yang, Wei authored Jun 14, 2011

This patch exposes the RDWRGSFS bit to KVM guests.
Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Add RDWRGSFS support when setting CR4 · 74dc2b4f

Yang, Wei authored Jun 14, 2011

This patch adds RDWRGSFS support when setting CR4.
Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Remove RDWRGSFS bit from CR4_RESERVED_BITS · d9c3476d

Yang, Wei authored Jun 14, 2011

This patch removes the RDWRGSFS bit from CR4_RESERVED_BITS.
Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
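
Once the FSGSBASE bit (which these patches call RDWRGSFS) and SMEP are no longer in CR4_RESERVED_BITS, a CR4 write still has to be checked: the bit must be rejected unless the guest's CPUID advertises the feature. The simplified sketch below shows that check. The bit positions are the architectural ones. The shortened reserved mask, struct guest_cpuid and set_cr4() are assumptions for illustration. Returning 1 stands for injecting #GP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR4 bit positions. */
#define X86_CR4_PAE      (1ull << 5)
#define X86_CR4_FSGSBASE (1ull << 16)   /* called RDWRGSFS in these patches */
#define X86_CR4_SMEP     (1ull << 20)

/* Shortened for the sketch: the real mask also permits VME, PSE, PGE, ... */
#define CR4_RESERVED_BITS (~(X86_CR4_PAE | X86_CR4_FSGSBASE | X86_CR4_SMEP))

/* Hypothetical view of what the guest's CPUID advertises. */
struct guest_cpuid {
	bool fsgsbase;
	bool smep;
};

/* Returns 0 if the write is accepted, 1 if the guest should get #GP. */
int set_cr4(const struct guest_cpuid *cpuid, uint64_t *cr4, uint64_t val)
{
	if (val & CR4_RESERVED_BITS)
		return 1;
	/* No longer reserved, but only legal if CPUID says the feature exists. */
	if ((val & X86_CR4_FSGSBASE) && !cpuid->fsgsbase)
		return 1;
	if ((val & X86_CR4_SMEP) && !cpuid->smep)
		return 1;
	*cr4 = val;
	return 0;
}
```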

KVM: Enable DRNG feature support for KVM · 4a00efdf

Yang, Wei Y authored Jun 13, 2011

This patch exposes the DRNG feature to KVM guests.

The RDRAND instruction can provide software with sequences of
random numbers generated from white noise.
Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: fix XSAVE bit scanning (now properly) · 02668b06

Andre Przywara authored Jun 10, 2011

commit 123108f1c1aafd51d6a5c79cc04d7999dd88a930 tried to fix KVM's
XSAVE valid feature scanning, but it was wrong. It did not consider
the sparse nature of this bitfield, instead reading values from
uninitialized members of the entries array.
This patch now separates subleaf indices from KVM's array indices
and fills the entry before querying its value.
This fixes AVX support in KVM guests.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
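
The following sketches the corrected scanning loop. It assumes a hypothetical query_subleaf() that fakes CPUID(0xD, i) results so the example is checkable. The subleaf number i and the output index nent advance independently. As a result, holes in the XSAVE feature mask never leave uninitialized entries behind, and each entry is filled before its EAX value is inspected.

```c
#include <stdint.h>

struct cpuid_entry {
	uint32_t function, index;
	uint32_t eax, ebx, ecx, edx;
};

/* Stand-in for host CPUID(0xD, subleaf); the values are fake. */
static void query_subleaf(uint32_t subleaf, struct cpuid_entry *e)
{
	e->function = 0xd;
	e->index = subleaf;
	e->eax = 64u * subleaf;   /* pretend component size */
	e->ebx = e->ecx = e->edx = 0;
}

/* Fill entries[] for each XSAVE component bit set in supported. */
int fill_xsave_entries(uint64_t supported, struct cpuid_entry *entries, int max)
{
	int nent = 0;

	for (uint32_t i = 2; i < 64 && nent < max; i++) {
		if (!(supported & (1ull << i)))
			continue;                     /* hole in the sparse bitfield */
		query_subleaf(i, &entries[nent]);    /* fill before reading */
		if (entries[nent].eax == 0)
			continue;                     /* nothing to report */
		nent++;
	}
	return nent;
}
```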

KVM: Fix KVM_ASSIGN_SET_MSIX_ENTRY documentation · 58f0964e

Jan Kiszka authored Jun 11, 2011

The documented behavior did not match the implemented one (which also
never changed).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Fix off-by-one in overflow check of KVM_ASSIGN_SET_MSIX_NR · 9f3191ae

Jan Kiszka authored Jun 11, 2011

KVM_MAX_MSIX_PER_DEV implies that up to that many MSI-X entries can be
requested. However, the kernel so far rejected a request for exactly that many.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
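
The fix comes down to the comparison operator. Below is a sketch with a hypothetical set_msix_nr() helper. The limit value of 256 is assumed here for illustration.

```c
#include <errno.h>
#include <stdint.h>

#define KVM_MAX_MSIX_PER_DEV 256   /* value assumed for the sketch */

/* Hypothetical check: accept up to and including the documented maximum. */
int set_msix_nr(uint32_t entry_nr)
{
	/* The off-by-one form, entry_nr >= KVM_MAX_MSIX_PER_DEV,
	 * would reject the maximum itself. */
	if (entry_nr > KVM_MAX_MSIX_PER_DEV)
		return -EINVAL;
	return 0;
}
```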

KVM: Add compat ioctl for KVM_SET_SIGNAL_MASK · 1dda606c

Alexander Graf authored Jun 08, 2011

KVM has an ioctl to define which signal mask should be used while running
inside VCPU_RUN. At least for big endian systems, this mask is different
on 32-bit and 64-bit systems (though the size is identical).

Add a compat wrapper that converts the mask to whatever the kernel accepts,
allowing 32-bit kvm user space to set signal masks.

This patch fixes qemu with --enable-io-thread on ppc64 hosts when running
32-bit user land.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
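
The following sketches the conversion the compat wrapper has to perform. Word 0 of a 32-bit sigset always holds signals 1 to 32, and word 1 holds signals 33 to 64. Combining them by value therefore gives the right 64-bit mask on either byte order. Copying the 8 bytes verbatim would instead swap the halves on a big-endian host. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* 32-bit user space: 64 signal bits as two 32-bit words. */
struct compat_sigset { uint32_t sig[2]; };
/* 64-bit kernel: one 64-bit word. */
struct native_sigset { uint64_t sig[1]; };

/* Convert by value, so the result is independent of host byte order. */
void sigset_from_compat_sketch(struct native_sigset *set,
                               const struct compat_sigset *compat)
{
	set->sig[0] = (uint64_t)compat->sig[0] |
	              ((uint64_t)compat->sig[1] << 32);
}
```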

KVM: Clarify KVM_ASSIGN_PCI_DEVICE documentation · 91e3d71d

Jan Kiszka authored Jun 03, 2011

Neither host_irq nor the guest_msi struct is used anymore.
Tag the former, drop the latter to avoid confusion.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Add instruction fetch checking when walking guest page table · e57d4a35

Yang, Wei Y authored Jun 03, 2011

This patch adds instruction fetch checking when walking the guest page table,
to implement SMEP when emulating instead of executing natively.
Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Shan, Haitao <haitao.shan@intel.com>
Signed-off-by: Li, Xin <xin.li@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
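
The following simplified sketch shows the extra check for instruction fetches. It assumes pte_user is the effective U/S permission of the whole walk, meaning the page counts as user only if every level allows user access. fetch_faults() is hypothetical and ignores write and CR0.WP checks.

```c
#include <stdbool.h>

/* Hypothetical: does an instruction fetch through this translation fault? */
bool fetch_faults(bool pte_user, bool pte_nx, bool user_access,
                  bool efer_nx, bool cr4_smep)
{
	if (efer_nx && pte_nx)
		return true;              /* no-execute page */
	if (!user_access && pte_user && cr4_smep)
		return true;              /* SMEP: supervisor fetch from a user page */
	if (user_access && !pte_user)
		return true;              /* user code may not use a supervisor page */
	return false;
}
```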

KVM: Mask function7 ebx against host capability word9 · 611c120f

Yang, Wei Y authored Jun 03, 2011

This patch masks CPUID leaf 7 ebx against host capability word9.
Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Shan, Haitao <haitao.shan@intel.com>
Signed-off-by: Li, Xin <xin.li@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
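
The following sketches the masking: KVM reports a leaf-7 EBX bit to the guest only if both KVM and the host CPU support it. Here, "word 9" is the kernel's internal capability word for CPUID.(EAX=7,ECX=0):EBX. The bit positions for FSGSBASE (the RDWRGSFS bit above), SMEP and ERMS are the architectural ones. The supported mask is a hypothetical subset.

```c
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EBX feature bits. */
#define F_FSGSBASE (1u << 0)   /* RDWRGSFS in these patches */
#define F_SMEP     (1u << 7)
#define F_ERMS     (1u << 9)

/* Features this sketch assumes KVM can virtualize. */
#define KVM_LEAF7_EBX_SUPPORTED (F_FSGSBASE | F_SMEP | F_ERMS)

/* Report only what both KVM and the host CPU support. */
uint32_t guest_leaf7_ebx(uint32_t host_ebx)
{
	return host_ebx & KVM_LEAF7_EBX_SUPPORTED;
}
```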

KVM: Add SMEP support when setting CR4 · c68b734f

Yang, Wei Y authored Jun 03, 2011

This patch adds SMEP handling when setting CR4.
Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Shan, Haitao <haitao.shan@intel.com>
Signed-off-by: Li, Xin <xin.li@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Remove SMEP bit from CR4_RESERVED_BITS · 8d9c975f

Yang, Wei Y authored Jun 03, 2011

This patch removes the SMEP bit from CR4_RESERVED_BITS.
Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Shan, Haitao <haitao.shan@intel.com>
Signed-off-by: Li, Xin <xin.li@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: nVMX: Fix bug preventing more than two levels of nesting · 509c75ea

Nadav Har'El authored Jun 02, 2011

The nested VMX feature is supposed to fully emulate VMX for the guest. This
(theoretically) allows it not only to run its own guests, but also to
further emulate VMX for its own guests, allowing arbitrarily deep nesting.

This patch fixes a bug (discovered by Kevin Tian) in handling a VMLAUNCH
by L2, which prevented deeper nesting.

Deeper nesting now works (I only actually tested L3), but is currently
*absurdly* slow, to the point of being unusable.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Fixup documentation section numbering · 7f4382e8

Jan Kiszka authored Jun 02, 2011

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: fold decode_cache into x86_emulate_ctxt · 9dac77fa

Avi Kivity authored Jun 01, 2011

This saves a lot of pointless casts between x86_emulate_ctxt and decode_cache.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: rename decode_cache::eip to _eip · 36dd9bb5

Avi Kivity authored Jun 01, 2011

The name eip conflicts with a field of the same name in x86_emulate_ctxt,
which we plan to fold decode_cache into.

The name _eip is unfortunate, but what's really needed is a refactoring
here, not a better name.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: VMX: Silence warning on 32-bit hosts · 2e4ce7f5

Jan Kiszka authored Jun 01, 2011

The variable a is now unused on CONFIG_X86_32.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: Use opcode::execute for CLI/STI(FA/FB) · f411e6cd

Takuya Yoshikawa authored May 29, 2011

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: Use opcode::execute for LOOP/JCXZ · d06e03ad

Takuya Yoshikawa authored May 29, 2011

  LOOP/LOOPcc      : E0-E2
  JCXZ/JECXZ/JRCXZ : E3
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
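
The opcodes listed above share one decision: whether the branch is taken. The following sketch shows that logic. It assumes a 64-bit address size, so the count register is RCX; loop_jcxz_taken() is hypothetical. LOOP and LOOPcc decrement the count before testing it, while JCXZ/JECXZ/JRCXZ only test it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: is the branch taken? Updates *count for LOOP/LOOPcc. */
bool loop_jcxz_taken(uint8_t opcode, uint64_t *count, bool zf)
{
	switch (opcode) {
	case 0xe0:                        /* LOOPNE/LOOPNZ */
		return --*count != 0 && !zf;
	case 0xe1:                        /* LOOPE/LOOPZ */
		return --*count != 0 && zf;
	case 0xe2:                        /* LOOP */
		return --*count != 0;
	case 0xe3:                        /* JCXZ/JECXZ/JRCXZ: no decrement */
		return *count == 0;
	default:
		return false;
	}
}
```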

KVM: x86 emulator: Clean up INT n/INTO/INT 3(CC/CD/CE) · 5c5df76b

Takuya Yoshikawa authored May 29, 2011

Call emulate_int() directly to avoid spaghetti gotos.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
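
The following sketches the vector selection that lets each of the three opcodes call emulate_int() directly. int_vector() is hypothetical. For brevity, it ignores the fact that INTO is invalid in 64-bit mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Vector to deliver for CC/CD/CE, or -1 if nothing is raised. */
int int_vector(uint8_t opcode, uint8_t imm8, bool of)
{
	switch (opcode) {
	case 0xcc:                 /* INT 3 */
		return 3;
	case 0xcd:                 /* INT n: vector is the immediate byte */
		return imm8;
	case 0xce:                 /* INTO: #OF (vector 4) only if EFLAGS.OF */
		return of ? 4 : -1;
	default:
		return -1;
	}
}
```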
