- 23 Nov, 2014 2 commits
-
-
Paolo Bonzini authored
This feature is not supported inside KVM guests yet, because we do not emulate MSR_IA32_XSS. Mask it out. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Radim Krčmář authored
Now that ia64 is gone, we can hide deprecated device assignment in x86.

Notable changes:
 - kvm_vm_ioctl_assigned_device() was moved to x86/kvm_arch_vm_ioctl()

The easy parts were removed from generic kvm code; remaining:
 - kvm_iommu_(un)map_pages() would require new code to be moved
 - struct kvm_assigned_dev_kernel depends on struct kvm_irq_ack_notifier

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 21 Nov, 2014 3 commits
-
-
Radim Krčmář authored
KVM ia64 is no longer present, so new applications shouldn't use its ioctls. The main problem is that they most likely didn't work even before, because of a conflict in the #defines:

  #define KVM_SET_GUEST_DEBUG       _IOW(KVMIO, 0x9b, struct kvm_guest_debug)
  #define KVM_IA64_VCPU_SET_STACK   _IOW(KVMIO, 0x9b, void *)

The argument to KVM_SET_GUEST_DEBUG is:

  struct kvm_guest_debug {
          __u32 control;
          __u32 pad;
          struct kvm_guest_debug_arch arch;
  };

  struct kvm_guest_debug_arch {
  };

meaning that sizeof(struct kvm_guest_debug) == sizeof(void *) == 8 and KVM_SET_GUEST_DEBUG == KVM_IA64_VCPU_SET_STACK. KVM_SET_GUEST_DEBUG is handled in virt/kvm/kvm_main.c before even calling kvm_arch_vcpu_ioctl (which would have handled KVM_IA64_VCPU_SET_STACK), so KVM_IA64_VCPU_SET_STACK would just return -EINVAL.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
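The clash can be shown directly from userspace; the following is illustrative only, not part of the patch, and assumes a 64-bit build where sizeof(void *) == 8:

  /* _IOW() encodes sizeof(type), so the 8-byte struct kvm_guest_debug and
   * "void *" produced the same ioctl number. */
  #include <stdio.h>
  #include <linux/kvm.h>

  int main(void)
  {
      /* KVM_IA64_VCPU_SET_STACK was _IOW(KVMIO, 0x9b, void *), i.e. the same
       * value as printed below; kvm_main.c consumed it as KVM_SET_GUEST_DEBUG
       * before the ia64 code ever saw it. */
      printf("KVM_SET_GUEST_DEBUG = %#lx\n",
             (unsigned long)KVM_SET_GUEST_DEBUG);
      return 0;
  }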
-
Radim Krcmar authored
Signed-off-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
ia64 does not need them anymore. Ack notifiers become x86-specific too. Suggested-by: Gleb Natapov <gleb@kernel.org> Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 20 Nov, 2014 2 commits
-
-
Tiejun Chen authored
kvm/ia64 is gone, clean up Documentation too. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
KVM for ia64 has been marked as broken not just once, but twice even, and the last patch from the maintainer is now roughly 5 years old. Time for it to rest in peace. Acked-by: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 19 Nov, 2014 9 commits
-
-
Nicholas Krause authored
Remove FIXME comments about needing fault addresses to be returned. These are propagated from walk_addr_generic to gva_to_gpa and from there to ops->read_std and ops->write_std. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
The check against the higher limit of the segment and the check on the maximum accessible size are the same for both expand-up and expand-down segments. Only the computation of "lim" varies. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
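For reference, a minimal standalone sketch of the shared check (invented names; not the kernel's __linearize):

  #include <stdbool.h>
  #include <stdint.h>

  /* Illustrative only: for expand-down data segments the descriptor limit is
   * a lower bound and the upper bound is 64K or 4G (D/B bit); for expand-up
   * segments the descriptor limit is the upper bound.  The final "does the
   * whole access fit" comparison is identical in both cases. */
  static bool seg_access_ok(uint32_t ea, unsigned int size,
                            uint32_t desc_limit, bool expand_down, bool big)
  {
      uint32_t lim = desc_limit;

      if (expand_down) {
          if (ea <= desc_limit)            /* inside the hole: fault */
              return false;
          lim = big ? 0xffffffffu : 0xffffu;
      }

      /* common check, written to avoid overflow of ea + size - 1 */
      return ea <= lim && (uint32_t)(size - 1) <= lim - ea;
  }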
-
Paolo Bonzini authored
register_address has been a duplicate of address_mask ever since the ancestor of __linearize was born in 90de84f5 (KVM: x86 emulator: preserve an operand's segment identity, 2010-11-17). However, we can put it to a better use by including the call to reg_read in register_address. Similarly, the call to reg_rmw can be moved to register_address_increment. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
In __linearize there is a check of whether masking of the linear address is needed. It occurs immediately after a switch that evaluates the same condition. Merge them. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
When SS is used with a non-canonical address, an #SS exception is generated on real hardware, but the KVM emulator raises #GP instead. Fix it to behave like a real x86 CPU. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
If a branch (e.g., jmp, ret) causes a limit violation because the target IP > limit, the #GP exception occurs before the branch. In other words, the RIP pushed on the stack should be that of the branch and not that of the target. To do so, we can call __linearize with the new EIP, which also saves us the code that performs the canonical address checks. In the case of assigning an EIP >= 2^32 (when switching cs.l), we are also safe, as __linearize will check that the new EIP does not exceed the limit and will trigger #GP(0) otherwise. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
When a segment is accessed, real hardware does not perform any privilege level checks. In contrast, the KVM emulator does. This causes some discrepancies from real hardware. For instance, reading from a readable code segment may fail due to incorrect segment checks. In addition, it introduces unnecessary overhead. To quote Intel SDM 5.5 ("Privilege Levels"): "Privilege levels are checked when the segment selector of a segment descriptor is loaded into a segment register." The SDM never mentions privilege level checks during memory access, except for loading far pointers in section 5.10 ("Pointer Validation"). Those are actually segment selector loads and are emulated similarly (i.e., regardless of the __linearize checks). This behavior was also checked using sysexit: a data segment with DPL=0 was loaded, and after sysexit (CPL=3) it was still accessible. Therefore, all the privilege level checks in __linearize are removed. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
When performing a segmented read/write in the emulator for stack operations, the emulator ignores the stack size and uses ad_bytes as the indication of the pointer size. As a result, a wrong address may be accessed. To fix this behavior, remove the masking of the address from __linearize and perform it beforehand. It is already done for the operands (so currently it is inefficiently done twice). It is missing in two cases:
 1. when using rip_relative;
 2. in fetch_bit_operand, which changes the address.

This patch masks the address in these two places and removes the masking from __linearize. Note that it does not mask EIP during fetch: in protected/legacy mode, a code fetch when RIP >= 2^32 should result in #GP rather than wrap-around. Since limit checks are made within __linearize, this is the expected behavior. Partial revert of commit 518547b3 (KVM: x86: Emulator does not calculate address correctly, 2014-09-30). Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
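A toy illustration of the masking in question (standalone C, assumed helper name; not the emulator code):

  #include <stdint.h>
  #include <stdio.h>

  /* Mask an effective address to the current address size.  With a 16-bit
   * stack (addr_bytes == 2 here), stack-relative accesses must wrap at 64K
   * even when ad_bytes would suggest a wider pointer. */
  static uint64_t mask_ea(uint64_t ea, unsigned int addr_bytes)
  {
      uint64_t mask = (addr_bytes == 8) ? ~0ULL
                                        : (1ULL << (addr_bytes * 8)) - 1;
      return ea & mask;
  }

  int main(void)
  {
      /* e.g. SP = 0xfffe, then a 4-byte access that wraps past 64K */
      printf("%#llx\n", (unsigned long long)mask_ea(0xfffeULL + 4, 2));
      /* prints 0x2 */
      return 0;
  }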
-
Nadav Amit authored
Commit 10e38fc7cab6 ("KVM: x86: Emulator flag for instruction that only support 16-bit addresses in real mode") introduced NoBigReal for instructions such as MONITOR. Apparently, the Intel SDM description that led to this patch is misleading. Since no instruction uses NoBigReal, it is safe to remove it until we fully understand what the SDM means. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 18 Nov, 2014 2 commits
-
-
Tiejun Chen authored
MMIO_MAX_GEN is the same as MMIO_GEN_MASK. Use only one. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Tiejun Chen authored
Instead, just use PFERR_{FETCH, PRESENT, WRITE}_MASK inside handle_ept_violation() for slightly better code. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 17 Nov, 2014 5 commits
-
-
Nadav Amit authored
apic_find_highest_irr assumes irr_pending is set if any vector in APIC_IRR is set. If this assumption is broken and apicv is disabled, the injection of interrupts may be deferred until another interrupt is delivered to the guest. Ultimately, if no other interrupt should be injected to that vCPU, the pending interrupt may be lost.

Commit 56cc2406 ("KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use") changed the behavior of apic_clear_irr so irr_pending is cleared after setting the APIC_IRR vector. After this commit, if apic_set_irr and apic_clear_irr run simultaneously, a race may occur, resulting in an APIC_IRR vector being set while irr_pending is cleared. In the following example, assume a single vector is set in IRR prior to calling apic_clear_irr:

  apic_set_irr                          apic_clear_irr
  ------------                          --------------
  apic->irr_pending = true;
                                        apic_clear_vector(...);
                                        vec = apic_search_irr(apic);
                                        // => vec == -1
  apic_set_vector(...);
                                        apic->irr_pending = (vec != -1);
                                        // => apic->irr_pending == false

Nonetheless, it appears the race might even occur prior to this commit:

  apic_set_irr                          apic_clear_irr
  ------------                          --------------
  apic->irr_pending = true;
                                        apic->irr_pending = false;
                                        apic_clear_vector(...);
                                        if (apic_search_irr(apic) != -1)
                                                apic->irr_pending = true;
                                        // => apic->irr_pending == false
  apic_set_vector(...);

Fix this issue by:
 1. Restoring the previous behavior of apic_clear_irr: clear irr_pending, call apic_clear_vector, and then, if APIC_IRR is non-zero, set irr_pending.
 2. In apic_set_irr: first call apic_set_vector, then set irr_pending.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
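A toy model of the resulting ordering (standalone C with invented names; not the kvm_lapic code). The only point is the order of updates to irr_pending relative to the IRR bitmap:

  #include <stdbool.h>
  #include <stdint.h>

  struct toy_apic {
      bool irr_pending;
      uint64_t irr[4];                  /* 256 interrupt vectors */
  };

  static void toy_set_vector(struct toy_apic *a, int vec)
  {
      a->irr[vec / 64] |= 1ULL << (vec % 64);
  }

  static void toy_clear_vector(struct toy_apic *a, int vec)
  {
      a->irr[vec / 64] &= ~(1ULL << (vec % 64));
  }

  static int toy_search_irr(const struct toy_apic *a)
  {
      for (int vec = 255; vec >= 0; vec--)
          if (a->irr[vec / 64] & (1ULL << (vec % 64)))
              return vec;
      return -1;
  }

  /* Set side: publish the vector first, then raise the hint. */
  static void toy_apic_set_irr(struct toy_apic *a, int vec)
  {
      toy_set_vector(a, vec);
      a->irr_pending = true;
  }

  /* Clear side: drop the hint, clear the vector, then re-derive the hint
   * from IRR so a vector set concurrently by toy_apic_set_irr() is never
   * left behind with irr_pending == false. */
  static void toy_apic_clear_irr(struct toy_apic *a, int vec)
  {
      a->irr_pending = false;
      toy_clear_vector(a, vec);
      if (toy_search_irr(a) != -1)
          a->irr_pending = true;
  }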
-
Paolo Bonzini authored
Logical destination mode can be used to send NMI IPIs even when all APICs are software disabled, so if all APICs are software disabled we should still look at the DFRs. So the DFRs should all be the same, even if some or all APICs are software disabled. However, the SDM does not say this, so tweak the logic as follows:

 - If one APIC is enabled and has LDR != 0, use that one to build the map. This picks the right DFR in case an OS is only setting it for the software-enabled APICs, or in case an OS is using logical addressing on some APICs while leaving the rest in reset state (using LDR was suggested by Radim).

 - If all APICs are disabled, pick a random one to build the map. We use the last one with LDR != 0 for simplicity.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
Currently, the APIC logical map does not consider VCPUs whose local APIC is software-disabled. However, NMIs, INIT, etc. should still be delivered to such VCPUs. Therefore, the APIC mode should first be determined, and then the map should be constructed considering all VCPUs. To address this issue, first find the APIC mode, and only then construct the logical map. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
The update_memslots invocation is only needed in one case. Make the code clearer by moving it to __kvm_set_memory_region, and removing the wrapper around insert_memslot. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
The two kmemdup invocations can be unified. I find that the new placement of the comment makes it easier to see what happens. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 14 Nov, 2014 3 commits
-
-
Paolo Bonzini authored
This completes the optimization from the previous patch, by removing the KVM_MEM_SLOTS_NUM-iteration loop from insert_memslot. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Igor Mammedov authored
memslots is a sorted array. When a slot is changed, heapsort (lib/sort.c) would take O(n log n) time to update it; an optimized insertion sort will only cost O(n) on an array with just one item out of order. Replace sort() with a custom sort that takes advantage of the memslots usage pattern and the known position of the changed slot.

Performance change for 128 memslot insertions with gradually increasing size (the worst case):

           heap sort    custom sort
  max:     249747       2500 cycles

i.e. the custom sort takes ~98% less time than the original update. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
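A toy version of the idea (standalone C, invented names; the real code operates on struct kvm_memory_slot and keeps the array sorted by npages in descending order, which the sketch assumes):

  #include <stddef.h>

  struct toy_slot { unsigned long npages; int id; };

  static void toy_swap(struct toy_slot *a, struct toy_slot *b)
  {
      struct toy_slot t = *a; *a = *b; *b = t;
  }

  /* The array is already sorted (descending by npages) except for
   * slots[changed]; shift that one entry to its place in O(n) instead of
   * re-sorting the whole array. */
  static void toy_resort_one(struct toy_slot *slots, size_t n, size_t changed)
  {
      size_t i = changed;

      while (i + 1 < n && slots[i].npages < slots[i + 1].npages) {
          toy_swap(&slots[i], &slots[i + 1]);
          i++;
      }
      while (i > 0 && slots[i].npages > slots[i - 1].npages) {
          toy_swap(&slots[i], &slots[i - 1]);
          i--;
      }
  }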
-
Igor Mammedov authored
With the 3 private slots, this gives us 512 slots total. The motivation is to support, in addition to assigned devices, more memory hotplug slots, where one slot is used per hotplugged memory stick. It will allow supporting up to 256 hotplug memory slots and leave 253 slots for assigned devices and other devices that use them. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 13 Nov, 2014 1 commit
-
-
Chris J Arges authored
When running the tsc_adjust kvm-unit-test on an AMD processor with the IA32_TSC_ADJUST feature enabled, the WARN_ON in svm_adjust_tsc_offset can be triggered. This WARN_ON checks for a negative adjustment in case __scale_tsc is called; however, it may trigger unnecessary warnings. This patch moves the WARN_ON to trigger only if __scale_tsc will actually be called from svm_adjust_tsc_offset. In addition, make adj in kvm_set_msr_common an s64, since it can hold signed values. Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
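For the signedness point, a trivial standalone illustration (not the kvm code):

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      int64_t  adj    = -1000;             /* guest lowered TSC_ADJUST */
      uint64_t as_u64 = (uint64_t)adj;     /* what an unsigned "adj" would hold */

      /* a "negative adjustment" check against the unsigned value misfires */
      printf("s64: %" PRId64 "  u64: %" PRIu64 "\n", adj, as_u64);
      return 0;
  }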
-
- 12 Nov, 2014 2 commits
-
-
Andy Lutomirski authored
There's nothing to switch if the host and guest values are the same. I am unable to find evidence that this makes any difference whatsoever. Signed-off-by: Andy Lutomirski <luto@amacapital.net>

[I could see a difference on Nehalem. From 5 runs:

  userspace exit, guest!=host      12200 11772 12130 12164 12327
  userspace exit, guest=host       11983 11780 11920 11919 12040
  lightweight exit, guest!=host     3214  3220  3238  3218  3337
  lightweight exit, guest=host      3178  3193  3193  3187  3220

This passes the t-test with 99% confidence for the userspace exit, 98.5% confidence for the lightweight exit. - Paolo]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Andy Lutomirski authored
At least on Sandy Bridge, letting the CPU switch IA32_EFER is much faster than switching it manually. I benchmarked this using the vmexit kvm-unit-test (single run, but GOAL multiplied by 5 to do more iterations):

  Test                              Before     After      Change
  cpuid                               2000      1932      -3.40%
  vmcall                              1914      1817      -5.07%
  mov_from_cr8                          13        13       0.00%
  mov_to_cr8                            19        19       0.00%
  inl_from_pmtimer                   19164     10619     -44.59%
  inl_from_qemu                      15662     10302     -34.22%
  inl_from_kernel                     3916      3802      -2.91%
  outl_to_kernel                      2230      2194      -1.61%
  mov_dr                               172       176       2.33%
  ipi                            (skipped) (skipped)
  ipi+halt                       (skipped) (skipped)
  ple-round-robin                       13        13       0.00%
  wr_tsc_adjust_msr                   1920      1845      -3.91%
  rd_tsc_adjust_msr                   1892      1814      -4.12%
  mmio-no-eventfd:pci-mem            16394     11165     -31.90%
  mmio-wildcard-eventfd:pci-mem       4607      4645       0.82%
  mmio-datamatch-eventfd:pci-mem      4601      4610       0.20%
  portio-no-eventfd:pci-io           11507      7942     -30.98%
  portio-wildcard-eventfd:pci-io      2239      2225      -0.63%
  portio-datamatch-eventfd:pci-io     2250      2234      -0.71%

I haven't explicitly computed the significance of these numbers, but this isn't subtle. Signed-off-by: Andy Lutomirski <luto@amacapital.net>

[The results were reproducible on all of Nehalem, Sandy Bridge and Ivy Bridge. The slowness of manual switching is because writing to EFER with WRMSR triggers a TLB flush, even if the only bit you're touching is SCE (so the page table format is not affected). Doing the write as part of vmentry/vmexit, instead, does not flush the TLB, probably because all processors that have EPT also have VPID. - Paolo]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 10 Nov, 2014 1 commit
-
-
Paolo Bonzini authored
PCIDs are only supported in 64-bit mode. No need to clear bit 63 of CR3 unless the host is 64-bit. Reported by Fengguang Wu's autobuilder. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 08 Nov, 2014 7 commits
-
-
David Matlack authored
The new trace event records:
 * the id of the vcpu being updated
 * the pvclock_vcpu_time_info struct being written to guest memory

This is useful for debugging pvclock bugs, such as the bug fixed by "[PATCH] kvm: x86: Fix kvm clock versioning.". Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Owen Hofmann authored
kvm updates the version number for the guest paravirt clock structure by incrementing the version of its private copy. It does not read the guest version, so it will write version = 2 in the first update for every new VM, including after restoring a saved state. If guest state is saved while the guest is reading the clock, the guest could read and accept struct fields and a guest TSC from two different updates. This changes the code to increment the guest version and write it back. Signed-off-by: Owen Hofmann <osh@google.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
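A sketch of the versioning protocol involved (standalone C, assumed field names, barriers omitted; not the kvm code). The guest retries its read while the version is odd, so the host must bump the version the guest already sees rather than a private counter that restarts for a restored VM:

  #include <stdint.h>

  struct toy_pvclock {
      uint32_t version;            /* odd while an update is in progress */
      uint64_t tsc_timestamp;
      uint64_t system_time;
  };

  static void toy_host_update(struct toy_pvclock *shared,
                              uint64_t tsc, uint64_t ns)
  {
      /* start from the version the guest can see (read it back), and force
       * it odd even if a save/restore left an odd value behind */
      shared->version = (shared->version + 1) | 1;

      shared->tsc_timestamp = tsc;
      shared->system_time   = ns;

      shared->version += 1;        /* even again: consistent snapshot */
  }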
-
Nadav Amit authored
Commit 3b32004a ("KVM: x86: movnti minimum op size of 32-bit is not kept") did not fully fix the minimum operand size of MOVNTI emulation. MOVNTI may still mistakenly be performed using a 16-bit opsize. This patch adds a No16 flag to mark instructions that do not support a 16-bit operand size. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Marcelo Tosatti authored
When the guest writes to the TSC, the masterclock TSC copy must be updated as well along with the TSC_OFFSET update, otherwise a negative tsc_timestamp is calculated at kvm_guest_time_update. Once "if (!vcpus_matched && ka->use_master_clock)" is simplified to "if (ka->use_master_clock)", the corresponding "if (!ka->use_master_clock)" becomes redundant, so remove the do_request boolean and collapse everything into a single condition. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
Now that KVM injects #UD on an "unhandleable" error, it makes more sense to return such an error from sysenter emulation instead of directly injecting #UD into the guest. This makes it easier to track the unhandleable cases that the emulator does not support. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
APIC base relocation is unsupported by KVM. If anyone uses it, the least we should do is report a warning in the hypervisor. Note that KVM-unit-tests uses this feature for some reason, so running the tests triggers the warning. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
Commit 7fe864dc (KVM: x86: Mark VEX-prefix instructions emulation as unimplemented, 2014-06-02) marked VEX instructions as such in protected mode. VEX-prefix instructions are not relevant in real mode and VM86, but should cause #UD instead of being decoded as LES/LDS. Fix this behaviour to be consistent with real hardware. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> [Check for mod == 3, rather than 2 or 3. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
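A decode sketch of the check in question (invented helper name; not the emulator's opcode table):

  /* In real mode and VM86, opcodes 0xC4/0xC5 with ModRM.mod == 3 would be a
   * VEX prefix on real hardware; since the emulator does not implement VEX,
   * it must raise #UD there instead of decoding LES/LDS. */
  static int realmode_c4_c5_is_ud(unsigned char modrm)
  {
      return (modrm >> 6) == 3;        /* mod == 3: would-be VEX prefix */
  }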
-
- 07 Nov, 2014 3 commits
-
-
Nadav Amit authored
Task-switch emulation checks the privilege level prior to performing the task switch. This check is incorrect in the case of task gates, in which the tss.dpl is ignored, and can cause superfluous exceptions. Moreover, this check is unnecessary, since the CPU checks the privilege levels prior to exiting. Intel SDM 25.4.2 says "If CALL or JMP accesses a TSS descriptor directly outside IA-32e mode, privilege levels are checked on the TSS descriptor" prior to exiting. AMD 15.14.1 says "The intercept is checked before the task switch takes place but after the incoming TSS and task gate (if one was involved) have been checked for correctness." This patch removes the CPL checks for CALL and JMP. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
When emulating LTR/LDTR/LGDT/LIDT, #GP should be injected if the base is non-canonical. Otherwise, VM-entry will fail. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
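A minimal sketch of such a canonical-address check (standalone C, assuming 48-bit virtual addresses; not the kernel's helper):

  #include <stdbool.h>
  #include <stdint.h>

  /* Canonical for a 48-bit VA: bits 63:47 must all equal bit 47, i.e. the
   * top 17 bits are either all zero or all one. */
  static bool is_canonical_48(uint64_t addr)
  {
      uint64_t top = addr >> 47;

      return top == 0 || top == 0x1ffff;
  }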
-
Nadav Amit authored
LGDT and LIDT emulation logic is almost identical. Merge the logic into a single point to avoid redundancy. This will be used by the next patch that will ensure the bases of the loaded GDTR and IDTR are canonical. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-