Commits · e08e833616f7eefebdacfd1d743d80ff3c3b2585 · Kirill Smelkov / linux

05 Dec, 2014 11 commits

KVM: cpuid: recompute CPUID 0xD.0:EBX,ECX · e08e8336

Radim Krčmář authored Dec 04, 2014

We reused host EBX and ECX, but KVM might not support all features;
emulated XSAVE size should be smaller.

EBX depends on unknown XCR0, so we default to ECX.

SDM CPUID (EAX = 0DH, ECX = 0):
 EBX Bits 31-00: Maximum size (bytes, from the beginning of the
     XSAVE/XRSTOR save area) required by enabled features in XCR0. May
     be different than ECX if some features at the end of the XSAVE save
     area are not enabled.

 ECX Bit 31-00: Maximum size (bytes, from the beginning of the
     XSAVE/XRSTOR save area) of the XSAVE/XRSTOR save area required by
     all supported features in the processor, i.e all the valid bit
     fields in XCR0.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e08e8336

kvm: vmx: add nested virtualization support for xsaves · 81dc01f7

Wanpeng Li authored Dec 04, 2014

Add nested virtualization support for xsaves.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

81dc01f7

kvm: vmx: add MSR logic for XSAVES · 20300099

Wanpeng Li authored Dec 02, 2014

Add logic to get/set the XSS model-specific register.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

20300099

kvm: x86: handle XSAVES vmcs and vmexit · f53cd63c

Wanpeng Li authored Dec 02, 2014

Initialize the XSS exit bitmap.  It is zero so there should be no XSAVES
or XRSTORS exits.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f53cd63c

KVM: cpuid: mask more bits in leaf 0xd and subleaves · 404e0a19

Paolo Bonzini authored Dec 04, 2014

- EAX=0Dh, ECX=1: output registers EBX/ECX/EDX are reserved.

- EAX=0Dh, ECX>1: output register ECX bit 0 is clear for all the CPUID
leaves we support, because variable "supported" comes from XCR0 and not
XSS.  Bits above 0 are reserved, so ECX is overall zero.  Output register
EDX is reserved.

Source: Intel Architecture Instruction Set Extensions Programming
Reference, ref. number 319433-022
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

404e0a19

KVM: cpuid: set CPUID(EAX=0xd,ECX=1).EBX correctly · 412a3c41

Paolo Bonzini authored Dec 03, 2014

This is the size of the XSAVES area.  This starts providing guest support
for XSAVES (with no support yet for supervisor states, i.e. XSS == 0
always in guests for now).

Wanpeng Li suggested testing XSAVEC as well as XSAVES, since in practice
no real processor exists that only has one of them, and there is no
other way for userspace programs to compute the area of the XSAVEC
save area.  CPUID(EAX=0xd,ECX=1).EBX provides an upper bound.
Suggested-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

412a3c41

kvm: x86: Add kvm_x86_ops hook that enables XSAVES for guest · 55412b2e

Wanpeng Li authored Dec 02, 2014

Expose the XSAVES feature to the guest if the kvm_x86_ops say it is
available.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

55412b2e

KVM: x86: use F() macro throughout cpuid.c · 5c404cab

Paolo Bonzini authored Dec 03, 2014

For code that deals with cpuid, this makes things a bit more readable.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5c404cab

KVM: x86: support XSAVES usage in the host · df1daba7

Paolo Bonzini authored Nov 21, 2014

Userspace is expecting non-compacted format for KVM_GET_XSAVE, but
struct xsave_struct might be using the compacted format.  Convert
in order to preserve userspace ABI.

Likewise, userspace is passing non-compacted format for KVM_SET_XSAVE
but the kernel will pass it to XRSTORS, and we need to convert back.

Fixes: f31a9f7c
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: stable@vger.kernel.org
Cc: H. Peter Anvin <hpa@linux.intel.com>
Tested-by: Nadav Amit <namit@cs.technion.ac.il>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

df1daba7

x86: export get_xsave_addr · ba7b3920

Paolo Bonzini authored Nov 24, 2014

get_xsave_addr is the API to access XSAVE states, and KVM would
like to use it.  Export it.

Cc: stable@vger.kernel.org
Cc: x86@kernel.org
Cc: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ba7b3920

Merge tag 'kvm-s390-next-20141204' of... · 28145be0

Paolo Bonzini authored Dec 05, 2014

Merge tag 'kvm-s390-next-20141204' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fixups for kvm/next (3.19)

Here we have two fixups of the latest interrupt rework and
one architectural fixup.

28145be0

04 Dec, 2014 18 commits

KVM: s390: clean up return code handling in irq delivery code · 99e20009

Jens Freimann authored Dec 01, 2014

Instead of returning a possibly random or'ed together value, let's
always return -EFAULT if rc is set.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

99e20009

KVM: s390: use atomic bitops to access pending_irqs bitmap · 9185124e

Jens Freimann authored Dec 01, 2014

Currently we use a mixture of atomic/non-atomic bitops
and the local_int spin lock to protect the pending_irqs bitmap
and interrupt payload data.

We need to use atomic bitops for the pending_irqs bitmap everywhere
and in addition acquire the local_int lock where interrupt data needs
to be protected.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

9185124e

KVM: s390: some ext irqs have to clear the ext cpu addr · 467fc298

David Hildenbrand authored Dec 01, 2014

The cpu address of a source cpu (responsible for an external irq) is only to
be stored if bit 6 of the ext irq code is set.

If bit 6 is not set, it is to be zeroed out.

The special external irq code used for virtio and pfault uses the cpu addr as a
parameter field. As bit 6 is set, this implementation is correct.
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

467fc298

KVM: track pid for VCPU only on KVM_RUN ioctl · 7a72f7a1

Christian Borntraeger authored Aug 05, 2014

We currently track the pid of the task that runs the VCPU in vcpu_load.
If a yield to that VCPU is triggered while the PID of the wrong thread
is active, the wrong thread might receive a yield, but this will most
likely not help the executing thread at all.  Instead, if we only track
the pid on the KVM_RUN ioctl, there are two possibilities:

1) the thread that did a non-KVM_RUN ioctl is holding a mutex that
the VCPU thread is waiting for.  In this case, the VCPU thread is not
runnable, but we also do not do a wrong yield.

2) the thread that did a non-KVM_RUN ioctl is sleeping, or doing
something that does not block the VCPU thread.  In this case, the
VCPU thread can receive the directed yield correctly.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
CC: Rik van Riel <riel@redhat.com>
CC: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
CC: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7a72f7a1

KVM: don't check for PF_VCPU when yielding · eed6e79d

David Hildenbrand authored Nov 25, 2014

kvm_enter_guest() has to be called with preemption disabled and will
set PF_VCPU.  Current code takes PF_VCPU as a hint that the VCPU thread
is running and therefore needs no yield.

However, the check on PF_VCPU is wrong on s390, where preemption has
to stay enabled in order to correctly process page faults.  Thus,
s390 reenables preemption and starts to execute the guest.  The thread
might be scheduled out between kvm_enter_guest() and kvm_exit_guest(),
resulting in PF_VCPU being set but not being run.  When this happens,
the opportunity for directed yield is missed.

However, this check is done already in kvm_vcpu_on_spin before calling
kvm_vcpu_yield_loop:

        if (!ACCESS_ONCE(vcpu->preempted))
                continue;

so the check on PF_VCPU is superfluous in general, and this patch
removes it.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

eed6e79d

kvm: optimize GFN to memslot lookup with large slots amount · 9c1a5d38

Igor Mammedov authored Dec 01, 2014

Current linear search doesn't scale well when
large amount of memslots is used and looked up slot
is not in the beginning memslots array.
Taking in account that memslots don't overlap, it's
possible to switch sorting order of memslots array from
'npages' to 'base_gfn' and use binary search for
memslot lookup by GFN.

As result of switching to binary search lookup times
are reduced with large amount of memslots.

Following is a table of search_memslot() cycles
during WS2008R2 guest boot.

                         boot,          boot + ~10 min
                         mostly same    of using it,
                         slot lookup    randomized lookup
                max      average        average
                cycles   cycles         cycles

13 slots      : 1450       28           30

13 slots      : 1400       30           40
binary search

117 slots     : 13000      30           460

117 slots     : 2000       35           180
binary search
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9c1a5d38

kvm: change memslot sorting rule from size to GFN · 0e60b079

Igor Mammedov authored Dec 01, 2014

it will allow to use binary search for GFN -> memslot
lookups, reducing lookup cost with large slots amount.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0e60b079

kvm: search_memslots: add simple LRU memslot caching · d4ae84a0

Igor Mammedov authored Dec 01, 2014

In typical guest boot workload only 2-3 memslots are used
extensively, and at that it's mostly the same memslot
lookup operation.

Adding LRU cache improves average lookup time from
46 to 28 cycles (~40%) for this workload.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d4ae84a0

kvm: update_memslots: drop not needed check for the same slot · 7f379cff

Igor Mammedov authored Dec 01, 2014

UP/DOWN shift loops will shift array in needed
direction and stop at place where new slot should
be placed regardless of old slot size.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7f379cff

kvm: update_memslots: drop not needed check for the same number of pages · 5a38b6e6

Igor Mammedov authored Dec 01, 2014

if number of pages haven't changed sorting algorithm
will do nothing, so there is no need to do extra check
to avoid entering sorting logic.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5a38b6e6

KVM: x86: allow 256 logical x2APICs again · 45c3094a

Radim Krčmář authored Nov 27, 2014

While fixing an x2apic bug,
 17d68b76 KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)
we've made only one cluster available.  This means that the amount of
logically addressible x2APICs was reduced to 16 and VCPUs kept
overwriting themselves in that region, so even the first cluster wasn't
set up correctly.

This patch extends x2APIC support back to the logical_map's limit, and
keeps the CVE fixed as messages for non-present APICs are dropped.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

45c3094a

KVM: x86: check bounds of APIC maps · 25995e5b

Radim Krčmář authored Nov 27, 2014

They can't be violated now, but play it safe for the future.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

25995e5b

KVM: x86: fix APIC physical destination wrapping · fa834e91

Radim Krčmář authored Nov 27, 2014

x2apic allows destinations > 0xff and we don't want them delivered to
lower APICs.  They are correctly handled by doing nothing.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fa834e91

KVM: x86: deliver phys lowest-prio · 085563fb

Radim Krčmář authored Nov 27, 2014

Physical mode can't address more than one APIC, but lowest-prio is
allowed, so we just reuse our paths.

SDM 10.6.2.1 Physical Destination:
  Also, for any non-broadcast IPI or I/O subsystem initiated interrupt
  with lowest priority delivery mode, software must ensure that APICs
  defined in the interrupt address are present and enabled to receive
  interrupts.

We could warn on top of that.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

085563fb

KVM: x86: don't retry hopeless APIC delivery · 698f9755

Radim Krčmář authored Nov 27, 2014

False from kvm_irq_delivery_to_apic_fast() means that we don't handle it
in the fast path, but we still return false in cases that were perfectly
handled, fix that.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

698f9755

KVM: x86: use MSR_ICR instead of a number · decdc283

Radim Krčmář authored Nov 26, 2014

0x830 MSR is 0x300 xAPIC MMIO, which is MSR_ICR.

Signed-off-by: Radim KrÄmÃ¡Å™ <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

decdc283

KVM: x86: Fix reserved x2apic registers · c69d3d9b

Nadav Amit authored Nov 26, 2014

x2APIC has no registers for DFR and ICR2 (see Intel SDM 10.12.1.2 "x2APIC
Register Address Space"). KVM needs to cause #GP on such accesses.

Fix it (DFR and ICR2 on read, ICR2 on write, DFR already handled on writes).
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c69d3d9b

KVM: x86: Generate #UD when memory operand is required · 39f062ff

Nadav Amit authored Nov 26, 2014

Certain x86 instructions that use modrm operands only allow memory operand
(i.e., mod012), and cause a #UD exception otherwise. KVM ignores this fact.
Currently, the instructions that are such and are emulated by KVM are MOVBE,
MOVNTPS, MOVNTPD and MOVNTI.  MOVBE is the most blunt example, since it may be
emulated by the host regardless of MMIO.

The fix introduces a new group for handling such instructions, marking mod3 as
illegal instruction.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

39f062ff

03 Dec, 2014 1 commit

Merge tag 'kvm-s390-next-20141128' of... · be06b6be

Paolo Bonzini authored Dec 03, 2014

Merge tag 'kvm-s390-next-20141128' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Several fixes,cleanups and reworks

Here is a bunch of fixes that deal mostly with architectural compliance:
- interrupt priorities
- interrupt handling
- intruction exit handling

We also provide a helper function for getting the guest visible storage key.

be06b6be

28 Nov, 2014 10 commits

KVM: s390: allow injecting all kinds of machine checks · fc2020cf

Jens Freimann authored Aug 13, 2014

Allow to specify CR14, logout area, external damage code
and failed storage address.

Since more then one machine check can be indicated to the guest at
a time we need to combine all indication bits with already pending
requests.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

fc2020cf

KVM: s390: handle pending local interrupts via bitmap · 383d0b05

Jens Freimann authored Jul 29, 2014

This patch adapts handling of local interrupts to be more compliant with
the z/Architecture Principles of Operation and introduces a data
structure
which allows more efficient handling of interrupts.

* get rid of li->active flag, use bitmap instead
* Keep interrupts in a bitmap instead of a list
* Deliver interrupts in the order of their priority as defined in the
  PoP
* Use a second bitmap for sigp emergency requests, as a CPU can have
  one request pending from every other CPU in the system.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

383d0b05

KVM: s390: add bitmap for handling cpu-local interrupts · c0e6159d

Jens Freimann authored Jul 29, 2013

Adds a bitmap to the vcpu structure which is used to keep track
of local pending interrupts. Also add enum with all interrupt
types sorted in order of priority (highest to lowest)
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

c0e6159d

KVM: s390: refactor interrupt delivery code · 0fb97abe

Jens Freimann authored Jul 29, 2014

Move delivery code for cpu-local interrupt from the huge do_deliver_interrupt()
to smaller functions which handle one type of interrupt.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

0fb97abe

KVM: s390: add defines for virtio and pfault interrupt code · 60f90a14

Jens Freimann authored Nov 10, 2014

Get rid of open coded value for virtio and pfault completion interrupts.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

60f90a14

KVM: s390: external param not valid for cpu timer and ckc · af43eb2f

David Hildenbrand authored Nov 07, 2014

The 32bit external interrupt parameter is only valid for timing-alert and
service-signal interrupts.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

af43eb2f

KVM: s390: refactor interrupt injection code · 0146a7b0

Jens Freimann authored Jul 28, 2014

In preparation for the rework of the local interrupt injection code,
factor out injection routines from kvm_s390_inject_vcpu().
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

0146a7b0

KVM: S390: Create helper function get_guest_storage_key · 9fcf93b5

Jason J. Herne authored Sep 23, 2014

Define get_guest_storage_key which can be used to get the value of a guest
storage key. This compliments the functionality provided by the helper function
set_guest_storage_key. Both functions are needed for live migration of s390
guests that use storage keys.
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

9fcf93b5

KVM: s390: trigger the right CPU exit for floating interrupts · da00fcbd

Christian Borntraeger authored Nov 21, 2014

When injecting a floating interrupt and no CPU is idle we
kick one CPU to do an external exit. In case of I/O we
should trigger an I/O exit instead. This does not matter
for Linux guests as external and I/O interrupts are
enabled/disabled at the same time, but play safe anyway.

The same holds true for machine checks. Since there is no
special exit, just reuse the generic stop exit. The injection
code inside the VCPU loop will recheck anyway and rearm the
proper exits (e.g. control registers) if necessary.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>

da00fcbd

KVM: s390: Fix rewinding of the PSW pointing to an EXECUTE instruction · 04b41acd

Thomas Huth authored Nov 12, 2014

A couple of our interception handlers rewind the PSW to the beginning
of the instruction to run the intercepted instruction again during the
next SIE entry. This normally works fine, but there is also the
possibility that the instruction did not get run directly but via an
EXECUTE instruction.
In this case, the PSW does not point to the instruction that caused the
interception, but to the EXECUTE instruction! So we've got to rewind the
PSW to the beginning of the EXECUTE instruction instead.
This is now accomplished with a new helper function kvm_s390_rewind_psw().
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

04b41acd