Commits · fa6b7fe9928d50444c29b29c8563746c6b0c6299 · nexedi / linux

07 Jan, 2013 7 commits

KVM: s390: Add support for channel I/O instructions. · fa6b7fe9

Cornelia Huck authored Dec 20, 2012

Add a new capability, KVM_CAP_S390_CSS_SUPPORT, which will pass
intercepts for channel I/O instructions to userspace. Only I/O
instructions interacting with I/O interrupts need to be handled
in-kernel:

- TEST PENDING INTERRUPTION (tpi) dequeues and stores pending
  interrupts entirely in-kernel.
- TEST SUBCHANNEL (tsch) dequeues pending interrupts in-kernel
  and exits via KVM_EXIT_S390_TSCH to userspace for subchannel-
  related processing.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

fa6b7fe9

KVM: s390: Base infrastructure for enabling capabilities. · d6712df9

Cornelia Huck authored Dec 20, 2012

Make s390 support KVM_ENABLE_CAP.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d6712df9

KVM: s390: In-kernel handling of I/O instructions. · f379aae5

Cornelia Huck authored Dec 20, 2012

Explicitely catch all channel I/O related instructions intercepts
in the kernel and set condition code 3 for them.

This paves the way for properly handling these instructions later
on.

Note: This is not architecture compliant (the previous code wasn't
either) since setting cc 3 is not the correct thing to do for some
of these instructions. For Linux guests, however, it still has the
intended effect of stopping css probing.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f379aae5

KVM: s390: Add support for machine checks. · 48a3e950

Cornelia Huck authored Dec 20, 2012

Add support for injecting machine checks (only repressible
conditions for now).

This is a bit more involved than I/O interrupts, for these reasons:

- Machine checks come in both floating and cpu varieties.
- We don't have a bit for machine checks enabling, but have to use
  a roundabout approach with trapping PSW changing instructions and
  watching for opened machine checks.
Reviewed-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

48a3e950

KVM: s390: Support for I/O interrupts. · d8346b7d

Cornelia Huck authored Dec 20, 2012

Add support for handling I/O interrupts (standard, subchannel-related
ones and rudimentary adapter interrupts).

The subchannel-identifying parameters are encoded into the interrupt
type.

I/O interrupts are floating, so they can't be injected on a specific
vcpu.
Reviewed-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d8346b7d

KVM: s390: Decoding helper functions. · b1c571a5

Cornelia Huck authored Dec 20, 2012

Introduce helper functions for decoding the various base/displacement
instruction formats.
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

b1c571a5

KVM: s390: Constify intercept handler tables. · 77975357

Cornelia Huck authored Dec 20, 2012

These tables are never modified.
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

77975357

02 Jan, 2013 7 commits

KVM: VMX: handle IO when emulation is due to #GP in real mode. · 0ca1b4f4

Gleb Natapov authored Dec 20, 2012

With emulate_invalid_guest_state=0 if a vcpu is in real mode VMX can
enter the vcpu with smaller segment limit than guest configured.  If the
guest tries to access pass this limit it will get #GP at which point
instruction will be emulated with correct segment limit applied. If
during the emulation IO is detected it is not handled correctly. Vcpu
thread should exit to userspace to serve the IO, but it returns to the
guest instead.  Since emulation is not completed till userspace completes
the IO the faulty instruction is re-executed ad infinitum.

The patch fixes that by exiting to userspace if IO happens during
instruction emulation.
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0ca1b4f4

KVM: VMX: Do not fix segment register during vcpu initialization. · d54d07b2

Gleb Natapov authored Dec 20, 2012

Segment registers will be fixed according to current emulation policy
during switching to real mode for the first time.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d54d07b2

KVM: VMX: fix emulation of invalid guest state. · d99e4152

Gleb Natapov authored Dec 20, 2012

Currently when emulation of invalid guest state is enable
(emulate_invalid_guest_state=1) segment registers are still fixed for
entry to vm86 mode some times. Segment register fixing is avoided in
enter_rmode(), but vmx_set_segment() still does it unconditionally.
The patch fixes it.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d99e4152

KVM: VMX: make rmode_segment_valid() more strict. · 89efbed0

Gleb Natapov authored Dec 20, 2012

Currently it allows entering vm86 mode if segment limit is greater than
0xffff and db bit is set. Both of those can cause incorrect execution of
instruction by cpu since in vm86 mode limit will be set to 0xffff and db
will be forced to 0.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

89efbed0

KVM: emulator: implement fninit, fnstsw, fnstcw · 045a282c

Gleb Natapov authored Dec 20, 2012

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

045a282c

KVM: emulator: drop RPL check from linearize() function · 3a78a4f4

Gleb Natapov authored Dec 20, 2012

According to Intel SDM Vol3 Section 5.5 "Privilege Levels" and 5.6
"Privilege Level Checking When Accessing Data Segments" RPL checking is
done during loading of a segment selector, not during data access. We
already do checking during segment selector loading, so drop the check
during data access. Checking RPL during data access triggers #GP if
after transition from real mode to protected mode RPL bits in a segment
selector are set.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

3a78a4f4

x86: kvm_para: fix typo in hypercall comments · 11393a07

Jesse Larrew authored Dec 10, 2012

Correct a typo in the comment explaining hypercalls.
Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

11393a07

24 Dec, 2012 1 commit

KVM: move the code that installs new slots array to a separate function. · 7ec4fb44

Gleb Natapov authored Dec 24, 2012

Move repetitive code sequence to a separate function.
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

7ec4fb44

23 Dec, 2012 9 commits

KVM: VMX: remove unneeded temporary variable from vmx_set_segment() · f924d66d
Gleb Natapov authored Dec 12, 2012
```
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
```
f924d66d

KVM: VMX: clean-up vmx_set_segment() · 1ecd50a9

Gleb Natapov authored Dec 12, 2012

Move all vm86_active logic into one place.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

1ecd50a9

KVM: VMX: remove redundant code from vmx_set_segment() · 39dcfb95

Gleb Natapov authored Dec 12, 2012

Segment descriptor's base is fixed by call to fix_rmode_seg(). Not need
to do it twice.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

39dcfb95

KVM: VMX: use fix_rmode_seg() to fix all code/data segments · beb853ff

Gleb Natapov authored Dec 12, 2012

The code for SS and CS does the same thing fix_rmode_seg() is doing.
Use it instead of hand crafted code.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

beb853ff

KVM: VMX: return correct segment limit and flags for CS/SS registers in real mode · c6ad1153

Gleb Natapov authored Dec 12, 2012

VMX without unrestricted mode cannot virtualize real mode, so if
emulate_invalid_guest_state=0 kvm uses vm86 mode to approximate
it. Sometimes, when guest moves from protected mode to real mode, it
leaves segment descriptors in a state not suitable for use by vm86 mode
virtualization, so we keep shadow copy of segment descriptors for internal
use and load fake register to VMCS for guest entry to succeed. Till
now we kept shadow for all segments except SS and CS (for SS and CS we
returned parameters directly from VMCS), but since commit a5625189
emulator enforces segment limits in real mode. This causes #GP during move
from protected mode to real mode when emulator fetches first instruction
after moving to real mode since it uses incorrect CS base and limit to
linearize the %rip. Fix by keeping shadow for SS and CS too.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

c6ad1153

KVM: VMX: relax check for CS register in rmode_segment_valid() · 0647f4aa

Gleb Natapov authored Dec 12, 2012

rmode_segment_valid() checks if segment descriptor can be used to enter
vm86 mode. VMX spec mandates that in vm86 mode CS register will be of
type data, not code. Lets allow guest entry with vm86 mode if the only
problem with CS register is incorrect type. Otherwise entire real mode
will be emulated.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

0647f4aa

KVM: VMX: cleanup rmode_segment_valid() · 07f42f5f

Gleb Natapov authored Dec 12, 2012

Set segment fields explicitly instead of using  binary operations.

No behaviour changes.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

07f42f5f

kvm: Fix memory slot generation updates · 116c14c0

Alex Williamson authored Dec 21, 2012

Previous patch "kvm: Minor memory slot optimization" (b7f69c55)
overlooked the generation field of the memory slots. Re-using the
original memory slots left us with with two slightly different memory
slots with the same generation. To fix this, make update_memslots()
take a new parameter to specify the last generation. This also makes
generation management more explicit to avoid such problems in the future.
Reported-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

116c14c0

KVM: remove a wrong hack of delivery PIT intr to vcpu0 · 871a069d

Yang Zhang authored Dec 12, 2012

This hack is wrong. The pin number of PIT is connected to
2 not 0. This means this hack never takes effect. So it is ok
to remove it.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

871a069d

18 Dec, 2012 4 commits

KVM: s390: Add a channel I/O based virtio transport driver. · 7e64e059

Cornelia Huck authored Dec 14, 2012

Add a driver for kvm guests that matches virtual ccw devices provided
by the host as virtio bridge devices.

These virtio-ccw devices use a special set of channel commands in order
to perform virtio functions.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

7e64e059

s390/ccwdev: Include asm/schid.h. · 0abbe448

Cornelia Huck authored Dec 14, 2012

Get the definition of struct subchannel_id.
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

0abbe448

KVM: s390: Handle hosts not supporting s390-virtio. · 55c171a6

Cornelia Huck authored Dec 14, 2012

Running under a kvm host does not necessarily imply the presence of
a page mapped above the main memory with the virtio information;
however, the code includes a hard coded access to that page.

Instead, check for the presence of the page and exit gracefully
before we hit an addressing exception if it does not exist.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
cc: stable@vger.kernel.org
Signed-off-by: Gleb Natapov <gleb@redhat.com>

55c171a6

kvm: fix i8254 counter 0 wraparound · d4b06c2d

Nickolai Zeldovich authored Dec 15, 2012

The kvm i8254 emulation for counter 0 (but not for counters 1 and 2)
has at least two bugs in mode 0:

1. The OUT bit, computed by pit_get_out(), is never set high.

2. The counter value, computed by pit_get_count(), wraps back around to
   the initial counter value, rather than wrapping back to 0xFFFF
   (which is the behavior described in the comment in __kpit_elapsed,
   the behavior implemented by qemu, and the behavior observed on AMD
   hardware).

The bug stems from __kpit_elapsed computing the elapsed time mod the
initial counter value (stored as nanoseconds in ps->period).  This is both
unnecessary (none of the callers of kpit_elapsed expect the value to be
at most the initial counter value) and incorrect (it causes pit_get_count
to appear to wrap around to the initial counter value rather than 0xFFFF).
Removing this mod from __kpit_elapsed fixes both of the above bugs.
Signed-off-by: Nickolai Zeldovich <nickolai@csail.mit.edu>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

d4b06c2d

15 Dec, 2012 1 commit

KVM: remove unused variable. · e11ae1a1

Gleb Natapov authored Dec 14, 2012

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e11ae1a1

14 Dec, 2012 11 commits

KVM: Increase user memory slots on x86 to 125 · 0f888f5a

Alex Williamson authored Dec 10, 2012

With the 3 private slots, this gives us a nice round 128 slots total.
The primary motivation for this is to support more assigned devices.
Each assigned device can theoretically use up to 8 slots (6 MMIO BARs,
1 ROM BAR, 1 spare for a split MSI-X table mapping) though it's far
more typical for a device to use 3-4 slots.  If we assume a typical VM
uses a dozen slots for non-assigned devices purposes, we should always
be able to support 14 worst case assigned devices or 28 to 37 typical
devices.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0f888f5a

KVM: struct kvm_memory_slot.id -> short · 1e702d9a

Alex Williamson authored Dec 10, 2012

We're currently offering a whopping 32 memory slots to user space, an
int is a bit excessive for storing this.  We would like to increase
our memslots, but SHRT_MAX should be more than enough.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

1e702d9a

KVM: struct kvm_memory_slot.flags -> u32 · 6104f472

Alex Williamson authored Dec 10, 2012

struct kvm_userspace_memory_region.flags is a u32 with a comment that
bits 0 ~ 15 are visible to userspace and the other bits are reserved
for kvm internal use.  KVM_MEMSLOT_INVALID is the only internal use
flag and it has a comment that bits 16 ~ 31 are internally used and
the other bits are visible to userspace.

Therefore, let's define this as a u32 so we don't waste bytes on LP64
systems.  Move to the end of the struct for alignment.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

6104f472

KVM: struct kvm_memory_slot.user_alloc -> bool · f82a8cfe

Alex Williamson authored Dec 10, 2012

There's no need for this to be an int, it holds a boolean.
Move to the end of the struct for alignment.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f82a8cfe

KVM: Make KVM_PRIVATE_MEM_SLOTS optional · 0743247f

Alex Williamson authored Dec 10, 2012

Seems like everyone copied x86 and defined 4 private memory slots
that never actually get used.  Even x86 only uses 3 of the 4.  These
aren't exposed so there's no need to add padding.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0743247f

KVM: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS · bbacc0c1

Alex Williamson authored Dec 10, 2012

It's easy to confuse KVM_MEMORY_SLOTS and KVM_MEM_SLOTS_NUM.  One is
the user accessible slots and the other is user + private.  Make this
more obvious.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

bbacc0c1

KVM: Minor memory slot optimization · b7f69c55

Alex Williamson authored Dec 10, 2012

If a slot is removed or moved in the guest physical address space, we
first allocate and install a new slot array with the invalidated
entry. The old array is then freed. We then proceed to allocate yet
another slot array to install the permanent replacement. Re-use the
original array when this occurs and avoid the extra kfree/kmalloc.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

b7f69c55

KVM: Fix iommu map/unmap to handle memory slot moves · e40f193f

Alex Williamson authored Dec 10, 2012

The iommu integration into memory slots expects memory slots to be
added or removed and doesn't handle the move case.  We can unmap
slots from the iommu after we mark them invalid and map them before
installing the final memslot array.  Also re-order the kmemdup vs
map so we don't leave iommu mappings if we get ENOMEM.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e40f193f

KVM: Check userspace_addr when modifying a memory slot · 9c695d42

Alex Williamson authored Dec 10, 2012

The API documents that only flags and guest physical memory space can
be modified on an existing slot, but we don't enforce that the
userspace address cannot be modified.  Instead we just ignore it.
This means that a user may think they've successfully moved both the
guest and user addresses, when in fact only the guest address changed.
Check and error instead.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

9c695d42

KVM: Restrict non-existing slot state transitions · f0736cf0

Alex Williamson authored Dec 10, 2012

The API documentation states:

	When changing an existing slot, it may be moved in the guest
	physical memory space, or its flags may be modified.

An "existing slot" requires a non-zero npages (memory_size).  The only
transition we should therefore allow for a non-existing slot should be
to create the slot, which includes setting a non-zero memory_size.  We
currently allow calls to modify non-existing slots, which is pointless,
confusing, and possibly wrong.

With this we know that the invalidation path of __kvm_set_memory_region
is always for a delete or move and never for adding a zero size slot.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f0736cf0

KVM: inject ExtINT interrupt before APIC interrupts · f3200d00

Gleb Natapov authored Dec 10, 2012

According to Intel SDM Volume 3 Section 10.8.1 "Interrupt Handling with
the Pentium 4 and Intel Xeon Processors" and Section 10.8.2 "Interrupt
Handling with the P6 Family and Pentium Processors" ExtINT interrupts are
sent directly to the processor core for handling. Currently KVM checks
APIC before it considers ExtINT interrupts for injection which is
backwards from the spec. Make code behave according to the SDM.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Acked-by: "Zhang, Yang Z" <yang.z.zhang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f3200d00