Commits · d157f4a5155f4fbd0d1da66b3d2f504c13bd194d · nexedi / linux

29 Apr, 2013 9 commits

ARM: KVM: perform HYP initilization for hotplugged CPUs · d157f4a5

Marc Zyngier authored Apr 12, 2013

Now that we have the necessary infrastructure to boot a hotplugged CPU
at any point in time, wire a CPU notifier that will perform the HYP
init for the incoming CPU.

Note that this depends on the platform code and/or firmware to boot the
incoming CPU with HYP mode enabled and return to the kernel by following
the normal boot path (HYP stub installed).
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>

d157f4a5

ARM: KVM: switch to a dual-step HYP init code · 5a677ce0

Marc Zyngier authored Apr 12, 2013

Our HYP init code suffers from two major design issues:
- it cannot support CPU hotplug, as we tear down the idmap very early
- it cannot perform a TLB invalidation when switching from init to
  runtime mappings, as pages are manipulated from PL1 exclusively

The hotplug problem mandates that we keep two sets of page tables
(boot and runtime). The TLB problem mandates that we're able to
transition from one PGD to another while in HYP, invalidating the TLBs
in the process.

To be able to do this, we need to share a page between the two page
tables. A page that will have the same VA in both configurations. All we
need is a VA that has the following properties:
- This VA can't be used to represent a kernel mapping.
- This VA will not conflict with the physical address of the kernel text

The vectors page seems to satisfy this requirement:
- The kernel never maps anything else there
- The kernel text being copied at the beginning of the physical memory,
  it is unlikely to use the last 64kB (I doubt we'll ever support KVM
  on a system with something like 4MB of RAM, but patches are very
  welcome).

Let's call this VA the trampoline VA.

Now, we map our init page at 3 locations:
- idmap in the boot pgd
- trampoline VA in the boot pgd
- trampoline VA in the runtime pgd

The init scenario is now the following:
- We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
  runtime stack, runtime vectors
- Enable the MMU with the boot pgd
- Jump to a target into the trampoline page (remember, this is the same
  physical page!)
- Now switch to the runtime pgd (same VA, and still the same physical
  page!)
- Invalidate TLBs
- Set stack and vectors
- Profit! (or eret, if you only care about the code).

Note that we keep the boot mapping permanently (it is not strictly an
idmap anymore) to allow for CPU hotplug in later patches.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>

5a677ce0

ARM: KVM: rework HYP page table freeing · 4f728276

Marc Zyngier authored Apr 12, 2013

There is no point in freeing HYP page tables differently from Stage-2.
They now have the same requirements, and should be dealt with the same way.

Promote unmap_stage2_range to be The One True Way, and get rid of a number
of nasty bugs in the process (good thing we never actually called free_hyp_pmds
before...).
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>

4f728276

ARM: KVM: enforce maximum size for identity mapped code · 0394e1f6

Marc Zyngier authored Apr 12, 2013

We're about to move to an init procedure where we rely on the
fact that the init code fits in a single page. Make sure we
align the idmap text on a vector alignment, and that the code is
not bigger than a single page.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>

0394e1f6

ARM: KVM: move to a KVM provided HYP idmap · 2fb41059

Marc Zyngier authored Apr 12, 2013

After the HYP page table rework, it is pretty easy to let the KVM
code provide its own idmap, rather than expecting the kernel to
provide it. It takes actually less code to do so.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>

2fb41059

ARM: KVM: fix HYP mapping limitations around zero · 3562c76d

Marc Zyngier authored Apr 12, 2013

The current code for creating HYP mapping doesn't like to wrap
around zero, which prevents from mapping anything into the last
page of the virtual address space.

It doesn't take much effort to remove this limitation, making
the code more consistent with the rest of the kernel in the process.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>

3562c76d

ARM: KVM: simplify HYP mapping population · 6060df84

Marc Zyngier authored Apr 12, 2013

The way we populate HYP mappings is a bit convoluted, to say the least.
Passing a pointer around to keep track of the current PFN is quite
odd, and we end-up having two different PTE accessors for no good
reason.

Simplify the whole thing by unifying the two PTE accessors, passing
a pgprot_t around, and moving the various validity checks to the
upper layers.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>

6060df84

ARM: KVM: arch_timer: use symbolic constants · 372b7c1b

Mark Rutland authored Mar 27, 2013

In clocksource/arm_arch_timer.h we define useful symbolic constants.
Let's use them to make the KVM arch_timer code clearer.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <cdall@cs.columbia.edu>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>

372b7c1b

ARM: KVM: add support for minimal host vs guest profiling · 210552c1

Marc Zyngier authored Mar 05, 2013

In order to be able to correctly profile what is happening on the
host, we need to be able to identify when we're running on the guest,
and log these events differently.

Perf offers a simple way to register callbacks into KVM. Mimic what
x86 does and enjoy being able to profile your KVM host.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>

210552c1

28 Apr, 2013 7 commits

KVM: nVMX: Skip PF interception check when queuing during nested run · 5a2892ce

Jan Kiszka authored Apr 28, 2013

While a nested run is pending, vmx_queue_exception is only called to
requeue exceptions that were previously picked up via
vmx_cancel_injection. Therefore, we must not check for PF interception
by L1, possibly causing a bogus nested vmexit.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

5a2892ce

KVM: x86: Increase the "hard" max VCPU limit · cbf64358

Chegu Vinod authored Apr 27, 2013

KVM guests today use 8bit APIC ids allowing for 256 ID's. Reserving one
ID for Broadcast interrupts should leave 255 ID's. In case of KVM there
is no need for reserving another ID for IO-APIC so the hard max limit for
VCPUS can be increased from 254 to 255. (This was confirmed by Gleb Natapov
http://article.gmane.org/gmane.comp.emulators.kvm.devel/99713 )
Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

cbf64358

kvm: Allow build-time configuration of KVM device assignment · 2a5bab10

Alex Williamson authored Apr 16, 2013

We hope to at some point deprecate KVM legacy device assignment in
favor of VFIO-based assignment.  Towards that end, allow legacy
device assignment to be deconfigured.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

2a5bab10

Merge git://github.com/agraf/linux-2.6.git kvm-ppc-next into queue · 064d1afa
Gleb Natapov authored Apr 28, 2013

064d1afa

KVM: x86: Rework request for immediate exit · 730dca42

Jan Kiszka authored Apr 28, 2013

The VMX implementation of enable_irq_window raised
KVM_REQ_IMMEDIATE_EXIT after we checked it in vcpu_enter_guest. This
caused infinite loops on vmentry. Fix it by letting enable_irq_window
signal the need for an immediate exit via its return value and drop
KVM_REQ_IMMEDIATE_EXIT.

This issue only affects nested VMX scenarios.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

730dca42

kvm, svm: Fix typo in printk message · 6614c7d0

Borislav Petkov authored Apr 26, 2013

It is "exit_int_info". It is actually EXITINTINFO in the official docs
but we don't like screaming docs.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

6614c7d0

KVM: VMX: remove unprintable characters from comment · cb0c8cda

Jan Kiszka authored Apr 27, 2013

Slipped in while copy&pasting from the SDM.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

cb0c8cda

26 Apr, 2013 24 commits

KVM: PPC: Book3S: Facilities to save/restore XICS presentation ctrler state · 8b78645c

Paul Mackerras authored Apr 17, 2013

This adds the ability for userspace to save and restore the state
of the XICS interrupt presentation controllers (ICPs) via the
KVM_GET/SET_ONE_REG interface.  Since there is one ICP per vcpu, we
simply define a new 64-bit register in the ONE_REG space for the ICP
state.  The state includes the CPU priority setting, the pending IPI
priority, and the priority and source number of any pending external
interrupt.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

8b78645c

KVM: PPC: Book3S: Add support for ibm,int-on/off RTAS calls · d19bd862

Paul Mackerras authored Apr 17, 2013

This adds support for the ibm,int-on and ibm,int-off RTAS calls to the
in-kernel XICS emulation and corrects the handling of the saved
priority by the ibm,set-xive RTAS call.  With this, ibm,int-off sets
the specified interrupt's priority in its saved_priority field and
sets the priority to 0xff (the least favoured value).  ibm,int-on
restores the saved_priority to the priority field, and ibm,set-xive
sets both the priority and the saved_priority to the specified
priority value.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

d19bd862

KVM: PPC: Book3S HV: Improve real-mode handling of external interrupts · 4619ac88

Paul Mackerras authored Apr 17, 2013

This streamlines our handling of external interrupts that come in
while we're in the guest. First, when waking up a hardware thread
that was napping, we split off the "napping due to H_CEDE" case
earlier, and use the code that handles an external interrupt (0x500)
in the guest to handle that too. Secondly, the code that handles
those external interrupts now checks if any other thread is exiting
to the host before bouncing an external interrupt to the guest, and
also checks that there is actually an external interrupt pending for
the guest before setting the LPCR MER bit (mediated external request).

This also makes sure that we clear the "ceded" flag when we handle a
wakeup from cede in real mode, and fixes a potential infinite loop
in kvmppc_run_vcpu() which can occur if we ever end up with the ceded
flag set but MSR[EE] off.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

4619ac88

KVM: PPC: Book3S HV: Add support for real mode ICP in XICS emulation · e7d26f28

Benjamin Herrenschmidt authored Apr 17, 2013

This adds an implementation of the XICS hypercalls in real mode for HV
KVM, which allows us to avoid exiting the guest MMU context on all
threads for a variety of operations such as fetching a pending
interrupt, EOI of messages, IPIs, etc.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

e7d26f28

KVM: PPC: Book3S HV: Speed up wakeups of CPUs on HV KVM · 54695c30

Benjamin Herrenschmidt authored Apr 17, 2013

Currently, we wake up a CPU by sending a host IPI with
smp_send_reschedule() to thread 0 of that core, which will take all
threads out of the guest, and cause them to re-evaluate their
interrupt status on the way back in.

This adds a mechanism to differentiate real host IPIs from IPIs sent
by KVM for guest threads to poke each other, in order to target the
guest threads precisely when possible and avoid that global switch of
the core to host state.

We then use this new facility in the in-kernel XICS code.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

54695c30

KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller · bc5ad3f3

Benjamin Herrenschmidt authored Apr 17, 2013

This adds in-kernel emulation of the XICS (eXternal Interrupt
Controller Specification) interrupt controller specified by PAPR, for
both HV and PR KVM guests.

The XICS emulation supports up to 1048560 interrupt sources.
Interrupt source numbers below 16 are reserved; 0 is used to mean no
interrupt and 2 is used for IPIs.  Internally these are represented in
blocks of 1024, called ICS (interrupt controller source) entities, but
that is not visible to userspace.

Each vcpu gets one ICP (interrupt controller presentation) entity,
used to store the per-vcpu state such as vcpu priority, pending
interrupt state, IPI request, etc.

This does not include any API or any way to connect vcpus to their
ICP state; that will be added in later patches.

This is based on an initial implementation by Michael Ellerman
<michael@ellerman.id.au> reworked by Benjamin Herrenschmidt and
Paul Mackerras.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix typo, add dependency on !KVM_MPIC]
Signed-off-by: Alexander Graf <agraf@suse.de>

bc5ad3f3

KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls · 8e591cb7

Michael Ellerman authored Apr 17, 2013

For pseries machine emulation, in order to move the interrupt
controller code to the kernel, we need to intercept some RTAS
calls in the kernel itself.  This adds an infrastructure to allow
in-kernel handlers to be registered for RTAS services by name.
A new ioctl, KVM_PPC_RTAS_DEFINE_TOKEN, then allows userspace to
associate token values with those service names.  Then, when the
guest requests an RTAS service with one of those token values, it
will be handled by the relevant in-kernel handler rather than being
passed up to userspace as at present.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix warning]
Signed-off-by: Alexander Graf <agraf@suse.de>

8e591cb7

kvm/ppc/mpic: Eliminate mmio_mapped · 91194919

Scott Wood authored Apr 25, 2013

We no longer need to keep track of this now that MPIC destruction
always happens either during VM destruction (after MMIO has been
destroyed) or during a failed creation (before the fd has been exposed
to userspace, and thus before the MMIO region could have been
registered).
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

91194919

kvm: destroy emulated devices on VM exit · 07f0a7bd

Scott Wood authored Apr 25, 2013

The hassle of getting refcounting right was greater than the hassle
of keeping a list of devices to destroy on VM exit.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

07f0a7bd

KVM: IA64: Carry non-ia64 changes into ia64 · 22e64024

Alexander Graf authored Apr 25, 2013

We changed a few things in non-ia64 code paths. This patch blindly applies
the changes to the ia64 code as well, hoping it proves useful in case anyone
revives the ia64 kvm code.
Signed-off-by: Alexander Graf <agraf@suse.de>

22e64024

KVM: PPC: MPIC: Restrict to e500 platforms · 447a03c0

Alexander Graf authored Apr 17, 2013

The code as is doesn't make any sense on non-e500 platforms. Restrict it
there, so that people don't get wrong ideas on what would actually work.

This patch should get reverted as soon as it's possible to either run e500
guests on non-e500 hosts or the MPIC emulation gains support for non-e500
modes.
Signed-off-by: Alexander Graf <agraf@suse.de>

447a03c0

KVM: PPC: MPIC: Add support for KVM_IRQ_LINE · 5efdb4be

Alexander Graf authored Apr 17, 2013

Now that all pieces are in place for reusing generic irq infrastructure,
we can copy x86's implementation of KVM_IRQ_LINE irq injection and simply
reuse it for PPC, as it will work there just as well.
Signed-off-by: Alexander Graf <agraf@suse.de>

5efdb4be

KVM: PPC: Support irq routing and irqfd for in-kernel MPIC · de9ba2f3

Alexander Graf authored Apr 16, 2013

Now that all the irq routing and irqfd pieces are generic, we can expose
real irqchip support to all of KVM's internal helpers.

This allows us to use irqfd with the in-kernel MPIC.
Signed-off-by: Alexander Graf <agraf@suse.de>

de9ba2f3

kvm/ppc/mpic: add KVM_CAP_IRQ_MPIC · eb1e4f43