Commits · ee53e560a8f52cbc1fba877fdf4acffb0c163f29 · Kirill Smelkov / linux

13 Feb, 2013 4 commits

booke: Added DBCR4 SPR number · ee53e560

Bharat Bhushan authored Jan 15, 2013

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

ee53e560

KVM: PPC: booke: Allow multiple exception types · 1d542d9c

Bharat Bhushan authored Jan 15, 2013

Current kvmppc_booke_handlers uses the same macro (KVM_HANDLER) and
all handlers are considered to be the same size. This will not be
the case if we want to use different macros for different handlers.

This patch improves the kvmppc_booke_handler so that it can
support different macros for different handlers.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
[bharat.bhushan@freescale.com: Substantial changes]
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

1d542d9c

KVM: PPC: booke: use vcpu reference from thread_struct · ffe129ec

Bharat Bhushan authored Jan 15, 2013

Like other places, use thread_struct to get vcpu reference.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

ffe129ec

Merge commit 'origin/next' into kvm-ppc-next · dd92d6f2
Alexander Graf authored Feb 13, 2013

dd92d6f2

06 Feb, 2013 4 commits

KVM: VMX: add missing exit names to VMX_EXIT_REASONS array · b0da5bec
Gleb Natapov authored Feb 03, 2013
```
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
```
b0da5bec

KVM: VMX: disable SMEP feature when guest is in non-paging mode · c08800a5

Dongxiao Xu authored Feb 04, 2013

SMEP is disabled if CPU is in non-paging mode in hardware.
However KVM always uses paging mode to emulate guest non-paging
mode with TDP. To emulate this behavior, SMEP needs to be manually
disabled when guest switches to non-paging mode.

We met an issue that, SMP Linux guest with recent kernel (enable
SMEP support, for example, 3.5.3) would crash with triple fault if
setting unrestricted_guest=0. This is because KVM uses an identity
mapping page table to emulate the non-paging mode, where the page
table is set with USER flag. If SMEP is still enabled in this case,
guest will meet unhandlable page fault and then crash.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

c08800a5

KVM: Remove duplicate text in api.txt · 4293b5e5

Geoff Levand authored Jan 31, 2013

Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

4293b5e5

Revert "KVM: MMU: split kvm_mmu_free_page" · 834be0d8

Gleb Natapov authored Jan 30, 2013

This reverts commit bd4c86ea.

There is not user for kvm_mmu_isolate_page() any more.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

834be0d8

05 Feb, 2013 7 commits

KVM: MMU: drop superfluous is_present_gpte() check. · eb3fce87

Gleb Natapov authored Jan 30, 2013

Gust page walker puts only present ptes into ptes[] array. No need to
check it again.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

eb3fce87

KVM: MMU: drop superfluous min() call. · 116eb3d3

Gleb Natapov authored Jan 30, 2013

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

116eb3d3

KVM: MMU: set base_role.nxe during mmu initialization. · 2c9afa52

Gleb Natapov authored Jan 30, 2013

Move base_role.nxe initialisation to where all other roles are initialized.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

2c9afa52

KVM: MMU: drop unneeded checks. · 9bb4f6b1

Gleb Natapov authored Jan 30, 2013

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

9bb4f6b1

KVM: MMU: make spte_is_locklessly_modifiable() more clear · feb3eb70

Gleb Natapov authored Jan 30, 2013

spte_is_locklessly_modifiable() checks that both SPTE_HOST_WRITEABLE and
SPTE_MMU_WRITEABLE are present on spte. Make it more explicit.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

feb3eb70

KVM: set_memory_region: Disallow changing read-only attribute later · 75d61fbc

Takuya Yoshikawa authored Jan 30, 2013

As Xiao pointed out, there are a few problems with it:
 - kvm_arch_commit_memory_region() write protects the memory slot only
   for GET_DIRTY_LOG when modifying the flags.
 - FNAME(sync_page) uses the old spte value to set a new one without
   checking KVM_MEM_READONLY flag.

Since we flush all shadow pages when creating a new slot, the simplest
fix is to disallow such problematic flag changes: this is safe because
no one is doing such things.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

75d61fbc

KVM: set_memory_region: Identify the requested change explicitly · f64c0398

Takuya Yoshikawa authored Jan 29, 2013

KVM_SET_USER_MEMORY_REGION forces __kvm_set_memory_region() to identify
what kind of change is being requested by checking the arguments. The
current code does this checking at various points in code and each
condition being used there is not easy to understand at first glance.

This patch consolidates these checks and introduces an enum to name the
possible changes to clean up the code.

Although this does not introduce any functional changes, there is one
change which optimizes the code a bit: if we have nothing to change, the
new code returns 0 immediately.

Note that the return value for this case cannot be changed since QEMU
relies on it: we noticed this when we changed it to -EINVAL and got a
section mismatch error at the final stage of live migration.
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f64c0398

30 Jan, 2013 3 commits

s390/kvm: Fix instruction decoding · 0c29b229

Christian Borntraeger authored Jan 25, 2013

Instructions with long displacement have a signed displacement.
Currently the sign bit is interpreted as 2^20: Lets fix it by doing the
sign extension from 20bit to 32bit and then use it as a signed variable
in the addition (see kvm_s390_get_base_disp_rsy).

Furthermore, there are lots of "int" in that code. This is problematic,
because shifting on a signed integer is undefined/implementation defined
if the bit value happens to be negative.
Fortunately the promotion rules will make the right hand side unsigned
anyway, so there is no real problem right now.
Let's convert them anyway to unsigned where appropriate to avoid
problems if the code is changed or copy/pasted later on.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

0c29b229

s390/virtio-ccw: Fix setup_vq error handling. · c98d3683

Cornelia Huck authored Jan 25, 2013

virtio_ccw_setup_vq() failed to unwind correctly on errors. In
particular, it failed to delete the virtqueue on errors, leading to
list corruption when virtio_ccw_del_vqs() iterated over a virtqueue
that had not been added to the vcdev's list.

Fix this with redoing the error unwinding in virtio_ccw_setup_vq(),
using a single path for all errors.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

c98d3683

s390/kvm: Fix store status for ACRS/FPRS · 15bc8d84

Christian Borntraeger authored Jan 25, 2013

On store status we need to copy the current state of registers
into a save area. Currently we might save stale versions:
The sie state descriptor doesnt have fields for guest ACRS,FPRS,
those registers are simply stored in the host registers. The host
program must copy these away if needed. We do that in vcpu_put/load.

If we now do a store status in KVM code between vcpu_put/load, the
saved values are not up-to-date. Lets collect the ACRS/FPRS before
saving them.

This also fixes some strange problems with hotplug and virtio-ccw,
since the low level machine check handler (on hotplug a machine check
will happen) will revalidate all registers with the content of the
save area.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
CC: stable@vger.kernel.org
Signed-off-by: Gleb Natapov <gleb@redhat.com>

15bc8d84

29 Jan, 2013 5 commits

kvm: Handle yield_to failure return code for potential undercommit case · c45c528e

Raghavendra K T authored Jan 22, 2013

yield_to returns -ESRCH, When source and target of yield_to
run queue length is one. When we see three successive failures of
yield_to we assume we are in potential undercommit case and abort
from PLE handler.
The assumption is backed by low probability of wrong decision
for even worst case scenarios such as average runqueue length
between 1 and 2.

More detail on rationale behind using three tries:
if p is the probability of finding rq length one on a particular cpu,
and if we do n tries, then probability of exiting ple handler is:

 p^(n+1) [ because we would have come across one source with rq length
1 and n target cpu rqs  with length 1 ]

so
num tries:         probability of aborting ple handler (1.5x overcommit)
 1                 1/4
 2                 1/8
 3                 1/16

We can increase this probability with more tries, but the problem is
the overhead.
Also, If we have tried three times that means we would have iterated
over 3 good eligible vcpus along with many non-eligible candidates. In
worst case if we iterate all the vcpus, we reduce 1x performance and
overcommit performance get hit.

note that we do not update last boosted vcpu in failure cases.
Thank Avi for raising question on aborting after first fail from yield_to.
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

c45c528e

sched: Bail out of yield_to when source and target runqueue has one task · 7b270f60

Peter Zijlstra authored Jan 22, 2013

In case of undercomitted scenarios, especially in large guests
yield_to overhead is significantly high. when run queue length of
source and target is one, take an opportunity to bail out and return
-ESRCH. This return condition can be further exploited to quickly come
out of PLE handler.

(History: Raghavendra initially worked on break out of kvm ple handler upon
 seeing source runqueue length = 1, but it had to export rq length).
 Peter came up with the elegant idea of return -ESRCH in scheduler core.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Raghavendra, Checking the rq length of target vcpu condition added.(thanks Avi)
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Andrew Jones <drjones@redhat.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

7b270f60

x86, apicv: add virtual interrupt delivery support · c7c9c56c

Yang Zhang authored Jan 25, 2013

Virtual interrupt delivery avoids KVM to inject vAPIC interrupts
manually, which is fully taken care of by the hardware. This needs
some special awareness into existing interrupr injection path:

- for pending interrupt, instead of direct injection, we may need
  update architecture specific indicators before resuming to guest.

- A pending interrupt, which is masked by ISR, should be also
  considered in above update action, since hardware will decide
  when to inject it at right time. Current has_interrupt and
  get_interrupt only returns a valid vector from injection p.o.v.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

c7c9c56c

x86, apicv: add virtual x2apic support · 8d14695f

Yang Zhang authored Jan 25, 2013

basically to benefit from apicv, we need to enable virtualized x2apic mode.
Currently, we only enable it when guest is really using x2apic.

Also, clear MSR bitmap for corresponding x2apic MSRs when guest enabled x2apic:
0x800 - 0x8ff: no read intercept for apicv register virtualization,
               except APIC ID and TMCCT which need software's assistance to
               get right value.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

8d14695f

x86, apicv: add APICv register virtualization support · 83d4c286

Yang Zhang authored Jan 25, 2013

- APIC read doesn't cause VM-Exit
- APIC write becomes trap-like
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

83d4c286

27 Jan, 2013 3 commits

kvm: Obey read-only mappings in iommu · d47510e2

Alex Williamson authored Jan 24, 2013

We've been ignoring read-only mappings and programming everything
into the iommu as read-write.  Fix this to only include the write
access flag when read-only is not set.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

d47510e2

kvm: Force IOMMU remapping on memory slot read-only flag changes · 261874b0

Alex Williamson authored Jan 24, 2013

Memory slot flags can be altered without changing other parameters of
the slot.  The read-only attribute is the only one the IOMMU cares
about, so generate an un-map, re-map when this occurs.  This also
avoid unnecessarily re-mapping the slot when no IOMMU visible changes
are made.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

261874b0

KVM: x86 emulator: fix test_cc() build failure on i386 · 3f0c3d0b

Avi Kivity authored Jan 26, 2013

'pushq' doesn't exist on i386.  Replace with 'push', which should work
since the operand is a register.
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

3f0c3d0b

24 Jan, 2013 14 commits

KVM: PPC: E500: Remove kvmppc_e500_tlbil_all usage from guest TLB code · b9e3e208

Alexander Graf authored Jan 18, 2013

The guest TLB handling code should not have any insight into how the host
TLB shadow code works.

kvmppc_e500_tlbil_all() is a function that is used for distinction between
e500v2 and e500mc (E.HV) on how to flush shadow entries. This function really
is private between the e500.c/e500mc.c file and e500_mmu_host.c.

Instead of this one, use the public kvmppc_core_flush_tlb() function to flush
all shadow TLB entries. As a nice side effect, with this we also end up
flushing TLB1 entries which we forgot to do before.
Signed-off-by: Alexander Graf <agraf@suse.de>

b9e3e208

KVM: PPC: E500: Make clear_tlb_refs and clear_tlb1_bitmap static · 483ba97c

Alexander Graf authored Jan 18, 2013

Host shadow TLB flushing is logic that the guest TLB code should have
no insight about. Declare the internal clear_tlb_refs and clear_tlb1_bitmap
functions static to the host TLB handling file.

Instead of these, we can use the already exported kvmppc_core_flush_tlb().
This gives us a common API across the board to say "please flush any
pending host shadow translation".
Signed-off-by: Alexander Graf <agraf@suse.de>

483ba97c

KVM: PPC: e500: Implement TLB1-in-TLB0 mapping · c015c62b

Alexander Graf authored Jan 17, 2013

When a host mapping fault happens in a guest TLB1 entry today, we
map the translated guest entry into the host's TLB1.

This isn't particularly clever when the guest is mapped by normal 4k
pages, since these would be a lot better to put into TLB0 instead.

This patch adds the required logic to map 4k TLB1 shadow maps into
the host's TLB0.
Signed-off-by: Alexander Graf <agraf@suse.de>

c015c62b

KVM: PPC: E500: Split host and guest MMU parts · b71c9e2f

Alexander Graf authored Jan 11, 2013

This patch splits the file e500_tlb.c into e500_mmu.c (guest TLB handling)
and e500_mmu_host.c (host TLB handling).

The main benefit of this split is readability and maintainability. It's
just a lot harder to write dirty code :).
Signed-off-by: Alexander Graf <agraf@suse.de>

b71c9e2f

KVM: PPC: e500: Call kvmppc_mmu_map for initial mapping · 9d98b3ff

Alexander Graf authored Jan 17, 2013

When emulating tlbwe, we want to automatically map the entry that just got
written in our shadow TLB map, because chances are quite high that it's
going to be used very soon.

Today this happens explicitly, duplicating all the logic that is in
kvmppc_mmu_map() already. Just call that one instead.
Signed-off-by: Alexander Graf <agraf@suse.de>

9d98b3ff

KVM: PPC: E500: Propagate errors when shadow mapping · 2c378fd7

Alexander Graf authored Jan 18, 2013

When shadow mapping a page, mapping this page can fail. In that case we
don't have a shadow map.

Take this case into account, otherwise we might end up writing bogus TLB
entries into the host TLB.

While at it, also move the write_stlbe() calls into the respective TLBn
handlers.
Signed-off-by: Alexander Graf <agraf@suse.de>

2c378fd7

KVM: PPC: E500: Explicitly mark shadow maps invalid · 523f0e54

Alexander Graf authored Jan 18, 2013

When we invalidate shadow TLB maps on the host, we don't mark them
as not valid. But we should.

Fix this by removing the E500_TLB_VALID from their flags when
invalidating.
Signed-off-by: Alexander Graf <agraf@suse.de>

523f0e54

KVM: PPC: E500: Move write_stlbe higher · 9445ef01

Alexander Graf authored Jan 18, 2013

Later patches want to call the function and it doesn't have
dependencies on anything below write_host_tlbe.

Move it higher up in the file.
Signed-off-by: Alexander Graf <agraf@suse.de>

9445ef01

KVM: VMX: set vmx->emulation_required only when needed. · 14168786

Gleb Natapov authored Jan 21, 2013

If emulate_invalid_guest_state=false vmx->emulation_required is never
actually used, but it ends up to be always set to true since
handle_invalid_guest_state(), the only place it is reset back to
false, is never called. This, besides been not very clean, makes vmexit
and vmentry path to check emulate_invalid_guest_state needlessly.

The patch fixes that by keeping emulation_required coherent with
emulate_invalid_guest_state setting.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

14168786

KVM: x86: fix use of uninitialized memory as segment descriptor in emulator. · 378a8b09

Gleb Natapov authored Jan 21, 2013

If VMX reports segment as unusable, zero descriptor passed by the emulator
before returning. Such descriptor will be considered not present by the
emulator.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

378a8b09

KVM: VMX: rename fix_pmode_dataseg to fix_pmode_seg. · 91b0aa2c

Gleb Natapov authored Jan 21, 2013

The function deals with code segment too.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

91b0aa2c

KVM: VMX: don't clobber segment AR of unusable segments. · 25391454

Gleb Natapov authored Jan 21, 2013

Usability is returned in unusable field, so not need to clobber entire
AR. Callers have to know how to deal with unusable segments already
since if emulate_invalid_guest_state=true AR is not zeroed.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

25391454

KVM: VMX: skip vmx->rmode.vm86_active check on cr0 write if unrestricted guest is enabled · 218e763f

Gleb Natapov authored Jan 21, 2013

vmx->rmode.vm86_active is never true is unrestricted guest is enabled.
Make it more explicit that neither enter_pmode() nor enter_rmode() is
called in this case.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

218e763f

KVM: VMX: remove hack that disables emulation on vcpu reset/init · 286da415

Gleb Natapov authored Jan 21, 2013

There is no reason for it. If state is suitable for vmentry it
will be detected during guest entry and no emulation will happen.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

286da415