Commits · a87036add09283e6c4f4103a15c596c67b86ab86 · Kirill Smelkov / linux

09 Mar, 2016 14 commits

KVM: x86: disable MPX if host did not enable MPX XSAVE features · a87036ad

Paolo Bonzini authored Mar 08, 2016

When eager FPU is disabled, KVM will still see the MPX bit in CPUID and
presumably the MPX vmentry and vmexit controls.  However, it will not
be able to expose the MPX XSAVE features to the guest, because the guest's
accessible XSAVE features are always a subset of host_xcr0.

In this case, we should disable the MPX CPUID bit, the BNDCFGS MSR,
and the MPX vmentry and vmexit controls for nested virtualization.
It is then unnecessary to enable guest eager FPU if the guest has the
MPX CPUID bit set.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a87036ad

Merge tag 'kvm-arm-for-4.6' of... · ab92f308

Paolo Bonzini authored Mar 09, 2016

Merge tag 'kvm-arm-for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM updates for 4.6

- VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
- PMU support for guests
- 32bit world switch rewritten in C
- Various optimizations to the vgic save/restore code

Conflicts:
	include/uapi/linux/kvm.h

ab92f308

arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit · b40c4892

Marc Zyngier authored Feb 09, 2016

So far, we're always writing all possible LRs, setting the empty
ones with a zero value. This is obvious doing a low of work for
nothing, and we're better off clearing those we've actually
dirtied on the exit path (it is very rare to inject more than one
interrupt at a time anyway).
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

b40c4892

arm64: KVM: vgic-v3: Reset LRs at boot time · 0d98d00b

Marc Zyngier authored Mar 03, 2016

In order to let the GICv3 code be more lazy in the way it
accesses the LRs, it is necessary to start with a clean slate.

Let's reset the LRs on each CPU when the vgic is probed (which
includes a round trip to EL2...).
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

0d98d00b

arm64: KVM: vgic-v3: Do not save an LR known to be empty · 84e8b9c8

Marc Zyngier authored Feb 09, 2016

On exit, any empty LR will be signaled in ICH_ELRSR_EL2. Which
means that we do not have to save it, and we can just clear
its state in the in-memory copy.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

84e8b9c8

arm64: KVM: vgic-v3: Save maintenance interrupt state only if required · b4344545

Marc Zyngier authored Feb 09, 2016

Next on our list of useless accesses is the maintenance interrupt
status registers (ICH_MISR_EL2, ICH_EISR_EL2).

It is pointless to save them if we haven't asked for a maintenance
interrupt the first place, which can only happen for two reasons:
- Underflow: ICH_HCR_UIE will be set,
- EOI: ICH_LR_EOI will be set.

These conditions can be checked on the in-memory copies of the regs.
Should any of these two condition be valid, we must read GICH_MISR.
We can then check for ICH_MISR_EOI, and only when set read
ICH_EISR_EL2.

This means that in most case, we don't have to save them at all.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

b4344545

arm64: KVM: vgic-v3: Avoid accessing ICH registers · 1b8e83c0

Marc Zyngier authored Feb 17, 2016

Just like on GICv2, we're a bit hammer-happy with GICv3, and access
them more often than we should.

Adopt a policy similar to what we do for GICv2, only save/restoring
the minimal set of registers. As we don't access the registers
linearly anymore (we may skip some), the convoluted accessors become
slightly simpler, and we can drop the ugly indexing macro that
tended to confuse the reviewers.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

1b8e83c0

KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit · 667a87a9

Marc Zyngier authored Feb 09, 2016

The GICD_SGIR register lives a long way from the beginning of
the handler array, which is searched linearly. As this is hit
pretty often, let's move it up. This saves us some precious
cycles when the guest is generating IPIs.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

667a87a9

KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit · cc1daf0b

Marc Zyngier authored Feb 09, 2016

So far, we're always writing all possible LRs, setting the empty
ones with a zero value. This is obvious doing a lot of work for
nothing, and we're better off clearing those we've actually
dirtied on the exit path (it is very rare to inject more than one
interrupt at a time anyway).
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

cc1daf0b

KVM: arm/arm64: vgic-v2: Reset LRs at boot time · d6400d77

Marc Zyngier authored Mar 03, 2016

In order to let make the GICv2 code more lazy in the way it
accesses the LRs, it is necessary to start with a clean slate.

Let's reset the LRs on each CPU when the vgic is probed.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

d6400d77

KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty · f8cfbce1

Marc Zyngier authored Feb 09, 2016

On exit, any empty LR will be signaled in GICH_ELRSR*. Which
means that we do not have to save it, and we can just clear
its state in the in-memory copy.

Take this opportunity to move the LR saving code into its
own function.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

f8cfbce1

KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function · 2a1044f8

Marc Zyngier authored Feb 09, 2016

In order to make the saving path slightly more readable and
prepare for some more optimizations, let's move the GICH_ELRSR
saving to its own function.

No functional change.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

2a1044f8

KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required · c813bb17

Marc Zyngier authored Feb 09, 2016

Next on our list of useless accesses is the maintenance interrupt
status registers (GICH_MISR, GICH_EISR{0,1}).

It is pointless to save them if we haven't asked for a maintenance
interrupt the first place, which can only happen for two reasons:
- Underflow: GICH_HCR_UIE will be set,
- EOI: GICH_LR_EOI will be set.

These conditions can be checked on the in-memory copies of the regs.
Should any of these two condition be valid, we must read GICH_MISR.
We can then check for GICH_MISR_EOI, and only when set read
GICH_EISR*.

This means that in most case, we don't have to save them at all.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

c813bb17

KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers · 59f00ff9

Marc Zyngier authored Feb 02, 2016

GICv2 registers are *slow*. As in "terrifyingly slow". Which is bad.
But we're equaly bad, as we make a point in accessing them even if
we don't have any interrupt in flight.

A good solution is to first find out if we have anything useful to
write into the GIC, and if we don't, to simply not do it. This
involves tracking which LRs actually have something valid there.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

59f00ff9

08 Mar, 2016 9 commits

KVM: MMU: micro-optimize gpte_access · bb9eadf0

Paolo Bonzini authored Feb 23, 2016

Avoid AND-NOT, most x86 processor lack an instruction for it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

bb9eadf0

KVM: MMU: simplify last_pte_bitmap · 6bb69c9b

Paolo Bonzini authored Feb 23, 2016

Branch-free code is fun and everybody knows how much Avi loves it,
but last_pte_bitmap takes it a bit to the extreme.  Since the code
is simply doing a range check, like

	(level == 1 ||
	 ((gpte & PT_PAGE_SIZE_MASK) && level < N)

we can make it branch-free without storing the entire truth table;
it is enough to cache N.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6bb69c9b

KVM: MMU: coalesce more page zapping in mmu_sync_children · 50c9e6f3

Paolo Bonzini authored Feb 25, 2016

mmu_sync_children can only process up to 16 pages at a time.  Check
if we need to reschedule, and do not bother zapping the pages until
that happens.
Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

50c9e6f3

KVM: MMU: move zap/flush to kvm_mmu_get_page · 2a74003a

Paolo Bonzini authored Feb 24, 2016

kvm_mmu_get_page is the only caller of kvm_sync_page_transient
and kvm_sync_pages.  Moving the handling of the invalid_list there
removes the need for the underdocumented kvm_sync_page_transient
function.
Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2a74003a

KVM: MMU: invert return value of mmu.sync_page and *kvm_sync_page* · 1f50f1b3

Paolo Bonzini authored Feb 24, 2016

Return true if the page was synced (and the TLB must be flushed)
and false if the page was zapped.
Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1f50f1b3

KVM: MMU: cleanup __kvm_sync_page and its callers · 9a43c5d9

Paolo Bonzini authored Feb 24, 2016

Calling kvm_unlink_unsync_page in the middle of __kvm_sync_page makes
things unnecessarily tricky.  If kvm_mmu_prepare_zap_page is called,
it will call kvm_unlink_unsync_page too.  So kvm_unlink_unsync_page can
be called just as well at the beginning or the end of __kvm_sync_page...
which means that we might do it in kvm_sync_page too and remove the
parameter.

kvm_sync_page ends up being the same code that kvm_sync_pages used
to have before the previous patch.
Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9a43c5d9

KVM: MMU: use kvm_sync_page in kvm_sync_pages · df748f86

Paolo Bonzini authored Feb 24, 2016

If the last argument is true, kvm_unlink_unsync_page is called anyway in
__kvm_sync_page (either by kvm_mmu_prepare_zap_page or by __kvm_sync_page
itself).  Therefore, kvm_sync_pages can just call kvm_sync_page, instead
of going through kvm_unlink_unsync_page+__kvm_sync_page.
Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

df748f86

KVM: MMU: move TLB flush out of __kvm_sync_page · 35a70510

Paolo Bonzini authored Feb 24, 2016

By doing this, kvm_sync_pages can use __kvm_sync_page instead of
reinventing it.  Because of kvm_mmu_flush_or_zap, the code does not
end up being more complex than before, and more cleanups to kvm_sync_pages
will come in the next patches.
Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

35a70510

KVM: MMU: introduce kvm_mmu_flush_or_zap · b8c67b7a

Paolo Bonzini authored Feb 24, 2016

This is a generalization of mmu_pte_write_flush_tlb, that also
takes care of calling kvm_mmu_commit_zap_page.  The next
patches will introduce more uses.
Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b8c67b7a

04 Mar, 2016 17 commits

KVM: i8254: drop local copy of mul_u64_u32_div · 0e4d4415

Paolo Bonzini authored Mar 04, 2016

A function that does the same as i8254.c's muldiv64 has been added
(for KVM's own use, in fact!) in include/linux/math64.h.  Use it
instead of muldiv64.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0e4d4415

KVM: MMU: check kvm_mmu_pages and mmu_page_path indices · e23d3fef

Xiao Guangrong authored Feb 24, 2016

Give a special invalid index to the root of the walk, so that we
can check the consistency of kvm_mmu_pages and mmu_page_path.
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
[Extracted from a bigger patch proposed by Guangrong. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e23d3fef

KVM: MMU: Fix ubsan warnings · 0a47cd85

Paolo Bonzini authored Feb 23, 2016

kvm_mmu_pages_init is doing some really yucky stuff. It is setting
up a sentinel for mmu_page_clear_parents; however, because of a) the
way levels are numbered starting from 1 and b) the way mmu_page_path
sizes its arrays with PT64_ROOT_LEVEL-1 elements, the access can be
out of bounds. This is harmless because the code overwrites up to the
first two elements of parents->idx and these are initialized, and
because the sentinel is not needed in this case---mmu_page_clear_parents
exits anyway when it gets to the end of the array. However ubsan
complains, and everyone else should too.

This fix does three things. First it makes the mmu_page_path arrays
PT64_ROOT_LEVEL elements in size, so that we can write to them without
checking the level in advance. Second it disintegrates kvm_mmu_pages_init
between mmu_unsync_walk (to reset the struct kvm_mmu_pages) and
for_each_sp (to place the NULL sentinel at the end of the current path).
This is okay because the mmu_page_path is only used in
mmu_pages_clear_parents; mmu_pages_clear_parents itself is called within
a for_each_sp iterator, and hence always after a call to mmu_pages_next.
Third it changes mmu_pages_clear_parents to just use the sentinel to
stop iteration, without checking the bounds on level.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Mike Krinkin <krinkin.m.u@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0a47cd85

KVM: MMU: cleanup handle_abnormal_pfn · 798e88b3

Paolo Bonzini authored Feb 23, 2016

The goto and temporary variable are unnecessary, just use return
statements.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

798e88b3

KVM: VMX: use vmcs_clear/set_bits for debug register exits · 8f22372f

Paolo Bonzini authored Feb 26, 2016

Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8f22372f

KVM: ensure __gfn_to_pfn_memslot initializes *writable · b2740d35

Paolo Bonzini authored Feb 23, 2016

For the kvm_is_error_hva, ubsan complains if the uninitialized writable
is passed to __direct_map, even though the value itself is not used
(__direct_map goes to mmu_set_spte->set_spte->set_mmio_spte but never
looks at that argument).

Ensuring that __gfn_to_pfn_memslot initializes *writable is cheap and
avoids this kind of issue.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b2740d35

KVM: document KVM_REINJECT_CONTROL ioctl · 107d44a2

Radim Krčmář authored Mar 02, 2016

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

107d44a2

KVM: i8254: turn kvm_kpit_state.reinject into atomic_t · a0aace5a

Radim Krčmář authored Mar 02, 2016

Document possible races between readers and concurrent update to the
ioctl.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a0aace5a

KVM: i8254: move PIT timer function initialization · ab4c1476

Radim Krčmář authored Mar 02, 2016

We can do it just once.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ab4c1476

KVM: i8254: don't assume layout of kvm_kpit_state · 34f3941c

Radim Krčmář authored Mar 02, 2016

channels has offset 0 and correct size now, but that can change.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

34f3941c

KVM: i8254: remove pointless dereference of PIT · 4a2095df

Radim Krčmář authored Mar 02, 2016

PIT is known at that point.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4a2095df

KVM: i8254: remove pit and kvm from kvm_kpit_state · a3e13115

Radim Krčmář authored Mar 02, 2016

kvm isn't ever used and pit can be accessed with container_of.
If you *really* need kvm, pit_state_to_pit(ps)->kvm.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a3e13115

KVM: i8254: refactor kvm_free_pit · 08e5ccf3

Radim Krčmář authored Mar 02, 2016

Could be easier to read, but git history will become deeper.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

08e5ccf3

KVM: i8254: refactor kvm_create_pit · 10d24821

Radim Krčmář authored Mar 02, 2016

Locks are gone, so we don't need to duplicate error paths.
Use goto everywhere.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

10d24821

KVM: i8254: remove notifiers from PIT discard policy · 71474e2f

Radim Krčmář authored Mar 02, 2016

Discard policy doesn't rely on information from notifiers, so we don't
need to register notifiers unconditionally.  We kept correct counts in
case userspace switched between policies during runtime, but that can be
avoided by reseting the state.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

71474e2f

KVM: i8254: remove unnecessary uses of PIT state lock · b39c90b6

Radim Krčmář authored Mar 02, 2016

- kvm_create_pit had to lock only because it exposed kvm->arch.vpit very
  early, but initialization doesn't use kvm->arch.vpit since the last
  patch, so we can drop locking.
- kvm_free_pit is only run after there are no users of KVM and therefore
  is the sole actor.
- Locking in kvm_vm_ioctl_reinject doesn't do anything, because reinject
  is only protected at that place.
- kvm_pit_reset isn't used anywhere and its locking can be dropped if we
  hide it.

Removing useless locking allows to see what actually is being protected
by PIT state lock (values accessible from the guest).
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b39c90b6

KVM: i8254: pass struct kvm_pit instead of kvm in PIT · 09edea72

Radim Krčmář authored Mar 02, 2016

This patch passes struct kvm_pit into internal PIT functions.
Those functions used to get PIT through kvm->arch.vpit, even though most
of them never used *kvm for other purposes.  Another benefit is that we
don't need to set kvm->arch.vpit during initialization.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

09edea72