Commits · 02d4160fbd765110e836c84c30c7babb5af84a44 · Kirill Smelkov / linux

22 Aug, 2019 8 commits

x86: KVM: add xsetbv to the emulator · 02d4160f

Vitaly Kuznetsov authored Aug 13, 2019

To avoid hardcoding the xsetbv length to '3', we need to support decoding it
in the emulator.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
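
The toy C sketch below shows why a decoded length can be safer than a constant. It assumes that a redundant segment-override prefix in front of XSETBV (0F 01 D1) is accepted and only makes the instruction longer. The function names are hypothetical, and KVM's emulator decodes through its own opcode tables, not like this.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: segment-override prefix bytes (CS, SS, DS, ES, FS, GS). */
static bool is_segment_override(uint8_t b)
{
	return b == 0x2e || b == 0x36 || b == 0x3e ||
	       b == 0x26 || b == 0x64 || b == 0x65;
}

/*
 * Toy decoder: returns the length of an XSETBV (0F 01 D1) at buf, counting
 * any leading segment-override prefixes, or -1 if the bytes are not XSETBV.
 */
static int toy_xsetbv_len(const uint8_t *buf, size_t n)
{
	size_t i = 0;

	while (i < n && is_segment_override(buf[i]))
		i++;
	if (n - i < 3 || buf[i] != 0x0f || buf[i + 1] != 0x01 || buf[i + 2] != 0xd1)
		return -1;
	return (int)(i + 3);
}
```

Without a prefix the result is 3, the hardcoded value. With one prefix byte it is 4.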

x86: KVM: clear interrupt shadow on EMULTYPE_SKIP · 97413d29

Vitaly Kuznetsov authored Aug 13, 2019

When doing x86_emulate_instruction(EMULTYPE_SKIP), the interrupt shadow has to
be cleared if and only if skipping succeeds.

There are two immediate issues:
- In SVM skip_emulated_instruction(), we do not zap the interrupt shadow
  when kvm_emulate_instruction(EMULTYPE_SKIP) is used to advance RIP
  (!nrips).
- In VMX handle_ept_misconfig(), when running as a nested hypervisor
  (the static_cpu_has(X86_FEATURE_HYPERVISOR) case), we forget to clear
  the interrupt shadow.

Note that we intentionally don't handle the case when the skipped
instruction is supposed to prolong the interrupt shadow ("MOV/POP SS") as
skip-emulation of those instructions should not happen under normal
circumstances.
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
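
Here is a minimal sketch of the rule "clear the interrupt shadow if and only if the skip succeeds". It uses a hypothetical toy_vcpu struct and a stand-in decoder callback, not KVM's real data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified vCPU state for illustration only. */
struct toy_vcpu {
	uint64_t rip;
	unsigned int interrupt_shadow;  /* nonzero: STI/MOV SS shadow active */
};

/* Stand-in for the decoder: returns the instruction length, or 0 on failure. */
typedef unsigned int (*decode_len_fn)(const struct toy_vcpu *vcpu);

/*
 * Skip the current instruction. RIP is advanced and the interrupt shadow
 * is cleared only when decoding succeeded; on failure neither is touched.
 */
static bool toy_skip_instruction(struct toy_vcpu *vcpu, decode_len_fn decode)
{
	unsigned int len = decode(vcpu);

	if (len == 0)
		return false;
	vcpu->rip += len;
	vcpu->interrupt_shadow = 0;
	return true;
}

/* Stand-in decoders for the two outcomes. */
static unsigned int decode_ok(const struct toy_vcpu *v) { (void)v; return 3; }
static unsigned int decode_fail(const struct toy_vcpu *v) { (void)v; return 0; }
```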

x86: kvm: svm: propagate errors from skip_emulated_instruction() · f8ea7c60

Vitaly Kuznetsov authored Aug 13, 2019

On AMD, kvm_x86_ops->skip_emulated_instruction(vcpu) can, in theory,
fail: in the !nrips case we call kvm_emulate_instruction(EMULTYPE_SKIP).
Currently, we only do printk(KERN_DEBUG) when this happens and this
is not ideal. Propagate the error up the stack.

On VMX, skip_emulated_instruction() doesn't fail. It has two explicit
call sites, handle_exception_nmi() and handle_task_switch(), and both
can simply ignore the result.

On SVM, there are also two explicit call sites. In svm_queue_exception()
it seems nothing needs to change, as it already checks whether RIP was
advanced. In task_switch_interception(), however, we are better off not
proceeding to kvm_task_switch() if skip_emulated_instruction() failed.
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
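
The task_switch_interception() change can be sketched as follows. The toy_* names are hypothetical. The convention that the skip helper returns nonzero on success is an assumption made for this illustration.

```c
#include <stdbool.h>

/* Hypothetical state so the control flow can be observed. */
struct toy_state {
	bool skip_succeeds;
	int task_switches;
};

/* Stand-in for skip_emulated_instruction(): 1 on success, 0 on failure (assumed). */
static int toy_skip_emulated_instruction(struct toy_state *s)
{
	return s->skip_succeeds ? 1 : 0;
}

/*
 * Sketch of the task_switch_interception() pattern: if skipping the
 * instruction failed, bail out instead of going on to the task switch.
 */
static int toy_task_switch_interception(struct toy_state *s)
{
	if (!toy_skip_emulated_instruction(s))
		return 0;
	s->task_switches++;    /* stands in for kvm_task_switch() */
	return 1;
}
```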

x86: KVM: svm: don't pretend to advance RIP in case wrmsr_interception() results in #GP · 05402f64

Vitaly Kuznetsov authored Aug 13, 2019

svm->next_rip is only used by skip_emulated_instruction(), and when
kvm_set_msr() fails we rightfully don't call it. Move the svm->next_rip
advancement to the 'else' branch to avoid creating the false impression
that it is always advanced (and to make it look like rdmsr_interception()).

This is a preparatory change to removing hardcoded RIP advancement
from instruction intercepts, no functional change.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
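
Here is a sketch of the resulting shape, with hypothetical toy_* names. next_rip is computed only on the success path, so a #GP leaves it untouched. WRMSR is encoded as 0F 30, which is why this toy version advances by 2 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_svm {
	uint64_t rip;        /* address of the WRMSR */
	uint64_t next_rip;   /* only meaningful when the instruction completes */
	bool gp_injected;
};

/* Stand-in for kvm_set_msr(): nonzero means the write faulted. */
static int toy_set_msr(uint32_t msr, uint64_t data)
{
	(void)data;
	return msr == 0xdeadbeef;   /* hypothetical "invalid MSR" */
}

static void toy_wrmsr_interception(struct toy_svm *svm, uint32_t msr, uint64_t data)
{
	if (toy_set_msr(msr, data)) {
		svm->gp_injected = true;       /* #GP: RIP is not advanced */
	} else {
		svm->next_rip = svm->rip + 2;  /* WRMSR (0F 30) is two bytes */
	}
}
```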

KVM: x86: Fix x86_decode_insn() return when fetching insn bytes fails · 016cd759

Sean Christopherson authored Aug 15, 2019

Jump to the common error handling in x86_decode_insn() if
__do_insn_fetch_bytes() fails so that its error code is converted to the
appropriate return type.  Although the various helpers used by
x86_decode_insn() return X86EMUL_* values, x86_decode_insn() itself
returns EMULATION_FAILED or EMULATION_OK.

This doesn't cause a functional issue as the sole caller,
x86_emulate_instruction(), currently only cares about success vs.
failure, and success is indicated by '0' for both types
(X86EMUL_CONTINUE and EMULATION_OK).

Fixes: 285ca9e9 ("KVM: emulate: speed up do_insn_fetch")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
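
The conversion at the common exit can be sketched like this. The numeric values are assumed to match the kvm_emulate.h definitions of that time and should be checked against the tree. The toy_* helpers are hypothetical. Because both success codes are 0, the missing conversion did not change behaviour.

```c
/* Assumed values, mirroring arch/x86/include/asm/kvm_emulate.h of that era. */
#define X86EMUL_CONTINUE     0
#define X86EMUL_UNHANDLEABLE 1

#define EMULATION_FAILED -1
#define EMULATION_OK      0

/* Hypothetical fetch helper returning an X86EMUL_* code. */
static int toy_fetch_bytes(int fail)
{
	return fail ? X86EMUL_UNHANDLEABLE : X86EMUL_CONTINUE;
}

/* Sketch: route every helper failure through one exit that converts codes. */
static int toy_decode_insn(int fetch_fails)
{
	int rc = toy_fetch_bytes(fetch_fails);

	if (rc != X86EMUL_CONTINUE)
		goto done;
	/* ... decoding would continue here ... */
done:
	return (rc != X86EMUL_CONTINUE) ? EMULATION_FAILED : EMULATION_OK;
}
```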

KVM: x86: use Intel speculation bugs and features as derived in generic x86 code · 0c54914d

Paolo Bonzini authored Aug 19, 2019

Similar to AMD bits, set the Intel bits from the vendor-independent
feature and bug flags, because KVM_GET_SUPPORTED_CPUID does not care
about the vendor and they should be set on AMD processors as well.
Suggested-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: always expose VIRT_SSBD to guests · 50896de4

Paolo Bonzini authored Aug 15, 2019

Even though it is preferable to use SPEC_CTRL (represented by
X86_FEATURE_AMD_SSBD) instead of VIRT_SPEC, VIRT_SPEC is always
supported anyway because otherwise it would be impossible to
migrate from old to new CPUs.  Make this apparent in the
result of KVM_GET_SUPPORTED_CPUID as well.

However, we need to hide the bit on Intel processors, so move
the setting to svm_set_supported_cpuid.

Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reported-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: fix reporting of AMD speculation bug CPUID leaf · 4c6903a0

Paolo Bonzini authored Aug 14, 2019

The AMD_* bits have to be set from the vendor-independent
feature and bug flags, because KVM_GET_SUPPORTED_CPUID does not care
about the vendor and they should be set on Intel processors as well.
On top of this, the SSBD, STIBP and AMD_SSB_NO bits were not set, and
VIRT_SSBD does not have to be added manually because it is a
cpufeature that comes directly from the host's CPUID bit.
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
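
The sketch below derives the CPUID 0x80000008 EBX speculation bits from vendor-independent host flags. The bit positions follow AMD's documentation as far as I know (IBPB 12, IBRS 14, STIBP 15, SSBD 24, SSB_NO 26) and should be verified against the vendor manual. toy_host_caps is a hypothetical stand-in for the kernel's feature and bug flags. Note that nothing in the function looks at the CPU vendor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CPUID 0x80000008 EBX (AMD speculation controls). */
#define AMD_IBPB_BIT    12
#define AMD_IBRS_BIT    14
#define AMD_STIBP_BIT   15
#define AMD_SSBD_BIT    24
#define AMD_SSB_NO_BIT  26

/* Hypothetical vendor-independent view of host features and bugs. */
struct toy_host_caps {
	bool ibpb, ibrs, stibp, ssbd;
	bool bug_spec_store_bypass;
};

/* Derive the AMD bits from generic flags, regardless of CPU vendor. */
static uint32_t toy_amd_spec_ebx(const struct toy_host_caps *c)
{
	uint32_t ebx = 0;

	if (c->ibpb)
		ebx |= 1u << AMD_IBPB_BIT;
	if (c->ibrs)
		ebx |= 1u << AMD_IBRS_BIT;
	if (c->stibp)
		ebx |= 1u << AMD_STIBP_BIT;
	if (c->ssbd)
		ebx |= 1u << AMD_SSBD_BIT;
	if (!c->bug_spec_store_bypass)
		ebx |= 1u << AMD_SSB_NO_BIT;   /* not affected by the bug */
	return ebx;
}
```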

21 Aug, 2019 3 commits

selftests/kvm: make platform_info_test pass on AMD · e4427372

Vitaly Kuznetsov authored Jun 10, 2019

test_msr_platform_info_disabled() generates EXIT_SHUTDOWN, but the VMCB state
is undefined after that, so an attempt to launch this guest again from
test_msr_platform_info_enabled() fails. Reorder the tests to make the test
pass.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot" · d012a06a

Paolo Bonzini authored Aug 15, 2019

This reverts commit 4e103134.
Alex Williamson reported regressions with device assignment with
this patch.  Even though the bug is probably elsewhere and still
latent, this is needed to fix the regression.

Fixes: 4e103134 ("KVM: x86/mmu: Zap only the relevant pages when removing a memslot", 2019-02-05)
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

selftests: kvm: fix state save/load on processors without XSAVE · 54577e50

Paolo Bonzini authored Aug 20, 2019

state_test and smm_test are failing on older processors that do not
have xcr0.  This is because on those processors KVM does provide
support for KVM_GET/SET_XSAVE (to avoid having to rely on the older
KVM_GET/SET_FPU) but not for KVM_GET/SET_XCRS.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

15 Aug, 2019 3 commits

selftests: kvm: fix vmx_set_nested_state_test · c930e197

Paolo Bonzini authored Aug 14, 2019

vmx_set_nested_state_test is trying to use the KVM_STATE_NESTED_EVMCS without
enabling enlightened VMCS first. Correct the expected outcome of the test, and actually
test that it succeeds after the capability is enabled.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

selftests: kvm: provide common function to enable eVMCS · 65efa61d

Paolo Bonzini authored Aug 14, 2019

There are two tests already enabling eVMCS and a third is coming.
Add a function that enables the capability and tests the result.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

selftests: kvm: do not try running the VM in vmx_set_nested_state_test · 92cd0f0b

Paolo Bonzini authored Aug 14, 2019

This test is only covering various edge cases of the
KVM_SET_NESTED_STATE ioctl.  Running the VM does not really
add anything.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

14 Aug, 2019 4 commits

KVM: x86: svm: remove redundant assignment of var new_entry · c8e174b3

Miaohe Lin authored Aug 12, 2019

new_entry is reassigned a new value on the next line, so the
initial assignment is redundant; remove it.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

MAINTAINERS: add KVM x86 reviewers · ed4e7b05

Paolo Bonzini authored Aug 09, 2019

This is probably overdue---KVM x86 has quite a few contributors that
usually review each other's patches, which is really helpful to me.
Formalize this by listing them as reviewers.  I am including people
with various expertise:

- Joerg for SVM (with designated reviewers, it makes more sense to have
him in the main KVM/x86 stanza)

- Sean for MMU and VMX

- Jim for VMX

- Vitaly for Hyper-V and possibly SVM

- Wanpeng for LAPIC and paravirtualization.

Please ack if you are okay with this arrangement, otherwise speak up.

In other news, Radim is going to leave Red Hat soon.  However, he has
not been very much involved in upstream KVM development for some time,
and in the immediate future he is still going to help maintain kvm/queue
while I am on vacation.  Since not much is going to change, I will let
him decide whether he wants to keep the maintainer role after he leaves.
Acked-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Wanpeng Li <wanpengli@tencent.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

MAINTAINERS: change list for KVM/s390 · 74260dc2

Paolo Bonzini authored Aug 09, 2019

KVM/s390 does not have a list of its own, and linux-s390 is in the
loop anyway thanks to the generic arch/s390 match.  So use the generic
KVM list for s390 patches.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: x86: skip populating logical dest map if apic is not sw enabled · b14c876b

Radim Krcmar authored Aug 13, 2019

recalculate_apic_map() does not sanitize the LDR, and it's possible that
multiple bits are set. In that case, a previous valid entry
can potentially be overwritten by an invalid one.

This condition is hit when booting a 32-bit, >8 CPU RHEL6 guest and then
triggering a crash to boot a kdump kernel. This is the sequence of
events:
1. Linux boots in bigsmp mode and enables PhysFlat; however, it still
writes to the LDR, which will probably never be used.
2. When booting into kdump, the stale LDR values remain, as they are
not cleared by the guest and there is no APIC reset.
3. kdump boots with 1 CPU and uses Logical Destination Mode, but the
logical map has been overwritten and points to an inactive vCPU.
Signed-off-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
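
Below is a simplified model of the overwrite. It assumes flat logical mode with an 8-bit logical ID, where each APIC claims the slot of the lowest set LDR bit. The toy_* names are hypothetical. skip_disabled models the fix, which ignores APICs that are not software-enabled.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_VCPUS 4

/* Hypothetical, simplified per-vCPU APIC state (flat logical mode only). */
struct toy_apic {
	bool sw_enabled;   /* APIC software-enable bit */
	uint8_t ldr;       /* logical ID bitmap (LDR bits 31:24) */
};

static int lowest_bit(uint8_t v)
{
	for (int b = 0; b < 8; b++)
		if (v & (1u << b))
			return b;
	return -1;
}

/*
 * Rebuild an 8-slot flat logical map. Later APICs overwrite earlier ones,
 * so a stale LDR on a disabled APIC can replace a valid entry unless
 * disabled APICs are skipped.
 */
static void toy_recalc_logical_map(const struct toy_apic *apics, int n,
				   int map[8], bool skip_disabled)
{
	for (int i = 0; i < 8; i++)
		map[i] = -1;
	for (int i = 0; i < n; i++) {
		if (skip_disabled && !apics[i].sw_enabled)
			continue;
		if (apics[i].ldr)
			map[lowest_bit(apics[i].ldr)] = i;
	}
}
```

In the test scenario, vCPU 0 is the only enabled APIC (the kdump CPU) and vCPU 2 carries a stale LDR with the same bit set.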

09 Aug, 2019 8 commits

Merge tag 'kvmarm-fixes-for-5.3-2' of... · a738b5e7

Paolo Bonzini authored Aug 09, 2019

Merge tag 'kvmarm-fixes-for-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm fixes for 5.3, take #2

- Fix our system register reset so that we stop writing
  nonsensical values to them, and track which registers
  get reset instead.
- Sync VMCR back from the GIC on WFI so that KVM has an
  exact view of PMR.
- Reevaluate state of HW-mapped, level triggered interrupts
  on enable.

Merge tag 'kvmarm-fixes-for-5.3' of... · 0e1c438c

Paolo Bonzini authored Aug 09, 2019

Merge tag 'kvmarm-fixes-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm fixes for 5.3

- A bunch of switch/case fall-through annotations, fixing one actual bug
- Fix PMU reset bug
- Add missing exception class debug strings

selftests: kvm: Adding config fragments · c096397c

Naresh Kamboju authored Aug 07, 2019

The KVM selftests need certain kernel config options to be enabled in
order to pass.
Signed-off-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: Update gitignore file for latest changes · e2c26537

Thomas Huth authored Jul 31, 2019

The kvm_create_max_vcpus test has been moved to the main directory,
and sync_regs_test is now available on s390x, too.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: remove unnecessary PageReserved check · 8f946da7

Paolo Bonzini authored Aug 05, 2019

The same check is already done in kvm_is_reserved_pfn.
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: arm/arm64: vgic: Reevaluate level sensitive interrupts on enable · 16e604a4

Alexandru Elisei authored Aug 07, 2019

A HW mapped level sensitive interrupt asserted by a device will not be put
into the ap_list if it is disabled at the VGIC level. When it is enabled
again, it will be inserted into the ap_list and written to a list register
on guest entry regardless of the state of the device.

We could argue that this can also happen on real hardware, when the command
to enable the interrupt reached the GIC before the device had the chance to
de-assert the interrupt signal; however, we emulate the distributor and
redistributors in software and we can do better than that.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
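
Here is a sketch of re-sampling the line on enable. The toy_* names are hypothetical, and a callback stands in for querying the physical interrupt state.

```c
#include <stdbool.h>

/* Hypothetical, simplified virtual interrupt state. */
struct toy_irq {
	bool enabled;
	bool hw_mapped;   /* backed by a physical interrupt */
	bool level;       /* level-sensitive */
	bool line_level;  /* last sampled level of the line */
	bool queued;      /* stands in for being on the ap_list */
};

/* Stand-in for asking the physical side whether the line is still asserted. */
typedef bool (*line_state_fn)(void);

static void toy_enable_irq(struct toy_irq *irq, line_state_fn line_asserted)
{
	irq->enabled = true;
	if (irq->hw_mapped && irq->level)
		irq->line_level = line_asserted();  /* re-sample on enable */
	if (irq->line_level)
		irq->queued = true;
}

static bool deasserted(void) { return false; }
static bool asserted(void) { return true; }
```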

KVM: arm: Don't write junk to CP15 registers on reset · c69509c7

Marc Zyngier authored Aug 05, 2019

At the moment, the way we reset CP15 registers is mildly insane:
We write junk to them, call the reset functions, and then check that
we have something else in them.

The "fun" thing is that this can happen while the guest is running
(PSCI, for example). If anything in KVM has to evaluate the state
of a CP15 register while junk is in there, bad things may happen.

Let's stop doing that. Instead, we track that we have called a
reset function for that register, and assume that the reset
function has done something.

In the end, the very need of this reset check is pretty dubious,
as it doesn't check everything (many of the CP15 registers live outside
of the cp15_regs[] array). It may well be axed in the near future.
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Don't write junk to sysregs on reset · 03fdfb26

Marc Zyngier authored Aug 05, 2019

At the moment, the way we reset system registers is mildly insane:
We write junk to them, call the reset functions, and then check that
we have something else in them.

The "fun" thing is that this can happen while the guest is running
(PSCI, for example). If anything in KVM has to evaluate the state
of a system register while junk is in there, bad things may happen.

Let's stop doing that. Instead, we track that we have called a
reset function for that register, and assume that the reset
function has done something. This requires fixing a couple of
sysreg definitions in the trap table.

In the end, the very need of this reset check is pretty dubious,
as it doesn't check everything (many of the sysregs live outside of
the sys_regs[] array). It may well be axed in the near future.
Tested-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
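
The sketch below tracks "reset was called" instead of pre-filling registers with junk. The register file, the flags and the toy_* names are hypothetical. The real code keeps this bookkeeping in its own structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_REGS 4

/* Hypothetical register file with a per-register "reset was called" flag. */
struct toy_regs {
	uint64_t val[TOY_NR_REGS];
	bool reset_done[TOY_NR_REGS];
};

typedef void (*reset_fn)(struct toy_regs *r, int idx);

static void reset_zero(struct toy_regs *r, int idx) { r->val[idx] = 0; }

/*
 * Call every available reset function and record that it ran, instead of
 * writing junk first and checking afterwards that it changed.
 * Returns the index of the first register with no reset function, or -1.
 */
static int toy_reset_regs(struct toy_regs *r, const reset_fn table[TOY_NR_REGS])
{
	for (int i = 0; i < TOY_NR_REGS; i++) {
		r->reset_done[i] = false;
		if (table[i]) {
			table[i](r, i);
			r->reset_done[i] = true;
		}
	}
	for (int i = 0; i < TOY_NR_REGS; i++)
		if (!r->reset_done[i])
			return i;
	return -1;
}
```

The registers never hold junk at any point, so a concurrent reader sees either the old value or the reset value.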

05 Aug, 2019 7 commits

KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block · 5eeaf10e

Marc Zyngier authored Aug 02, 2019

Since commit 328e5664 ("KVM: arm/arm64: vgic: Defer
touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or
its GICv2 equivalent) loaded as long as we can, only syncing it
back when we're scheduled out.

There is a small snag with that though: kvm_vgic_vcpu_pending_irq(),
which is indirectly called from kvm_vcpu_check_block(), needs to
evaluate the guest's view of ICC_PMR_EL1. At the point where we
call kvm_vcpu_check_block(), the vcpu is still loaded, and any
changes to PMR are not visible in memory until we do a vcpu_put().

Things go really south if the guest does the following:

	mov x0, #0	// or any small value masking interrupts
	msr ICC_PMR_EL1, x0

	[vcpu preempted, then rescheduled, VMCR sampled]

	mov x0, #0xff	// allow all interrupts
	msr ICC_PMR_EL1, x0
	wfi		// traps to EL2, so sampling of VMCR

	[interrupt arrives just after WFI]

Here, the hypervisor's view of PMR is zero, while the guest has enabled
its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no
interrupts are pending (despite an interrupt being received) and we'll
block for no reason. If the guest doesn't have a periodic interrupt
firing once it has blocked, it will stay there forever.

To avoid this unfortunate situation, let's resync VMCR from
kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block()
will observe the latest value of PMR.

This has been found by booting an arm64 Linux guest with the pseudo NMI
feature, and thus using interrupt priorities to mask interrupts instead
of the usual PSTATE masking.

Cc: stable@vger.kernel.org # 4.12
Fixes: 328e5664 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put")
Signed-off-by: Marc Zyngier <maz@kernel.org>
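
This is a toy model of the race. It assumes a live PMR value (standing in for ICH_VMCR_EL2) and an in-memory copy that the pending-interrupt check reads. Lower priority values mean higher priority, so an interrupt is signalled only when its priority value is below PMR. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: the live PMR stays in hardware while the vCPU is loaded. */
struct toy_vgic_cpu {
	uint8_t hw_pmr;     /* value currently loaded (live) */
	uint8_t saved_pmr;  /* in-memory copy, refreshed only on sync */
};

static void toy_sync_vmcr(struct toy_vgic_cpu *c) { c->saved_pmr = c->hw_pmr; }

/* The pending check reads only the in-memory copy. */
static bool toy_pending_irq(const struct toy_vgic_cpu *c, uint8_t irq_prio)
{
	return irq_prio < c->saved_pmr;   /* lower value = higher priority */
}

/* The fix in miniature: resync before the block check runs. */
static void toy_vcpu_blocking(struct toy_vgic_cpu *c) { toy_sync_vmcr(c); }
```

In this model the guest writes PMR = 0, is preempted (which syncs the copy), and then writes PMR = 0xff before WFI. toy_pending_irq() reports no pending interrupt of priority 0xa0 until toy_vcpu_blocking() resyncs the copy.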

x86: kvm: remove useless calls to kvm_para_available · 57b76bdb

Paolo Bonzini authored Jul 25, 2019

Most code in arch/x86/kernel/kvm.c is called through x86_hyper_kvm, and thus only
runs if KVM has been detected.  There is no need to check again for the CPUID
base.

Cc: Sergio Lopez <slp@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: no need to check return value of debugfs_create functions · 3e7093d0

Greg KH authored Jul 31, 2019

When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Also, when doing this, change kvm_arch_create_vcpu_debugfs() to return
void instead of an integer, as we should not care at all about whether this
function actually does anything or not.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>
Cc: <kvm@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: remove kvm_arch_has_vcpu_debugfs() · 741cbbae

Paolo Bonzini authored Aug 03, 2019

There is no need for this function as all arches have to implement
kvm_arch_create_vcpu_debugfs() no matter what.  A #define symbol
lets us actually simplify the code.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: Fix leak vCPU's VMCS value into other pCPU · 17e433b5

Wanpeng Li authored Aug 05, 2019

After commit d73eb57b (KVM: Boost vCPUs that are delivering interrupts), a
five-year-old bug is exposed. Running the ebizzy benchmark in three 80-vCPU VMs
on one 80-pCPU Skylake server produces a lot of rcu_sched stall warnings
in the VMs after stress testing:

 INFO: rcu_sched detected stalls on CPUs/tasks: { 4 41 57 62 77} (detected by 15, t=60004 jiffies, g=899, c=898, q=15073)
 Call Trace:
   flush_tlb_mm_range+0x68/0x140
   tlb_flush_mmu.part.75+0x37/0xe0
   tlb_finish_mmu+0x55/0x60
   zap_page_range+0x142/0x190
   SyS_madvise+0x3cd/0x9c0
   system_call_fastpath+0x1c/0x21

swait_active() remains true until finish_swait() is called in
kvm_vcpu_block(). Because voluntarily preempted vCPUs are taken into
account by the kvm_vcpu_on_spin() loop, the probability that the condition
kvm_arch_vcpu_runnable(vcpu) is checked, and can be true, greatly
increases. When APICv is enabled, the yield-candidate vCPU's VMCS RVI
field then leaks (via vmx_sync_pir_to_irr()) into the current VMCS of
the vCPU that is spinning on a taken lock.

This patch fixes it by conservatively checking a subset of events.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Marc Zyngier <Marc.Zyngier@arm.com>
Cc: stable@vger.kernel.org
Fixes: 98f4a146 (KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop)
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: Check preempted_in_kernel for involuntary preemption · 046ddeed

Wanpeng Li authored Aug 01, 2019

preempted_in_kernel is updated in the preempt notifier when involuntary
preemption occurs, so it can be stale when voluntarily preempted vCPUs are
taken into account by the kvm_vcpu_on_spin() loop. This patch makes the loop
check preempted_in_kernel only for involuntary preemption.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
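
The resulting check can be sketched as a predicate. It assumes, as the message implies, that the preempted flag is set only on involuntary preemption. The struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-vCPU state seen by the spin loop. */
struct toy_spin_vcpu {
	bool preempted;            /* set only on involuntary preemption */
	bool preempted_in_kernel;  /* sampled at that same preemption point */
};

/*
 * Should the spinning vCPU skip this yield candidate? preempted_in_kernel
 * is trusted only when preempted is set, since only then is it fresh.
 */
static bool toy_skip_candidate(const struct toy_spin_vcpu *v, bool yield_to_kernel_mode)
{
	return v->preempted && yield_to_kernel_mode && !v->preempted_in_kernel;
}
```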

KVM: LAPIC: Don't need to wakeup vCPU twice after timer fire · a48d06f9

Wanpeng Li authored Aug 01, 2019

kvm_set_pending_timer() already takes care of waking up a sleeping vCPU
that has a pending timer, so there is no need to check this again in
apic_timer_expired().

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

29 Jul, 2019 1 commit

arm64: KVM: hyp: debug-sr: Mark expected switch fall-through · cdb2d3ee

Anders Roxell authored Jul 26, 2019

When fall-through warnings were enabled by default, the following warnings
started to show up:

../arch/arm64/kvm/hyp/debug-sr.c: In function ‘__debug_save_state’:
../arch/arm64/kvm/hyp/debug-sr.c:20:19: warning: this statement may fall
 through [-Wimplicit-fallthrough=]
  case 15: ptr[15] = read_debug(reg, 15);   \
../arch/arm64/kvm/hyp/debug-sr.c:113:2: note: in expansion of macro ‘save_debug’
  save_debug(dbg->dbg_bcr, dbgbcr, brps);
  ^~~~~~~~~~
../arch/arm64/kvm/hyp/debug-sr.c:21:2: note: here
  case 14: ptr[14] = read_debug(reg, 14);   \
  ^~~~
../arch/arm64/kvm/hyp/debug-sr.c:113:2: note: in expansion of macro ‘save_debug’
  save_debug(dbg->dbg_bcr, dbgbcr, brps);
  ^~~~~~~~~~
../arch/arm64/kvm/hyp/debug-sr.c:21:19: warning: this statement may fall
 through [-Wimplicit-fallthrough=]
  case 14: ptr[14] = read_debug(reg, 14);   \
../arch/arm64/kvm/hyp/debug-sr.c:113:2: note: in expansion of macro ‘save_debug’
  save_debug(dbg->dbg_bcr, dbgbcr, brps);
  ^~~~~~~~~~
../arch/arm64/kvm/hyp/debug-sr.c:22:2: note: here
  case 13: ptr[13] = read_debug(reg, 13);   \
  ^~~~
../arch/arm64/kvm/hyp/debug-sr.c:113:2: note: in expansion of macro ‘save_debug’
  save_debug(dbg->dbg_bcr, dbgbcr, brps);
  ^~~~~~~~~~

Rework to add a 'Fall through' comment where the compiler warned
about fall-through, hence silencing the warning.

Fixes: d93512ef0f0e ("Makefile: Globally enable fall-through warning")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
[maz: fixed commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
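
Here is a reduced version of the save_debug() pattern with the annotation in place. The toy_* names and the 4-register size are hypothetical. With GCC's -Wimplicit-fallthrough, a comment such as /* Fall through */ placed directly before the next case label suppresses the warning.

```c
#include <stdint.h>

/* Hypothetical stand-in for read_debug(): returns a recognizable value. */
static uint64_t toy_read_debug(int n) { return 100u + (uint64_t)n; }

/*
 * Save registers nr .. 0: entering the switch at case nr deliberately
 * falls through to every lower case, as in the save_debug() macro.
 */
static void toy_save_debug(uint64_t *ptr, int nr)
{
	switch (nr) {
	case 3:
		ptr[3] = toy_read_debug(3);
		/* Fall through */
	case 2:
		ptr[2] = toy_read_debug(2);
		/* Fall through */
	case 1:
		ptr[1] = toy_read_debug(1);
		/* Fall through */
	default:
		ptr[0] = toy_read_debug(0);
	}
}
```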

26 Jul, 2019 3 commits

KVM: arm64: Update kvm_arm_exception_class and esr_class_str for new EC · 6701c619

Zenghui Yu authored Jul 13, 2019

We've added two ESR exception classes for new ARM hardware extensions:
ESR_ELx_EC_PAC and ESR_ELx_EC_SVE, but failed to update the strings
used in tracing and other debug.

Let's update "kvm_arm_exception_class" for these two ECs, so that the
new ECs will be visible to user space via kvm_exit trace events.
Also update "esr_class_str" for ESR_ELx_EC_PAC, which gives us
more readable debug info.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
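
The sketch below shows an EC-to-string table in the style of esr_class_str. The EC values 0x09 (PAC) and 0x19 (SVE) come from the Arm architecture. The strings and the toy_* names are illustrative assumptions, not the kernel's exact table.

```c
#include <stddef.h>

/* Exception class values (ESR_ELx.EC) for the two classes named above. */
#define ESR_ELx_EC_PAC 0x09
#define ESR_ELx_EC_SVE 0x19
#define ESR_ELx_EC_MAX 0x3f

/* Illustrative table; unlisted entries are implicitly NULL. */
static const char *const toy_esr_class_str[ESR_ELx_EC_MAX + 1] = {
	[ESR_ELx_EC_PAC] = "PAC",
	[ESR_ELx_EC_SVE] = "SVE",
};

/* Look up an EC, falling back to a generic string for unknown classes. */
static const char *toy_esr_get_class_string(unsigned int ec)
{
	const char *s = ec <= ESR_ELx_EC_MAX ? toy_esr_class_str[ec] : NULL;

	return s ? s : "UNRECOGNIZED EC";
}
```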

KVM: arm: vgic-v3: Mark expected switch fall-through · 1a8248c7

Anders Roxell authored Jul 26, 2019

When fall-through warnings were enabled by default, the following warnings
started to show up:

../virt/kvm/arm/hyp/vgic-v3-sr.c: In function ‘__vgic_v3_save_aprs’:
../virt/kvm/arm/hyp/vgic-v3-sr.c:351:24: warning: this statement may fall
 through [-Wimplicit-fallthrough=]
   cpu_if->vgic_ap0r[2] = __vgic_v3_read_ap0rn(2);
   ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
../virt/kvm/arm/hyp/vgic-v3-sr.c:352:2: note: here
  case 6:
  ^~~~
../virt/kvm/arm/hyp/vgic-v3-sr.c:353:24: warning: this statement may fall
 through [-Wimplicit-fallthrough=]
   cpu_if->vgic_ap0r[1] = __vgic_v3_read_ap0rn(1);
   ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
../virt/kvm/arm/hyp/vgic-v3-sr.c:354:2: note: here
  default:
  ^~~~~~~

Rework so that the compiler doesn't warn about fall-through.

Fixes: d93512ef0f0e ("Makefile: Globally enable fall-through warning")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>

arm64: KVM: regmap: Fix unexpected switch fall-through · 3d584a3c

Anders Roxell authored Jul 26, 2019

When fall-through warnings were enabled by default, commit d93512ef0f0e
("Makefile: Globally enable fall-through warning"), the following
warnings started to show up:

In file included from ../arch/arm64/include/asm/kvm_emulate.h:19,
                 from ../arch/arm64/kvm/regmap.c:13:
../arch/arm64/kvm/regmap.c: In function ‘vcpu_write_spsr32’:
../arch/arm64/include/asm/kvm_hyp.h:31:3: warning: this statement may fall
 through [-Wimplicit-fallthrough=]
   asm volatile(ALTERNATIVE(__msr_s(r##nvh, "%x0"), \
   ^~~
../arch/arm64/include/asm/kvm_hyp.h:46:31: note: in expansion of macro ‘write_sysreg_elx’
 #define write_sysreg_el1(v,r) write_sysreg_elx(v, r, _EL1, _EL12)
                               ^~~~~~~~~~~~~~~~
../arch/arm64/kvm/regmap.c:180:3: note: in expansion of macro ‘write_sysreg_el1’
   write_sysreg_el1(v, SYS_SPSR);
   ^~~~~~~~~~~~~~~~
../arch/arm64/kvm/regmap.c:181:2: note: here
  case KVM_SPSR_ABT:
  ^~~~
In file included from ../arch/arm64/include/asm/cputype.h:132,
                 from ../arch/arm64/include/asm/cache.h:8,
                 from ../include/linux/cache.h:6,
                 from ../include/linux/printk.h:9,
                 from ../include/linux/kernel.h:15,
                 from ../include/asm-generic/bug.h:18,
                 from ../arch/arm64/include/asm/bug.h:26,
                 from ../include/linux/bug.h:5,
                 from ../include/linux/mmdebug.h:5,
                 from ../include/linux/mm.h:9,
                 from ../arch/arm64/kvm/regmap.c:11:
../arch/arm64/include/asm/sysreg.h:837:2: warning: this statement may fall
 through [-Wimplicit-fallthrough=]
  asm volatile("msr " __stringify(r) ", %x0"  \
  ^~~
../arch/arm64/kvm/regmap.c:182:3: note: in expansion of macro ‘write_sysreg’
   write_sysreg(v, spsr_abt);
   ^~~~~~~~~~~~
../arch/arm64/kvm/regmap.c:183:2: note: here
  case KVM_SPSR_UND:
  ^~~~

Rework to add a 'break;' in the switch-case, since it didn't have one,
leading to an interesting set of bugs.

Cc: stable@vger.kernel.org # v4.17+
Fixes: a8928195 ("KVM: arm64: Prepare to handle deferred save/restore of 32-bit registers")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
[maz: reworked commit message, fixed stable range]
Signed-off-by: Marc Zyngier <maz@kernel.org>
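
Here is a reduced sketch of the fixed switch. The enum mirrors the KVM_SPSR_* cases visible in the warnings, but the storage struct and toy_* names are hypothetical. Without the break statements, a write for SVC would also clobber the ABT, UND, IRQ and FIQ slots.

```c
#include <stdint.h>

/* Hypothetical banked SPSR storage standing in for the real registers. */
enum toy_spsr_mode { TOY_SPSR_SVC, TOY_SPSR_ABT, TOY_SPSR_UND, TOY_SPSR_IRQ, TOY_SPSR_FIQ };

struct toy_spsr_bank {
	uint32_t svc, abt, und, irq, fiq;
};

/* Each case writes exactly one banked register and then breaks. */
static void toy_write_spsr32(struct toy_spsr_bank *b, enum toy_spsr_mode mode,
			     uint32_t v)
{
	switch (mode) {
	case TOY_SPSR_SVC:
		b->svc = v;
		break;
	case TOY_SPSR_ABT:
		b->abt = v;
		break;
	case TOY_SPSR_UND:
		b->und = v;
		break;
	case TOY_SPSR_IRQ:
		b->irq = v;
		break;
	case TOY_SPSR_FIQ:
		b->fiq = v;
		break;
	}
}
```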

24 Jul, 2019 2 commits

KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption · 266e85a5

Wanpeng Li authored Jul 24, 2019

Commit 11752adb (locking/pvqspinlock: Implement hybrid PV queued/unfair locks)
introduces hybrid PV queued/unfair locks:
 - queued mode (no starvation)
 - unfair mode (good performance on not heavily contended lock)
The lock waiter goes into the unfair mode especially in VMs with over-commit
vCPUs, since increasing over-commitment increases the likelihood that the queue
head vCPU may have been preempted and not actively spinning.

However, rescheduling the queue head vCPU in a timely manner so that it can
acquire the lock can still give better performance than relying only on lock
stealing in oversubscribed scenarios.

Testing on an 80-HT, 2-socket Xeon Skylake server, with an 80-vCPU, 80GB RAM VM:
ebizzy -M
             vanilla     boosting    improved
 1VM          23520        25040         6%
 2VM           8000        13600        70%
 3VM           3100         5400        74%

When unlocking, the lock holder vCPU yields to the queue head vCPU. This
boosts a queue head vCPU that was involuntarily preempted, or one that
voluntarily halted after failing to acquire the lock during a short spin
in the guest.

Cc: Waiman Long <longman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
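
This is a toy sketch of the host side of the boost, using hypothetical names. The unlock-time kick wakes a halted queue head. If the head is not currently running on a physical CPU, the kicker does a directed yield to it.

```c
#include <stdbool.h>

/* Hypothetical per-vCPU scheduling state. */
struct toy_lock_vcpu {
	bool halted;    /* voluntarily halted, waiting for a kick */
	bool running;   /* currently on a physical CPU */
};

/*
 * Kick the queue head and decide whether to boost it.
 * Returns the vCPU yielded to, or -1 if no directed yield happens.
 */
static int toy_kick_and_boost(struct toy_lock_vcpu *v, int nr, int head)
{
	if (head < 0 || head >= nr)
		return -1;
	v[head].halted = false;   /* the kick wakes a halted waiter */
	if (v[head].running)
		return -1;            /* already spinning: nothing to boost */
	return head;              /* directed yield to the queue head */
}
```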

Documentation: move Documentation/virtual to Documentation/virt · 2f5947df

Christoph Hellwig authored Jul 24, 2019

Renaming docs seems to be en vogue at the moment, so fix one of the
grossly misnamed directories.  We usually never use "virtual" as
a shortcut for virtualization in the kernel, but always virt,
as seen in the virt/ top-level directory.  Fix up the documentation
to match that.

Fixes: ed16648e ("Move kvm, uml, and lguest subdirectories under a common "virtual" directory, I.E:")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

23 Jul, 2019 1 commit

KVM: arm/arm64: Introduce kvm_pmu_vcpu_init() to setup PMU counter index · bca031e2

Zenghui Yu authored Jul 18, 2019

We use "pmc->idx" and the "chained" bitmap to determine if the pmc is
chained, in kvm_pmu_pmc_is_chained().  But idx might be uninitialized
(and random) when we make this decision, through a KVM_ARM_VCPU_INIT
ioctl -> kvm_pmu_vcpu_reset(). And the test_bit() against this random
idx will potentially hit a KASAN BUG [1].

In general, idx is the static property of a PMU counter that is not
expected to be modified across resets, as suggested by Julien.  It
looks more reasonable if we can set up the PMU counter idx for a vcpu
at its creation time. Introduce a new function, kvm_pmu_vcpu_init(),
for this basic setup. Oh, and the KASAN BUG will get fixed this way.

[1] https://www.spinics.net/lists/kvm-arm/msg36700.html

Fixes: 80f393a2 ("KVM: arm/arm64: Support chained PMU counters")
Suggested-by: Andrew Murray <andrew.murray@arm.com>
Suggested-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
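
The sketch below splits creation-time init from reset, using hypothetical toy_* names. It assumes, as in kvm_pmu_pmc_is_chained(), that counter pair k is chained when bit k (idx >> 1) of the bitmap is set.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PMU_MAX_COUNTERS 32

struct toy_pmc {
	uint8_t idx;       /* static property: the counter's own index */
	uint64_t counter;
};

struct toy_pmu {
	struct toy_pmc pmc[TOY_PMU_MAX_COUNTERS];
	uint32_t chained;  /* one bit per counter pair */
};

/* Creation-time setup: idx is assigned once and never changes afterwards. */
static void toy_pmu_vcpu_init(struct toy_pmu *pmu)
{
	for (int i = 0; i < TOY_PMU_MAX_COUNTERS; i++)
		pmu->pmc[i].idx = (uint8_t)i;
}

/* Reset clears counter state and chaining but leaves idx alone. */
static void toy_pmu_vcpu_reset(struct toy_pmu *pmu)
{
	for (int i = 0; i < TOY_PMU_MAX_COUNTERS; i++)
		pmu->pmc[i].counter = 0;
	pmu->chained = 0;
}

/* Pair (2k, 2k+1) is chained if bit k is set; relies on idx being valid. */
static bool toy_pmc_is_chained(const struct toy_pmu *pmu, const struct toy_pmc *pmc)
{
	return pmu->chained & (1u << (pmc->idx >> 1));
}
```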
