Commits · c258b62b264fdc469b6d3610a907708068145e3b · nexedi / linux

05 Aug, 2015 7 commits

KVM: MMU: introduce the framework to check zero bits on sptes · c258b62b

Xiao Guangrong authored Aug 05, 2015

We have abstracted the data struct and functions which are used to check
reserved bits on guest page tables; now we extend the logic to check
zero bits on shadow page tables.

The zero bits on sptes include not only reserved bits on hardware but also
the bits that SPTEs will never use.  For example, shadow pages will never
use GB pages unless the guest uses them too.
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c258b62b

KVM: MMU: split reset_rsvds_bits_mask_ept · 81b8eebb

Xiao Guangrong authored Aug 05, 2015

Since shadow ept page tables and Intel nested guest page tables have the
same format, split reset_rsvds_bits_mask_ept so that the logic can be
reused by later patches which check zero bits on sptes
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

81b8eebb

KVM: MMU: split reset_rsvds_bits_mask · 6dc98b86

Xiao Guangrong authored Aug 05, 2015

Since softmmu & AMD nested shadow page tables and guest page tables have
the same format, split reset_rsvds_bits_mask so that the logic can be
reused by later patches which check zero bits on sptes
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6dc98b86

KVM: MMU: introduce rsvd_bits_validate · a0a64f50

Xiao Guangrong authored Aug 05, 2015

Two fields in "struct kvm_mmu", rsvd_bits_mask and bad_mt_xwr, are
used to check whether reserved bits are set on guest ptes. Move them to a
data struct so that the approach can be applied to check host shadow page
table entries as well.
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a0a64f50
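
The grouping described in this commit can be sketched in user-space C. This is an illustrative sketch, not the kernel code: the array layout (a mask selected by bit 7 of the entry and by level, plus a 64-bit bitmap indexed by the low six bits of the entry) and the helper names `rsvd_bits` and `is_rsvd_bits_set` are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define PT_LEVELS 4

/* Hypothetical container for the two fields named in the commit message. */
struct rsvd_bits_validate {
	uint64_t rsvd_bits_mask[2][PT_LEVELS]; /* [bit 7 of the entry][level - 1] */
	uint64_t bad_mt_xwr;                   /* one bit per value of the low 6 bits */
};

/* Build a mask with bits s..e (inclusive) set; requires 0 <= s <= e <= 63. */
static uint64_t rsvd_bits(int s, int e)
{
	return ((~0ULL) >> (63 - (e - s))) << s;
}

/* Return true if the entry has a reserved bit set or a forbidden low-bit combination. */
static bool is_rsvd_bits_set(const struct rsvd_bits_validate *chk,
			     uint64_t pte, int level)
{
	int bit7 = (pte >> 7) & 1;
	int low6 = pte & 0x3f;

	return (pte & chk->rsvd_bits_mask[bit7][level - 1]) != 0 ||
	       (chk->bad_mt_xwr & (1ULL << low6)) != 0;
}
```

Because the checker takes only the structure and the entry, the same function could be pointed at a second instance of the structure that describes shadow page table entries, which is the reuse the commit message aims for.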

KVM: MMU: move FNAME(is_rsvd_bits_set) to mmu.c · d2b0f981

Xiao Guangrong authored Aug 05, 2015

FNAME(is_rsvd_bits_set) does not depend on the guest mmu mode; move it
to mmu.c so that it is no longer compiled multiple times
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d2b0f981

KVM: MMU: fix validation of mmio page fault · 6f691251

Xiao Guangrong authored Aug 05, 2015

We hit a bug where qemu complained with "KVM: unknown exit, hardware
reason 31" and KVM showed this info:
[84245.284948] EPT: Misconfiguration.
[84245.285056] EPT: GPA: 0xfeda848
[84245.285154] ept_misconfig_inspect_spte: spte 0x5eaef50107 level 4
[84245.285344] ept_misconfig_inspect_spte: spte 0x5f5fadc107 level 3
[84245.285532] ept_misconfig_inspect_spte: spte 0x5141d18107 level 2
[84245.285723] ept_misconfig_inspect_spte: spte 0x52e40dad77 level 1

This is because we got an mmio #PF and the handler saw that the mmio spte had
become normal (it points to a ram page)

However, this is valid after the introduction of fast mmio spte invalidation,
which increases the generation-number instead of zapping mmio sptes. An example
is as follows:
1. QEMU drops mmio region by adding a new memslot
2. invalidate all mmio sptes
3.

        VCPU 0                        VCPU 1
    access the invalid mmio spte
                            access the region that was originally MMIO
                            set the spte to the normal ram map

    mmio #PF
    check the spte and see it becomes normal ram mapping !!!

This patch fixes the bug simply by dropping the check in the mmio handler, which
makes it suitable for backporting. A full check will be introduced in later patches
Reported-by: Pavel Shirshov <ru.pchel@gmail.com>
Tested-by: Pavel Shirshov <ru.pchel@gmail.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6f691251
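
The race above can be modeled in a few lines of user-space C. This is only a sketch under stated assumptions: the spte encoding (`MMIO_MARK`, `MMIO_GEN_MASK`) and the function names are hypothetical and do not match the kernel's layout. It shows why finding a non-MMIO entry in the MMIO fault path is a reason to retry rather than a bug.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: bit 63 marks an MMIO entry, bits 0-15 hold its generation. */
#define MMIO_MARK     (1ULL << 63)
#define MMIO_GEN_MASK 0xffffULL

enum mmio_result { MMIO_EMULATE, MMIO_RETRY };

static uint64_t make_mmio_spte(uint64_t gen)
{
	return MMIO_MARK | (gen & MMIO_GEN_MASK);
}

static bool is_mmio_spte(uint64_t spte)
{
	return (spte & MMIO_MARK) != 0;
}

static enum mmio_result handle_mmio_fault(uint64_t spte, uint64_t cur_gen)
{
	/* Another vCPU may already have installed a normal RAM mapping: retry. */
	if (!is_mmio_spte(spte))
		return MMIO_RETRY;

	/* The entry predates the last invalidation (generation bump): retry. */
	if ((spte & MMIO_GEN_MASK) != (cur_gen & MMIO_GEN_MASK))
		return MMIO_RETRY;

	return MMIO_EMULATE;
}
```

Bumping `cur_gen` invalidates every cached MMIO entry at once without touching the entries themselves, which is what makes the window between the invalidation and the fault on VCPU 0 possible.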

KVM: MTRR: Use default type for non-MTRR-covered gfn before WARN_ON · 9c33ae0c

Alex Williamson authored Aug 04, 2015

The patch was munged on commit to re-order these tests resulting in
excessive warnings when trying to do device assignment.  Return to
original ordering: https://lkml.org/lkml/2015/7/15/769

Fixes: 3e5d2fdc ("KVM: MTRR: simplify kvm_mtrr_get_guest_memory_type")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9c33ae0c

30 Jul, 2015 2 commits

KVM: x86: clean/fix memory barriers in irqchip_in_kernel · 71ba994c

Paolo Bonzini authored Jul 29, 2015

The memory barriers are trying to protect against concurrent RCU-based
interrupt injection, but the IRQ routing table is not valid at the time
kvm->arch.vpic is written. Fix this by writing kvm->arch.vpic last.
kvm_destroy_pic then need not set kvm->arch.vpic to NULL; modify it
to take a struct kvm_pic* and reuse it if the IOAPIC creation fails.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

71ba994c
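
The "write the pointer last" rule can be sketched with C11 atomics. Note the substitution: the kernel uses its own barrier primitives, while this sketch uses release/acquire operations from `<stdatomic.h>` to express the same ordering. The struct and function names are hypothetical, and `routing_ready` merely stands in for the IRQ routing table setup.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct pic {
	int id;
};

struct vm {
	bool routing_ready;          /* plain data, initialized before publication */
	_Atomic(struct pic *) vpic;  /* published last */
};

static void vm_create_irqchip(struct vm *vm, struct pic *pic)
{
	/* Stand-in for setting up the routing table. */
	vm->routing_ready = true;

	/* Release store: everything written above is visible to a reader
	 * that observes the non-NULL pointer with an acquire load. */
	atomic_store_explicit(&vm->vpic, pic, memory_order_release);
}

static bool irqchip_in_kernel(struct vm *vm)
{
	return atomic_load_explicit(&vm->vpic, memory_order_acquire) != NULL;
}
```

A reader that sees `irqchip_in_kernel()` return true is then guaranteed to also see `routing_ready` set, which is the property the reordered writes are meant to provide.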

KVM: document memory barriers for kvm->vcpus/kvm->online_vcpus · dd489240

Paolo Bonzini authored Jul 29, 2015

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

dd489240

29 Jul, 2015 19 commits

KVM: x86: remove unnecessary memory barriers for shared MSRs · c847fe88

Paolo Bonzini authored Jul 29, 2015

There is no smp_rmb matching the smp_wmb. shared_msr_update is called from
hardware_enable, which in turn is called via on_each_cpu. on_each_cpu
must imply a read memory barrier (on x86 the rmb is achieved simply
through asm volatile in native_apic_mem_write).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c847fe88

KVM: move code related to KVM_SET_BOOT_CPU_ID to x86 · d71ba788

Paolo Bonzini authored Jul 29, 2015

This is another remnant of ia64 support.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d71ba788

Merge tag 'kvm-s390-next-20150728' of... · 554726d3

Paolo Bonzini authored Jul 29, 2015

Merge tag 'kvm-s390-next-20150728' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

KVM: s390: Fixes and features for kvm/next (4.3)

1. Rework logging infrastructure (s390dbf) to integrate feedback learned
when debugging performance and test issues
2. Some cleanups and simplifications for CMMA handling
3. Fix gdb debugging and single stepping on some instructions
4. Error handling for storage key setup

554726d3

KVM: s390: log capability enablement and vm attribute changes · c92ea7b9

Christian Borntraeger authored Jul 22, 2015

Depending on user space, some capabilities and vm attributes are
enabled at runtime. Let's log those events and while we're at it,
log querying the vm attributes as well.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

c92ea7b9

KVM: s390: Provide global debug log · 78f26131

Christian Borntraeger authored Jul 22, 2015

In addition to the per VM debug logs, let's provide a global
one for KVM-wide events, like new guests or fatal errors.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>

78f26131

KVM: s390: adapt debug entries for instruction handling · 7cbde76b

Christian Borntraeger authored Jul 21, 2015

Use the default log level 3 for state changing and/or seldom events,
use 4 for others. Also change some numbers from %x to %d and vice versa
to match documentation. If hex, let's prepend the numbers with 0x.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>

7cbde76b

KVM: s390: improve debug feature usage · 1cb9cf72

Christian Borntraeger authored Jul 20, 2015

We do not use the exception logger, so the 2nd area is unused.
Just have one area that is bigger (32 pages).
At the same time we can limit the debug feature size to 7
longs, as the largest user has 3 parameters + string + boiler
plate (vCPU, PSW mask, PSW addr)
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>

1cb9cf72

KVM: s390: remove outdated documentation · f2d2eb9d

Christian Borntraeger authored Jul 20, 2015

The old Documentation/s390/kvm.txt file is either
outdated or described in Documentation/virtual/kvm/api.txt.
Let's get rid of it.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>

f2d2eb9d

KVM: s390: more irq names for trace events · a37281b6

David Hildenbrand authored Nov 21, 2014

This patch adds names for missing irq types to the trace events.
In order to identify adapter irqs, the define is moved from
interrupt.c to the other basic irq defines in uapi/linux/kvm.h.
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

a37281b6

KVM: s390: Fixup interrupt vcpu event messages and levels · 3f24ba15

Christian Borntraeger authored Jul 09, 2015

This reworks the debug logging for interrupt related logs.
Several changes:
- unify program int/irq
- improve decoding (e.g. use mcic instead of parm64 for machine
  check injection)
- remove useless interrupt type number (the name is enough)
- rename "interrupt:" to "deliver:" as the other side is called "inject"
- use log level 3 for state changing and/or seldom events (like machine
  checks, restart..)
- use log level 4 for frequent events
- use 0x prefix for hex numbers
- add pfault done logging
- move some tracing outside spinlock
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>

3f24ba15

KVM: s390: add more debug data for the pfault diagnoses · ab7090a6

Christian Borntraeger authored Jul 16, 2015

We're not only interested in the address of the control block, but
also in the requested subcommand and for the token subcommand, in the
specified token address and masks.
Suggested-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>

ab7090a6

KVM: s390: remove "from (user|kernel)" from irq injection messages · ed2afcfa

David Hildenbrand authored Jul 20, 2015

The "from user"/"from kernel" part of the log/trace messages is not
always correct anymore and therefore not really helpful.

Let's remove that part from the log + trace messages. For program
interrupts, we can now move the logging/tracing part into the real
injection function, as already done for the other injection functions.
Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

ed2afcfa

KVM: s390: VCPU_EVENT cleanup for prefix changes · 71db35d2

Christian Borntraeger authored Jul 10, 2015

SPX (SET PREFIX)  and SIGP (Set prefix) can change the prefix
register of a CPU. As sigp set prefix may be handled in user
space (KVM_CAP_S390_USER_SIGP), we would not log the changes
triggered via SIGP in that case. Let's have just one VCPU_EVENT
at the central location that tracks prefix changes.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>

71db35d2

KVM: s390: Improve vcpu event debugging for diagnoses · 15e8b5da

Christian Borntraeger authored Jul 09, 2015

Let's add a vcpu event for the page reference handling and change
the default debugging level for the ipl diagnose. Both are not
frequent AND change the global state, so let's log them always.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

15e8b5da

KVM: s390: add kvm stat counter for all diagnoses · 175a5c9e

Christian Borntraeger authored Jul 07, 2015

Sometimes kvm stat counters are the only performance metric to check
after something went wrong. Let's add additional counters for some
diagnoses.

In addition do the count for diag 10 all the time, even if we inject
a program interrupt.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>

175a5c9e

KVM: s390: only reset CMMA state if it was enabled before · c3489155

Dominik Dingel authored Jun 18, 2015

There is no point in resetting the CMMA state if it was never enabled.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

c3489155

KVM: s390: clean up cmma_enable check · e6db1d61

Dominik Dingel authored May 07, 2015

As we already only enable CMMA when userspace requests it, we can
safely move the additional checks to the request handler and avoid
doing them multiple times. This also tells userspace if CMMA is
available.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

e6db1d61

KVM: s390: filter space-switch events when PER is enforced · 0df30abc

David Hildenbrand authored Jun 23, 2015

When guest debugging is active, space-switch events might be enforced
by PER. While the PER events are correctly filtered out,
space-switch-events could be forwarded to the guest, although from a
guest point of view, they should not have been reported.

Therefore we have to filter out space-switch events being concurrently
reported with a PER event, if the PER event got filtered out. To do so,
we theoretically have to know which instruction was responsible for the
event. As the applicable instructions modify the PSW address, the
address space set in the PSW and even the address space in cr1, we
can't figure out the instruction that way.

For this reason, we have to rely on the information about the old and
new address space, in order to guess the responsible instruction type
and do appropriate checks for space-switch events.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

0df30abc

KVM: s390: propagate error from enable storage key · 14d4a425

Dominik Dingel authored May 07, 2015

As enabling storage keys might fail, we should forward the error.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

14d4a425

23 Jul, 2015 12 commits

KVM: svm: handle KVM_X86_QUIRK_CD_NW_CLEARED in svm_get_mt_mask · 54928303

Paolo Bonzini authored Jul 10, 2015

We can disable CD unconditionally when there is no assigned device.
KVM now forces guest PAT to all-writeback in that case, so it makes
sense to also force CR0.CD=0.

When there are assigned devices, emulate cache-disabled operation
through the page tables.  This behavior is consistent with VMX
microcode, where CD/NW are not touched by vmentry/vmexit.  However,
keep this dependent on the quirk because OVMF enables the caches
too late.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

54928303

kvm/x86: add support for MONITOR_TRAP_FLAG · 5f3d45e7

Mihai Donțu authored Jul 05, 2015

Allow a nested hypervisor to single step its guests.
Signed-off-by: Mihai Donțu <mihai.dontu@gmail.com>
[Fix overlong line. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5f3d45e7

kvm/x86: add sending hyper-v crash notification to user space · 2ce79189

Andrey Smetanin authored Jul 03, 2015

The notification is sent by exiting the vcpu to user space
if KVM_REQ_HV_CRASH is enabled for the vcpu. On exit to user space,
the kvm_run structure contains a system_event of type
KVM_SYSTEM_EVENT_CRASH to signal that a guest crash occurred.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2ce79189

kvm/x86: added hyper-v crash msrs into kvm hyperv context · e7d9513b

Andrey Smetanin authored Jul 03, 2015

Add hv crash variables to the kvm Hyper-V context as storage
for the Hyper-V crash msrs.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e7d9513b

kvm: introduce vcpu_debug = kvm_debug + vcpu context · ee86dbc6

Andrey Smetanin authored Jul 03, 2015

vcpu_debug is a useful macro like kvm_debug, but it additionally
includes the vcpu context in its output.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ee86dbc6

kvm/x86: move Hyper-V MSR's/hypercall code into hyperv.c file · e83d5887

Andrey Smetanin authored Jul 03, 2015

This patch introduces a Hyper-V related source file, hyperv.c, and
per-vm and per-vcpu hyperv context structures.
All Hyper-V MSR and hypercall code is moved into hyperv.c.
All Hyper-V kvm/vcpu fields are moved into the appropriate hyperv context
structures. Copyright and author information is copied from x86.c
to hyperv.c.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e83d5887

KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions · f9eb4af6

Eugene Korenevsky authored Apr 17, 2015

According to Intel SDM several checks must be applied for memory operands
of VMX instructions.

Long mode: #GP(0) or #SS(0) depending on the segment must be thrown
if the memory address is in a non-canonical form.

Protected mode, checks in chronological order:
- The segment type must be checked with access type (read or write) taken
into account.
	For write access: #GP(0) must be generated if the destination operand
		is located in a read-only data segment or any code segment.
	For read access: #GP(0) must be generated if the source operand is
		located in an execute-only code segment.
- Usability of the segment must be checked. #GP(0) or #SS(0) depending on the
	segment must be thrown if the segment is unusable.
- Limit check. #GP(0) or #SS(0) depending on the segment must be
	thrown if the memory operand effective address is outside the segment
	limit.
Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f9eb4af6
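
Two of the checks listed above can be sketched in plain C. This is a minimal sketch under stated assumptions: 48-bit virtual addresses for the canonical check, an expand-up segment with a byte-granular limit for the limit check, and an access size of at least one byte. The function names are hypothetical and are not the ones used by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Long mode: with 48-bit virtual addresses, an address is canonical when
 * bits 63..47 are all zero or all one.
 */
static bool is_noncanonical_48(uint64_t la)
{
	uint64_t top = la >> 47; /* the 17 uppermost bits */

	return top != 0 && top != 0x1ffffULL;
}

/*
 * Protected mode, expand-up segment: the last byte of the access must not
 * lie beyond the segment limit. Assumes size >= 1.
 */
static bool exceeds_limit(uint32_t offset, uint32_t size, uint32_t limit)
{
	return (uint64_t)offset + size - 1 > limit;
}
```

Whether a failing check raises #GP(0) or #SS(0) depends on the segment used for the access, as the commit message notes; that selection is left out of the sketch.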

KVM: x86: rename quirk constants to KVM_X86_QUIRK_* · 0da029ed

Paolo Bonzini authored Jul 23, 2015

Make them clearly architecture-dependent; the capability is valid for
all architectures, but the argument is not.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0da029ed

KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED · fb279950

Xiao Guangrong authored Jul 16, 2015

OVMF depends on WB to boot fast, because it only clears caches after
it has set up MTRRs---which is too late.

Let's do writeback if CR0.CD is set to make it happy, similar to what
SVM is already doing.
Signed-off-by: Xiao Guangrong <guangrong.xiao@intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fb279950

KVM: x86: introduce kvm_check_has_quirk · 41dbc6bc

Paolo Bonzini authored Jul 23, 2015

The logic of the disabled_quirks field usually results in a double
negation.  Wrap it in a simple function that checks the bit and
negates it.

Based on a patch from Xiao Guangrong.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

41dbc6bc
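
The double negation mentioned above can be illustrated with a small stand-alone sketch. The struct, the quirk bit names and the function name below are hypothetical stand-ins; only the idea of a `disabled_quirks` bitmask wrapped by a helper that checks the bit and negates it comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical quirk bits for the example. */
#define QUIRK_A (1ULL << 0)
#define QUIRK_B (1ULL << 1)

struct vm_arch {
	uint64_t disabled_quirks; /* a set bit means the quirk is turned OFF */
};

/* True when the quirk is still in effect, i.e. its "disabled" bit is clear. */
static bool check_has_quirk(const struct vm_arch *arch, uint64_t quirk)
{
	return !(arch->disabled_quirks & quirk);
}
```

Callers can then write `if (check_has_quirk(arch, QUIRK_A))` instead of spelling out "not disabled" at every call site.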

KVM: MTRR: simplify kvm_mtrr_get_guest_memory_type · 3e5d2fdc

Xiao Guangrong authored Jul 16, 2015

kvm_mtrr_get_guest_memory_type never returns -1, which is implied
by the current code: if @type = -1 (meaning no MTRR contains the
range), iter.partial_map must be true

Simplify the code to reflect this fact
Signed-off-by: Xiao Guangrong <guangrong.xiao@intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3e5d2fdc

KVM: MTRR: fix memory type handling if MTRR is completely disabled · 10dc331f

Xiao Guangrong authored Jul 16, 2015

Currently the code uses the default memory type if MTRR is fully disabled;
fix it by using UC instead.
Signed-off-by: Xiao Guangrong <guangrong.xiao@intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

10dc331f
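
The intended selection order can be sketched as follows. This is an illustrative model, not the kernel function: the `mtrr_state` fields and the convention that a negative `matched_type` means "no MTRR covers the address" are assumptions made for the example. The encodings UC = 0 and WB = 6 are the architectural x86 memory type values.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
	MEM_TYPE_UC = 0, /* uncacheable */
	MEM_TYPE_WB = 6, /* write-back */
};

/* Hypothetical, simplified view of the guest MTRR state. */
struct mtrr_state {
	bool enabled;     /* global MTRR enable */
	uint8_t def_type; /* default type for addresses no range covers */
};

/* matched_type < 0 means that no MTRR range covers the address. */
static uint8_t guest_memory_type(const struct mtrr_state *s, int matched_type)
{
	/* MTRRs fully disabled: everything is UC, the default type is ignored. */
	if (!s->enabled)
		return MEM_TYPE_UC;

	/* Enabled but not covered by any range: fall back to the default type. */
	if (matched_type < 0)
		return s->def_type;

	return (uint8_t)matched_type;
}
```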