- 05 Sep, 2017 5 commits
-
-
Christoffer Dall authored
When migrating guests around we need to know the active priorities to ensure functional virtual interrupt prioritization by the GIC. This commit clarifies the API and how active priorities of interrupts in different groups are represented, and implements the accessor functions for the uaccess register range. We live with a slight layering violation in accessing GICv3 data structures from vgic-mmio-v2.c, because anything else just adds too much complexity for us to deal with (it's not like there's a benefit elsewhere in the code of an intermediate representation as is the case with the VMCR). We accept this, because while doing v3 processing from a file named something-v2.c can look strange at first, this really is specific to dealing with the user space interface for something that looks like a GICv2. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
-
Christoffer Dall authored
As we are about to access the APRs from the GICv2 uaccess interface, make this logic generally available. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
-
Marc Zyngier authored
For unknown reasons, the its_ite data structure carries an "lpi" field which contains the intid of the LPI. This is an obvious duplication of the vgic_irq->intid field, so let's fix the only user and remove the now useless field. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
-
Arvind Yadav authored
vgic_debug_seq_ops and file_operations are not supposed to change at runtime and none of the structures is modified. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
-
James Morse authored
The ARM-ARM has two bits in the ESR/HSR relevant to external aborts. A range of {I,D}FSC values (of which bit 5 is always set) and bit 9 'EA' which provides: > an IMPLEMENTATION DEFINED classification of External Aborts. This bit is in addition to the {I,D}FSC range, and has an implementation defined meaning. KVM should always ignore this bit when handling external aborts from a guest. Remove the ESR_ELx_EA definition and rewrite its helper kvm_vcpu_dabt_isextabt() to check the {I,D}FSC range. This merges kvm_vcpu_dabt_isextabt() and the recently added is_abort_sea() helper. CC: Tyler Baicar <tbaicar@codeaurora.org> Reported-by: gengdongjiu <gengdj.1984@gmail.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
-
- 25 Aug, 2017 1 commit
-
-
Jim Mattson authored
According to the SDM, if the "use TPR shadow" VM-execution control is 1, bits 11:0 of the virtual-APIC address must be 0 and the address should set any bits beyond the processor's physical-address width. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 24 Aug, 2017 13 commits
-
-
Wanpeng Li authored
------------[ cut here ]------------ WARNING: CPU: 7 PID: 3861 at /home/kernel/ssd/kvm/arch/x86/kvm//vmx.c:11299 nested_vmx_vmexit+0x176e/0x1980 [kvm_intel] CPU: 7 PID: 3861 Comm: qemu-system-x86 Tainted: G W OE 4.13.0-rc4+ #11 RIP: 0010:nested_vmx_vmexit+0x176e/0x1980 [kvm_intel] Call Trace: ? kvm_multiple_exception+0x149/0x170 [kvm] ? handle_emulation_failure+0x79/0x230 [kvm] ? load_vmcs12_host_state+0xa80/0xa80 [kvm_intel] ? check_chain_key+0x137/0x1e0 ? reexecute_instruction.part.168+0x130/0x130 [kvm] nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel] ? nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel] vmx_queue_exception+0x197/0x300 [kvm_intel] kvm_arch_vcpu_ioctl_run+0x1b0c/0x2c90 [kvm] ? kvm_arch_vcpu_runnable+0x220/0x220 [kvm] ? preempt_count_sub+0x18/0xc0 ? restart_apic_timer+0x17d/0x300 [kvm] ? kvm_lapic_restart_hv_timer+0x37/0x50 [kvm] ? kvm_arch_vcpu_load+0x1d8/0x350 [kvm] kvm_vcpu_ioctl+0x4e4/0x910 [kvm] ? kvm_vcpu_ioctl+0x4e4/0x910 [kvm] ? kvm_dev_ioctl+0xbe0/0xbe0 [kvm] The flag "nested_run_pending", which can override the decision of which should run next, L1 or L2. nested_run_pending=1 means that we *must* run L2 next, not L1. This is necessary in particular when L1 did a VMLAUNCH of L2 and therefore expects L2 to be run (and perhaps be injected with an event it specified, etc.). Nested_run_pending is especially intended to avoid switching to L1 in the injection decision-point. This can be handled just like the other cases in vmx_check_nested_events, instead of having a special case in vmx_queue_exception. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Wanpeng Li authored
vmx_complete_interrupts() assumes that the exception is always injected, so it can be dropped by kvm_clear_exception_queue(). However, an exception cannot be injected immediately if it is: 1) originally destined to a nested guest; 2) trapped to cause a vmexit; 3) happening right after VMLAUNCH/VMRESUME, i.e. when nested_run_pending is true. This patch applies to exceptions the same algorithm that is used for NMIs, replacing exception.reinject with "exception.injected" (equivalent to nmi_injected). exception.pending now represents an exception that is queued and whose side effects (e.g., update RFLAGS.RF or DR7) have not been applied yet. If exception.pending is true, the exception might result in a nested vmexit instead, too (in which case the side effects must not be applied). exception.injected instead represents an exception that is going to be injected into the guest at the next vmentry. Reported-by: Radim Krčmář <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Wanpeng Li authored
Use kvm_event_needs_reinjection() encapsulation. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
update_permission_bitmask currently does a 128-iteration loop to, essentially, compute a constant array. Computing the 8 bits in parallel reduces it to 16 iterations, and is enough to speed it up substantially because many boolean operations in the inner loop become constants or simplify noticeably. Because update_permission_bitmask is actually the top item in the profile for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand clock cycles, or up to 30%: before after cpuid 35173 25954 vmcall 35122 27079 inl_from_pmtimer 52635 42675 inl_from_qemu 53604 44599 inl_from_kernel 38498 30798 outl_to_kernel 34508 28816 wr_tsc_adjust_msr 34185 26818 rd_tsc_adjust_msr 37409 27049 mmio-no-eventfd:pci-mem 50563 45276 mmio-wildcard-eventfd:pci-mem 34495 30823 mmio-datamatch-eventfd:pci-mem 35612 31071 portio-no-eventfd:pci-io 44925 40661 portio-wildcard-eventfd:pci-io 29708 27269 portio-datamatch-eventfd:pci-io 31135 27164 (I wrote a small C program to compare the tables for all values of CR0.WP, CR4.SMAP and CR4.SMEP, and they match). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Yu Zhang authored
This patch exposes 5 level page table feature to the VM. At the same time, the canonical virtual address checking is extended to support both 48-bits and 57-bits address width. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Yu Zhang authored
Extends the shadow paging code, so that 5 level shadow page table can be constructed if VM is running in 5 level paging mode. Also extends the ept code, so that 5 level ept table can be constructed if maxphysaddr of VM exceeds 48 bits. Unlike the shadow logic, KVM should still use 4 level ept table for a VM whose physical address width is less than 48 bits, even when the VM is running in 5 level paging mode. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> [Unconditionally reset the MMU context in kvm_cpuid_update. Changing MAXPHYADDR invalidates the reserved bit bitmasks. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Yu Zhang authored
Now we have 4 level page table and 5 level page table in 64 bits long mode, let's rename the PT64_ROOT_LEVEL to PT64_ROOT_4LEVEL, then we can use PT64_ROOT_5LEVEL for 5 level page table, it's helpful to make the code more clear. Also PT64_ROOT_MAX_LEVEL is defined as 4, so that we can just redefine it to 5 whenever a replacement is needed for 5 level paging. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Yu Zhang authored
Currently, KVM uses CR3_L_MODE_RESERVED_BITS to check the reserved bits in CR3. Yet the length of reserved bits in guest CR3 should be based on the physical address width exposed to the VM. This patch changes CR3 check logic to calculate the reserved bits at runtime. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Yu Zhang authored
Return false in kvm_cpuid() when it fails to find the cpuid entry. Also, this routine(and its caller) is optimized with a new argument - check_limit, so that the check_cpuid_limit() fall back can be avoided. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
A guest may not be configured to support XSAVES/XRSTORS, even when the host does. If the guest does not support XSAVES/XRSTORS, clear the secondary execution control so that the processor will raise #UD. Also clear the "allowed-1" bit for XSAVES/XRSTORS exiting in the IA32_VMX_PROCBASED_CTLS2 MSR, and pass through VMCS12's control in the VMCS02. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jim Mattson authored
A guest may not be configured to support RDSEED, even when the host does. If the guest does not support RDSEED, intercept the instruction and synthesize #UD. Also clear the "allowed-1" bit for RDSEED exiting in the IA32_VMX_PROCBASED_CTLS2 MSR. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jim Mattson authored
A guest may not be configured to support RDRAND, even when the host does. If the guest does not support RDRAND, intercept the instruction and synthesize #UD. Also clear the "allowed-1" bit for RDRAND exiting in the IA32_VMX_PROCBASED_CTLS2 MSR. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Currently, secondary execution controls are divided in three groups: - static, depending mostly on the module arguments or the processor (vmx_secondary_exec_control) - static, depending on CPUID (vmx_cpuid_update) - dynamic, depending on nested VMX or local APIC state Because walking CPUID is expensive, prepare_vmcs02 is using only the first group. This however is unnecessarily complicated. Just cache the static secondary execution controls, and then prepare_vmcs02 does not need to compute them every time. Computation of all static secondary execution controls is now kept in a single function, vmx_compute_secondary_exec_control. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 23 Aug, 2017 2 commits
-
-
Janakarajan Natarajan authored
Enable the Virtual GIF feature. This is done by setting bit 25 at position 60h in the vmcb. With this feature enabled, the processor uses bit 9 at position 60h as the virtual GIF when executing STGI/CLGI instructions. Since the execution of STGI by the L1 hypervisor does not cause a return to the outermost (L0) hypervisor, the enable_irq_window and enable_nmi_window are modified. The IRQ window will be opened even if GIF is not set, under the assumption that on resuming the L1 hypervisor the IRQ will be held pending until the processor executes the STGI instruction. For the NMI window, the STGI intercept is set. This will assist in opening the window only when GIF=1. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Janakarajan Natarajan authored
Add a new cpufeature definition for Virtual GIF. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 18 Aug, 2017 6 commits
-
-
David Hildenbrand authored
We already always set that type but don't check if it is supported. Also for nVMX, we only support WB for now. Let's just require it. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
David Hildenbrand authored
Don't use shifts, tag them correctly as EPTP and use better matching names (PWL vs. GAW). Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Denys Vlasenko authored
With lightly tweaked defconfig: text data bss dec hex filename 11259661 5109408 2981888 19350957 12745ad vmlinux.before 11259661 5109408 884736 17253805 10745ad vmlinux.after Only compile-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Paolo Bonzini authored
There is currently some confusion between nested and L1 GPAs. The assignment to "direct" in kvm_mmu_page_fault tries to fix that, but it is not enough. What this patch does is fence off the MMIO cache completely when using shadow nested page tables, since we have neither a GVA nor an L1 GPA to put in the cache. This also allows some simplifications in kvm_mmu_page_fault and FNAME(page_fault). The EPT misconfig likewise does not have an L1 GPA to pass to kvm_io_bus_write, so that must be skipped for guest mode. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [Changed comment to say "GPAs" instead of "L1's physical addresses", as per David's review. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Brijesh Singh authored
When a guest causes a page fault which requires emulation, the vcpu->arch.gpa_available flag is set to indicate that cr2 contains a valid GPA. Currently, emulator_read_write_onepage() makes use of gpa_available flag to avoid a guest page walk for a known MMIO regions. Lets not limit the gpa_available optimization to just MMIO region. The patch extends the check to avoid page walk whenever gpa_available flag is set. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> [Fix EPT=0 according to Wanpeng Li's fix, plus ensure VMX also uses the new code. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [Moved "ret < 0" to the else brach, as per David's review. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Paolo Bonzini authored
Calling handle_mmio_page_fault() has been unnecessary since commit e9ee956e ("KVM: x86: MMU: Move handle_mmio_page_fault() call to kvm_mmu_page_fault()", 2016-02-22). handle_mmio_page_fault() can now be made static. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
- 15 Aug, 2017 1 commit
-
-
Arnd Bergmann authored
When PAGE_OFFSET is not a compile-time constant, we run into warnings from the use of kvm_is_error_hva() that the compiler cannot optimize out: arch/arm/kvm/../../../virt/kvm/kvm_main.c: In function '__kvm_gfn_to_hva_cache_init': arch/arm/kvm/../../../virt/kvm/kvm_main.c:1978:14: error: 'nr_pages_avail' may be used uninitialized in this function [-Werror=maybe-uninitialized] arch/arm/kvm/../../../virt/kvm/kvm_main.c: In function 'gfn_to_page_many_atomic': arch/arm/kvm/../../../virt/kvm/kvm_main.c:1660:5: error: 'entry' may be used uninitialized in this function [-Werror=maybe-uninitialized] This adds fake initializations to the two instances I ran into. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
- 11 Aug, 2017 5 commits
-
-
Jim Mattson authored
Host-initiated writes to the IA32_APIC_BASE MSR do not have to follow local APIC state transition constraints, but the value written must be valid. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Wanpeng Li authored
Bailing out immediately if there is no available mmu page to alloc. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Wanpeng Li authored
watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [warn_test:3089] irq event stamp: 20532 hardirqs last enabled at (20531): [<ffffffff8e9b6908>] restore_regs_and_iret+0x0/0x1d hardirqs last disabled at (20532): [<ffffffff8e9b7ae8>] apic_timer_interrupt+0x98/0xb0 softirqs last enabled at (8266): [<ffffffff8e9badc6>] __do_softirq+0x206/0x4c1 softirqs last disabled at (8253): [<ffffffff8e083918>] irq_exit+0xf8/0x100 CPU: 5 PID: 3089 Comm: warn_test Tainted: G OE 4.13.0-rc3+ #8 RIP: 0010:kvm_mmu_prepare_zap_page+0x72/0x4b0 [kvm] Call Trace: make_mmu_pages_available.isra.120+0x71/0xc0 [kvm] kvm_mmu_load+0x1cf/0x410 [kvm] kvm_arch_vcpu_ioctl_run+0x1316/0x1bf0 [kvm] kvm_vcpu_ioctl+0x340/0x700 [kvm] ? kvm_vcpu_ioctl+0x340/0x700 [kvm] ? __fget+0xfc/0x210 do_vfs_ioctl+0xa4/0x6a0 ? __fget+0x11d/0x210 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x23/0xc2 ? __this_cpu_preempt_check+0x13/0x20 This can be reproduced readily by ept=N and running syzkaller tests since many syzkaller testcases don't setup any memory regions. However, if ept=Y rmode identity map will be created, then kvm_mmu_calculate_mmu_pages() will extend the number of VM's mmu pages to at least KVM_MIN_ALLOC_MMU_PAGES which just hide the issue. I saw the scenario kvm->arch.n_max_mmu_pages == 0 && kvm->arch.n_used_mmu_pages == 1, so there is one active mmu page on the list, kvm_mmu_prepare_zap_page() fails to zap any pages, however prepare_zap_oldest_mmu_page() always returns true. It incurs infinite loop in make_mmu_pages_available() which causes mmu->lock softlockup. This patch fixes it by setting the return value of prepare_zap_oldest_mmu_page() according to whether or not there is mmu page zapped. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Hildenbrand authored
Let's reuse the function introduced with eptp switching. We don't explicitly have to check against enable_ept_ad_bits, as this is implicitly done when checking against nested_vmx_ept_caps in valid_ept_address(). Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Andrew Jones authored
Remove nonexistent files, allow less awkward expressions when extracting arch-specific information, and only return relevant information when using arch-specific expressions. Additionally add include/trace/events/kvm.h, arch/*/include/uapi/asm/kvm*, and arch/powerpc/kernel/kvm* to appropriate sections. The arch- specific expressions are now: /KVM/ -- All KVM /\(KVM\)|\(KVM\/x86\)/ -- X86 /\(KVM\)|\(KVM\/x86\)|\(KVM\/amd\)/ -- X86 plus AMD /\(KVM\)|\(KVM\/arm\)/ -- ARM /\(KVM\)|\(KVM\/arm\)|\(KVM\/arm64\)/ -- ARM plus ARM64 /\(KVM\)|\(KVM\/powerpc\)/ -- POWERPC /\(KVM\)|\(KVM\/s390\)/ -- S390 /\(KVM\)|\(KVM\/mips\)/ -- MIPS Signed-off-by: Andrew Jones <drjones@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 10 Aug, 2017 3 commits
-
-
Paolo Bonzini authored
This is the same as commit 14727754 ("kvm: svm: Add support for additional SVM NPF error codes", 2016-11-23), but for Intel processors. In this case, the exit qualification field's bit 8 says whether the EPT violation occurred while translating the guest's final physical address or rather while translating the guest page tables. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Brijesh Singh authored
Commit 14727754 ("kvm: svm: Add support for additional SVM NPF error codes", 2016-11-23) added a new error code to aid nested page fault handling. The commit unprotects (kvm_mmu_unprotect_page) the page when we get a NPF due to guest page table walk where the page was marked RO. However, if an L0->L2 shadow nested page table can also be marked read-only when a page is read only in L1's nested page table. If such a page is accessed by L2 while walking page tables it can cause a nested page fault (page table walks are write accesses). However, after kvm_mmu_unprotect_page we may get another page fault, and again in an endless stream. To cover this use case, we qualify the new error_code check with vcpu->arch.mmu_direct_map so that the error_code check would run on L1 guest, and not the L2 guest. This avoids hitting the above scenario. Fixes: 14727754 Cc: stable@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Lendacky <thomas.lendacky@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Wanpeng Li authored
Reported by syzkaller: The kvm-intel.unrestricted_guest=0 WARNING: CPU: 5 PID: 1014 at /home/kernel/data/kvm/arch/x86/kvm//x86.c:7227 kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm] CPU: 5 PID: 1014 Comm: warn_test Tainted: G W OE 4.13.0-rc3+ #8 RIP: 0010:kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm] Call Trace: ? put_pid+0x3a/0x50 ? rcu_read_lock_sched_held+0x79/0x80 ? kmem_cache_free+0x2f2/0x350 kvm_vcpu_ioctl+0x340/0x700 [kvm] ? kvm_vcpu_ioctl+0x340/0x700 [kvm] ? __fget+0xfc/0x210 do_vfs_ioctl+0xa4/0x6a0 ? __fget+0x11d/0x210 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x23/0xc2 ? __this_cpu_preempt_check+0x13/0x20 The syszkaller folks reported a residual mmio emulation request to userspace due to vm86 fails to emulate inject real mode interrupt(fails to read CS) and incurs a triple fault. The vCPU returns to userspace with vcpu->mmio_needed == true and KVM_EXIT_SHUTDOWN exit reason. However, the syszkaller testcase constructs several threads to launch the same vCPU, the thread which lauch this vCPU after the thread whichs get the vcpu->mmio_needed == true and KVM_EXIT_SHUTDOWN will trigger the warning. #define _GNU_SOURCE #include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/wait.h> #include <sys/types.h> #include <sys/stat.h> #include <sys/mman.h> #include <fcntl.h> #include <unistd.h> #include <linux/kvm.h> #include <stdio.h> int kvmcpu; struct kvm_run *run; void* thr(void* arg) { int res; res = ioctl(kvmcpu, KVM_RUN, 0); printf("ret1=%d exit_reason=%d suberror=%d\n", res, run->exit_reason, run->internal.suberror); return 0; } void test() { int i, kvm, kvmvm; pthread_t th[4]; kvm = open("/dev/kvm", O_RDWR); kvmvm = ioctl(kvm, KVM_CREATE_VM, 0); kvmcpu = ioctl(kvmvm, KVM_CREATE_VCPU, 0); run = (struct kvm_run*)mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, kvmcpu, 0); srand(getpid()); for (i = 0; i < 4; i++) { pthread_create(&th[i], 0, thr, 0); usleep(rand() % 10000); } for (i = 0; i < 4; i++) pthread_join(th[i], 0); } int main() { for (;;) { int pid = fork(); if (pid < 0) exit(1); if (pid == 0) { test(); exit(0); } int status; while (waitpid(pid, &status, __WALL) != pid) {} } return 0; } This patch fixes it by resetting the vcpu->mmio_needed once we receive the triple fault to avoid the residue. Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 08 Aug, 2017 4 commits
-
-
Longpeng(Mike) authored
This implements the kvm_arch_vcpu_in_kernel() for ARM, and adjusts the calls to kvm_vcpu_on_spin(). Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Longpeng(Mike) authored
This implements kvm_arch_vcpu_in_kernel() for s390. DIAG is a privileged operation, so it cannot be called from problem state (user mode). Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Longpeng(Mike) authored
get_cpl requires vcpu_load, so we must cache the result (whether the vcpu was preempted when its cpl=0) in kvm_vcpu_arch. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Longpeng(Mike) authored
If a vcpu exits due to request a user mode spinlock, then the spinlock-holder may be preempted in user mode or kernel mode. (Note that not all architectures trap spin loops in user mode, only AMD x86 and ARM/ARM64 currently do). But if a vcpu exits in kernel mode, then the holder must be preempted in kernel mode, so we should choose a vcpu in kernel mode as a more likely candidate for the lock holder. This introduces kvm_arch_vcpu_in_kernel() to decide whether the vcpu is in kernel-mode when it's preempted. kvm_vcpu_on_spin's new argument says the same of the spinning VCPU. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-