- 26 Sep, 2014 1 commit
-
-
Andres Lagar-Cavilla authored
Confusion around -EBUSY and zero (inside a BUG_ON no less). Reported-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 24 Sep, 2014 24 commits
-
-
git://github.com/agraf/linux-2.6Paolo Bonzini authored
Patch queue for ppc - 2014-09-24 New awesome things in this release: - E500: e6500 core support - E500: guest and remote debug support - Book3S: remote sw breakpoint support - Book3S: HV: Minor bugfixes Alexander Graf (1): KVM: PPC: Pass enum to kvmppc_get_last_inst Bharat Bhushan (8): KVM: PPC: BOOKE: allow debug interrupt at "debug level" KVM: PPC: BOOKE : Emulate rfdi instruction KVM: PPC: BOOKE: Allow guest to change MSR_DE KVM: PPC: BOOKE: Clear guest dbsr in userspace exit KVM_EXIT_DEBUG KVM: PPC: BOOKE: Guest and hardware visible debug registers are same KVM: PPC: BOOKE: Add one reg interface for DBSR KVM: PPC: BOOKE: Add one_reg documentation of SPRG9 and DBSR KVM: PPC: BOOKE: Emulate debug registers and exception Madhavan Srinivasan (2): powerpc/kvm: support to handle sw breakpoint powerpc/kvm: common sw breakpoint instr across ppc Michael Neuling (1): KVM: PPC: Book3S HV: Add register name when loading toc Mihai Caraman (10): powerpc/booke: Restrict SPE exception handlers to e200/e500 cores powerpc/booke: Revert SPE/AltiVec common defines for interrupt numbers KVM: PPC: Book3E: Increase FPU laziness KVM: PPC: Book3e: Add AltiVec support KVM: PPC: Make ONE_REG powerpc generic KVM: PPC: Move ONE_REG AltiVec support to powerpc KVM: PPC: Remove the tasklet used by the hrtimer KVM: PPC: Remove shared defines for SPE and AltiVec interrupts KVM: PPC: e500mc: Add support for single threaded vcpus on e6500 core KVM: PPC: Book3E: Enable e6500 core Paul Mackerras (2): KVM: PPC: Book3S HV: Increase timeout for grabbing secondary threads KVM: PPC: Book3S HV: Only accept host PVR value for guest PVR
-
Tang Chen authored
In order to make the APIC access page migratable, stop pinning it in memory. And because the APIC access page is not pinned in memory, we can remove kvm_arch->apic_access_page. When we need to write its physical address into vmcs, we use gfn_to_page() to get its page struct, which is needed to call page_to_phys(); the page is then immediately unpinned. Suggested-by: Gleb Natapov <gleb@kernel.org> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Tang Chen authored
Currently, the APIC access page is pinned by KVM for the entire life of the guest. We want to make it migratable in order to make memory hot-unplug available for machines that run KVM. This patch prepares to handle this for the case where there is no nested virtualization, or where the nested guest does not have an APIC page of its own. All accesses to kvm->arch.apic_access_page are changed to go through kvm_vcpu_reload_apic_access_page. If the APIC access page is invalidated when the host is running, we update the VMCS in the next guest entry. If it is invalidated when the guest is running, the MMU notifier will force an exit, after which we will handle everything as in the previous case. If it is invalidated when a nested guest is running, the request will update either the VMCS01 or the VMCS02. Updating the VMCS01 is done at the next L2->L1 exit, while updating the VMCS02 is done in prepare_vmcs02. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Tang Chen authored
Currently, the APIC access page is pinned by KVM for the entire life of the guest. We want to make it migratable in order to make memory hot-unplug available for machines that run KVM. This patch prepares to handle this in generic code, through a new request bit (that will be set by the MMU notifier) and a new hook that is called whenever the request bit is processed. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Tang Chen authored
This will be used to let the guest run while the APIC access page is not pinned. Because subsequent patches will fill in the function for x86, place the (still empty) x86 implementation in the x86.c file instead of adding an inline function in kvm_host.h. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Tang Chen authored
Different architectures need different requests, and in fact we will use this function in architecture-specific code later. This will be outside kvm_main.c, so make it non-static and rename it to kvm_make_all_cpus_request(). Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Andres Lagar-Cavilla authored
1. We were calling clear_flush_young_notify in unmap_one, but we are within an mmu notifier invalidate range scope. The spte exists no more (due to range_start) and the accessed bit info has already been propagated (due to kvm_pfn_set_accessed). Simply call clear_flush_young. 2. We clear_flush_young on a primary MMU PMD, but this may be mapped as a collection of PTEs by the secondary MMU (e.g. during log-dirty). This required expanding the interface of the clear_flush_young mmu notifier, so a lot of code has been trivially touched. 3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate the access bit by blowing the spte. This requires proper synchronizing with MMU notifier consumers, like every other removal of spte's does. Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Andres Lagar-Cavilla authored
Callbacks don't have to do extra computation to learn what the caller (lvm_handle_hva_range()) knows very well. Useful for debugging/tracing/printk/future. Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
On x86_64, kernel text mappings are mapped read-only with CONFIG_DEBUG_RODATA. In that case, KVM will fail to patch VMCALL instructions to VMMCALL as required on AMD processors. The failure mode is currently a divide-by-zero exception, which obviously is a KVM bug that has to be fixed. However, picking the right instruction between VMCALL and VMMCALL will be faster and will help if you cannot upgrade the hypervisor. Reported-by: Chris Webb <chris@arachsys.com> Tested-by: Chris Webb <chris@arachsys.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Chen Yucong authored
Avoid open coded calculations for bank MSRs by using well-defined macros that hide the index of higher bank MSRs. No semantic changes. Signed-off-by: Chen Yucong <slaoub@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
Commit 346874c9 ("KVM: x86: Fix CR3 reserved bits") removed non-PAE reserved bits which were not according to Intel SDM. However, residue was left in a debug assertion (CR3_NONPAE_RESERVED_BITS). Remove it. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Matlack authored
vcpu ioctls can hang the calling thread if issued while a vcpu is running. However, invalid ioctls can happen when userspace tries to probe the kind of file descriptors (e.g. isatty() calls ioctl(TCGETS)); in that case, we know the ioctl is going to be rejected as invalid anyway and we can fail before trying to take the vcpu mutex. This patch does not change functionality, it just makes invalid ioctls fail faster. Cc: stable@vger.kernel.org Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Andres Lagar-Cavilla authored
When KVM handles a tdp fault it uses FOLL_NOWAIT. If the guest memory has been swapped out or is behind a filemap, this will trigger async readahead and return immediately. The rationale is that KVM will kick back the guest with an "async page fault" and allow for some other guest process to take over. If async PFs are enabled the fault is retried asap from an async workqueue. If not, it's retried immediately in the same code path. In either case the retry will not relinquish the mmap semaphore and will block on the IO. This is a bad thing, as other mmap semaphore users now stall as a function of swap or filemap latency. This patch ensures both the regular and async PF path re-enter the fault allowing for the mmap semaphore to be relinquished in the case of IO wait. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Tiejun Chen authored
s/drity/dirty and s/vmsc01/vmcs01 Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
Guest which sets the PAT CR to invalid value should get a #GP. Currently, if vmx supports loading PAT CR during entry, then the value is not checked. This patch makes the required check in that case. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
In 64-bit mode a #GP should be delivered to the guest "if the code segment descriptor pointed to by the selector in the 64-bit gate doesn't have the L-bit set and the D-bit clear." - Intel SDM "Interrupt 13—General Protection Exception (#GP)". This patch fixes the behavior of CS loading emulation code. Although the comment says that segment loading is not supported in long mode, this function is executed in long mode, so the fix is necassary. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Liang Chen authored
A one-line wrapper around kvm_make_request is not particularly useful. Replace kvm_mmu_flush_tlb() with kvm_make_request(). Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Radim Krčmář authored
- we count KVM_REQ_TLB_FLUSH requests, not actual flushes (KVM can have multiple requests for one flush) - flushes from kvm_flush_remote_tlbs aren't counted - it's easy to make a direct request by mistake Solve these by postponing the counting to kvm_check_request(). Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Marcelo Tosatti authored
Initilization of L2 guest with -cpu host, on L1 guest with -cpu host triggers: (qemu) KVM: entry failed, hardware error 0x7 ... nested_vmx_run: VMCS MSR_{LOAD,STORE} unsupported Nested VMX MSR load/store support is not sufficient to allow perf for L2 guest. Until properly fixed, trap CPUID and disable function 0xA. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
Commit fc3a9157 ("KVM: X86: Don't report L2 emulation failures to user-space") disabled the reporting of L2 (nested guest) emulation failures to userspace due to race-condition between a vmexit and the instruction emulator. The same rational applies also to userspace applications that are permitted by the guest OS to access MMIO area or perform PIO. This patch extends the current behavior - of injecting a #UD instead of reporting it to userspace - also for guest userspace code. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
In init_rmode_tss(), there two variables indicating the return value, r and ret, and it return 0 on error, 1 on success. The function is only called by vmx_set_tss_addr(), and ret is redundant. This patch removes the redundant variable, by making init_rmode_tss() return 0 on success, -errno on failure. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
The KVM emulator code assumes that the guest virtual address space (in 64-bit) is 48-bits wide. Fail the KVM_SET_CPUID and KVM_SET_CPUID2 ioctl if userspace tries to create a guest that does not obey this restriction. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
/me got confused between the kernel and QEMU. In the kernel, you can only have one module_init function, and it will prevent unloading the module unless you also have the corresponding module_exit function. So, commit 80ce1639 (KVM: VFIO: register kvm_device_ops dynamically, 2014-09-02) broke unloading of the kvm module, by adding a module_init function and no module_exit. Repair it by making kvm_vfio_ops_init weak, and checking it in kvm_init. Cc: Will Deacon <will.deacon@arm.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: Alex Williamson <Alex.Williamson@redhat.com> Fixes: 80ce1639Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Christoffer Dall authored
Commit c77dcacb (KVM: Move more code under CONFIG_HAVE_KVM_IRQFD) added functionality that depends on definitions in ioapic.h when __KVM_HAVE_IOAPIC is defined. At the same time, kvm-arm commit 0ba09511 (KVM: EVENTFD: remove inclusion of irq.h) removed the inclusion of irq.h, an architecture-specific header that is not present on ARM but which happened to include ioapic.h on x86. Include ioapic.h directly in eventfd.c if __KVM_HAVE_IOAPIC is defined. This fixes x86 and lets ARM use eventfd.c. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 22 Sep, 2014 15 commits
-
-
Alexander Graf authored
The kvmppc_get_last_inst function recently received a facelift that allowed us to pass an enum of the type of instruction we want to read into it rather than an unreadable boolean. Unfortunately, not all callers ended up passing the enum. This wasn't really an issue as "true" and "false" happen to match the two enum values we have, but it's still hard to read. Update all callers of kvmppc_get_last_inst() to follow the new calling convention. Signed-off-by: Alexander Graf <agraf@suse.de>
-
Madhavan Srinivasan authored
This patch extends the use of illegal instruction as software breakpoint instruction across the ppc platform. Patch extends booke program interrupt code to support software breakpoint. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [agraf: Fix bookehv] Signed-off-by: Alexander Graf <agraf@suse.de>
-
Madhavan Srinivasan authored
This patch adds kernel side support for software breakpoint. Design is that, by using an illegal instruction, we trap to hypervisor via Emulation Assistance interrupt, where we check for the illegal instruction and accordingly we return to Host or Guest. Patch also adds support for software breakpoint in PR KVM. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
-
Mihai Caraman authored
Now that AltiVec and hardware thread support is in place enable e6500 core. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
-
Mihai Caraman authored
ePAPR represents hardware threads as cpu node properties in device tree. So with existing QEMU, hardware threads are simply exposed as vcpus with one hardware thread. The e6500 core shares TLBs between hardware threads. Without tlb write conditional instruction, the Linux kernel uses per core mechanisms to protect against duplicate TLB entries. The guest is unable to detect real siblings threads, so it can't use the TLB protection mechanism. An alternative solution is to use the hypervisor to allocate different lpids to guest's vcpus that runs simultaneous on real siblings threads. On systems with two threads per core this patch halves the size of the lpid pool that the allocator sees and use two lpids per VM. Use even numbers to speedup vcpu lpid computation with consecutive lpids per VM: vm1 will use lpids 2 and 3, vm2 lpids 4 and 5, and so on. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> [agraf: fix spelling] Signed-off-by: Alexander Graf <agraf@suse.de>
-
Paul Mackerras authored
Since the guest can read the machine's PVR (Processor Version Register) directly and see the real value, we should disallow userspace from setting any value for the guest's PVR other than the real host value. Therefore this makes kvm_arch_vcpu_set_sregs_hv() check the supplied PVR value and return an error if it is different from the host value, which has been put into vcpu->arch.pvr at vcpu creation time. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
-
Paul Mackerras authored
Occasional failures have been seen with split-core mode and migration where the message "KVM: couldn't grab cpu" appears. This increases the length of time that we wait from 1ms to 10ms, which seems to work around the issue. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
-
Mihai Caraman authored
We currently decide at compile-time which of the SPE or AltiVec units to support exclusively. Guard kernel defines with CONFIG_SPE_POSSIBLE and CONFIG_PPC_E500MC and remove shared defines. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
-
Mihai Caraman authored
Powerpc timer implementation is a copycat version of s390. Now that they removed the tasklet with commit ea74c0ea follow this optimization. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
-
Bharat Bhushan authored
This patch emulates debug registers and debug exception to support guest using debug resource. This enables running gdb/kgdb etc in guest. On BOOKE architecture we cannot share debug resources between QEMU and guest because: When QEMU is using debug resources then debug exception must be always enabled. To achieve this we set MSR_DE and also set MSRP_DEP so guest cannot change MSR_DE. When emulating debug resource for guest we want guest to control MSR_DE (enable/disable debug interrupt on need). So above mentioned two configuration cannot be supported at the same time. So the result is that we cannot share debug resources between QEMU and Guest on BOOKE architecture. In the current design QEMU gets priority over guest, this means that if QEMU is using debug resources then guest cannot use them and if guest is using debug resource then QEMU can overwrite them. Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
-
Mihai Caraman authored
Move ONE_REG AltiVec support to powerpc generic layer. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
-
Mihai Caraman authored
Make ONE_REG generic for server and embedded architectures by moving kvm_vcpu_ioctl_get_one_reg() and kvm_vcpu_ioctl_set_one_reg() functions to powerpc layer. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
-
Mihai Caraman authored
Add AltiVec support in KVM for Book3e. FPU support gracefully reuse host infrastructure so follow the same approach for AltiVec. Book3e specification defines shared interrupt numbers for SPE and AltiVec units. Still SPE is present in e200/e500v2 cores while AltiVec is present in e6500 core. So we can currently decide at compile-time which of the SPE or AltiVec units to support exclusively by using CONFIG_SPE_POSSIBLE and CONFIG_PPC_E500MC defines. As Alexander Graf suggested, keep SPE and AltiVec exception handlers distinct to improve code readability. Guests have the privilege to enable AltiVec, so we always need to support AltiVec in KVM and implicitly in host to reflect interrupts and to save/restore the unit context. KVM will be loaded on cores with AltiVec unit only if CONFIG_ALTIVEC is defined. Use this define to guard KVM AltiVec logic. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
-
Mihai Caraman authored
Increase FPU laziness by loading the guest state into the unit before entering the guest instead of doing it on each vcpu schedule. Without this improvement an interrupt may claim floating point corrupting guest state. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
-
Bharat Bhushan authored
This was missed in respective one_reg implementation patch. Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
-