- 14 Apr, 2015 1 commit
Ulrich Obergfell authored
Have kvm_guest_init() use hardlockup_detector_disable() instead of
watchdog_enable_hardlockup_detector(false). Remove the
watchdog_hardlockup_detector_is_enabled() and
watchdog_enable_hardlockup_detector() functions, which are no longer
needed.

Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

- 18 Feb, 2015 1 commit
Raghavendra K T authored
Paravirt spinlock clears slowpath flag after doing unlock. As explained
by Linus, it currently does:

    prev = *lock;
    add_smp(&lock->tickets.head, TICKET_LOCK_INC);

    /* add_smp() is a full mb() */

    if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
            __ticket_unlock_slowpath(lock, prev);

which is *exactly* the kind of thing you cannot do with spinlocks,
because after you've done the "add_smp()" and released the spinlock for
the fast-path, you can't access the spinlock any more. Exactly because
a fast-path lock might come in and release the whole data structure.

Linus suggested that we should not do any writes to lock after unlock(),
and that we can move slowpath clearing to fastpath lock. So this patch
implements the fix with:

 1. Moving the slowpath flag to head (Oleg):
    Unlocked locks don't care about the slowpath flag; therefore we can
    keep it set after the last unlock, and clear it again on the first
    (try)lock -- this removes the write after unlock. Note that keeping
    the slowpath flag set would result in unnecessary kicks. By moving
    the slowpath flag from the tail to the head ticket we also avoid
    the need to access both the head and tail tickets on unlock.

 2. Using xadd to avoid the read/write after unlock that checks the
    need for unlock_kick (Linus):
    We further avoid the need for a read-after-release by using xadd;
    the prev head value will include the slowpath flag and indicate if
    we need to do PV kicking of suspended spinners -- on modern chips
    xadd isn't (much) more expensive than an add + load.

Result:
    setup: 16-core (32 cpu +ht Sandy Bridge, 8GB, 16-vcpu guest)

    benchmark    overcommit    %improve
    kernbench    1x            -0.13
    kernbench    2x             0.02
    dbench       1x            -1.77
    dbench       2x            -0.63

[Jeremy: hinted missing TICKET_LOCK_INC for kick]
[Oleg: moved slowpath flag to head, ticket_equals idea]
[PeterZ: added detailed changelog]

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Jones <drjones@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Fernando Luis Vázquez Cao <fernando_b1@lab.ntt.co.jp>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: a.ryabinin@samsung.com
Cc: dave@stgolabs.net
Cc: hpa@zytor.com
Cc: jasowang@redhat.com
Cc: jeremy@goop.org
Cc: paul.gortmaker@windriver.com
Cc: riel@redhat.com
Cc: tglx@linutronix.de
Cc: waiman.long@hp.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20150215173043.GA7471@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

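To make the unlock-side reasoning concrete, here is a minimal userspace
sketch of the xadd-style release described above; TICKET_LOCK_INC,
TICKET_SLOWPATH_FLAG and pv_kick() are illustrative stand-ins for the
kernel's definitions, not the actual patch:

    #include <stdint.h>
    #include <stdio.h>

    #define TICKET_SLOWPATH_FLAG 1u   /* kept in the head ticket */
    #define TICKET_LOCK_INC      2u   /* tickets advance in steps of two */

    struct ticket_lock { uint16_t head, tail; };

    /* Stand-in for the hypercall that wakes the halted vCPU. */
    static void pv_kick(uint16_t ticket)
    {
            printf("kick spinner waiting on ticket %u\n", ticket);
    }

    static void ticket_unlock(struct ticket_lock *lock)
    {
            /* A single xadd releases the lock and returns the previous
             * head, slowpath flag included: no reads of or writes to
             * the lock are needed after the release. */
            uint16_t prev = __atomic_fetch_add(&lock->head, TICKET_LOCK_INC,
                                               __ATOMIC_SEQ_CST);

            if (prev & TICKET_SLOWPATH_FLAG)
                    pv_kick((uint16_t)((prev & ~TICKET_SLOWPATH_FLAG)
                                       + TICKET_LOCK_INC));
    }
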
- 10 Dec, 2014 1 commit
Andy Lutomirski authored
paravirt_enabled has the following effects:

 - Disables the F00F bug workaround warning. There is no F00F bug
   workaround any more because Linux's standard IDT handling already
   works around the F00F bug, but the warning still exists. This is
   only cosmetic, and, in any event, there is no such thing as KVM on
   a CPU with the F00F bug.

 - Disables 32-bit APM BIOS detection. On a KVM paravirt system, there
   should be no APM BIOS anyway.

 - Disables tboot. I think that the tboot code should check the CPUID
   hypervisor bit directly if it matters.

 - paravirt_enabled disables espfix32. espfix32 should *not* be
   disabled under KVM paravirt.

The last point is the purpose of this patch. It fixes a leak of the
high 16 bits of the kernel stack address on 32-bit KVM paravirt guests.
Fixes CVE-2014-8134.

Cc: stable@vger.kernel.org
Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

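For reference, the direct check the commit suggests for tboot is a
one-liner against CPUID leaf 1, where ECX bit 31 is the architectural
"hypervisor present" bit. A hedged userspace sketch:

    #include <cpuid.h>
    #include <stdbool.h>

    static bool hypervisor_bit_set(void)
    {
            unsigned int eax, ebx, ecx, edx;

            if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
                    return false;
            return ecx & (1u << 31);   /* CPUID.01H:ECX[31] */
    }
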
- 14 Oct, 2014 1 commit
Ulrich Obergfell authored
Use watchdog_enable_hardlockup_detector() to set hard lockup
detection's default value to false. It is risky to run this detection
in a guest, as false positives are easy to trigger, especially if the
host is overcommitted.

Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

- 26 Aug, 2014 1 commit
Christoph Lameter authored
__get_cpu_var() is used for multiple purposes in the kernel source. One
of them is address calculation via the form &__get_cpu_var(x). This
calculates the address for the instance of the percpu variable of the
current processor based on an offset.

Other use cases are for storing and retrieving data from the current
processor's percpu area. __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.

__get_cpu_var() is defined as:

    #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

__get_cpu_var() always only does an address determination. However,
store and retrieve operations could use a segment prefix (or global
register on other platforms) to avoid the address calculation.

this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per-cpu
variables.

This patch converts __get_cpu_var() into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations
that use the offset. Thereby address calculations are avoided and fewer
registers are used when code is generated.

Transformations done to __get_cpu_var():

1. Determine the address of the percpu instance of the current
   processor.

       DEFINE_PER_CPU(int, y);
       int *x = &__get_cpu_var(y);

   Converts to

       int *x = this_cpu_ptr(&y);

2. Same as #1 but this time an array structure is involved.

       DEFINE_PER_CPU(int, y[20]);
       int *x = __get_cpu_var(y);

   Converts to

       int *x = this_cpu_ptr(y);

3. Retrieve the content of the current processor's instance of a
   per-cpu variable.

       DEFINE_PER_CPU(int, y);
       int x = __get_cpu_var(y);

   Converts to

       int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct.

       DEFINE_PER_CPU(struct mystruct, y);
       struct mystruct x = __get_cpu_var(y);

   Converts to

       memcpy(&x, this_cpu_ptr(&y), sizeof(x));

5. Assignment to a per-cpu variable.

       DEFINE_PER_CPU(int, y);
       __get_cpu_var(y) = x;

   Converts to

       __this_cpu_write(y, x);

6. Increment/decrement etc. of a per-cpu variable.

       DEFINE_PER_CPU(int, y);
       __get_cpu_var(y)++;

   Converts to

       __this_cpu_inc(y);

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

- 22 May, 2014 1 commit
Dave Hansen authored
I noticed on some of my systems that page fault tracing doesn't work:

    cd /sys/kernel/debug/tracing
    echo 1 > events/exceptions/enable
    cat trace; # nothing shows up

I eventually traced it down to CONFIG_KVM_GUEST. At least in a KVM VM,
enabling that option breaks page fault tracing, and disabling it fixes
the problem. I tried on some old kernels and this does not appear to be
a regression: it never worked.

There are two page-fault entry functions today. One when tracing is on
and another when it is off. The KVM code calls do_page_fault() directly
instead of calling the traced version:

 > dotraplinkage void __kprobes
 > do_async_page_fault(struct pt_regs *regs, unsigned long
 > error_code)
 > {
 >        enum ctx_state prev_state;
 >
 >        switch (kvm_read_and_reset_pf_reason()) {
 >        default:
 >                do_page_fault(regs, error_code);
 >                break;
 >        case KVM_PV_REASON_PAGE_NOT_PRESENT:

I'm also having problems with page fault tracing on bare metal (same
symptom of no trace output), though I'm unsure whether it's related.

Steven had an alternative to this which has zero overhead when tracing
is off: it includes the standard noops even when tracing is disabled.
I'm unconvinced that the extra complexity of his approach:

    http://lkml.kernel.org/r/20140508194508.561ed220@gandalf.local.home

is worth it, especially considering that the KVM code is already making
page fault entry slower here. This solution is dirt-simple.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

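A hedged sketch of the dirt-simple fix: dispatch the default case to
the traced entry point (trace_do_page_fault(), introduced by the
exception-tracing commit further down this log) when tracing is built
in. Treat the exact hunk as illustrative:

    dotraplinkage void
    do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
    {
            switch (kvm_read_and_reset_pf_reason()) {
            default:
    #ifdef CONFIG_TRACING
                    trace_do_page_fault(regs, error_code); /* traced entry */
    #else
                    do_page_fault(regs, error_code);
    #endif
                    break;
            /* ... KVM_PV_REASON_* cases unchanged ... */
            }
    }
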
- 24 Apr, 2014 1 commit
Masami Hiramatsu authored
Use the NOKPROBE_SYMBOL() macro for protecting functions from kprobes
instead of the __kprobes annotation under arch/x86. This applies the
nokprobe_inline annotation in some cases, because NOKPROBE_SYMBOL()
inhibits inlining by referencing the symbol's address. This just folds
a bunch of previous NOKPROBE_SYMBOL() cleanup patches for x86 into one
patch.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/20140417081814.26341.51656.stgit@ltc230.yrl.intra.hitachi.co.jp
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fernando Luis Vázquez Cao <fernando_b1@lab.ntt.co.jp>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Lebon <jlebon@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

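The conversion pattern, using do_async_page_fault() (quoted with
__kprobes in the May 2014 commit above) as an illustrative example:

    /* Before: __kprobes places the handler in a protected section. */
    dotraplinkage void __kprobes
    do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
    {
            /* ... */
    }

    /* After: a plain function plus a blacklist entry next to it. */
    dotraplinkage void
    do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
    {
            /* ... */
    }
    NOKPROBE_SYMBOL(do_async_page_fault);
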
- 22 Feb, 2014 1 commit
Fernando Luis Vázquez Cao authored
These days hv_clock allocation is memblock-based (i.e. the percpu
allocator is not involved), which means that the physical address of
each of the per-cpu hv_clock areas is guaranteed to remain unchanged
throughout its lifetime, and we do not need to update its location
after CPU bring-up.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

- 30 Jan, 2014 1 commit
Andi Kleen authored
These functions are called from inline assembler stubs, and thus need
to be global and visible.

Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1382458079-24450-7-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

- 29 Jan, 2014 2 commits
Paolo Bonzini authored
When Hyper-V hypervisor leaves are present, KVM must relocate its own
leaves to 0x40000100, because Windows does not look for Hyper-V leaves
at indices other than 0x40000000. In this case the KVM features are at
0x40000101, but the old code would always look at 0x40000001. Fix this
by using kvm_cpuid_base(). This also requires making the function
non-inline, since kvm_cpuid_base() is static.

Fixes: 1085ba7f
Cc: stable@vger.kernel.org
Cc: mtosatti@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo Bonzini authored
It is unnecessary to go through hypervisor_cpuid_base() every time a
leaf is found (which will be every time a feature is requested, after
the next patch).

Fixes: 1085ba7f
Cc: stable@vger.kernel.org
Cc: mtosatti@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

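Taken together, these two commits amount to roughly the following
sketch (the names and the caching detail are illustrative;
hypervisor_cpuid_base() scans the 0x40000000 + N*0x100 range for the
"KVMKVMKVM\0\0\0" signature):

    static uint32_t kvm_cpuid_base(void)
    {
            static int base = -1;      /* cache: scan CPUID only once */

            if (base == -1)
                    base = hypervisor_cpuid_base("KVMKVMKVM\0\0\0", 0);
            return base;
    }

    static unsigned int kvm_arch_para_features(void)
    {
            /* The features leaf sits at base + 1, i.e. 0x40000101
             * when the leaves were relocated to 0x40000100. */
            return cpuid_eax(kvm_cpuid_base() + 1);
    }
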
- 08 Nov, 2013 1 commit
Seiji Aguchi authored
This patch registers exception handlers for tracing to a trace IDT. To
implement this in set_intr_gate(), the patch does the following:

 - Registers the exception handlers to the trace IDT by prepending
   "trace_" to the handlers' names.

 - Newly introduces trace_page_fault(), in order to add tracepoints in
   a subsequent patch.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/52716DEC.5050204@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

- 30 Oct, 2013 1 commit
Tim Gardner authored
The x86-specific kvm init creates a new, conflicting debugfs directory,
which causes modprobe issues with kvm_intel and kvm_amd. For example:

    sudo modprobe kvm_amd
    modprobe: ERROR: could not insert 'kvm_amd': Bad address

The simplest fix is to just rename the directory. The following KVM
config options are set:

    CONFIG_KVM_GUEST=y
    CONFIG_KVM_DEBUG_FS=y
    CONFIG_HAVE_KVM=y
    CONFIG_HAVE_KVM_IRQCHIP=y
    CONFIG_HAVE_KVM_IRQ_ROUTING=y
    CONFIG_HAVE_KVM_EVENTFD=y
    CONFIG_KVM_APIC_ARCHITECTURE=y
    CONFIG_KVM_MMIO=y
    CONFIG_KVM_ASYNC_PF=y
    CONFIG_HAVE_KVM_MSI=y
    CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
    CONFIG_KVM=m
    CONFIG_KVM_INTEL=m
    CONFIG_KVM_AMD=m
    CONFIG_KVM_DEVICE_ASSIGNMENT=y

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
[Change debugfs directory name. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

- 15 Oct, 2013 1 commit
Raghavendra K T authored
We use a jump label to enable pv-spinlocks. With the changes in
(442e0973 Merge branch 'x86/jumplabel'), jump label behaviour changed
in a way that would result in an eventual hang of the VM: we could end
up in a situation where slowpath locks halt the vcpus, but we are not
able to wake a vcpu up via the lock releaser's unlock kick.

A similar problem in Xen, along with a more detailed description, is
available in a945928e (xen: Do not enable spinlocks before
jump_label_init() has executed).

This patch splits kvm_spinlock_init() to separate the jump label
changes from the pvops patching, and also makes the jump label enabling
happen after jump_label_init(). A sketch of the resulting split
follows below.

Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

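A hedged sketch of the split (close to the era's arch/x86/kernel/kvm.c,
but treat the details as illustrative): pvops patching stays in
kvm_spinlock_init(), while the static key is flipped from an
early_initcall that runs after jump_label_init().

    void __init kvm_spinlock_init(void)
    {
            if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
                    return;

            /* Safe before jump_label_init(): plain pvops patching. */
            pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
            pv_lock_ops.unlock_kick = kvm_unlock_kick;
    }

    static __init int kvm_spinlock_init_jump(void)
    {
            /* Runs after jump_label_init(), so flipping the key is safe. */
            if (kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
                    static_key_slow_inc(&paravirt_ticketlocks_enabled);
            return 0;
    }
    early_initcall(kvm_spinlock_init_jump);
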
- 19 Aug, 2013 1 commit
Raghavendra K T authored
It was not declared as static since it was thought to be used by
pv-flushtlb earlier.

Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: <gleb@redhat.com>
Cc: <pbonzini@redhat.com>
Cc: Jiri Kosina <trivial@kernel.org>
Link: http://lkml.kernel.org/r/1376645921-8056-1-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

- 14 Aug, 2013 1 commit
Srivatsa Vaddagiri authored
During smp_boot_cpus, a paravirtualized KVM guest detects whether the
hypervisor has the required feature (KVM_FEATURE_PV_UNHALT) to support
pv-ticketlocks. If so, support for pv-ticketlocks is registered via
pv_lock_ops. The KVM_HC_KICK_CPU hypercall is used to wake up a
waiting/halted vcpu.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20130810193849.GA25260@linux.vnet.ibm.com
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
[Raghu: check_zero race fix, enum for kvm_contention_stat, jumplabel
related changes, addition of safe_halt for irq enabled case, bailout
spinning in nmi case (Gleb)]
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

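The kick side is a small hypercall wrapper, roughly as follows (a
sketch matching the shape of the era's kvm.c):

    /* Kick a vcpu waiting on a ticket lock out of its halt. */
    static void kvm_kick_cpu(int cpu)
    {
            unsigned long flags = 0;
            int apicid = per_cpu(x86_cpu_to_apicid, cpu);

            kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
    }
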
- 05 Aug, 2013 1 commit
Jason Wang authored
We try to handle hypervisor compatibility mode by detecting hypervisors
in a specific order. This is not robust, since hypervisors may
implement each other's features. This patch tries to handle this
situation by always choosing the last one in the CPUID leaves. This is
done by letting .detect() return a priority instead of true/false, and
just reusing the CPUID leaf where the signature was found as the
priority (or 1 if it was found by DMI). Then we can simply pick the
hypervisor with the highest priority. Other, more sophisticated
detection methods could also be implemented on top.

Suggested by H. Peter Anvin and Paolo Bonzini.

Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Doug Covelli <dcovelli@vmware.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Hecht <dhecht@vmware.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: http://lkml.kernel.org/r/1374742475-2485-4-git-send-email-jasowang@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

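A hedged sketch of the selection loop this yields (structure and array
names follow the era's arch/x86/kernel/cpu/hypervisor.c, but treat them
as illustrative):

    static inline void __init detect_hypervisor_vendor(void)
    {
            const struct hypervisor_x86 * const *p;
            uint32_t pri, max_pri = 0;

            for (p = hypervisors; p < hypervisors + ARRAY_SIZE(hypervisors); p++) {
                    /* CPUID leaf of the signature, or 1 for DMI. */
                    pri = (*p)->detect();
                    if (pri != 0 && pri > max_pri) {
                            max_pri = pri;
                            x86_hyper = *p;
                    }
            }

            if (max_pri)
                    printk(KERN_INFO "Hypervisor detected: %s\n",
                           x86_hyper->name);
    }
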
- 14 Jul, 2013 1 commit
Paul Gortmaker authored
The __cpuinit type of throwaway sections might have made sense some
time ago when RAM was more constrained, but now the savings do not
offset the cost and complications. For example, the fix in commit
5e427ec2 ("x86: Fix bit corruption at CPU resume time") is a good
example of the nasty type of bugs that can be created with improper use
of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go the
way of devinit and be phased out. Once all the users are gone, we can
then finally remove the macros themselves from linux/init.h.

Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up() are arch-independent (kernel/cpu.c)
and are flagged as __cpuinit -- so if we remove the __cpuinit from
arch-specific callers, we will also get section mismatch warnings. As
an intermediate step, we intend to turn the linux/init.h cpuinit
content into no-ops as early as possible, since that will get rid of
these warnings. In any case, they are temporary and harmless.

This removes all the arch/x86 uses of the __cpuinit macros from all C
files. x86 only had the one __CPUINIT used in assembly files, and it
wasn't paired off with a .previous or a __FINIT, so we can delete it
directly w/o any corresponding additional change there.

[1] https://lkml.org/lkml/2013/5/20/589

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

- 07 Mar, 2013 2 commits
Frederic Weisbecker authored
On exception exit, we restore the previous context tracking state based
on the regs of the interrupted frame. Iff that frame is in user mode,
as stated by the user_mode() helper, we restore the context tracking
user mode.

However, there is a tiny chunk of low-level arch code after we pass
through user_enter() and until the CPU eventually resumes userspace. If
an exception happens in this tiny area, exception_enter() correctly
exits the context tracking user mode, but exception_exit() won't
restore it, because of the value returned by user_mode(regs). As a
result we may return to userspace with the wrong context tracking
state.

To fix this, change exception_enter() to return the context tracking
state prior to its call, and pass this saved state to exception_exit().
This restores the real context tracking state of the interrupted frame.

(Maybe this patch was suggested to me; I don't recall exactly. If so,
sorry for the missing credit.)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Mats Liljegren <mats.liljegren@enea.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

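The resulting pairing in an exception handler looks like this;
do_some_exception() is a hypothetical stand-in for the real
dotraplinkage entry points:

    dotraplinkage void do_some_exception(struct pt_regs *regs,
                                         long error_code)
    {
            enum ctx_state prev_state;

            /* Returns the state prior to the exception ... */
            prev_state = exception_enter();

            /* ... handle the exception ... */

            /* ... and restores that saved state, instead of trusting
             * user_mode(regs) alone. */
            exception_exit(prev_state);
    }
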
Frederic Weisbecker authored
Exception handling on context tracking should share common treatment:
on entry we exit user mode if the exception triggered in that context;
then on exception exit we return to that previous context. Generalize
this to avoid duplication across archs.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Mats Liljegren <mats.liljegren@enea.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

- 11 Feb, 2013 1 commit
Shuah Khan authored
Fix the following compile warning in kvm_register_steal_time():

    CC      arch/x86/kernel/kvm.o
    arch/x86/kernel/kvm.c: In function 'kvm_register_steal_time':
    arch/x86/kernel/kvm.c:302:3: warning: format '%lx' expects argument
    of type 'long unsigned int', but argument 3 has type 'phys_addr_t'
    [-Wformat]

Introduced via:

    5dfd486c x86, kvm: Fix kvm's use of __pa() on percpu areas
    d7656534 x86, mm: Create slow_virt_to_phys()
    f3c4fbb6 x86, mm: Use new pagetable helpers in try_preserve_large_page()
    4cbeb51b x86, mm: Pagetable level size/shift/mask helpers
    a25b9316 x86, mm: Make DEBUG_VIRTUAL work earlier in boot

Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: shuahkhan@gmail.com
Cc: avi@redhat.com
Cc: gleb@redhat.com
Cc: mst@redhat.com
Link: http://lkml.kernel.org/r/1360119442.8356.8.camel@lorien2
Signed-off-by: Ingo Molnar <mingo@kernel.org>

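The usual fix for this class of warning, sketched (not necessarily the
exact hunk): phys_addr_t is 32 or 64 bits depending on configuration,
so cast up to a fixed 64-bit type and print with %llx:

    pr_info("kvm-stealtime: cpu %d, msr %llx\n",
            cpu, (unsigned long long)slow_virt_to_phys(st));
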
- 26 Jan, 2013 1 commit
Dave Hansen authored
In short, it is illegal to call __pa() on an address holding a percpu
variable. This replaces those __pa() calls with slow_virt_to_phys().
All of the cases in this patch are in boot time (or CPU hotplug time at
worst) code, so the slow pagetable walking in slow_virt_to_phys() is
not expected to have a performance impact.

The times when this actually matters are pretty obscure (certain 32-bit
NUMA systems), but it _does_ happen. It is important to keep KVM guests
working on these systems because the real hardware is getting harder
and harder to find.

This bug manifested first by me seeing a plain hang at boot after this
message:

    CPU 0 irqstacks, hard=f3018000 soft=f301a000

or, sometimes, it would actually make it out to the console:

    [ 0.000000] BUG: unable to handle kernel paging request at ffffffff

I eventually traced it down to the KVM async pagefault code. This can
be worked around by disabling that code either at compile time or on
the kernel command line.

The kvm async pagefault code was injecting page faults in to the guest
which the guest misinterpreted because its "reason" was not being
properly sent from the host. The guest passes the physical address of a
per-cpu async page fault structure via an MSR to the host. Since __pa()
is broken on percpu data, the physical address it sent was basically
bogus and the host went scribbling on random data. The guest never saw
the real reason for the page fault (it was injected by the host),
assumed that the kernel had taken a _real_ page fault, and panic()'d.
The behavior varied, though, depending on what got corrupted by the bad
write.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20130122212435.4905663F@kernel.stglabs.ibm.com
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

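The shape of the conversion, sketched against the era's steal-time
registration (treat the exact hunk as illustrative):

    static void kvm_register_steal_time(void)
    {
            int cpu = smp_processor_id();
            struct kvm_steal_time *st = &per_cpu(steal_time, cpu);

            if (!has_steal_clock)
                    return;

            /* Was __pa(st), which is illegal for percpu addresses. */
            wrmsrl(MSR_KVM_STEAL_TIME,
                   slow_virt_to_phys(st) | KVM_MSR_ENABLED);
    }
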
- 24 Jan, 2013 1 commit
Alok N Kataria authored
This patch updates the x2apic initialization code to allow x2apic on
the VMware platform even without interrupt remapping support. The
hypervisor_x2apic_available hook was added to the x2apic initialization
code and used by KVM and Xen before this. I have also cleaned up that
code to export this hook through the hypervisor_x86 structure.

Compile tested for KVM and Xen configs; this patch doesn't have any
functional effect on those two platforms. On the VMware platform,
verified that x2apic is used in physical mode on products that support
it.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Reviewed-by: Doug Covelli <dcovelli@vmware.com>
Reviewed-by: Dan Hecht <dhecht@vmware.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Avi Kivity <avi@redhat.com>
Link: http://lkml.kernel.org/r/1358466282.423.60.camel@akataria-dtop.eng.vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

- 18 Dec, 2012 1 commit
Li Zhong authored
This patch adds user eqs exception hooks for the async page fault
page-not-present code path, to exit the user eqs and re-enter it as
necessary.

Async page fault is different from other exceptions in that it may be
triggered from the idle process, so we still need rcu_irq_enter() and
rcu_irq_exit() to exit the cpu idle eqs when needed, to protect the
code that needs to use rcu.

As Frederic pointed out, it would be safest and simplest to protect the
whole of kvm_async_pf_task_wait(); otherwise, "we need to check all the
code there deeply for potential RCU uses and ensure it will never be
extended later to use RCU."

However, we'd better re-enter the cpu idle eqs if we got the exception
while in the cpu idle eqs, by calling rcu_irq_exit() before
native_safe_halt().

So the patch does what Frederic suggested for the rcu_irq_*() API usage
here, except that I moved the rcu_irq_*() pair originally in
do_async_page_fault() into kvm_async_pf_task_wait(). That's because I
think it is better to keep the rcu_irq_*() pairs within one function
(rcu_irq_exit() after rcu_irq_enter()), especially here, where
kvm_async_pf_task_wait() has other callers, which might otherwise cause
rcu_irq_exit() to be called without a matching rcu_irq_enter() before
it -- illegal if the cpu happens to be in rcu idle state.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

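Condensed, the flow inside kvm_async_pf_task_wait() becomes roughly the
following (a sketch; the wait-node setup around n and the surrounding
loop are elided):

    rcu_irq_enter();        /* may be called from the idle task */

    if (!n.halted) {
            schedule();     /* normal task: just sleep */
    } else {
            /* Re-enter the cpu idle eqs across the halt. */
            rcu_irq_exit();
            native_safe_halt();
            local_irq_disable();
            rcu_irq_enter();
    }

    rcu_irq_exit();
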
- 28 Nov, 2012 2 commits
Gleb Natapov authored
As Frederic pointed out, idle_cpu() may return false even if the async
fault happened in the idle task, if a wake up is pending. In this case
the code will try to put the idle task to sleep. Fix this by using
is_idle_task() to check for the idle task.

Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Marcelo Tosatti authored
Hook into the generic pvclock vsyscall code, with the aim of allowing
userspace to have visibility into pvclock data.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

- 22 Oct, 2012 1 commit
Sasha Levin authored
KVM_PV_REASON_PAGE_NOT_PRESENT kicks the cpu out of idleness, but we
haven't marked that spot as an exit from idleness. Not doing so can
cause RCU warnings such as:

    [ 732.788386] ===============================
    [ 732.789803] [ INFO: suspicious RCU usage. ]
    [ 732.790032] 3.7.0-rc1-next-20121019-sasha-00002-g6d8d02d-dirty #63 Tainted: G        W
    [ 732.790032] -------------------------------
    [ 732.790032] include/linux/rcupdate.h:738 rcu_read_lock() used illegally while idle!
    [ 732.790032]
    [ 732.790032] other info that might help us debug this:
    [ 732.790032]
    [ 732.790032] RCU used illegally from idle CPU!
    [ 732.790032] rcu_scheduler_active = 1, debug_locks = 1
    [ 732.790032] RCU used illegally from extended quiescent state!
    [ 732.790032] 2 locks held by trinity-child31/8252:
    [ 732.790032]  #0: (&rq->lock){-.-.-.}, at: [<ffffffff83a67528>] __schedule+0x178/0x8f0
    [ 732.790032]  #1: (rcu_read_lock){.+.+..}, at: [<ffffffff81152bde>] cpuacct_charge+0xe/0x200
    [ 732.790032]
    [ 732.790032] stack backtrace:
    [ 732.790032] Pid: 8252, comm: trinity-child31 Tainted: G        W    3.7.0-rc1-next-20121019-sasha-00002-g6d8d02d-dirty #63
    [ 732.790032] Call Trace:
    [ 732.790032]  [<ffffffff8118266b>] lockdep_rcu_suspicious+0x10b/0x120
    [ 732.790032]  [<ffffffff81152c60>] cpuacct_charge+0x90/0x200
    [ 732.790032]  [<ffffffff81152bde>] ? cpuacct_charge+0xe/0x200
    [ 732.790032]  [<ffffffff81158093>] update_curr+0x1a3/0x270
    [ 732.790032]  [<ffffffff81158a6a>] dequeue_entity+0x2a/0x210
    [ 732.790032]  [<ffffffff81158ea5>] dequeue_task_fair+0x45/0x130
    [ 732.790032]  [<ffffffff8114ae29>] dequeue_task+0x89/0xa0
    [ 732.790032]  [<ffffffff8114bb9e>] deactivate_task+0x1e/0x20
    [ 732.790032]  [<ffffffff83a67c29>] __schedule+0x879/0x8f0
    [ 732.790032]  [<ffffffff8117e20d>] ? trace_hardirqs_off+0xd/0x10
    [ 732.790032]  [<ffffffff810a37a5>] ? kvm_async_pf_task_wait+0x1d5/0x2b0
    [ 732.790032]  [<ffffffff83a67cf5>] schedule+0x55/0x60
    [ 732.790032]  [<ffffffff810a37c4>] kvm_async_pf_task_wait+0x1f4/0x2b0
    [ 732.790032]  [<ffffffff81139e50>] ? abort_exclusive_wait+0xb0/0xb0
    [ 732.790032]  [<ffffffff81139c25>] ? prepare_to_wait+0x25/0x90
    [ 732.790032]  [<ffffffff810a3a66>] do_async_page_fault+0x56/0xa0
    [ 732.790032]  [<ffffffff83a6a6e8>] async_page_fault+0x28/0x30

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

- 23 Aug, 2012 1 commit
Marcelo Tosatti authored
The distinction between CONFIG_KVM_CLOCK and CONFIG_KVM_GUEST is not so
clear anymore, as demonstrated by recent bugs caused by poor handling
of on/off combinations of these options. Merge CONFIG_KVM_CLOCK into
CONFIG_KVM_GUEST.

Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

- 15 Aug, 2012 1 commit
Florian Westphal authored
Otherwise the host continues to update steal time after reboot, which
can corrupt e.g. the initramfs area. Found when tracking down an
initramfs unpack error on initial reboot (with qemu-kvm -smp 2; no
problem with single-core).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

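A hedged sketch of the guest-side disable this implies: writing 0 to
the steal-time MSR tells the host to stop updating the record for this
vcpu:

    static void kvm_disable_steal_time(void)
    {
            if (!has_steal_clock)
                    return;

            /* Host stops writing the steal-time record for this vcpu. */
            wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
    }
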
- 16 Jul, 2012 1 commit
Michael S. Tsirkin authored
Use apic_set_eoi_write() and apic_write() to avoid meddling in core
apic driver data structures directly.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

- 11 Jul, 2012 1 commit
Prarit Bhargava authored
While debugging, I noticed that unlike all the other hypervisor code in
the kernel, kvm does not have an entry for x86_hyper, which is used in
detect_hypervisor_platform() and results in a nice printk in the
syslog. This is only really a stub function, but it does make kvm more
consistent with the other hypervisors.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Marcelo Tostatti <mtosatti@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>

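The stub amounts to a detect callback plus a registered descriptor,
roughly as follows (a sketch matching the pattern of the other
x86_hyper_* entries of that era, when .detect still returned bool):

    static bool __init kvm_detect(void)
    {
            if (!kvm_para_available())
                    return false;
            return true;
    }

    const struct hypervisor_x86 x86_hyper_kvm __refconst = {
            .name   = "KVM",
            .detect = kvm_detect,
    };
    EXPORT_SYMBOL_GPL(x86_hyper_kvm);
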
- 25 Jun, 2012 1 commit
Michael S. Tsirkin authored
The idea is simple: there is a bit, per APIC, in guest memory, that
tells the guest that it does not need EOI. The guest tests it using a
single test-and-clear operation -- this is necessary so that the host
can detect interrupt nesting -- and if set, it can skip the EOI MSR.

I ran a simple microbenchmark to show the exit reduction (note: for
testing, you need to apply the follow-up patch 'kvm: host side for eoi
optimization' + a qemu patch I posted separately, on the host):

Before:

    Performance counter stats for 'sleep 1s':

        47,357 kvm:kvm_entry                    [99.98%]
             0 kvm:kvm_hypercall                [99.98%]
             0 kvm:kvm_hv_hypercall             [99.98%]
         5,001 kvm:kvm_pio                      [99.98%]
             0 kvm:kvm_cpuid                    [99.98%]
        22,124 kvm:kvm_apic                     [99.98%]
        49,849 kvm:kvm_exit                     [99.98%]
        21,115 kvm:kvm_inj_virq                 [99.98%]
             0 kvm:kvm_inj_exception            [99.98%]
             0 kvm:kvm_page_fault               [99.98%]
        22,937 kvm:kvm_msr                      [99.98%]
             0 kvm:kvm_cr                       [99.98%]
             0 kvm:kvm_pic_set_irq              [99.98%]
             0 kvm:kvm_apic_ipi                 [99.98%]
        22,207 kvm:kvm_apic_accept_irq          [99.98%]
        22,421 kvm:kvm_eoi                      [99.98%]
             0 kvm:kvm_pv_eoi                   [99.99%]
             0 kvm:kvm_nested_vmrun             [99.99%]
             0 kvm:kvm_nested_intercepts        [99.99%]
             0 kvm:kvm_nested_vmexit            [99.99%]
             0 kvm:kvm_nested_vmexit_inject     [99.99%]
             0 kvm:kvm_nested_intr_vmexit       [99.99%]
             0 kvm:kvm_invlpga                  [99.99%]
             0 kvm:kvm_skinit                   [99.99%]
            57 kvm:kvm_emulate_insn             [99.99%]
             0 kvm:vcpu_match_mmio              [99.99%]
             0 kvm:kvm_userspace_exit           [99.99%]
             2 kvm:kvm_set_irq                  [99.99%]
             2 kvm:kvm_ioapic_set_irq           [99.99%]
        23,609 kvm:kvm_msi_set_irq              [99.99%]
             1 kvm:kvm_ack_irq                  [99.99%]
           131 kvm:kvm_mmio                     [99.99%]
           226 kvm:kvm_fpu                      [100.00%]
             0 kvm:kvm_age_page                 [100.00%]
             0 kvm:kvm_try_async_get_page       [100.00%]
             0 kvm:kvm_async_pf_doublefault     [100.00%]
             0 kvm:kvm_async_pf_not_present     [100.00%]
             0 kvm:kvm_async_pf_ready           [100.00%]
             0 kvm:kvm_async_pf_completed

    1.002100578 seconds time elapsed

After:

    Performance counter stats for 'sleep 1s':

        28,354 kvm:kvm_entry                    [99.98%]
             0 kvm:kvm_hypercall                [99.98%]
             0 kvm:kvm_hv_hypercall             [99.98%]
         1,347 kvm:kvm_pio                      [99.98%]
             0 kvm:kvm_cpuid                    [99.98%]
         1,931 kvm:kvm_apic                     [99.98%]
        29,595 kvm:kvm_exit                     [99.98%]
        24,884 kvm:kvm_inj_virq                 [99.98%]
             0 kvm:kvm_inj_exception            [99.98%]
             0 kvm:kvm_page_fault               [99.98%]
         1,986 kvm:kvm_msr                      [99.98%]
             0 kvm:kvm_cr                       [99.98%]
             0 kvm:kvm_pic_set_irq              [99.98%]
             0 kvm:kvm_apic_ipi                 [99.99%]
        25,953 kvm:kvm_apic_accept_irq          [99.99%]
        26,132 kvm:kvm_eoi                      [99.99%]
        26,593 kvm:kvm_pv_eoi                   [99.99%]
             0 kvm:kvm_nested_vmrun             [99.99%]
             0 kvm:kvm_nested_intercepts        [99.99%]
             0 kvm:kvm_nested_vmexit            [99.99%]
             0 kvm:kvm_nested_vmexit_inject     [99.99%]
             0 kvm:kvm_nested_intr_vmexit       [99.99%]
             0 kvm:kvm_invlpga                  [99.99%]
             0 kvm:kvm_skinit                   [99.99%]
           284 kvm:kvm_emulate_insn             [99.99%]
            68 kvm:vcpu_match_mmio              [99.99%]
            68 kvm:kvm_userspace_exit           [99.99%]
             2 kvm:kvm_set_irq                  [99.99%]
             2 kvm:kvm_ioapic_set_irq           [99.99%]
        28,288 kvm:kvm_msi_set_irq              [99.99%]
             1 kvm:kvm_ack_irq                  [99.99%]
           131 kvm:kvm_mmio                     [100.00%]
           588 kvm:kvm_fpu                      [100.00%]
             0 kvm:kvm_age_page                 [100.00%]
             0 kvm:kvm_try_async_get_page       [100.00%]
             0 kvm:kvm_async_pf_doublefault     [100.00%]
             0 kvm:kvm_async_pf_not_present     [100.00%]
             0 kvm:kvm_async_pf_ready           [100.00%]
             0 kvm:kvm_async_pf_completed

    1.002039622 seconds time elapsed

We see that the number of exits is almost halved.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

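The guest-side fast path described above boils down to a few lines (a
sketch close to the era's kvm_guest_apic_eoi_write(); kvm_apic_eoi is
the per-cpu word shared with the host):

    static void kvm_guest_apic_eoi_write(u32 reg, u32 val)
    {
            /* A single test-and-clear: if the host had set the bit,
             * the EOI MSR write -- and hence the exit -- is skipped;
             * clearing it is what lets the host detect nesting. */
            if (__test_and_clear_bit(KVM_PV_EOI_BIT,
                                     this_cpu_ptr(&kvm_apic_eoi)))
                    return;
            apic_write(APIC_EOI, APIC_EOI_ACK);
    }
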
- 06 May, 2012 1 commit
Gleb Natapov authored
It turned out to be totally unneeded. The reason the code was
introduced is so that KVM can prefault the swapped-in page, but
prefault can fail even if the mm is pinned, since the page table can
change anyway. KVM handles this situation correctly though, and does
not inject spurious page faults.

Fixes: "INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected"
warning while running LTP inside a KVM guest using the recent -next
kernel.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

- 05 Apr, 2012 1 commit
Gleb Natapov authored
"Page ready" async PF can kick vcpu out of idle state much like IRQ. We need to tell RCU about this. Reported-by:
Sasha Levin <levinsasha928@gmail.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com> Reviewed-by:
Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
- 24 Feb, 2012 1 commit
Ingo Molnar authored
static keys: Introduce 'struct static_key', static_key_true()/false()
and static_key_slow_[inc|dec]()

So here's a boot-tested patch on top of Jason's series that does all
the cleanups I talked about and turns jump labels into a more intuitive
to use facility. It should also address the various misconceptions and
confusions that surround jump labels.

Typical usage scenarios:

    #include <linux/static_key.h>

    struct static_key key = STATIC_KEY_INIT_TRUE;

    if (static_key_false(&key))
            do unlikely code
    else
            do likely code

Or:

    if (static_key_true(&key))
            do likely code
    else
            do unlikely code

The static key is modified via:

    static_key_slow_inc(&key);
    ...
    static_key_slow_dec(&key);

The 'slow' prefix makes it abundantly clear that this is an expensive
operation.

I've updated all in-kernel code to use this everywhere. Note that I
(intentionally) have not pushed through the renam...

- 27 Dec, 2011 1 commit
Chris Wright authored
This has not been used for some years now. It's time to remove it.

Signed-off-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

- 24 Jul, 2011 1 commit
Glauber Costa authored
This patch implements the kvm bits of the steal time infrastructure.
The most important part of it is the steal time clock. It is a
continuous clock that shows the accumulated amount of steal time since
vcpu creation. It is supposed to survive cpu offlining/onlining.

[marcelo: fix build with CONFIG_KVM_GUEST=n]

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Avi Kivity <avi@redhat.com>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

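The guest-side read of that clock looks roughly like this (a sketch
against the era's kvm.c; version is a seqcount-style marker that is odd
while the host is mid-update):

    static u64 kvm_steal_clock(int cpu)
    {
            u64 steal;
            struct kvm_steal_time *src;
            int version;

            src = &per_cpu(steal_time, cpu);
            do {
                    version = src->version;
                    rmb();
                    steal = src->steal;
                    rmb();
            } while ((version & 1) || (version != src->version));

            return steal;
    }
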
- 17 Mar, 2011 1 commit
Sedat Dilek authored
WARNING: arch/x86/built-in.o(.text+0x1bb74): Section mismatch in
reference from the function kvm_guest_cpu_online() to the function
.cpuinit.text:kvm_guest_cpu_init()

The function kvm_guest_cpu_online() references the function __cpuinit
kvm_guest_cpu_init(). This is often because kvm_guest_cpu_online lacks
a __cpuinit annotation or the annotation of kvm_guest_cpu_init is
wrong.

This patch fixes the warning. Tested with linux-next (next-20101231).

Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

- 12 Jan, 2011 2 commits
Avi Kivity authored
Signed-off-by: Avi Kivity <avi@redhat.com>

Gleb Natapov authored
If the guest can detect that it runs in non-preemptible context, it can
handle async PFs at any time, so let the host know that it may send
async PFs even when the guest cpu is not in userspace.

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

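A hedged sketch of the enable side this implies: a CONFIG_PREEMPT guest
sets an extra flag when arming the async-PF MSR (note that the __pa()
here is exactly what the Jan 2013 commit above later replaces):

    static void kvm_guest_cpu_init(void)
    {
            if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF)) {
                    u64 pa = __pa(&__get_cpu_var(apf_reason));

    #ifdef CONFIG_PREEMPT
                    /* A preemptible kernel can take async PFs anywhere,
                     * not only while userspace is running. */
                    pa |= KVM_ASYNC_PF_SEND_ALWAYS;
    #endif
                    wrmsrl(MSR_KVM_ASYNC_PF_EN, pa | KVM_ASYNC_PF_ENABLED);
            }
    }
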