- 13 Jul, 2008 1 commit
-
-
Dmitry Adamushko authored
Commit f18f982a ("sched: CPU hotplug events must not destroy scheduler domains created by the cpusets") introduced a hotplug-related problem as described below: Upon CPU_DOWN_PREPARE, update_sched_domains() -> detach_destroy_domains(&cpu_online_map) does the following: /* * Force a reinitialization of the sched domains hierarchy. The domains * and groups cannot be updated in place without racing with the balancing * code, so we temporarily attach all running cpus to the NULL domain * which will prevent rebalancing while the sched domains are recalculated. */ The sched-domains should be rebuilt when a CPU_DOWN ops. has been completed, effectively either upon CPU_DEAD{_FROZEN} (upon success) or CPU_DOWN_FAILED{_FROZEN} (upon failure -- restore the things to their initial state). That's what update_sched_domains() also does but only for !CPUSETS case. With f18f982a, sched-domains' reinitialization is delegated to CPUSETS code: cpuset_handle_cpuhp() -> common_cpu_mem_hotplug_unplug() -> rebuild_sched_domains() Being called for CPU_UP_PREPARE and if its callback is called after update_sched_domains()), it just negates all the work done by update_sched_domains() -- i.e. a soon-to-be-offline cpu is included in the sched-domains and that makes it visible for the load-balancer while the CPU_DOWN ops. is in progress. __migrate_live_tasks() moves the tasks off a 'dead' cpu (it's already "offline" when this function is called). try_to_wake_up() is called for one of these tasks from another CPU -> the load-balancer (wake_idle()) picks up a "dead" CPU and places the task on it. Then e.g. BUG_ON(rq->nr_running) detects this a bit later -> oops. Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Tested-by: Vegard Nossum <vegard.nossum@gmail.com> Cc: Paul Menage <menage@google.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: miaox@cn.fujitsu.com Cc: rostedt@goodmis.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 10 Jul, 2008 2 commits
-
-
Linus Torvalds authored
Clean up __migrate_task(): to just have separate "done" and "fail" cases, instead of that "out" case with random error behavior. Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Dmitry Adamushko authored
I think we may have a race between try_to_wake_up() and migrate_live_tasks() -> move_task_off_dead_cpu() when the later one may end up looping endlessly. Interrupts are enabled on other CPUs when migration_call(CPU_DEAD, ...) is called so we may get a race between try_to_wake_up() and migrate_live_tasks() -> move_task_off_dead_cpu(). The former one may push a task out of a dead CPU causing the later one to loop endlessly. Heiko Carstens observed: | That's exactly what explains a dump I got yesterday. Thanks for fixing! :) Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Cc: miaox@cn.fujitsu.com Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Avi Kivity <avi@qumranet.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 01 Jul, 2008 1 commit
-
-
Raistlin authored
Here it is another little Oops we found while configuring invalid values via cgroups: echo 0 > /dev/cgroups/0/cpu.rt_period_us or echo 4294967296 > /dev/cgroups/0/cpu.rt_period_us [ 205.509825] divide error: 0000 [#1] [ 205.510151] Modules linked in: [ 205.510151] [ 205.510151] Pid: 2339, comm: bash Not tainted (2.6.26-rc8 #33) [ 205.510151] EIP: 0060:[<c030c6ef>] EFLAGS: 00000293 CPU: 0 [ 205.510151] EIP is at div64_u64+0x5f/0x70 [ 205.510151] EAX: 0000389f EBX: 00000000 ECX: 00000000 EDX: 00000000 [ 205.510151] ESI: d9800000 EDI: 00000000 EBP: c6cede60 ESP: c6cede50 [ 205.510151] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 [ 205.510151] Process bash (pid: 2339, ti=c6cec000 task=c79be370 task.ti=c6cec000) [ 205.510151] Stack: d9800000 0000389f c05971a0 d9800000 c6cedeb4 c0214dbd 00000000 00000000 [ 205.510151] c6cede88 c0242bd8 c05377c0 c7a41b40 00000000 00000000 00000000 c05971a0 [ 205.510151] c780ed20 c7508494 c7a41b40 00000000 00000002 c6cedebc c05971a0 ffffffea [ 205.510151] Call Trace: [ 205.510151] [<c0214dbd>] ? __rt_schedulable+0x1cd/0x240 [ 205.510151] [<c0242bd8>] ? cgroup_file_open+0x18/0xe0 [ 205.510151] [<c0214fe4>] ? tg_set_bandwidth+0xa4/0xf0 [ 205.510151] [<c0215066>] ? sched_group_set_rt_period+0x36/0x50 [ 205.510151] [<c021508e>] ? cpu_rt_period_write_uint+0xe/0x10 [ 205.510151] [<c0242dc5>] ? cgroup_file_write+0x125/0x160 [ 205.510151] [<c0232c15>] ? hrtimer_interrupt+0x155/0x190 [ 205.510151] [<c02f047f>] ? security_file_permission+0xf/0x20 [ 205.510151] [<c0277ad8>] ? rw_verify_area+0x48/0xc0 [ 205.510151] [<c0283744>] ? dupfd+0x104/0x130 [ 205.510151] [<c027838c>] ? vfs_write+0x9c/0x160 [ 205.510151] [<c0242ca0>] ? cgroup_file_write+0x0/0x160 [ 205.510151] [<c027850d>] ? sys_write+0x3d/0x70 [ 205.510151] [<c0203019>] ? sysenter_past_esp+0x6a/0x91 [ 205.510151] ======================= [ 205.510151] Code: 0f 45 de 31 f6 0f ad d0 d3 ea f6 c1 20 0f 45 c2 0f 45 d6 89 45 f0 89 55 f4 8b 55 f4 31 c9 8b 45 f0 39 d3 89 c6 77 08 89 d0 31 d2 <f7> f3 89 c1 83 c4 08 89 f0 f7 f3 89 ca 5b 5e 5d c3 55 89 e5 56 [ 205.510151] EIP: [<c030c6ef>] div64_u64+0x5f/0x70 SS:ESP 0068:c6cede50 The attached patch solves the issue for me. I'm checking as soon as possible for the period not being zero since, if it is, going ahead is useless. This way we also save a mutex_lock() and a read_lock() wrt doing it inside tg_set_bandwidth() or __rt_schedulable(). Signed-off-by: Dario Faggioli <raistlin@linux.it> Signed-off-by: Michael Trimarchi <trimarchimichael@yahoo.it> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 29 Jun, 2008 1 commit
-
-
Dmitry Adamushko authored
the CPU hotplug problems (crashes under high-volume unplug+replug tests) seem to be related to migrate_dead_tasks(). Firstly I added traces to see all tasks being migrated with migrate_live_tasks() and migrate_dead_tasks(). On my setup the problem pops up (the one with "se == NULL" in the loop of pick_next_task_fair()) shortly after the traces indicate that some has been migrated with migrate_dead_tasks()). btw., I can reproduce it much faster now with just a plain cpu down/up loop. [disclaimer] Well, unless I'm really missing something important in this late hour [/desclaimer] pick_next_task() is not something appropriate for migrate_dead_tasks() :-) the following change seems to eliminate the problem on my setup (although, I kept it running only for a few minutes to get a few messages indicating migrate_dead_tasks() does move tasks and the system is still ok) Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 25 Jun, 2008 4 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6Linus Torvalds authored
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] Eliminate NULL test after alloc_bootmem in iosapic_alloc_rte() [IA64] Handle count==0 in sn2_ptc_proc_write() [IA64] Fix boot failure on ia64/sn2
-
git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixesLinus Torvalds authored
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: [GFS2] fix gfs2 block allocation (cleaned up) [GFS2] BUG: unable to handle kernel paging request at ffff81002690e000
-
git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvmLinus Torvalds authored
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: KVM: Remove now unused structs from kvm_para.h x86: KVM guest: Use the paravirt clocksource structs and functions KVM: Make kvm host use the paravirt clocksource structs x86: Make xen use the paravirt clocksource structs and functions x86: Add structs and functions for paravirt clocksource KVM: VMX: Fix host msr corruption with preemption enabled KVM: ioapic: fix lost interrupt when changing a device's irq KVM: MMU: Fix oops on guest userspace access to guest pagetable KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend) KVM: MMU: Fix rmap_write_protect() hugepage iteration bug KVM: close timer injection race window in __vcpu_run KVM: Fix race between timer migration and vcpu migration
-
- 24 Jun, 2008 26 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdogLinus Torvalds authored
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog: Revert "[WATCHDOG] hpwdt: Add CFLAGS to get driver working"
-
Linus Torvalds authored
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: xen: remove support for non-PAE 32-bit
-
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdbLinus Torvalds authored
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: kgdb: sparse fix kgdb: documentation update - remove kgdboe
-
Jie Luo authored
On 9xx chips, bus mastering needs to be enabled at resume time for much of the chip to function. With this patch, vblank interrupts will work as expected on resume, along with other chip functions. Fixes kernel bugzilla #10844. Signed-off-by: Jie Luo <clotho67@gmail.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Gerd Hoffmann authored
The kvm_* structs are obsoleted by the pvclock_* ones. Now all users have been switched over and the old structs can be dropped. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Gerd Hoffmann authored
This patch updates the kvm host code to use the pvclock structs and functions, thereby making it compatible with Xen. The patch also fixes an initialization bug: on SMP systems the per-cpu has two different locations early at boot and after CPU bringup. kvmclock must take that in account when registering the physical address within the host. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Gerd Hoffmann authored
This patch updates the kvm host code to use the pvclock structs. It also makes the paravirt clock compatible with Xen. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Gerd Hoffmann authored
This patch updates the xen guest to use the pvclock structs and helper functions. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Gerd Hoffmann authored
This patch adds structs for the paravirt clocksource ABI used by both xen and kvm (pvclock-abi.h). It also adds some helper functions to read system time and wall clock time from a paravirtual clocksource (pvclock.[ch]). They are based on the xen code. They are enabled using CONFIG_PARAVIRT_CLOCK. Subsequent patches of this series will put the code in use. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Benjamin Marzinski authored
This patch fixes bz 450641. This patch changes the computation for zero_metapath_length(), which it renames to metapath_branch_start(). When you are extending the metadata tree, The indirect blocks that point to the new data block must either diverge from the existing tree either at the inode, or at the first indirect block. They can diverge at the first indirect block because the inode has room for 483 pointers while the indirect blocks have room for 509 pointers, so when the tree is grown, there is some free space in the first indirect block. What metapath_branch_start() now computes is the height where the first indirect block for the new data block is located. It can either be 1 (if the indirect block diverges from the inode) or 2 (if it diverges from the first indirect block). Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Julia Lawall authored
As noted by Akinobu Mita alloc_bootmem and related functions never return NULL and always return a zeroed region of memory. Thus a NULL test or memset after calls to these functions is unnecessary. Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Tony Luck <tony.luck@intel.com>
-
Cliff Wickman authored
The fix applied in e0c6d97c "security hole in sn2_ptc_proc_write" didn't take into account the case where count==0 (which results in a buffer underrun when adding the trailing '\0'). Thanks to Andi Kleen for pointing this out. Signed-off-by: Cliff Wickman <cpw@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
-
Jes Sorensen authored
Call check_sal_cache_flush() after platform_setup() as check_sal_cache_flush() now relies on being able to call platform vector code. Problem was introduced by: 3463a93d "Update check_sal_cache_flush to use platform_send_ipi()" Signed-off-by: Jes Sorensen <jes@sgi.com> Tested-by: Alex Chiang: <achiang@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
-
Jason Wessel authored
- Fix warning reported by sparse kernel/kgdb.c:1502:6: warning: symbol 'kgdb_console_write' was not declared. Should it be static? Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
-
Jason Wessel authored
kgdboe is not presently included kgdb, and there should be no references to it. Also fix the tcp port terminal connection example. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
-
Jeremy Fitzhardinge authored
Non-PAE operation has been deprecated in Xen for a while, and is rarely tested or used. xen-unstable has now officially dropped non-PAE support. Since Xen/pvops' non-PAE support has also been broken for a while, we may as well completely drop it altogether. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Bob Peterson authored
This patch fixes bugzilla bug bz448866: gfs2: BUG: unable to handle kernel paging request at ffff81002690e000. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Wim Van Sebroeck authored
After Linus fixed the inline assembly, the CFLAGS option is not needed anymore. Signed-off-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
-
Avi Kivity authored
Switching msrs can occur either synchronously as a result of calls to the msr management functions (usually in response to the guest touching virtualized msrs), or asynchronously when preempting a kvm thread that has guest state loaded. If we're unlucky enough to have the two at the same time, host msrs are corrupted and the machine goes kaput on the next syscall. Most easily triggered by Windows Server 2008, as it does a lot of msr switching during bootup. Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Avi Kivity authored
The ioapic acknowledge path translates interrupt vectors to irqs. It currently uses a first match algorithm, stopping when it finds the first redirection table entry containing the vector. That fails however if the guest changes the irq to a different line, leaving the old redirection table entry in place (though masked). Result is interrupts not making it to the guest. Fix by always scanning the entire redirection table. Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Avi Kivity authored
KVM has a heuristic to unshadow guest pagetables when userspace accesses them, on the assumption that most guests do not allow userspace to access pagetables directly. Unfortunately, in addition to unshadowing the pagetables, it also oopses. This never triggers on ordinary guests since sane OSes will clear the pagetables before assigning them to userspace, which will trigger the flood heuristic, unshadowing the pagetables before the first userspace access. One particular guest, though (Xenner) will run the kernel in userspace, triggering the oops. Since the heuristic is incorrect in this case, we can simply remove it. Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Marcelo Tosatti authored
kvm_mmu_pte_write() does not handle 32-bit non-PAE large page backed guests properly. It will instantiate two 2MB sptes pointing to the same physical 2MB page when a guest large pte update is trapped. Instead of duplicating code to handle this, disallow directory level updates to happen through kvm_mmu_pte_write(), so the two 2MB sptes emulating one guest 4MB pte can be correctly created by the page fault handling path. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Marcelo Tosatti authored
rmap_next() does not work correctly after rmap_remove(), as it expects the rmap chains not to change during iteration. Fix (for now) by restarting iteration from the beginning. Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Marcelo Tosatti authored
If a timer fires after kvm_inject_pending_timer_irqs() but before local_irq_disable() the code will enter guest mode and only inject such timer interrupt the next time an unrelated event causes an exit. It would be simpler if the timer->pending irq conversion could be done with IRQ's disabled, so that the above problem cannot happen. For now introduce a new vcpu requests bit to cancel guest entry. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Marcelo Tosatti authored
A guest vcpu instance can be scheduled to a different physical CPU between the test for KVM_REQ_MIGRATE_TIMER and local_irq_disable(). If that happens, the timer will only be migrated to the current pCPU on the next exit, meaning that guest LAPIC timer event can be delayed until a host interrupt is triggered. Fix it by cancelling guest entry if any vcpu request is pending. This has the side effect of nicely consolidating vcpu->requests checks. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
-
Thorsten Kranzkowski authored
Commit 9267b4b3 ("alpha: fix module load failures on smp (bug #10926)") causes a regression for my ev4 uniprocessor build: CC arch/alpha/mm/init.o /export/data/repositories/linux-2.6/arch/alpha/mm/init.c:34: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘typeof’ make[2]: *** [arch/alpha/mm/init.o] Error 1 make[1]: *** [arch/alpha/mm] Error 2 make: *** [sub-make] Error 2 This fixes it for me (compile and boot tested): Signed-off-by: Thorsten Kranzkowski <dl8bcu@dl8bcu.de> Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 23 Jun, 2008 5 commits
-
-
git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds authored
* 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: nfs_updatepage(): don't mark page as dirty if an error occurred NFS: Fix filehandle size comparisons in the mount code NFS: Reduce the NFS mount code stack usage.
-
Trond Myklebust authored
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
Fix a sign issue in xdr_decode_fhstatus3() Fix incorrect comparison in nfs_validate_mount_data() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
This appears to fix the Oops reported in http://bugzilla.kernel.org/show_bug.cgi?id=10826Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
Linus Torvalds authored
Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futexes: fix fault handling in futex_lock_pi
-