- 08 Jan, 2005 40 commits
-
-
Ingo Molnar authored
The attached patch fixes long latencies in unmap_vmas(). We had lockbreak code in that function already but it did not take delayed effects of TLB-gather into account. Has been tested as part of the -VP patchset. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
The attached patch fixes long scheduling latencies caused by backlog triggered by __release_sock(). That code only executes in process context, and we've made the backlog queue private already at this point so it is safe to do a cond_resched_softirq(). Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
The attached patch fixes long scheduling latencies caused by access to the /proc/net/tcp file. The seqfile functions keep softirqs disabled for a very long time (i've seen reports of 20+ msecs, if there are enough sockets in the system). With the attached patch it's below 100 usecs. The cond_resched_softirq() relies on the implicit knowledge that this code executes in process context and runs with softirqs disabled. Potentially enabling softirqs means that the socket list might change between buckets - but this is not an issue since seqfiles have a 4K iteration granularity anyway and /proc/net/tcp is often (much) larger than that. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
The attached patch fixes long scheduling latencies in select_parent() and prune_dcache(). The prune_dcache() lock-break is easy, but for select_parent() the only viable solution i found was to break out if there's a resched necessary - the reordering is not necessary and the dcache scanning/shrinking will later on do it anyway. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
break latency in invalidate_list(). Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
The attached patch fixes long scheduling latencies in the ext3 code, and it also cleans up the existing lock-break functionality to use the new primitives. This patch has been in the -VP patchset for quite some time. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
It adds cond_resched_softirq() which can be used by _process context_ softirqs-disabled codepaths to preempt if necessary. The function will enable softirqs before scheduling. (Later patches will use this primitive.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
Add lock_need_resched() which is to check for the necessity of lock-break in a critical section. Used by later latency-break patches. tested on x86, should work on all architectures. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
This is another generic fallout from the voluntary-preempt patchset: a cleanup of the cond_resched() infrastructure, in preparation of the latency reduction patches. The changes: - uninline cond_resched() - this makes the footprint smaller, especially once the number of cond_resched() points increase. - add a 'was rescheduled' return value to cond_resched. This makes it symmetric to cond_resched_lock() and later latency reduction patches rely on the ability to tell whether there was any preemption. - make cond_resched() more robust by using the same mechanism as preempt_kernel(): by using PREEMPT_ACTIVE. This preserves the task's state - e.g. if the task is in TASK_ZOMBIE but gets preempted via cond_resched() just prior scheduling off then this approach preserves TASK_ZOMBIE. - the patch also adds need_lockbreak() which critical sections can use to detect lock-break requests. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
SMP locking latencies are one of the last architectural problems that cause millisec-category scheduling delays. CONFIG_PREEMPT tries to solve some of the SMP issues but there are still lots of problems remaining: spinlocks nested at multiple levels, spinning with irqs turned off, and non-nested spinning with preemption turned off permanently. The nesting problem goes like this: if a piece of kernel code (e.g. the MM or ext3's journalling code) does the following: spin_lock(&spinlock_1); ... spin_lock(&spinlock_2); ... then even with CONFIG_PREEMPT enabled, current kernels may spin on spinlock_2 indefinitely. A number of critical sections break their long paths by using cond_resched_lock(), but this does not break the path on SMP, because need_resched() *of the other CPU* is not set so cond_resched_lock() doesnt notice that a reschedule is due. to solve this problem i've introduced a new spinlock field, lock->break_lock, which signals towards the holding CPU that a spinlock-break is requested by another CPU. This field is only set if a CPU is spinning in a spinlock function [at any locking depth], so the default overhead is zero. I've extended cond_resched_lock() to check for this flag - in this case we can also save a reschedule. I've added the lock_need_resched(lock) and need_lockbreak(lock) methods to check for the need to break out of a critical section. Another latency problem was that the stock kernel, even with CONFIG_PREEMPT enabled, didnt have any spin-nicely preemption logic for the following, commonly used SMP locking primitives: read_lock(), spin_lock_irqsave(), spin_lock_irq(), spin_lock_bh(), read_lock_irqsave(), read_lock_irq(), read_lock_bh(), write_lock_irqsave(), write_lock_irq(), write_lock_bh(). Only spin_lock() and write_lock() [the two simplest cases] where covered. In addition to the preemption latency problems, the _irq() variants in the above list didnt do any IRQ-enabling while spinning - possibly resulting in excessive irqs-off sections of code! preempt-smp.patch fixes all these latency problems by spinning irq-nicely (if possible) and by requesting lock-breaks if needed. Two architecture-level changes were necessary for this: the addition of the break_lock field to spinlock_t and rwlock_t, and the addition of the _raw_read_trylock() function. Testing done by Mark H Johnson and myself indicate SMP latencies comparable to the UP kernel - while they were basically indefinitely high without this patch. i successfully test-compiled and test-booted this patch ontop of BK-curr using the following .config combinations: SMP && PREEMPT, !SMP && PREEMPT, SMP && !PREEMPT and !SMP && !PREEMPT on x86, !SMP && !PREEMPT and SMP && PREEMPT on x64. I also test-booted x86 with the generic_read_trylock function to check that it works fine. Essentially the same patch has been in testing as part of the voluntary-preempt patches for some time already. NOTE to architecture maintainers: generic_raw_read_trylock() is a crude version that should be replaced with the proper arch-optimized version ASAP. From: Hugh Dickins <hugh@veritas.com> The i386 and x86_64 _raw_read_trylocks in preempt-smp.patch are too successful: atomic_read() returns a signed integer. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Nathan Lynch authored
Call idle_task_exit from cpu_die to avoid mm_struct leak. Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Nathan Lynch authored
Heiko Carstens figured out that offlining a cpu can leak mm_structs because the dying cpu's idle task fails to switch to init_mm and mmdrop its active_mm before the cpu is down. This patch introduces idle_task_exit, which allows the idle task to do this as Ingo suggested. I will follow this up with a patch for ppc64 which calls idle_task_exit from cpu_die. Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Josh Aas authored
This patch removes two outdated/misleading comments from the CPU scheduler. 1) The first comment removed is simply incorrect. The function it comments on is not used for what the comments says it is anymore. 2) The second comment is a leftover from when the "if" block it comments on contained a goto. It does not any more, and the comment doesn't make sense. There isn't really a reason to add different comments, though someone might feel differently in the case of the second one. I'll leave adding a comment to anybody who wants to - more important to just get rid of them now. Signed-off-by: Josh Aas <josha@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Dean Nelson authored
This patch exports sched_setscheduler() so that it can be used by a kernel module to set a kthread's scheduling policy and associated parameters. Signed-off-by: Dean Nelson <dcn@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Robert Love authored
no need to call task_rq in setscheduler; just use rq Signed-Off-By: Robert Love <rml@novell.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Oleg Nesterov authored
Replace open-coded thread_group_leader() calls. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Oleg Nesterov authored
schedule() can use prev instead of get_current(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Con Kolivas authored
Special casing tasks by interactive credit was helpful for preventing fully cpu bound tasks from easily rising to interactive status. However it did not select out tasks that had periods of being fully cpu bound and then sleeping while waiting on pipes, signals etc. This led to a more disproportionate share of cpu time. Backing this out will no longer special case only fully cpu bound tasks, and prevents the variable behaviour that occurs at startup before tasks declare themseleves interactive or not, and speeds up application startup slightly under certain circumstances. It does cost in interactivity slightly as load rises but it is worth it for the fairness gains. Signed-off-by: Con Kolivas <kernel@kolivas.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Con Kolivas authored
Change the granularity code to requeue tasks at their best priority instead of changing priority while they're running. This keeps tasks at their top interactive level during their whole timeslice. Signed-off-by: Con Kolivas <kernel@kolivas.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Con Kolivas authored
We can requeue tasks for cheaper then doing a complete dequeue followed by an enqueue. Add the requeue_task function and perform it where possible. This will be hit frequently by upcoming changes to the requeueing in timeslice granularity. Signed-off-by: Con Kolivas <kernel@kolivas.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Con Kolivas authored
The minimum timeslice was decreased from 10ms to 5ms. In the process, the timeslice granularity was leading to much more rapid round robinning of interactive tasks at cache trashing levels. Restore minimum granularity to 10ms. Signed-off-by: Con Kolivas <kernel@kolivas.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Con Kolivas authored
Timeslice proportion has been increased substantially for -niced tasks. As a result of this kernel threads have much larger timeslices than they previously had. Change kernel threads' nice value to -5 to bring their timeslice back in line with previous behaviour. This means kernel threads will be less likely to cause large latencies under periods of system stress for normal nice 0 tasks. Signed-off-by: Con Kolivas <kernel@kolivas.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Con Kolivas authored
Convert whitespace in sched.c to tabs Signed-off-by: Con Kolivas <kernel@kolivas.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Anton Blanchard authored
Reset cache_hot_time to sane values (in the ms range). Some recent changes resulted in values in the us range. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Matthew Dobson authored
There is a small problem with the active_load_balance() patch that Darren sent out last week. As soon as we discover a potential 'target_cpu' from 'cpu_group' to try to push tasks to, we cease considering other CPUs in that group as potential 'target_cpu's. We break out of the for_each_cpu_mask() loop and try to push tasks to that CPU. The problem is that there may well be other idle cpus in that group that we should also try to push tasks to. Here is a patch to fix that small problem. The solution is to simply move the code that tries to push the tasks into the for_each_cpu_mask() loop and do away with the whole 'target_cpu' thing entirely. Compiled & booted on a 16-way x440. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Theurer authored
Allow idle_balance to search an incresingly larger span of cpus to find a cpu. Minor change, NODE_SD_INIT gets SD_BALANCE_NEWIDLE flag. This is critical for x86_64, where there is only one cpu oer node. In the current code, idle_balance for Opteron -never- works. Signed-off-by: <habanero@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Theurer authored
Fix can_migrate to allow aggressive steal for idle cpus. This -was- in mainline, but I believe sched_domains kind of blasted it outta there. IMO, it's a no brainer for an idle cpu (with all that cache going to waste) to be granted to steal a task. The one enhancement I have made was to make sure the whole cpu was idle. Signed-off-by: <habanero@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Theurer authored
This patch addresses some problems with wake_idle(). Currently wake_idle() will wake a task on an alternate cpu if: 1) task->cpu is not idle 2) an idle cpu can be found However the span of cpus to look for is very limited (only the task->cpu's sibling). The scheduler should find the closest idle cpu, starting with the lowest level domain, then going to higher level domains if allowed (doamin has flag SD_WAKE_IDLE). This patch does this. This and the other two patches (also to be submitted) combined have provided as much at 5% improvement on that "online transaction DB workload" and 2% on the industry standard J@EE workload. I asked Martin Bligh to test these for regression, and he did not find any. I would like to submit for inclusion to -mm and barring any problems eventually to mainline. Signed-off-by: <habanero@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Neil Brown authored
Avoid unlock-without-lock problem on error path in nfsd4_setclientid_confirm Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Greg Banks authored
With Keith Owens <kaos@sgi.com> This patch from Keith Owens fixes a bug in the ia64 port of oprofile when built without the kdb patch and with a pre-3.4 gcc. If you build a standard kernel with gcc < 3.4 then ia64_spinlock_contention_pre3_4 is defined. But a standard kernel does not have ia64_spinlock_contention_pre3_4_end, that label is only added by the kdb patch. To get the backtrace profiling with gcc < 3.4, the _end label needs to be added as part of the kernprof patch, then I will remove it from kdb. Signed-off-by: Keith Owens <kaos@sgi.com> Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Thayne Harbaugh authored
This patch makes several tweaks so that an initramfs image can be completely created by an unprivileged user. It should maintain compatibility with previous initramfs early userspace cpio/image creation and it updates documentation. There are a few very important tweaks: CONFIG_INITRAMFS_SOURCE is now either a single cpio archive that is directly used or a list of directories and files for building a cpio archive for the initramfs image. Making the cpio archive listable in CONFIG_INITRAMFS_SOURCE makes the cpio step more official and automated so that it doesn't have to be copied by hand to usr/initramfs_data.cpio (I think this was broken anyway and would be overwritten). The alternative list of directories *and* files means that files can be install in a "root" directory and device-special files can be listed in a file list. CONFIG_ROOT_UID and CONFIG_ROOT_GID are now available for doing simple user/group ID translation. That means that user ID 500, group ID 500 can create all the files in the "root" directory, but that they can all be owned by user ID 0, group ID 0 in the cpio image. Various documentation updates to pull it all together. Removal of old cruft that was unused/misleading. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Theodore Y. Ts'o authored
telldir() is broken on large ext3 dir_index'd directories because getdents() gives d_off==0 for the first entry Here's a patch which fixes the problem, but note the following warning from the readdir man page: According to POSIX, the dirent structure contains a field char d_name[] of unspecified size, with at most NAME_MAX characters preceding the terminating null character. Use of other fields will harm the porta- bility of your programs. Also, as always, telldir() and seekdir() are truly awful interfaces because they implicitly assume that (a) a directory is a linear data structure, and (b) that the position in a directory can be expressed in a cookie which hsa only 31 bits on 32-bit systems. So there will be hash colliions that will cause programs that assume that seekdir(dirent->d_off) will always return the next directory entry to sometimes lose directory entries in the not-as-unlikely-as-we-would wish case of a 31-bit hash collision. Really, any program which is using telldir/seekdir really should be rewritten to not use these interfaces if at all possible. So with these caveats.... What we need to do is wire '.' and '..' to have hash values of (0,0) and (2,0), respectively, without ignoring other existing dirents with colliding hashes. (In those cases the programs will break, but they are statistically rare, and there's not much we can do in those cases anyway.) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Milton D. Miller II authored
According to "initramfs buffer format -- third draft" http://lwn.net/2002/0117/a/initramfs-buffer-format.php3 "the cpio "TRAILER!!!" entry (cpio end-of-archive) is optional, but is not ignored" The kernel handling does not follow this spec. If you add null padding after an uncompressed cpio without TRAILER!!! the kernel complains "no cpio magic". In a gzipped archive one gets "junk in gzipped archive" without the TRAILER!!! This patch changes the state transitions so the kernel will follow the spec. Tested: padded uncompressed, padded compressed, unpadded compressed (error) and trailing junk in compressed (error) === I have a boot loader that knows how to load files, determine their size, and advance to the next 4-byte boundary and reports the total size of the files loaded. It doesn't understand about converting this number to some ASCII representation. With this patch I can embed the contents of a file padded with NULs with out knowing the exact size of the file with the following files: 1) file containing cpio header & file name, padded to 4 bytes 2) contents of file 3) pad file of zeros, the size at least as large as the that specified for the file. hpa points out that you should be careful with the headers, use unique inode numbers and/or add a cpio header with just TRAILER!!! to reset the inode hash table to avoid unwanted hard links. I just put this sequence as the last files loaded. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Thayne Harbaugh authored
This patch makes gen_init_cpio more complete by adding symlink, pipe and socket support. It updates scripts/gen_initramfs_list.sh to support the new types. The patch applies to the recent mm series that already have the updated gen_init_cpio and gen_initramfs_list.sh. From: William Lee Irwin III <wli@holomorphy.com> The rest of gen_init_cpio.c seems to cast the result of strlen() to handle this situation, so this patch follows suit while killing off size_t -related printk() warnings. Signed-off-by: William Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Randy Dunlap authored
Fix init & exit section usage, started with this diagnostic from reference_discarded.pl (make buildcheck): Error: ./drivers/misc/ibmasm/module.o .data refers to 00000058 R_386_32 .exit.text Signed-off-by: Randy Dunlap <rddunlap@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Jeff Garzik authored
Attached fixes sysfs naming of sx8 block devs to follow LANANA naming. You then get /sys/block/sx8!0, etc instead of /sys/block/sx80_0 (device names should be /dev/sx8/0 instead of /dev/sx80_0) Signed-off-by: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Prasanna Meda authored
Small issue: return value missed in getdents64, but handled in getdents. Signed-Off-by: Prasanna Meda <pmeda@akamai.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Stas Sergeev authored
The attached patch fixes the CD-ROM autoclose. It is broken in recent kernels for CD-ROMs that do not properly report that the tray is opened. Now on such a drives the kernel will do one close attempt and check for the disc again. This is how it used to work in the past. Signed-off-by: Stas Sergeev <stsp@aknet.ru> Acked-by: Alexander Kern <alex.kern@gmx.de> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Nathan Lynch authored
proc_create() needs to check that the name of an entry to be created does not contain a '/' character. To test, I hacked the ibmveth driver to try to call request_irq with a bogus "foo/bar" devname. The creation of the /proc/irq/1234/xxx entry silently fails, as intended. Perhaps the irq code should be made to check for the failure. Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Olaf Hering authored
A few users of request_irq pass a string with '/'. As a result, ls -l /proc/irq/*/* will fail to list these entries. Signed-off-by: Olaf Hering <olh@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-