- 14 May, 2004 1 commit
Andrew Morton authored
From: William Lee Irwin III <wli@holomorphy.com>

This patch series solves the "thundering herd" problem that occurs in the mainline implementation of hashed waitqueues. There are two sources of spurious wakeups in such arrangements:

(a) Hash collisions that place waiters on different objects on the same waitqueue, which wakes threads falsely when any of the objects hashed to the same queue receives a wakeup, i.e. loss of information about which object a wakeup event is related to.

(b) Loss of information about which object a given waiter is waiting on. This precludes wake-one semantics for mutual exclusion scenarios. For instance, a lock bit may be slept on. If there are any waiters on the object, a lock bit release event must wake at least one of them so as to prevent deadlock. But without information as to which waiter is waiting on which object, we must resort to waking all waiters who could possibly be waiting on it. Now, as the lock bit provides mutual exclusion, only one of the waiters woken can proceed, and the remainder will go back to sleep and wait for another event, creating unnecessary system load. Once wake-one semantics are established, only one of the waiters waiting to acquire a lock bit needs to be woken, which measurably reduces system load and improves efficiency (i.e. it's the subject of the benchmarking I've been sending to you).

Even beyond the measurable efficiency gains, there are reasons of robustness and responsiveness to motivate addressing the issue of thundering herds. In a real-life scenario I've been personally involved in resolving, the thundering herd issue caused powerful modern SMP machines with fast IO systems to be unresponsive to user input for a minute at a time or more. Analogues of these patches for the distro kernels involved fully resolved the issue to the customer's satisfaction and obviated workarounds to limit the pagecache's size.

The latest spin of these patches basically shoves more pieces of the logic into the wakeup functions, with some efficiency gains from sharing the hot codepath with the rest of the kernel, and a slightly larger diff than the patches with the newly-introduced entrypoint. Writing these was motivated by the push to insulate sched.c from more of the details of wakeup semantics by putting more of the logic into the wakeup functions. In order to accomplish this while still solving (b), the wakeup functions grew a new argument through which the waker passes information about which object the wakeup event is related to.

=========

This patch provides an additional argument to wakeup functions so that information may be passed from the waker to the waiter. It is provided as a separate patch so that the overhead of the additional argument can be measured in isolation. No change in performance was observable here.
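The new argument is what lets a wake function tell hash-bucket neighbours apart. A minimal sketch, assuming the 2.6-era wait API (wait_queue_t, default_wake_function); the struct and function names here are illustrative, not taken from the patch:

    #include <linux/wait.h>
    #include <linux/mm.h>

    /* Sketch: a waiter embeds its entry and records which page it sleeps
     * on; the wake function ignores wakeups whose key names another page. */
    struct page_waiter {
            wait_queue_t wait;              /* embedded waitqueue entry     */
            struct page *page;              /* object this waiter waits on  */
    };

    static int page_wake_function(wait_queue_t *wait, unsigned mode,
                                  int sync, void *key)
    {
            struct page_waiter *w = container_of(wait, struct page_waiter, wait);

            if (key && key != w->page)      /* hash collision: not for us   */
                    return 0;               /* stay asleep, wake nobody     */

            return default_wake_function(wait, mode, sync, key);
    }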
- 10 May, 2004 2 commits
Andrew Morton authored
From: Srivatsa Vaddagiri <vatsa@in.ibm.com>

migrate_all_tasks is currently run with the rest of the machine stopped. It iterates through the complete task table, turning off cpu affinity of any task that it finds affine to the dying cpu. Depending on the task table size this can take considerable time, and all this time the machine is stopped, doing nothing.

Stopping the machine for such extended periods can be avoided if we do task migration in the CPU_DEAD notification, and that's precisely what this patch does. The patch puts the idle task at the _front_ of the dying CPU's runqueue at the highest priority possible. This causes the idle thread to run _immediately_ after the kstopmachine thread yields. The idle thread notices that its cpu is offline and dies quickly. Task migration can then be done at leisure in the CPU_DEAD notification, when the rest of the CPUs are running.

Some advantages with this approach are:

- More scalable. Predictable amount of time that the machine is stopped.

- No changes to hot path/core code. We are just exploiting scheduler rules which run the next high-priority task on the runqueue. Also, since I put the idle task at the _front_ of the runqueue, there are no races when an equally high priority task is woken up and added to the runqueue. It gets in at the back of the runqueue, _after_ the idle task!

- The cpu_is_offline check that is presently required in try_to_wake_up, idle_balance and rebalance_tick can be removed, thus speeding them up a bit.

From: Srivatsa Vaddagiri <vatsa@in.ibm.com>

Rusty mentioned that the unlikely hints against cpu_is_offline are redundant since the macro already has that hint. Patch below removes those redundant hints I added.
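For orientation, the CPU_DEAD path mentioned above is a CPU hotplug notifier callback. A hedged sketch of its shape; the callback and the migrate_dead_tasks() helper are hypothetical names, not the patch's:

    #include <linux/notifier.h>
    #include <linux/cpu.h>

    /* Sketch: defer the expensive task migration to CPU_DEAD, which runs
     * after the machine has been un-stopped. */
    static int sched_cpu_notify(struct notifier_block *nb,
                                unsigned long action, void *hcpu)
    {
            long cpu = (long)hcpu;

            if (action == CPU_DEAD)
                    migrate_dead_tasks(cpu);    /* hypothetical helper */
            return NOTIFY_OK;
    }

    static struct notifier_block sched_cpu_nb = {
            .notifier_call = sched_cpu_notify,
    };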
Andrew Morton authored
From: Ingo Molnar <mingo@elte.hu>

Implement balancing during clone(). It does the following things:

- introduces SD_BALANCE_CLONE that can serve as a tool for an architecture to limit the search-idlest-CPU scope on clone(). E.g. the 512-CPU systems should rather not enable this.

- uses the highest sd for the imbalance_pct, not this_rq (which didn't make sense).

- unifies balance-on-exec and balance-on-clone via the find_idlest_cpu() function. Gets rid of sched_best_cpu(), which was still a bit inconsistent IMO: it used 'min_load < load' as a condition for balancing, while a more correct approach would be to use half of the imbalance_pct, like passive balancing does.

- the patch also reintroduces the possibility to do SD_BALANCE_EXEC on SMP systems, and activates it - to get testing.

- NOTE: there's one thing in this patch that is slightly unclean: I introduced wake_up_forked_thread. I did this to make it easier to get rid of this patch later (wake_up_forked_process() has lots of dependencies in various architectures). If this capability remains in the kernel then I'll clean it up and introduce one function for wake_up_forked_process/thread.

- NOTE2: I added the SD_BALANCE_CLONE flag to the NUMA CPU template too. Some NUMA architectures probably want to disable this.
- 30 Apr, 2004 1 commit
Andrew Morton authored
The recent slab alignment changes broke an unknown number of architectures (parisc and x86_64 for sure) by causing task_structs to be insufficiently aligned. We need good alignment because architectures do things like dumping FP state into the task_struct with instructions which require particular alignment (I think). So change the default alignment to L1_CACHE_BYTES, which is what we used to have, via SLAB_HW_CACHE_ALIGN.
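In fork_init() terms the fix is a one-flag change when the task_struct cache is created. A hedged sketch against the 2.6-era six-argument kmem_cache_create():

    task_struct_cachep =
            kmem_cache_create("task_struct", sizeof(struct task_struct),
                              0, SLAB_HW_CACHE_ALIGN, NULL, NULL);
    if (!task_struct_cachep)
            panic("fork_init(): cannot create task_struct cache");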
- 12 Apr, 2004 8 commits
Andrew Morton authored
From: <john.l.byrne@hp.com>

In do_fork(), if an error occurs after the mm_struct for the child has been allocated, it is never freed. The exit_mm() meant to free it increments the mm_count, and this count is never decremented. (For a running process that is exiting, schedule() takes care of this; however, the child process being cleaned up is not running.) In the CLONE_VM case, the parent's mm_struct will get an extra mm_count and so it will never be freed. This patch should fix both the CLONE_VM and the non-CLONE_VM case; the test of p->active_mm prevents a panic in the case that a kernel thread is being cloned.
Andrew Morton authored
From: Roland McGrath <roland@redhat.com>

The posix-timers implementation associates timers with the creating thread and destroys timers when their creator thread dies. POSIX clearly specifies that these timers are per-process, and a timer should not be torn down when the thread that created it exits. I hope there won't be any controversy on what the correct semantics are here, since POSIX is clear and the Linux feature is called "posix-timers".

The attached program built with NPTL -lrt -lpthread demonstrates the bug. The program is correct by POSIX, but fails on Linux. Note that until just the other day, NPTL had a trivial bug that always disabled its use of kernel timer syscalls (check strace for lack of timer_create/SYS_259). So unless you have built your own NPTL libs very recently, you probably won't see the kernel calls actually used by this program.

Also attached is my patch to fix this. It (you guessed it) moves the posix_timers field from task_struct to signal_struct. Access is now governed by the siglock instead of the task lock. exit_itimers is called from __exit_signal, i.e. only on the death of the last thread in the group, rather than from do_exit for every thread. Timers' it_process fields store the group leader's pointer, which won't die. For the case of SIGEV_THREAD_ID, I hold a ref on the task_struct for it_process to stay robust in case the target thread dies; the ref is released and the dangling pointer cleared when the timer fires and the target thread is dead. (This should only come up in a buggy user program, so no one cares exactly how the kernel handles that case. But I think what I did is robust and sensical.)

    /* Test for bogus per-thread deletion of timers. */
    #include <stdio.h>
    #include <error.h>
    #include <time.h>
    #include <signal.h>
    #include <stdint.h>
    #include <sys/time.h>
    #include <sys/resource.h>
    #include <unistd.h>
    #include <pthread.h>

    /* Creating timers in another thread should work too. */
    static void *do_timer_create(void *arg)
    {
            struct sigevent *const sigev = arg;
            timer_t *const timerId = sigev->sigev_value.sival_ptr;
            if (timer_create(CLOCK_REALTIME, sigev, timerId) < 0) {
                    perror("timer_create");
                    return NULL;
            }
            return timerId;
    }

    int main(void)
    {
            int i, res;
            timer_t timerId;
            struct itimerspec itval;
            struct sigevent sigev;

            itval.it_interval.tv_sec = 2;
            itval.it_interval.tv_nsec = 0;
            itval.it_value.tv_sec = 2;
            itval.it_value.tv_nsec = 0;

            sigev.sigev_notify = SIGEV_SIGNAL;
            sigev.sigev_signo = SIGALRM;
            sigev.sigev_value.sival_ptr = (void *)&timerId;

            for (i = 0; i < 100; i++) {
                    printf("cnt = %d\n", i);

                    pthread_t thr;
                    res = pthread_create(&thr, NULL, &do_timer_create, &sigev);
                    if (res) {
                            error(0, res, "pthread_create");
                            continue;
                    }

                    void *val;
                    res = pthread_join(thr, &val);
                    if (res) {
                            error(0, res, "pthread_join");
                            continue;
                    }
                    if (val == NULL)
                            continue;

                    res = timer_settime(timerId, 0, &itval, NULL);
                    if (res < 0)
                            perror("timer_settime");

                    res = timer_delete(timerId);
                    if (res < 0)
                            perror("timer_delete");
            }

            return 0;
    }
Andrew Morton authored
From: Rik Faith <faith@redhat.com>

This patch provides a low-overhead system-call auditing framework for Linux that is usable by LSM components (e.g., SELinux). This is an update of the patch discussed in this thread: http://marc.theaimsgroup.com/?t=107815888100001&r=1&w=2

In brief, it provides for netlink-based logging of audit records that have been generated in other parts of the kernel (e.g., SELinux) as well as the ability to audit system calls, either independently (using simple filtering) or as a complement to the audit record that another part of the kernel generated. The main goals were to provide system call auditing with 1) as low overhead as possible, and 2) without duplicating functionality that is already provided by SELinux (and/or other security infrastructures). This framework will work "stand-alone", but is not designed to provide, e.g., CAPP functionality without another security component in place.

This updated patch includes changes from feedback I have received, including the ability to compile without CONFIG_NET (and better use of tabs, so use -w if you diff against the older patch). Please see http://people.redhat.com/faith/audit/ for an early example user-space client (auditd-0.4.tar.gz) and instructions on how to try it.

My future intentions at the kernel level include improving filtering (e.g., syscall personality/exit codes) and syscall support for more architectures. First, though, I'm going to work on documentation, a (real) audit daemon, and patches for other user-space tools so that people can play with the framework and understand how it can be used with and without SELinux.

Update: Light-weight Auditing Framework receive filter fixes

From: Rik Faith <faith@redhat.com>

Since audit_receive_filter() is only called with audit_netlink_sem held, it cannot race with either audit_del_rule() or audit_add_rule(), so the list_for_each_entry_rcu()s may be replaced by list_for_each_entry()s, and the rcu_read_{un,}lock()s removed. A fix for this is part of the attached patch.

Other features of the attached patch are:
1) generalized the ability to test for inequality
2) added syscall exit status reporting and testing
3) added ability to report and test the first 4 syscall arguments (this adds a large amount of flexibility for little cost; not implemented or tested on ppc64)
4) added ability to report and test personality

User-space demo program enhanced for new fields and inequality testing: http://people.redhat.com/faith/audit/auditd-0.5.tar.gz
Andrew Morton authored
From: Hugh Dickins <hugh@veritas.com>

First of six patches against 2.6.5-rc3, cleaning up mremap's move_vma, and fixing truncation orphan issues raised by Rajesh Venkatasubramanian. Originally done as part of the anonymous objrmap work on mremap move, but useful fixes now extracted for mainline. The mremap changes need some exposure in the -mm tree first, but the first (fork one-liner) is safe enough to go straight into 2.6.5.

From: Rajesh Venkatasubramanian

Despite the comment that child vma should be inserted just after parent vma, 2.5.6 did exactly the reverse: thus a racing vmtruncate may free the child's ptes, then advance to the parent, and meanwhile copy_page_range has propagated more ptes from the parent to the child, leaving file pages still mapped after truncation.
Andrew Morton authored
From: Matt Mackall <mpm@selenic.com>

The nswap and cnswap counters have never been incremented, as Linux doesn't do task swapping.
Andrew Morton authored
From: Manfred Spraul <manfred@colorfullife.com>

Description: Right now kmem_cache_create automatically decides about the alignment of allocated objects. The automatic decisions are sometimes wrong:

- for some objects, it's better to keep them as small as possible to reduce the memory usage. Ingo already added a parameter to kmem_cache_create for the sigqueue cache, but it wasn't implemented.

- for s390, normal kmalloc must be 8-byte aligned. With debugging enabled, the default allocation was 4 bytes. This means that s390 cannot enable slab debugging.

- arm26 needs 1 kB aligned objects. Previously this was impossible to generate, therefore arm has its own allocator in arm26/machine/small_page.c

- most objects should be cache line aligned, to avoid false sharing. But the cache line size was set at compile time, often to 128 bytes for generic kernels. This wastes memory. The new code uses the runtime determined cache line size instead.

- some caches want an explicit alignment. One example are the pte_chain objects: they must find the start of the object with addr&mask. Right now pte_chain objects are scaled to the cache line size, because that was the only alignment that could be generated reliably.

The implementation reuses the "offset" parameter of kmem_cache_create and now uses it to pass in the requested alignment. offset was ignored by the current implementation, and the only user I found is sigqueue, which intended to set the alignment.

In the long run, it might be interesting for the main tree: due to the 128 byte alignment, only 7 inodes fit into one page; with 64-byte alignment, 9 inodes - 20% memory recovered for Athlon systems.

For generic kernels running on P6 cpus (i.e. 32 byte cachelines), it means the following number of objects per page:

  ext2_inode_cache:  8 instead of 7
  ext3_inode_cache:  8 instead of 7
  fat_inode_cache:   9 instead of 7
  rpc_tasks:        24 instead of 15
  tcp_tw_bucket:    40 instead of 30
  arp_cache:        40 instead of 30
  nfs_write_data:    9 instead of 7
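Callers can now state their alignment needs directly through the reused third argument. A hedged sketch against the 2.6-era six-argument kmem_cache_create(); the cache name and the 64-byte figure are illustrative:

    /* Ask for 64-byte alignment explicitly instead of inheriting the
     * compile-time L1_CACHE_BYTES guess; useful for objects located
     * via addr & mask, as the pte_chain example above describes. */
    my_cachep = kmem_cache_create("my_cache", sizeof(struct my_obj),
                                  64 /* requested alignment */,
                                  0, NULL, NULL);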
Andrew Morton authored
From: Roland McGrath <roland@redhat.com> This patch moves all the fields relating to job control from task_struct to signal_struct, so that all this info is properly per-process rather than being per-thread.
Andrew Morton authored
From: Manfred Spraul <manfred@colorfullife.com>

Cleanup of sysv ipc as a preparation for posix message queues:

- replace !CONFIG_SYSVIPC wrappers for copy_semundo and exit_sem with static inline wrappers. Now the whole ipc/util.c file is only used if CONFIG_SYSVIPC is set, so use makefile magic instead of #ifdef.

- remove the prototypes for copy_semundo and exit_sem from kernel/fork.c - they belong in a header file.

- create a new msgutil.c with the helper functions for message queues.

- clean up the helper functions: run Lindent, add __user tags.
- 08 Mar, 2004 1 commit
Andrew Morton authored
From: Hugh Dickins <hugh@veritas.com>

Fixes bugzilla #2219.

fork's dup_mmap leaves the child mm_rb as copied from the parent mm while doing all the copy_page_ranges, and then calls build_mmap_rb without holding page_table_lock. try_to_unmap_one's find_vma (holding page_table_lock, not mmap_sem) coming in on another cpu may cause mm mayhem. It may leave the child's mmap_cache pointing to a vma of the parent mm.

When the parent exits and the child faults, quite what happens rather depends on what junk then inhabits vm_page_prot, which gets set in the page table, with page_add_rmap adding the ptep, but the junk pte is likely to fail the tests for page_remove_rmap. Eventually the child exits, the page table is freed and try_to_unmap_one oopses on a null ptep_to_mm (but in a kernel with rss limiting, usually page_referenced hits the null ptep_to_mm first).

This took me days and days to unravel! Big thanks to Matthieu for reporting it with a good test case.
- 06 Mar, 2004 1 commit
Andrew Morton authored
From: Gerd Knorr <kraxel@suse.de>

Current versions of gcc error out if a function's declaration and definition disagree about the register passing convention. The patch adds a new `fastcall' declaration primitive, and uses it in all the FASTCALL functions which we could find. A number of inconsistencies were fixed up along the way.
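Concretely, the attribute must appear on both the declaration and the definition so that newer gcc sees a single calling convention. A hedged sketch of the i386 flavour (function name illustrative; the regparm(3) definitions match the usual i386 linkage macros):

    /* i386-style linkage macros */
    #define FASTCALL(x)     x __attribute__((regparm(3)))
    #define fastcall        __attribute__((regparm(3)))

    /* Declaration in a header ... */
    extern void FASTCALL(do_something(unsigned long arg));

    /* ... and the definition carries the same convention. */
    void fastcall do_something(unsigned long arg)
    {
            /* ... */
    }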
- 25 Feb, 2004 1 commit
Andrew Morton authored
From: "Randy.Dunlap" <rddunlap@osdl.org> Add syscalls.h, which contains prototypes for the kernel's system calls. Replace open-coded declarations all over the place. This patch found a couple of prior bugs. It appears to be more important with -mregparm=3 as we discover more asmlinkage mismatches. Some syscalls have arch-dependent arguments, so their prototypes are in the arch-specific unistd.h. Maybe it should have been asm/syscalls.h, but there were already arch-specific syscall prototypes in asm/unistd.h... Tested on x86, ia64, x86_64, ppc64, s390 and sparc64. May cause trivial-to-fix build breakage on other architectures.
- 18 Feb, 2004 1 commit
Andrew Morton authored
From: Tim Hockin <thockin@sun.com>, Neil Brown <neilb@cse.unsw.edu.au>, me

New groups infrastructure. task->groups and task->ngroups are replaced by task->group_info. group_info is a refcounted, dynamic struct with an array of pages. This allows for large numbers of groups. The current limit of 32 groups has been raised to 64k groups. It can be raised further by changing the NGROUPS_MAX constant in limits.h.
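A hedged sketch of the shape the new object took in 2.6 (field layout from memory, so treat it as illustrative): small group sets live inline, larger ones spill into page-sized blocks, and an indexing macro hides the split.

    struct group_info {
            int      ngroups;                     /* gids held                */
            atomic_t usage;                       /* refcount                 */
            gid_t    small_block[NGROUPS_SMALL];  /* inline storage           */
            int      nblocks;                     /* blocks[] entries in use  */
            gid_t    *blocks[0];                  /* page-sized gid arrays    */
    };

    /* Fetch the i-th gid, whichever block it landed in. */
    #define GROUP_AT(gi, i) \
            ((gi)->blocks[(i) / NGROUPS_PER_BLOCK][(i) % NGROUPS_PER_BLOCK])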
- 04 Feb, 2004 1 commit
Andrew Morton authored
From: Andi Kleen <ak@muc.de>

Just many more warning fixes for a gcc 3.4 snapshot. It warns for a lot of things now, e.g. for ?: and ({ ... }) and casts as lvalues, and for functions marked inline in headers but with no body. Actually there are more warnings; I stopped fixing at some point. Some of the warnings seem to be dubious (e.g. the binfmt_elf.c one, which looks more like a compiler bug to me). I also fixed the _exit() prototype to be void because gcc was complaining about this.
- 19 Jan, 2004 4 commits
Andrew Morton authored
From: Andries.Brouwer@cwi.nl Remove obsolete CLONE_DETACHED
Andrew Morton authored
From: Rusty Russell <rusty@rustcorp.com.au>

Some places use cpu_online() where they should be using cpu_possible, most commonly for tallying statistics. This makes no difference without hotplug CPU. Use the for_each_cpu() macro in those places, providing good examples (and making the external hotplug CPU patch smaller).
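The fix pattern is simple to picture: when summing per-cpu statistics, walk every cpu that could ever have run, not just those online at the moment. A hedged sketch in the idiom of the time (single-argument for_each_cpu() iterating the possible map; the counter is a made-up example):

    #include <linux/percpu.h>
    #include <linux/cpumask.h>

    static DEFINE_PER_CPU(unsigned long, some_counter);   /* example counter */

    static unsigned long total_some_counter(void)
    {
            unsigned long total = 0;
            int i;

            /* Every possible cpu: an offline cpu's contribution still counts. */
            for_each_cpu(i)
                    total += per_cpu(some_counter, i);
            return total;
    }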
Andrew Morton authored
From: Ingo Molnar <mingo@elte.hu> - move scheduling-state initialization from copy_process() to sched_fork() (Nick Piggin)
Andrew Morton authored
From: viro@parcelfarce.linux.theplanet.co.uk <viro@parcelfarce.linux.theplanet.co.uk> More uses of ->i_mapping switched to uses of ->f_mapping - stuff that was not caught by the earlier f_mapping conversion.
- 08 Jan, 2004 1 commit
Linus Torvalds authored
We must not mark the process TASK_STOPPED early, because that might allow a signal to wake it up before we actually got to the "wake_up_forked_process()" state. Total confusion would happen. Make wake_up_forked_process() verify the new world order.
- 29 Dec, 2003 1 commit
Andrew Morton authored
From: Chris Wright <chrisw@osdl.org> Introduce unshare_files as a helper for use during execve to eliminate potential leak of the execve'd binary's fd.
- 13 Dec, 2003 1 commit
Linus Torvalds authored
This time we have a SMP memory ordering issue in prepare_to_wait(), where we really need to make sure that subsequent tests for the event we are waiting for can not migrate up to before the wait queue has been set up.
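For reference, this is the wait-loop idiom whose ordering the fix protects: the task must be queued and marked sleeping before it re-checks the event, otherwise a wakeup between the check and the queueing is lost. A hedged sketch (the queue and condition here are placeholders):

    #include <linux/wait.h>
    #include <linux/sched.h>

    static DECLARE_WAIT_QUEUE_HEAD(my_wq);   /* placeholder queue     */
    static int condition;                    /* placeholder condition */

    static void wait_for_event(void)
    {
            DEFINE_WAIT(wait);

            for (;;) {
                    /* Queue us and set the sleeping state *before* the test;
                     * the barrier in prepare_to_wait() keeps the test from
                     * drifting above the queueing. */
                    prepare_to_wait(&my_wq, &wait, TASK_INTERRUPTIBLE);
                    if (condition)
                            break;
                    schedule();
            }
            finish_wait(&my_wq, &wait);
    }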
- 12 Dec, 2003 1 commit
Linus Torvalds authored
Fix stack corruption on SMP caused by another CPU still accessing a waitqueue even after it was de-allocated. Use a careful version of the list emptiness check to make sure we don't de-allocate the stack frame before the waitqueue is all done.
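Roughly what the careful check buys in finish_wait(): the waiter only takes the queue lock if the waker might still be touching its on-stack entry, and the lockless emptiness test is written so that a half-completed removal by the waker cannot fool it. A hedged sketch, close to the 2.6 implementation:

    void finish_wait(wait_queue_head_t *q, wait_queue_t *wait)
    {
            unsigned long flags;

            __set_current_state(TASK_RUNNING);
            /* list_empty_careful() tolerates a concurrent, in-progress
             * removal; only if we might still be queued do we take the
             * lock and unlink, so the on-stack entry is quiescent when
             * this returns and the caller's frame can safely go away. */
            if (!list_empty_careful(&wait->task_list)) {
                    spin_lock_irqsave(&q->lock, flags);
                    list_del_init(&wait->task_list);
                    spin_unlock_irqrestore(&q->lock, flags);
            }
    }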
- 25 Nov, 2003 1 commit
Linus Torvalds authored
- 09 Oct, 2003 1 commit
Linus Torvalds authored
The excluded changesets cause NULL pointer references in /proc. Moreover, it's questionable whether the whole thing makes sense at all. Per-thread state is good.

Cset exclude: davem@nuts.ninka.net|ChangeSet|20031005193942|01097
Cset exclude: akpm@osdl.org[torvalds]|ChangeSet|20031005180420|42200
Cset exclude: akpm@osdl.org[torvalds]|ChangeSet|20031005180411|42211
- 07 Oct, 2003 1 commit
Arnaldo Carvalho de Melo authored
- 05 Oct, 2003 1 commit
Andrew Morton authored
From: Roland McGrath <roland@redhat.com>

This patch completes what was started with the `process_group' accessor function, moving all the job control-related fields from task_struct into signal_struct and using process_foo accessor functions to read them. All these things are per-process in POSIX, none per-thread. Off hand it's hard to come up with the hairy MT scenarios in which the existing code would do insane things, but trust me, they're there. At any rate, all the uses being done via inline accessor functions now has got to be all good.

I did a "make allyesconfig" build and caught the few random drivers and whatnot that referred to these fields. I was surprised to find how few references to ->tty there really were to fix up. I'm sure there will be a few more fixups needed in non-x86 code. The only actual testing of a running kernel with these patches I've done is on my normal minimal x86 config. Everything works fine as it did before as far as I can tell.

One issue that may be of concern is the lack of any locking on multiple threads diddling these fields. I don't think it really matters, though there might be some obscure races that could produce inconsistent job control results. Nothing shattering, I'm sure; probably only something like a multi-threaded program calling setsid while its other threads do tty i/o, which never happens in reality. This is the same situation we get by using ->group_leader->foo without other synchronization, which seemed to be the trend and no one was worried about it.
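The accessor style in question is tiny; a hedged sketch matching the general shape of the 2.6 helper, with job control state read from the shared signal_struct:

    static inline pid_t process_group(struct task_struct *tsk)
    {
            return tsk->signal->pgrp;
    }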
- 22 Sep, 2003 1 commit
Albert Cahalan authored
Elimination of this nonsense allows for the assumption that a task group shares VM. This lets procps run faster.
- 21 Sep, 2003 3 commits
Andrew Morton authored
From: Anton Blanchard <anton@samba.org> If init_new_context fails we definitely do not want to call mmput, because that will call destroy_context against an uninitialised context. Instead we should back out what we did in init_mm. Fixes some weird failures on ppc64 when running a fork bomb.
Andrew Morton authored
From: Ingo Molnar <mingo@elte.hu>

The attached scheduler patch (against test2-mm2) adds the scheduling infrastructure items discussed on lkml. I got good feedback - and while I don't expect it to solve all problems, it does solve a number of bad ones:

- test_starve.c code from David Mosberger

- thud.c making the system unusable due to unfairness

- fair/accurate sleep average based on a finegrained clock

- audio skipping way too easily

Other changes in sched-test2-mm2-A3:

- ia64 sched_clock() code, from David Mosberger.

- migration thread startup without relying on implicit scheduling behavior. While the current 2.6 code is correct (due to the cpu-up code adding CPUs one by one), it's also fragile - and this code cannot be carried over into the 2.4 backports. So adding this method would clean up the startup and would make it easier to have 2.4 backports.

And here's the original changelog for the scheduler changes:

- cycle accuracy (nanosec resolution) timekeeping within the scheduler. This fixes a number of audio artifacts (skipping) I've reproduced. I don't think we can get away without going cycle accuracy - reading the cycle counter adds some overhead, but it's acceptable. The first nanosec-accuracy patch was done by Mike Galbraith - this patch is different but similar in nature. I went further in also changing the sleep_avg to be of nanosec resolution.

- more finegrained timeslices: there's now a timeslice 'sub unit' of 50 usecs (TIMESLICE_GRANULARITY) - CPU hogs on the same priority level will roundrobin with this unit. This change is intended to make gaming latencies shorter.

- include scheduling latency in the sleep bonus calculation. This change extends the sleep-average calculation to the period of time a task spends on the runqueue but doesn't get scheduled yet, right after wakeup. Note that tasks that were preempted (i.e. not woken up) and are still on the runqueue do not get this benefit. This change closes one of the last holes in the dynamic priority estimation; it should result in interactive tasks getting more priority under heavy load. This change also fixes the test-starve.c testcase from David Mosberger.

The TSC-based scheduler clock is disabled on ia32 NUMA platforms (i.e. platforms that have unsynched TSC for sure). Those platforms should provide the proper code to rely on the TSC in a global way. (No such infrastructure exists at the moment - the monotonic TSC-based clock doesn't deal with TSC offsets either, as far as I can tell.)
Andrew Morton authored
From: Jeremy Fitzhardinge <jeremy@goop.org>

I'm resending my patch to fix this problem. To recap: every task_struct has its own copy of the thread group's pgrp. Only the thread group leader is allowed to change the tgrp's pgrp, but it only updates its own copy of pgrp, while all the other threads in the tgrp use the old value they inherited on creation. This patch simply updates all the other threads' pgrp when the tgrp leader changes pgrp.

Ulrich has already expressed reservations about this patch since it is (1) incomplete (it doesn't cover the case of other ids which have similar problems), (2) racy (it doesn't synchronize with other threads looking at the task pgrp, so they could see an inconsistent view) and (3) slow (it takes linear time with respect to the number of threads in the tgrp).

My reaction is that (1) it fixes the actual bug I'm encountering in a real program; (2) doesn't really matter for pgrp, since it is mostly an issue with respect to the terminal job-control code (which is even more broken without this patch); and regarding (3), I think there are very few programs which have a large number of threads and change process group id on a regular basis (a heavily multi-threaded job-control shell?).

Ulrich also said he has a (proposed?) much better fix, which I've been looking forward to. I'm submitting this patch as a stop-gap fix for a real bug, and perhaps to prompt the improved patch. An alternative fix, at least for pgrp, is to change all references to ->pgrp to group_leader->pgrp. This may be sufficient on its own, but it would be a reasonably intrusive patch (I count 95 instances in 32 files in the 2.6.0-test3-mm3 tree).
- 31 Aug, 2003 1 commit
Andrew Morton authored
From: Peter Chubb <peterc@gelato.unsw.edu.au>

Currently, the context switch counters reported by getrusage() are always zero. The appended patch adds fields to struct task_struct to count context switches, and adds code to do the counting. The patch adds 4 longs to struct task_struct, and a single addition to the fast path in schedule().
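A hedged sketch of the idea (field and variable names follow what later kernels use, so treat the fragment as illustrative rather than the exact patch): four counters in task_struct, and schedule() picks which one to bump before switching.

    /* In task_struct: voluntary/involuntary switches, plus totals
     * accumulated from reaped children for getrusage(RUSAGE_CHILDREN). */
    unsigned long nvcsw, nivcsw;
    unsigned long cnvcsw, cnivcsw;

    /* In schedule(), on the fast path: decide which counter applies ... */
    switch_count = &prev->nivcsw;                  /* preempted: involuntary */
    if (prev->state && !(preempt_count() & PREEMPT_ACTIVE))
            switch_count = &prev->nvcsw;           /* blocked: voluntary     */

    /* ... and bump it only if a real context switch happens. */
    if (prev != next)
            ++*switch_count;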
- 20 Aug, 2003 1 commit
Andrew Morton authored
From: Suparna Bhattacharya <suparna@in.ibm.com>

The /proc code's bare atomic_inc(&mm->count) is racy against __exit_mm()'s mmput() on another CPU: it calls mmput() outside task_lock(tsk), and task_lock() isn't appropriate locking anyway.

So what happens is:

  CPU0                                      CPU1
  mmput()
    ->atomic_dec_and_lock(mm->mm_users)
                                            atomic_inc(mm->mm_users)
    ->list_del(mm->mmlist)
                                            mmput()
                                              ->atomic_dec_and_lock(mm->mm_users)
                                              ->list_del(mm->mmlist)

And the double list_del() of course goes splat.

So we use mmlist_lock to synchronise these steps. The patch implements a new mmgrab() routine which increments mm_users only if the mm isn't already going away. Changes get_task_mm() and proc_pid_stat() to call mmgrab() instead of a direct atomic_inc(&mm->mm_users).

Hugh, there's some cruft in swapoff which looks like it should be using mmgrab()...
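A hedged sketch of what such an mmgrab() can look like, following the description above (serialize against mmput() via mmlist_lock and refuse to resurrect an mm whose user count already reached zero); illustrative, not the exact patch:

    #include <linux/sched.h>
    #include <linux/spinlock.h>

    /* Pin an mm for a /proc reader, unless it is already on its way out. */
    static struct mm_struct *mmgrab(struct mm_struct *mm)
    {
            if (!mm)
                    return NULL;

            spin_lock(&mmlist_lock);
            if (!atomic_read(&mm->mm_users))
                    mm = NULL;                   /* lost the race with mmput() */
            else
                    atomic_inc(&mm->mm_users);
            spin_unlock(&mmlist_lock);

            return mm;
    }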
- 18 Aug, 2003 1 commit
Andrew Morton authored
From: William Lee Irwin III <wli@holomorphy.com>

Contributions from: Jan Dittmer <jdittmer@sfhq.hn.org>, Arnd Bergmann <arnd@arndb.de>, "Bryan O'Sullivan" <bos@serpentine.com>, "David S. Miller" <davem@redhat.com>, Badari Pulavarty <pbadari@us.ibm.com>, "Martin J. Bligh" <mbligh@aracnet.com>, Zwane Mwaikambo <zwane@linuxpower.ca>

It has been tested on x86, sparc64, x86_64, ia64 (I think), ppc and ppc64.

cpumask_t enables systems with NR_CPUS > BITS_PER_LONG to utilize all their cpus by creating an abstract data type dedicated to representing cpu bitmasks, similar to fd sets from userspace, and sweeping the appropriate code to update callers to the access API. The fd set-like structure is according to Linus' own suggestion; the macro calling convention to ambiguate representations with minimal code impact is my own invention.

Specifically, a new set of inline functions for manipulating arbitrary-width bitmaps is introduced with a relatively simple implementation, in tandem with a new data type representing bitmaps of width NR_CPUS, cpumask_t, whose accessor functions are defined in terms of the bitmap manipulation inlines. This bitmap ADT found an additional use in i386 arch code handling sparse physical APIC ID's, which was convenient to use in this case as the accounting structure was required to be wider to accommodate the physids consumed by larger numbers of cpus.

For the sake of simplicity and low code impact, these cpu bitmasks are passed primarily by value; however, an additional set of accessors along with an auxiliary data type with const call-by-reference semantics is provided to address performance concerns raised in connection with very large systems, such as SGI's larger models, where copying and call-by-value overhead would be prohibitive. Few (if any) users of the call-by-reference API are immediately introduced.

Also, in order to avoid calling convention overhead on architectures where structures are required to be passed by value, NR_CPUS <= BITS_PER_LONG is special-cased so that cpumask_t falls back to an unsigned long and the accessors perform the usual bit twiddling on unsigned longs as opposed to arrays thereof.

Audits were done with the structure overhead in-place, restoring this special-casing only afterward so as to ensure a more complete API conversion while undergoing the majority of its end-user exposure in -mm. More -mm's were shipped after its restoration to be sure that was tested, too.

The immediate users of this functionality are Sun sparc64 systems, SGI mips64 and ia64 systems, and IBM ia32, ppc64, and s390 systems. Of these, only the ppc64 machines needing the functionality have yet to be released; all others have had systems requiring it for full functionality for at least 6 months, and in some cases, since the initial Linux port to the affected architecture.
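A hedged usage sketch of the accessor style (old cpu_set/cpu_isset spellings); the point is that the same calls work whether cpumask_t is a single unsigned long or an array of them:

    #include <linux/cpumask.h>
    #include <linux/kernel.h>

    static void cpumask_demo(void)
    {
            cpumask_t allowed;
            int cpu;

            cpus_clear(allowed);                         /* empty mask        */
            cpu_set(0, allowed);                         /* add CPU 0         */
            cpus_and(allowed, allowed, cpu_online_map);  /* keep online bits  */

            for (cpu = 0; cpu < NR_CPUS; cpu++)
                    if (cpu_isset(cpu, allowed))
                            printk("cpu %d allowed\n", cpu);
    }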
- 14 Aug, 2003 1 commit
Linus Torvalds authored
CLONE_THREAD without CLONE_DETACHED will now return -EINVAL, and for a while we will warn about anything that uses it (there are no known users, but this will help pinpoint any problems if somebody used to care about the invalid combination).
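In copy_process() terms the new rule is a simple flag check; a hedged sketch (exact placement and error path are illustrative):

    /* CLONE_THREAD now implies CLONE_DETACHED; reject the old combination. */
    if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_DETACHED))
            return ERR_PTR(-EINVAL);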
- 18 Jul, 2003 3 commits
Andrew Morton authored
From: Ulrich Drepper <drepper@redhat.com> CLONE_STOPPED: start a thread in a stopped state. Required for NPTL.
Andrew Morton authored
From: Neil Brown <neilb@cse.unsw.edu.au>

1/ If a setuid process swaps its real and effective uids and then forks, the fork fails if the new real uid has more processes than the original process was limited to. This is particularly a problem if a user with a process limit (e.g. 256) runs a setuid-root program which does setuid() + fork() (e.g. lprng) while root already has more than 256 processes (which is quite possible).

The root problem here is that a limit which should be a per-user limit is being implemented as a per-process limit with per-process (e.g. CAP_SYS_RESOURCE) controls. Being a per-user limit, it should be that the root user can override it, not just some process with CAP_SYS_RESOURCE. This patch adds a test to ignore process limits if the real user is root.

2/ When a root-owned process (e.g. cgiwrap) sets up process limits and then calls setuid, the setuid should fail if the user would then be running more than rlim_cur[RLIMIT_NPROC] processes, but it doesn't. This patch adds an appropriate test. With this patch, a per-user process limit imposed in cgiwrap really works.
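A hedged sketch of the first check as it might look in copy_process() (condition shape from memory of 2.6-era fork.c; the root exemption is the new part):

    /* Per-user process limit: exempt root, and the usual capabilities. */
    if (atomic_read(&p->user->processes) >= p->rlim[RLIMIT_NPROC].rlim_cur) {
            if (!capable(CAP_SYS_ADMIN) && !capable(CAP_SYS_RESOURCE) &&
                p->user != &root_user)
                    goto bad_fork_free;
    }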
Andrew Morton authored
From: Manfred Spraul <manfred@colorfullife.com>

kernel/fork.c contains a disabled cache for task structures. Task structures are placed into the task cache only if "tsk == current", and "tsk == current" is impossible. There is even a WARN_ON against that in __put_task_struct(). So remove it entirely - it's dead code. One problem is that order-1 allocations are not cached per-cpu - we can use kmalloc for the stack.