  1. 29 Sep, 2003 1 commit
  2. 21 Sep, 2003 5 commits
    • [PATCH] might_sleep diagnostics · d6dbfa23
      Andrew Morton authored
      might_sleep() can be triggered by either local interrupts being disabled or
      by elevated preempt count.  Disambiguate them.
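
       A minimal sketch of what the disambiguated check might look like, assuming
       the in_atomic() and irqs_disabled() helpers of the time (the message text is
       illustrative, not the exact patch):

               void __might_sleep(char *file, int line)
               {
                       if (in_atomic() || irqs_disabled()) {
                               printk(KERN_ERR "Debug: sleeping function called from invalid"
                                      " context at %s:%d\n", file, line);
                               printk("in_atomic(): %d, irqs_disabled(): %d\n",
                                      in_atomic(), irqs_disabled());
                               dump_stack();
                       }
               }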
    • [PATCH] CPU scheduler interactivity changes · 2cf13d58
      Andrew Morton authored
      From: Con Kolivas <kernel@kolivas.org>
      
      Interactivity scheduler tweaks on top of Ingo's A3 interactivity patch.
      
      Interactive credit added to task struct to find truly interactive tasks and
      treat them differently.
      
      Extra #defines included as helpers for conversion to/from nanosecond timing,
      to work out an average timeslice for nice 0 tasks, and the effective dynamic
      priority bonuses that will be given to tasks.
      
      MAX_SLEEP_AVG modified to change dynamic priority by one for a nice 0 task
      sleeping or running for one full timeslice.
      
      CREDIT_LIMIT is the number of times a task earns sleep_avg over MAX_SLEEP_AVG
      before it is considered HIGH_CREDIT (truly interactive); and -CREDIT_LIMIT is
       LOW_CREDIT.
      
       TIMESLICE_GRANULARITY is modified to be more frequent for more
       interactive tasks (10 ms for the top 2 dynamic priorities, then halving
       for each priority below that) and less frequent per extra cpu.
      
       JUST_INTERACTIVE_SLEEP logic created: the sleep_avg consistent with giving
       a task just enough dynamic priority to remain on the active array.
      
      Task preemption of equal priority tasks is dropped as requeuing with
       TIMESLICE_GRANULARITY makes this unnecessary.
      
      Dynamic priority bonus simplified.
      
       User tasks that sleep a long time and are not waking from uninterruptible
       sleep are sought and categorised as idle. Their sleep_avg is limited in its
       rise to prevent them becoming high priority and suddenly turning into cpu
       hogs.
      
      Bonus for sleeping is proportionately higher the lower the dynamic priority of
      a task is; this allows for very rapid escalation to interactive status.
      
      Tasks that are LOW_CREDIT are limited in rise per sleep to one priority level.
      
      Non HIGH_CREDIT tasks waking from uninterruptible sleep are sought to detect
      cpu hogs waiting on I/O and their sleep_avg rise is limited to just
      interactive state to prevent cpu bound tasks from becoming interactive during
      I/O wait.
      
      Tasks that earn sleep_avg over MAX_SLEEP_AVG get interactive credits.
      
       The on-runqueue bonus is not given to non-HIGH_CREDIT tasks waking from
       uninterruptible sleep.
      
       Forked tasks and their parents get sleep_avg limited to the minimum necessary
       to maintain their effective dynamic priority, thus preventing repeated forking
       from being a way to become highly interactive while not penalising them
       noticeably otherwise.
      
      CAN_MIGRATE_TASK cleaned up and modified to work with nanosecond timestamps.
      
      Reverted Ingo's A3 Starvation limit change - it was making interactive tasks
      suffer more under increasing load. If a cpu is grossly overloaded and
      everyone is going to starve it may as well run interactive tasks
      preferentially.
      
       Task requeuing is limited to interactive tasks only (cpu bound tasks don't need
      low latency and derive benefit from longer timeslices), and they must have at
      least TIMESLICE_GRANULARITY remaining.
      
      HIGH_CREDIT tasks get penalised less sleep_avg the more interactive they are
      thus keeping them interactive for bursts but if they become sustained cpu hogs
      they will slide increasingly rapidly down the dynamic priority scale.
      
       Tasks that run out of sleep_avg, are still using up cpu time, and are not yet
       high or low credit get penalised interactive credits, to determine LOW_CREDIT
       tasks (cpu bound ones).
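
       As a rough illustration of the credit scheme described above, the
       classification could be expressed as a pair of helpers keyed off a per-task
       interactive_credit counter (names follow the changelog; treat the exact
       threshold value as an assumption):

               #define CREDIT_LIMIT    100

               /* earned sleep_avg over MAX_SLEEP_AVG many times: truly interactive */
               #define HIGH_CREDIT(p)  ((p)->interactive_credit > CREDIT_LIMIT)

               /* repeatedly burned full timeslices with no sleep_avg: cpu bound */
               #define LOW_CREDIT(p)   ((p)->interactive_credit < -CREDIT_LIMIT)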
    • [PATCH] CPU scheduler balancing fix · 875ee1e1
      Andrew Morton authored
      From: Nick Piggin <piggin@cyberone.com.au>
      
      The patch changes the imbalance required before a balance to 25% from 50% -
      as the comments intend.  It also changes a case where the balancing
      wouldn't be done if the imbalance was >= 25% but only 1 task difference.
      
      The downside of the second change is that one task may bounce from one cpu
      to another for some loads.  This will only bounce once every 200ms, so it
      shouldn't be a big problem.
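
       In find_busiest_queue() terms the new threshold amounts to something like
       the following sketch (variable names illustrative, not the actual diff):

               imbalance = max_load - this_load;
               /* balance once the busiest queue is ~25% ahead, even if the
                * difference is only a single task */
               if (imbalance * 4 < max_load)
                       goto out_balanced;
               imbalance /= 2;         /* pull roughly half the difference */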
      
      (Benchmarking results are basically a wash - SDET is increased maybe 0.5%)
    • [PATCH] scheduler infrastructure · f221af36
      Andrew Morton authored
      From: Ingo Molnar <mingo@elte.hu>
      
       The attached scheduler patch (against test2-mm2) adds the scheduling
       infrastructure items discussed on lkml. I got good feedback - and while I
       don't expect it to solve all problems, it does solve a number of bad ones:
      
       - test_starve.c code from David Mosberger
      
        - thud.c making the system unusable due to unfairness
      
       - fair/accurate sleep average based on a finegrained clock
      
       - audio skipping way too easily
      
      other changes in sched-test2-mm2-A3:
      
       - ia64 sched_clock() code, from David Mosberger.
      
       - migration thread startup without relying on implicit scheduling
          behavior. While the current 2.6 code is correct (due to the cpu-up code
          adding CPUs one by one), it's also fragile - and this code cannot
         be carried over into the 2.4 backports. So adding this method would
         clean up the startup and would make it easier to have 2.4 backports.
      
      and here's the original changelog for the scheduler changes:
      
       - cycle accuracy (nanosec resolution) timekeeping within the scheduler.
          This fixes a number of audio artifacts (skipping) I've reproduced. I
          don't think we can get away without going to cycle accuracy - reading the
         cycle counter adds some overhead, but it's acceptable. The first
         nanosec-accuracy patch was done by Mike Galbraith - this patch is
         different but similar in nature. I went further in also changing the
         sleep_avg to be of nanosec resolution.
      
       - more finegrained timeslices: there's now a timeslice 'sub unit' of 50
         usecs (TIMESLICE_GRANULARITY) - CPU hogs on the same priority level
         will roundrobin with this unit. This change is intended to make gaming
         latencies shorter.
      
       - include scheduling latency in sleep bonus calculation. This change
         extends the sleep-average calculation to the period of time a task
          spends on the runqueue but doesn't get scheduled yet, right after
          wakeup. Note that tasks that were preempted (ie. not woken up) and are
          still on the runqueue do not get this benefit. This change closes one
          of the last holes in the dynamic priority estimation; it should result
         in interactive tasks getting more priority under heavy load. This
         change also fixes the test-starve.c testcase from David Mosberger.
      
      
      The TSC-based scheduler clock is disabled on ia32 NUMA platforms.  (ie. 
      platforms that have unsynched TSC for sure.) Those platforms should provide
      the proper code to rely on the TSC in a global way.  (no such infrastructure
       exists at the moment - the monotonic TSC-based clock doesn't deal with TSC
       offsets either, as far as I can tell.)
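
       For reference, the nanosecond clock this builds on has roughly this shape on
       i386 (a sketch; the cycles-to-nanoseconds scaling factor is assumed to be
       set up from the measured CPU frequency at boot):

               unsigned long long sched_clock(void)
               {
                       unsigned long long cycles;

                       rdtscll(cycles);        /* read the TSC cycle counter */

                       /* scale cycles to nanoseconds */
                       return cycles * cyc2ns_scale >> CYC2NS_SHIFT;
               }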
    • [PATCH] real-time enhanced page allocator and throttling · 55b50278
      Andrew Morton authored
      From: Robert Love <rml@tech9.net>
      
      - Let real-time tasks dip further into the reserves than usual in
        __alloc_pages().  There are a lot of ways to special case this.  This
        patch just cuts z->pages_low in half, before doing the incremental min
        thing, for real-time tasks.  I do not do anything in the low memory slow
        path.  We can be a _lot_ more aggressive if we want.  Right now, we just
        give real-time tasks a little help.
      
      - Never ever call balance_dirty_pages() on a real-time task.  Where and
        how exactly we handle this is up for debate.  We could, for example,
        special case real-time tasks inside balance_dirty_pages().  This would
        allow us to perform some of the work (say, waking up pdflush) but not
        other work (say, the active throttling).  As it stands now, we do the
        per-processor accounting in balance_dirty_pages_ratelimited() but we
        never call balance_dirty_pages().  Lots of approaches work.  What we want
        to do is never engage the real-time task in forced writeback.
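
       A minimal sketch of the first change, against the 2.6-era zone loop in
       __alloc_pages() (surrounding code elided; only the real-time special case is
       shown):

               unsigned long min = z->pages_low;

               /* real-time tasks may dip further into the reserves */
               if (rt_task(p))
                       min /= 2;

               /* ...then the usual incremental-min test against z->free_pages... */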
  3. 09 Sep, 2003 1 commit
  4. 08 Sep, 2003 1 commit
    • [power] Add support for refrigerator to the migration_thread. · a803561d
      Patrick Mochel authored
      - The PM code currently must signal each kernel thread when suspending, and
        each thread must call refrigerator() to stop itself. This patch adds 
        support for this to migration_thread, which allows suspend states to work
        on an SMP-enabled kernel (though not necessarily an SMP machine).
      
      - Note I do not know why the process freezing code was designed in such a 
        way. One would think we could do it without having to call each thread
         individually, and fix up the threads that need special work individually.
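
       A hedged sketch of what this looks like inside the migration_thread() loop
       (the PF_FREEZE flag and refrigerator() calling convention are as in the
       2.5/2.6 PM code of the time, from memory):

               for (;;) {
                       if (current->flags & PF_FREEZE)
                               refrigerator(PF_FREEZE);        /* park here during suspend */

                       spin_lock_irq(&rq->lock);
                       /* ... normal migration request handling ... */
               }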
  5. 31 Aug, 2003 1 commit
    • [PATCH] add context switch counters · a776ac8d
      Andrew Morton authored
      From: Peter Chubb <peterc@gelato.unsw.edu.au>
      
      Currently, the context switch counters reported by getrusage() are
      always zero.  The appended patch adds fields to struct task_struct to
      count context switches, and adds code to do the counting.
      
       The patch adds 4 longs to struct task_struct, and a single addition to
      the fast path in schedule().
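
       A sketch of the shape of the change, assuming the usual voluntary versus
       involuntary split reported by getrusage() (field names as used there):

               /* in struct task_struct: the four new counters */
               unsigned long nvcsw, nivcsw;    /* this task's vol/invol switches */
               unsigned long cnvcsw, cnivcsw;  /* accumulated from reaped children */

               /* in schedule(): decide which counter to bump for 'prev' */
               switch_count = &prev->nivcsw;
               if (prev->state && !(preempt_count() & PREEMPT_ACTIVE))
                       switch_count = &prev->nvcsw;
               /* ... and once we actually switch away: ++*switch_count; */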
  6. 18 Aug, 2003 1 commit
    • [PATCH] cpumask_t: allow more than BITS_PER_LONG CPUs · bf8cb61f
      Andrew Morton authored
      From: William Lee Irwin III <wli@holomorphy.com>
      
      Contributions from:
      	Jan Dittmer <jdittmer@sfhq.hn.org>
      	Arnd Bergmann <arnd@arndb.de>
      	"Bryan O'Sullivan" <bos@serpentine.com>
      	"David S. Miller" <davem@redhat.com>
      	Badari Pulavarty <pbadari@us.ibm.com>
      	"Martin J. Bligh" <mbligh@aracnet.com>
      	Zwane Mwaikambo <zwane@linuxpower.ca>
      
       It has been tested on x86, sparc64, x86_64, ia64 (I think), ppc and ppc64.
      
      cpumask_t enables systems with NR_CPUS > BITS_PER_LONG to utilize all their
      cpus by creating an abstract data type dedicated to representing cpu
      bitmasks, similar to fd sets from userspace, and sweeping the appropriate
      code to update callers to the access API.  The fd set-like structure is
      according to Linus' own suggestion; the macro calling convention to ambiguate
      representations with minimal code impact is my own invention.
      
      Specifically, a new set of inline functions for manipulating arbitrary-width
      bitmaps is introduced with a relatively simple implementation, in tandem with
      a new data type representing bitmaps of width NR_CPUS, cpumask_t, whose
      accessor functions are defined in terms of the bitmap manipulation inlines.
      This bitmap ADT found an additional use in i386 arch code handling sparse
      physical APIC ID's, which was convenient to use in this case as the
      accounting structure was required to be wider to accommodate the physids
      consumed by larger numbers of cpus.
      
      For the sake of simplicity and low code impact, these cpu bitmasks are passed
      primarily by value; however, an additional set of accessors along with an
      auxiliary data type with const call-by-reference semantics is provided to
      address performance concerns raised in connection with very large systems,
      such as SGI's larger models, where copying and call-by-value overhead would
      be prohibitive.  Few (if any) users of the call-by-reference API are
      immediately introduced.
      
      Also, in order to avoid calling convention overhead on architectures where
      structures are required to be passed by value, NR_CPUS <= BITS_PER_LONG is
      special-cased so that cpumask_t falls back to an unsigned long and the
      accessors perform the usual bit twiddling on unsigned longs as opposed to
      arrays thereof.  Audits were done with the structure overhead in-place,
      restoring this special-casing only afterward so as to ensure a more complete
      API conversion while undergoing the majority of its end-user exposure in -mm.
       More -mm's were shipped after its restoration to be sure that was tested,
      too.
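
       The NR_CPUS <= BITS_PER_LONG special case boils down to something like this
       simplified sketch (accessor spelling is illustrative, not the full API):

               #if NR_CPUS > BITS_PER_LONG
               typedef struct {
                       unsigned long mask[(NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG];
               } cpumask_t;
               #else
               typedef unsigned long cpumask_t;        /* plain bit twiddling */
               #endif

               /* callers stay representation-agnostic via accessors, e.g. */
               #define cpu_isset(cpu, map)     test_bit(cpu, cpus_addr(map))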
      
      The immediate users of this functionality are Sun sparc64 systems, SGI mips64
      and ia64 systems, and IBM ia32, ppc64, and s390 systems.  Of these, only the
      ppc64 machines needing the functionality have yet to be released; all others
      have had systems requiring it for full functionality for at least 6 months,
      and in some cases, since the initial Linux port to the affected architecture.
  7. 17 Aug, 2003 1 commit
  8. 14 Aug, 2003 1 commit
    • [PATCH] fix task struct refcount bug · 4c0d7322
      Andrew Morton authored
      From: Manfred Spraul <manfred@colorfullife.com>
      
      (We think this might be the mystery bug which has been hanging about for
      months)
      
      
      We found a [the?] task struct refcount error: A task that dies sets
      tsk->state to TASK_ZOMBIE.  The next scheduled task checks prev->state, and
      if it's ZOMBIE, then it decrements the reference count of prev.  The
       prev->state & _ZOMBIE test is not atomic with schedule, thus if prev is
       scheduled again and dies between dropping the runqueue lock and checking
       prev->state, then the reference is dropped twice.
      
      This is possible with either preemption [schedule_tail is called by
      ret_from_fork with preemption count 1, finish_arch_switch drops it to 0] or
      profiling [profile_exit_mmap can sleep on profile_rwsem, called by
      mmdrop()] enabled.
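
       In schedule() terms the race is roughly the following (a sketch for
       illustration; the fix makes the ZOMBIE test and the drop atomic with the
       context switch):

               /* after switching away from 'prev': */
               if (prev->state & TASK_ZOMBIE)
                       put_task_struct(prev);  /* if prev ran again and died on
                                                  another CPU inside this window,
                                                  the reference is dropped twice */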
  9. 21 Jul, 2003 1 commit
    • ISDN: Export "kstat" · 7ec85ce3
      Kai Germaschewski authored
      This patch exports the kstat per-cpu variable, needed for
      hisax, which uses kstat_irqs() during card probing to make sure
       that irqs actually work. This could possibly be replaced by a
      private counter in the hisax ISRs, but that's really just
      unnecessary overhead, since the core kernel already does the work
      anyway.
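
       The export itself is a one-liner; since kstat is a per-cpu variable it needs
       the per-cpu export form (the macro spelling here is an assumption):

               DEFINE_PER_CPU(struct kernel_stat, kstat);
               EXPORT_PER_CPU_SYMBOL(kstat);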
  10. 18 Jul, 2003 1 commit
  11. 10 Jul, 2003 1 commit
  12. 07 Jul, 2003 2 commits
    • [PATCH] switch_mm and enter_lazy_tlb: remove cpu arg · 8a6879c6
      Rusty Russell authored
      switch_mm and enter_lazy_tlb take a CPU arg, which is always
      smp_processor_id().  This is misleading, and pointless if they use
      per-cpu variables or other optimizations.  gcc will eliminate
      redundant smp_processor_id() (in inline functions) anyway.
      
      This removes that arg from all the architectures.
    • [PATCH] Make kstat_this_cpu in terms of __get_cpu_var and use it · b993be7e
      Rusty Russell authored
      kstat_this_cpu() is defined in terms of per_cpu instead of __get_cpu_var.
      
      This patch changes that, and uses it everywhere appropriate.  The sched.c
      change puts it in a local variable, which helps gcc generate better code.
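
       The resulting definition is essentially (sketch):

               /* before: spell out the per-cpu lookup by hand */
               #define kstat_this_cpu  per_cpu(kstat, smp_processor_id())

               /* after: use the dedicated current-cpu accessor */
               #define kstat_this_cpu  __get_cpu_var(kstat)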
  13. 06 Jul, 2003 1 commit
    • [PATCH] use task_cpu() not ->thread_info->cpu in sched.c · 8ffcb67a
      Andrew Morton authored
      From: Mikael Pettersson <mikpe@csd.uu.se>
      
      This patch fixes two p->thread_info->cpu occurrences in kernel/sched.c to
      use the task_cpu(p) macro instead, which is optimised on UP.  Although one
      of the occurrences is under #ifdef CONFIG_SMP, it's bad style to use the
      raw non-optimisable form in non-arch code.
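
       For reference, the accessor being preferred here is roughly the following;
       the UP variant collapses to a constant, which is what makes it optimisable
       (a sketch):

               #ifdef CONFIG_SMP
               #define task_cpu(p)     ((p)->thread_info->cpu)
               #else
               #define task_cpu(p)     (0)
               #endif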
  14. 01 Jul, 2003 1 commit
  15. 25 Jun, 2003 2 commits
    • [PATCH] normalise node load for NUMA · 325a2824
      Andrew Morton authored
      From: Andrew Theurer <habanero@us.ibm.com>
      
      This patch ensures that when node loads are compared, the load value is
      normalised.  Without this, load balance across nodes of dissimilar cpu
      counts can cause unfairness and sometimes lower overall performance.
      
      For example, a 2 node system with 4 cpus in the first node and 2 cpus in
      the second.  A workload with 6 running tasks would have 3 tasks running on
      one node and 3 on the other, leaving one cpu idle in the first node and two
      tasks sharing a cpu in the second node.  The patch would ensure that 4
      tasks run in the first node and 2 in the second.
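
       In other words, node load is compared per cpu rather than as a raw task
       count; a sketch of the normalisation (helper names illustrative):

               /* e.g. 3 tasks on a 4-cpu node -> 75, 3 tasks on a 2-cpu node -> 150,
                * so new tasks are placed on the 4-cpu node until the ratio evens out */
               load = (nr_running_on_node(node) * 100) / nr_cpus_node(node);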
      
      I ran some kernel compiles comparing this patch on a 2 node 4 cpu/2 cpu
      system to show the benefits.  Without the patch I got 140 second elapsed
      time.  With the patch I get 132 seconds (6% better).
      
      Although it is not very common to have nodes with dissimilar cpu counts, it
      is already happening.  PPC64 systems with partitioning have this happen,
      and I expect it to be more common on ia32 as partitioning becomes more
      common.
    • [PATCH] setscheduler needs to force a reschedule · 71c19018
      Andrew Morton authored
      From: Robert Love <rml@tech9.net>
      
      Basically, the problem is that setscheduler() does not set need_resched
      when needed.  There are two basic cases where this is needed:
      
      	- the task is running, but now it is no longer the highest
      	  priority task on the rq
      	- the task is not running, but now it is the highest
      	  priority task on the rq
      
      In either case, we need to set need_resched to invoke the scheduler.
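
       A hedged sketch of the two cases at the end of setscheduler(), once the new
       priority has been set (runqueue/array juggling omitted):

               if (task_running(rq, p)) {
                       /* running, but perhaps no longer the highest priority */
                       if (p->prio > oldprio)
                               resched_task(rq->curr);
               } else if (p->prio < rq->curr->prio) {
                       /* not running, but now beats the current task */
                       resched_task(rq->curr);
               }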
  16. 21 Jun, 2003 1 commit
    • [PATCH] More care in sys_setaffinity · 7cd3f199
      Rusty Russell authored
      We currently mask off offline CPUs in both set_cpus_allowed and
      sys_sched_setaffinity.  This is firstly redundant, and secondly
      erroneous when more CPUs come online (eg. setting affinity to all 1s
      should mean all CPUs, including future ones).
      
       We mask with cpu_online_map in sys_sched_getaffinity *anyway* (which
       is another issue, since that is not valid either once the set of online
       cpus changes), so userspace won't see any difference.
      
      This patch makes set_cpus_allowed() return -errno, and check that in
      sys_sched_setaffinity.
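
       The shape of the change, as a sketch using the pre-cpumask_t unsigned long
       masks of the time:

               int set_cpus_allowed(task_t *p, unsigned long new_mask)
               {
                       if (!(new_mask & cpu_online_map))
                               return -EINVAL;         /* nothing online in the mask */
                       /* ... migrate p off a disallowed cpu as before ... */
                       return 0;
               }

               /* sys_sched_setaffinity() then just propagates the error code */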
  17. 20 Jun, 2003 1 commit
    • [PATCH] show_stack() portability and cleanup patch · 0d5ff9d0
      Andrew Morton authored
      From: David Mosberger <davidm@napali.hpl.hp.com>
      
      This is an attempt at sanitizing the interface for stack trace dumping
      somewhat.  It's basically the last thing which prevents 2.5.x from working
      out-of-the-box for ia64.  ia64 apparently cannot reasonably implement the
      show_stack interface declared in sched.h.
      
      Here is the rationale: modern calling conventions don't maintain a frame
      pointer and it's not possible to get a reliable stack trace with only a stack
      pointer as the starting point.  You really need more machine state to start
      with.  For a while, I thought the solution is to pass a task pointer to
      show_stack(), but it turns out that this would negatively impact x86 because
      it's sometimes useful to show only portions of a stack trace (e.g., starting
      from the point at which a trap occurred).  Thus, this patch _adds_ the task
      pointer instead:
      
       extern void show_stack(struct task_struct *tsk, unsigned long *sp);
      
      The idea here is that show_stack(tsk, sp) will show the backtrace of task
      "tsk", starting from the stack frame that "sp" is pointing to.  If tsk is
      NULL, the trace will be for the current task.  If "sp" is NULL, all stack
      frames of the task are shown.  If both are NULL, you'll get the full trace of
      the current task.
      
      I _think_ this should make everyone happy.
      
      The patch also removes the declaration of show_trace() in linux/sched.h (it
      never was a generic function; some platforms, in particular x86, may want to
      update accordingly).
      
      Finally, the patch replaces the one call to show_trace_task() with the
      equivalent call show_stack(task, NULL).
      
      The patch below is for Alpha and i386, since I can (compile-)test those (I'll
      provide the ia64 update through my regular updates).  The other arches will
      break visibly and updating the code should be trivial:
      
      - add a task pointer argument to show_stack() and pass NULL as the first
        argument where needed
      
      - remove show_trace_task()
      
      - declare show_trace() in a platform-specific header file if you really
        want to keep it around
  18. 14 Jun, 2003 3 commits
    • [PATCH] NUMA fixes · 1d292c60
      Andrew Morton authored
      From: Anton Blanchard <anton@samba.org>
      
      
      Anton has been testing odd setups:
      
      /* node 0 - no cpus, no memory */
      /* node 1 - 1 cpu, no memory */
      /* node 2 - 0 cpus, 1GB memory */
      /* node 3 - 3 cpus, 3GB memory */
      
      Two things tripped so far.  Firstly the ppc64 debug check for invalid cpus
      in cpu_to_node().  Fix that in kernel/sched.c:node_nr_running_init().
      
      The other problem concerned nodes with memory but no cpus.  kswapd tries to
      set_cpus_allowed(0) and bad things happen.  So we only set cpu affinity
      for kswapd if there are cpus in the node.
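
       A sketch of the kswapd side of that fix (helper spelling as best recalled;
       the mask was still an unsigned long at this point):

               /* only pin kswapd to its node if the node actually has cpus */
               cpumask = node_to_cpumask(pgdat->node_id);
               if (cpumask)
                       set_cpus_allowed(tsk, cpumask);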
    • [PATCH] sched.c neatening and fixes. · 03540697
      Rusty Russell authored
      1) Fix the comments for the migration_thread.  A while back Ingo
         agreed they were exactly wrong, IIRC. 8).
      
      2) Changed spin_lock_irqsave to spin_lock_irq, since it's in a
         kernel thread.
      
      3) Don't repeat if the task has moved off the original CPU, just finish.
         This is because we are simply trying to push the task off this CPU:
         if it's already moved, great.  Currently we might theoretically move
         a task which is actually running on another CPU, which is v. bad.
      
      4) Replace the __ffs(p->cpus_allowed) with any_online_cpu(), since
         that's what it's for, and __ffs() can give the wrong answer, eg. if
         there's no CPU 0.
      
      5) Move the core functionality of migrate_task into a separate function,
         move_task_away, which I want for the hotplug CPU patch.
    • [PATCH] Nuke check_highmem_ptes() · f3d844bc
      Benjamin Herrenschmidt authored
      It was broken on at least ppc32 & sparc32, and the debugging it
      offered wasn't worth it any more anyway.
  19. 10 Jun, 2003 1 commit
    • [PATCH] fix scheduler bug not passing idle · dd1b5a41
      Andrew Morton authored
      From: "Martin J. Bligh" <mbligh@aracnet.com>
      
      rebalance_tick is not properly passing the idle argument through to
      load_balance in one case.  The fix is trivial.  Pointed out by John Hawkes.
  20. 06 Jun, 2003 2 commits
    • [PATCH] Move cpu notifiers et al to cpu.h · 542f238e
      Rusty Russell authored
      Trivial patch: when these were introduced cpu.h didn't exist.
    • [PATCH] Don't let processes be scheduled on CPU-less nodes (3/3) · 946ac12e
      Andrew Morton authored
      From: Matthew Dobson <colpatch@us.ibm.com>
      
      This patch implements a generic version of the nr_cpus_node(node) macro
      implemented for ppc64 by the previous patch.
      
       The generic version simply computes an hweight of the bitmask returned by
       the node_to_cpumask(node) topology macro.
      
      This patch also adds a generic_hweight64() function and an hweight_long()
      function which are used as helpers for the generic nr_cpus_node() macro.
      
      This patch also adds a for_each_node_with_cpus() macro, which is used in
      sched_best_cpu() in kernel/sched.c to fix the original problem of
      scheduling processes on CPU-less nodes.  This macro should also be used in
      the future to avoid similar problems.
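
       A sketch of the generic helpers described above (the iteration bound
       spelling is an assumption):

               /* number of cpus in a node: popcount of its cpu bitmask */
               #define nr_cpus_node(node)      hweight_long(node_to_cpumask(node))

               /* visit only nodes that actually have cpus */
               #define for_each_node_with_cpus(node)                   \
                       for (node = 0; node < numnodes; node++)         \
                               if (nr_cpus_node(node))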
      
      Test compiled and booted by Andrew Theurer (habanero@us.ibm.com) on both
      x440 and ppc64.
  21. 27 May, 2003 1 commit
  22. 26 May, 2003 1 commit
    • [PATCH] signal latency improvement · ee2f48bc
      Ingo Molnar authored
      This further optimizes the 'kick wakeup' scheduler feature:
      
       - do not kick any CPU on UP
      
       - no need to mark the target task for reschedule - it's enough to send an
         interrupt to that CPU, that will initiate a signal processing pass.
  23. 19 May, 2003 4 commits
    • [PATCH] Fix lost scheduler rebalances · e7778aa6
      Ingo Molnar authored
      This fixes a race noticed by Mike Galbraith: the scheduler can lose a
      rebalance tick if some task happens to not be rescheduled in time.  This
      is not a fatal condition, but an inconsistency nevertheless.
    • [PATCH] sync wakeup on UP · 84205d05
      Ingo Molnar authored
      This fixes the scheduler's sync-wakeup code to be consistent on UP as
      well.
      
       Right now there's a behavioral difference between a UP kernel and an
      SMP kernel running on a UP box: sync wakeups (which are only activated
      on SMP) can cause a wakeup of a higher prio task, without preemption.
      On UP kernels this does not happen.  This difference in wakeup behavior
      is bad.
      
      This patch activates sync wakeups on UP as well - in the cases sync
      wakeups are done the waker knows that it will schedule away soon, so
      this 'delay preemption' decision is correct on UP as well.
    • [PATCH] scheduler cleanup · d1347e18
      Ingo Molnar authored
      This removes the unused requeueing code.
    • [PATCH] signal latency fixes · 79e4dd94
      Ingo Molnar authored
       This fixes an SMP window where the kernel could miss handling a signal
       and increase signal delivery latency by up to 200 msecs.  Sun has reported
      to Ulrich that their JVM sees occasional unexpected signal delays under
      Linux.  The more CPUs, the more delays.
      
      The cause of the problem is that the current signal wakeup
      implementation is racy in kernel/signal.c:signal_wake_up():
      
              if (t->state == TASK_RUNNING)
                      kick_if_running(t);
      	...
              if (t->state & mask) {
                      wake_up_process(t);
                      return;
              }
      
       If thread (or process) 't' is woken up on another CPU right after the
       TASK_RUNNING check, and the thread starts to run, then the wake_up_process()
       here will do nothing, and the signal stays pending until the thread next
       calls into the kernel - which can be up to 200 msecs later.
      
       The solution is to do the 'kicking' of a running thread on a remote CPU
       atomically with the wakeup.  For this I've added wake_up_process_kick().
       There is no slowdown for the other wakeup codepaths; the new flag to
       try_to_wake_up() is compiled off for them.  Some other subsystems might
      want to use this wakeup facility as well in the future (eg.  AIO).
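
       The fixed wakeup path in signal_wake_up() then becomes, roughly (a sketch;
       wake_up_process_kick() performs the wakeup and the cross-CPU kick while
       holding the runqueue lock, so the two can no longer race):

               if (t->state & mask) {
                       wake_up_process_kick(t);
                       return;
               }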
      
       In fact this race triggers quite often under Volanomark runs. With this
       change added, Volanomark performance is up from 500-800 to 2000-3000 on
       a 4-way x86 box.
  24. 12 May, 2003 1 commit
  25. 21 Apr, 2003 1 commit
    • [PATCH] trivial task_prio() fix · 7957f703
      Robert Love authored
      Here is a trivial fix for task_prio() in the case MAX_RT_PRIO !=
      MAX_USER_RT_PRIO.  In this case, all priorities are skewed by
      (MAX_RT_PRIO - MAX_USER_RT_PRIO).
      
      The fix is to subtract the full MAX_RT_PRIO value from p->prio, not just
      MAX_USER_RT_PRIO.  This makes sense, as the full priority range is
      unrelated to the maximum user value.  Only the real maximum RT value
      matters.
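
       In other words, the fix is simply (sketch):

               int task_prio(task_t *p)
               {
                       /* offset by the full RT range, not just the user-visible one */
                       return p->prio - MAX_RT_PRIO;
               }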
      
       This has been in Andrew's tree for a while, with no issue.  Also, Ingo
       acked it.
  26. 20 Apr, 2003 1 commit
    • [PATCH] Turn on NUMA rebalancing · 26fbf90f
      Andrew Morton authored
      From: "Martin J. Bligh" <mbligh@aracnet.com>
      
      I'd forgotten that I'd set this to only fire every 20s in the past, because
       it would rebalance too aggressively.  That seems to be fixed now, so we should
      turn it back on.
  27. 12 Apr, 2003 1 commit
  28. 08 Apr, 2003 1 commit