1. 15 Mar, 2008 5 commits
    • Peter Zijlstra's avatar
      sched: fix overload performance: buddy wakeups · aa2ac252
      Peter Zijlstra authored
      Currently we schedule to the leftmost task in the runqueue. When the
      runtimes are very short because of some server/client ping-pong,
      especially in over-saturated workloads, this will cycle through all
      tasks trashing the cache.
      
      Reduce cache trashing by keeping dependent tasks together by running
      newly woken tasks first. However, by not running the leftmost task first
      we could starve tasks because the wakee can gain unlimited runtime.
      
      Therefore we only run the wakee if its within a small
      (wakeup_granularity) window of the leftmost task. This preserves
      fairness, but does alternate server/client task groups.
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      aa2ac252
    • Ingo Molnar's avatar
      sched: fix calc_delta_mine() · 27d11726
      Ingo Molnar authored
      lw->weight can be 0 for a short time during bootup.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      27d11726
    • Ingo Molnar's avatar
      sched: fix update_load_add()/sub() · e89996ae
      Ingo Molnar authored
      Clear the cached inverse value when updating load. This is needed for
      calc_delta_mine() to work correctly when using the rq load.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      e89996ae
    • Peter Zijlstra's avatar
      sched: min_vruntime fix · 3fe69747
      Peter Zijlstra authored
      Current min_vruntime tracking is incorrect and will cause serious
      problems when we don't run the leftmost task for some reason.
      
      min_vruntime does two things; 1) it's used to determine a forward
      direction when the u64 vruntime wraps, 2) it's used to track the
      leftmost vruntime to position newly enqueued tasks from.
      
      The current logic advances min_vruntime whenever the current task's
      vruntime advance. Because the current task may pass the leftmost task
      still waiting we're failing the second goal. This causes new tasks to be
      placed too far ahead and thus penalizes their runtime.
      
      Fix this by making min_vruntime the min_vruntime of the waiting tasks by
      tracking it in enqueue/dequeue, and compare against current's vruntime
      to obtain the absolute minimum when placing new tasks.
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      3fe69747
    • Hiroshi Shimamoto's avatar
      sched: fix race in schedule() · 0e1f3483
      Hiroshi Shimamoto authored
      Fix a hard to trigger crash seen in the -rt kernel that also affects
      the vanilla scheduler.
      
      There is a race condition between schedule() and some dequeue/enqueue
      functions; rt_mutex_setprio(), __setscheduler() and sched_move_task().
      
      When scheduling to idle, idle_balance() is called to pull tasks from
      other busy processor. It might drop the rq lock. It means that those 3
      functions encounter on_rq=0 and running=1. The current task should be
      put when running.
      
      Here is a possible scenario:
      
         CPU0                               CPU1
          |                              schedule()
          |                              ->deactivate_task()
          |                              ->idle_balance()
          |                              -->load_balance_newidle()
      rt_mutex_setprio()                     |
          |                              --->double_lock_balance()
          *get lock                          *rel lock
          * on_rq=0, ruuning=1               |
          * sched_class is changed           |
          *rel lock                          *get lock
          :                                  |
                                             :
                                         ->put_prev_task_rt()
                                         ->pick_next_task_fair()
                                             => panic
      
      The current process of CPU1(P1) is scheduling. Deactivated P1, and the
      scheduler looks for another process on other CPU's runqueue because CPU1
      will be idle. idle_balance(), load_balance_newidle() and
      double_lock_balance() are called and double_lock_balance() could drop
      the rq lock. On the other hand, CPU0 is trying to boost the priority of
      P1. The result of boosting only P1's prio and sched_class are changed to
      RT. The sched entities of P1 and P1's group are never put. It makes
      cfs_rq invalid, because the cfs_rq has curr and no leaf, but
      pick_next_task_fair() is called, then the kernel panics.
      Signed-off-by: default avatarHiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      0e1f3483
  2. 14 Mar, 2008 3 commits
  3. 13 Mar, 2008 32 commits