1. 02 Apr, 2010 13 commits
    • sched: Add enqueue/dequeue flags · 371fd7e7
      Peter Zijlstra authored
      In order to reduce the dependency on TASK_WAKING, rework the enqueue
      interface to support a proper flags field.
      
      Replace the int wakeup, bool head arguments with an int flags argument
      and create the following flags:
      
        ENQUEUE_WAKEUP - the enqueue is a wakeup of a sleeping task,
        ENQUEUE_WAKING - the enqueue has relative vruntime due to
                         having sched_class::task_waking() called,
        ENQUEUE_HEAD - the waking task should be placed on the head
                       of the priority queue (where appropriate).
      
      For symmetry also convert sched_class::dequeue() to a flags scheme.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
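The flags scheme described in this commit can be illustrated with a small user-space sketch. The three flag names come from the commit message; the numeric values, `struct toy_task`, and `toy_enqueue()` are hypothetical and exist only to show how a single flags word can replace the separate `int wakeup, bool head` arguments. This is not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag names are taken from the commit message. The values below are
 * illustrative assumptions, chosen so that each flag owns one bit. */
#define ENQUEUE_WAKEUP 0x1 /* enqueue is a wakeup of a sleeping task */
#define ENQUEUE_WAKING 0x2 /* vruntime is relative (task_waking() ran) */
#define ENQUEUE_HEAD   0x4 /* place the task at the head of the queue */

/* Hypothetical record of what an enqueue decided; not a kernel type. */
struct toy_task {
    bool was_sleeping;
    bool vruntime_is_relative;
    bool at_head;
};

/* One flags word replaces the old (int wakeup, bool head) pair, so new
 * conditions can be added without changing the function signature. */
static void toy_enqueue(struct toy_task *t, int flags)
{
    assert(t != NULL);
    t->was_sleeping = (flags & ENQUEUE_WAKEUP) != 0;
    t->vruntime_is_relative = (flags & ENQUEUE_WAKING) != 0;
    t->at_head = (flags & ENQUEUE_HEAD) != 0;
}
```

A caller combines flags with bitwise OR, for example `toy_enqueue(&t, ENQUEUE_WAKEUP | ENQUEUE_HEAD)`.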
    • sched: Fix nr_uninterruptible count · cc87f76a
      Peter Zijlstra authored
      The cpuload calculation in calc_load_account_active() assumes
      rq->nr_uninterruptible will not change on an offline cpu after
      migrate_nr_uninterruptible(). However, the recent migrate-on-wakeup
      changes broke that and would result in decrementing the offline cpu's
      rq->nr_uninterruptible.
      
      Fix this by accounting the nr_uninterruptible on the waking cpu.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Optimize task_rq_lock() · 65cc8e48
      Peter Zijlstra authored
      Now that we hold the rq->lock over set_task_cpu() again, we can do
      away with most of the TASK_WAKING checks and reduce them again to
      set_cpus_allowed_ptr().
      
      Removes some conditionals from scheduling hot-paths.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Oleg Nesterov <oleg@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Fix TASK_WAKING vs fork deadlock · 0017d735
      Peter Zijlstra authored
      Oleg noticed a few races with the TASK_WAKING usage on fork.
      
       - since TASK_WAKING is basically a spinlock, it should be IRQ safe
       - since we set TASK_WAKING (*) without holding rq->lock, there could
         still be an rq->lock holder, so it does not actually provide
         full serialization.
      
      (*) in fact we clear PF_STARTING, which in effect enables TASK_WAKING.
      
      Cure the second issue by not setting TASK_WAKING in sched_fork(), but
      only temporarily in wake_up_new_task() while calling select_task_rq().
      
      Cure the first by holding rq->lock around the select_task_rq() call,
      which disables IRQs. This, however, requires that we push down the
      rq->lock release into select_task_rq_fair()'s cgroup stuff.
      
      Because select_task_rq_fair() still needs to drop the rq->lock we
      cannot fully get rid of TASK_WAKING.
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Make select_fallback_rq() cpuset friendly · 9084bb82
      Oleg Nesterov authored
      Introduce cpuset_cpus_allowed_fallback() helper to fix the cpuset problems
      with select_fallback_rq(). It can be called from any context and can't use
      any cpuset locks including task_lock(). It is called when the task doesn't
      have online cpus in ->cpus_allowed but ttwu/etc must be able to find a
      suitable cpu.
      
      I am not proud of this patch. Everything which needs such a fat comment
      can't be good even if correct. But I'd prefer not to change the locking
      rules in code I hardly understand, and in any case I believe this
      simple change makes the code much more correct compared to the
      deadlocks we currently have.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20100315091027.GA9155@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: _cpu_down(): Don't play with current->cpus_allowed · 6a1bdc1b
      Oleg Nesterov authored
      _cpu_down() changes the current task's affinity and then recovers it at
      the end. The problems are well known: we can't restore old_allowed if it
      was bound to the now-dead-cpu, and we can race with the userspace which
      can change cpu-affinity during unplug.
      
      _cpu_down() should not play with current->cpus_allowed at all. Instead,
      take_cpu_down() can migrate the caller of _cpu_down() after __cpu_disable()
      removes the dying cpu from cpu_online_mask.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20100315091023.GA9148@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: sched_exec(): Remove the select_fallback_rq() logic · 30da688e
      Oleg Nesterov authored
      sched_exec()->select_task_rq() reads/updates ->cpus_allowed lockless.
      This can race with other CPUs updating our ->cpus_allowed, and this
      looks meaningless to me.
      
      The task is current and running, so it must have online cpus in
      ->cpus_allowed; the fallback mode is bogus. And if ->sched_class returns
      the "wrong" cpu, this likely means we raced with set_cpus_allowed(),
      which was called for a reason. Why should sched_exec() retry and call
      ->select_task_rq() again?
      
      Change the code to call sched_class->select_task_rq() directly and do
      nothing if the returned cpu is wrong after re-checking under rq->lock.
      
      From now on, task_struct->cpus_allowed is always stable under
      TASK_WAKING, and select_fallback_rq() is always called under rq->lock
      or by a caller that owns TASK_WAKING (select_task_rq).
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20100315091019.GA9141@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: move_task_off_dead_cpu(): Remove retry logic · c1804d54
      Oleg Nesterov authored
      The previous patch preserved the retry logic, but it looks unneeded.
      
      __migrate_task() can only fail if we raced with migration after we dropped
      the lock, but in this case the caller of set_cpus_allowed/etc must initiate
      migration itself if ->on_rq == T.
      
      We already fixed p->cpus_allowed, and the changes in the active/online
      masks must be visible to the racer, so it should migrate the task to an
      online cpu correctly.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20100315091014.GA9138@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: move_task_off_dead_cpu(): Take rq->lock around select_fallback_rq() · 1445c08d
      Oleg Nesterov authored
      move_task_off_dead_cpu()->select_fallback_rq() reads/updates ->cpus_allowed
      lockless. We can race with set_cpus_allowed() running in parallel.
      
      Change it to take rq->lock around select_fallback_rq(). Note that it is not
      trivial to move this spin_lock() into select_fallback_rq(): we must recheck
      that the task was not migrated after we take the lock, and other callers
      do not need this lock.
      
      To avoid the races with other callers of select_fallback_rq() which rely on
      TASK_WAKING, we also check p->state != TASK_WAKING and do nothing otherwise.
      The owner of TASK_WAKING must update ->cpus_allowed and choose the correct
      CPU anyway, and the subsequent __migrate_task() is just meaningless because
      p->se.on_rq must be false.
      
      Alternatively, we could change select_task_rq() to take rq->lock right
      after it calls sched_class->select_task_rq(), but this looks a bit ugly.
      
      Also, change it to not assume irqs are disabled and absorb __migrate_task_irq().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20100315091010.GA9131@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
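The "take the lock, then recheck" ordering described in this commit can be sketched in single-threaded user-space C. Everything here is hypothetical: the lock is modeled as a plain flag (only the ordering of the checks is the point), and `toy_rq`, `toy_task`, and `toy_move_off_dead_cpu()` are illustrative names, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical task states; only the WAKING distinction matters here. */
enum toy_state { TOY_RUNNING, TOY_WAKING };

/* The lock is modeled as a flag so the sketch stays single-threaded. */
struct toy_rq { bool locked; };
struct toy_task { int cpu; enum toy_state state; };

static void toy_lock(struct toy_rq *rq)
{
    assert(!rq->locked);
    rq->locked = true;
}

static void toy_unlock(struct toy_rq *rq)
{
    assert(rq->locked);
    rq->locked = false;
}

/* Returns the destination CPU, or -1 when the caller should do nothing. */
static int toy_move_off_dead_cpu(struct toy_rq *dead_rq, struct toy_task *p,
                                 int dead_cpu, int fallback_cpu)
{
    int dest = -1;

    toy_lock(dead_rq);
    /* Recheck under the lock: the task may already have been migrated,
     * or a TASK_WAKING owner may be about to choose its CPU itself. */
    if (p->cpu == dead_cpu && p->state != TOY_WAKING)
        dest = fallback_cpu; /* stands in for select_fallback_rq() */
    toy_unlock(dead_rq);

    return dest;
}
```

Keeping the recheck in the caller, rather than inside the fallback helper, mirrors the commit's note that other callers of select_fallback_rq() do not need this lock.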
    • sched: Kill the broken and deadlockable cpuset_lock/cpuset_cpus_allowed_locked code · 897f0b3c
      Oleg Nesterov authored
      This patch just states the fact that the cpusets/cpuhotplug interaction is
      broken and removes the deadlockable code which only pretends to work.
      
      - cpuset_lock() doesn't really work. It is needed for
        cpuset_cpus_allowed_locked() but we can't take this lock in
        try_to_wake_up()->select_fallback_rq() path.
      
      - cpuset_lock() is deadlockable. Suppose that a task T bound to CPU takes
        callback_mutex. If cpu_down(CPU) happens before T drops callback_mutex,
        stop_machine() preempts T, then migration_call(CPU_DEAD) tries to take
        cpuset_lock() and hangs forever because CPU is already dead and thus
        T can't be scheduled.
      
      - cpuset_cpus_allowed_locked() is deadlockable too. It takes task_lock()
        which is not irq-safe, but try_to_wake_up() can be called from irq.
      
      Kill them, and change select_fallback_rq() to use cpu_possible_mask, like
      we currently do without CONFIG_CPUSETS.
      
      Also, with or without this patch, with or without CONFIG_CPUSETS, the
      callers of select_fallback_rq() can race with each other or with
      set_cpus_allowed() paths.
      
      The subsequent patches try to fix these problems.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20100315091003.GA9123@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
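The fallback rule described in this commit (widen the task's mask to all possible CPUs when none of its allowed CPUs is usable) can be sketched with plain 64-bit masks. This is a minimal user-space illustration under stated assumptions: `first_cpu()` and `toy_select_fallback()` are hypothetical helpers, CPUs are numbered 0 to 63, and the real function's other steps and the kernel cpumask API are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the lowest set bit in mask, or -1 if it is empty. */
static int first_cpu(uint64_t mask)
{
    for (int cpu = 0; cpu < 64; cpu++)
        if (mask & (UINT64_C(1) << cpu))
            return cpu;
    return -1;
}

/* Hypothetical fallback selection: prefer a CPU that is both allowed and
 * online. If there is none, reset the allowed mask to every possible CPU
 * and pick an online CPU from that. */
static int toy_select_fallback(uint64_t *allowed, uint64_t online,
                               uint64_t possible)
{
    assert(allowed != NULL);

    int cpu = first_cpu(*allowed & online);
    if (cpu >= 0)
        return cpu;

    /* No allowed CPU is online: widen the mask instead of consulting
     * cpuset state, which would require locks that are unsafe here. */
    *allowed = possible;
    return first_cpu(*allowed & online);
}
```

Note that the fallback path overwrites the task's allowed mask, which is why the commit message stresses that callers can race with set_cpus_allowed() unless they are serialized.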
    • sched: Remove USER_SCHED from documentation · 25c2d55c
      Li Zefan authored
      USER_SCHED has been removed, so update the documentation
      accordingly.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Serge E. Hallyn <serue@us.ibm.com>
      LKML-Reference: <4BA9A07E.8070508@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Remove remaining USER_SCHED code · 32bd7eb5
      Li Zefan authored
      This is left over from commit 7c941438 ("sched: Remove USER_SCHED").
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Dhaval Giani <dhaval.giani@gmail.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: David Howells <dhowells@redhat.com>
      LKML-Reference: <4BA9A05F.7010407@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • Merge branch 'linus' into sched/core · c9494727
      Ingo Molnar authored
      Merge reason: update to latest upstream
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 01 Apr, 2010 27 commits