Commit 91bc0bf7 authored by Andrew Morton, committed by Linus Torvalds

[PATCH] Hotplug CPU sched_balance_exec Fix

From: Rusty Russell <rusty@rustcorp.com.au>

From: Srivatsa Vaddagiri <vatsa@in.ibm.com>
From: Andrew Morton <akpm@osdl.org>
From: Rusty Russell <rusty@rustcorp.com.au>

We want to get rid of lock_cpu_hotplug() in sched_migrate_task.  I found
that lockless migration of an execing task is _extremely_ racy.  The
races I hit are described below, along with probable solutions.

Task migrations done elsewhere should be safe (?) since they either
hold the lock (sys_sched_setaffinity) or are done entirely with preemption
disabled (load_balance).

   sched_balance_exec does:

	a. disables preemption
	b. finds new_cpu for current
	c. enables preemption
	d. calls sched_migrate_task to migrate current to new_cpu

   and sched_migrate_task does:

	e. task_rq_lock(p)
	f. migrate_task(p, dest_cpu ..)
		(if we have to wait for migration thread)
		g. task_rq_unlock()
		h. wake_up_process(rq->migration_thread)
		i. wait_for_completion()
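
   In code, the flow just described looks roughly like this (a simplified
   sketch of the pre-patch 2.6.6-rc2-mm1 path: declarations are trimmed,
   tlb_migrate_prepare() is omitted, and the step-b helper is named
   sched_best_cpu() from memory, so take that name as an assumption); the
   letters match the steps above:

	void sched_balance_exec(void)
	{
		int new_cpu;

		get_cpu();				/* a. disable preemption */
		new_cpu = sched_best_cpu(current);	/* b. pick a CPU for current */
		put_cpu();				/* c. enable preemption */
		sched_migrate_task(current, new_cpu);	/* d. migrate current */
	}

	static void sched_migrate_task(task_t *p, int dest_cpu)
	{
		migration_req_t req;
		unsigned long flags;
		runqueue_t *rq;

		rq = task_rq_lock(p, &flags);			/* e */
		if (migrate_task(p, dest_cpu, &req)) {		/* f. queue request */
			/* have to wait for the migration thread */
			task_rq_unlock(rq, &flags);		/* g */
			wake_up_process(rq->migration_thread);	/* h */
			wait_for_completion(&req.done);		/* i */
			return;
		}
		task_rq_unlock(rq, &flags);
	}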

   Several things can happen here:

	1. new_cpu can go down after h and before the migration thread has
	   gotten around to handling the request

	   ==> we need to add a cpu_is_offline check in __migrate_task

	2. new_cpu can go down between c and d or before f.

	   ===> Even though this case is automatically handled by the above
	        change (migrate_task, being called on a running task, current,
	        will delegate migration to the migration thread), it would be
	        good practice to avoid calling migrate_task in the first place
	        when dest_cpu is offline.  This means adding another
	        cpu_is_offline check after e in sched_migrate_task.

	3. The 'current' task can get preempted _immediately_ after
	   g, and when it comes back, task_cpu(p) can be dead, in which
	   case it is invalid to do a wake_up on a non-existent migration
	   thread (rq->migration_thread can be NULL).

	   ===> We should disable preemption through g and h

	4. Before the migration thread gets around to handling the request,
	   its cpu goes dead. This will leave unhandled migration requests
	   on the dead cpu.

	   ===> We need to wake up sleeping requestors (if any) in the
	        CPU_DEAD notification.

I really wonder if we can get rid of these issues by avoiding balancing at
exec time and instead having it done during load_balance.  Alternatively,
if this is valuable and we want to retain it, I think we still need to
consider a read/write sem, with sched_migrate_task doing down_read_trylock.
This may eliminate the deadlock I hit between cpu_up and the CPU_UP_PREPARE
notification, which had forced me away from the r/w sem.
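
For what it's worth, a minimal sketch of that read/write-sem idea (purely
hypothetical, not part of this patch; the semaphore name is invented here):

	static DECLARE_RWSEM(cpu_hotplug_sem);		/* hypothetical */

	static void sched_migrate_task(task_t *p, int dest_cpu)
	{
		/* Back off instead of sleeping if hotplug (a writer) is in
		 * progress; exec-time balancing is only best-effort anyway. */
		if (!down_read_trylock(&cpu_hotplug_sem))
			return;
		/* ... migration as in the patch below ... */
		up_read(&cpu_hotplug_sem);
	}

The hotplug path would then have to take the sem for writing, which is
presumably where the CPU_UP_PREPARE deadlock mentioned above would need to
be re-examined.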

Anyway, the patch below addresses the above races.  It's against 2.6.6-rc2-mm1
and has been tested on a 4-way Intel Pentium SMP machine.


Rusty sez:

Two other changes:
1) I grabbed a reference to the thread, rather than using
preempt_disable().  It's the more obvious way I think.

2) Why the wait_to_die code?  It might be needed if we move tasks after
stop_machine, but for now I don't see the problem with the migration
thread running on the wrong CPU for a bit: nothing is on this runqueue,
so active_load_balance is safe, and __migrate_task will be a no-op (due
to the cpu_is_offline() check).  If there is a problem, your fix is racy,
because we could be preempted immediately afterwards.

So I just stop the kthread, then wake up any remaining...
@@ -1124,16 +1124,19 @@ static void sched_migrate_task(task_t *p, int dest_cpu)
 	runqueue_t *rq;
 	unsigned long flags;
 
-	lock_cpu_hotplug();
 	rq = task_rq_lock(p, &flags);
-	if (!cpu_isset(dest_cpu, p->cpus_allowed))
+	if (!cpu_isset(dest_cpu, p->cpus_allowed)
+	    || unlikely(cpu_is_offline(dest_cpu)))
 		goto out;
 
 	/* force the process onto the specified CPU */
 	if (migrate_task(p, dest_cpu, &req)) {
-		/* Need to wait for migration thread. */
+		/* Need to wait for migration thread (might exit: take ref). */
+		struct task_struct *mt = rq->migration_thread;
+		get_task_struct(mt);
 		task_rq_unlock(rq, &flags);
-		wake_up_process(rq->migration_thread);
+		wake_up_process(mt);
+		put_task_struct(mt);
 		wait_for_completion(&req.done);
 
 		/*
@@ -1142,13 +1145,11 @@ static void sched_migrate_task(task_t *p, int dest_cpu)
 		 * the migration.
 		 */
 		tlb_migrate_prepare(current->mm);
-		unlock_cpu_hotplug();
 		return;
 	}
 out:
 	task_rq_unlock(rq, &flags);
-	unlock_cpu_hotplug();
 }
 
 /*
@@ -3191,6 +3192,9 @@ static void __migrate_task(struct task_struct *p, int dest_cpu)
 {
 	runqueue_t *rq_dest;
 
+	if (unlikely(cpu_is_offline(dest_cpu)))
+		return;
+
 	rq_dest = cpu_rq(dest_cpu);
 
 	double_rq_lock(this_rq(), rq_dest);
@@ -3349,9 +3353,24 @@ static int migration_call(struct notifier_block *nfb, unsigned long action,
 		/* Unbind it from offline cpu so it can run. Fall thru. */
 		kthread_bind(cpu_rq(cpu)->migration_thread,smp_processor_id());
 	case CPU_DEAD:
-		kthread_stop(cpu_rq(cpu)->migration_thread);
-		cpu_rq(cpu)->migration_thread = NULL;
-		BUG_ON(cpu_rq(cpu)->nr_running != 0);
+		rq = cpu_rq(cpu);
+		kthread_stop(rq->migration_thread);
+		rq->migration_thread = NULL;
+		BUG_ON(rq->nr_running != 0);
+		/* No need to migrate the tasks: it was best-effort if
+		 * they didn't do lock_cpu_hotplug().  Just wake up
+		 * the requestors. */
+		spin_lock_irq(&rq->lock);
+		while (!list_empty(&rq->migration_queue)) {
+			migration_req_t *req;
+			req = list_entry(rq->migration_queue.next,
+					 migration_req_t, list);
+			BUG_ON(req->type != REQ_MOVE_TASK);
+			list_del_init(&req->list);
+			complete(&req->done);
+		}
+		spin_unlock_irq(&rq->lock);
 		break;
 #endif
 	}