    sched/core: Fix TASK_DEAD race in finish_task_switch() · 95913d97
    Peter Zijlstra authored
    So the problem this patch is trying to address is as follows:
    
            CPU0                            CPU1
    
            context_switch(A, B)
                                            ttwu(A)
                                              LOCK A->pi_lock
                                              A->on_cpu == 0
            finish_task_switch(A)
              prev_state = A->state  <-.
              WMB                      |
              A->on_cpu = 0;           |
              UNLOCK rq0->lock         |
                                       |    context_switch(C, A)
                                       `--  A->state = TASK_DEAD
              prev_state == TASK_DEAD
                put_task_struct(A)
                                            context_switch(A, C)
                                            finish_task_switch(A)
                                              A->state == TASK_DEAD
                                                put_task_struct(A)
    
    The argument being that the WMB orders only stores, so it allows the
    load of A->state on CPU0 to cross over it and observe CPU1's store of
    A->state, which then results in a double put_task_struct() and a
    use-after-free.
    
    Now the comment states (and this was true once upon a long time ago)
    that we need to observe A->state while holding rq->lock because that
    will order us against the wakeup; however the wakeup will not in fact
    acquire (that) rq->lock; it takes A->pi_lock these days.
    
    We can obviously fix this by upgrading the WMB to an MB, but that is
    expensive, so we'd rather avoid that.
    
    The alternative this patch takes is: smp_store_release(&A->on_cpu, 0),
    which avoids the MB on some archs, but not important ones like ARM.

    Reported-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: <stable@vger.kernel.org> # v3.1+
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-kernel@vger.kernel.org
    Cc: manfred@colorfullife.com
    Cc: will.deacon@arm.com
    Fixes: e4a52bcb ("sched: Remove rq->lock from the first half of ttwu()")
    Link: http://lkml.kernel.org/r/20150929124509.GG3816@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar <mingo@kernel.org>