    [PATCH] Fix occasional stop_machine() lockup with > 2 CPUs · a041464f
    Rusty Russell authored
    Stephen Rothwell noted a case where one CPU was sitting in userspace and
    another in stop_machine(), waiting for everyone to enter stopmachine().
    This can happen if migration occurs at exactly the wrong time with more
    than 2 CPUs.  Say we have 4 CPUs:
    
    1) stop_machine() on CPU 0 creates stopmachine() threads for CPUs 1, 2
       and 3, and yields waiting for them to migrate to their CPUs and
       ack.
    
    2) stopmachine(2) gets rebalanced (probably on exec) to CPU 1.
    
    3) stopmachine(2) calls set_cpus_allowed on CPU 1, sleeps awaiting
       migration thread.
    
    4) stopmachine(1) calls set_cpus_allowed on CPU 0, moves onto CPU 1 and
       starts spinning.
    
    Now the migration thread on CPU 1 never runs, so stopmachine(2) never
    reaches CPU 2, and we deadlock.  The simplest solution is for
    stopmachine() to yield, rather than spin, until all the threads are in
    place.

    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
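The handshake and the fix can be illustrated with a small user-space analogue. This is a hypothetical sketch, not the code in stop_machine.c: POSIX threads stand in for the per-CPU stopmachine() threads, sched_yield() stands in for the kernel's yield(), and the names `state`, `acks`, `worker()`, `wait_for_acks()`, `run_stop_machine()` and `NUM_WORKERS` are invented for illustration. Each worker acknowledges that it is alive, then yields (instead of spinning) until the coordinator announces the next stage, which only happens once every worker has acknowledged.

```c
/*
 * Hypothetical user-space analogue of the stopmachine() handshake.
 * Not the kernel implementation: pthreads replace kernel threads and
 * sched_yield() replaces yield(). All names here are made up.
 */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define NUM_WORKERS 3 /* plays the role of the threads for CPUs 1, 2 and 3 */

enum { STATE_WAIT, STATE_PREPARE, STATE_EXIT };

static atomic_int state; /* stage announced by the coordinator */
static atomic_int acks;  /* number of workers that acknowledged the stage */

/* Coordinator side: yield until every worker has acknowledged. */
static void wait_for_acks(void)
{
    while (atomic_load(&acks) < NUM_WORKERS)
        sched_yield();
}

static void *worker(void *unused)
{
    int prepared = 0;

    (void)unused;
    atomic_fetch_add(&acks, 1); /* "I am alive" */

    while (atomic_load(&state) != STATE_EXIT) {
        if (!prepared && atomic_load(&state) == STATE_PREPARE) {
            prepared = 1;
            atomic_fetch_add(&acks, 1);
        }
        /*
         * Until everyone is in place, yield instead of spinning so other
         * runnable threads get the CPU (in the kernel scenario: the
         * migration thread that a sister thread is waiting on).
         */
        if (!prepared)
            sched_yield();
        /* otherwise plain busy-wait, standing in for cpu_relax() */
    }
    return NULL;
}

/* Returns how many workers acknowledged the prepare stage. */
int run_stop_machine(void)
{
    pthread_t threads[NUM_WORKERS];
    int prepared_count;

    atomic_store(&state, STATE_WAIT);
    atomic_store(&acks, 0);

    for (int i = 0; i < NUM_WORKERS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }

    wait_for_acks();                     /* all workers are alive */
    atomic_store(&acks, 0);              /* reset before the next stage */
    atomic_store(&state, STATE_PREPARE);
    wait_for_acks();                     /* all workers are in place */
    prepared_count = atomic_load(&acks);

    /* ...work that needs the machine "stopped" would run here... */

    atomic_store(&state, STATE_EXIT);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(threads[i], NULL);

    return prepared_count;
}
```

Build with something like `cc -std=c11 -pthread`. In user space the ordinary scheduler would eventually preempt a spinning thread anyway, so this sketch only shows the shape of the change (yield during the first stage, busy-wait afterwards); it cannot reproduce the kernel deadlock itself.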