    [PATCH] signal fix for wedge on multithreaded core dump · 874f2e47
    Roland McGrath authored
    This is a fix made almost a month ago, during the flurry of signal changes.
    I didn't realize until today that this hadn't made it into 2.5.  Sorry
    about the delay.
    
    This fix is necessary to avoid sometimes wedging in uninterruptible sleep
    when doing a multithreaded core dump triggered by a process signal (kill)
    rather than a trap.  You can reproduce the problem by running your favorite
    multithreaded program (NPTL) and then using "kill -SEGV" on it.  It will
    often wedge.  The actual fix could be just a two-line diff:
    
    +                       if (current->signal->group_exit)
    +                               goto dequeue;
    
    after the group_exit_task check.  That is the fix that has been used in
    Ingo's backport for weeks and tested heavily (well, as heavily as core
    dumping ever gets tested, but it's been in our production systems).
    
    But I broke the hair out into a separate function.  The patch below has the
    same effect as the two-liner, and no other difference.  I have tested
    2.5.64 with this patch and it works for me, though I haven't beat on it.
    
    The way the wedge happens is that for a core-dump signal group_send_sig_info
    does a group stop of other threads before the one thread handles the fatal
    signal.  If the fatal thread gets into do_coredump and coredump_wait first,
    then other threads see the group stop and suspend with SIGKILL pending.
    All other fatal cases clear group_stop_count, so this is the only way this
    ever happens.  Checking group_exit fixes it.  I didn't make do_coredump
    clear group_stop_count because doing it with the appropriate ordering and
    locking doesn't fit the organization of that code.
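
The decision described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel code: `struct model_signal`, `enum model_action`, `model_next_action` and the `with_fix` flag are hypothetical names invented here, and only the field names `group_exit` and `group_stop_count` come from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-thread-group signal state.
 * The field names echo the commit message; this is not the kernel's struct. */
struct model_signal {
    bool group_exit;        /* a fatal group exit (e.g. a core dump) is under way */
    int  group_stop_count;  /* threads still expected to join a group stop */
};

enum model_action { MODEL_DEQUEUE, MODEL_STOP };

/* Decide what a thread does when it notices pending signal work.
 * with_fix selects whether the group_exit check is applied (assumption:
 * this mirrors the effect of the two-line diff, nothing more). */
static enum model_action model_next_action(const struct model_signal *sig,
                                           bool with_fix)
{
    assert(sig != NULL);
    if (with_fix && sig->group_exit)
        return MODEL_DEQUEUE;   /* skip the stop and go take the pending SIGKILL */
    if (sig->group_stop_count > 0)
        return MODEL_STOP;      /* join the group stop and sleep */
    return MODEL_DEQUEUE;
}
```

In this model, a state with `group_exit` set and a nonzero `group_stop_count` yields `MODEL_STOP` without the check (the wedge) and `MODEL_DEQUEUE` with it.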
signal.c 57 KB