    Change signal used to exit scsi error handlers · e18106d2
    Willem Riede authored
    I reported earlier that the error handler for ide-scsi exits prematurely if the
    module is modprobed from rc.sysinit. I added some debug prints to identify the
    process that sends the SIGHUP signal that causes the exit.
    
    This is what my log captured:
    
    Jan  1 12:20:13 fallguy kernel: Process 223 [modprobe] starting scsi error handler
    Jan  1 12:20:13 fallguy kernel: Wake up parent of scsi_eh_2, pid 224
    Jan  1 12:20:13 fallguy kernel: Signals pending for scsi_eh_2: 00000000 00000000
    Jan  1 12:20:13 fallguy kernel: Error handler scsi_eh_2 sleeping
    Jan  1 12:20:13 fallguy kernel: scsi2 : SCSI host adapter emulation for IDE ATAPI devices
    [detected devices skipped]
    Jan  1 12:20:14 fallguy kernel: Signal 15 sent from 181 [rc.sysinit] to 182 [getkey]
    Jan  1 12:20:14 fallguy kernel: Signal 1 sent from 22 [init] to 22 [init]
    Jan  1 12:20:14 fallguy kernel: Signal 18 sent from 22 [init] to 22 [init]
    Jan  1 12:20:14 fallguy kernel: Signal 1 sent from 22 [init] to 22 [init]
    Jan  1 12:20:14 fallguy kernel: Signal 1 sent from 22 [init] to 24 [initlog]
    Jan  1 12:20:14 fallguy kernel: Signal 1 sent from 22 [init] to 78 [khubd]
    Jan  1 12:20:14 fallguy kernel: Signal 1 sent from 22 [init] to 224 [scsi_eh_2]
    Jan  1 12:20:14 fallguy kernel: Signals pending for scsi_eh_2: 00000001 00000000
    Jan  1 12:20:14 fallguy kernel: Error handler scsi_eh_2 exiting
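
For readers who want to map the "Signals pending" words in the log back to a signal number, here is a minimal sketch. It assumes the debug print shows the mask as 32-bit words with the lowest word first, and that signal n is stored as bit (n-1), as Linux does. The function name `lowest_pending_signal` is hypothetical and is not part of the kernel or of the proposed change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the lowest signal number whose bit is set in a pending mask,
 * or 0 if no bit is set.
 *
 * Assumptions (not confirmed by the log itself):
 *   - words[0] covers signals 1..32, words[1] covers signals 33..64;
 *   - signal n is represented by bit (n - 1).
 */
int lowest_pending_signal(const uint32_t *words, size_t n_words)
{
    assert(words != NULL);

    for (size_t w = 0; w < n_words; w++) {
        for (int bit = 0; bit < 32; bit++) {
            if (words[w] & (UINT32_C(1) << bit))
                return (int)(w * 32) + bit + 1;
        }
    }
    return 0; /* nothing pending */
}
```

Under those assumptions, the mask `00000001 00000000` decodes to signal 1 (SIGHUP), which matches the 'Signal 1 sent from 22 [init] to 224 [scsi_eh_2]' line just before it.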
    
    Here is a snapshot of some processes made during rc.sysinit:
    
      F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
    100     0     1     0  15   0  1332  420 schedu S    ?          0:05 init
    ...
    040     0    22     1  16   0  1332  388 wait4  S    tty1       0:00 init
    000     0    23    22  15   0  4116 1316 wait4  S    tty1       0:00 /bin/bash /
    040     0    24    23  16   0  2160 1364 schedu S    tty1       0:00 /sbin/initl
    ...
    
    Init must have forked and exec'ed bash to run rc.sysinit, which then re-executes
    itself through initlog. The last thing rc.sysinit does before it ends is send that
    TERM signal from sub-process 181 to getkey (process 182) -- the 'Signal 15 ...'
    line above.
    
    As the forked init (process 22) exits, it sends a flurry of signals to all surviving
    processes created from it. That looks like standard "if I am to die I need to take
    all my offspring down with me" behavior -- do you agree?
    
    Since we want error handlers to survive, IMHO that means SIGHUP is an unfortunate
    choice of signal for error handler exit. The source of scsi_error.c suggests SIGPWR
    might be a worthy alternative, and I think that is right: from inspecting the init
    source, init is not capable of sending SIGPWR. SIGPWR should never be sent by dying
    processes (its sole use should be from a power daemon _to_ init, to shut the system
    down when the power is running out).
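
The idea can be illustrated in userspace with a small sketch. This is an analogy only, not the kernel change proposed here: it assumes a Linux system where SIGPWR is defined, and every name in it (`eh_install_signal_policy`, `eh_exit_was_requested`, `eh_policy_self_check`) is hypothetical. A stray SIGHUP is ignored, and only SIGPWR requests an exit.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the SIGPWR handler; a worker loop would poll this to decide to exit. */
static volatile sig_atomic_t eh_exit_requested = 0;

static void on_sigpwr(int signo)
{
    (void)signo;
    eh_exit_requested = 1;
}

/* Ignore SIGHUP and treat SIGPWR as the exit request. Returns 0 on success. */
int eh_install_signal_policy(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);

    sa.sa_handler = SIG_IGN;        /* a hangup must not stop the worker */
    if (sigaction(SIGHUP, &sa, NULL) != 0)
        return -1;

    sa.sa_handler = on_sigpwr;      /* only SIGPWR asks the worker to exit */
    if (sigaction(SIGPWR, &sa, NULL) != 0)
        return -1;

    return 0;
}

int eh_exit_was_requested(void)
{
    return eh_exit_requested != 0;
}

/* Self-check: SIGHUP leaves the worker alone, SIGPWR requests the exit. */
void eh_policy_self_check(void)
{
    eh_exit_requested = 0;
    assert(eh_install_signal_policy() == 0);

    raise(SIGHUP);
    assert(!eh_exit_was_requested());

    raise(SIGPWR);
    assert(eh_exit_was_requested());
}
```

Calling `eh_policy_self_check()` from a test program exercises both cases. Because a dying parent is not expected to send SIGPWR, a worker set up this way would survive the flurry of signals seen in the log.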
    
    So I suggest the following changes to hosts.c and scsi_error.c: