    [SCSI] libfc: remote port gets stuck in restart state without really restarting · 5543c72e
    Abhijeet Joglekar authored
    We ran into a scenario where a remote port goes into RESTART state, but
    never gets added to scsi transport. The running vmcore showed the following:
    a) Port was in RESTART state
    b) rdata->event was STOP
    c) no work gets scheduled to fc_rport_work for the remote port
    
    After this point, shut/no-shut of the remote port did not cause the port
    to get re-discovered. The port would move between DELETE and RESTART states,
    but the event would always be STOP, no work would get scheduled to
    fc_rport_work and the port would not get added to scsi_transport.
    
    The problem is that rdata->event is not set to NONE after a port is
    restarted. After this point, no more work gets scheduled for the remote port
    since new work is scheduled only if rdata->event is NONE. So, the event
    and state keep changing, but fc_rport_work does not get scheduled to actually
    handle the event.
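
The scheduling rule described above can be illustrated with a small user-space sketch. This is not the libfc code: `toy_rport`, `toy_post_event` and the `EV_*` names are hypothetical stand-ins, and the `queued` counter merely models a call that would queue fc_rport_work.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the rport events named in this commit. */
enum toy_event { EV_NONE, EV_READY, EV_STOP };

struct toy_rport {
    enum toy_event event;  /* pending event, EV_NONE when nothing is pending */
    int queued;            /* how many times work would have been queued */
};

/*
 * Record a new event. Work is queued only when no event was pending,
 * on the assumption that an already queued work item will pick up
 * the latest event value when it runs.
 * Returns true if work was queued by this call.
 */
static bool toy_post_event(struct toy_rport *rp, enum toy_event ev)
{
    bool queue = (rp->event == EV_NONE);

    if (queue)
        rp->queued++;
    rp->event = ev;
    return queue;
}
```

With this rule, posting STOP on an idle port queues work, but any later event posted while the event field is still non-NONE only overwrites the field. If nothing ever resets the field to NONE, no further work is queued.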
    
    Here's a transition of states that explains the above observation:
    
    1) Port is first in READY state, event is NONE
    
    2) RSCN on shut, port goes to DELETE, event is STOP
    
    3) Before fc_rport_work runs, RSCN on no-shut, port goes to RESTART, event is
    still STOP
    
    4) fc_rport_work gets scheduled, removes the port from transport, sees state
    as RESTART, begins the PLOGI state machine, event remains as STOP (event NOT
    changed to NONE, this is the bug)
    
    5) PLOGI state machine completes, port state goes to READY, event goes to
    READY, but no work is scheduled since event was STOP (non-NONE) before.
    fc_rport_work is not scheduled, port remains in READY state, but is not added
    to transport.
    
    Things are broken at this point. The libfc rport is ready, but no transport
    rport is created.
    
    6) Now a shut causes port state to change to DELETE, event to change to STOP,
    no work gets scheduled

    7) A no-shut causes port state to change to RESTART, event remains at STOP,
    no work gets scheduled
    
    (6) and (7) now get repeated every time we do shut/no-shut. There is no way
    to get out of this state. An fcc reset does not help either.

    The only way out is to unload and reload the module.
    
    Fix is to set rdata->event to NONE while processing the STOP/LOGO/FAILED
    events, inside the discovery and rport locks.
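
As a rough illustration of the effect of the fix, the sketch below replays steps 1 to 5 in a toy user-space model. Everything here is hypothetical (`toy_rport`, `toy_work`, `toy_replay`, the `with_fix` flag), locking is omitted, and login is assumed to complete immediately. It is a simplified model of the sequence described above, not the actual fc_rport.c change.

```c
#include <stdbool.h>

enum toy_state { ST_READY, ST_DELETE, ST_RESTART };
enum toy_event { EV_NONE, EV_READY, EV_STOP };

struct toy_rport {
    enum toy_state state;
    enum toy_event event;
    bool work_pending;  /* stands in for a queued fc_rport_work item */
    bool in_transport;  /* stands in for registration with the transport */
};

/* Queue work only if no event is pending, then record the event. */
static void toy_post_event(struct toy_rport *rp, enum toy_event ev)
{
    if (rp->event == EV_NONE)
        rp->work_pending = true;
    rp->event = ev;
}

static void toy_shut(struct toy_rport *rp)
{
    rp->state = ST_DELETE;
    toy_post_event(rp, EV_STOP);
}

static void toy_no_shut(struct toy_rport *rp)
{
    if (rp->state == ST_DELETE)
        rp->state = ST_RESTART;  /* the event field is left untouched */
}

static void toy_login_done(struct toy_rport *rp)
{
    rp->state = ST_READY;
    toy_post_event(rp, EV_READY);
}

/* Toy worker: handles whatever event is recorded when it runs. */
static void toy_work(struct toy_rport *rp, bool with_fix)
{
    if (!rp->work_pending)
        return;
    rp->work_pending = false;

    switch (rp->event) {
    case EV_READY:
        rp->event = EV_NONE;
        rp->in_transport = true;
        break;
    case EV_STOP:
        rp->in_transport = false;
        if (with_fix)
            rp->event = EV_NONE;  /* the reset this commit describes */
        if (rp->state == ST_RESTART)
            toy_login_done(rp);   /* assume login completes at once */
        break;
    case EV_NONE:
        break;
    }
}

/* Replay steps 1-5; returns whether the port ends up in the transport. */
static bool toy_replay(bool with_fix)
{
    struct toy_rport rp = { ST_READY, EV_NONE, false, true };  /* step 1 */

    toy_shut(&rp);            /* step 2 */
    toy_no_shut(&rp);         /* step 3 */
    toy_work(&rp, with_fix);  /* steps 4 and 5 */
    toy_work(&rp, with_fix);  /* does something only if step 5 queued work */
    return rp.in_transport;
}
```

In this model, `toy_replay(false)` leaves the port READY but outside the transport, because the READY event is recorded while STOP is still pending and no work is queued. `toy_replay(true)` resets the event while handling STOP, so the READY event queues work and the port is added back.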
    
    Signed-off-by: Abhijeet Joglekar <abjoglek@cisco.com>
    Signed-off-by: Robert Love <robert.w.love@intel.com>
    Signed-off-by: James Bottomley <James.Bottomley@suse.de>
fc_rport.c 45.8 KB