1. 12 Dec, 2009 1 commit
    • [SCSI] libfc: remote port gets stuck in restart state without really restarting · 5543c72e
      Abhijeet Joglekar authored
      We ran into a scenario where a remote port goes into RESTART state, but
      never gets added to scsi transport. The running vmcore showed the following:
      a) Port was in RESTART state
      b) rdata->event was STOP
      c) no work was scheduled to fc_rport_work for the remote port
      
      After this point, shut/no-shut of the remote port did not cause the port
      to get re-discovered. The port would move between DELETE and RESTART states,
      but the event would always be STOP, no work would get scheduled to
      fc_rport_work and the port would not get added to scsi_transport.
      
      The problem is that rdata->event is not set to NONE after a port is
      restarted. After this point, no more work gets scheduled for the remote port
      since new work is scheduled only if rdata->event is NONE. So, the event
      and state keep changing, but fc_rport_work does not get scheduled to actually
      handle the event.
      
      Here's a transition of states that explains the above observation:
      
      1) Port is first in READY state, event is NONE

      2) RSCN on shut, port goes to DELETE, event is STOP
      
      3) Before fc_rport_work runs, RSCN on no-shut, port goes to RESTART, event is
      still STOP
      
      4) fc_rport_work gets scheduled, removes the port from transport, sees state
      as RESTART, begins the PLOGI state machine, event remains as STOP (event NOT
      changed to NONE, this is the bug)
      
      5) PLOGI state machine completes, port state goes to READY, event goes to
      READY, but no work is scheduled since event was STOP (non-NONE) before.
      fc_rport_work is not scheduled, port remains in READY state, but is not added
      to transport.
      
      Things are broken at this point. The libfc rport is ready, but no transport
      rport has been created.
      
      6) Now a shut causes port state to change to DELETE, event to change to STOP,
      no work gets scheduled

      7) A no-shut causes port state to change to RESTART, event remains at STOP,
      no work gets scheduled
      
      (6) and (7) now get repeated every time we do shut/no-shut. There is no way
      to get out of this state. An fcc reset does not help either.

      The only way out is to unload and reload the module.
      
      The fix is to set rdata->event to NONE while processing the STOP/LOGO/FAILED
      events, inside the discovery and rport locks.
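
The sequence above can be replayed with a small user-space model. This is only a sketch under stated assumptions, not the libfc code: every name in it (struct port, post_event, shut, no_shut, plogi_done, run_work, replay) is hypothetical, locking and the real workqueue are left out, and the LOGO/FAILED events are folded into STOP. It keeps just the rule described in this message: work is queued only when the stored event is NONE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the rport states and events. */
enum state { ST_READY, ST_DELETE, ST_RESTART, ST_PLOGI };
enum event { EV_NONE, EV_READY, EV_STOP };

struct port {
    enum state state;
    enum event event;
    bool work_pending;  /* models a queued fc_rport_work item */
    bool in_transport;  /* models registration with the SCSI transport */
};

/* Record an event; queue work only if no event is already outstanding. */
static void post_event(struct port *p, enum event ev)
{
    assert(ev != EV_NONE);
    if (p->event == EV_NONE)
        p->work_pending = true;
    p->event = ev;
}

/* RSCN on shut: the port is marked for deletion. */
static void shut(struct port *p)
{
    p->state = ST_DELETE;
    post_event(p, EV_STOP);
}

/* RSCN on no-shut: a port awaiting deletion is restarted; event untouched. */
static void no_shut(struct port *p)
{
    if (p->state == ST_DELETE)
        p->state = ST_RESTART;
}

/* PLOGI state machine finished successfully. */
static void plogi_done(struct port *p)
{
    p->state = ST_READY;
    post_event(p, EV_READY);
}

/* Model of the work handler; apply_fix selects the behaviour of this patch. */
static void run_work(struct port *p, bool apply_fix)
{
    if (!p->work_pending)
        return;
    p->work_pending = false;

    switch (p->event) {
    case EV_READY:
        p->event = EV_NONE;
        p->in_transport = true;
        break;
    case EV_STOP:
        p->in_transport = false;
        if (apply_fix)
            p->event = EV_NONE;     /* the fix: clear the stale event */
        if (p->state == ST_RESTART)
            p->state = ST_PLOGI;    /* begin the PLOGI state machine */
        break;
    case EV_NONE:
        break;
    }
}

/* Replay steps 1 to 5 of the transition list and return the final port. */
static struct port replay(bool apply_fix)
{
    struct port p = { ST_READY, EV_NONE, false, true };  /* step 1 */

    shut(&p);                 /* step 2 */
    no_shut(&p);              /* step 3 */
    run_work(&p, apply_fix);  /* step 4 */
    plogi_done(&p);           /* step 5 */
    run_work(&p, apply_fix);  /* runs only if step 5 queued work */
    return p;
}
```

In this model, replay(false) ends with the port in READY state but not in the transport, and further shut()/no_shut() calls never set work_pending again. With replay(true) the READY event queues work and the port is added back.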

      Signed-off-by: Abhijeet Joglekar <abjoglek@cisco.com>
      Signed-off-by: Robert Love <robert.w.love@intel.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
  2. 10 Dec, 2009 39 commits