    scsi: lpfc: Fix NVMe recovery after mailbox timeout · 9ec58ec7
    James Smart authored
    If a mailbox command times out, the SLI port is deemed in error and the
    port is reset.  The HBA cleanup is not returning I/Os to the NVMe layer
    before the port is unregistered. This is due to the HBA being marked
    offline (!SLI_ACTIVE) and cleanup being done by the mailbox timeout handler
    rather than a general adapter reset routine.  The mailbox timeout handler
    only cleaned up SCSI I/Os.
    
    Fix by reworking the mailbox handler to:
    
     - After handling the mailbox error, detect the board is already in
       failure (may be due to another error), and leave cleanup to the
       other handler.
    
     - If the mailbox command timeout is the initial detector of the port
       error, continue with the board cleanup and mark the adapter offline
       (!SLI_ACTIVE). Remove the SCSI-only I/O cleanup routine. The generic
       reset adapter routine that is subsequently invoked will clean up the
       I/Os.
    
     - Have the reset adapter routine flush all NVMe and SCSI I/Os if the
       adapter has been marked failed (!SLI_ACTIVE).
    
     - Rework the NVMe I/O terminate routine to take a status code to fail the
       I/O with, and update it so that each cleaned-up I/O goes through the
       WQE completion routine. Currently it bypasses the WQE cleanup and calls
       the NVMe I/O completion directly. The WQE completion routine takes care
       of data structure and node cleanup, then calls the NVMe I/O completion
       handler.
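
    The reworked flow described in the list above can be sketched as a small
    user-space model. This is only an illustration under stated assumptions:
    every name below (toy_hba, toy_io, toy_mbox_timeout, toy_reset_adapter,
    toy_nvme_terminate, toy_wqe_cmpl, TOY_STATUS_ABORTED) is hypothetical and
    is not an lpfc symbol, and the model routes SCSI I/Os through the same
    completion helper purely for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status used to fail flushed I/Os; not a real lpfc value. */
#define TOY_STATUS_ABORTED 1

enum toy_proto { TOY_SCSI, TOY_NVME };

/* Toy stand-in for an outstanding I/O. */
struct toy_io {
	enum toy_proto proto;
	bool wqe_cleaned;	/* set by the WQE completion path */
	bool completed;		/* set once the upper layer is notified */
	int status;
};

/* Toy stand-in for the adapter state. */
struct toy_hba {
	bool sli_active;	/* false models !SLI_ACTIVE (offline) */
	bool in_failure;	/* another handler already owns recovery */
	struct toy_io *ios;
	size_t nr_ios;
};

/* WQE completion: per-I/O cleanup first, then complete to the upper layer. */
static void toy_wqe_cmpl(struct toy_io *io, int status)
{
	io->wqe_cleaned = true;
	io->status = status;
	io->completed = true;
}

/* Terminate takes the failing status and goes through the WQE completion
 * instead of completing the I/O directly. */
static void toy_nvme_terminate(struct toy_io *io, int status)
{
	toy_wqe_cmpl(io, status);
}

/* Reset routine: flush all NVMe and SCSI I/Os only if the adapter has been
 * marked failed. */
static void toy_reset_adapter(struct toy_hba *hba)
{
	size_t i;

	if (hba->sli_active)
		return;

	for (i = 0; i < hba->nr_ios; i++) {
		struct toy_io *io = &hba->ios[i];

		if (io->completed)
			continue;
		if (io->proto == TOY_NVME)
			toy_nvme_terminate(io, TOY_STATUS_ABORTED);
		else
			toy_wqe_cmpl(io, TOY_STATUS_ABORTED);
	}
}

/* Mailbox timeout: returns true if this handler drove the cleanup, false if
 * the board was already in failure and cleanup is left to the other handler. */
static bool toy_mbox_timeout(struct toy_hba *hba)
{
	if (hba->in_failure)
		return false;

	hba->in_failure = true;
	hba->sli_active = false;
	toy_reset_adapter(hba);
	return true;
}
```

    In this model, a first call to toy_mbox_timeout() on a healthy adapter
    marks it offline and flushes both I/O types through the completion
    helper; a second call sees the failure state and returns without
    touching the I/Os.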
    
    Link: https://lore.kernel.org/r/20210104180240.46824-11-jsmart2021@gmail.com
    Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
    Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
    Signed-off-by: James Smart <jsmart2021@gmail.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
lpfc_init.c 414 KB