    scsi: lpfc: Improve PCI EEH Error and Recovery Handling · 35ed9613
    James Smart authored
    Following EEH errors, the driver can crash or hang when deleting the
    localport or when attempting to unload.
    
    The EEH handlers in the driver did not notify the NVMe-FC transport before
    tearing the driver down; that notification was delayed until the resume
    steps. This worked for SCSI because lpfc_block_scsi() would notify the
    scsi_fc_transport that the target was not available, but it would not
    clean up all the references to the ndlp.
    
    The SLI3 prep for dev reset handler made the lpfc_offline_prep() and
    lpfc_offline() calls to get the port stopped before restarting. The SLI4
    version of the prep for dev reset just destroyed the queues and did not
    stop NVMe from continuing. Because the port was not really stopped, the
    localport destroy would hang while the transport was still waiting for
    I/O. Additionally, a devloss timeout can fire and post events to a
    stopped worker thread, creating another hang condition.
    
    lpfc_sli4_prep_dev_for_reset() is modified to call lpfc_offline_prep() and
    lpfc_offline() rather than just lpfc_scsi_dev_block() to ensure both SCSI
    and NVMe transports are notified to block I/O to the driver.
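
The reordering described above can be sketched as a small user-space mock. This is not the driver code: struct mock_hba, record(), and the mock_* functions are hypothetical stand-ins named after the functions in this message, and placing the queue teardown after the offline calls is an assumption made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the driver's per-adapter state. */
struct mock_hba {
	char trace[128];	/* comma-separated record of the steps taken */
};

/* Append one step name to the trace so the call order can be inspected. */
static void record(struct mock_hba *hba, const char *step)
{
	assert(hba != NULL && step != NULL);
	if (hba->trace[0] != '\0')
		strncat(hba->trace, ",",
			sizeof(hba->trace) - strlen(hba->trace) - 1);
	strncat(hba->trace, step,
		sizeof(hba->trace) - strlen(hba->trace) - 1);
}

/* Placeholder for lpfc_offline_prep(): block new I/O on both transports. */
static void mock_offline_prep(struct mock_hba *hba)
{
	record(hba, "offline_prep");
}

/* Placeholder for lpfc_offline(): bring the port fully down. */
static void mock_offline(struct mock_hba *hba)
{
	record(hba, "offline");
}

/* Placeholder for the SLI4 queue teardown. */
static void mock_queue_destroy(struct mock_hba *hba)
{
	record(hba, "queue_destroy");
}

/*
 * Sketch of the modified SLI4 reset prep: stop the port first, then
 * destroy the queues (ordering assumed, see the note above).
 */
static void mock_sli4_prep_dev_for_reset(struct mock_hba *hba)
{
	mock_offline_prep(hba);
	mock_offline(hba);
	mock_queue_destroy(hba);
}
```

Calling mock_sli4_prep_dev_for_reset() on a zeroed struct mock_hba leaves the steps in trace in the order they ran, which makes the "stop the port before tearing down" sequence easy to check.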
    
    Logic is added to the devloss handler and the worker thread to clean up
    ndlp references and quiesce appropriately.
    
    Link: https://lore.kernel.org/r/20220317032737.45308-2-jsmart2021@gmail.com
    Co-developed-by: Justin Tee <justin.tee@broadcom.com>
    Signed-off-by: Justin Tee <justin.tee@broadcom.com>
    Signed-off-by: James Smart <jsmart2021@gmail.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
lpfc_sli.c 673 KB