    scsi: lpfc: Fix unload hang after back to back PCI EEH faults · a4691038
    James Smart authored
    When EEH errors are injected, the port hangs while waiting for the node
    list to empty (message number 0233). The driver is stuck at this point
    and also cannot unload. The driver makes transport remoteport delete
    calls, which try to abort I/Os, but the EEH daemon has already called the
    driver to detach, and the detach has set the global FC_UNLOADING flag.
    Several code paths skip I/O cleanup when FC_UNLOADING is set, so the
    transports wait for I/O to finish while the driver waits for the
    transports to clean up.
    
    Additionally, while studying the list handling, a locking issue was found
    in lpfc_sli_abort_iocb_ring() that could corrupt the list.
    
    A special case was added to the lpfc_cleanup() routine to call
    lpfc_sli_flush_rings() if FC_UNLOADING is set and the PCI slot is
    offline (e.g. after an EEH fault).
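    The hang and the lpfc_cleanup() special case can be illustrated with a
    small user-space model. This is a sketch under stated assumptions, not
    the driver code: struct model_hba, model_flush_rings() and
    model_cleanup() are hypothetical names, the boolean fields stand in for
    FC_UNLOADING and the offline PCI slot, and the apply_fix parameter only
    exists to contrast the behavior before and after the change.

```c
#include <stdbool.h>

/* Hypothetical model; not the lpfc driver's real structures. */
struct model_hba {
    bool fc_unloading; /* stands in for the global FC_UNLOADING flag */
    bool pci_offline;  /* PCI slot offline, e.g. after an EEH fault */
    int ring_io;       /* I/Os still outstanding on the SLI rings */
    int nodes;         /* entries left on the node list */
};

/* Stand-in for lpfc_sli_flush_rings(): fail all outstanding I/O back. */
static void model_flush_rings(struct model_hba *h)
{
    h->ring_io = 0;
}

/*
 * Stand-in for the node-list wait in lpfc_cleanup(). Returns true if the
 * node list drained within max_polls iterations, false if the driver would
 * keep waiting (the situation reported by message 0233).
 */
static bool model_cleanup(struct model_hba *h, bool apply_fix, int max_polls)
{
    /* The special case: with the slot offline, nothing else completes I/O. */
    if (apply_fix && h->fc_unloading && h->pci_offline)
        model_flush_rings(h);

    for (int i = 0; i < max_polls; i++) {
        /* An online adapter keeps completing I/O; an offline one never does. */
        if (!h->pci_offline && h->ring_io > 0)
            h->ring_io--;
        /* The transport deletes remote ports only once their I/O is gone. */
        if (h->ring_io == 0)
            h->nodes = 0;
        if (h->nodes == 0)
            return true;
    }
    return false;
}
```

    In this model, an unloading HBA with an offline slot never drains
    without the flush, because the only component that could complete the
    outstanding I/O is gone; with the flush, the node list empties at once.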
    
    The SLI4 part of lpfc_sli_abort_iocb_ring() now uses the ring_lock. Code
    was also added to cancel the I/Os if the PCI slot is offline, along with
    checks and early returns for the FC_UNLOADING and HBA_IOQ_FLUSH flags so
    the driver does not try to send an I/O it cannot handle.
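    The abort-path changes follow a common pattern: touch the queue only
    while holding its lock, detach it onto a local list when it must be torn
    down, and complete the detached entries after dropping the lock. The
    sketch below is a hypothetical user-space illustration, not the actual
    patch: a pthread mutex stands in for the kernel ring_lock spinlock, the
    SLI3/SLI4 split is ignored, and the M_* flag values and all model_*
    names are invented for the example.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; the real lpfc definitions differ. */
#define M_FC_UNLOADING  0x1u
#define M_HBA_IOQ_FLUSH 0x2u

struct model_iocb {
    struct model_iocb *next;
    bool aborted;   /* an abort was issued to the adapter */
    bool cancelled; /* completed locally with an error status */
};

struct model_ring {
    pthread_mutex_t ring_lock;  /* stands in for the SLI4 ring_lock */
    struct model_iocb *txcmplq; /* I/Os already handed to the adapter */
};

/* Early return: do not build an abort the driver could never complete. */
static bool model_issue_abort(unsigned int flags, struct model_iocb *io)
{
    if (flags & (M_FC_UNLOADING | M_HBA_IOQ_FLUSH))
        return false;
    io->aborted = true;
    return true;
}

/*
 * The queue is only read or modified while ring_lock is held. With the
 * slot offline, the whole queue is detached under the lock and cancelled
 * after the lock is dropped. Returns the number of I/Os cancelled locally.
 */
static int model_abort_ring(struct model_ring *r, unsigned int flags,
                            bool pci_offline)
{
    struct model_iocb *detached = NULL;
    int cancelled = 0;

    pthread_mutex_lock(&r->ring_lock);
    if (pci_offline) {
        detached = r->txcmplq; /* splice the queue onto a local list */
        r->txcmplq = NULL;
    } else {
        for (struct model_iocb *io = r->txcmplq; io != NULL; io = io->next)
            (void)model_issue_abort(flags, io);
    }
    pthread_mutex_unlock(&r->ring_lock);

    /* No other thread can reach the detached list, so no lock is needed. */
    for (struct model_iocb *io = detached; io != NULL; io = io->next) {
        io->cancelled = true;
        cancelled++;
    }
    return cancelled;
}
```

    Detaching under the lock and cancelling outside it keeps the critical
    section short and avoids walking a list that another CPU may be
    modifying, which is the kind of corruption the unlocked traversal risked.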
    
    Link: https://lore.kernel.org/r/20220317032737.45308-3-jsmart2021@gmail.com
    Co-developed-by: Justin Tee <justin.tee@broadcom.com>
    Signed-off-by: Justin Tee <justin.tee@broadcom.com>
    Signed-off-by: James Smart <jsmart2021@gmail.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>