    scsi: lpfc: Fix bad ndlp ptr in xri aborted handling · 324e1c40
    James Smart authored
    In cases where I/O may be aborted, such as driver unload or link bounces,
    the system will crash due to a bad ndlp pointer.
    
    Example:
      RIP: 0010:lpfc_sli4_abts_err_handler+0x15/0x140 [lpfc]
      ...
      lpfc_sli4_io_xri_aborted+0x20d/0x270 [lpfc]
      lpfc_sli4_sp_handle_abort_xri_wcqe.isra.54+0x84/0x170 [lpfc]
      lpfc_sli4_fp_handle_cqe+0xc2/0x480 [lpfc]
      __lpfc_sli4_process_cq+0xc6/0x230 [lpfc]
      __lpfc_sli4_hba_process_cq+0x29/0xc0 [lpfc]
      process_one_work+0x14c/0x390
    
    The crash was caused by a bad ndlp address passed to the I/O indicated by
    the XRI aborted CQE.  The address was not NULL, so the routine dereferenced
    the ndlp ptr.  The bad ndlp also caused lpfc_sli4_io_xri_aborted to call an
    erroneous io handler.  The root cause of the bad ndlp was an lpfc_ncmd that
    was aborted, put on the abort_io list, completed, taken off the abort_io
    list, and sent to lpfc_release_nvme_buf, where it was put back on the
    abort_io list because the LPFC_SBUF_XBUSY setting in lpfc_ncmd->flags was
    not cleared on the final completion.
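
    The sequence above can be illustrated with a minimal sketch.  It is not
    the driver code: every name in it (sketch_buf, SKETCH_XBUSY,
    sketch_release_buf, sketch_final_completion) is a hypothetical stand-in,
    and the list handling is reduced to a single boolean.  It only shows how
    a busy flag left set on the final completion sends a buffer back to the
    abort list with its stale node pointer still attached.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an "exchange busy" flag bit. */
#define SKETCH_XBUSY 0x1u

/* Hypothetical, heavily simplified I/O buffer. */
struct sketch_buf {
	unsigned int flags;   /* holds SKETCH_XBUSY while the exchange is busy */
	bool on_abort_list;   /* models membership in an abort list */
	int *ndlp;            /* models the node pointer carried by the I/O */
};

/*
 * Release path: a buffer whose exchange is still marked busy is parked
 * on the abort list; otherwise it is scrubbed and freed for reuse.
 */
static void sketch_release_buf(struct sketch_buf *buf)
{
	if (buf->flags & SKETCH_XBUSY) {
		buf->on_abort_list = true;   /* parked, ndlp left untouched */
		return;
	}
	buf->ndlp = NULL;
	buf->on_abort_list = false;
}

/*
 * Final completion: the buffer comes off the abort list and is released.
 * clear_xbusy == false models the failure described above;
 * clear_xbusy == true models clearing the flag before release.
 */
static void sketch_final_completion(struct sketch_buf *buf, bool clear_xbusy)
{
	buf->on_abort_list = false;
	if (clear_xbusy)
		buf->flags &= ~SKETCH_XBUSY;
	sketch_release_buf(buf);
}
```

    In this model, calling sketch_final_completion() with clear_xbusy set to
    false leaves the buffer on the abort list with a non-NULL ndlp, which is
    the state a later XRI aborted event would then act on.  With clear_xbusy
    set to true, the buffer is released and its ndlp is reset.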
    
    Rework the exchange busy handling to ensure the flags are properly set for
    both scsi and nvme.
    
    Fixes: c490850a ("scsi: lpfc: Adapt partitioned XRI lists to efficient sharing")
    Cc: <stable@vger.kernel.org> # v5.1+
    Link: https://lore.kernel.org/r/20191018211832.7917-6-jsmart2021@gmail.com
    Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
    Signed-off-by: James Smart <jsmart2021@gmail.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
lpfc_sli.h 16.1 KB