    scsi: qla2xxx: Fix hang during NVMe session tear down · 310e69ed
    Arun Easi authored
    The following hung task call trace was seen:
    
        [ 1230.183294] INFO: task qla2xxx_wq:523 blocked for more than 120 seconds.
        [ 1230.197749] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [ 1230.205585] qla2xxx_wq      D    0   523      2 0x80004000
        [ 1230.205636] Workqueue: qla2xxx_wq qlt_free_session_done [qla2xxx]
        [ 1230.205639] Call Trace:
        [ 1230.208100]  __schedule+0x2c4/0x700
        [ 1230.211607]  schedule+0x38/0xa0
        [ 1230.214769]  schedule_timeout+0x246/0x2f0
        [ 1230.222651]  wait_for_completion+0x97/0x100
        [ 1230.226921]  qlt_free_session_done+0x6a0/0x6f0 [qla2xxx]
        [ 1230.232254]  process_one_work+0x1a7/0x360
    
    ...when device side port resets were done.
    
    Abort threads were returning without processing because of the "deleted"
    flag check. The delete thread, meanwhile, could not proceed with a
    logout (which would have cleared the pending requests) because the logout
    IOCB work was not progressing. The hung qlt_free_session_done() thread
    appears to be holding up the other work items on ha->wq.
    qlt_free_session_done() itself was waiting for
    nvme_fc_unregister_remoteport() and the localport_delete callback to
    complete, which happens only once all I/Os are released.
    
    Fix this by allowing aborts to progress until the device delete is
    completely done. This should let qlt_free_session_done() proceed without
    hanging and thus clear the deadlock.
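
    The deadlock and the fix described above can be modelled in a few lines
    of user-space C. This is only an illustrative sketch, not the driver
    change itself: the state names, struct session, and every function below
    are hypothetical and do not appear in qla2xxx. It assumes the "deleted"
    field has an intermediate "deletion in progress" value and a final
    "deleted" value, and that teardown can finish only once no I/Os remain.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical session states; illustrative names, not the driver's. */
enum sess_state {
    SESS_ACTIVE = 0,
    SESS_DELETION_IN_PROGRESS,
    SESS_DELETED,
};

struct session {
    enum sess_state deleted;
    int pending_ios;            /* I/Os still held by the transport */
};

/* Old check as described: any non-zero "deleted" state skips the abort. */
bool abort_allowed_old(const struct session *s)
{
    return s->deleted == SESS_ACTIVE;
}

/* Fixed check: only a fully deleted session skips the abort. */
bool abort_allowed_new(const struct session *s)
{
    return s->deleted != SESS_DELETED;
}

/* Try to abort one outstanding I/O; returns true if one was released. */
static bool try_abort(struct session *s,
                      bool (*allowed)(const struct session *))
{
    if (s->pending_ios == 0 || !allowed(s))
        return false;
    s->pending_ios--;
    return true;
}

/*
 * Model a teardown that starts with `pending` outstanding I/Os while the
 * session is already marked as being deleted. Returns true if every I/O
 * gets released (the waiter can finish), false if the waiter would block
 * forever because aborts are refused.
 */
bool teardown_completes(int pending,
                        bool (*allowed)(const struct session *))
{
    struct session s = {
        .deleted = SESS_DELETION_IN_PROGRESS,
        .pending_ios = pending,
    };

    assert(pending >= 0);

    while (try_abort(&s, allowed))
        ;                       /* keep aborting while it makes progress */

    if (s.pending_ios != 0)
        return false;           /* I/Os never released: teardown hangs */

    s.deleted = SESS_DELETED;   /* only now is the delete completely done */
    return true;
}
```

    In this model, teardown_completes(3, abort_allowed_old) reports a hang
    because every abort is refused while deletion is in progress, whereas
    teardown_completes(3, abort_allowed_new) drains the pending I/Os and
    lets the teardown finish.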
    
    Link: https://lore.kernel.org/r/20210817051315.2477-5-njavali@marvell.com
    Signed-off-by: Arun Easi <aeasi@marvell.com>
    Signed-off-by: Nilesh Javali <njavali@marvell.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
qla_nvme.c 22.2 KB