    scsi: libsas: Don't always drain event workqueue for HA resume · fbefe228
    John Garry authored
    For the hisi_sas driver, if a directly attached disk is removed during
    suspend, a hang will occur in the resume process.
    
    The background is that in commit 16fd4a7c ("scsi: hisi_sas: Add device
    link between SCSI devices and hisi_hba"), it is ensured that the HBA device
    cannot be runtime suspended when any SCSI device associated is active.
    
    Other drivers which use libsas don't worry about this as none support
    runtime suspend.
    
    The mentioned hang occurs when a disk is removed during suspend. In the
    removal process - from PHYE_RESUME_TIMEOUT event processing - we call into
    scsi_remove_device(), which is being processed in the HA event workqueue.
    Here we wait for all suppliers of the SCSI device to resume, which includes
    the HBA device (from the above commit). However, the HBA device cannot
    resume, as it is waiting for the PHYE_RESUME_TIMEOUT to be processed (from
    calling sas_resume_ha() -> sas_drain_work()). This is the deadlock.
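
The circular wait described above can be illustrated with a small userspace model. This is only a sketch, not kernel code: the `actor` labels, `has_deadlock()` and `resume_deadlocks()` are hypothetical names invented for the illustration. Each actor records the one party it is blocked on, and a cycle in that relation corresponds to the hang.

```c
#include <stdbool.h>

/* Hypothetical labels for the two parties; these are not kernel identifiers. */
enum actor {
	HA_RESUME,	/* HBA resume path, i.e. the caller of sas_resume_ha() */
	EVENT_WORK,	/* PHYE_RESUME_TIMEOUT handling in the HA event workqueue */
	ACTOR_COUNT,
	NOBODY = -1	/* the actor is not blocked on anyone */
};

/* Return true if following the waits_for[] links from any actor leads back to it. */
static bool has_deadlock(const int waits_for[ACTOR_COUNT])
{
	for (int start = 0; start < ACTOR_COUNT; start++) {
		int cur = start;

		for (int hops = 0; hops < ACTOR_COUNT && cur != NOBODY; hops++) {
			cur = waits_for[cur];
			if (cur == start)
				return true;
		}
	}
	return false;
}

/* Model the resume path with or without draining the event workqueue. */
static bool resume_deadlocks(bool drain)
{
	int waits_for[ACTOR_COUNT];

	/* Draining makes the resume path wait for the queued event to finish. */
	waits_for[HA_RESUME] = drain ? EVENT_WORK : NOBODY;
	/* scsi_remove_device() waits for the HBA (a supplier) to resume. */
	waits_for[EVENT_WORK] = HA_RESUME;

	return has_deadlock(waits_for);
}
```

In this model `resume_deadlocks(true)` reports a cycle, while `resume_deadlocks(false)` does not, because the resume path no longer waits on the event work and the event work can proceed once the HBA has resumed.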
    
    There does not appear to be any need for the sas_drain_work() to be called
    at all in sas_resume_ha() as it is not syncing against anything, so allow
    LLDDs to avoid this by providing a variant of sas_resume_ha() which does
    not "sync", i.e. doesn't drain the event workqueue.
    
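One possible shape for such a pair of entry points is a shared helper that takes a flag saying whether to drain. The sketch below is a userspace model under that assumption, not the code from this patch: `struct ha_model` and every `model_*` function are hypothetical stand-ins, and the message above does not name the new variant.

```c
#include <stdbool.h>

/* Stand-in for struct sas_ha_struct; it only records what the sketch did. */
struct ha_model {
	int drain_calls;	/* how many times the drain stand-in ran */
	bool resumed;		/* set once the resume helper has finished */
};

/* Stand-in for sas_drain_work(), which in the kernel waits for queued events. */
static void model_drain_work(struct ha_model *ha)
{
	ha->drain_calls++;
}

/* Shared resume path; 'drain' selects whether the event workqueue is drained. */
static void model_resume_common(struct ha_model *ha, bool drain)
{
	/* Phy resume and timeout handling would happen here in a real driver. */
	if (drain)
		model_drain_work(ha);
	ha->resumed = true;
}

/* Existing behaviour: resume and drain the event workqueue. */
void model_resume_ha(struct ha_model *ha)
{
	model_resume_common(ha, true);
}

/* Hypothetical variant: resume without draining the event workqueue. */
void model_resume_ha_no_sync(struct ha_model *ha)
{
	model_resume_common(ha, false);
}
```

With this split, existing callers keep the draining behaviour unchanged, and an LLDD that supports runtime suspend can opt into the non-draining variant.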
    Link: https://lore.kernel.org/r/1639999298-244569-2-git-send-email-chenxiang66@hisilicon.com
    Signed-off-by: John Garry <john.garry@huawei.com>
    Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
libsas.h 18.2 KB