Commit fbefe228 authored by John Garry's avatar John Garry Committed by Martin K. Petersen

scsi: libsas: Don't always drain event workqueue for HA resume

For the hisi_sas driver, if a directly attached disk is removed during
suspend, a hang will occur in the resume process:

The background is that in commit 16fd4a7c ("scsi: hisi_sas: Add device
link between SCSI devices and hisi_hba"), it is ensured that the HBA device
cannot be runtime suspended when any SCSI device associated is active.

Other drivers which use libsas don't worry about this as none support
runtime suspend.

The mentioned hang occurs when an disk is removed during suspend. In the
removal process - from PHYE_RESUME_TIMEOUT event processing - we call into
scsi_remove_device(), which is being processed in the HA event workqueue.
Here we wait for all suppliers of the SCSI device to resume, which includes
the HBA device (from the above commit). However the HBA device cannot
resume, as it is waiting for the PHYE_RESUME_TIMEOUT to be processed (from
calling sas_resume_ha() -> sas_drain_work()). This is the deadlock.

There does not appear to be any need for the sas_drain_work() to be called
at all in sas_resume_ha() as it is not syncing against anything, so allow
LLDDs to avoid this by providing a variant of sas_resume_ha() which does
"sync", i.e. doesn't drain the event workqueue.

Link: https://lore.kernel.org/r/1639999298-244569-2-git-send-email-chenxiang66@hisilicon.comSigned-off-by: default avatarJohn Garry <john.garry@huawei.com>
Signed-off-by: default avatarXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
parent 4be6181f
......@@ -4950,7 +4950,15 @@ static int _resume_v3_hw(struct device *device)
return rc;
}
phys_init_v3_hw(hisi_hba);
sas_resume_ha(sha);
/*
* If a directly-attached disk is removed during suspend, a deadlock
* may occur, as the PHYE_RESUME_TIMEOUT processing will require the
* hisi_hba->device to be active, which can only happen when resume
* completes. So don't wait for the HA event workqueue to drain upon
* resume.
*/
sas_resume_ha_no_sync(sha);
clear_bit(HISI_SAS_RESETTING_BIT, &hisi_hba->flags);
return 0;
......
......@@ -387,7 +387,7 @@ static int phys_suspended(struct sas_ha_struct *ha)
return rc;
}
void sas_resume_ha(struct sas_ha_struct *ha)
static void _sas_resume_ha(struct sas_ha_struct *ha, bool drain)
{
const unsigned long tmo = msecs_to_jiffies(25000);
int i;
......@@ -417,10 +417,23 @@ void sas_resume_ha(struct sas_ha_struct *ha)
* flush out disks that did not return
*/
scsi_unblock_requests(ha->core.shost);
sas_drain_work(ha);
if (drain)
sas_drain_work(ha);
}
void sas_resume_ha(struct sas_ha_struct *ha)
{
_sas_resume_ha(ha, true);
}
EXPORT_SYMBOL(sas_resume_ha);
/* A no-sync variant, which does not call sas_drain_ha(). */
void sas_resume_ha_no_sync(struct sas_ha_struct *ha)
{
_sas_resume_ha(ha, false);
}
EXPORT_SYMBOL(sas_resume_ha_no_sync);
void sas_suspend_ha(struct sas_ha_struct *ha)
{
int i;
......
......@@ -660,6 +660,7 @@ extern int sas_register_ha(struct sas_ha_struct *);
extern int sas_unregister_ha(struct sas_ha_struct *);
extern void sas_prep_resume_ha(struct sas_ha_struct *sas_ha);
extern void sas_resume_ha(struct sas_ha_struct *sas_ha);
extern void sas_resume_ha_no_sync(struct sas_ha_struct *sas_ha);
extern void sas_suspend_ha(struct sas_ha_struct *sas_ha);
int sas_set_phy_speed(struct sas_phy *phy, struct sas_phy_linkrates *rates);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment