Commit de606900 authored by Guilherme G. Piccoli, committed by Stefan Bader

nvme: Avoid reset work on watchdog timer function during error recovery

BugLink: http://bugs.launchpad.net/bugs/1602724

This patch adds a check to the nvme_watchdog_timer() function to avoid
calling reset_work() while an error recovery process is ongoing on the
controller. The check is made by looking at the result of
pci_channel_offline().

If we don't check for this in nvme_watchdog_timer(), the error recovery
mechanism can't recover well, because reset_work() won't be able to
do its job (since we're in the middle of an error), and so the
controller is removed from the system before the error recovery
mechanism can perform a slot reset (which would allow the adapter to
recover).

In this patch we have also split the huge condition expression in
nvme_watchdog_timer() by introducing an auxiliary function to make
the code more readable.

Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit c875a709)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
parent c33fe0a8
@@ -1366,22 +1366,44 @@ static int nvme_configure_admin_queue(struct nvme_dev *dev)
 	return result;
 }
+static bool nvme_should_reset(struct nvme_dev *dev, u32 csts)
+{
+	/* If true, indicates loss of adapter communication, possibly by a
+	 * NVMe Subsystem reset.
+	 */
+	bool nssro = dev->subsystem && (csts & NVME_CSTS_NSSRO);
+	/* If there is a reset ongoing, we shouldn't reset again. */
+	if (work_busy(&dev->reset_work))
+		return false;
+	/* We shouldn't reset unless the controller is on fatal error state
+	 * _or_ if we lost the communication with it.
+	 */
+	if (!(csts & NVME_CSTS_CFS) && !nssro)
+		return false;
+	/* If PCI error recovery process is happening, we cannot reset or
+	 * the recovery mechanism will surely fail.
+	 */
+	if (pci_channel_offline(to_pci_dev(dev->dev)))
+		return false;
+	return true;
+}
 static void nvme_watchdog_timer(unsigned long data)
 {
 	struct nvme_dev *dev = (struct nvme_dev *)data;
 	u32 csts = readl(dev->bar + NVME_REG_CSTS);
-	/*
-	 * Skip controllers currently under reset.
-	 */
-	if (!work_pending(&dev->reset_work) && !work_busy(&dev->reset_work) &&
-	    ((csts & NVME_CSTS_CFS) ||
-	     (dev->subsystem && (csts & NVME_CSTS_NSSRO)))) {
-		if (queue_work(nvme_workq, &dev->reset_work)) {
+	/* Skip controllers under certain specific conditions. */
+	if (nvme_should_reset(dev, csts)) {
+		if (queue_work(nvme_workq, &dev->reset_work))
 			dev_warn(dev->dev,
 				"Failed status: 0x%x, reset controller.\n",
 				csts);
-		}
 		return;
 	}
......
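
The behavioral change in this diff can be illustrated outside the kernel.
What follows is a minimal userspace sketch, not the driver code: struct
model_dev and its boolean fields are hypothetical stand-ins for
dev->subsystem, work_pending()/work_busy() on dev->reset_work, and
pci_channel_offline(), and the two CSTS bit values are assumed to match
the kernel's NVME_CSTS_CFS and NVME_CSTS_NSSRO definitions. It models
the old inline condition next to the new helper so the one case where
they differ (PCI channel offline) is easy to see.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the kernel's CSTS bit definitions. */
#define NVME_CSTS_CFS   (1u << 1)  /* Controller Fatal Status */
#define NVME_CSTS_NSSRO (1u << 4)  /* NVM Subsystem Reset Occurred */

/* Hypothetical model of the state the watchdog inspects. */
struct model_dev {
	bool subsystem;      /* stands in for dev->subsystem */
	bool reset_pending;  /* stands in for work_pending(&dev->reset_work) */
	bool reset_busy;     /* stands in for work_busy(&dev->reset_work) */
	bool pci_offline;    /* stands in for pci_channel_offline(...) */
};

/* Model of the condition removed from nvme_watchdog_timer(). */
static bool model_old_condition(const struct model_dev *dev, uint32_t csts)
{
	return !dev->reset_pending && !dev->reset_busy &&
	       ((csts & NVME_CSTS_CFS) ||
		(dev->subsystem && (csts & NVME_CSTS_NSSRO)));
}

/* Model of nvme_should_reset() as added by the patch. */
static bool model_should_reset(const struct model_dev *dev, uint32_t csts)
{
	bool nssro = dev->subsystem && (csts & NVME_CSTS_NSSRO);

	if (dev->reset_busy)
		return false;
	if (!(csts & NVME_CSTS_CFS) && !nssro)
		return false;
	if (dev->pci_offline)
		return false;
	return true;
}

/* Small self-check: the two predicates diverge only during PCI recovery. */
static void model_demo(void)
{
	struct model_dev dev = { .subsystem = true };

	/* Fatal status, no recovery in progress: both agree on a reset. */
	assert(model_old_condition(&dev, NVME_CSTS_CFS));
	assert(model_should_reset(&dev, NVME_CSTS_CFS));

	/* Fatal status during PCI error recovery: only the old code resets. */
	dev.pci_offline = true;
	assert(model_old_condition(&dev, NVME_CSTS_CFS));
	assert(!model_should_reset(&dev, NVME_CSTS_CFS));
}
```

Calling model_demo() from a test program exercises both cases. The
sketch leaves out the work_pending() test from the new helper, as the
patch does; in the real driver a pending work item makes queue_work()
return false, so no second reset is queued either way.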