accel/habanalabs: disable PCI when escalating compute to hard-reset

In case a compute reset has failed or a request for a hard reset has just arrived, then we escalate current reset procedure from compute to hard-reset. In such a case, the FW should be aware of the updated error cause, and if LKD is the one who performs the reset (rather than the FW), then we ask the FW to disable PCI access. We would also like to have relevant debug info and therefore we print the currently escalating reset type. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>

accel/habanalabs: disable PCI when escalating compute to hard-reset
In case a compute reset has failed or a request for a hard reset has just arrived, then we escalate current reset procedure from compute to hard-reset. In such a case, the FW should be aware of the updated error cause, and if LKD is the one who performs the reset (rather than the FW), then we ask the FW to disable PCI access. We would also like to have relevant debug info and therefore we print the currently escalating reset type. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
57479adb · Koby Elbaz · Oded Gabbay · e0ad6bad · 57479adb
Commit 57479adb authored Jan 29, 2023 by Koby Elbaz Committed by Oded Gabbay Mar 15, 2023
Hide whitespace changes
Inline Side-by-side

Showing with 8 additions and 8 deletions

drivers/accel/habanalabs/common/device.c drivers/accel/habanalabs/common/device.c +8 -8

No files found.
--- a/drivers/accel/habanalabs/common/device.c
+++ b/drivers/accel/habanalabs/common/device.c
@@ -1452,7 +1452,7 @@ static void handle_reset_trigger(struct hl_device *hdev, u32 flags)
 		 */
 		if (hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_DISABLE_PCI_ACCESS, 0x0))
 			dev_warn(hdev->dev,
-				"Failed to disable PCI access by F/W\n");
+				"Failed to disable FW's PCI access\n");
 	}
 }

@@ -1530,14 +1530,14 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)

 	/*
 	 * Prevent concurrency in this function - only one reset should be
-	 * done at any given time. Only need to perform this if we didn't
-	 * get from the dedicated hard reset thread
+	 * done at any given time. We need to perform this only if we didn't
+	 * get here from a dedicated hard reset thread.
 	 */
 	if (!from_hard_reset_thread) {
 		/* Block future CS/VM/JOB completion operations */
 		spin_lock(&hdev->reset_info.lock);
 		if (hdev->reset_info.in_reset) {
-			/* We only allow scheduling of a hard reset during compute reset */
+			/* We allow scheduling of a hard reset only during a compute reset */
 			if (hard_reset && hdev->reset_info.in_compute_reset)
 				hdev->reset_info.hard_reset_schedule_flags = flags;
 			spin_unlock(&hdev->reset_info.lock);
@@ -1574,6 +1574,7 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
 		if (delay_reset)
 			usleep_range(HL_RESET_DELAY_USEC, HL_RESET_DELAY_USEC << 1);

+escalate_reset_flow:
 		handle_reset_trigger(hdev, flags);

 		/* This also blocks future CS/VM/JOB completion operations */
@@ -1589,7 +1590,6 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
 			dev_dbg(hdev->dev, "Going to reset engines of inference device\n");
 	}

-again:
 	if ((hard_reset) && (!from_hard_reset_thread)) {
 		hdev->reset_info.hard_reset_pending = true;

@@ -1837,7 +1837,7 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
 			hdev->disabled = true;
 			hard_reset = true;
 			handle_reset_trigger(hdev, flags);
-			goto again;
+			goto escalate_reset_flow;
 		}
 	}

@@ -1860,14 +1860,14 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
 		flags |= HL_DRV_RESET_HARD;
 		flags &= ~HL_DRV_RESET_DEV_RELEASE;
 		hard_reset = true;
-		goto again;
+		goto escalate_reset_flow;
 	} else {
 		spin_unlock(&hdev->reset_info.lock);
 		dev_err(hdev->dev, "Failed to do compute reset\n");
 		hdev->reset_info.compute_reset_cnt++;
 		flags |= HL_DRV_RESET_HARD;
 		hard_reset = true;
-		goto again;
+		goto escalate_reset_flow;
 	}

 	hdev->reset_info.in_reset = 0;