drm/i915/guc: Pre-allocate output nodes for extraction

In the rare but possible scenario where we are in the midst of multiple GuC error-capture (and engine reset) events and the user also triggers a forced full GT reset or the internal watchdog triggers the same, intel_guc_submission_reset_prepare's call to flush_work(&guc->ct.requests.worker) can cause the G2H message handler to trigger intel_guc_capture_store_snapshot upon receiving new G2H error-capture notifications. This can happen despite the prior call to disable_submission(guc);. However, there's no race-free way for intel_guc_capture_store_snapshot to know that we are in the midst of a reset. That said, we can never dynamically allocate the output nodes in this handler. Thus, we shall pre-allocate a fixed number of empty nodes up front (at the time of ADS registration) that we can consume from or return to an internal cached list of nodes. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-12-alan.previn.teres.alexis@intel.com

drm/i915/guc: Pre-allocate output nodes for extraction
In the rare but possible scenario where we are in the midst of multiple GuC error-capture (and engine reset) events and the user also triggers a forced full GT reset or the internal watchdog triggers the same, intel_guc_submission_reset_prepare's call to flush_work(&guc->ct.requests.worker) can cause the G2H message handler to trigger intel_guc_capture_store_snapshot upon receiving new G2H error-capture notifications. This can happen despite the prior call to disable_submission(guc);. However, there's no race-free way for intel_guc_capture_store_snapshot to know that we are in the midst of a reset. That said, we can never dynamically allocate the output nodes in this handler. Thus, we shall pre-allocate a fixed number of empty nodes up front (at the time of ADS registration) that we can consume from or return to an internal cached list of nodes. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-12-alan.previn.teres.alexis@intel.com
247f8071 · Alan Previn · Lucas De Marchi · f5718a72 · 247f8071 · 247f8071
Commit 247f8071 authored Mar 21, 2022 by Alan Previn Committed by Lucas De Marchi Mar 22, 2022
Showing with 161 additions and 35 deletions

drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h +17 -2

drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c +144 -33

No files found.
--- a/drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h
@@ -31,7 +31,7 @@ struct __guc_capture_bufstate {
 *
 * A single unit of extracted error-capture output data grouped together
 * at an engine-instance level. We keep these nodes in a linked list.
- * See outlist below.
+ * See cachelist and outlist below.
 */
 struct __guc_capture_parsed_output {
 	/*
@@ -190,7 +190,22 @@ struct intel_guc_state_capture {
 	void *ads_null_cache;

 	/**
-	 * @outlist: allocated nodes with parsed engine-instance error capture data
+	 * @cachelist: Pool of pre-allocated nodes for error capture output
+	 *
+	 * We need this pool of pre-allocated nodes because we cannot
+	 * dynamically allocate new nodes when receiving the G2H notification
+	 * because the event handlers for all G2H event-processing is called
+	 * by the ct processing worker queue and when that queue is being
+	 * processed, there is no absoluate guarantee that we are not in the
+	 * midst of a GT reset operation (which doesn't allow allocations).
+	 */
+	struct list_head cachelist;
+#define PREALLOC_NODES_MAX_COUNT (3 * GUC_MAX_ENGINE_CLASSES * GUC_MAX_INSTANCES_PER_CLASS)
+#define PREALLOC_NODES_DEFAULT_NUMREGS 64
+	int max_mmio_per_node;
+
+	/**
+	 * @outlist: Pool of pre-allocated nodes for error capture output
 	 *
 	 * A linked list of parsed GuC error-capture output data before
 	 * reporting with formatting via i915_gpu_coredump. Each node in this linked list shall

--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c