1. 10 Jan, 2024 1 commit
  2. 09 Jan, 2024 3 commits
    • John Harrison's avatar
      drm/i915/guc: Avoid circular locking issue on busyness flush · 0e00a881
      John Harrison authored
      Avoid the following lockdep complaint:
      <4> [298.856498] ======================================================
      <4> [298.856500] WARNING: possible circular locking dependency detected
      <4> [298.856503] 6.7.0-rc5-CI_DRM_14017-g58ac4ffc75b6+ #1 Tainted: G
          N
      <4> [298.856505] ------------------------------------------------------
      <4> [298.856507] kworker/4:1H/190 is trying to acquire lock:
      <4> [298.856509] ffff8881103e9978 (&gt->reset.backoff_srcu){++++}-{0:0}, at:
      _intel_gt_reset_lock+0x35/0x380 [i915]
      <4> [298.856661]
      but task is already holding lock:
      <4> [298.856663] ffffc900013f7e58
      ((work_completion)(&(&guc->timestamp.work)->work)){+.+.}-{0:0}, at:
      process_scheduled_works+0x264/0x530
      <4> [298.856671]
      which lock already depends on the new lock.
      
      The complaint is not actually valid. The busyness worker thread does
      indeed hold the worker lock and then attempt to acquire the reset lock
      (which may have happened in reverse order elsewhere). However, it does
      so with a trylock that exits if the reset lock is not available
      (specifically to prevent this and other similar deadlocks).
      Unfortunately, lockdep does not understand the trylock semantics (the
      lock is an i915 specific custom implementation for resets).
      
      Not doing a synchronous flush of the worker thread when a reset is in
      progress resolves the lockdep splat by never even attempting to grab
      the lock in this particular scenario.
      
      There are situatons where a synchronous cancel is required, however.
      So, always do the synchronous cancel if not in reset. And add an extra
      synchronous cancel to the end of the reset flow to account for when a
      reset is occurring at driver shutdown and the cancel is no longer
      synchronous but could lead to unallocated memory accesses if the
      worker is not stopped.
      Signed-off-by: default avatarZhanjun Dong <zhanjun.dong@intel.com>
      Signed-off-by: default avatarJohn Harrison <John.C.Harrison@Intel.com>
      Cc: Andi Shyti <andi.shyti@linux.intel.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Reviewed-by: default avatarAndi Shyti <andi.shyti@linux.intel.com>
      Acked-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20231219195957.212600-1-John.C.Harrison@Intel.com
      0e00a881
    • Alan Previn's avatar
      drm/i915/guc: Close deregister-context race against CT-loss · 2f2cc53b
      Alan Previn authored
      If we are at the end of suspend or very early in resume
      its possible an async fence signal (via rcu_call) is triggered
      to free_engines which could lead us to the execution of
      the context destruction worker (after a prior worker flush).
      
      Thus, when suspending, insert rcu_barriers at the start
      of i915_gem_suspend (part of driver's suspend prepare) and
      again in i915_gem_suspend_late so that all such cases have
      completed and context destruction list isn't missing anything.
      
      In destroyed_worker_func, close the race against CT-loss
      by checking that CT is enabled before calling into
      deregister_destroyed_contexts.
      
      Based on testing, guc_lrc_desc_unpin may still race and fail
      as we traverse the GuC's context-destroy list because the
      CT could be disabled right before calling GuC's CT send function.
      
      We've witnessed this race condition once every ~6000-8000
      suspend-resume cycles while ensuring workloads that render
      something onscreen is continuously started just before
      we suspend (and the workload is small enough to complete
      and trigger the queued engine/context free-up either very
      late in suspend or very early in resume).
      
      In such a case, we need to unroll the entire process because
      guc-lrc-unpin takes a gt wakeref which only gets released in
      the G2H IRQ reply that never comes through in this corner
      case. Without the unroll, the taken wakeref is leaked and will
      cascade into a kernel hang later at the tail end of suspend in
      this function:
      
         intel_wakeref_wait_for_idle(&gt->wakeref)
         (called by) - intel_gt_pm_wait_for_idle
         (called by) - wait_for_suspend
      
      Thus, do an unroll in guc_lrc_desc_unpin and deregister_destroyed_-
      contexts if guc_lrc_desc_unpin fails due to CT send falure.
      When unrolling, keep the context in the GuC's destroy-list so
      it can get picked up on the next destroy worker invocation
      (if suspend aborted) or get fully purged as part of a GuC
      sanitization (end of suspend) or a reset flow.
      Signed-off-by: default avatarAlan Previn <alan.previn.teres.alexis@intel.com>
      Signed-off-by: default avatarAnshuman Gupta <anshuman.gupta@intel.com>
      Tested-by: default avatarMousumi Jana <mousumi.jana@intel.com>
      Acked-by: default avatarDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Reviewed-by: default avatarRodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: default avatarMatt Roper <matthew.d.roper@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20231229215143.581619-1-alan.previn.teres.alexis@intel.com
      2f2cc53b
    • Alan Previn's avatar
      drm/i915/guc: Flush context destruction worker at suspend · 5e83c060
      Alan Previn authored
      When suspending, flush the context-guc-id
      deregistration worker at the final stages of
      intel_gt_suspend_late when we finally call gt_sanitize
      that eventually leads down to __uc_sanitize so that
      the deregistration worker doesn't fire off later as
      we reset the GuC microcontroller.
      Signed-off-by: default avatarAlan Previn <alan.previn.teres.alexis@intel.com>
      Reviewed-by: default avatarRodrigo Vivi <rodrigo.vivi@intel.com>
      Tested-by: default avatarMousumi Jana <mousumi.jana@intel.com>
      Signed-off-by: default avatarMatt Roper <matthew.d.roper@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20231228045558.536585-2-alan.previn.teres.alexis@intel.com
      5e83c060
  3. 06 Jan, 2024 1 commit
  4. 05 Jan, 2024 2 commits
  5. 02 Jan, 2024 1 commit
  6. 29 Dec, 2023 4 commits
  7. 22 Dec, 2023 1 commit
  8. 19 Dec, 2023 1 commit
  9. 15 Dec, 2023 9 commits
  10. 14 Dec, 2023 2 commits
  11. 13 Dec, 2023 1 commit
  12. 11 Dec, 2023 1 commit
  13. 08 Dec, 2023 2 commits
  14. 07 Dec, 2023 2 commits
  15. 30 Nov, 2023 3 commits
  16. 29 Nov, 2023 1 commit
  17. 28 Nov, 2023 1 commit
  18. 21 Nov, 2023 1 commit
  19. 20 Nov, 2023 3 commits