1. 23 Jul, 2024 34 commits
  2. 22 Jul, 2024 4 commits
  3. 18 Jul, 2024 2 commits
    • drm/xe: Don't suspend device upon wedge · 90936a0a
      Matthew Brost authored
      When wedging a device, we shouldn't suspend it, because the state needed
      for debugging would be lost.
      
      Suspending also does not appear to work: the stack trace below shows up
      when trying to resume a wedged device:
      
      [  304.245044] INFO: task cat:12115 blocked for more than 151 seconds.
      [  304.251333]       Tainted: G        W          6.10.0-rc7-xe+ #3518
      [  304.257617] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  304.265459] task:cat             state:D stack:13384 pid:12115 tgid:12115 ppid:3986   flags:0x00000006
      [  304.265465] Call Trace:
      [  304.265467]  <TASK>
      [  304.265469]  __schedule+0x3c4/0xdf0
      [  304.265478]  schedule+0x3c/0x140
      [  304.265481]  rpm_resume+0x1cc/0x740
      [  304.265484]  ? __pfx_autoremove_wake_function+0x10/0x10
      [  304.265489]  __pm_runtime_resume+0x49/0x80
      [  304.265494]  guc_info+0x6b/0xb0 [xe]
      [  304.265538]  ? __pfx___drm_printfn_seq_file+0x10/0x10
      [  304.265541]  ? __pfx___drm_puts_seq_file+0x10/0x10
      [  304.265545]  seq_read_iter+0x111/0x4c0
      [  304.265551]  seq_read+0xfc/0x140
      [  304.265556]  full_proxy_read+0x58/0x80
      [  304.265560]  vfs_read+0xa7/0x360
      [  304.265563]  ? find_held_lock+0x2b/0x80
      [  304.265568]  ksys_read+0x64/0xe0
      [  304.265571]  do_syscall_64+0x68/0x140
      [  304.265575]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
      [  304.265578] RIP: 0033:0x7f4254d14992
      [  304.265580] RSP: 002b:00007ffc558666f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
      [  304.265583] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f4254d14992
      [  304.265584] RDX: 0000000000020000 RSI: 00007f4254ebb000 RDI: 0000000000000003
      [  304.265586] RBP: 00007f4254ebb000 R08: 00007f4254eba010 R09: 00007f4254eba010
      [  304.265587] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000022000
      [  304.265588] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
      [  304.265593]  </TASK>
      [  304.265594]
                     Showing all locks held in the system:
      [  304.265598] 1 lock held by khungtaskd/57:
      [  304.265599]  #0: ffffffff8273b860 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x36/0x1c0
      [  304.265607] 3 locks held by kworker/6:1/90:
      [  304.265610] 1 lock held by in:imklog/547:
      [  304.265611]  #0: ffff88810498cd88 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x76/0xc0
      [  304.265620] 1 lock held by dmesg/1310:
      
      v2: Drop local 'err' variable (Jonathan)
      
      Fixes: 8ed9aaae ("drm/xe: Force wedged state and block GT reset upon any GPU hang")
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: Matthew Brost <matthew.brost@intel.com>
      Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240716063902.1390130-2-matthew.brost@intel.com
      (cherry picked from commit 452bca0e)
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
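The commit message above does not say how the suspend is prevented. One plausible approach is to pin the device awake by holding a runtime-PM reference for as long as it stays wedged. The following is a minimal userspace sketch of that idea only, not the driver's code: `toy_pm_device`, `toy_pm_get()`, `toy_pm_put()` and `toy_declare_wedged()` are hypothetical names invented for illustration, and the counter is a simplified stand-in for the kernel's runtime-PM usage count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy device with a runtime-PM style usage counter. */
struct toy_pm_device {
	int pm_usage;   /* outstanding "keep awake" references */
	bool suspended; /* set once the last reference is dropped */
	bool wedged;    /* set once the device is declared wedged */
};

/* Take a reference; in this toy model that also wakes the device. */
static void toy_pm_get(struct toy_pm_device *dev)
{
	dev->pm_usage++;
	dev->suspended = false;
}

/* Drop a reference; the device suspends when the count reaches zero. */
static void toy_pm_put(struct toy_pm_device *dev)
{
	assert(dev->pm_usage > 0);
	if (--dev->pm_usage == 0)
		dev->suspended = true;
}

/*
 * Declare the device wedged. The extra reference taken here is never
 * dropped while the device is wedged, so the count cannot reach zero
 * and the debug state stays available. Repeated calls are no-ops.
 */
static void toy_declare_wedged(struct toy_pm_device *dev)
{
	if (dev->wedged)
		return;
	dev->wedged = true;
	toy_pm_get(dev);
}
```

In this model, a caller that takes a reference, wedges the device, and then drops its own reference leaves the count at one, so the device never enters the suspended state.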
    • drm/xe: Wedge the entire device · c9474b72
      Matthew Brost authored
      Wedge the entire device, not just the GT that may have triggered the wedge.
      To implement this, clean up the layering so that xe_device_declare_wedged()
      calls into the lower layers (GT) to ensure the entire device is wedged.
      
      While we are here, also signal any pending GT TLB invalidations upon
      wedging the device.
      
      Lastly, short-circuit the reset wait if the device is wedged.
      
      v2:
       - Short circuit reset wait if device is wedged (Local testing)
      
      Fixes: 8ed9aaae ("drm/xe: Force wedged state and block GT reset upon any GPU hang")
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: Matthew Brost <matthew.brost@intel.com>
      Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240716063902.1390130-1-matthew.brost@intel.com
      (cherry picked from commit 7dbe8af1)
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
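The layering described in the commit message above can be illustrated with a small userspace model: a device-level entry point marks the device wedged and then calls into every GT, each GT completes its pending TLB invalidations, and the reset wait returns early once the device is wedged. This is a sketch under stated assumptions, not the xe driver's implementation: all `toy_*` names, the fixed GT count, and the integer counter standing in for pending invalidation fences are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_GT_COUNT 2 /* assumed number of GTs, for illustration only */

/* Hypothetical per-GT state. */
struct toy_wedge_gt {
	bool wedged;
	int pending_tlb_invals; /* waiters that would otherwise keep blocking */
};

/* Hypothetical device owning all GTs. */
struct toy_wedge_dev {
	bool wedged;
	struct toy_wedge_gt gt[TOY_GT_COUNT];
};

/* Lower layer: wedge one GT and signal its pending TLB invalidations. */
static void toy_gt_declare_wedged(struct toy_wedge_gt *gt)
{
	gt->wedged = true;
	gt->pending_tlb_invals = 0; /* nothing is left waiting on this GT */
}

/* Upper layer: wedge the device, then fan out to every GT. */
static void toy_dev_declare_wedged(struct toy_wedge_dev *dev)
{
	dev->wedged = true;
	for (size_t i = 0; i < TOY_GT_COUNT; i++)
		toy_gt_declare_wedged(&dev->gt[i]);
}

/*
 * Returns true if the caller still has to wait for a reset to finish.
 * A wedged device will never complete a reset, so the wait is skipped.
 */
static bool toy_reset_wait_needed(const struct toy_wedge_dev *dev,
				  bool reset_in_flight)
{
	if (dev->wedged)
		return false;
	return reset_in_flight;
}
```

Keeping the entry point at the device level means a hang detected on one GT wedges its siblings through the same path, rather than each GT tracking the state independently.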