Commit cfb83b1d authored by Monk Liu, committed by Alex Deucher

drm/amdgpu: fix gpu recover missing skipping (v2)

If an application closes its context (CTX) right after submitting an IB,
GPU recovery fails to find the entity behind the guilty job, so the
guilty job is never skipped.

To fix this corner case, move the increment of job->karma out of the
entity iteration.

v2:
Only increment karma if bad->s_priority != KERNEL, because we always
consider a KERNEL job to be correct and always want to recover an
unfinished kernel job (a kernel job is sometimes interrupted by a
VF FLR or another GPU hang event).
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
parent 75bc6099
@@ -463,7 +463,8 @@ void amd_sched_hw_job_reset(struct amd_gpu_scheduler *sched, struct amd_sched_jo
 	}
 	spin_unlock(&sched->job_list_lock);
-	if (bad) {
+	if (bad && bad->s_priority != AMD_SCHED_PRIORITY_KERNEL) {
+		atomic_inc(&bad->karma);
 		/* don't increase @bad's karma if it's from KERNEL RQ,
 		 * becuase sometimes GPU hang would cause kernel jobs (like VM updating jobs)
 		 * corrupt but keep in mind that kernel jobs always considered good.
@@ -474,7 +475,7 @@ void amd_sched_hw_job_reset(struct amd_gpu_scheduler *sched, struct amd_sched_jo
 			spin_lock(&rq->lock);
 			list_for_each_entry_safe(entity, tmp, &rq->entities, list) {
 				if (bad->s_fence->scheduled.context == entity->fence_context) {
-					if (atomic_inc_return(&bad->karma) > bad->sched->hang_limit)
+					if (atomic_read(&bad->karma) > bad->sched->hang_limit)
 						if (entity->guilty)
 							atomic_set(entity->guilty, 1);
 					break;
......
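
The effect of the change can be modelled outside the kernel. The following is a minimal user-space sketch, not the driver code: it uses C11 `<stdatomic.h>` in place of the kernel's `atomic_t` helpers, and the types `struct job`, `struct entity`, `enum prio` and the functions `reset_old`, `reset_new` and `karma_after_reset` are hypothetical stand-ins invented for illustration. It assumes a single flat array of entities instead of per-priority run queues and omits all locking.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the scheduler types. */
enum prio { PRIO_NORMAL, PRIO_KERNEL };

struct entity {
    unsigned long fence_context; /* identifies the submitting context */
    atomic_int *guilty;          /* may be NULL, as in the diff */
};

struct job {
    atomic_int karma;
    enum prio s_priority;
    unsigned long context;       /* stands in for s_fence->scheduled.context */
    int hang_limit;
};

/* Pre-patch shape: karma is only bumped when a matching entity is found. */
static void reset_old(struct job *bad, struct entity *ents, size_t n)
{
    if (!bad)
        return;
    for (size_t i = 0; i < n; i++) {
        if (bad->context == ents[i].fence_context) {
            /* fetch_add returns the old value, so add 1 to mimic inc_return */
            if (atomic_fetch_add(&bad->karma, 1) + 1 > bad->hang_limit)
                if (ents[i].guilty)
                    atomic_store(ents[i].guilty, 1);
            break;
        }
    }
}

/* Post-patch shape: karma is bumped up front, except for kernel jobs. */
static void reset_new(struct job *bad, struct entity *ents, size_t n)
{
    if (!bad || bad->s_priority == PRIO_KERNEL)
        return;
    atomic_fetch_add(&bad->karma, 1);
    for (size_t i = 0; i < n; i++) {
        if (bad->context == ents[i].fence_context) {
            if (atomic_load(&bad->karma) > bad->hang_limit)
                if (ents[i].guilty)
                    atomic_store(ents[i].guilty, 1);
            break;
        }
    }
}

/* Simulates a hang where the submitting context was already closed,
 * so no remaining entity matches the bad job. Returns the job's karma. */
static int karma_after_reset(bool patched, enum prio p)
{
    struct job bad = { .s_priority = p, .context = 42, .hang_limit = 0 };
    struct entity others[1] = { { .fence_context = 7, .guilty = NULL } };

    atomic_init(&bad.karma, 0);
    if (patched)
        reset_new(&bad, others, 1);
    else
        reset_old(&bad, others, 1);
    return atomic_load(&bad.karma);
}
```

In this model, calling `karma_after_reset(false, PRIO_NORMAL)` leaves the karma untouched because no entity matches, which corresponds to the missed skip described in the commit message, while `karma_after_reset(true, PRIO_NORMAL)` increments it regardless of whether the entity still exists. A job with `PRIO_KERNEL` is never penalised on the patched path.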