drm/sched: fix the bug of time out calculation(v4)
issue: in cleanup_job the cancle_delayed_work will cancel a TO timer even the its corresponding job is still running. fix: do not cancel the timer in cleanup_job, instead do the cancelling only when the heading job is signaled, and if there is a "next" job we start_timeout again. v2: further cleanup the logic, and do the TDR timer cancelling if the signaled job is the last one in its scheduler. v3: change the issue description remove the cancel_delayed_work in the begining of the cleanup_job recover the implement of drm_sched_job_begin. v4: remove the kthread_should_park() checking in cleanup_job routine, we should cleanup the signaled job asap TODO: 1)introduce pause/resume scheduler in job_timeout to serial the handling of scheduler and job_timeout. 2)drop the bad job's del and insert in scheduler due to above serialization (no race issue anymore with the serialization) Tested-by: jingwen <jingwen.chen@@amd.com> Signed-off-by: Monk Liu <Monk.Liu@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/1630457207-13107-1-git-send-email-Monk.Liu@amd.com
Showing
Please register or sign in to comment