    drm/sched: Check scheduler work queue before calling timeout handling · 2da5bffe
    Vitaly Prosyak authored
    During an IGT GPU reset test we again see an oops, despite
    commit 0c8c901aaaebc9 ("drm/sched: Check scheduler ready before calling
    timeout handling").
    
    That commit uses the ready condition to decide whether drm_sched_fault()
    kicks off the TDR (timeout detection and recovery) handling, which leads to a GPU reset.
    However, the ready condition appears to be overloaded with other meanings.
    For example, the following stack, which is related to GPU reset:
    
    0  gfx_v9_0_cp_gfx_start
    1  gfx_v9_0_cp_gfx_resume
    2  gfx_v9_0_cp_resume
    3  gfx_v9_0_hw_init
    4  gfx_v9_0_resume
    5  amdgpu_device_ip_resume_phase2
    
    does the following:
    	/* start the ring */
    	gfx_v9_0_cp_gfx_start(adev);
    	ring->sched.ready = true;
    
    The same approach is used for other ASICs as well:
    gfx_v8_0_cp_gfx_resume
    gfx_v10_0_kiq_resume, etc.
    
    As a result, our GPU reset test triggers a GPU fault, which unconditionally calls
    gfx_v9_0_fault() and then drm_sched_fault(). Whether the oops happens now depends on
    timing. If drm_sched_fault(), called from the interrupt service routine, runs after
    gfx_v9_0_cp_gfx_start() has completed, the scheduler's ready field has already been
    set to true, even for uninitialized schedulers, and the resulting NULL pointer
    dereference causes the oops. If the ISR's drm_sched_fault() completes before
    gfx_v9_0_cp_gfx_start(), no NULL pointer dereference occurs.
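
    The timing dependence can be illustrated with a small user-space model. Everything
    in it is hypothetical: mock_sched, mock_workqueue and the helper functions stand in
    for struct drm_gpu_scheduler, its timeout work queue, mod_delayed_work() and a ring
    resume path such as gfx_v9_0_cp_gfx_resume(). Instead of crashing, the model reports
    where the kernel would dereference a NULL work queue. It assumes that the pre-fix
    drm_sched_fault() queued the TDR work whenever ready was true, and that a
    never-initialized scheduler has a NULL timeout work queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (not the real API). */
struct mock_workqueue {
	int queued;                         /* number of TDR kicks queued */
};

struct mock_sched {
	bool ready;                         /* also written by ring resume code */
	struct mock_workqueue *timeout_wq;  /* NULL until scheduler init runs */
};

/* Models mod_delayed_work(): returns false where the kernel would
 * dereference a NULL work queue pointer and oops. */
static bool mock_mod_delayed_work(struct mock_workqueue *wq)
{
	if (wq == NULL)
		return false;
	wq->queued++;
	return true;
}

/* Models the pre-fix drm_sched_fault(): the guard trusts `ready`.
 * Returns true when no NULL dereference would happen. */
static bool fault_guarded_by_ready(struct mock_sched *sched)
{
	if (!sched->ready)
		return true;                /* nothing queued, nothing dereferenced */
	return mock_mod_delayed_work(sched->timeout_wq);
}

/* Models a resume path such as gfx_v9_0_cp_gfx_resume(): it sets
 * `ready` even if this scheduler was never initialized. */
static void resume_ring(struct mock_sched *sched)
{
	sched->ready = true;
}

/* Replays both orderings of the fault against the resume path on a
 * scheduler that was never initialized (all fields zero). */
static void replay_orderings(void)
{
	struct mock_sched isr_first = {0};
	assert(fault_guarded_by_ready(&isr_first));  /* fault before resume: no oops */
	resume_ring(&isr_first);                     /* resume completes afterwards */

	struct mock_sched isr_last = {0};
	resume_ring(&isr_last);
	assert(!fault_guarded_by_ready(&isr_last));  /* fault after resume: oops */
}
```

    In this model the two schedulers differ only in when resume_ring() runs relative
    to the fault, which is the same ordering dependence described above.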
    
    Use the timeout_wq field instead to prevent the oops for uninitialized schedulers.
    The field can be initialized with the reset domain's work queue.
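
    A similar hypothetical user-space model (all names are illustrative stand-ins,
    not kernel API) shows a check keyed off timeout_wq instead of ready.
    mock_sched_init() encodes an assumption about scheduler initialization: it always
    stores a non-NULL work queue, either the one supplied by the driver (such as the
    reset domain's work queue) or a default. A zeroed, never-initialized scheduler keeps
    timeout_wq NULL no matter what resume code does to ready.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (not the real API). */
struct mock_workqueue {
	int queued;                         /* number of TDR kicks queued */
};

struct mock_sched {
	bool ready;                         /* may be set by resume code; ignored below */
	struct mock_workqueue *timeout_wq;  /* written only by mock_sched_init() */
};

static struct mock_workqueue mock_system_wq;  /* stands in for a default queue */

/* Models mod_delayed_work(): false marks a would-be NULL dereference. */
static bool mock_mod_delayed_work(struct mock_workqueue *wq)
{
	if (wq == NULL)
		return false;
	wq->queued++;
	return true;
}

/* Assumed init behaviour: use the caller's queue (e.g. the reset
 * domain's), otherwise fall back to a default, so it is never NULL. */
static void mock_sched_init(struct mock_sched *sched, struct mock_workqueue *wq)
{
	sched->timeout_wq = (wq != NULL) ? wq : &mock_system_wq;
	assert(sched->timeout_wq != NULL);
}

/* Models the fixed drm_sched_fault(): only a scheduler with a non-NULL
 * timeout_wq gets its TDR work kicked. Returns true when no NULL
 * dereference would happen. */
static bool fault_guarded_by_wq(struct mock_sched *sched)
{
	if (sched->timeout_wq == NULL)
		return true;                /* uninitialized: ignore the fault */
	return mock_mod_delayed_work(sched->timeout_wq);
}
```

    Because nothing outside initialization writes timeout_wq in this model, driver
    code that flips ready cannot make an uninitialized scheduler pass the check.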
    
    v1: Corrections to commit message (Luben)
    
    Fixes: 11b3b9f4 ("drm/sched: Check scheduler ready before calling timeout handling")
    Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
    Link: https://lore.kernel.org/r/20230510135111.58631-1-vitaly.prosyak@amd.com
    Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
    Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>