drm/amdgpu: race issue when jobs on 2 ring timeout
Fix a racing issue when jobs on 2 rings timeout simultaneously. If 2 rings timed out at the same time, the amdgpu_device_gpu_recover will be reentered. Then the adev->gmc.xgmi.head will be grabbed by 2 local linked list, which may cause wild pointer issue in iterating. lock the device earily to prevent the node be added to 2 different lists. also increase karma for the skipped job since the job is also timed out and should be guilty. Signed-off-by: Horace Chen <horace.chen@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Showing
Please register or sign in to comment