    drm/amdgpu: Fix bugs in amdgpu_device_gpu_recover in XGMI case. · 12ffa55d
    Andrey Grodzovsky authored
    Issue 1:
    In the XGMI case, amdgpu_device_lock_adev for the other devices in the hive
    was called too late, after their respective schedulers had already been
    accessed. So relocate the lock to before the first access to the other devices.
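
    The ordering described in Issue 1 can be illustrated with a small
    stand-alone model. This is only a sketch, not the driver code: struct
    toy_dev, toy_lock, toy_stop_sched and toy_prepare_hive are hypothetical
    names invented here, and the "lock" is a plain flag standing in for
    whatever amdgpu_device_lock_adev actually takes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device in the hive; not the kernel struct. */
struct toy_dev {
    bool locked;        /* models the state taken by the device lock */
    bool sched_stopped; /* models a scheduler that has been stopped */
};

/* Models taking the per-device lock. */
void toy_lock(struct toy_dev *d)
{
    d->locked = true;
}

/*
 * Models touching a device's scheduler. In this toy model the access is
 * only valid while the device is locked; returns false on a violation.
 */
bool toy_stop_sched(struct toy_dev *d)
{
    if (!d->locked)
        return false;
    d->sched_stopped = true;
    return true;
}

/*
 * Lock every device in the hive BEFORE touching its scheduler.
 * Returns the number of unlocked scheduler accesses (0 when the
 * ordering is correct).
 */
int toy_prepare_hive(struct toy_dev *hive, size_t n)
{
    int violations = 0;

    assert(hive != NULL);
    for (size_t i = 0; i < n; i++) {
        toy_lock(&hive[i]);               /* lock first ... */
        if (!toy_stop_sched(&hive[i]))    /* ... then access the scheduler */
            violations++;
    }
    return violations;
}
```

    In this model, calling toy_stop_sched on a device that has not been
    locked yet is reported as a violation, which is the situation the
    relocation of the lock avoids.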
    
    Issue 2:
    Using amdgpu_device_ip_need_full_reset to switch the device list from
    all devices in the hive to the single 'master' device that owns this reset
    call is wrong, because when stopping schedulers we iterate over all the
    devices in the hive but when restarting we only reactivate the 'master'
    device. Also, in case amdgpu_device_pre_asic_reset concludes that a full
    reset IS needed, we then have to stop schedulers for all devices in the
    hive and not only the 'master', but with amdgpu_device_ip_need_full_reset
    we have already missed the opportunity to do so. So just remove this logic
    and always stop and start all schedulers for all devices in the hive.
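
    The stop/start symmetry described in Issue 2 can be sketched as a
    stand-alone model. This is an illustration under assumptions, not the
    driver code: struct toy_hive_dev, toy_stop_all, toy_start_all and
    toy_recover are hypothetical names, and need_full_reset is just a
    parameter standing in for the outcome of the pre-reset check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device in the hive; not the kernel struct. */
struct toy_hive_dev {
    bool sched_running;
};

/* Stop the scheduler of every device in the hive. */
void toy_stop_all(struct toy_hive_dev *hive, size_t n)
{
    for (size_t i = 0; i < n; i++)
        hive[i].sched_running = false;
}

/* Restart the scheduler of every device in the hive. */
void toy_start_all(struct toy_hive_dev *hive, size_t n)
{
    for (size_t i = 0; i < n; i++)
        hive[i].sched_running = true;
}

/*
 * Model of the recovery flow: the same full device list is used for
 * stopping and for restarting, regardless of need_full_reset.
 * Returns how many schedulers are running afterwards.
 */
size_t toy_recover(struct toy_hive_dev *hive, size_t n, bool need_full_reset)
{
    size_t running = 0;

    assert(hive != NULL);
    toy_stop_all(hive, n);

    /* In this model the flag would only select the kind of reset;
     * it never narrows the list of devices. */
    (void)need_full_reset;

    toy_start_all(hive, n);

    for (size_t i = 0; i < n; i++)
        if (hive[i].sched_running)
            running++;
    return running;
}
```

    Because both loops walk the same list, no scheduler that was stopped
    is left stopped after recovery, whichever value need_full_reset has.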
    
    Also minor cleanup and print fix.
    
    v4: Minor coding style fix.
    Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
    Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_device.c 109 KB