    drm/amdgpu:implement new GPU recover(v3) · 5740682e
    Monk Liu authored
    1,the new implementation is named amdgpu_gpu_recover, which gives a
    better hint of what it does than gpu_reset
    
    2,gpu_recover unifies bare-metal and SR-IOV; only the ASIC reset
    part is implemented differently
    
    3,gpu_recover increases the karma of the hung job and marks its
    entity/context as guilty if the karma exceeds the limit
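
The karma bookkeeping in point 3 can be pictured with a minimal, user-space sketch. It is not the amdgpu code: the struct names, the `sketch_increase_karma` helper and the `limit` parameter are hypothetical stand-ins chosen for illustration, and the commit message does not state the limit's value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an entity/context; not a kernel structure. */
struct sketch_context {
    bool guilty;                  /* set once a job of this context hangs too often */
};

/* Hypothetical stand-in for a scheduler job. */
struct sketch_karma_job {
    int karma;                    /* number of hangs attributed to this job */
    struct sketch_context *ctx;   /* context that submitted the job, may be NULL */
};

/*
 * Called once per recovery for the job that hung.
 * Increments the job's karma and marks the owning context guilty when the
 * karma exceeds `limit` (an assumed, caller-supplied threshold).
 * Returns true if the context is guilty after the call.
 */
static bool sketch_increase_karma(struct sketch_karma_job *job, int limit)
{
    job->karma++;
    if (job->ctx != NULL && job->karma > limit)
        job->ctx->guilty = true;
    return job->ctx != NULL && job->ctx->guilty;
}
```

With a limit of 1, the first hang only raises the karma and the second hang marks the context guilty.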
    
    V2:
    
    4,in the scheduler main routine, a job from a guilty context is
    fake-signaled immediately after it is popped from the queue, and its
    fence is set with the "-ECANCELED" error
    
    5,in the scheduler recovery routine, all jobs from the guilty entity are
    dropped
    
    6,in the run_job() routine, the real IB submission is skipped if the @skip
    parameter is true or if VRAM was lost.
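
The V2 behaviour in points 4 and 6 can be illustrated with a small, self-contained sketch. This is an assumption-laden model rather than the scheduler code: `sketch_fence`, `sketch_sched_job`, `sketch_run_job` and `sketch_sched_process` are hypothetical names, and only the `skip` flag, the VRAM-lost condition and the `-ECANCELED` error come from the commit message.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a fence; not the kernel's dma_fence. */
struct sketch_fence {
    bool signaled;
    int error;                    /* 0, or a negative errno value */
};

/* Hypothetical stand-in for a scheduler job. */
struct sketch_sched_job {
    bool ctx_guilty;              /* the owning context was marked guilty */
    struct sketch_fence fence;
    int ib_submissions;           /* counts real submissions to the hardware */
};

/*
 * Models run_job(): the real IB submission is skipped when `skip` is true
 * or when VRAM was lost. Returns true if the job was actually submitted.
 */
static bool sketch_run_job(struct sketch_sched_job *job, bool skip, bool vram_lost)
{
    if (skip || vram_lost)
        return false;
    job->ib_submissions++;
    return true;
}

/*
 * Models the scheduler main routine after a job is popped from the queue:
 * a job from a guilty context is fake-signaled with -ECANCELED and never
 * reaches the hardware; any other job goes through sketch_run_job().
 */
static void sketch_sched_process(struct sketch_sched_job *job, bool vram_lost)
{
    if (job->ctx_guilty) {
        job->fence.error = -ECANCELED;
        job->fence.signaled = true;
        return;
    }
    sketch_run_job(job, false, vram_lost);
}
```

In this model a guilty job ends with a signaled fence carrying `-ECANCELED` and zero submissions, while an innocent job is submitted unless VRAM was lost.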
    
    V3:
    
    7,replace the deprecated gpu reset with the new gpu recover
    Signed-off-by: Monk Liu <Monk.Liu@amd.com>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
mxgpu_vi.c 20.6 KB