    drm/amdgpu: remove amdgpu_mes_self_test in gpu recover · a17f574a
    Yifan Zhang authored
    The GPU TLB flush is skipped while the reset semaphore is held. This
    makes mes_self_test fail, because it calls add_hw_queue/remove_hw_queue,
    which need a functional TLB flush. Remove mes_self_test from the GPU
    recovery sequence.
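
    The dependency described above can be sketched as a small toy model.
    This is only an illustration under stated assumptions: every name in it
    (toy_gpu, toy_flush_tlb, toy_add_hw_queue, toy_mes_self_test,
    toy_gpu_recover) is hypothetical and is not the amdgpu API, and the
    "fault" is reduced to a stale-TLB flag.

```c
#include <stdbool.h>

/* Toy model only: every name below is hypothetical, not the amdgpu API. */
struct toy_gpu {
    bool reset_sem_held;   /* true while recovery owns the reset semaphore */
    bool tlb_stale;        /* true when a new mapping has not been flushed */
};

/* Models a TLB flush that is skipped while the reset semaphore is held. */
static bool toy_flush_tlb(struct toy_gpu *gpu)
{
    if (gpu->reset_sem_held)
        return false;      /* skipped: the TLB keeps its stale entries */
    gpu->tlb_stale = false;
    return true;
}

/* Models adding a hardware queue: it maps memory, then needs a flush. */
static int toy_add_hw_queue(struct toy_gpu *gpu)
{
    gpu->tlb_stale = true;            /* new mapping written */
    toy_flush_tlb(gpu);
    return gpu->tlb_stale ? -1 : 0;   /* stale TLB: queue access would fault */
}

/* Models the self test, which depends on adding a hardware queue. */
static int toy_mes_self_test(struct toy_gpu *gpu)
{
    return toy_add_hw_queue(gpu);
}

/* Models recovery; run_self_test selects the old or the patched sequence. */
static int toy_gpu_recover(struct toy_gpu *gpu, bool run_self_test)
{
    int ret = 0;

    gpu->reset_sem_held = true;
    if (run_self_test)
        ret = toy_mes_self_test(gpu);
    gpu->reset_sem_held = false;
    return ret;
}
```

    In this model, toy_gpu_recover(&gpu, true) returns an error because the
    flush is skipped while the semaphore is held, whereas
    toy_gpu_recover(&gpu, false) succeeds, and the self test passes when it
    is run outside of recovery.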
    
    This fixes the recovery failure on gfx11.
    
    [ 1831.768292] [drm] ring sdma_32769.3.3 was added
    [ 1831.768313] [drm] ring gfx_32769.1.1 ib test pass
    [ 1831.768337] [drm] ring compute_32769.2.2 ib test pass
    [ 1831.768399] amdgpu 0000:c2:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:8 pasid:32769, for process  pid 0 thread  pid 0)
    [ 1831.768434] amdgpu 0000:c2:00.0: amdgpu:   in page starting at address 0x0000aec200000000 from client 10
    [ 1831.768456] amdgpu 0000:c2:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00800A30
    [ 1831.768473] amdgpu 0000:c2:00.0: amdgpu:      Faulty UTCL2 client ID: CPC (0x5)
    [ 1831.768489] amdgpu 0000:c2:00.0: amdgpu:      MORE_FAULTS: 0x0
    [ 1831.768501] amdgpu 0000:c2:00.0: amdgpu:      WALKER_ERROR: 0x0
    [ 1831.768513] amdgpu 0000:c2:00.0: amdgpu:      PERMISSION_FAULTS: 0x3
    [ 1831.768521] amdgpu 0000:c2:00.0: amdgpu:      MAPPING_ERROR: 0x0
    [ 1831.768529] amdgpu 0000:c2:00.0: amdgpu:      RW: 0x0
    [ 1831.931229] amdgpu 0000:c2:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma_32769.3.3 test failed (-110)
    [ 1832.062917] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
    [ 1832.063107] [drm:amdgpu_mes_remove_hw_queue [amdgpu]] *ERROR* failed to remove hardware queue, queue id = 3
    
    Fixes: e2e37888 ("drm/amdgpu: rework lock handling for flush_tlb v2")
    Reported-by: Li Ma <li.ma@amd.com>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_device.c 168 KB