    drm/amd: Fail the suspend if resources can't be evicted · 8d4de331
    Mario Limonciello authored
    If a system has no swap and memory is 100% in use, amdgpu fails to
    evict resources.  Currently the suspend carries on anyway and
    proceeds to reset the GPU:
    
    ```
    [drm] evicting device resources failed
    [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <vcn_v3_0> failed -12
    [drm] free PSP TMR buffer
    [TTM] Failed allocating page table
    [drm] evicting device resources failed
    amdgpu 0000:03:00.0: amdgpu: MODE1 reset
    amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
    amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
    ```
    
    At this point, if the suspend had actually succeeded, I think amdgpu
    would have recovered, because the GPU would have had its power cut
    and restored.  However, the kernel cannot continue the suspend
    because of the memory pressure, and amdgpu then fails to run the
    "resume" from the aborted suspend.
    
    ```
    ACPI: PM: Preparing to enter system sleep state S3
    SLUB: Unable to allocate memory on node -1, gfp=0xdc0(GFP_KERNEL|__GFP_ZERO)
      cache: Acpi-State, object size: 80, buffer size: 80, default order: 0, min order: 0
      node 0: slabs: 22, objs: 1122, free: 0
    ACPI Error: AE_NO_MEMORY, Could not update object reference count (20210730/utdelete-651)
    
    [drm:psp_hw_start [amdgpu]] *ERROR* PSP load kdb failed!
    [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
    [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -62
    amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
    PM: dpm_run_callback(): pci_pm_resume+0x0/0x100 returns -62
    amdgpu 0000:03:00.0: PM: failed to resume async: error -62
    ```
    
    To avoid this series of unfortunate events, fail amdgpu's suspend
    when the memory eviction fails.  This will let the system gracefully
    recover and the user can try suspend again when the memory pressure
    is relieved.
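    
    As a rough illustration of the control-flow change described above,
    here is a minimal userspace sketch.  It is not the driver code: the
    struct and every mock_* function are hypothetical stand-ins, and the
    "old" path is only my reading of the log above, where the reset
    still happens after eviction fails with -12 (-ENOMEM).
    
    ```c
    #include <errno.h>
    #include <stdbool.h>
    #include <stdio.h>
    
    /* Hypothetical stand-in for the device; not the real struct amdgpu_device. */
    struct mock_gpu {
        bool memory_exhausted; /* simulates no swap and no free RAM */
        bool hw_reset_done;    /* set once suspend has reset the hardware */
    };
    
    /* Hypothetical eviction step: fails with -ENOMEM under memory pressure. */
    static int mock_evict_resources(const struct mock_gpu *gpu)
    {
        if (gpu->memory_exhausted) {
            fprintf(stderr, "evicting device resources failed\n");
            return -ENOMEM;
        }
        return 0;
    }
    
    /* Assumed old behavior: the eviction result is dropped and the reset runs. */
    static int mock_suspend_old(struct mock_gpu *gpu)
    {
        (void)mock_evict_resources(gpu);
        gpu->hw_reset_done = true;
        return 0;
    }
    
    /* Proposed behavior: fail the suspend before the hardware is touched. */
    static int mock_suspend_new(struct mock_gpu *gpu)
    {
        int r = mock_evict_resources(gpu);
    
        if (r)
            return r; /* caller sees the error and can abort the suspend */
    
        gpu->hw_reset_done = true;
        return 0;
    }
    ```
    
    With memory_exhausted set, mock_suspend_new() returns -ENOMEM and
    leaves hw_reset_done false, so the caller can abort cleanly and retry
    later, whereas mock_suspend_old() reports success and resets anyway.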
    
    Reported-by: post@davidak.de
    Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2223
    Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Acked-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_device.c 163 KB