    drm/amdgpu: Fix crash on device remove/driver unload · d82e2c24
    Andrey Grodzovsky authored
    Crash:
    BUG: unable to handle page fault for address: 00000000000010e1
    RIP: 0010:vega10_power_gate_vce+0x26/0x50 [amdgpu]
    Call Trace:
    pp_set_powergating_by_smu+0x16a/0x2b0 [amdgpu]
    amdgpu_dpm_set_powergating_by_smu+0x92/0xf0 [amdgpu]
    amdgpu_dpm_enable_vce+0x2e/0xc0 [amdgpu]
    vce_v4_0_hw_fini+0x95/0xa0 [amdgpu]
    amdgpu_device_fini_hw+0x232/0x30d [amdgpu]
    amdgpu_driver_unload_kms+0x5c/0x80 [amdgpu]
    amdgpu_pci_remove+0x27/0x40 [amdgpu]
    pci_device_remove+0x3e/0xb0
    device_release_driver_internal+0x103/0x1d0
    device_release_driver+0x12/0x20
    pci_stop_bus_device+0x79/0xa0
    pci_stop_and_remove_bus_device_locked+0x1b/0x30
    remove_store+0x7b/0x90
    dev_attr_store+0x17/0x30
    sysfs_kf_write+0x4b/0x60
    kernfs_fop_write_iter+0x151/0x1e0
    
    Why:
    VCE/UVD had a dependency on the SMC block for their suspend, but
    the SMC block is the first to do HW fini due to some constraints.
    
    How:
    Since the original patch was dealing with suspend issues,
    move the SMC block dependency back into the suspend hooks, as
    was done in V1 of the original patches.
    Keep flushing idle work in both the suspend and HW fini sequences,
    since it's essential in both cases.
    
    Fixes: 859e4659 ("drm/amdgpu: add missing cleanups for more ASICs on UVD/VCE suspend")
    Fixes: bf756fb8 ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")
    Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
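
The ordering problem described under "Why" and "How" can be sketched as a small toy model. This is only an illustration under stated assumptions: every name below (`toy_dev`, `toy_powergate_via_smc`, `toy_vce_hw_fini`, and so on) is hypothetical and is not an amdgpu symbol, and the model returns an error at the point where, per the trace above, the real driver hit a page fault.

```c
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not real amdgpu symbols. */
struct toy_dev {
    bool smc_alive;          /* false once the SMC block has done HW fini */
    bool idle_work_pending;  /* stands in for the delayed idle work */
    int  powergate_calls;    /* counts successful requests made via SMC */
};

/* Stand-in for a power-gating request that is routed through the SMC block. */
static int toy_powergate_via_smc(struct toy_dev *dev)
{
    if (!dev->smc_alive)
        return -1;           /* the model reports an error instead of faulting */
    dev->powergate_calls++;
    return 0;
}

static void toy_flush_idle_work(struct toy_dev *dev)
{
    dev->idle_work_pending = false;
}

/* Suspend hook: SMC is still up here, so the SMC-dependent cleanup is safe. */
static int toy_vce_suspend(struct toy_dev *dev)
{
    toy_flush_idle_work(dev);
    return toy_powergate_via_smc(dev);
}

/* HW fini hook after the fix: flush idle work only, no call into SMC. */
static int toy_vce_hw_fini(struct toy_dev *dev)
{
    toy_flush_idle_work(dev);
    return 0;
}

/* HW fini hook before the fix: still depends on SMC. */
static int toy_vce_hw_fini_old(struct toy_dev *dev)
{
    toy_flush_idle_work(dev);
    return toy_powergate_via_smc(dev);
}

/* Device remove: SMC does HW fini first, then the VCE hook runs. */
static int toy_device_remove(struct toy_dev *dev,
                             int (*vce_fini)(struct toy_dev *))
{
    dev->smc_alive = false;
    return vce_fini(dev);
}
```

In this model, `toy_device_remove(&dev, toy_vce_hw_fini_old)` fails because the SMC stand-in is already gone, while `toy_device_remove(&dev, toy_vce_hw_fini)` succeeds and still flushes the idle work. The SMC-dependent step remains reachable through `toy_vce_suspend`, where the SMC stand-in is still alive.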
vce_v4_0.c 35.6 KB