    drm/amdgpu: Adjust removal control flow for smu v13_0_2 · f5c7e779
    YiPeng Chai authored
    Adjust the removal control flow for smu v13_0_2:
    During amdgpu uninstallation, when the first device is
    removed, the kernel must first send a mode1reset message
    to all GPU devices. Otherwise, smu initialization will fail
    the next time amdgpu is installed.
    
    V2:
    1. Update commit comments.
    2. Remove the global variable amdgpu_device_remove_cnt
       and add a variable to the structure amdgpu_hive_info.
    3. Use hive to detect the first removed device instead of
       a global variable.
    
    V3:
     1. Update commit comments.
     2. Split a patch into multiple patches.
     3. The current patch does:
        a. Add a work mode, AMDGPU_RESET_FOR_DEVICE_REMOVE, to
           the existing gpu recover path. In this mode, all devices
           in the hive list get a HW reset but no resume (except
           for the base IP).
        b. Call amdgpu_device_gpu_recover in
           AMDGPU_RESET_FOR_DEVICE_REMOVE and AMDGPU_NEED_FULL_RESET
           mode from amdgpu_pci_remove when removing the first
           device in the hive list.
        c. When removing the first device, the call sequence of the
           key IP block functions is as follows:
    .suspend->mode1reset->.resume(basic ip)->.hw_fini->.early_fini->.sw_fini.
       ^                           |
       |-<----------<---------<----|
           The first three steps come from the call to
           amdgpu_device_gpu_recover. They are executed in a
           loop until all devices in the hive list have been
           iterated.
           The steps starting from .hw_fini apply only to the
           first device. Since .suspend has already been called,
           .hw_fini does nothing for the IP blocks of the current
           device, except for the resumed phase1 basic IP blocks.
        d. When removing the other devices, the call sequence is
           the same as the legacy one:
           .hw_fini -> .early_fini -> .sw_fini.
           Since .suspend was already called when the first device
           was removed, .hw_fini does nothing for the IP blocks of
           the current device, except for the resumed phase1 basic
           IP blocks.
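
The control flow described in 3.c and 3.d can be sketched in plain
user-space C. This is only an illustration under stated assumptions,
not the driver code: struct hive_sketch, its reset_done flag, and all
function names below are hypothetical stand-ins, and each callback is
reduced to a trace entry of the form "<device>:<step>". The per-device
loop order follows the description in this commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEVS  8
#define TRACE_LEN 1024

/* Hypothetical stand-in for the hive bookkeeping; not amdgpu_hive_info. */
struct hive_sketch {
    int  num_devices;
    bool reset_done;        /* hypothetical flag: first removal already reset the hive */
    char trace[TRACE_LEN];  /* records "<device>:<step> " for each simulated callback */
};

/* Append one simulated callback to the trace. */
static void trace_step(struct hive_sketch *h, int dev, const char *step)
{
    char buf[64];

    snprintf(buf, sizeof(buf), "%d:%s ", dev, step);
    strncat(h->trace, buf, sizeof(h->trace) - strlen(h->trace) - 1);
}

static void hive_sketch_init(struct hive_sketch *h, int num_devices)
{
    assert(num_devices > 0 && num_devices <= MAX_DEVS);
    memset(h, 0, sizeof(*h));
    h->num_devices = num_devices;
}

/*
 * Models the reset-for-remove recover path: every device in the hive
 * gets suspend -> mode1reset -> resume of the base IP only.
 */
static void reset_hive_for_remove(struct hive_sketch *h)
{
    for (int i = 0; i < h->num_devices; i++) {
        trace_step(h, i, "suspend");
        trace_step(h, i, "mode1reset");
        trace_step(h, i, "resume_base_ip");
    }
    h->reset_done = true;
}

/*
 * Models device removal: the first removed device triggers the hive-wide
 * reset, then every device runs the legacy teardown sequence.
 */
static void remove_device_sketch(struct hive_sketch *h, int dev)
{
    assert(dev >= 0 && dev < h->num_devices);

    if (!h->reset_done)
        reset_hive_for_remove(h);

    trace_step(h, dev, "hw_fini");
    trace_step(h, dev, "early_fini");
    trace_step(h, dev, "sw_fini");
}
```

Calling hive_sketch_init(&h, 2) and then remove_device_sketch() for
device 0 and device 1 records the reset loop once, for both devices,
before the teardown of device 0. Device 1 then runs only
.hw_fini -> .early_fini -> .sw_fini.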
    Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
    Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_xgmi.h 2.73 KB