    drm/amdgpu: timely save bad pages to eeprom after gpu ras reset is completed · e23300df
    YiPeng Chai authored
    The problem case is as follows:
    1. GPU A triggers a gpu ras reset, and GPU A drives
       GPU B to also perform a gpu ras reset.
    2. After GPU B's ras reset has started, GPU B queries DE
       data. Because the DE data is queried in the ras reset
       thread instead of the page retirement thread, the bad
       page retirement work is not triggered. As a result,
       even after all gpu resets have completed, the bad pages
       stay cached in RAM and are not saved to eeprom until
       GPU B's bad page retirement work is triggered again.
    
    This patch saves the bad pages to eeprom promptly once the
    gpu ras reset has completed.
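
    The scenario above can be sketched as a small standalone C model.
    This is an illustration only, not the amdgpu code: struct gpu_model,
    record_bad_page(), reset_begin(), reset_complete() and the
    flush_on_complete flag are all hypothetical names, and the eeprom is
    simulated with a plain array. It assumes that a bad page recorded
    while a reset is in progress is only cached, and that flushing at
    reset completion is what closes the gap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BAD_PAGES 16

/* Hypothetical model of one GPU's bad-page bookkeeping (not the amdgpu structs). */
struct gpu_model {
    uint64_t cached[MAX_BAD_PAGES]; /* bad pages held in RAM */
    size_t cached_count;            /* number of entries in cached[] */
    uint64_t eeprom[MAX_BAD_PAGES]; /* simulated persistent storage */
    size_t saved_count;             /* entries of cached[] already in eeprom[] */
    bool in_reset;                  /* true while the reset thread is running */
};

/* Copy every cached-but-unsaved page to the simulated eeprom.
 * Returns the number of pages written by this call. */
static size_t save_bad_pages(struct gpu_model *gpu)
{
    size_t written = 0;

    while (gpu->saved_count < gpu->cached_count) {
        gpu->eeprom[gpu->saved_count] = gpu->cached[gpu->saved_count];
        gpu->saved_count++;
        written++;
    }
    return written;
}

/* Record a bad page found by an error query. Outside a reset this
 * stands in for the page retirement path and saves immediately;
 * during a reset the page is only cached in RAM. */
static void record_bad_page(struct gpu_model *gpu, uint64_t page)
{
    assert(gpu->cached_count < MAX_BAD_PAGES);
    gpu->cached[gpu->cached_count++] = page;
    if (!gpu->in_reset)
        save_bad_pages(gpu);
}

static void reset_begin(struct gpu_model *gpu)
{
    gpu->in_reset = true;
}

/* Finish the reset. With flush_on_complete set, pages cached during
 * the reset are saved right away instead of waiting for the next
 * retirement event. Returns the number of pages written. */
static size_t reset_complete(struct gpu_model *gpu, bool flush_on_complete)
{
    gpu->in_reset = false;
    return flush_on_complete ? save_bad_pages(gpu) : 0;
}
```

    In this model, calling reset_complete() with flush_on_complete set
    to false leaves the page recorded during the reset unsaved, which
    mirrors the problem case; setting it to true writes the page as soon
    as the reset finishes.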
    
    v2:
      1. Add the above description to code comments.
      2. Reuse existing function.
    Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
    Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_ras.c 128 KB