    drm/radeon: Avoid double gpu reset by adding a timeout on IB ring tests. · 04db4caf
    Matthew Dawson authored
    When the radeon driver resets a gpu, it attempts to test whether all the
    rings can successfully handle an IB.  If these rings fail to respond, the
    process will wait forever.  Another gpu reset can't happen at this point,
    as the current reset holds a lock required to do so.  Instead, make all
    the IB tests run with a timeout, so the system can attempt to recover
    in this case.
    
    While this doesn't fix the underlying issue with card resets failing, it
    gives the system a higher chance of recovering.  These timeouts have been
    confirmed to help both a Tahiti and a Hawaii card recover after a gpu reset.
    
    This also adds a new function, radeon_fence_wait_timeout, that behaves like
    fence_wait_timeout.  It is used instead of fence_wait_timeout as it continues
    to work during a reset.  radeon_fence_wait is changed to be implemented
    using this function.
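
    As a rough illustration of the semantics described above, the following
    userspace sketch models a timed fence wait that follows the kernel's
    fence_wait_timeout convention (remaining time on success, 0 on timeout),
    an unbounded wait built on top of it, and an IB test that turns a timeout
    into an error.  All demo_* names and the tick-based fence model are
    hypothetical stand-ins, not the driver's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a fence on a simulated clock (ticks ~ ms). */
struct demo_fence {
	bool hung;        /* true: the ring never responds */
	long signal_tick; /* tick at which a healthy fence signals */
};

/* Plays the role of an "infinite" timeout in this sketch. */
#define DEMO_MAX_TIMEOUT LONG_MAX

/*
 * Wait up to `timeout` ticks for the fence.
 * Returns the ticks remaining (at least 1) if the fence signaled in time,
 * or 0 if the timeout elapsed first.
 */
static long demo_fence_wait_timeout(const struct demo_fence *f, long timeout)
{
	long remaining;

	assert(f != NULL && timeout >= 0);
	if (f->hung || f->signal_tick > timeout)
		return 0;

	remaining = timeout - f->signal_tick;
	return remaining > 0 ? remaining : 1;
}

/*
 * Unbounded wait expressed through the timed wait: 0 on success.
 * A real unbounded wait on a hung fence would never return; this
 * simulation reports -1 instead so that it stays runnable.
 */
static int demo_fence_wait(const struct demo_fence *f)
{
	long r = demo_fence_wait_timeout(f, DEMO_MAX_TIMEOUT);

	return r > 0 ? 0 : -1;
}

/* IB test with a bounded wait: a timeout becomes -ETIMEDOUT. */
static int demo_ring_ib_test(const struct demo_fence *f, long timeout)
{
	long r = demo_fence_wait_timeout(f, timeout);

	if (r == 0)
		return -ETIMEDOUT;
	return 0;
}
```

    With a 1000-tick budget (mirroring the 1s timeout chosen in V2), a test
    against a hung fence returns -ETIMEDOUT instead of blocking, which is
    what lets the caller release the reset lock and try again.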
    
    V2:
     - Changed the timeout to 1s, as the default 10s from radeon_wait_timeout was
    too long.  A timeout of 100ms was tested and found to be too short.
     - Changed radeon_fence_wait_timeout to behave more like fence_wait_timeout.
    
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Matthew Dawson <matthew@mjdsystems.ca>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
r100.c 117 KB