    drm/i915: detect hang using per ring hangcheck_score · 05407ff8
    Mika Kuoppala authored
    Keep track of ring seqno progress and, if no progress is
    detected, declare a hang. Use the actual head (acthd) to
    distinguish between a stuck ring and a looping batchbuffer.
    A stuck ring will be kicked to trigger progress.
    
    This commit adds a hard limit for batchbuffer completion time.
    If batchbuffer completion time is more than 4.5 seconds,
    the gpu will be declared hung.
    
    Review comment from Ben which nicely clarifies the semantic change:
    
    "Maybe I'm just stating the functional changes of the patch, but in case
    they were unintended here is what I see as potential issues:
    
    1. If ring B is waiting on ring A via semaphore, and ring A is making
       progress, albeit slowly - the hangcheck will fire. The check will
       determine that A is moving, however ring B will appear hung because
       the ACTHD doesn't move. I honestly can't say if that's actually a
       realistic problem to hit; it probably implies the timeout value is
       too low.
    
    2. There's also another corner case on the kick. If the seqno = 2
       (though not stuck), and on the 3rd hangcheck, the ring is stuck, and
       we try to kick it... we don't actually try to find out if the kick
       helped"
    
    v2: use acthd to detect stuck ring from loop (Ben Widawsky)
    
    v3: Use acthd to check when ring needs kicking.
    Declare hang on third time in order to give time for
    kick_ring to take effect.
    
    v4: Update commit msg
    Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
    Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
    [danvet: Paste in Ben's review comment.]
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
i915_irq.c 102 KB