    drm/i915: Update reset path to fix incomplete requests · 821ed7df
    Chris Wilson authored
    Update the reset path in preparation for engine reset, which requires
    identifying the incomplete requests and their associated contexts and
    fixing up their state so that the engine can resume correctly after the
    reset.
    
    The request that caused the hang is skipped and the ring head is reset to
    the start of its breadcrumb, which allows us to resume from where we left
    off. Since this request did not complete normally, we also need to clean
    up the ELSP queue manually. This is vital if we employ nonblocking request
    submission, where we may have a web of dependencies upon the hung request
    and so advancing the seqno manually is no longer trivial.
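
    As a rough illustration of the recovery step described above, the
    following standalone sketch finds the first request whose breadcrumb
    was never written, marks it as skipped, and rewinds the head to the
    start of that breadcrumb. Every type, field, and function name here
    (sketch_request, postfix, elsp_count, ...) is a hypothetical stand-in
    chosen for the example; it is not the driver's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for a request and an engine. */
struct sketch_request {
	uint32_t seqno;   /* breadcrumb value written when the request completes */
	uint32_t head;    /* ring offset where the request's commands start */
	uint32_t postfix; /* ring offset where the request's breadcrumb starts */
	bool skipped;     /* payload must not be executed again after reset */
};

struct sketch_engine {
	uint32_t hw_seqno;  /* last breadcrumb the hardware managed to write */
	uint32_t ring_head; /* offset from which execution resumes */
	size_t elsp_count;  /* entries still sitting in the submission queue */
};

/* Return the oldest request whose breadcrumb has not been written yet. */
static struct sketch_request *
sketch_find_incomplete(const struct sketch_engine *engine,
		       struct sketch_request *reqs, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		/* Signed difference keeps the comparison wrap-safe. */
		if ((int32_t)(reqs[i].seqno - engine->hw_seqno) > 0)
			return &reqs[i];
	}
	return NULL;
}

/*
 * Skip the hung request and resume from its breadcrumb. The submission
 * queue is emptied by hand because the hung request never completed.
 */
static struct sketch_request *
sketch_reset_engine(struct sketch_engine *engine,
		    struct sketch_request *reqs, size_t count)
{
	struct sketch_request *hung =
		sketch_find_incomplete(engine, reqs, count);

	engine->elsp_count = 0;
	if (!hung)
		return NULL;

	hung->skipped = true;
	engine->ring_head = hung->postfix;
	return hung;
}
```

    Because the breadcrumb of the skipped request still runs in this
    sketch, requests that depend on it are signalled in the usual way
    rather than by writing the seqno by hand.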
    
    ABI: gem_reset_stats / DRM_IOCTL_I915_GET_RESET_STATS
    
    We change the way we count pending batches. Only the active context
    involved in the reset is marked as either innocent or guilty, rather
    than marking the entire world as pending. By inspection, this only
    affects igt/gem_reset_stats (which assumes implementation details) and
    not piglit.
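
    A minimal sketch of the accounting change, assuming per-context
    counters named after the batch_active and batch_pending fields that
    the reset-stats ioctl reports. The struct and helper are hypothetical
    and only show that a single context is touched per reset.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-context reset counters. */
struct sketch_context {
	unsigned int batch_active;  /* resets this context was guilty of */
	unsigned int batch_pending; /* resets that hit it while innocent */
};

/*
 * Account a reset against the one context that was active on the engine.
 * All other contexts are left alone instead of being marked as pending.
 */
static void sketch_account_reset(struct sketch_context *active, bool guilty)
{
	if (!active)
		return; /* engine was idle: nothing to attribute */

	if (guilty)
		active->batch_active++;
	else
		active->batch_pending++;
}
```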
    
    ARB_robustness gives this guide on how we expect the user of this
    interface to behave:
    
     * Provide a mechanism for an OpenGL application to learn about
       graphics resets that affect the context.  When a graphics reset
       occurs, the OpenGL context becomes unusable and the application
       must create a new context to continue operation. Detecting a
       graphics reset happens through an inexpensive query.
    
    And with regard to the actual meaning of the reset values:
    
       Certain events can result in a reset of the GL context. Such a reset
       causes all context state to be lost. Recovery from such events
       requires recreation of all objects in the affected context. The
       current status of the graphics reset state is returned by
    
    	enum GetGraphicsResetStatusARB();
    
       The symbolic constant returned indicates if the GL context has been
       in a reset state at any point since the last call to
       GetGraphicsResetStatusARB. NO_ERROR indicates that the GL context
       has not been in a reset state since the last call.
       GUILTY_CONTEXT_RESET_ARB indicates that a reset has been detected
       that is attributable to the current GL context.
       INNOCENT_CONTEXT_RESET_ARB indicates a reset has been detected that
       is not attributable to the current GL context.
       UNKNOWN_CONTEXT_RESET_ARB indicates a detected graphics reset whose
       cause is unknown.
    
    The language here is explicit in that we must mark up the guilty batch,
    but is loose enough for us to relax the innocent (i.e. pending)
    accounting as only the active batches are involved with the reset.
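
    To make the "since the last call" wording concrete, here is a sketch
    of how a userspace client might turn two counter snapshots into one of
    the statuses quoted above. The enum, struct, and function are
    assumptions made for the example and do not describe any particular
    GL implementation.

```c
#include <stdint.h>

/* Hypothetical status values mirroring the ARB_robustness constants. */
enum sketch_reset_status {
	SKETCH_NO_ERROR,
	SKETCH_GUILTY_CONTEXT_RESET,
	SKETCH_INNOCENT_CONTEXT_RESET,
};

/* Hypothetical snapshot of the counters reported for one context. */
struct sketch_stats {
	uint32_t batch_active;  /* guilty count */
	uint32_t batch_pending; /* innocent count */
};

/*
 * Compare the current counters against the previous snapshot, then store
 * the current values so the next query only sees newer resets.
 */
static enum sketch_reset_status
sketch_query_reset_status(struct sketch_stats *last,
			  const struct sketch_stats *now)
{
	enum sketch_reset_status status = SKETCH_NO_ERROR;

	if (now->batch_active != last->batch_active)
		status = SKETCH_GUILTY_CONTEXT_RESET;
	else if (now->batch_pending != last->batch_pending)
		status = SKETCH_INNOCENT_CONTEXT_RESET;

	*last = *now;
	return status;
}
```

    With the relaxed accounting, a context that was not active during the
    reset sees no change in either counter and so keeps reporting no
    error in this sketch.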
    
    In the future, we are looking towards single engine resetting (with
    minimal locking), where it seems inappropriate to mark the entire world
    as innocent since the reset occurred on a different engine. Reducing the
    information available means we only have to encounter the pain once, and
    also reduces the information leaking from one context to another.
    
    v2: Legacy ringbuffer submission requires a reset following hibernation,
    or else we restore stale values to RING_HEAD and walk over stolen
    garbage.
    
    v3: GuC requires replaying the requests after a reset.
    
    v4: Restore engine IRQ after reset (so waiters will be woken!)
        Rearm hangcheck if resetting with a waiter.
    
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Mika Kuoppala <mika.kuoppala@intel.com>
    Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-13-chris@chris-wilson.co.uk