    drm/i915: Improve long running compute w/a for GuC submission · d7a8680e
    John Harrison authored
    A workaround was added to the driver to allow compute workloads to run
    'forever' by disabling the pre-emption timeout on the RCS engine for
    Gen12. Run time is not totally unbounded, as the heartbeat will kick in
    eventually and cause a reset of the hung engine.
    
    However, this does not work well in GuC submission mode. In GuC mode,
    the pre-emption timeout is how GuC detects hung contexts and triggers
    a per-engine reset. Thus, disabling the timeout also means losing all
    per-engine reset ability. A full GT reset will still occur when the
    heartbeat finally expires, but that is a much more destructive and
    undesirable mechanism.
    
    The purpose of the workaround is actually to give compute tasks longer
    to reach a pre-emption point after a pre-emption request has been
    issued. This is necessary because Gen12 does not support mid-thread
    pre-emption and compute tasks can have long running threads.
    
    So, rather than disabling the timeout completely, just set it to a
    'long' value.
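
As a rough illustration of the idea above, the following is a minimal sketch and not the driver's actual code. The struct, the helper names, the two macros and their millisecond values are all hypothetical placeholders standing in for the existing pre-emption timeout CONFIG option and the extra 'long' one. The only point it makes is that the affected engine gets a finite, larger timeout rather than 0 (disabled).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the two CONFIG values. The numbers are
 * placeholders for illustration, not the values used by the driver.
 */
#define PREEMPT_TIMEOUT_MS         640u   /* normal pre-emption timeout */
#define PREEMPT_TIMEOUT_COMPUTE_MS 7500u  /* hard-coded 'long' timeout  */

enum engine_class { ENGINE_RENDER, ENGINE_COPY, ENGINE_VIDEO };

/* Simplified, hypothetical engine description. */
struct engine {
	enum engine_class class;
	int graphics_ver;            /* 12 for Gen12 */
	uint32_t preempt_timeout_ms; /* 0 would mean "timeout disabled" */
};

/* Gen12 has no mid-thread pre-emption, so RCS compute needs extra time. */
static bool needs_long_compute_timeout(const struct engine *e)
{
	return e->graphics_ver == 12 && e->class == ENGINE_RENDER;
}

static void engine_init_preempt_timeout(struct engine *e)
{
	e->preempt_timeout_ms = PREEMPT_TIMEOUT_MS;

	/*
	 * The earlier workaround amounted to setting the timeout to 0 here.
	 * Keeping it finite means a timeout-based hang detector (GuC, per
	 * the text above) can still trigger a per-engine reset.
	 */
	if (needs_long_compute_timeout(e))
		e->preempt_timeout_ms = PREEMPT_TIMEOUT_COMPUTE_MS;
}
```

With these placeholder values, a Gen12 render engine ends up with the long timeout, while other engines and other generations keep the normal one. In no case is the timeout left at 0.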
    
    v2: Review feedback from Tvrtko - must hard code the 'long' value
    instead of determining it algorithmically. So make it an extra CONFIG
    definition. Also, remove the execlist-centric comment from the
    existing pre-emption timeout CONFIG option, given that it applies to
    more than just execlists.
    
    Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
    Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
    Acked-by: Michal Mrozek <michal.mrozek@intel.com>
    Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20221006213813.1563435-5-John.C.Harrison@Intel.com