    drm/i915/execlists: Double check breadcrumb before crying foul · e2ccf0d0
    Chris Wilson authored
      process_csb: 0000:00:02.0 bcs0: cs-irq head=4, tail=5
      process_csb: 0000:00:02.0 bcs0: csb[5]: status=0x00008002:0x60000020
      trace_ports: 0000:00:02.0 bcs0: preempted { ff84:45154! prio 2 }
      trace_ports: 0000:00:02.0 bcs0: promote { ff84:45155* prio 2 }
      trace_ports: 0000:00:02.0 bcs0: submit { ff84:45156 prio 2 }
    
      process_csb: 0000:00:02.0 bcs0: cs-irq head=5, tail=6
      process_csb: 0000:00:02.0 bcs0: csb[6]: status=0x00000018:0x60000020
      trace_ports: 0000:00:02.0 bcs0: completed { ff84:45155* prio 2 }
      process_csb: 0000:00:02.0 bcs0: ring:{start:0x00178000, head:0928, tail:0928, ctl:00000000, mode:00000200}
      process_csb: 0000:00:02.0 bcs0: rq:{start:00178000, head:08b0, tail:08f0, seqno:ff84:45155, hwsp:45156},
      process_csb: 0000:00:02.0 bcs0: ctx:{start:00178000, head:e000928, tail:0928},
      process_csb: GEM_BUG_ON("context completed before request")
    
    In this sequence, we can see that although we have submitted the next
    request [ff84:45156] to HW (via ELSP[]), it has not yet reported the
    lite-restore. Instead, we see the completion event of the currently
    active request [ff84:45155], but at the time we process that event,
    its breadcrumb has not yet been written. Yet by the time we print
    out the debug info, the seqno write of ff84:45156 has landed!
    
    Therefore, there is a serialisation problem between the seqno writes and
    the CS events, not just between the CS buffer and its head/tail pointers
    as previously observed on Icelake.
    
    This is not a huge problem, as we don't strictly rely on the breadcrumb
    to determine HW activity, but it may indicate that the interrupt is
    delivered before the seqno write lands, bringing back the plague of
    missed interrupts from yesteryear. However, there is no indication of
    this wider problem, so let's just flush the seqno read before reporting
    an error. If the mismatch persists after the fresh read, we can worry
    again.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20200330234318.30638-1-chris@chris-wilson.co.uk
intel_lrc.c 150 KB