An error occurred fetching the project authors.
  1. 07 Jul, 2014 2 commits
    • Ben Widawsky's avatar
      drm/i915/bdw: implement semaphore signal · 3e78998a
      Ben Widawsky authored
      Semaphore signalling works similarly to previous GENs with the exception
      that the per ring mailboxes no longer exist. Instead you must define
      your own space, somewhere in the GTT.
      
      The comments in the code define the layout I've opted for, which should
      be fairly future proof. Ie. I tried to define offsets in abstract terms
      (NUM_RINGS, seqno size, etc).
      
      NOTE: If one wanted to move this to the HWSP they could. I've decided
      one 4k object would be easier to deal with, and provide potential wins
      with cache locality, but that's all speculative.
      
      v2: Update the macro to not need the other ring's ring->id (Chris)
      Update the comment to use the correct formula (Chris)
      
      v3: Move the macros the ringbuffer.h to prevent churn in next patch
      (Ville)
      
      v4: Fixed compilation rebase conflict
      commit 1ec9e26d
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Fri Feb 14 14:01:11 2014 +0100
      
          drm/i915: Consolidate binding parameters into flags
      
      v5: VCS2 rebase
      Replace hweight_long with hweight32
      
      v6 (Rodrigo): * Add missed VC2 gen8 ring signal init
         	      * fixing conflicst on rebase
          	      * minor fixes on address table
      	      * remove WARN_ON
      Reviewed-by: default avatarRodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: default avatarBen Widawsky <ben@bwidawsk.net>
      Signed-off-by: default avatarRodrigo Vivi <rodrigo.vivi@intel.com>
      [danvet: s/BUG_ON/WARN_ON/]
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      3e78998a
    • Rodrigo Vivi's avatar
      drm/i915: Updating comments. · ddd4dbc6
      Rodrigo Vivi authored
      ring index calculation table was out of date after other rings were added,
      although the formula is flexible and scale when adding new rings.
      
      So this patch just update the comments and add a brief explanation
      why to use sync_seqno[ring index].
      Signed-off-by: default avatarRodrigo Vivi <rodrigo.vivi@intel.com>
      Reviewed-by: default avatarBen Widawsky <ben@bwidawsk.net>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      ddd4dbc6
  2. 11 Jun, 2014 1 commit
  3. 22 May, 2014 5 commits
    • Oscar Mateo's avatar
      drm/i915: s/i915_hw_context/intel_context · 273497e5
      Oscar Mateo authored
      Up until now, contexts had one (and only one) backing object that was
      used by the hardware to save/restore render ring contexts (via the
      MI_SET_CONTEXT command). Other rings did not have or need this, so
      our i915_hw_context struct had a 1:1 relationship with a a real HW
      context.
      
      With Logical Ring Contexts and Execlists, this is not possible anymore:
      all rings need a backing object, and it cannot be reused. To prepare
      for that, rename our contexts to the more generic term intel_context.
      
      No functional changes.
      Signed-off-by: default avatarOscar Mateo <oscar.mateo@intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      273497e5
    • Oscar Mateo's avatar
      drm/i915: Split the ringbuffers from the rings (3/3) · 93b0a4e0
      Oscar Mateo authored
      Manual cleanup after the previous Coccinelle script.
      
      Yes, I could write another Coccinelle script to do this but I
      don't want labor-replacing robots making an honest programmer's
      work obsolete (also, I'm lazy).
      Signed-off-by: default avatarOscar Mateo <oscar.mateo@intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      93b0a4e0
    • Oscar Mateo's avatar
      drm/i915: Split the ringbuffers from the rings (2/3) · ee1b1e5e
      Oscar Mateo authored
      This refactoring has been performed using the following Coccinelle
      semantic script:
      
          @@
          struct intel_engine_cs r;
          @@
          (
          - (r).obj
          + r.buffer->obj
          |
          - (r).virtual_start
          + r.buffer->virtual_start
          |
          - (r).head
          + r.buffer->head
          |
          - (r).tail
          + r.buffer->tail
          |
          - (r).space
          + r.buffer->space
          |
          - (r).size
          + r.buffer->size
          |
          - (r).effective_size
          + r.buffer->effective_size
          |
          - (r).last_retired_head
          + r.buffer->last_retired_head
          )
      
          @@
          struct intel_engine_cs *r;
          @@
          (
          - (r)->obj
          + r->buffer->obj
          |
          - (r)->virtual_start
          + r->buffer->virtual_start
          |
          - (r)->head
          + r->buffer->head
          |
          - (r)->tail
          + r->buffer->tail
          |
          - (r)->space
          + r->buffer->space
          |
          - (r)->size
          + r->buffer->size
          |
          - (r)->effective_size
          + r->buffer->effective_size
          |
          - (r)->last_retired_head
          + r->buffer->last_retired_head
          )
      
          @@
          expression E;
          @@
          (
          - LP_RING(E)->obj
          + LP_RING(E)->buffer->obj
          |
          - LP_RING(E)->virtual_start
          + LP_RING(E)->buffer->virtual_start
          |
          - LP_RING(E)->head
          + LP_RING(E)->buffer->head
          |
          - LP_RING(E)->tail
          + LP_RING(E)->buffer->tail
          |
          - LP_RING(E)->space
          + LP_RING(E)->buffer->space
          |
          - LP_RING(E)->size
          + LP_RING(E)->buffer->size
          |
          - LP_RING(E)->effective_size
          + LP_RING(E)->buffer->effective_size
          |
          - LP_RING(E)->last_retired_head
          + LP_RING(E)->buffer->last_retired_head
          )
      
      Note: On top of this this patch also removes the now unused ringbuffer
      fields in intel_engine_cs.
      Signed-off-by: default avatarOscar Mateo <oscar.mateo@intel.com>
      [danvet: Add note about fixup patch included here.]
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      ee1b1e5e
    • Oscar Mateo's avatar
      drm/i915: Split the ringbuffers from the rings (1/3) · 8ee14975
      Oscar Mateo authored
      As advanced by the previous patch, the ringbuffers and the engine
      command streamers belong in different structs. This is so because,
      while they used to be tightly coupled together, the new Logical
      Ring Contexts (LRC for short) have a ringbuffer each.
      
      In legacy code, we will use the buffer* pointer inside each ring
      to get to the pertaining ringbuffer (the actual switch will be
      done in the next patch). In the new Execlists code, this pointer
      will be NULL and we will use instead the one inside the context
      instead.
      Signed-off-by: default avatarOscar Mateo <oscar.mateo@intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      8ee14975
    • Oscar Mateo's avatar
      drm/i915: s/intel_ring_buffer/intel_engine_cs · a4872ba6
      Oscar Mateo authored
      In the upcoming patches we plan to break the correlation between
      engine command streamers (a.k.a. rings) and ringbuffers, so it
      makes sense to refactor the code and make the change obvious.
      
      No functional changes.
      Signed-off-by: default avatarOscar Mateo <oscar.mateo@intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      a4872ba6
  4. 12 May, 2014 1 commit
    • Brad Volkin's avatar
      drm/i915: Use hash tables for the command parser · 44e895a8
      Brad Volkin authored
      For clients that submit large batch buffers the command parser has
      a substantial impact on performance. On my HSW ULT system performance
      drops as much as ~20% on some tests. Most of the time is spent in the
      command lookup code. Converting that from the current naive search to
      a hash table lookup reduces the performance drop to ~10%.
      
      The choice of value for I915_CMD_HASH_ORDER allows all commands
      currently used in the parser tables to hash to their own bucket (except
      for one collision on the render ring). The tradeoff is that it wastes
      memory. Because the opcodes for the commands in the tables are not
      particularly well distributed, reducing the order still leaves many
      buckets empty. The increased collisions don't seem to have a huge
      impact on the performance gain, but for now anyhow, the parser trades
      memory for performance.
      
      NB: Ville noticed that the error paths through the ring init code
      will leak memory. I've not addressed that here. We can do a follow
      up pass to handle all of the leaks.
      
      v2: improved comment describing selection of hash key mask (Damien)
      replace a BUG_ON() with an error return (Tvrtko, Ville)
      commit message improvements
      Signed-off-by: default avatarBrad Volkin <bradley.d.volkin@intel.com>
      Reviewed-by: default avatarDamien Lespiau <damien.lespiau@intel.com>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      44e895a8
  5. 05 May, 2014 7 commits
  6. 25 Apr, 2014 1 commit
  7. 03 Apr, 2014 2 commits
    • Chris Wilson's avatar
      drm/i915: Move all ring resets before setting the HWS page · 9991ae78
      Chris Wilson authored
      In commit a51435a3
      Author: Naresh Kumar Kachhi <naresh.kumar.kachhi@intel.com>
      Date:   Wed Mar 12 16:39:40 2014 +0530
      
          drm/i915: disable rings before HW status page setup
      
      we reordered stopping the rings to do so before we set the HWS register.
      However, there is an extra workaround for g45 to reset the rings twice,
      and for consistency we should apply that workaround before setting the
      HWS to be sure that the rings are truly stopped.
      
      Cc: Naresh Kumar Kachhi <naresh.kumar.kachhi@intel.com>
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarJesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      9991ae78
    • Ben Widawsky's avatar
      drm/i915: Invariably invalidate before ctx switch · 057f6a8a
      Ben Widawsky authored
      We have been setting the bit which was originally BIOS dependent since:
      commit f05bb0c7
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Sun Jan 20 16:33:32 2013 +0000
      
          drm/i915: GFX_MODE Flush TLB Invalidate Mode must be '1' for scanline waits
      
      Therefore, we do not need to try to figure it out dynamically and we can
      just always invalidate the TLBs.
      
      It's a partial revert of:
      commit 12b0286f
      Author: Ben Widawsky <ben@bwidawsk.net>
      Date:   Mon Jun 4 14:42:50 2012 -0700
      
          drm/i915: possibly invalidate TLB before context switch
      
      The original commit attempted to only invalidate when necessary
      (very much a relic from the old days). Now, we can just always invalidate.
      
      I guess the old TODO still exists. Since we seem to have abandoned ILK
      contexts however, there isn't much point in even remembering.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarBen Widawsky <ben@bwidawsk.net>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      057f6a8a
  8. 28 Mar, 2014 1 commit
    • Chris Wilson's avatar
      drm/i915: Broadwell expands ACTHD to 64bit · 50877445
      Chris Wilson authored
      As Broadwell has an increased virtual address size, it requires more
      than 32 bits to store offsets into its address space. This includes the
      debug registers to track the current HEAD of the individual rings, which
      may be anywhere within the per-process address spaces. In order to find
      the full location, we need to read the high bits from a second register.
      We then also need to expand our storage to keep track of the larger
      address.
      
      v2: Carefully read the two registers to catch wraparound between
          the reads.
      v3: Use a WARN_ON rather than loop indefinitely on an unstable
          register read.
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Ben Widawsky <benjamin.widawsky@intel.com>
      Cc: Timo Aaltonen <tjaalton@ubuntu.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Reviewed-by: default avatarBen Widawsky <ben@bwidawsk.net>
      [danvet: Drop spurious hunk which conflicted.]
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      50877445
  9. 12 Mar, 2014 1 commit
  10. 07 Mar, 2014 1 commit
    • Brad Volkin's avatar
      drm/i915: Implement command buffer parsing logic · 351e3db2
      Brad Volkin authored
      The command parser scans batch buffers submitted via execbuffer ioctls before
      the driver submits them to hardware. At a high level, it looks for several
      things:
      
      1) Commands which are explicitly defined as privileged or which should only be
         used by the kernel driver. The parser generally rejects such commands, with
         the provision that it may allow some from the drm master process.
      2) Commands which access registers. To support correct/enhanced userspace
         functionality, particularly certain OpenGL extensions, the parser provides a
         whitelist of registers which userspace may safely access (for both normal and
         drm master processes).
      3) Commands which access privileged memory (i.e. GGTT, HWS page, etc). The
         parser always rejects such commands.
      
      See the overview comment in the source for more details.
      
      This patch only implements the logic. Subsequent patches will build the tables
      that drive the parser.
      
      v2: Don't set the secure bit if the parser succeeds
      Fail harder during init
      Makefile cleanup
      Kerneldoc cleanup
      Clarify module param description
      Convert ints to bools in a few places
      Move client/subclient defs to i915_reg.h
      Remove the bits_count field
      
      OTC-Tracker: AXIA-4631
      Change-Id: I50b98c71c6655893291c78a2d1b8954577b37a30
      Signed-off-by: default avatarBrad Volkin <bradley.d.volkin@intel.com>
      Reviewed-by: default avatarJani Nikula <jani.nikula@intel.com>
      [danvet: Appease checkpatch.]
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      351e3db2
  11. 11 Feb, 2014 1 commit
  12. 04 Feb, 2014 1 commit
  13. 10 Sep, 2013 1 commit
    • Chris Wilson's avatar
      drm/i915: Write RING_TAIL once per-request · 09246732
      Chris Wilson authored
      Ignoring the legacy DRI1 code, and a couple of special cases (to be
      discussed later), all access to the ring is mediated through requests.
      The first write to a ring will grab a seqno and mark the ring as having
      an outstanding_lazy_request. Either through explicitly adding a request
      after an execbuffer or through an implicit wait (either by the CPU or by
      a semaphore), that sequence of writes will be terminated with a request.
      So we can ellide all the intervening writes to the tail register and
      send the entire command stream to the GPU at once. This will reduce the
      number of *serialising* writes to the tail register by a factor or 3-5
      times (depending upon architecture and number of workarounds, context
      switches, etc involved). This becomes even more noticeable when the
      register write is overloaded with a number of debugging tools. The
      astute reader will wonder if it is then possible to overflow the ring
      with a single command. It is not. When we start a command sequence to
      the ring, we check for available space and issue a wait in case we have
      not. The ring wait will in this case be forced to flush the outstanding
      register write and then poll the ACTHD for sufficient space to continue.
      
      The exception to the rule where everything is inside a request are a few
      initialisation cases where we may want to write GPU commands via the CS
      before userspace wakes up and page flips.
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      09246732
  14. 06 Sep, 2013 1 commit
  15. 05 Sep, 2013 2 commits
  16. 03 Sep, 2013 1 commit
  17. 23 Aug, 2013 1 commit
  18. 22 Aug, 2013 1 commit
  19. 11 Jul, 2013 2 commits
    • Daniel Vetter's avatar
      drm/i915: unify ring irq refcounts (again) · c7113cc3
      Daniel Vetter authored
      With the simplified locking there's no reason any more to keep the
      refcounts seperate.
      
      v2: Readd the lost comment that ring->irq_refcount is protected by
      dev_priv->irq_lock.
      Reviewed-by: default avatarBen Widawsky <ben@bwidawsk.net>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      c7113cc3
    • Daniel Vetter's avatar
      drm/i915: kill dev_priv->rps.lock · 59cdb63d
      Daniel Vetter authored
      Now that the rps interrupt locking isn't clearly separated (at elast
      conceptually) from all the other interrupt locking having a different
      lock stopped making sense: It protects much more than just the rps
      workqueue it started out with. But with the addition of VECS the
      separation started to blurr and resulted in some more complex locking
      for the ring interrupt refcount.
      
      With this we can (again) unifiy the ringbuffer irq refcounts without
      causing a massive confusion, but that's for the next patch.
      
      v2: Explain better why the rps.lock once made sense and why no longer,
      requested by Ben.
      Reviewed-by: default avatarBen Widawsky <ben@bwidawsk.net>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      59cdb63d
  20. 13 Jun, 2013 1 commit
  21. 11 Jun, 2013 1 commit
    • Chris Wilson's avatar
      drm/i915: Don't count semaphore waits towards a stuck ring · 6274f212
      Chris Wilson authored
      If we detect a ring is in a valid wait for another, just let it be.
      Eventually it will either begin to progress again, or the entire system
      will come grinding to a halt and then hangcheck will fire as soon as the
      deadlock is detected.
      
      This error was foretold by Ben in
      commit 05407ff8
      Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Date:   Thu May 30 09:04:29 2013 +0300
      
          drm/i915: detect hang using per ring hangcheck_score
      
      "If ring B is waiting on ring A via semaphore, and ring A is making
      progress, albeit slowly - the hangcheck will fire. The check will
      determine that A is moving, however ring B will appear hung because
      the ACTHD doesn't move. I honestly can't say if that's actually a
      realistic problem to hit it probably implies the timeout value is too
      low."
      
      v2: Make sure we don't even incur the KICK cost whilst waiting.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65394Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: default avatarMika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      6274f212
  22. 07 Jun, 2013 1 commit
  23. 03 Jun, 2013 1 commit
    • Mika Kuoppala's avatar
      drm/i915: detect hang using per ring hangcheck_score · 05407ff8
      Mika Kuoppala authored
      Keep track of ring seqno progress and if there are no
      progress detected, declare hang. Use actual head (acthd)
      to distinguish between ring stuck and batchbuffer looping
      situation. Stuck ring will be kicked to trigger progress.
      
      This commit adds a hard limit for batchbuffer completion time.
      If batchbuffer completion time is more than 4.5 seconds,
      the gpu will be declared hung.
      
      Review comment from Ben which nicely clarifies the semantic change:
      
      "Maybe I'm just stating the functional changes of the patch, but in case
      they were unintended here is what I see as potential issues:
      
      1. "If ring B is waiting on ring A via semaphore, and ring A is making
         progress, albeit slowly - the hangcheck will fire. The check will
         determine that A is moving, however ring B will appear hung because
         the ACTHD doesn't move. I honestly can't say if that's actually a
         realistic problem to hit it probably implies the timeout value is too
         low.
      
      2. "There's also another corner case on the kick. If the seqno = 2
         (though not stuck), and on the 3rd hangcheck, the ring is stuck, and
         we try to kick it... we don't actually try to find out if the kick
         helped"
      
      v2: use atchd to detect stuck ring from loop (Ben Widawsky)
      
      v3: Use acthd to check when ring needs kicking.
      Declare hang on third time in order to give time for
      kick_ring to take effect.
      
      v4: Update commit msg
      Signed-off-by: default avatarMika Kuoppala <mika.kuoppala@intel.com>
      Reviewed-by: default avatarBen Widawsky <ben@bwidawsk.net>
      [danvet: Paste in Ben's review comment.]
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      05407ff8
  24. 31 May, 2013 3 commits