1. 13 Aug, 2014 17 commits
  2. 12 Aug, 2014 2 commits
    • drm/i915: Some cleanups for the ppgtt lifetime handling · ee960be7
      Daniel Vetter authored
      So when reviewing Michel's patch I've noticed a few things and cleaned
      them up:
      - The early checks in ppgtt_release are now redundant: The inactive
        list should always be empty now, so we can ditch these checks. Even
        for the aliasing ppgtt (though that's a different confusion) since
        we tear that down after all the objects are gone.
      - The ppgtt handling functions are splattered all over. Consolidate
        them in i915_gem_gtt.c, give them OCD prefixes and add wrappers for
        get/put.
      - There was a bit of confusion in ppgtt_release about whether it cares
        about the active or inactive list. It should care about them both,
        so augment the WARNINGs to check for both.
      
      There's still create_vm_for_ctx left to do, but that is blocked on the
      removal of ppgtt->ctx. Once that's done we can rename it to
      i915_ppgtt_create and move it to its siblings for handling ppgtts.
      
      v2: Move the ppgtt checks into the inline get/put functions as
      suggested by Chris.
      
      v3: Inline the now redundant ppgtt local variable.
      
      Cc: Michel Thierry <michel.thierry@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Michel Thierry <michel.thierry@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      ee960be7
    • drm/i915: vma/ppgtt lifetime rules · b9d06dd9
      Michel Thierry authored
      VMAs should take a reference of the address space they use.
      
      Now, when the fd is closed, it will release the ref that the context was
      holding, but it will still be referenced by any vmas that are still
      active.
      
      ppgtt_release() should then only be called when the last thing referencing
      it releases the ref, and it can just call the base cleanup and free the
      ppgtt.
      
      Note that with this we will extend the lifetime of ppgtts which
      contain shared objects. But all the non-shared objects will get
      removed as soon as they drop off the active list and for the shared
      ones the shrinker can eventually reap them. Since we currently can't
      evict ppgtt pagetables either I don't think that temporary leak is
      important.
      Signed-off-by: Michel Thierry <michel.thierry@intel.com>
      [danvet: Add note about potential ppgtt leak with this approach.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      b9d06dd9
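The lifetime rule described in this commit can be illustrated with a small reference-counting model. This is only a sketch under stated assumptions: `address_space`, `vma`, and every function name below are hypothetical stand-ins, not the i915 structures or API. The context holds one reference, each VMA takes another, and the release step only runs when the last holder drops its reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the i915 definitions. */
struct address_space {
    int refcount;   /* one reference per holder: the context and each VMA */
    int released;   /* set when the last reference is dropped (illustration only) */
};

struct vma {
    struct address_space *vm;
};

static inline void address_space_get(struct address_space *vm)
{
    vm->refcount++;
}

/* Drop one reference; returns 1 if it was the last one and release ran. */
static inline int address_space_put(struct address_space *vm)
{
    assert(vm->refcount > 0);
    if (--vm->refcount == 0) {
        vm->released = 1;   /* a real driver would clean up and free here */
        return 1;
    }
    return 0;
}

/* A VMA takes a reference on the address space it lives in. */
static inline struct vma *vma_create(struct address_space *vm)
{
    struct vma *vma = malloc(sizeof(*vma));

    if (!vma)
        return NULL;
    address_space_get(vm);
    vma->vm = vm;
    return vma;
}

/* Destroying the VMA drops its reference; returns 1 if that released the vm. */
static inline int vma_destroy(struct vma *vma)
{
    struct address_space *vm = vma->vm;

    free(vma);
    return address_space_put(vm);
}
```

In this model, closing the fd corresponds to the context calling `address_space_put()` once; the address space stays alive until the last remaining VMA is destroyed.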
  3. 11 Aug, 2014 21 commits
    • drm/i915/bdw: Always use MMIO flips with Execlists · 14bf993e
      Oscar Mateo authored
      The normal flip function places things in the ring in the legacy
      way, so we either fix that or force MMIO flips always as we do in
      this patch.
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      [danvet: Checkpatch. Fucking again.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      14bf993e
    • drm/i915/bdw: Workload submission mechanism for Execlists · ba8b7ccb
      Oscar Mateo authored
      This is what i915_gem_do_execbuffer calls when it wants to execute some
      workload in an Execlists world.
      
      v2: Check arguments before doing stuff in intel_execlists_submission. Also,
      get rel_constants parsing right.
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      [danvet: Drop the chipset flush, that's pre-gen6. And appease
      checkpatch a bit .... again!]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      ba8b7ccb
    • drm/i915/bdw: GEN-specific logical ring emit batchbuffer start · 15648585
      Oscar Mateo authored
      Dispatch_execbuffer's evil twin.
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      [danvet: Ditch the check for aliasing ppgtt. It'll break soon and
      execlists requires full ppgtt anyway.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      15648585
    • drm/i915/bdw: Interrupts with logical rings · 73d477f6
      Oscar Mateo authored
      We need to attend to context switch interrupts from all rings. Also, fixed writing
      IMR/IER and added HWSTAM at ring init time.

      Notice that, if added to irq_enable_mask, the context switch interrupts would
      be incorrectly masked out whenever the user interrupts are masked because no
      users are waiting on a sequence number. Therefore, this commit adds a bitmask
      of interrupts to be kept unmasked at all times.
      
      v2: Disable HWSTAM, as suggested by Damien (nobody listens to these interrupts,
      anyway).
      
      v3: Add new get/put_irq functions.
      
      Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> (v1)
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2 & v3)
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      [danvet: Drop the GEN8_ prefix from the context switch interrupt
      define and move it to its brethren.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      73d477f6
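The keep-mask idea from this commit can be modelled in a few lines. This is a sketch, not the driver code: the struct, the function names, and the bit positions used in any caller are assumptions made for illustration, and `imr` is a plain variable standing in for the interrupt mask register (a set bit means the interrupt is masked). User interrupts are unmasked only while someone waits; the bits in the keep mask stay unmasked either way.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one engine's interrupt masking state. */
struct engine_irq {
    uint32_t irq_enable_mask;   /* user interrupt bit(s), unmasked only while waited on */
    uint32_t irq_keep_mask;     /* bits that must never be masked, e.g. context switch */
    unsigned int irq_refcount;  /* number of waiters */
    uint32_t imr;               /* stand-in for the IMR register: set bit = masked */
};

/* First waiter unmasks the user interrupt; keep-mask bits stay unmasked. */
static inline void engine_irq_get(struct engine_irq *e)
{
    if (e->irq_refcount++ == 0)
        e->imr = ~(e->irq_enable_mask | e->irq_keep_mask);
}

/* Last waiter masks the user interrupt again, but never the keep-mask bits. */
static inline void engine_irq_put(struct engine_irq *e)
{
    assert(e->irq_refcount > 0);
    if (--e->irq_refcount == 0)
        e->imr = ~e->irq_keep_mask;
}
```

Had the context switch bit been folded into `irq_enable_mask` instead, the put path would have masked it together with the user interrupt, which is the problem the commit message describes.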
    • drm/i915/bdw: Ring idle and stop with logical rings · 9832b9da
      Oscar Mateo authored
      This is a hard one, since there is no direct hardware ring to
      control when in Execlists.
      
      We reuse intel_ring_idle here, but it should be fine as long
      as i915_add_request does the ring thing.
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      9832b9da
    • drm/i915/bdw: GEN-specific logical ring emit flush · 4712274c
      Oscar Mateo authored
      Same as the legacy-style ring->flush.
      
      v2: The BSD invalidate bit still exists in GEN8! Add it for the VCS
      rings (but still consolidate the blt and bsd ring flushes into one).
      This was noticed by Brad Volkin.
      
      v3: The command for BSD and for other rings is slightly different:
      get it exactly the same as in gen6_ring_flush + gen6_bsd_ring_flush
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      [danvet: Checkpatch.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      4712274c
    • drm/i915/bdw: GEN-specific logical ring emit request · 4da46e1e
      Oscar Mateo authored
      Very similar to the legacy add_request, only modified to account for
      logical ringbuffer.
      
      v2: Use MI_GLOBAL_GTT, as suggested by Brad Volkin.
      
      v3: Unify render and non-render in the same function, as noticed by
      Brad Volkin.
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      4da46e1e
    • drm/i915/bdw: New logical ring submission mechanism · 82e104cc
      Oscar Mateo authored
      Well, new-ish: if all this code looks familiar, that's because it's
      a clone of the existing submission mechanism (with some modifications
      here and there to adapt it to LRCs and Execlists).
      
      And why did we do this instead of reusing code, one might wonder?
      Well, there are some fears that the differences are big enough that
      they will end up breaking all platforms.
      
      Also, Execlists offer several advantages, like control over when the
      GPU is done with a given workload, that can help simplify the
      submission mechanism, no doubt. I am interested in getting Execlists
      to work first and foremost, but in the future this parallel submission
      mechanism will help us to fine tune the mechanism without affecting
      old gens.
      
      v2: Pass the ringbuffer only (whenever possible).
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      [danvet: Appease checkpatch. Again. And drop the legacy sarea gunk
      that somehow crept in.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      82e104cc
    • drm/i915: Make hpd debug messages less cryptic · 26fbb774
      Ville Syrjälä authored
      Don't print raw numbers, use port_name() and tell the user whether it's
      long or short without having to figure out what the other magic number
      means.
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      26fbb774
    • drm/i915/bdw: GEN-specific logical ring set/get seqno · e94e37ad
      Oscar Mateo authored
      No mystery here: the seqno is still retrieved from the engine's
      HW status page (the one in the default context; for the moment,
      I see no reason to worry about other contexts' HWS pages).
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      e94e37ad
    • drm/i915/bdw: GEN-specific logical ring init · 9b1136d5
      Oscar Mateo authored
      Logical rings do not need most of the initialization their
      legacy ringbuffer counterparts do: we just need to set up the pipe
      control object for the render ring, enable Execlists on the
      hardware and apply a few workarounds.
      
      v2: Squash with: "drm/i915: Extract pipe control fini & make
      init outside accesible".
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      [danvet: Make checkpatch happy.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      9b1136d5
    • drm/i915/bdw: Generic logical ring init and cleanup · 48d82387
      Oscar Mateo authored
      Allocate and populate the default LRC for every ring, call
      gen-specific init/cleanup, init/fini the command parser and
      set the status page (now inside the LRC object). These are
      things all engines/rings have in common.
      
      Stopping the ring before cleanup and initializing the seqnos
      is left as a TODO task (we need more infrastructure in place
      before we can achieve this).
      
      v2: Check the ringbuffer backing obj for ring_is_initialized,
      instead of the context backing obj (similar, but not exactly
      the same).
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      48d82387
    • drm/i915/bdw: Skeleton for the new logical rings submission path · 454afebd
      Oscar Mateo authored
      Execlists are indeed a brave new world with respect to workload
      submission to the GPU.
      
      In previous versions of this series, I have tried to impact the
      legacy ringbuffer submission path as little as possible (mostly,
      passing the context around and using the correct ringbuffer when I
      needed one) but Daniel is afraid (probably with good reason) that
      these changes and, especially, future ones, will end up breaking
      older gens.
      
      This commit and some others coming next will try to limit the
      damage by creating an alternative path for workload submission.
      The first step is here: laying out a new ring init/fini.
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      454afebd
    • drm/i915: Abstract the legacy workload submission mechanism away · a83014d3
      Oscar Mateo authored
      As suggested by Daniel Vetter. The idea, in subsequent patches, is to
      provide an alternative to these vfuncs for the Execlists submission
      mechanism.
      
      v2: Split into two and reordered to illustrate our intentions, instead
      of showing it off. Also, removed the add_request vfunc and added the
      stop_ring one.
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      [danvet:
      - Make checkpatch happy.
      - Be grumpy about the excessive vtable.
      - Ditch gt->is_ring_initialized.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      a83014d3
    • drm/i915/bdw: Deferred creation of user-created LRCs · ec3e9963
      Oscar Mateo authored
      The backing objects and ringbuffers for contexts created via open
      fd are actually empty until the user starts sending execbuffers to
      them. At that point, we allocate & populate them. We do this because,
      at create time, we really don't know which engine is going to be used
      with the context later on (and we don't want to waste memory on
      objects that we might never use).
      
      v2: As contexts created via ioctl can only be used with the render
      ring, we have enough information to allocate & populate them right
      away.
      
      v3: Defer the creation always, even with ioctl-created contexts, as
      requested by Daniel Vetter.
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      ec3e9963
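Deferred creation as described in this commit amounts to lazy, per-engine allocation on first use. The following is a minimal sketch assuming a simplified context that only tracks one opaque buffer per engine; `lr_context`, `lr_context_ensure`, `NUM_ENGINES`, and `STATE_SIZE` are hypothetical names and values, not taken from the driver.

```c
#include <stdlib.h>

#define NUM_ENGINES 4      /* hypothetical engine count */
#define STATE_SIZE  4096   /* hypothetical backing object size in bytes */

/* Hypothetical context: one lazily allocated backing buffer per engine. */
struct lr_context {
    void *state[NUM_ENGINES];   /* NULL until the engine is first used */
};

/*
 * Called on the submission path: allocate the per-engine state only when
 * a workload is first sent to that engine.
 * Returns 0 on success, -1 on an invalid engine or allocation failure.
 */
static inline int lr_context_ensure(struct lr_context *ctx, unsigned int engine)
{
    if (engine >= NUM_ENGINES)
        return -1;
    if (ctx->state[engine])
        return 0;   /* already populated by an earlier submission */

    ctx->state[engine] = calloc(1, STATE_SIZE);
    return ctx->state[engine] ? 0 : -1;
}

/* Free whatever was actually allocated; untouched engines cost nothing. */
static inline void lr_context_free(struct lr_context *ctx)
{
    for (unsigned int i = 0; i < NUM_ENGINES; i++) {
        free(ctx->state[i]);
        ctx->state[i] = NULL;
    }
}
```

A context that only ever submits to one engine ends up with a single allocation, which is the memory saving the commit message argues for.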
    • drm/i915/bdw: Populate LR contexts (somewhat) · 8670d6f9
      Oscar Mateo authored
      For the most part, logical ring context objects are similar to hardware
      contexts in that the backing object is meant to be opaque. There are
      some exceptions where we need to poke certain offsets of the object for
      initialization, updating the tail pointer or updating the PDPs.
      
      For our basic execlist implementation we'll only need our PPGTT PDs,
      and ringbuffer addresses in order to set up the context. With previous
      patches, we have both, so start prepping the context to be loaded.
      
      Before running a context for the first time you must populate some
      fields in the context object. These fields begin 1 PAGE + LRCA, i.e. the
      first page (in 0 based counting) of the context image. These same
      fields will be read and written to as contexts are saved and restored
      once the system is up and running.
      
      Many of these fields are completely reused from previous global
      registers: ringbuffer head/tail/control, context control matches some
      previous MI_SET_CONTEXT flags, and page directories. There are other
      fields which we don't touch which we may want in the future.
      
      v2: CTX_LRI_HEADER_0 is MI_LOAD_REGISTER_IMM(14) for render and (11)
      for other engines.
      
      v3: Several rebases and general changes to the code.
      
      v4: Squash with "Extract LR context object populating"
      Also, Damien's review comments:
      - Set the Force Posted bit on the LRI header, as the BSpec suggests we do.
      - Prevent warning when compiling a 32-bit kernel without HIGHMEM64.
      - Add a clarifying comment to the context population code.
      
      v5: Damien's review comments:
      - The third MI_LOAD_REGISTER_IMM in the context does not set Force Posted.
      - Remove dead code.
      
      v6: Add a note about the (presumed) differences between BDW and CHV state
      contexts. Also, Brad's review comments:
      - Use the _MASKED_BIT_ENABLE, upper_32_bits and lower_32_bits macros.
      - Be less magical about how we set the ring size in the context.
      
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
      Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com> (v2)
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      8670d6f9
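The v6 note mentions the `_MASKED_BIT_ENABLE`, `upper_32_bits` and `lower_32_bits` macros. The sketch below re-implements their general idea as plain functions so it can be compiled outside the kernel; the function names are hypothetical, and `masked_reg_write` is only a software model, assumed here, of how a masked register treats its upper 16 bits as a per-bit write enable for the lower 16. Splitting a 64-bit value into two 32-bit halves is useful when a 64-bit address has to be stored in two 32-bit fields, as is presumably the case for the page directory entries in the context image.

```c
#include <stdint.h>

/* Split a 64-bit value (e.g. an address) into two 32-bit halves. */
static inline uint32_t upper_32(uint64_t v) { return (uint32_t)(v >> 32); }
static inline uint32_t lower_32(uint64_t v) { return (uint32_t)v; }

/*
 * Masked-register encoding: the upper 16 bits select which of the lower
 * 16 bits the write is allowed to change.
 */
static inline uint32_t masked_bit_enable(uint32_t bit)  { return (bit << 16) | bit; }
static inline uint32_t masked_bit_disable(uint32_t bit) { return bit << 16; }

/* Software model of the hardware applying a masked write to a register. */
static inline uint32_t masked_reg_write(uint32_t old, uint32_t val)
{
    uint32_t mask = val >> 16;

    /* Keep unselected bits of the old value, take selected bits from val. */
    return (old & ~mask & 0xffffu) | (val & mask);
}
```

The benefit of the masked encoding is that a single write can set or clear one bit without a read-modify-write cycle and without disturbing the other bits.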
    • drm/i915/bdw: Add a context and an engine pointers to the ringbuffer · 0c7dd53b
      Daniel Vetter authored
      Any given ringbuffer is unequivocally tied to one context and one engine.
      By setting the appropriate pointers to them, the ringbuffer struct holds
      all the information you might need to submit a workload for processing,
      Execlists style.
      
      v2: Drop ring->ctx since that looks terribly ill-defined for legacy
      ringbuffer submission.
      
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v1)
      Acked-by: Damien Lespiau <damien.lespiau@intel.com> (v2)
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      0c7dd53b
    • drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts · 84c2377f
      Oscar Mateo authored
      As we have said a couple of times by now, logical ring contexts have
      their own ringbuffers: not only the backing pages, but the whole
      management struct.
      
      In a previous version of the series, this was achieved with two separate
      patches:
      drm/i915/bdw: Allocate ringbuffer backing objects for default global LRC
      drm/i915/bdw: Allocate ringbuffer for user-created LRCs
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      84c2377f
    • drm/i915/bdw: A bit more advanced LR context alloc/free · 8c857917
      Oscar Mateo authored
      Now that we have the ability to allocate our own context backing objects
      and we have multiplexed one of them per engine inside the context structs,
      we can finally allocate and free them correctly.
      
      Regarding the context size, reading the register to calculate the sizes
      can work, I think, however the docs are very clear about the actual
      context sizes on GEN8, so just hardcode that and use it.
      
      v2: Rebased on top of the Full PPGTT series. It is important to notice
      that at this point we have one global default context per engine, all
      of them using the aliasing PPGTT (as opposed to the single global
      default context we have with legacy HW contexts).
      
      v3:
      - Go back to one single global default context, this time with multiple
        backing objects inside.
      - Use different context sizes for non-render engines, as suggested by
        Damien (still hardcoded, since the information about the context size
        registers in the BSpec is, well, *lacking*).
      - Render ctx size is 20 (or 19) pages, but not 21 (caught by Damien).
      - Move default context backing object creation to intel_init_ring (so
        that we don't waste memory in rings that might not get initialized).
      
      v4:
      - Reuse the HW legacy context init/fini.
      - Create a separate free function.
      - Rename the functions with an intel_ prefix.
      
      v5: Several rebases to account for the changes in the previous patches.
      
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      8c857917
    • drm/i915/bdw: Introduce one context backing object per engine · c9e003af
      Oscar Mateo authored
      A context backing object only makes sense for a given engine (because
      it holds state data specific to that engine).
      
      In legacy ringbuffer submission mode, the only MI_SET_CONTEXT we really
      perform is for the render engine, so one backing object is all we need.
      
      With Execlists, however, we need backing objects for every engine, as
      contexts become the only way to submit workloads to the GPU. To tackle
      this problem, we multiplex the context struct to contain <no-of-engines>
      objects.
      
      Originally, I colored this code by instantiating one new context for
      every engine I wanted to use, but this change suggested by Brad Volkin
      makes it more elegant.
      
      v2: Leave the old backing object pointer behind. Daniel Vetter suggested
      using a union, but it makes more sense to keep rcs_state as a NULL
      pointer behind, to make sure no one uses it incorrectly when Execlists
      are enabled, similar to what he suggested for ring->buffer (Rusty's API
      level 5).
      
      v3: Use the name "state" instead of the too-generic "obj", so that it
      mirrors the name choice for the legacy rcs_state.
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      c9e003af
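The "multiplexed" context layout from this commit can be pictured as an array of per-engine slots next to the legacy render-only pointer. This is a sketch under assumptions: only the member names `rcs_state` and `state` come from the commit message, while the engine enum, the struct names, and the `context_state` helper are hypothetical.

```c
#include <stddef.h>

/* Hypothetical engine identifiers. */
enum engine_id {
    ENGINE_RENDER,
    ENGINE_VIDEO,
    ENGINE_BLIT,
    ENGINE_VEBOX,
    ENGINE_COUNT
};

/* Opaque stand-in for a context backing object. */
struct backing_object {
    size_t size;
};

struct gpu_context {
    /* Legacy mode: a single render-engine object; left NULL with Execlists. */
    struct backing_object *rcs_state;

    /* Execlists mode: one backing object per engine. */
    struct {
        struct backing_object *state;
    } engine[ENGINE_COUNT];
};

/* Pick the backing object that applies to the given engine and mode. */
static inline struct backing_object *
context_state(const struct gpu_context *ctx, enum engine_id id, int execlists)
{
    if (execlists)
        return ctx->engine[id].state;
    return id == ENGINE_RENDER ? ctx->rcs_state : NULL;
}
```

Keeping `rcs_state` as a separate pointer that stays NULL under Execlists, rather than overlaying it in a union, means any stray legacy access shows up as a NULL dereference instead of silently touching the wrong object, which is the rationale given in the v2 note.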
    • drm/i915/bdw: Initialization for Logical Ring Contexts · ede7d42b
      Oscar Mateo authored
      For the moment this is just a placeholder, but it shows one of the
      main differences between the good ol' HW contexts and the shiny
      new Logical Ring Contexts: LR contexts allocate and free their
      own backing objects. Another difference is that the allocation is
      deferred (as the create function name suggests), but that does not
      happen in this patch yet, because for the moment we are only dealing
      with the default context.
      
      Early in the series we had our own gen8_gem_context_init/fini
      functions, but the truth is they now look almost the same as the
      legacy hw context init/fini functions. We can always split them
      later if this ceases to be the case.
      
      Also, we do not fall back to legacy ringbuffers when logical ring
      context initialization fails (not very likely to happen and, even
      if it does, hw contexts would probably fail as well).
      
      v2: Daniel says "explain, do not showcase".
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      [danvet: s/BUG_ON/WARN_ON/.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      ede7d42b