1. 22 Jun, 2017 5 commits
  2. 21 Jun, 2017 5 commits
  3. 20 Jun, 2017 15 commits
  4. 19 Jun, 2017 3 commits
  5. 16 Jun, 2017 12 commits
    • drm/i915/cfl: Introduce Coffee Lake workarounds. · 46c26662
      Rodrigo Vivi authored
      Coffee Lake inherits most of the Kaby Lake production
      workarounds.
      
      v2: Fix typo on commit message and remove
          WaDisableKillLogic and GEN9_DISABLE_OCL_OOB_SUPPRESS_LOGIC,
          since as Mika pointed out they shouldn't be here for cfl
          according to BSpec.
      
      Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1497653398-15722-1-git-send-email-rodrigo.vivi@intel.com
    • drm/i915: Store 9 bits of PCI Device ID for platforms with a LP PCH · 28e0f4ee
      Dhinakaran Pandiyan authored
      Although we use 9 bits of the Device ID for identifying the PCH, only 8 bits are
      stored in dev_priv->pch_id. This makes HAS_PCH_CNP_LP() and
      HAS_PCH_SPT_LP() incorrect. Fix this by storing all 9 bits for the
      platforms with an LP PCH.
      
      v2: Drop PCH_LPT_LP change (Imre)
      
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Imre Deak <imre.deak@intel.com>
      Fixes: commit ec7e0bb3 ("drm/i915/cnp: Add PCI ID for Cannonpoint LP PCH")
      Reported-by: Imre Deak <imre.deak@intel.com>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
      Signed-off-by: Imre Deak <imre.deak@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1497641774-29104-1-git-send-email-dhinakaran.pandiyan@intel.com
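The 8-bit vs 9-bit distinction matters because, in the i915 headers of this period, the CNP-LP device ID type (0x9D80) differs from the SPT-LP type (0x9D00) only in bit 7, which an 8-bit mask discards. A minimal sketch of the before/after masking (the raw device ID used below is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Device ID types as in the i915 headers of this era (illustrative): */
#define INTEL_PCH_CNP_LP_DEVICE_ID_TYPE 0x9D80

/* The old code kept only 8 bits of the device ID, so the LP bit
 * (bit 7) was lost and HAS_PCH_CNP_LP() could never match. */
static uint16_t pch_id_8bit(uint16_t device_id)
{
        return device_id & 0xFF00;
}

/* The fix keeps 9 bits for LP PCHs, preserving bit 7 and thereby
 * distinguishing CNP-LP (0x9D80) from SPT-LP (0x9D00). */
static uint16_t pch_id_9bit(uint16_t device_id)
{
        return device_id & 0xFF80;
}
```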
    • drm/i915: Stash a pointer to the obj's resv in the vma · 95ff7c7d
      Chris Wilson authored
      During execbuf, a mandatory step is that we add this request (this
      fence) to each object's reservation_object. Inside execbuf we track the
      vma, so adding the fence to the reservation_object means having to
      first chase the obj, incurring another cache miss. We can reduce the
      number of cache misses by stashing a pointer to the reservation_object
      in the vma itself.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170616140525.6394-1-chris@chris-wilson.co.uk
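The saving comes from replacing two dependent pointer loads with one. A minimal sketch of the idea, with heavily simplified stand-ins for the real structures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified versions of the structures involved. */
struct reservation_object { int dummy; };

struct drm_i915_gem_object {
        struct reservation_object *resv;
        /* ... many other fields ... */
};

struct i915_vma {
        struct drm_i915_gem_object *obj;
        struct reservation_object *resv; /* cached copy of obj->resv */
};

/* Without the cache: two dependent loads (vma->obj, then obj->resv),
 * i.e. a potential extra cache miss on the obj. */
static struct reservation_object *resv_via_obj(struct i915_vma *vma)
{
        return vma->obj->resv;
}

/* With the cache: one load from the vma we are already holding. */
static struct reservation_object *resv_cached(struct i915_vma *vma)
{
        return vma->resv;
}
```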
    • drm/i915: Async GPU relocation processing · 7dd4f672
      Chris Wilson authored
      If the user requires patching of their batch or auxiliary buffers, we
      currently make the alterations on the CPU. If they are active on the GPU
      at the time, we wait under the struct_mutex for them to finish executing
      before we rewrite the contents. This happens when shared relocation trees
      are used between different contexts with separate address spaces (the
      buffers then have different addresses in each), as the 3D state needs
      to be adjusted between execution on each context. However, we don't need
      to use the CPU to do the relocation patching: we can queue commands
      to the GPU to perform it and use fences to serialise the operation with
      the current and future activity, so that the operation on the GPU appears
      just as atomic as performing it immediately. Performing the relocation
      rewrites on the GPU is not free; in terms of pure throughput the number
      of relocations/s is about halved, but more importantly so is the time
      under the struct_mutex.
      
      v2: Break out the request/batch allocation for clearer error flow.
      v3: A few asserts to ensure rq ordering is maintained.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
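Queuing the patch on the GPU amounts to emitting a store command per relocation instead of a CPU write. A sketch of such an emission (the opcode encoding follows the gen8-era MI_STORE_DWORD_IMM layout and is shown only for illustration):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* MI_STORE_DWORD_IMM, gen8-style 64-bit addressing: opcode in the top
 * bits, dword-length field in the low bits (encoding illustrative). */
#define MI_STORE_DWORD_IMM_GEN8 ((0x20u << 23) | 2)

/* Emit one relocation as a GPU command -- "store <value> at <gpu
 * address>" -- rather than poking the page from the CPU.  Fences then
 * serialise this write against current and future activity. */
static size_t emit_reloc(uint32_t *cs, uint64_t gpu_addr, uint32_t value)
{
        size_t n = 0;

        cs[n++] = MI_STORE_DWORD_IMM_GEN8;
        cs[n++] = (uint32_t)gpu_addr;         /* address, low dword */
        cs[n++] = (uint32_t)(gpu_addr >> 32); /* address, high dword */
        cs[n++] = value;                      /* relocated offset */
        return n; /* dwords emitted */
}
```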
    • drm/i915: Allow execbuffer to use the first object as the batch · 1a71cf2f
      Chris Wilson authored
      Currently, the last object in the execlist is always the batch.
      However, when building the batch buffer we often know the batch object
      first, and if we can use the first slot in the execlist we can emit
      relocation instructions relative to it immediately and avoid a separate
      pass to adjust the relocations to point to the last execlist slot.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
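The slot selection reduces to a single index computation once a flag opts in to batch-first layout. A sketch (the flag name matches the uapi addition; its bit value here is illustrative):

```c
#include <assert.h>

/* Execbuffer flag letting userspace put the batch in slot 0 instead of
 * the last slot (bit value illustrative). */
#define I915_EXEC_BATCH_FIRST (1UL << 18)

/* Pick the execlist slot holding the batch buffer: historically the
 * last slot; with I915_EXEC_BATCH_FIRST, the first. */
static unsigned int eb_batch_index(unsigned int buffer_count,
                                   unsigned long flags)
{
        return (flags & I915_EXEC_BATCH_FIRST) ? 0 : buffer_count - 1;
}
```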
    • drm/i915: Wait upon userptr get-user-pages within execbuffer · 8a2421bd
      Chris Wilson authored
      This simply hides the EAGAIN caused by userptr when userspace causes
      resource contention. However, it is quite beneficial with highly
      contended userptr users as we avoid repeating the setup costs and
      kernel-user context switches.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
    • drm/i915: First try the previous execbuffer location · 616d9cee
      Chris Wilson authored
      When choosing a slot for an execbuffer object, we ideally want to use the
      same address as last time (so that we don't have to rebind it) and the
      same address as expected by the user (so that we don't have to fixup any
      relocations pointing to it). We therefore first try to bind at the
      incoming execbuffer->offset from the user, or at the currently bound
      offset, which should achieve the goal of avoiding the rebind cost and
      the relocation penalty. However, if the object is not currently bound
      there, we don't want to arbitrarily unbind the object already in our
      chosen position, and so we choose to rebind/relocate the incoming object
      instead. After we report the new position back to the user, the
      relocations should have settled down by the next pass.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
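The policy above boils down to a cheap check: reuse the binding if the vma already sits at the user's expected offset, otherwise rebind the incoming object rather than evict whatever occupies the slot. A toy sketch (names hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for a vma's binding state. */
struct fake_vma {
        bool bound;      /* currently bound in the GTT? */
        uint64_t offset; /* where it is bound, if bound */
};

/* Reuse the current binding only when it already matches the offset
 * the user's relocations expect; otherwise this incoming object is
 * the one that gets rebound/relocated. */
static bool needs_rebind(const struct fake_vma *vma, uint64_t user_offset)
{
        return !vma->bound || vma->offset != user_offset;
}
```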
    • drm/i915: Store a persistent reference for an object in the execbuffer cache · dade2a61
      Chris Wilson authored
      If we take a reference to the object/vma when it is first used in an
      execbuf, we can keep that reference until the object's file-local handle
      is closed. Thereby saving a frequent ref/unref pair.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    • drm/i915: Eliminate lots of iterations over the execobjects array · 2889caa9
      Chris Wilson authored
      The major scaling bottleneck in execbuffer is the processing of the
      execobjects. Creating an auxiliary list is inefficient when compared to
      using the execobject array we already have allocated.
      
      Reservation is then split into phases. As we look up the VMA, we
      try to bind it back into its active location. Only if that fails do we
      add it to the unbound list for phase 2. In phase 2, we try to fit all
      those objects that could not keep their previous location, with fallback
      to retrying all objects and evicting the VM in case of severe
      fragmentation. (This is the same as before, except that phase 1 is now
      done inline with looking up the VMA to avoid an iteration over the
      execobject array. In the ideal case, we eliminate the separate
      reservation phase.) During the reservation phase, we only evict from the
      VM between passes (rather than, as currently, whilst trying to fit every
      new VMA). In testing with Unreal Engine's Atlantis demo, which stresses
      the eviction logic on gen7 class hardware, this speeds up the framerate
      by a factor of 2.
      
      The second loop amalgamation is between move_to_gpu and move_to_active.
      As we always submit the request, even if incomplete, we can use the
      current request to track the active VMAs as we perform the required
      flushes and synchronisation.
      
      The next big advancement is to avoid copying back to the user any
      execobjects and relocations that are not changed.
      
      v2: Add a Theory of Operation spiel.
      v3: Fall back to slow relocations in preparation for flushing userptrs.
      v4: Document struct members, factor out eb_validate_vma(), add a few
      more comments to explain some magic and hide other magic behind macros.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
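The two-phase reservation over the existing execobject array can be sketched as follows (a toy model: the real phase 2 retries with eviction on severe fragmentation, which is elided here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified execobject entry; names hypothetical. */
struct exec_entry {
        bool fits_previous; /* can it rebind into its old location? */
        bool bound;
};

/* Walk the execobject array itself -- no auxiliary list is allocated. */
static size_t reserve(struct exec_entry *e, size_t count)
{
        size_t unbound = 0;

        /* Phase 1: done inline with the VMA lookup -- try to put each
         * vma back into its previously active location. */
        for (size_t i = 0; i < count; i++) {
                e[i].bound = e[i].fits_previous;
                if (!e[i].bound)
                        unbound++;
        }

        /* Phase 2: place the leftovers.  In this toy model placement
         * always succeeds; the driver would evict from the VM between
         * passes rather than on every failed fit. */
        for (size_t i = 0; i < count; i++) {
                if (!e[i].bound) {
                        e[i].bound = true;
                        unbound--;
                }
        }
        return unbound; /* 0 on success */
}
```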
    • drm/i915: Disable EXEC_OBJECT_ASYNC when doing relocations · 071750e5
      Chris Wilson authored
      If we write a relocation into the buffer, we require our own implicit
      synchronisation added after the start of the execbuf, outside of the
      user's control. As we may end up clflushing, or doing the patch itself
      on the GPU asynchronously, we need to look at the implicit serialisation
      on obj->resv and hence need to disable EXEC_OBJECT_ASYNC for this
      object.
      
      If the user does trigger a stall for relocations, we make sure the stall
      is complete enough so that the batch is not submitted before we complete
      those relocations.
      
      Fixes: 77ae9957 ("drm/i915: Enable userspace to opt-out of implicit fencing")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
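The fix amounts to ignoring the user's ASYNC opt-out for any object the kernel itself must write relocations into. A sketch (the flag's bit value is from the uapi header of this era and is shown for illustration):

```c
#include <assert.h>

/* Userspace flag opting out of implicit fencing (bit value illustrative). */
#define EXEC_OBJECT_ASYNC (1UL << 6)

/* If this object needs relocations written, our own implicit sync is
 * mandatory -- the batch must not be submitted before the relocation
 * writes complete -- so the ASYNC opt-out is dropped for it. */
static unsigned long adjust_exec_flags(unsigned long flags, int has_relocs)
{
        return has_relocs ? flags & ~EXEC_OBJECT_ASYNC : flags;
}
```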
    • drm/i915: Pass vma to relocate entry · 507d977f
      Chris Wilson authored
      We can simplify our tracking of pending writes in an execbuf to the
      single bit in the vma->exec_entry->flags, but that requires the
      relocation function knowing the object's vma. Pass it along.
      
      Note we have only been using a single bit to track flushing since
      
      commit cc889e0f
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Wed Jun 13 20:45:19 2012 +0200
      
          drm/i915: disable flushing_list/gpu_write_list
      
      unconditionally flushed all render caches before the breadcrumb and
      
      commit 6ac42f41
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Sat Jul 21 12:25:01 2012 +0200
      
          drm/i915: Replace the complex flushing logic with simple invalidate/flush all
      
      did away with the explicit GPU domain tracking. This was then codified
      into the ABI with NO_RELOC in
      
      commit ed5982e6
      Author: Daniel Vetter <daniel.vetter@ffwll.ch> # Oi! Patch stealer!
      Date:   Thu Jan 17 22:23:36 2013 +0100
      
          drm/i915: Allow userspace to hint that the relocations were known
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    • drm/i915: Store a direct lookup from object handle to vma · 4ff4b44c
      Chris Wilson authored
      The advent of full-ppgtt led to an extra indirection between the object
      and its binding. That extra indirection has a noticeable impact on how
      fast we can convert from the user handles to our internal vma for
      execbuffer. In order to bypass the extra indirection, we use a
      resizable hashtable to jump from the object to the per-ctx vma.
      rhashtable was considered, but we don't need the online resizing feature
      and the extra complexity proved to undermine its usefulness. Instead, we
      simply reallocate the hashtable on demand in a background task and
      serialize it before iterating.
      
      In non-full-ppgtt modes, multiple files and multiple contexts can share
      the same vma. This leads to having multiple possible handle->vma links,
      so we only use the first to establish the fast path. The majority of
      buffers are not shared and so we should still be able to realise
      speedups with multiple clients.
      
      v2: Prettier names, more magic.
      v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
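The handle→vma fast path is essentially a small hash table keyed by the GEM handle. A toy, fixed-size open-addressing version (the driver's table is grown in a background task and resized on demand; the names here are hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define TABLE_SIZE 64 /* power of two, so masking works as modulo */

struct lut_entry {
        uint32_t handle; /* file-local GEM handle */
        void *vma;       /* per-ctx vma; NULL marks an empty slot */
};

static struct lut_entry lut[TABLE_SIZE];

static void lut_insert(uint32_t handle, void *vma)
{
        uint32_t i = handle & (TABLE_SIZE - 1);

        /* Linear probing: walk to the next free slot on collision. */
        while (lut[i].vma)
                i = (i + 1) & (TABLE_SIZE - 1);
        lut[i].handle = handle;
        lut[i].vma = vma;
}

static void *lut_lookup(uint32_t handle)
{
        uint32_t i = handle & (TABLE_SIZE - 1);

        while (lut[i].vma) {
                if (lut[i].handle == handle)
                        return lut[i].vma;
                i = (i + 1) & (TABLE_SIZE - 1);
        }
        return NULL; /* miss: fall back to the slow obj->vma walk */
}
```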