1. 12 Aug, 2016 2 commits
    • drm/i915: Use SSE4.1 movntdqa to accelerate reads from WC memory · 0b1de5d5
      Chris Wilson authored
      This patch provides the infrastructure for performing a 16-byte aligned
      read from WC memory using the non-temporal instructions introduced with
      SSE4.1. Using movntdqa we can bypass the CPU caches and read directly from
      memory, ignoring the page attributes set on the CPU PTE, i.e. negating the
      impact of an otherwise UC access. Copying using movntdqa from WC is almost
      as fast as reading from WB memory, modulo the possibility of either hitting
      the CPU cache or leaving the data in the CPU cache for the next consumer.
      (The CPU cache itself may be flushed for the region of the movntdqa and on
      later access the movntdqa reads from a separate internal buffer for the
      cacheline.) The write back to the memory is however cached.
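As a rough illustration of the copy described above, here is a minimal userspace sketch, not the kernel code from this patch. It assumes a GCC or Clang toolchain on x86 and uses the `_mm_stream_load_si128` intrinsic (which emits movntdqa) rather than inline `__asm__`; the names `sketch_copy_ntdqa` and `sketch_memcpy_from_wc` are hypothetical. The kernel version would also need `kernel_fpu_begin()`/`kernel_fpu_end()` around the SSE registers, which has no userspace equivalent here. Like v2 of the patch, the wrapper reports whether the accelerated copy was possible.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <smmintrin.h> /* SSE4.1: _mm_stream_load_si128 (movntdqa) */

/* Hypothetical helper: copy len bytes in 16-byte chunks.
 * Loads are non-temporal (movntdqa); stores are ordinary cached stores. */
static __attribute__((target("sse4.1")))
void sketch_copy_ntdqa(void *dst, const void *src, size_t len)
{
	__m128i *d = dst;
	__m128i *s = (__m128i *)src;

	for (len >>= 4; len; len--)
		_mm_store_si128(d++, _mm_stream_load_si128(s++));
}
#endif

/* Hypothetical wrapper: returns true if the accelerated copy was done,
 * false if the caller must fall back to a regular copy. */
static bool sketch_memcpy_from_wc(void *dst, const void *src, size_t len)
{
	/* movntdqa requires 16-byte aligned addresses; we also require
	 * the length to be a multiple of 16. */
	if (((uintptr_t)dst | (uintptr_t)src | len) & 15)
		return false;

#if defined(__x86_64__) || defined(__i386__)
	if (!__builtin_cpu_supports("sse4.1"))
		return false;

	sketch_copy_ntdqa(dst, src, len);
	return true;
#else
	return false; /* no movntdqa on this architecture */
#endif
}
```

On ordinary (WB) heap memory this only demonstrates the shape of the copy and its alignment checks; the speedup the commit message describes applies to reads from WC mappings.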
      
      This will be used in later patches to accelerate accessing WC memory.
      
      v2: Report whether the accelerated copy is successful/possible.
      v3: Function alignment override was only necessary when using the
      function target("sse4.1") - which is not necessary for emitting movntdqa
      from __asm__.
      v4: Improve notes on CPU cache behaviour vs non-temporal stores.
      v5: Fix byte offsets for unrolled moves.
      v6: Find all remaining typos of "movntqda", use kernel_fpu_begin.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Akash Goel <akash.goel@intel.com>
      Cc: Damien Lespiau <damien.lespiau@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-2-git-send-email-chris@chris-wilson.co.uk
    • drm/i915: Support for creating write combined type vmaps · d31d7cb1
      Chris Wilson authored
      vmap has a provision for controlling the page protection bits, which we
      can use to control the mapping type, e.g. WB, WC, UC or even WT.
      To allow the caller to choose their mapping type, we add a parameter to
      i915_gem_object_pin_map - but we still only allow one vmap to be cached
      per object. If the object is currently not pinned, then we recreate the
      previous vmap with the new access type, but if it was pinned we report an
      error. This effectively limits the access via i915_gem_object_pin_map to a
      single mapping type for the lifetime of the object. Not usually a problem,
      but something to be aware of when setting up the object's vmap.
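The caching rule described above can be sketched in plain C. This is a toy model under stated assumptions, not the driver's implementation: `struct sketch_object`, `sketch_pin_map`, `sketch_unpin_map` and the `SKETCH_*` constants are all hypothetical, and `aligned_alloc` stands in for creating the vmap. Following the v3 note about PAGE_MASK, the sketch stores the mapping type in the low bits of the page-aligned address.

```c
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PAGE_MASK (~(uintptr_t)(SKETCH_PAGE_SIZE - 1))

/* Hypothetical mapping types; the real enum names are not shown here. */
enum sketch_map_type { SKETCH_MAP_WB = 0, SKETCH_MAP_WC = 1 };

struct sketch_object {
	uintptr_t mapping;      /* page-aligned address | type in low bits */
	unsigned int pin_count; /* number of active users of the mapping */
};

/* Returns the cached mapping, recreating it if the type changed and
 * nobody holds a pin; returns NULL if a pinned mapping of a different
 * type exists (or if allocation fails). */
static void *sketch_pin_map(struct sketch_object *obj,
			    enum sketch_map_type type)
{
	void *ptr = (void *)(obj->mapping & SKETCH_PAGE_MASK);
	enum sketch_map_type has = obj->mapping & ~SKETCH_PAGE_MASK;

	if (ptr && has != type) {
		if (obj->pin_count)
			return NULL; /* pinned with another type: error */

		free(ptr); /* unpinned: drop the old mapping */
		ptr = NULL;
		obj->mapping = 0;
	}

	if (!ptr) {
		/* Stand-in for building a vmap with the requested type. */
		ptr = aligned_alloc(SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
		if (!ptr)
			return NULL;
		obj->mapping = (uintptr_t)ptr | type;
	}

	obj->pin_count++;
	return ptr;
}

static void sketch_unpin_map(struct sketch_object *obj)
{
	obj->pin_count--;
}
```

In this model, a second caller asking for WC while a WB mapping is pinned gets NULL, whereas the same request after the last unpin replaces the cached mapping.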
      
      We will want to vary the access type to enable WC mappings of ringbuffer
      and context objects on !llc platforms, as well as other objects where we
      need coherent access to the GPU's pages without going through the GTT.
      
      v2: Remove the redundant braces around pin count check and fix the marker
           in documentation (Chris)
      
      v3:
      - Add a new enum for the vmalloc mapping type & pass that as an argument to
         i915_object_pin_map. (Tvrtko)
      - Use PAGE_MASK to extract or filter the mapping type info and remove a
         superfluous BUG_ON. (Tvrtko)
      
      v4:
      - Rename the enums and clean up the pin_map function. (Chris)
      
      v5: Drop the VM_NO_GUARD, minor cosmetics.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Akash Goel <akash.goel@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
  2. 11 Aug, 2016 14 commits
  3. 10 Aug, 2016 17 commits
  4. 09 Aug, 2016 2 commits
  5. 05 Aug, 2016 5 commits