  28 Jan, 2019 5 commits
    • drm/i915: Track active timelines · 9407d3bd
      Chris Wilson authored
      Now that we pin timelines around use, we have a clearly defined lifetime
      and convenient points at which we can track only the active timelines.
      This allows us to reduce the list iteration to consider only the active
      timelines rather than all of them.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-6-chris@chris-wilson.co.uk
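
      As a rough sketch of the pattern this change describes (pinning moves a
      timeline onto a list of active timelines, so walkers need not visit idle
      ones), here is a minimal standalone C model; the names and list helpers
      below are illustrative only, not the i915 code:

          struct list_node { struct list_node *prev, *next; };

          struct timeline {
                  struct list_node active_link;   /* membership of the active list */
                  unsigned int pin_count;         /* number of current users */
          };

          /* all timelines that are currently pinned, and only those */
          static struct list_node active_timelines = {
                  &active_timelines, &active_timelines
          };

          static void list_add(struct list_node *head, struct list_node *node)
          {
                  node->next = head->next;
                  node->prev = head;
                  head->next->prev = node;
                  head->next = node;
          }

          static void list_del(struct list_node *node)
          {
                  node->prev->next = node->next;
                  node->next->prev = node->prev;
          }

          static void timeline_pin(struct timeline *tl)
          {
                  if (tl->pin_count++ == 0)   /* first pin: timeline becomes active */
                          list_add(&active_timelines, &tl->active_link);
          }

          static void timeline_unpin(struct timeline *tl)
          {
                  if (--tl->pin_count == 0)   /* last unpin: drop off the active list */
                          list_del(&tl->active_link);
          }

      Walking active_timelines then only ever touches pinned timelines, which is
      the reduced list iteration the commit message refers to.
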
    • drm/i915: Track the context's seqno in its own timeline HWSP · 5013eb8c
      Chris Wilson authored
      Now that we have allocated ourselves a cacheline to store a breadcrumb,
      we can emit a write from the GPU into the timeline's HWSP of the
      per-context seqno as we complete each request. This drops the mirroring
      of the per-engine HWSP and allows each context to operate independently.
      We do not need to unwind the per-context timeline, and so requests are
      always consistent with the timeline breadcrumb, greatly simplifying the
      completion checks as we no longer need to be concerned about the
      global_seqno changing mid check.
      
      One complication though is that we have to be wary that the request may
      outlive the HWSP and so avoid touching the potentially dangling pointer
      after we have retired the fence. We also have to guard our access of the
      HWSP with RCU; the release of the obj->mm.pages should already be RCU-safe.
      
      At this point, we are emitting both per-context and global seqno and
      still using the single per-engine execution timeline for resolving
      interrupts.
      
      v2: s/fake_complete/mark_complete/
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-5-chris@chris-wilson.co.uk
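
      The completion check described above has to cope with the HWSP cacheline
      disappearing once the request is retired. The driver guards the page with
      RCU; the standalone C11 sketch below only models the idea with an
      atomically published pointer, and all names here are hypothetical:

          #include <stdatomic.h>
          #include <stdbool.h>
          #include <stdint.h>

          struct request {
                  uint32_t seqno;                 /* value the GPU writes on completion */
                  _Atomic(uint32_t *) hwsp_seqno; /* cacheline in the timeline's HWSP,
                                                   * cleared when the request is retired */
          };

          /* seqno comparison that tolerates wraparound */
          static bool seqno_passed(uint32_t current, uint32_t target)
          {
                  return (int32_t)(current - target) >= 0;
          }

          static bool request_completed(struct request *rq)
          {
                  uint32_t *hwsp =
                          atomic_load_explicit(&rq->hwsp_seqno, memory_order_acquire);

                  if (!hwsp)      /* retired: storage may be reused, treat as done */
                          return true;

                  return seqno_passed(*hwsp, rq->seqno);
          }

      In the driver itself, the read side would sit inside
      rcu_read_lock()/rcu_read_unlock() so that the backing obj->mm.pages cannot
      be released underneath the dereference.
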
    • drm/i915: Share per-timeline HWSP using a slab suballocator · 8ba306a6
      Chris Wilson authored
      If we restrict ourselves to only using a cacheline for each timeline's
      HWSP (we could go smaller, but want to avoid needlessly polluting
      cachelines on different engines between different contexts), then we can
      suballocate a single 4k page into 64 different timeline HWSPs. By
      treating each fresh allocation as a slab of 64 entries, we can keep it
      around for the next 64 allocation attempts until we need to refresh the
      slab cache.
      
      John Harrison noted the issue of fragmentation leading to the same worst
      case performance of one page per timeline as before, which can be
      mitigated by adopting a freelist.
      
      v2: Keep all partially allocated HWSP on a freelist
      
      This is still without migration, so it is possible for the system to end
      up with each timeline in its own page, but we ensure that no new
      allocation would needlessly allocate a fresh page!
      
      v3: Throw a selftest at the allocator to try and catch invalid cacheline
      reuse.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-4-chris@chris-wilson.co.uk
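
      To make the arithmetic concrete, here is a minimal userspace model of the
      suballocation scheme: one 4KiB page carved into 64 cacheline-sized slots,
      a 64-bit mask of free slots, and a freelist of partially used pages so
      that new timelines fill existing pages before a fresh one is allocated.
      It only illustrates the idea; these are not the i915 data structures:

          #include <stdint.h>
          #include <stdlib.h>

          #define PAGE_SIZE       4096
          #define CACHELINE       64
          #define SLOTS_PER_PAGE  (PAGE_SIZE / CACHELINE)     /* 64 */

          struct hwsp_page {
                  struct hwsp_page *next;     /* freelist of pages with spare slots */
                  uint64_t free_mask;         /* bit set => cacheline is available */
                  unsigned char storage[PAGE_SIZE];
          };

          static struct hwsp_page *free_pages;    /* pages that still have free slots */

          static void *hwsp_alloc(struct hwsp_page **out_page)
          {
                  struct hwsp_page *p = free_pages;
                  int slot;

                  if (!p) {                   /* no partial page left: new slab of 64 */
                          p = calloc(1, sizeof(*p));
                          if (!p)
                                  return NULL;
                          p->free_mask = ~0ull;
                          free_pages = p;
                  }

                  slot = __builtin_ctzll(p->free_mask);   /* lowest free cacheline
                                                           * (GCC/Clang builtin) */
                  p->free_mask &= ~(1ull << slot);
                  if (!p->free_mask)          /* page now full: drop it off the freelist */
                          free_pages = p->next;

                  *out_page = p;
                  return p->storage + slot * CACHELINE;
          }

          static void hwsp_free(struct hwsp_page *p, void *line)
          {
                  int slot = (int)(((unsigned char *)line - p->storage) / CACHELINE);

                  if (!p->free_mask) {        /* page was full: back onto the freelist */
                          p->next = free_pages;
                          free_pages = p;
                  }
                  p->free_mask |= 1ull << slot;
          }

      This mirrors the v2 behaviour from the message: partially used pages stay
      reachable, so a new allocation never takes a fresh page while an existing
      one still has a free cacheline, even though (without migration) each
      timeline could still end up in its own page.
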
    • drm/i915: Allocate a status page for each timeline · 52954edd
      Chris Wilson authored
      Allocate a page for use as a status page by a group of timelines; as we
      only need a dword of storage for each (rounded up to the cacheline for
      safety), we can pack multiple timelines into the same page. Each timeline
      will then be able to track its own HW seqno.
      
      v2: Reuse the common per-engine HWSP for the solitary ringbuffer
      timeline, so that we do not have to emit (using per-gen specialised
      vfuncs) the breadcrumb into the distinct timeline HWSP and instead can
      keep on using the common MI_STORE_DWORD_INDEX. However, to maintain the
      sleight-of-hand for the global/per-context seqno switchover, we will
      store both temporarily (and so use a custom offset for the shared timeline
      HWSP until the switch over).
      
      v3: Keep things simple and allocate a page for each timeline, page
      sharing comes next.
      
      v4: I was caught repeating the same MI_STORE_DWORD_IMM over and over
      again in selftests.
      
      v5: And caught red handed copying create timeline + check.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-3-chris@chris-wilson.co.uk
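
      The per-slot layout implied above is simple offset arithmetic: each
      timeline owns a single dword, but the dword is placed at the start of its
      own cacheline so unrelated writers never share a line. A hypothetical
      helper (illustrative, not the driver's API) would look like:

          #include <stdint.h>

          #define CACHELINE 64

          /* A 4096-byte status page holds 4096 / 64 = 64 such slots. */
          static inline uint32_t *timeline_hw_seqno(void *status_page,
                                                     unsigned int slot)
          {
                  /* the seqno dword sits at the start of its private cacheline */
                  return (uint32_t *)((unsigned char *)status_page +
                                      slot * CACHELINE);
          }
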
    • drm/i915: Enlarge vma->pin_count · b18fe4be
      Chris Wilson authored
      Previously we only accommodated having a vma pinned by a small number of
      users, with the maximum being pinned for use by the display engine. As
      such, we used a small bitfield only large enough to allow the vma to
      be pinned twice (for back/front buffers) in each scanout plane. Keeping
      the maximum permissible pin_count small allows us to quickly catch a
      potential leak. However, as we want to split a 4096B page into 64
      different cachelines and pin each cacheline for use by a different
      timeline, we will exceed the current maximum permissible vma->pin_count,
      and so the time has come to enlarge it.
      
      Whilst we are here, try to pull together the similar bits:
      
      Address/layout specification:
       - bias, mappable, zone_4g: address limit specifiers
       - fixed: address override, limits still apply though
       - high: not strictly an address limit, but an address direction to search
      
      Search controls:
       - nonblock, nonfault, noevict
      
      v2: Rewrite the guideline comment on bit consumption.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: John Harrison <john.C.Harrison@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-2-chris@chris-wilson.co.uk
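
      The grouping described above can be pictured as a flags word in which the
      low bits count pins and the higher bits carry the grouped controls. The
      bit positions and widths below are made up for illustration; only the
      flag names come from the commit message:

          #include <stdint.h>

          /* pin counter: wide enough for 64 cacheline users of a shared page */
          #define PIN_COUNT_BITS  8
          #define PIN_COUNT_MASK  ((1u << PIN_COUNT_BITS) - 1)

          /* Address/layout specification */
          #define PIN_BIAS        (1u << 8)   /* respect a start bias */
          #define PIN_MAPPABLE    (1u << 9)   /* must be CPU mappable (address limit) */
          #define PIN_ZONE_4G     (1u << 10)  /* keep below 4GiB (address limit) */
          #define PIN_FIXED       (1u << 11)  /* caller supplies the address */
          #define PIN_HIGH        (1u << 12)  /* not a limit: search top-down */

          /* Search controls */
          #define PIN_NONBLOCK    (1u << 13)
          #define PIN_NONFAULT    (1u << 14)
          #define PIN_NOEVICT     (1u << 15)

          static inline unsigned int pin_count(uint32_t flags)
          {
                  return flags & PIN_COUNT_MASK;
          }
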