- 10 Apr, 2015 38 commits
-
-
Michel Thierry authored
Like with gen6/7, we can enable bitmap tracking with all the preallocations to make sure things actually don't blow up. v2: Rebased to match changes from previous patches. v3: Without teardown logic, rely on used_pdpes and used_pdes when freeing page tables. v4: Rebased after s/page_tables/page_table/. v5: Rebased after page table generalizations. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+) Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Michel Thierry authored
When we do dynamic page table allocations for gen8, we'll need to have more control over how and when we map page tables, similar to gen6. In particular, DMA mappings for page directories/tables occur at allocation time. This patch adds the functionality and calls it at init, which should have no functional change. The PDPEs are still a special case for now. We'll need a function for that in the future as well. v2: Handle renamed unmap_and_free_page functions. v3: Updated after teardown_va logic was removed. v4: Rebase after s/page_tables/page_table/. v5: No longer allocate all PDPs in GEN8+ systems with less than 4GB of memory, and update populate_lr_context to handle this new case (proper tracking will be added later in the patch series). v6: Assign lrc page directory pointer addresses using a macro. (Mika) Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+) Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Michel Thierry authored
This will be useful for when we move to 48b addressing, and the PDP isn't the root of the page table structure. v2: Rebase after changes for Gen8+ systems with less than 4GB of memory. v3: Rebase after Mika's code review. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2) Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Michel Thierry authored
These values are never quite useful for dynamic allocations of the page tables. Getting rid of them will help prevent later confusion. v2: Updated to use unmap_and_free_pd functions. v3: Updated gen8_ppgtt_free after teardown logic was removed. v4: Rebase after s/page_tables/page_table/. v5: Keep allocating all page directories in GEN8+ systems with less than 4GB of memory. Updated gen6_for_all_pdes. v6: Prevent (harmless) out of range access in gen6_for_all_pdes. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+) Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Michel Thierry authored
One important part of this patch is we now write a scratch page directory into any unused PDP descriptors. This matters for 2 reasons, first, we're not allowed to just use 0, or an invalid pointer, and second, we must wipe out any previous contents from the last context. The latter point only matters with full PPGTT. The former point only effect platforms with less than 4GB memory. v2: Updated commit message to point that we must set unused PDPs to the scratch page. v3: Unmap scratch_pd in gen8_ppgtt_free. v4: Initialize scratch_pd. (Mika) Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+) Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Michel Thierry authored
Start using gen8_for_each_pde macro to allocate page tables. v2: teardown_va_range references removed. v3: Rebase after s/page_tables/page_table/. v4: Keep setting up page tables for all page directories in systems with less than 4GB of memory. v5: Also initialize the page tables. (Mika) v6: Initialize all page tables, including the extra ones from systems with less than 4GB of memory. (Mika) Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+) Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Michel Thierry authored
Start using gen8_for_each_pdpe macro to allocate the page directories. Similar to PTs, while setting up a page directory, make all entries of the pd point to the scratch pd before mapping (and make all its entries point to the scratch page); this is to be safe in case of out of bound access or proactive prefetch. Systems without LLC require an explicit flush. v2: Rebased after s/free_pt_*/unmap_and_free_pt/ change. v3: Rebased after teardown va range logic was removed. v4: Keep setting up all page directories for systems with less than 4GB of memory. v5: Initialize PDs. (Mika) v6: Initialize also the extra PDs from systems with less than 4GB of memory. (Mika) Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+) Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Michel Thierry authored
Similar to gen6, we will use for_each_pde/for_each_pdpe and pte/pde/pdpe_index to iterate over these new structures. v2: Match trace_i915_va_teardown params v3: Multiple rebases. v4: Updated to use unmap_and_free_pt. v5: teardown_va_range logic no longer needed. v6: Rebase after s/page_tables/page_table/. v7: Renamed commit to match what it does now (it was "Use dynamic allocation idioms on free"). v8: Prevent (harmless) out of range access in gen8_for_each_pde and gen8_for_each_pdpe_e. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+) Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [danvet: s/BUG/WARN/] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Michel Thierry authored
Similar to gen6, while setting up a page table, make all entries of the pt point to the scratch page before mapping; this is to be safe in case of out of bound access or proactive prefetch. Systems without LLC require an explicit flush. v2: Expanded commit text and fixed indentation (Mika) Signed-off-by: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Michel Thierry authored
We are already unmapping them in gen8_ppgtt_free. This function became redundant since commit 06fda602 ("drm/i915: Create page table allocators"). Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Michel Thierry authored
Lets try to keep this consistent: Page Directory Pointer (PDP). Page Directory (PD), also known as page directory pointer entries. Page Table (PT), also known as page directory entries. s/struct i915_page_table_entry/struct i915_page_table/ s/struct i915_page_directory_entry/struct i915_page_directory/ s/struct i915_page_directory_pointer_entry/struct i915_page_directory_pointer/ Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
This is mostly useful for execlists where the rings switch between contexts (and so checking that the ring's start register matches the context is important). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
This is just so that I don't have to read about the batch pool on systems that are not using it! Rather than using a newline between the kernel clients and userspace clients, just distinguish the internal allocations with a '[k]' Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
Since we use obj->active as a hint in many places throughout the code, knowing its state in debugfs is extremely useful. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
Now with the trimmed memcpy before the command parser, we try to allocate many different sizes of batches, predominantly one or two pages. We can therefore speed up searching for a good sized batch by keeping the objects of buckets of roughly the same size. v2: Add a comment about bucket sizes Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
At runtime, this helps ensure that the batch pools are kept trim and fast. Then at suspend, this releases memory that we do not need to restore. It also ties into the oom-notifier to ensure that we recover as much kernel memory as possible during OOM. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
I woke up one morning and found 50k objects sitting in the batch pool and every search seemed to iterate the entire list... Painting the screen in oils would provide a more fluid display. One issue with the current design is that we only check for retirements on the current ring when preparing to submit a new batch. This means that we can have thousands of "active" batches on another ring that we have to walk over. The simplest way to avoid that is to split the pools per ring and then our LRU execution ordering will also ensure that the inactive buffers remain at the front. v2: execlists still requires duplicate code. v3: execlists requires more duplicate code Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
Move the madvise logic out of the execbuffer main path into the relatively rare allocation path, making the execbuffer manipulation less fragile. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
In the next patch, I want to use the structure elsewhere and so require it defined earlier. Rather than move the definition to an earlier location where it feels very odd, place it in its own header file. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
This reverts commit ec5cc0f9 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Jun 12 10:28:55 2014 +0100 drm/i915: Restrict GPU boost to the RCS engine The premise that media/blitter workloads are not affected by boosting is patently false with a trip through igt. The question that remains is what exactly is going wrong with the media workload that prompted this? Hopefully that would be fixed by the missing agressive downclocking, in addition to the extra restrictions imposed on how frequent a process is allowed to boost. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Deepak S <deepak.s@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll> Acked-by: Deepak S <deepak.s@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
With boosting for missed pageflips, we have a much stronger indication of when we need to (temporarily) boost GPU frequency to ensure smooth delivery of frames. So now only allow each client to perform one RPS boost in each period of GPU activity due to stalling on results. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Deepak S <deepak.s@linux.intel.com> Reviewed-by: Deepak S <deepak.s@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
If we hit a vblank and see that have a pageflip queue but not yet processed, ensure that the GPU is running at maximum in order to clear the backlog. Pageflips are only queued for the following vblank, if we miss it, there will be a visible stutter. Boosting the GPU frequency doesn't prevent us from missing the target vblank, but it should help the subsequent frames hitting theirs. v2: Reorder vblank vs flip-complete so that we only check for a missed flip after processing the completion events, and avoid spurious boosts. v3: Rename missed_vblank v4: Rebase v5: Cancel the outstanding work in runtime suspend v6: Rebase v7: Rebase required fixing Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Deepak S<deepak.s@linux.intel.com> Reviewed-by: Deepak S<deepak.s@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
The issue is that by computing the last_adj value after applying the clamping, we can end up with a bogus value for feeding into the next RPS autotuning step. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Deepak S <deepak.s@linux.intel.com> Reviewed-by: Deepak S <deepak.s@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
Reuse the same reclocking strategy for Baytail as on its bigger brethren, Sandybridge and Ivybridge. In particular, this makes the device quicker to reclock (both up and down) though the tendency now is to downclock more aggressively to compensate for the RPS boosts. v2: Rebase v3: Exclude Cherrytrail as Deepak was concerned that the increased number of register writes would wake the common powerwell too often. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Deepak S <deepak.s@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Deepak S <deepak.s@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
Currently we emit semaphore synchronisation as if we were going to flip using the target CS engine, but we then change our minds and do the flip using the CPU. Consequently we write instructions to the ring but never use them - even to the point of filling that ring up entirely and never submitting a request. The wrinkle in the ointment is that we have to tell a white lie to pin-to-display for it to skip the synchronisation for mmioflips as we will create a task specifically for that slow synchronisation. An oddity of note is the discrepancy in requests that we tell to pin-display to serialise to and that we then eventually wait upon. This is due to a limitation in the i915_gem_object_sync() routine that will be lifted later. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
The biggest user of i915_gem_object_get_page() is the relocation processing during execbuffer. Typically userspace passes in a set of relocations in sorted order. Sadly, we alternate between relocations increasing from the start of the buffers, and relocations decreasing from the end. However the majority of consecutive lookups will still be in the same page. We could cache the start of the last sg chain, however for most callers, the entire sgl is inside a single chain and so we see no improve from the extra layer of caching. v2: Avoid the double increment inside unlikely() References: https://bugs.freedesktop.org/show_bug.cgi?id=88308Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Damien Lespiau authored
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Damien Lespiau authored
Both WaDisableSDEUnitClockGating and WaSetGAPSunitClckGateDisable are needed on B0 as well. Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Arun Siluvery authored
According to Spec this is a reserved bit for Gen9+ and should not be set. Change-Id: I0215fb7057b94139b7a2f90ecc7a0201c0c93ad4 Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Maarten Lankhorst authored
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Ander Conselvan de Oliveira authored
For the conversion to atomic. The pre_enable() hooks are called as part of the crtc enable sequence, at which point the staged config was already made effective. Furthermore, the function actually changes hardware state, so it should anyway deal with current and not staged config. Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Ander Conselvan de Oliveira authored
Reduce dependency on the staged config by using the atomic state instead. Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Ander Conselvan de Oliveira authored
Reduce dependency on the staged config by using the atomic state instead. Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Ander Conselvan de Oliveira authored
It's not needed anymore, now that all the users were converted to using an atomic state. Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Ander Conselvan de Oliveira authored
Move towards atomic by using the atomic state instead. Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Ander Conselvan de Oliveira authored
Now that we use a drm atomic state for the legacy modeset, it is possible to get rid of the usage of intel_crtc->new_config in the function intel_mode_max_pixclk(). Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Ville Syrjälä authored
../drivers/gpu/drm/i915/intel_pm.c:3185:45: warning: Initializer entry defined twice ../drivers/gpu/drm/i915/intel_pm.c:3185:52: also defined here Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
Sometimes userspace wants a true overlay that is never clipped. In such cases, we need to disable the destination colorkey. However, it is currently unconditionally enabled in the overlay with no means of disabling. So rectify that by always default to on, and extending the UPDATE_ATTR ioctl to support explicit disabling of the colorkey. This is contrast to the spite code which requires explicit enabling of either the destination or source colorkey. Handling source colorkey is still todo for the overlay. (Of course it may be worth migrating overlay to sprite before then.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 07 Apr, 2015 2 commits
-
-
Jani Nikula authored
Occasionally it would be interesting to read some of the DPCD registers for debug purposes, without having to resort to logging. Add an i915 specific i915_dpcd debugfs file for DP and eDP connectors to dump parts of the DPCD. Currently the DPCD addresses to be dumped are statically configured, and more can be added trivially. The implementation also makes it relatively easy to add other i915 and connector specific debugfs files in the future, as necessary. This is currently i915 specific just because there's no generic way to do AUX transactions given just a drm_connector. However it's all pretty straightforward to port to other drivers. v2: Add more DPCD registers to dump. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Bob Paauwe <bob.j.paauwe@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Rodrigo Vivi authored
Program the default initial value of the L3SqcReg1 on BDW for performance v2: Default confirmed and using intel_ring_emit_wa as Mika pointed out. v3: Spec shows now a different value. It tells us to set to 0x784000 instead the 0x610000 that is there already. Also rebased after a long time so using WA_WRITE now. Cc: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-