- 15 Apr, 2016 12 commits
-
-
Imre Deak authored
Power well 1 is managed by the DMC firmware so don't toggle it on-demand from the driver. This means we need to follow the BSpec display initialization sequence during driver loading and resuming (both system and runtime) and enable power well 1 only once there. Afterwards DMC will toggle power well 1 whenever entering/exiting DC5. For this to work we also need to do away getting the PLL power domain, since that just kept runtime PM disabled for good. Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1459515767-29228-12-git-send-email-imre.deak@intel.com
-
Imre Deak authored
The power-down step logically belongs to the individual PHY uninit sequence so move it there. The only functional change is that we will power down now PHY 1 separately before PHY 0 and preserve the other bits in the register which are defined as reserved. Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1459515767-29228-11-git-send-email-imre.deak@intel.com
-
Imre Deak authored
For internal APIs passing dev_priv is preferred to reduce indirections, so convert over a few DDI PHY, CDCLK helpers. No functional change. Signed-off-by: Imre Deak <imre.deak@intel.com> Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: David Weinehall <david.weinehall@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1459515767-29228-10-git-send-email-imre.deak@intel.com
-
Imre Deak authored
On Broxton we need to enable/disable power well 1 during the init/unit display sequence similarly to Skylake/Kabylake. The code for this will be added in a follow-up patch, but to prepare for that unexport skl_pw1_misc_io_init(). It's a simple function called only from a single place and having it inlined in the Skylake display core init/unit functions will make it easier to compare it with its Broxton counterpart. This also flips the order of Misc IO and power well 1 disabling which matches the enabling order. The specification doesn't prescribe the disabling order, so this should be fine. v2: - Fix incorrect enable vs. disable power well call in skl_display_core_uninit() (Patrik) - Add commit comment about chaning the order of PW1 and Misc IO power well disabling (Patrik) CC: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1459773777-10701-1-git-send-email-imre.deak@intel.com
-
Imre Deak authored
On SKL/KBL suspend-to-idle (aka freeze/s0ix) is performed with DMC firmware assistance where the target display power state is DC6. On Broxton on the other hand we don't use the firmware for this, but rely instead on a manual DC9 flow. For this we have to uninitialize the display following the BSpec display uninit sequence, just as during S3/S4, so make sure we follow this sequence. CC: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1459515767-29228-8-git-send-email-imre.deak@intel.com
-
Imre Deak authored
The display power well support and DC state management doesn't depend on runtime PM support, so remove the incorrect asserts about this. Also Broxton does support DC5, so the related assert in assert_can_enable_dc5() is incorrect. There is a more generic and correct assert for this already in gen9_set_dc_state(), so we can remove all the other ones. At the same time convert WARNs to WARN_ONCE for consistency with the other DC state asserts. CC: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1459515767-29228-7-git-send-email-imre.deak@intel.com
-
Imre Deak authored
So far we only power well enabling was synchronous not disabling. Since we don't exactly know how the firmware (both DMC and PCU) synchronizes against the actual power well state during DC transitions, make the disabling also synchronous. CC: Mika Kuoppala <mika.kuoppala@linux.intel.com> CC: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1459515767-29228-6-git-send-email-imre.deak@intel.com
-
Imre Deak authored
DMC forces on power well 1 and the misc IO power well by setting the corresponding request bits both in the BIOS and the DEBUG power well request registers. This is somewhat unexpected since the firmware should really just save and restore state but not alter it. We also depend on being able to disable power well 1, and the misc IO power well before entering S3/S4 on BXT and SKL or entering DC9 on BXT. To fix this make sure these request bits are cleared whenever we want to disable the given power wells. On SKL there is another twist where the firmware also clears the power well 1 request bit in HSW_POWER_WELL_DRIVER (but not that of the misc IO power well). This happens to not cause a problem due to the forced-on request bits in the other request registers. I've filed a bug about all this, but fixing that may take a while and having this sanity check in place makes sense even for future firmware versions. At the same time also check the KVMR request bits. I haven't seen this being altered, but we don't expect any request bits in here either, so sanitize this register as well. v2: - Apply the workaround on SKL as well. I noticed the related failure from the CI report, later Patrik also reported seeing it on his machine. CC: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1459851965-6137-1-git-send-email-imre.deak@intel.com
-
Imre Deak authored
This register is read-only, so we have never actually set OCL2_LDOFUSE_PWR_DIS in it as specified by the specification. Add a code comment about this. I filed a specification update request to clarify this there. CC: Arthur J Runyan <arthur.j.runyan@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: David Weinehall <david.weinehall@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1459515767-29228-4-git-send-email-imre.deak@intel.com
-
Imre Deak authored
This has been corrected in BSpec quite some time ago, but we missed it somehow. The wrong field definitions resulted in configuring PHY0 with an incorrect GRC value. v2: - Remove the FIXME comment, we left in the code exactly about this issue. (Ville) CC: Arthur J Runyan <arthur.j.runyan@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1459515767-29228-3-git-send-email-imre.deak@intel.com
-
Imre Deak authored
DMC version 1.06 has a known bug, where the firmware polls forever for a port PLL to lock, if the PLL was disabled when entering DC5, which locks up the machine. Version 1.07 fixes this, so make that the minimum required version on BXT. CC: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1459515767-29228-2-git-send-email-imre.deak@intel.com
-
Gustavo Padovan authored
Replace the legacy drm_send_vblank_event() with the new helper function. Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1460656118-16766-4-git-send-email-gustavo@padovan.org
-
- 14 Apr, 2016 28 commits
-
-
Ville Syrjälä authored
HSW still has the wake FIFO, so let's check it. Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Deepak S <deepak.s@linux.intel.com> Fixes: 05a2fb15 ("drm/i915: Consolidate forcewake code") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460633942-24013-1-git-send-email-ville.syrjala@linux.intel.com Cc: stable@vger.kernel.org Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
-
Ville Syrjälä authored
As we did on VLV, split the gt irq handling to ack and handler phases on CHV. Leave the BDW+ codepath mostly intact for now. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-13-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-
Ville Syrjälä authored
It looks silly to pass both dev and dev_priv to the snb/ilk gt irq handlers. Just pass dev_priv. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-12-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-
Ville Syrjälä authored
No need to actually handle the GT/PM interrupt while we have interrupt sources disabled. Move the actual processing to happen after we've restored VLV_IER and master interrupt enable. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-11-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-
Ville Syrjälä authored
Minimize the amount of stuff we do with interrupt sources disabled by splitting the PIPESTAT irq handling into ack+handler phases. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-10-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-
Ville Syrjälä authored
Split the VLV/CHV hoplug irq handling to ack and handler phases. This way we can move the actual irq handling outside the section where we have disabled the interrupt sources. For now, we leave things as is for pre-VLV GMCH platforms, but eventually they could get the same treatment. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-9-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-
Ville Syrjälä authored
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-8-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-
Ville Syrjälä authored
Now that we've dealt with the races in clearing IIR bits via VLV_IER and the master interrupt enable, we can go ahead aliminate the loop from the VLV interrupt handler. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-7-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-
Ville Syrjälä authored
On VLV/CHV the master interrupt enable bit only affects GT/PM interrupts. Display interrupts are not affected by the master irq control. Also it seems that the CPU interrupt will only be generated when the combined result of all GT/PM/display interrupts has a 0->1 edge. We already use the master interrupt enable bit to make sure GT/PM interrupt can generate such an edge if we don't end up clearing all IIR bits. We must do the same for display interrupts, and for that we can simply clear out VLV_IER, and restore after we've acked all the interrupts we are about to process. So with both master interrupt enable and VLV_IER cleared out, we will guarantee that there will be a 0->1 edge if any IIR bits are still set at the end, and thus another CPU interrupt will be generated. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: 579de73b ("drm/i915: Exit cherryview_irq_handler() after one pass") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-6-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-
Ville Syrjälä authored
Like on CHV, let's clear out the master irq enable bit when we ack GT/PM interrupts. This will allow GT/PM interrupts to re-raise the CPU interrupt if we fail to clear all the bits from the IIR(s). Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-5-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-
Ville Syrjälä authored
On VLV/CHV VLV_IIR is not double double buffered, and it doesn't detect edges from PIPESTAT & co. like it does on gen4. Instead it just directly latches the level from PIPESTAT & co. That means we must clear VLV_IIR after PIPESTAT & co. or else we'll get a spurious bit in VLV_IIR every single time. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-4-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-
Ville Syrjälä authored
We're lacking VLV_MASTER_IER setup from valleyview_irq_preinstall(), so add it there. Also cargo cult in some POSTING_READ()s to match the other platforms. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-3-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-
Ville Syrjälä authored
Use GEN8_MASTER_IRQ_CONTROL instead of DE_MASTER_IRQ_CONTROL or MASTER_INTERRUPT_ENABLE with the GEN8_MASTER_IRQ register. They're all bit 31 so there's no actual bug here, but let's be consistent which name we use for the bit. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-2-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-
Ville Syrjälä authored
On CHV GTFIFODBG has some read-only bits to indicate the number of free FIFO entries. Ignore these when checking to see if any of the sticky error bits are set. This gets rid of these during device resume: [drm:cherryview_enable_rps] GT fifo had a previous error 1080000 While at it, move the assignments out of the if(). Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460570970-14073-1-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Peter Antoine authored
Allow for the MOCS to be programmed for all engines. Currently we program the MOCS when the first render batch goes through. This works on most platforms but fails on platforms that do not run a render batch early, i.e. headless servers. The patch now programs all initialised engines on init and the RCS is programmed again within the initial batch. This is done for predictable consistency with regards to the hardware context. Hardware context loading sets the values of the MOCS for RCS and L3CC. Programming them from within the batch makes sure that the render context is valid, no matter what the previous state of the saved-context was. v2: posted correct version to the mailing list. v3: moved programming to within engine->init_hw() (Chris Wilson) v4: code formatting and white-space changes. (Chris Wilson) Testcase: igt/gem_mocs_settings Signed-off-by: Peter Antoine <peter.antoine@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1460556205-6644-1-git-send-email-chris@chris-wilson.co.uk
-
Chris Wilson authored
Conceptually, each request is a record of a hardware transaction - we build up a list of pending commands and then either commit them to hardware, or cancel them. However, whilst building up the list of pending commands, we may modify state outside of the request and make references to the pending request. If we do so and then cancel that request, external objects then point to the deleted request leading to both graphical and memory corruption. The easiest example is to consider object/VMA tracking. When we mark an object as active in a request, we store a pointer to this, the most recent request, in the object. Then we want to free that object, we wait for the most recent request to be idle before proceeding (otherwise the hardware will write to pages now owned by the system, or we will attempt to read from those pages before the hardware is finished writing). If the request was cancelled instead, that wait completes immediately. As a result, all requests must be committed and not cancelled if the external state is unknown. All that remains of i915_gem_request_cancel() users are just a couple of extremely unlikely allocation failures, so remove the API entirely. A consequence of committing all incomplete requests is that we generate excess breadcrumbs and fill the ring much more often with dummy work. We have completely undone the outstanding_last_seqno optimisation. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93907Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: stable@vger.kernel.org Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-16-git-send-email-chris@chris-wilson.co.uk
-
Chris Wilson authored
After mi_set_context() succeeds, we need to update the state of the engine's last_context. This ensures that we hold a pin on the context whilst the hardware may write to it. However, since we didn't complete the post-switch setup of the context, we need to force the subsequent use of the same context to complete the setup (which means updating should_skip_switch()). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-15-git-send-email-chris@chris-wilson.co.uk
-
Chris Wilson authored
Having the !RCS legacy context switch threaded through the RCS switching code makes it much harder to follow and understand. In the next patch, I want to fix a bug handling the incomplete switch, this is made much simpler if we segregate the two paths now. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Daniel Vetter <daniel@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-14-git-send-email-chris@chris-wilson.co.uk
-
Chris Wilson authored
As paranoia, we want to ensure that the CPU's PTEs have been revoked for the object before we return from i915_gem_release_mmap(). This allows us to rely on there being no outstanding memory accesses from userspace and guarantees serialisation of the code against concurrent access just by calling i915_gem_release_mmap(). v2: Reduce the mb() into a wmb() following the revoke. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: "Goel, Akash" <akash.goel@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-13-git-send-email-chris@chris-wilson.co.uk
-
Chris Wilson authored
For reasons unknown Sandybridge GT1 (at least) will eventually hang when it encounters a ring wraparound at offset 0. The test case that reproduces the bug reliably forces a large number of interrupted context switches, thereby causing very frequent ring wraparounds, but there are similar bug reports in the wild with the same symptoms, seqno writes stop just before the wrap and the ringbuffer at address 0. It is also timing crucial, but adding various delays hasn't helped pinpoint where the window lies. Whether the fault is restricted to the ringbuffer itself or the GTT addressing is unclear, but moving the ringbuffer fixes all the hangs I have been able to reproduce. References: (e.g.) https://bugs.freedesktop.org/show_bug.cgi?id=93262 Testcase: igt/gem_exec_whisper/render-contexts-interruptible #snb-gt1 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-12-git-send-email-chris@chris-wilson.co.uk
-
Chris Wilson authored
Two concurrent writes into the same register cacheline has the chance of killing the machine on Ivybridge and other gen7. This includes LRI emitted from the command parser. The MI_SET_CONTEXT itself serves as serialising barrier and prevents the pair of register writes in the first packet from triggering the fault. However, if a second switch-context immediately occurs then we may have two adjacent blocks of LRI to the same registers which may then trigger the hang. To counteract this we need to insert a delay after the second register write using SRM. This is easiest to reproduce with something like igt/gem_ctx_switch/interruptible that triggers back-to-back context switches (with no operations in between them in the command stream, which requires the execbuf operation to be interrupted after the MI_SET_CONTEXT) but can be observed sporadically elsewhere when running interruptible igt. No reports from the wild though, so it must be of low enough frequency that no one has correlated the random machine freezes with i915.ko The issue was introduced with commit 2c550183 [v3.19] Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Dec 16 10:02:27 2014 +0000 drm/i915: Disable PSMI sleep messages on all rings around context switches Testcase: igt/gem_ctx_switch/render-interruptible #ivb Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: stable@vger.kernel.org Reviewed-by: Daniel Vetter <daniel@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-11-git-send-email-chris@chris-wilson.co.uk
-
Chris Wilson authored
If we do not have lowlevel support for reseting the GPU, or if the user has explicitly disabled reseting the device, the failure is expected. Since it is an expected failure, we should be using a lower priority message than *ERROR*, perhaps NOTICE. In the absence of DRM_NOTICE, just emit the expected failure as a DEBUG message. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-10-git-send-email-chris@chris-wilson.co.uk
-
Chris Wilson authored
Reporting -EIO from i915_wait_request() has proven very troublematic over the years, with numerous hard-to-reproduce bugs cropping up in the corner case of where a reset occurs and the code wasn't expecting such an error. If the we reset the GPU or have detected a hang and wish to reset the GPU, the request is forcibly complete and the wait broken. Currently, we report either -EAGAIN or -EIO in order for the caller to retreat and restart the wait (if appropriate) after dropping and then reacquiring the struct_mutex (essential to allow the GPU reset to proceed). However, if we take the view that the request is complete (no further work will be done on it by the GPU because it is dead and soon to be reset), then we can proceed with the task at hand and then drop the struct_mutex allowing the reset to occur. This transfers the burden of checking whether it is safe to proceed to the caller, which in all but one instance it is safe - completely eliminating the source of all spurious -EIO. Of note, we only have two API entry points where we expect that userspace can observe an EIO. First is when submitting an execbuf, if the GPU is terminally wedged, then the operation cannot succeed and an -EIO is reported. Secondly, existing userspace uses the throttle ioctl to detect an already wedged GPU before starting using HW acceleration (or to confirm that the GPU is wedged after an error condition). So if the GPU is wedged when the user calls throttle, also report -EIO. v2: Split more carefully the change to i915_wait_request() and assorted ABI from the reset handling. v3: Add a couple of WARN_ON(EIO) to the interruptible modesetting code so that we don't start to leak EIO there in future (and break our hang resistant modesetting). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-9-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-1-git-send-email-chris@chris-wilson.co.uk
-
Chris Wilson authored
Now that the reset_counter is stored on the request, we can rearrange the code to handle reading the counter versus waiting during the atomic modesetting for readibility (by deleting the hairiest of codes). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-8-git-send-email-chris@chris-wilson.co.uk
-
Chris Wilson authored
As the request is only valid during the same global reset epoch, we can record the current reset_counter when constructing the request and reuse it when waiting upon that request in future. This removes a very hairy atomic check serialised by the struct_mutex at the time of waiting and allows us to transfer those waits to a central dispatcher for all waiters and all requests. PS: With per-engine resets, we obviously cannot assume a global reset epoch for the requests - a per-engine epoch makes the most sense. The challenge then is how to handle checking in the waiter for when to break the wait, as the fine-grained reset may also want to requeue the request (i.e. the assumption that just because the epoch changes the request is completed may be broken - or we just avoid breaking that assumption with the fine-grained resets). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-7-git-send-email-chris@chris-wilson.co.uk
-
Chris Wilson authored
In the reset_counter, we use two bits to track a GPU hang and reset. The low bit is a "reset-in-progress" flag that we set to signal when we need to break waiters in order for the recovery task to grab the mutex. As soon as the recovery task has the mutex, we can clear that flag (which we do by incrementing the reset_counter thereby incrementing the gobal reset epoch). By clearing that flag when the recovery task holds the struct_mutex, we can forgo a second flag that simply tells GEM to ignore the "reset-in-progress" flag. The second flag we store in the reset_counter is whether the reset failed and we consider the GPU terminally wedged. Whilst this flag is set, all access to the GPU (at least through GEM rather than direct mmio access) is verboten. PS: Fun is in store, as in the future we want to move from a global reset epoch to a per-engine reset engine with request recovery. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-6-git-send-email-chris@chris-wilson.co.uk
-
Chris Wilson authored
If we, when we store the reset_counter for the operation, we ensure that it is not in a wedged or in the middle of a reset, we can then assert that if any reset occurs the reset_counter must change. Later we can just compare the operation's reset epoch against the current counter to see if we need to abort the operation (to handle the hang). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-5-git-send-email-chris@chris-wilson.co.uk
-
Chris Wilson authored
This is principally a little bit of syntatic sugar to hide the atomic_read()s throughout the code to retrieve the current reset_counter. It also provides the other utility functions to check the reset state on the already read reset_counter, so that (in later patches) we can read it once and do multiple tests rather than risk the value changing between tests. v2: Be more strict on converting existing i915_reset_in_progress() over to the more verbose i915_reset_in_progress_or_wedged(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-4-git-send-email-chris@chris-wilson.co.uk
-