Commits · d7d7c9ee699a0b85de0023433cdbd8f965e1ac08 · nexedi / linux

15 Apr, 2016 12 commits

drm/i915/bxt: Don't toggle power well 1 on-demand · d7d7c9ee

Imre Deak authored Apr 01, 2016

Power well 1 is managed by the DMC firmware so don't toggle it on-demand
from the driver. This means we need to follow the BSpec display
initialization sequence during driver loading and resuming (both system
and runtime) and enable power well 1 only once there. Afterwards DMC
will toggle power well 1 whenever entering/exiting DC5.

For this to work we also need to do away getting the PLL power domain,
since that just kept runtime PM disabled for good.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459515767-29228-12-git-send-email-imre.deak@intel.com

d7d7c9ee

drm/i915/bxt: Power down DDI PHYs separately during the per PHY uninit · d7d33fd8

Imre Deak authored Apr 01, 2016

The power-down step logically belongs to the individual PHY uninit
sequence so move it there. The only functional change is that we will
power down now PHY 1 separately before PHY 0 and preserve the other bits
in the register which are defined as reserved.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459515767-29228-11-git-send-email-imre.deak@intel.com

d7d33fd8

drm/i915/bxt: Pass drm_i915_private to DDI PHY, CDCLK helpers · c6c4696f

Imre Deak authored Apr 01, 2016

For internal APIs passing dev_priv is preferred to reduce indirections,
so convert over a few DDI PHY, CDCLK helpers.

No functional change.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: David Weinehall <david.weinehall@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459515767-29228-10-git-send-email-imre.deak@intel.com

c6c4696f

drm/i915/skl: Unexport skl_pw1_misc_io_init · 443a93ac

Imre Deak authored Apr 04, 2016

On Broxton we need to enable/disable power well 1 during the init/unit
display sequence similarly to Skylake/Kabylake. The code for this will
be added in a follow-up patch, but to prepare for that unexport
skl_pw1_misc_io_init(). It's a simple function called only from a single
place and having it inlined in the Skylake display core init/unit
functions will make it easier to compare it with its Broxton
counterpart.

This also flips the order of Misc IO and power well 1 disabling which
matches the enabling order. The specification doesn't prescribe the
disabling order, so this should be fine.

v2:
- Fix incorrect enable vs. disable power well call in
  skl_display_core_uninit() (Patrik)
- Add commit comment about chaning the order of PW1 and Misc IO power
  well disabling (Patrik)

CC: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459773777-10701-1-git-send-email-imre.deak@intel.com

443a93ac

drm/i915/bxt: Suspend power domains during suspend-to-idle · a7c8125f

Imre Deak authored Apr 01, 2016

On SKL/KBL suspend-to-idle (aka freeze/s0ix) is performed with DMC
firmware assistance where the target display power state is DC6. On
Broxton on the other hand we don't use the firmware for this, but rely
instead on a manual DC9 flow. For this we have to uninitialize the
display following the BSpec display uninit sequence, just as during
S3/S4, so make sure we follow this sequence.

CC: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459515767-29228-8-git-send-email-imre.deak@intel.com

a7c8125f

drm/i915/gen9: Fix DMC/DC state asserts · bfcdabe8

Imre Deak authored Apr 01, 2016

The display power well support and DC state management doesn't depend on
runtime PM support, so remove the incorrect asserts about this.

Also Broxton does support DC5, so the related assert in
assert_can_enable_dc5() is incorrect. There is a more generic and
correct assert for this already in gen9_set_dc_state(), so we can remove
all the other ones.

At the same time convert WARNs to WARN_ONCE for consistency with the
other DC state asserts.

CC: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459515767-29228-7-git-send-email-imre.deak@intel.com

bfcdabe8

drm/i915/gen9: Make power well disabling synchronous · 1d963afa

Imre Deak authored Apr 01, 2016

So far we only power well enabling was synchronous not disabling. Since
we don't exactly know how the firmware (both DMC and PCU) synchronizes
against the actual power well state during DC transitions, make the
disabling also synchronous.

CC: Mika Kuoppala <mika.kuoppala@linux.intel.com>
CC: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459515767-29228-6-git-send-email-imre.deak@intel.com

1d963afa

drm/i915/gen9: Reset secondary power well requests left on by DMC/KVMR · c6782b76

Imre Deak authored Apr 05, 2016

DMC forces on power well 1 and the misc IO power well by setting the
corresponding request bits both in the BIOS and the DEBUG power well
request registers. This is somewhat unexpected since the firmware should
really just save and restore state but not alter it. We also depend on
being able to disable power well 1, and the misc IO power well before
entering S3/S4 on BXT and SKL or entering DC9 on BXT. To fix this make
sure these request bits are cleared whenever we want to disable the
given power wells.

On SKL there is another twist where the firmware also clears the power
well 1 request bit in HSW_POWER_WELL_DRIVER (but not that of the misc IO
power well). This happens to not cause a problem due to the forced-on
request bits in the other request registers.

I've filed a bug about all this, but fixing that may take a while and
having this sanity check in place makes sense even for future firmware
versions.

At the same time also check the KVMR request bits. I haven't seen this
being altered, but we don't expect any request bits in here either, so
sanitize this register as well.

v2:
- Apply the workaround on SKL as well. I noticed the related failure
from the CI report, later Patrik also reported seeing it on his
machine.

c6782b76

drm/i915/bxt: Add a note about BXT_PORT_CL1CM_DW30 being read-only · 28ca6931

Imre Deak authored Apr 01, 2016

This register is read-only, so we have never actually set
OCL2_LDOFUSE_PWR_DIS in it as specified by the specification. Add a code
comment about this. I filed a specification update request to clarify
this there.

CC: Arthur J Runyan <arthur.j.runyan@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: David Weinehall <david.weinehall@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459515767-29228-4-git-send-email-imre.deak@intel.com

28ca6931

drm/i915/bxt: Fix GRC code register field definitions · d1e082ff

Imre Deak authored Apr 01, 2016

This has been corrected in BSpec quite some time ago, but we missed it
somehow. The wrong field definitions resulted in configuring PHY0 with
an incorrect GRC value.

v2:
- Remove the FIXME comment, we left in the code exactly about this
  issue. (Ville)

CC: Arthur J Runyan <arthur.j.runyan@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459515767-29228-3-git-send-email-imre.deak@intel.com

d1e082ff

drm/i915/bxt: Reject DMC firmware versions with known bugs · e7968531

Imre Deak authored Apr 01, 2016

DMC version 1.06 has a known bug, where the firmware polls forever for a
port PLL to lock, if the PLL was disabled when entering DC5, which locks
up the machine. Version 1.07 fixes this, so make that the minimum
required version on BXT.

CC: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459515767-29228-2-git-send-email-imre.deak@intel.com

e7968531

drm/i915: use drm_crtc_send_vblank_event() · 560ce1dc

Gustavo Padovan authored Apr 14, 2016

Replace the legacy drm_send_vblank_event() with the new helper function.
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460656118-16766-4-git-send-email-gustavo@padovan.org

560ce1dc

14 Apr, 2016 28 commits

drm/i915: Use fw_domains_put_with_fifo() on HSW · 3d7d0c85

Ville Syrjälä authored Apr 14, 2016

HSW still has the wake FIFO, so let's check it.

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Deepak S <deepak.s@linux.intel.com>
Fixes: 05a2fb15 ("drm/i915: Consolidate forcewake code")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460633942-24013-1-git-send-email-ville.syrjala@linux.intel.com
Cc: stable@vger.kernel.org
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

3d7d0c85

drm/i915: Split gen8_gt_irq_handler into ack+handle · e30e251a

Ville Syrjälä authored Apr 13, 2016

As we did on VLV, split the gt irq handling to ack and handler phases on
CHV. Leave the BDW+ codepath mostly intact for now.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-13-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

e30e251a

drm/i915: Eliminate passing dev+dev_priv to {snb,ilk}_gt_irq_handler() · 261e40b8

Ville Syrjälä authored Apr 13, 2016

It looks silly to pass both dev and dev_priv to the snb/ilk gt irq
handlers. Just pass dev_priv.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-12-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

261e40b8

drm/i915: Move gt/pm irq handling out from irq disabled section on VLV · 52894874

Ville Syrjälä authored Apr 13, 2016

No need to actually handle the GT/PM interrupt while we have interrupt
sources disabled. Move the actual processing to happen after we've
restored VLV_IER and master interrupt enable.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-11-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

52894874

drm/i915: Split VLV/CVH PIPESTAT handling into ack+handler · 2ecb8ca4

Ville Syrjälä authored Apr 13, 2016

Minimize the amount of stuff we do with interrupt sources disabled by
splitting the PIPESTAT irq handling into ack+handler phases.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-10-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

2ecb8ca4

drm/i915: Split PORT_HOTPLUG_STAT ack out from i9xx_hpd_irq_handler() · 1ae3c34c

Ville Syrjälä authored Apr 13, 2016

Split the VLV/CHV hoplug irq handling to ack and handler phases. This
way we can move the actual irq handling outside the section where
we have disabled the interrupt sources.

For now, we leave things as is for pre-VLV GMCH platforms, but
eventually they could get the same treatment.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-9-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

1ae3c34c

drm/i915: Move variables to narrower scope in VLV/CHV irq handlers · 6e814800

Ville Syrjälä authored Apr 13, 2016

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-8-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

6e814800

drm/i915: Eliminate loop from VLV irq handler · 1e1cace9

Ville Syrjälä authored Apr 13, 2016

Now that we've dealt with the races in clearing IIR bits via VLV_IER
and the master interrupt enable, we can go ahead aliminate the loop
from the VLV interrupt handler.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-7-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

1e1cace9

drm/i915: Clear VLV_IER around irq processing · a5e485a9

Ville Syrjälä authored Apr 13, 2016

On VLV/CHV the master interrupt enable bit only affects GT/PM
interrupts. Display interrupts are not affected by the master
irq control.

Also it seems that the CPU interrupt will only be generated when
the combined result of all GT/PM/display interrupts has a 0->1
edge. We already use the master interrupt enable bit to make sure
GT/PM interrupt can generate such an edge if we don't end up clearing
all IIR bits. We must do the same for display interrupts, and for
that we can simply clear out VLV_IER, and restore after we've acked
all the interrupts we are about to process.

So with both master interrupt enable and VLV_IER cleared out, we will
guarantee that there will be a 0->1 edge if any IIR bits are still set
at the end, and thus another CPU interrupt will be generated.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 579de73b ("drm/i915: Exit cherryview_irq_handler() after one pass")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-6-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

a5e485a9

drm/i915: Clear VLV_MASTER_IER around irq processing · 4a0a0202

Ville Syrjälä authored Apr 13, 2016

Like on CHV, let's clear out the master irq enable bit when we ack
GT/PM interrupts. This will allow GT/PM interrupts to re-raise the
CPU interrupt if we fail to clear all the bits from the IIR(s).
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-5-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

4a0a0202

drm/i915: Clear VLV_IIR after PIPESTAT · 7ce4d1f2

Ville Syrjälä authored Apr 13, 2016

On VLV/CHV VLV_IIR is not double double buffered, and it doesn't detect
edges from PIPESTAT & co. like it does on gen4. Instead it just
directly latches the level from PIPESTAT & co. That means we must clear
VLV_IIR after PIPESTAT & co. or else we'll get a spurious bit in VLV_IIR
every single time.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-4-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

7ce4d1f2

drm/i915: Set up VLV_MASTER_IER consistently · 34c7b8a7

Ville Syrjälä authored Apr 13, 2016

We're lacking VLV_MASTER_IER setup from valleyview_irq_preinstall(), so
add it there. Also cargo cult in some POSTING_READ()s to match the other
platforms.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-3-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

34c7b8a7

drm/i915: Use GEN8_MASTER_IRQ_CONTROL consistently · e5328c43

Ville Syrjälä authored Apr 13, 2016

Use GEN8_MASTER_IRQ_CONTROL instead of DE_MASTER_IRQ_CONTROL or
MASTER_INTERRUPT_ENABLE with the GEN8_MASTER_IRQ register. They're
all bit 31 so there's no actual bug here, but let's be consistent
which name we use for the bit.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-2-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

e5328c43

drm/i915: Ignore GTFIFODBG FIFO free entry fields on CHV · 297b32ec

Ville Syrjälä authored Apr 13, 2016

On CHV GTFIFODBG has some read-only bits to indicate the number
of free FIFO entries. Ignore these when checking to see if any
of the sticky error bits are set.

This gets rid of these during device resume:
[drm:cherryview_enable_rps] GT fifo had a previous error 1080000

While at it, move the assignments out of the if().

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460570970-14073-1-git-send-email-ville.syrjala@linux.intel.comReviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

297b32ec

drm/i915/mocs: Program MOCS for all engines on init · 0ccdacf6

Peter Antoine authored Apr 13, 2016

Allow for the MOCS to be programmed for all engines.
Currently we program the MOCS when the first render batch
goes through. This works on most platforms but fails on
platforms that do not run a render batch early,
i.e. headless servers. The patch now programs all initialised engines
on init and the RCS is programmed again within the initial batch. This
is done for predictable consistency with regards to the hardware
context.

Hardware context loading sets the values of the MOCS for RCS
and L3CC. Programming them from within the batch makes sure that
the render context is valid, no matter what the previous state of
the saved-context was.

v2: posted correct version to the mailing list.
v3: moved programming to within engine->init_hw() (Chris Wilson)
v4: code formatting and white-space changes. (Chris Wilson)

Testcase: igt/gem_mocs_settings
Signed-off-by: Peter Antoine <peter.antoine@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1460556205-6644-1-git-send-email-chris@chris-wilson.co.uk

0ccdacf6

drm/i915: Late request cancellations are harmful · aa9b7810

Chris Wilson authored Apr 13, 2016

Conceptually, each request is a record of a hardware transaction - we
build up a list of pending commands and then either commit them to
hardware, or cancel them. However, whilst building up the list of
pending commands, we may modify state outside of the request and make
references to the pending request. If we do so and then cancel that
request, external objects then point to the deleted request leading to
both graphical and memory corruption.

The easiest example is to consider object/VMA tracking. When we mark an
object as active in a request, we store a pointer to this, the most
recent request, in the object. Then we want to free that object, we wait
for the most recent request to be idle before proceeding (otherwise the
hardware will write to pages now owned by the system, or we will attempt
to read from those pages before the hardware is finished writing). If
the request was cancelled instead, that wait completes immediately. As a
result, all requests must be committed and not cancelled if the external
state is unknown.

All that remains of i915_gem_request_cancel() users are just a couple of
extremely unlikely allocation failures, so remove the API entirely.

A consequence of committing all incomplete requests is that we generate
excess breadcrumbs and fill the ring much more often with dummy work. We
have completely undone the outstanding_last_seqno optimisation.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93907Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-16-git-send-email-chris@chris-wilson.co.uk

aa9b7810

drm/i915: Reorganise legacy context switch to cope with late failure · fcb5106d

Chris Wilson authored Apr 13, 2016

After mi_set_context() succeeds, we need to update the state of the
engine's last_context. This ensures that we hold a pin on the context
whilst the hardware may write to it. However, since we didn't complete
the post-switch setup of the context, we need to force the subsequent
use of the same context to complete the setup (which means updating
should_skip_switch()).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-15-git-send-email-chris@chris-wilson.co.uk

fcb5106d

drm/i915: Split out !RCS legacy context switching · e1a8daa2

Chris Wilson authored Apr 13, 2016

Having the !RCS legacy context switch threaded through the RCS switching
code makes it much harder to follow and understand. In the next patch, I
want to fix a bug handling the incomplete switch, this is made much
simpler if we segregate the two paths now.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-14-git-send-email-chris@chris-wilson.co.uk

e1a8daa2

drm/i915: Move the mb() following release-mmap into release-mmap · 349f2ccf

Chris Wilson authored Apr 13, 2016

As paranoia, we want to ensure that the CPU's PTEs have been revoked for
the object before we return from i915_gem_release_mmap(). This allows us
to rely on there being no outstanding memory accesses from userspace
and guarantees serialisation of the code against concurrent access just
by calling i915_gem_release_mmap().

v2: Reduce the mb() into a wmb() following the revoke.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-13-git-send-email-chris@chris-wilson.co.uk

349f2ccf

drm/i915: Force ringbuffers to not be at offset 0 · a687a43a

Chris Wilson authored Apr 13, 2016

For reasons unknown Sandybridge GT1 (at least) will eventually hang when
it encounters a ring wraparound at offset 0. The test case that
reproduces the bug reliably forces a large number of interrupted context
switches, thereby causing very frequent ring wraparounds, but there are
similar bug reports in the wild with the same symptoms, seqno writes
stop just before the wrap and the ringbuffer at address 0. It is also
timing crucial, but adding various delays hasn't helped pinpoint where
the window lies.

Whether the fault is restricted to the ringbuffer itself or the GTT
addressing is unclear, but moving the ringbuffer fixes all the hangs I
have been able to reproduce.

References: (e.g.) https://bugs.freedesktop.org/show_bug.cgi?id=93262
Testcase: igt/gem_exec_whisper/render-contexts-interruptible #snb-gt1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-12-git-send-email-chris@chris-wilson.co.uk

a687a43a

drm/i915: Prevent machine death on Ivybridge context switching · e9135c4f

Chris Wilson authored Apr 13, 2016

Two concurrent writes into the same register cacheline has the chance of
killing the machine on Ivybridge and other gen7. This includes LRI
emitted from the command parser.  The MI_SET_CONTEXT itself serves as
serialising barrier and prevents the pair of register writes in the first
packet from triggering the fault.  However, if a second switch-context
immediately occurs then we may have two adjacent blocks of LRI to the
same registers which may then trigger the hang. To counteract this we
need to insert a delay after the second register write using SRM.

This is easiest to reproduce with something like
igt/gem_ctx_switch/interruptible that triggers back-to-back context
switches (with no operations in between them in the command stream,
which requires the execbuf operation to be interrupted after the
MI_SET_CONTEXT) but can be observed sporadically elsewhere when running
interruptible igt. No reports from the wild though, so it must be of low
enough frequency that no one has correlated the random machine freezes
with i915.ko

The issue was introduced with
commit 2c550183 [v3.19]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Dec 16 10:02:27 2014 +0000

    drm/i915: Disable PSMI sleep messages on all rings around context switches

Testcase: igt/gem_ctx_switch/render-interruptible #ivb
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-11-git-send-email-chris@chris-wilson.co.uk

e9135c4f

drm/i915: Suppress error message when GPU resets are disabled · 804e59a8

Chris Wilson authored Apr 13, 2016

If we do not have lowlevel support for reseting the GPU, or if the user
has explicitly disabled reseting the device, the failure is expected.
Since it is an expected failure, we should be using a lower priority
message than *ERROR*, perhaps NOTICE. In the absence of DRM_NOTICE, just
emit the expected failure as a DEBUG message.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-10-git-send-email-chris@chris-wilson.co.uk

804e59a8

drm/i915: Prevent leaking of -EIO from i915_wait_request() · f4457ae7

Chris Wilson authored Apr 13, 2016

Reporting -EIO from i915_wait_request() has proven very troublematic
over the years, with numerous hard-to-reproduce bugs cropping up in the
corner case of where a reset occurs and the code wasn't expecting such
an error.

If the we reset the GPU or have detected a hang and wish to reset the
GPU, the request is forcibly complete and the wait broken. Currently, we
report either -EAGAIN or -EIO in order for the caller to retreat and
restart the wait (if appropriate) after dropping and then reacquiring
the struct_mutex (essential to allow the GPU reset to proceed). However,
if we take the view that the request is complete (no further work will
be done on it by the GPU because it is dead and soon to be reset), then
we can proceed with the task at hand and then drop the struct_mutex
allowing the reset to occur. This transfers the burden of checking
whether it is safe to proceed to the caller, which in all but one
instance it is safe - completely eliminating the source of all spurious
-EIO.

Of note, we only have two API entry points where we expect that
userspace can observe an EIO. First is when submitting an execbuf, if
the GPU is terminally wedged, then the operation cannot succeed and an
-EIO is reported. Secondly, existing userspace uses the throttle ioctl
to detect an already wedged GPU before starting using HW acceleration
(or to confirm that the GPU is wedged after an error condition). So if
the GPU is wedged when the user calls throttle, also report -EIO.

v2: Split more carefully the change to i915_wait_request() and assorted
ABI from the reset handling.
v3: Add a couple of WARN_ON(EIO) to the interruptible modesetting code
so that we don't start to leak EIO there in future (and break our hang
resistant modesetting).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-9-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-1-git-send-email-chris@chris-wilson.co.uk

f4457ae7

drm/i915: Simplify reset_counter handling during atomic modesetting · f7e5838b

Chris Wilson authored Apr 13, 2016

Now that the reset_counter is stored on the request, we can rearrange
the code to handle reading the counter versus waiting during the atomic
modesetting for readibility (by deleting the hairiest of codes).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-8-git-send-email-chris@chris-wilson.co.uk

f7e5838b

drm/i915: Store the reset counter when constructing a request · 299259a3

Chris Wilson authored Apr 13, 2016

As the request is only valid during the same global reset epoch, we can
record the current reset_counter when constructing the request and reuse
it when waiting upon that request in future. This removes a very hairy
atomic check serialised by the struct_mutex at the time of waiting and
allows us to transfer those waits to a central dispatcher for all
waiters and all requests.

PS: With per-engine resets, we obviously cannot assume a global reset
epoch for the requests - a per-engine epoch makes the most sense. The
challenge then is how to handle checking in the waiter for when to break
the wait, as the fine-grained reset may also want to requeue the
request (i.e. the assumption that just because the epoch changes the
request is completed may be broken - or we just avoid breaking that
assumption with the fine-grained resets).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-7-git-send-email-chris@chris-wilson.co.uk

299259a3

drm/i915: Tighten reset_counter for reset status · d98c52cf

Chris Wilson authored Apr 13, 2016

In the reset_counter, we use two bits to track a GPU hang and reset. The
low bit is a "reset-in-progress" flag that we set to signal when we need
to break waiters in order for the recovery task to grab the mutex. As
soon as the recovery task has the mutex, we can clear that flag (which
we do by incrementing the reset_counter thereby incrementing the gobal
reset epoch). By clearing that flag when the recovery task holds the
struct_mutex, we can forgo a second flag that simply tells GEM to ignore
the "reset-in-progress" flag.

The second flag we store in the reset_counter is whether the
reset failed and we consider the GPU terminally wedged. Whilst this flag
is set, all access to the GPU (at least through GEM rather than direct mmio
access) is verboten.

PS: Fun is in store, as in the future we want to move from a global
reset epoch to a per-engine reset engine with request recovery.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-6-git-send-email-chris@chris-wilson.co.uk

d98c52cf

drm/i915: Simplify checking of GPU reset_counter in display pageflips · 7f1847eb

Chris Wilson authored Apr 13, 2016

If we, when we store the reset_counter for the operation, we ensure that
it is not in a wedged or in the middle of a reset, we can then assert that
if any reset occurs the reset_counter must change. Later we can just
compare the operation's reset epoch against the current counter to see
if we need to abort the operation (to handle the hang).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-5-git-send-email-chris@chris-wilson.co.uk

7f1847eb

drm/i915: Hide the atomic_read(reset_counter) behind a helper · c19ae989

Chris Wilson authored Apr 13, 2016

This is principally a little bit of syntatic sugar to hide the
atomic_read()s throughout the code to retrieve the current reset_counter.
It also provides the other utility functions to check the reset state on the
already read reset_counter, so that (in later patches) we can read it once
and do multiple tests rather than risk the value changing between tests.

v2: Be more strict on converting existing i915_reset_in_progress() over to
the more verbose i915_reset_in_progress_or_wedged().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-4-git-send-email-chris@chris-wilson.co.uk

c19ae989