Commits · ad18ba7b5eebf58209a898de8519f6bb2280620b · nexedi / linux

12 Feb, 2020 9 commits

drm/i915/execlists: Offline error capture · ad18ba7b

Chris Wilson authored Jan 16, 2020

Currently, we skip error capture upon forced preemption. We apply forced
preemption when there is a higher priority request that should be
running but is being blocked, and we skip inline error capture so that
the preemption request is not further delayed by a user controlled
capture -- extending the denial of service.

However, preemption reset is also used for heartbeats and regular GPU
hangs. By skipping the error capture, we remove the ability to debug GPU
hangs.

In order to capture the error without delaying the preemption request
further, we can do an out-of-line capture by removing the guilty request
from the execution queue and scheduling a worker to dump that request.
When removing a request, we need to remove the entire context and all
descendants from the execution queue, so that they do not jump past.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/738
Fixes: 3a7a92ab ("drm/i915/execlists: Force preemption")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-3-chris@chris-wilson.co.uk
(cherry picked from commit 74831738)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

ad18ba7b

drm/i915/gt: Allow temporary suspension of inflight requests · c3f1ed90

Chris Wilson authored Jan 16, 2020

In order to support out-of-line error capture, we need to remove the
active request from HW and put it to one side while a worker compresses
and stores all the details associated with that request. (As that
compression may take an arbitrary user-controlled amount of time, we
want to let the engine continue running on other workloads while the
hanging request is dumped.) Not only do we need to remove the active
request, but we also have to remove its context and all requests that
were dependent on it (both in flight, queued and future submission).

Finally once the capture is complete, we need to be able to resubmit the
request and its dependents and allow them to execute.

v2: Replace stack recursion with a simple list.
v3: Check all the parents, not just the first, when searching for a
stuck ancestor!

References: https://gitlab.freedesktop.org/drm/intel/issues/738Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-2-chris@chris-wilson.co.uk
(cherry picked from commit 32ff621f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

c3f1ed90

drm/i915: Keep track of request among the scheduling lists · 9e2750fc

Chris Wilson authored Jan 16, 2020

If we keep track of when the i915_request.sched.link is on the HW
runlist, or in the priority queue we can simplify our interactions with
the request (such as during rescheduling). This also simplifies the next
patch where we introduce a new in-between list, for requests that are
ready but neither on the run list or in the queue.

v2: Update i915_sched_node.link explanation for current usage where it
is a link on both the queue and on the runlists.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-1-chris@chris-wilson.co.uk
(cherry picked from commit 672c368f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

9e2750fc

Merge tag 'gvt-fixes-2020-02-12' of https://github.com/intel/gvt-linux into drm-intel-next-fixes · cc3251d8

Jani Nikula authored Feb 12, 2020

gvt-fixes-2020-02-12

- fix possible high-order allocation fail for late load (Igor)
- fix one missed lock for ppgtt mm LRU list (Igor)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
From: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200212065912.GB4997@zhen-hp.sh.intel.com

cc3251d8

drm/i915/gem: Tighten checks and acquiring the mmap object · 2933803b

Chris Wilson authored Jan 30, 2020

Make sure we hold the rcu lock as we acquire the rcu protected reference
of the object when looking it up from the associated mmap vma.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1083
Fixes: cc662126 ("drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200130143931.1906301-1-chris@chris-wilson.co.uk
(cherry picked from commit 280d14a6)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

2933803b

drm/i915: Fix preallocated barrier list append · 52144db1

José Roberto de Souza authored Jan 29, 2020

Only the first and the last nodes were being added to
ref->preallocated_barriers.

Renaming variables to make it more easy to read.

Fixes: 84135022 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200129232345.84512-1-jose.souza@intel.com
(cherry picked from commit d4c3c0b8)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

52144db1

drm/i915/gt: Acquire ce->active before ce->pin_count/ce->pin_mutex · 5b92415e

Chris Wilson authored Jan 27, 2020

Similar to commit ac0e331a ("drm/i915: Tighten atomicity of
i915_active_acquire vs i915_active_release") we have the same race of
trying to pin the context underneath a mutex while allowing the
decrement to be atomic outside of that mutex. This leads to the problem
where two threads may simultaneously try to pin the context and the
second not notice that they needed to repin the context.

<2> [198.669621] kernel BUG at drivers/gpu/drm/i915/gt/intel_timeline.c:387!
<4> [198.669703] invalid opcode: 0000 [#1] PREEMPT SMP PTI
<4> [198.669712] CPU: 0 PID: 1246 Comm: gem_exec_create Tainted: G     U  W         5.5.0-rc6-CI-CI_DRM_7755+ #1
<4> [198.669723] Hardware name:  /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017
<4> [198.669776] RIP: 0010:timeline_advance+0x7b/0xe0 [i915]
<4> [198.669785] Code: 00 48 c7 c2 10 f1 46 a0 48 c7 c7 70 1b 32 a0 e8 bb dd e7 e0 bf 01 00 00 00 e8 d1 af e7 e0 31 f6 bf 09 00 00 00 e8 35 ef d8 e0 <0f> 0b 48 c7 c1 48 fa 49 a0 ba 84 01 00 00 48 c7 c6 10 f1 46 a0 48
<4> [198.669803] RSP: 0018:ffffc900004c3a38 EFLAGS: 00010296
<4> [198.669810] RAX: ffff888270b35140 RBX: ffff88826f32ee00 RCX: 0000000000000006
<4> [198.669818] RDX: 00000000000017c5 RSI: 0000000000000000 RDI: 0000000000000009
<4> [198.669826] RBP: ffffc900004c3a64 R08: 0000000000000000 R09: 0000000000000000
<4> [198.669834] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88826f9b5980
<4> [198.669841] R13: 0000000000000cc0 R14: ffffc900004c3dc0 R15: ffff888253610068
<4> [198.669849] FS:  00007f63e663fe40(0000) GS:ffff888276c00000(0000) knlGS:0000000000000000
<4> [198.669857] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [198.669864] CR2: 00007f171f8e39a8 CR3: 000000026b1f6005 CR4: 00000000003606f0
<4> [198.669872] Call Trace:
<4> [198.669924]  intel_timeline_get_seqno+0x12/0x40 [i915]
<4> [198.669977]  __i915_request_create+0x76/0x5a0 [i915]
<4> [198.670024]  i915_request_create+0x86/0x1c0 [i915]
<4> [198.670068]  i915_gem_do_execbuffer+0xbf2/0x2500 [i915]
<4> [198.670082]  ? __lock_acquire+0x460/0x15d0
<4> [198.670128]  i915_gem_execbuffer2_ioctl+0x11f/0x470 [i915]
<4> [198.670171]  ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915]
<4> [198.670181]  drm_ioctl_kernel+0xa7/0xf0
<4> [198.670188]  drm_ioctl+0x2e1/0x390
<4> [198.670233]  ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915]

Fixes: 84135022 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin")
References: ac0e331a ("drm/i915: Tighten atomicity of i915_active_acquire vs i915_active_release")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200127152829.2842149-1-chris@chris-wilson.co.uk
(cherry picked from commit e5429340)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

5b92415e

drm/i915: Tighten atomicity of i915_active_acquire vs i915_active_release · 7c34bb03

Chris Wilson authored Jan 26, 2020

As we use a mutex to serialise the first acquire (as it may be a lengthy
operation), but only an atomic decrement for the release, we have to
be careful in case a second thread races and completes both
acquire/release as the first finishes its acquire.

Thread A			Thread B
i915_active_acquire		i915_active_acquire
  atomic_read() == 0		  atomic_read() == 0
  mutex_lock()			  mutex_lock()
				  atomic_read() == 0
				    ref->active();
				  atomic_inc()
				  mutex_unlock()
  atomic_read() == 1
				i915_active_release
				  atomic_dec_and_test() -> 0
				    ref->retire()
  atomic_inc() -> 1
  mutex_unlock()

So thread A has acquired the ref->active_count but since the ref was
still active at the time, it did not initialise it. By switching the
check inside the mutex to an atomic increment only if already active, we
close the race.

Fixes: c9ad602f ("drm/i915: Split i915_active.mutex into an irq-safe spinlock for the rbtree")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200126102346.1877661-3-chris@chris-wilson.co.uk
(cherry picked from commit ac0e331a)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

7c34bb03

drm/i915: Stub out i915_gpu_coredump_put · 9556e5c7

Chris Wilson authored Jan 24, 2020

i915_gpu_coreddump_put is currently only defined if
CONFIG_DRM_I915_CAPTURE_ERROR is enabled, provide a stub otherwise.
Reported-by: Mike Lothian <mike@fireburn.co.uk>
Fixes: 742379c0 ("drm/i915: Start chopping up the GPU error capture")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mike Lothian <mike@fireburn.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200124192255.541355-1-chris@chris-wilson.co.uk
(cherry picked from commit 7e36505d)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

9556e5c7

11 Feb, 2020 5 commits

drm/i915: Check activity on i915_vma after confirming pin_count==0 · e4edd4fc

Chris Wilson authored Jan 23, 2020

Only assert that the i915_vma is now idle if and only if no other pins
are present. If another user has the i915_vma pinned, they may submit
more work to the i915_vma skipping the vm->mutex used to serialise the
unbind. We need to wait again, if we want to continue and unbind this
vma.

However, if we own the i915_vma (we hold the vm->mutex for the unbind
and the pin_count is 0), we can assert that the vma remains idle as we
unbind.

Fixes: 2850748e ("drm/i915: Pull i915_vma_pin under the vm->mutex")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/530Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200123224459.38128-1-chris@chris-wilson.co.uk
(cherry picked from commit 60e94557)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

e4edd4fc

drm/i915/gem: Detect overflow in calculating dumb buffer size · 051c89cf

Chris Wilson authored Jan 23, 2020

To multiply 2 u32 numbers to generate a u64 in C requires a bit of
forewarning for the compiler.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200123125934.1401755-1-chris@chris-wilson.co.uk
(cherry picked from commit 0f8f8a64)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

051c89cf

drm/i915: Don't show the blank process name for internal/simulated errors · 1a9629d1

Chris Wilson authored Jan 21, 2020

For a simulated preemption reset, we don't populate the request and so
do not fill in the guilty context name.

[   79.991294] i915 0000:00:02.0: GPU HANG: ecode 9:1:e757fefe, in  [0]

Just don't mention the empty string in the logs!

Fixes: 742379c0 ("drm/i915: Start chopping up the GPU error capture")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200121132107.267709-1-chris@chris-wilson.co.uk
(cherry picked from commit 29baf3ae)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

1a9629d1

drm/i915/gem: Store mmap_offsets in an rbtree rather than a plain list · 07ccd6bd

Chris Wilson authored Jan 20, 2020

Currently we create a new mmap_offset for every call to
mmap_offset_ioctl. This exposes ourselves to an abusive client that may
simply create new mmap_offsets ad infinitum, which will exhaust physical
memory and the virtual address space. In addition to the exhaustion, a
very long linear list of mmap_offsets causes other clients using the
object to incur long list walks -- these long lists can also be
generated by simply having many clients generate their own mmap_offset.

However, we can simply use the drm_vma_node itself to manage the file
association (allow/revoke) dropping our need to keep an mmo per-file.
Then if we keep a small rbtree of per-type mmap_offsets, we can lookup
duplicate requests quickly.

Fixes: cc662126 ("drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200120104924.4000706-3-chris@chris-wilson.co.uk
(cherry picked from commit 78655598)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

07ccd6bd

drm/i915/execlists: Leave resetting ring to intel_ring · a754012b

Chris Wilson authored Jan 15, 2020

We need to allow concurrent intel_context_unpin, which means avoiding
doing destructive operations like intel_ring_reset(). This was already
fixed for intel_ring_unpin() in commit 0725d9a3 ("drm/i915/gt: Make
intel_ring_unpin() safe for concurrent pint"), but I overlooked that
execlists_context_unpin() also made the same mistake.
Reported-by: Matthew Brost <matthew.brost@intel.com>
Fixes: 84135022 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin")
References: 0725d9a3 ("drm/i915/gt: Make intel_ring_unpin() safe for concurrent pint")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200115175829.2761329-1-chris@chris-wilson.co.uk
(cherry picked from commit f3c0efc9)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

a754012b

10 Feb, 2020 7 commits

drm/i915/gt: Use the BIT when checking the flags, not the index · 1b5af537

Chris Wilson authored Jan 15, 2020

In converting over to using set_bit()/test_bit(), when manually
inspecting the rq->fence.flags, we need to use BIT().

Fixes: e1c31fb5 ("drm/i915: Merge i915_request.flags with i915_request.fence.flags")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200115122509.2673075-1-chris@chris-wilson.co.uk
(cherry picked from commit 72ff2b8d)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

1b5af537

drm/i915/selftests: Add a mock i915_vma to the mock_ring · 1fdea0cb

Chris Wilson authored Jan 14, 2020

Add a i915_vma to the mock_engine/mock_ring so that the core code can
always assume the presence of ring->vma.

Fixes: 8ccfc20a ("drm/i915/gt: Mark ring->vma as active while pinned")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200114160030.2468927-1-chris@chris-wilson.co.uk
(cherry picked from commit b63b4fea)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

1fdea0cb

drm/i915: Make a copy of the ggtt view for slave plane · c631cc8f

Ville Syrjälä authored Jan 10, 2020

intel_prepare_plane_fb() will always pin plane_state->hw.fb whenever
it is present. We copy that from the master plane to the slave plane,
but we fail to copy the corresponding ggtt view. Thus when it comes time
to pin the slave plane's fb we use some stale ggtt view left over from
the last time the plane was used as a non-slave plane. If that previous
use involved 90/270 degree rotation or remapping we'll try to shuffle
the pages of the new fb around accordingingly. However the new
fb may be backed by a bo with less pages than what the ggtt view
rotation/remapped info requires, and so we we trip a GEM_BUG().

Steps to reproduce on icl:
1. plane 1: whatever
   plane 6: largish !NV12 fb + 90 degree rotation
2. plane 1: smallish NV12 fb
   plane 6: make invisible so it gets slaved to plane 1
3. GEM_BUG()

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/issues/951
Fixes: 1f594b20 ("drm/i915: Remove special case slave handling during hw programming, v3.")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200110183228.8199-1-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit 103605e0)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

c631cc8f

drm/i915/gem: Take local vma references for the parser · 01c1b2cb

Chris Wilson authored Jan 13, 2020

Take and hold a reference to each of the vma (and their objects) as we
process them with the cmdparser. This stops them being freed during the
work if the GEM execbuf is interrupted and the request we expected to
keep the objects alive is incomplete.

Fixes: 686c7c35 ("drm/i915/gem: Asynchronous cmdparser")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/970Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200113154555.1909639-1-chris@chris-wilson.co.uk
(cherry picked from commit 36c8e356)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

01c1b2cb

drm/i915/pmu: Correct the rc6 offset upon enabling · 88a9c66d

Chris Wilson authored Jan 14, 2020

The rc6 residency starts ticking from 0 from BIOS POST, but the kernel
starts measuring the time from its boot. If we start measuruing
I915_PMU_RC6_RESIDENCY while the GT is idle, we start our sampling from
0 and then upon first activity (park/unpark) add in all the rc6
residency since boot. After the first park with the sampler engaged, the
sleep/active counters are aligned.

v2: With a wakeref to be sure

Closes: https://gitlab.freedesktop.org/drm/intel/issues/973
Fixes: df6a4205 ("drm/i915/pmu: Ensure monotonic rc6")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200114105648.2172026-1-chris@chris-wilson.co.uk
(cherry picked from commit f4e9894b)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

88a9c66d

drm/i915/gvt: more locking for ppgtt mm LRU list · 0e9d7bb2

Igor Druzhinin authored Feb 03, 2020

When the lock was introduced in commit 72aabfb8 ("drm/i915/gvt: Add mutual
lock for ppgtt mm LRU list") one place got lost.

Fixes: 72aabfb8 ("drm/i915/gvt: Add mutual lock for ppgtt mm LRU list")
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1580742421-25194-1-git-send-email-igor.druzhinin@citrix.com

0e9d7bb2

drm/i915/gvt: fix high-order allocation failure on late load · c216f12b

Igor Druzhinin authored Jan 22, 2020

If the module happens to be loaded later at runtime there is a chance
memory is already fragmented enough to fail allocation of firmware
blob storage and consequently GVT init. Since it doesn't seem to be
necessary to have the blob contiguous, use vmalloc() instead to avoid
the issue.
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1579723824-25711-1-git-send-email-igor.druzhinin@citrix.com

c216f12b

09 Feb, 2020 5 commits

drm/i915: Fix i915_error_state_store error defination · c2cebbc4

Zhang Xiaoxu authored Jan 17, 2020

Since commit 742379c0 ("drm/i915: Start chopping up the GPU error
capture"), function 'i915_error_state_store' was defined and used with
only one parameter.

But if no 'CONFIG_DRM_I915_CAPTURE_ERROR', this function was defined
with two parameter.

This may lead compile error. This patch fix it.

Fixes: 742379c0 ("drm/i915: Start chopping up the GPU error capture")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200117073436.6507-1-zhangxiaoxu5@huawei.com
(cherry picked from commit 04062c58)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

c2cebbc4

drm/i915/bios: Fix the timing parameters · e73c1486

Vandita Kulkarni authored Jan 24, 2020

Fix htotal and vtotal parameters derived from DTD block of VBT. The
values miss the back porch.

Fixes: 33ef6d4f ("drm/i915/vbt: Handle generic DTD block")
Signed-off-by: Vandita Kulkarni <vandita.kulkarni@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200124125829.16973-1-vandita.kulkarni@intel.com
(cherry picked from commit ad278f35)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

e73c1486

drm/i915/dsi: Ensure that the ACPI adapter lookup overrides the bus num · 1788fdf1

Vivek Kasireddy authored Jan 17, 2020

Remove the i2c_bus_num >= 0 check from the adapter lookup function
as this would prevent ACPI bus number override. This check was mainly
there to return early if the bus number has already been found but we
anyway return in the next line if the slave address does not match.

Fixes: 8cbf89db ("drm/i915/dsi: Parse the I2C element from the VBT MIPI sequence block (v3)")
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Nabendu Maiti <nabendu.bikash.maiti@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Bob Paauwe <bob.j.paauwe@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200118005848.20382-1-vivek.kasireddy@intel.com
(cherry picked from commit de409661)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

1788fdf1

drm/i915: Fix post-fastset modeset check for port sync · 0887aa87

Ville Syrjälä authored Jan 15, 2020

The post-fastset "does anyone still need a full modeset?" for
port sync looks busted. The outer loop bails out of a full modeset
is still needed by the current crtc, and then we skip forcing
a full modeset on the related crtcs. That's totally the opposite
of what we want.

The MST path has the logic mostly the other way around so it
looks correct. To fix the port sync case let's follow the MST
logic for both. So, if the current crtc already needs a modeset
we do nothing. otherwise we check if any of the related crtcs
needs a modeset, and if so we force a full modeset for the
current crtc.

And while at let's change the else if to a plain if to so
we don't have needless coupling between the MST and port sync
checks.

Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Fixes: 05a8e451 ("drm/i915/display: Use external dependency loop for port sync")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200115190813.17971-1-ville.syrjala@linux.intel.comReviewed-by: José Roberto de Souza <jose.souza@intel.com>
(cherry picked from commit d0eed154)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

0887aa87

drm/i915/dsi: Lookup the i2c bus from ACPI NS only if CONFIG_ACPI=y (v2) · 6f4261fa

Vivek Kasireddy authored Jan 14, 2020

Perform the i2c bus/adapter lookup from ACPI Namespace only if ACPI is
enabled in the kernel config. If ACPI is not enabled or if the lookup
fails, we'll fallback to using the VBT for identifying the i2c bus.

v2: Add fixes tag (Jani)

Fixes: 8cbf89db ("drm/i915/dsi: Parse the I2C element from the VBT MIPI sequence block (v3)")
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Nabendu Maiti <nabendu.bikash.maiti@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Bob Paauwe <bob.j.paauwe@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200115012305.27395-1-vivek.kasireddy@intel.com
(cherry picked from commit 960287ca)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

6f4261fa

07 Feb, 2020 3 commits

Merge tag 'amd-drm-next-5.6-2020-02-05' of git://people.freedesktop.org/~agd5f/linux into drm-next · 9f880327

Dave Airlie authored Feb 07, 2020

amd-drm-next-5.6-2020-02-05:

amdgpu:
- EDC fixes for Arcturus
- GDDR6 memory training fixe
- Fix for reading gfx clockgating registers while in GFXOFF state
- i2c freq fixes
- Misc display fixes
- TLB invalidation fix when using semaphores
- VCN 2.5 instancing fixes
- Switch raven1 gfxoff to a blacklist
- Coreboot workaround for KV/KB
- Root cause dongle fixes for display and revert workaround
- Enable GPU reset for renoir and navi
- Navi overclocking fixes
- Fix up confusing warnings in display clock validation on raven

amdkfd:
- SDMA fix

radeon:
- Misc LUT fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200206035458.3894-1-alexander.deucher@amd.com

9f880327

Merge branch 'linux-5.6' of git://github.com/skeggsb/linux into drm-next · a345cc0d

Dave Airlie authored Feb 07, 2020

Just a couple of fixes to Volta/Turing modesetting on some systems.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Ben Skeggs <skeggsb@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ <CACAvsv7=eP+Ai1ouoMyYyo1xMF0pTQki=owYjJkS=NpvKQd1fg@mail.gmail.com

a345cc0d

Merge tag 'drm/tegra/for-5.6-rc1-fixes' of git://anongit.freedesktop.org/tegra/linux into drm-next · e139e8ae

Dave Airlie authored Feb 07, 2020

drm/tegra: Fixes for v5.6-rc1

These are a couple of quick fixes for regressions that were found during
the first two weeks of the merge window.
Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thierry Reding <thierry.reding@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200206172753.2185390-1-thierry.reding@gmail.com

e139e8ae

06 Feb, 2020 3 commits

gpu: host1x: Set DMA direction only for DMA-mapped buffer objects · 98ae41ad

Thierry Reding authored Feb 04, 2020

The DMA direction is only used by the DMA API, so there is no use in
setting it when a buffer object isn't mapped with the DMA API.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>

98ae41ad

drm/tegra: Reuse IOVA mapping where possible · 273da5a0

Thierry Reding authored Feb 04, 2020

This partially reverts the DMA API support that was recently merged
because it was causing performance regressions on older Tegra devices.
Unfortunately, the cache maintenance performed by dma_map_sg() and
dma_unmap_sg() causes performance to drop by a factor of 10.

The right solution for this would be to cache mappings for buffers per
consumer device, but that's a bit involved. Instead, we simply revert to
the old behaviour of sharing IOVA mappings when we know that devices can
do so (i.e. they share the same IOMMU domain).

Cc: <stable@vger.kernel.org> # v5.5
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>

273da5a0

drm/tegra: Relax IOMMU usage criteria on old Tegra · 2d9384ff

Thierry Reding authored Feb 04, 2020

Older Tegra devices only allow addressing 32 bits of memory, so whether
or not the host1x is attached to an IOMMU doesn't matter. host1x IOMMU
attachment is only needed on devices that can address memory beyond the
32-bit boundary and where the host1x doesn't support the wide GATHER
opcode that allows it to access buffers at higher addresses.

Cc: <stable@vger.kernel.org> # v5.5
Signed-off-by: Thierry Reding <treding@nvidia.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>

2d9384ff

05 Feb, 2020 2 commits

drm/amd/dm/mst: Ignore payload update failures · 58fe03d6

Lyude Paul authored Jan 24, 2020

Disabling a display on MST can potentially happen after the entire MST
topology has been removed, which means that we can't communicate with
the topology at all in this scenario. Likewise, this also means that we
can't properly update payloads on the topology and as such, it's a good
idea to ignore payload update failures when disabling displays.
Currently, amdgpu makes the mistake of halting the payload update
process when any payload update failures occur, resulting in leaving
DC's local copies of the payload tables out of date.

This ends up causing problems with hotplugging MST topologies, and
causes modesets on the second hotplug to fail like so:

[drm] Failed to updateMST allocation table forpipe idx:1
------------[ cut here ]------------
WARNING: CPU: 5 PID: 1511 at
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link.c:2677
update_mst_stream_alloc_table+0x11e/0x130 [amdgpu]
Modules linked in: cdc_ether usbnet fuse xt_conntrack nf_conntrack
nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4
nft_counter nft_compat nf_tables nfnetlink tun bridge stp llc sunrpc
vfat fat wmi_bmof uvcvideo snd_hda_codec_realtek snd_hda_codec_generic
snd_hda_codec_hdmi videobuf2_vmalloc snd_hda_intel videobuf2_memops
videobuf2_v4l2 snd_intel_dspcfg videobuf2_common crct10dif_pclmul
snd_hda_codec videodev crc32_pclmul snd_hwdep snd_hda_core
ghash_clmulni_intel snd_seq mc joydev pcspkr snd_seq_device snd_pcm
sp5100_tco k10temp i2c_piix4 snd_timer thinkpad_acpi ledtrig_audio snd
wmi soundcore video i2c_scmi acpi_cpufreq ip_tables amdgpu(O)
rtsx_pci_sdmmc amd_iommu_v2 gpu_sched mmc_core i2c_algo_bit ttm
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm
crc32c_intel serio_raw hid_multitouch r8152 mii nvme r8169 nvme_core
rtsx_pci pinctrl_amd
CPU: 5 PID: 1511 Comm: gnome-shell Tainted: G           O      5.5.0-rc7Lyude-Test+ #4
Hardware name: LENOVO FA495SIT26/FA495SIT26, BIOS R12ET22W(0.22 ) 01/31/2019
RIP: 0010:update_mst_stream_alloc_table+0x11e/0x130 [amdgpu]
Code: 28 00 00 00 75 2b 48 8d 65 e0 5b 41 5c 41 5d 41 5e 5d c3 0f b6 06
49 89 1c 24 41 88 44 24 08 0f b6 46 01 41 88 44 24 09 eb 93 <0f> 0b e9
2f ff ff ff e8 a6 82 a3 c2 66 0f 1f 44 00 00 0f 1f 44 00
RSP: 0018:ffffac428127f5b0 EFLAGS: 00010202
RAX: 0000000000000002 RBX: ffff8d1e166eee80 RCX: 0000000000000000
RDX: ffffac428127f668 RSI: ffff8d1e166eee80 RDI: ffffac428127f610
RBP: ffffac428127f640 R08: ffffffffc03d94a8 R09: 0000000000000000
R10: ffff8d1e24b02000 R11: ffffac428127f5b0 R12: ffff8d1e1b83d000
R13: ffff8d1e1bea0b08 R14: 0000000000000002 R15: 0000000000000002
FS:  00007fab23ffcd80(0000) GS:ffff8d1e28b40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f151f1711e8 CR3: 00000005997c0000 CR4: 00000000003406e0
Call Trace:
 ? mutex_lock+0xe/0x30
 dc_link_allocate_mst_payload+0x9a/0x210 [amdgpu]
 ? dm_read_reg_func+0x39/0xb0 [amdgpu]
 ? core_link_enable_stream+0x656/0x730 [amdgpu]
 core_link_enable_stream+0x656/0x730 [amdgpu]
 dce110_apply_ctx_to_hw+0x58e/0x5d0 [amdgpu]
 ? dcn10_verify_allow_pstate_change_high+0x1d/0x280 [amdgpu]
 ? dcn10_wait_for_mpcc_disconnect+0x3c/0x130 [amdgpu]
 dc_commit_state+0x292/0x770 [amdgpu]
 ? add_timer+0x101/0x1f0
 ? ttm_bo_put+0x1a1/0x2f0 [ttm]
 amdgpu_dm_atomic_commit_tail+0xb59/0x1ff0 [amdgpu]
 ? amdgpu_move_blit.constprop.0+0xb8/0x1f0 [amdgpu]
 ? amdgpu_bo_move+0x16d/0x2b0 [amdgpu]
 ? ttm_bo_handle_move_mem+0x118/0x570 [ttm]
 ? ttm_bo_validate+0x134/0x150 [ttm]
 ? dm_plane_helper_prepare_fb+0x1b9/0x2a0 [amdgpu]
 ? _cond_resched+0x15/0x30
 ? wait_for_completion_timeout+0x38/0x160
 ? _cond_resched+0x15/0x30
 ? wait_for_completion_interruptible+0x33/0x190
 commit_tail+0x94/0x130 [drm_kms_helper]
 drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
 drm_atomic_helper_set_config+0x70/0xb0 [drm_kms_helper]
 drm_mode_setcrtc+0x194/0x6a0 [drm]
 ? _cond_resched+0x15/0x30
 ? mutex_lock+0xe/0x30
 ? drm_mode_getcrtc+0x180/0x180 [drm]
 drm_ioctl_kernel+0xaa/0xf0 [drm]
 drm_ioctl+0x208/0x390 [drm]
 ? drm_mode_getcrtc+0x180/0x180 [drm]
 amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
 do_vfs_ioctl+0x458/0x6d0
 ksys_ioctl+0x5e/0x90
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x55/0x1b0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fab2121f87b
Code: 0f 1e fa 48 8b 05 0d 96 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff
ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 8b 0d dd 95 2c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffd045f9068 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffd045f90a0 RCX: 00007fab2121f87b
RDX: 00007ffd045f90a0 RSI: 00000000c06864a2 RDI: 000000000000000b
RBP: 00007ffd045f90a0 R08: 0000000000000000 R09: 000055dbd2985d10
R10: 000055dbd2196280 R11: 0000000000000246 R12: 00000000c06864a2
R13: 000000000000000b R14: 0000000000000000 R15: 000055dbd2196280
---[ end trace 6ea888c24d2059cd ]---

Note as well, I have only been able to reproduce this on setups with 2
MST displays.

Changes since v1:
* Don't return false when part 1 or part 2 of updating the payloads
  fails, we don't want to abort at any step of the process even if
  things fail
Reviewed-by: Mikita Lipski <Mikita.Lipski@amd.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

58fe03d6

drm/amdgpu: update default voltage for boot od table for navi1x · 7b913a76

Alex Deucher authored Feb 04, 2020

It needed to be updated as well so it will show the proper values
if you reset to the defaults.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/1020Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

7b913a76

04 Feb, 2020 6 commits

drm/amdgpu/smu10: fix smu10_get_clock_by_type_with_voltage · 1064ad4a

Alex Deucher authored Jan 29, 2020

Cull out 0 clocks to avoid a warning in DC.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/963Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1064ad4a

drm/amdgpu/smu10: fix smu10_get_clock_by_type_with_latency · 4d0a72b6

Alex Deucher authored Jan 28, 2020

Only send non-0 clocks to DC for validation.  This mirrors
what the windows driver does.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/963Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4d0a72b6

drm/amdgpu/display: handle multiple numbers of fclks in dcn_calcs.c (v2) · c3724357

Alex Deucher authored Jan 28, 2020

We might get different numbers of clocks from powerplay depending
on what the OEM has populated.

v2: add assert for at least one level

Bug: https://gitlab.freedesktop.org/drm/amd/issues/963Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c3724357

drm/amdgpu: fetch default VDDC curve voltages (v2) · 0531aa6e

Alex Deucher authored Jan 25, 2020

Ask the SMU for the default VDDC curve voltage values.  This
properly reports the VDDC values in the OD interface.

v2: only update if the original values are 0

Bug: https://gitlab.freedesktop.org/drm/amd/issues/1020Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.5.x

0531aa6e

drm/amdgpu/smu_v11_0: Correct behavior of restoring default tables (v2) · 93c5f1f6

Matt Coffin authored Jan 25, 2020

Previously, the syfs functionality for restoring the default powerplay
table was sourcing it's information from the currently-staged powerplay
table.

This patch adds a step to cache the first overdrive table that we see on
boot, so that it can be used later to "restore" the powerplay table

v2: sqaush my original with Matt's fix

Bug: https://gitlab.freedesktop.org/drm/amd/issues/1020Signed-off-by: Matt Coffin <mcoffin13@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.5.x

93c5f1f6

drm/amdgpu/navi10: add OD_RANGE for navi overclocking · ee23a518

Alex Deucher authored Jan 25, 2020

So users can see the range of valid values.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/1020Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.5.x

ee23a518