Commits · 77cdd054dd2ced6cb6f3420a998d6a0a65d6076e · Kirill Smelkov / linux

28 Oct, 2021 2 commits

drm/i915/pmu: Connect engine busyness stats from GuC to pmu · 77cdd054

Umesh Nerlige Ramappa authored Oct 26, 2021

With GuC handling scheduling, i915 is not aware of the time that a
context is scheduled in and out of the engine. Since i915 pmu relies on
this info to provide engine busyness to the user, GuC shares this info
with i915 for all engines using shared memory. For each engine, this
info contains:

- total busyness: total time that the context was running (total)
- id: id of the running context (id)
- start timestamp: timestamp when the context started running (start)

At the time (now) of sampling the engine busyness, if the id is valid
(!= ~0), and start is non-zero, then the context is considered to be
active and the engine busyness is calculated using the below equation

engine busyness = total + (now - start)

All times are obtained from the gt clock base. For inactive contexts,
engine busyness is just equal to the total.

The start and total values provided by GuC are 32 bits and wrap around
in a few minutes. Since perf pmu provides busyness as 64 bit
monotonically increasing values, there is a need for this implementation
to account for overflows and extend the time to 64 bits before returning
busyness to the user. In order to do that, a worker runs periodically at
frequency = 1/8th the time it takes for the timestamp to wrap. As an
example, that would be once in 27 seconds for a gt clock frequency of
19.2 MHz.

Note:
There might be an over-accounting of busyness due to the fact that GuC
may be updating the total and start values while kmd is reading them.
(i.e kmd may read the updated total and the stale start). In such a
case, user may see higher busyness value followed by smaller ones which
would eventually catch up to the higher value.

v2: (Tvrtko)
- Include details in commit message
- Move intel engine busyness function into execlist code
- Use union inside engine->stats
- Use natural type for ping delay jiffies
- Drop active_work condition checks
- Use for_each_engine if iterating all engines
- Drop seq locking, use spinlock at GuC level to update engine stats
- Document worker specific details

v3: (Tvrtko/Umesh)
- Demarcate GuC and execlist stat objects with comments
- Document known over-accounting issue in commit
- Provide a consistent view of GuC state
- Add hooks to gt park/unpark for GuC busyness
- Stop/start worker in gt park/unpark path
- Drop inline
- Move spinlock and worker inits to GuC initialization
- Drop helpers that are called only once

v4: (Tvrtko/Matt/Umesh)
- Drop addressed opens from commit message
- Get runtime pm in ping, remove from the park path
- Use cancel_delayed_work_sync in disable_submission path
- Update stats during reset prepare
- Skip ping if reset in progress
- Explicitly name execlists and GuC stats objects
- Since disable_submission is called from many places, move resetting
stats to intel_guc_submission_reset_prepare

v5: (Tvrtko)
- Add a trylock helper that does not sleep and synchronize PMU event
callbacks and worker with gt reset

v6: (CI BAT failures)
- DUTs using execlist submission failed to boot since __gt_unpark is
called during i915 load. This ends up calling the GuC busyness unpark
hook and results in kick-starting an uninitialized worker. Let
park/unpark hooks check if GuC submission has been initialized.
- drop cant_sleep() from trylock helper since rcu_read_lock takes care
of that.

v7: (CI) Fix igt@i915_selftest@live@gt_engines
- For GuC mode of submission the engine busyness is derived from gt time
domain. Use gt time elapsed as reference in the selftest.
- Increase busyness calculation to 10ms duration to ensure batch runs
longer and falls within the busyness tolerances in selftest.

v8:
- Use ktime_get in selftest as before
- intel_reset_trylock_no_wait results in a lockdep splat that is not
trivial to fix since the PMU callback runs in irq context and the
reset paths are tightly knit into the driver. The test that uncovers
this is igt@perf_pmu@faulting-read. Drop intel_reset_trylock_no_wait,
instead use the reset_count to synchronize with gt reset during pmu
callback. For the ping, continue to use intel_reset_trylock since ping
is not run in irq context.

- GuC PM timestamp does not tick when GuC is idle. This can potentially
result in wrong busyness values when a context is active on the
engine, but GuC is idle. Use the RING TIMESTAMP as GPU timestamp to
process the GuC busyness stats. This works since both GuC timestamp and
RING timestamp are synced with the same clock.

- The busyness stats may get updated after the batch starts running.
This delay causes the busyness reported for 100us duration to fall
below 95% in the selftest. The only option at this time is to wait for
GuC busyness to change from idle to active before we sample busyness
over a 100us period.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211027004821.66097-2-umesh.nerlige.ramappa@intel.com

77cdd054

drm/i915/pmu: Add a name to the execlists stats · 344e6947

Umesh Nerlige Ramappa authored Oct 26, 2021

In preparation for GuC pmu stats, add a name to the execlists stats
structure so that it can be differentiated from the GuC stats.
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211027004821.66097-1-umesh.nerlige.ramappa@intel.com

344e6947

27 Oct, 2021 2 commits

drm/i915: Revert 'guc_id' from i915_request tracepoint · 64512a66

Joonas Lahtinen authored Oct 27, 2021

Avoid adding backend specific data to the tracepoints outside of
the LOW_LEVEL_TRACEPOINTS kernel config protection. These bits of
information are bound to change depending on the selected submission
method per platform and are not necessarily possible to maintain in
the future.

Fixes: dbf9da8d ("drm/i915/guc: Add trace point for GuC submit")
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211027093255.66489-1-joonas.lahtinen@linux.intel.com

64512a66

MAINTAINERS: Add Tvrtko as drm/i915 co-maintainer · 04bc1667

Joonas Lahtinen authored Oct 25, 2021

Add Tvrtko Ursulin as a co-maintainer for drm/i915 driver.
Tvrtko will bring added bandwidth and focus to the GT/GEM domain
(drm-intel-gt-next).

This will help with the increased driver maintenance efforts, allows
alternating the drm-intel-gt-next pull requests and also should increase
the coverage for the maintenance in general.
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Sean Paul <sean@poorly.run>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: dri-devel@lists.freedesktop.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211025134907.20078-1-joonas.lahtinen@linux.intel.com

04bc1667

25 Oct, 2021 2 commits

drm/i915: Use ERR_CAST instead of ERR_PTR(PTR_ERR()) · 86752bd6

Wan Jiabing authored Oct 25, 2021

Fix following coccicheck warning:
./drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3117:15-22: WARNING:
ERR_CAST can be used with eb->requests[i].
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211025113316.24424-1-wanjiabing@vivo.com

86752bd6

drm/i915/selftests: Fix inconsistent IS_ERR and PTR_ERR · fc7bf4c0

Kai Song authored Oct 22, 2021

Fix inconsistent IS_ERR and PTR_ERR in i915_gem_dmabuf.c
Signed-off-by: Kai Song <songkai01@inspur.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211022120655.22173-1-songkai01@inspur.com

fc7bf4c0

22 Oct, 2021 13 commits

drm/i915/guc: Fix recursive lock in GuC submission · 12a9917e

Matthew Brost authored Oct 20, 2021

Use __release_guc_id (lock held) rather than release_guc_id (acquires
lock), add lockdep annotations.

213.280129] i915: Running i915_perf_live_selftests/live_noa_gpr
[ 213.283459] ============================================
[ 213.283462] WARNING: possible recursive locking detected
{{[ 213.283466] 5.15.0-rc6+ #18 Tainted: G U W }}
[ 213.283470] --------------------------------------------
[ 213.283472] kworker/u24:0/8 is trying to acquire lock:
[ 213.283475] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x2df/0x350 [i915]
{{[ 213.283618] }}
{{ but task is already holding lock:}}
[ 213.283621] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915]
{{[ 213.283720] }}
{{ other info that might help us debug this:}}
[ 213.283724] Possible unsafe locking scenario:[ 213.283727] CPU0
[ 213.283728] ----
[ 213.283730] lock(&guc->submission_state.lock);
[ 213.283734] lock(&guc->submission_state.lock);
{{[ 213.283737] }}
{{ *** DEADLOCK ***}}[ 213.283740] May be due to missing lock nesting notation[ 213.283744] 3 locks held by kworker/u24:0/8:
[ 213.283747] #0: ffff8ffb80059d38 ((wq_completion)events_unbound){..}-{0:0}, at: process_one_work+0x1f3/0x550
[ 213.283757] #1: ffffb509000e3e78 ((work_completion)(&guc->submission_state.destroyed_worker)){..}-{0:0}, at: process_one_work+0x1f3/0x550
[ 213.283766] #2: ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915]
{{[ 213.283860] }}
{{ stack backtrace:}}
[ 213.283863] CPU: 8 PID: 8 Comm: kworker/u24:0 Tainted: G U W 5.15.0-rc6+ #18
[ 213.283868] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021
[ 213.283873] Workqueue: events_unbound destroyed_worker_func [i915]
[ 213.283957] Call Trace:
[ 213.283960] dump_stack_lvl+0x57/0x72
[ 213.283966] __lock_acquire.cold+0x191/0x2d3
[ 213.283972] lock_acquire+0xb5/0x2b0
[ 213.283978] ? destroyed_worker_func+0x2df/0x350 [i915]
[ 213.284059] ? destroyed_worker_func+0x2d7/0x350 [i915]
[ 213.284139] ? lock_release+0xb9/0x280
[ 213.284143] _raw_spin_lock_irqsave+0x48/0x60
[ 213.284148] ? destroyed_worker_func+0x2df/0x350 [i915]
[ 213.284226] destroyed_worker_func+0x2df/0x350 [i915]
[ 213.284310] process_one_work+0x270/0x550
[ 213.284315] worker_thread+0x52/0x3b0
[ 213.284319] ? process_one_work+0x550/0x550
[ 213.284322] kthread+0x135/0x160
[ 213.284326] ? set_kthread_struct+0x40/0x40
[ 213.284331] ret_from_fork+0x1f/0x30

and a bit later in the trace:

{{ 227.499864] do_raw_spin_lock+0x94/0xa0}}
[ 227.499868] _raw_spin_lock_irqsave+0x50/0x60
[ 227.499871] ? guc_flush_destroyed_contexts+0x4f/0xf0 [i915]
[ 227.499995] guc_flush_destroyed_contexts+0x4f/0xf0 [i915]
[ 227.500104] intel_guc_submission_reset_prepare+0x99/0x4b0 [i915]
[ 227.500209] ? mark_held_locks+0x49/0x70
[ 227.500212] intel_uc_reset_prepare+0x46/0x50 [i915]
[ 227.500320] reset_prepare+0x78/0x90 [i915]
[ 227.500412] __intel_gt_set_wedged.part.0+0x13/0xe0 [i915]
[ 227.500485] intel_gt_set_wedged.part.0+0x54/0x100 [i915]
[ 227.500556] intel_gt_set_wedged_on_fini+0x1a/0x30 [i915]
[ 227.500622] intel_gt_driver_unregister+0x1e/0x60 [i915]
[ 227.500694] i915_driver_remove+0x4a/0xf0 [i915]
[ 227.500767] i915_pci_probe+0x84/0x170 [i915]
[ 227.500838] local_pci_probe+0x42/0x80
[ 227.500842] pci_device_probe+0xd9/0x190
[ 227.500844] really_probe+0x1f2/0x3f0
[ 227.500847] __driver_probe_device+0xfe/0x180
[ 227.500848] driver_probe_device+0x1e/0x90
[ 227.500850] __driver_attach+0xc4/0x1d0
[ 227.500851] ? __device_attach_driver+0xe0/0xe0
[ 227.500853] ? __device_attach_driver+0xe0/0xe0
[ 227.500854] bus_for_each_dev+0x64/0x90
[ 227.500856] bus_add_driver+0x12e/0x1f0
[ 227.500857] driver_register+0x8f/0xe0
[ 227.500859] i915_init+0x1d/0x8f [i915]
[ 227.500934] ? 0xffffffffc144a000
[ 227.500936] do_one_initcall+0x58/0x2d0
[ 227.500938] ? rcu_read_lock_sched_held+0x3f/0x80
[ 227.500940] ? kmem_cache_alloc_trace+0x238/0x2d0
[ 227.500944] do_init_module+0x5c/0x270
[ 227.500946] __do_sys_finit_module+0x95/0xe0
[ 227.500949] do_syscall_64+0x38/0x90
[ 227.500951] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 227.500953] RIP: 0033:0x7ffa59d2ae0d
[ 227.500954] Code: c8 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 80 0c 00 f7 d8 64 89 01 48
[ 227.500955] RSP: 002b:00007fff320bbf48 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 227.500956] RAX: ffffffffffffffda RBX: 00000000022ea710 RCX: 00007ffa59d2ae0d
[ 227.500957] RDX: 0000000000000000 RSI: 00000000022e1d90 RDI: 0000000000000004
[ 227.500958] RBP: 0000000000000020 R08: 00007ffa59df3a60 R09: 0000000000000070
[ 227.500958] R10: 00000000022e1d90 R11: 0000000000000246 R12: 00000000022e1d90
[ 227.500959] R13: 00000000022e58e0 R14: 0000000000000043 R15: 00000000022e42c0

v2:
 (CI build)
  - Fix build error

Fixes: 1a52faed ("drm/i915/guc: Take GT PM ref when deregistering context")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211020192147.8048-1-matthew.brost@intel.com

12a9917e

drm/i915/selftests: Update live.evict to wait on requests / idle GPU after each loop · 393211e1

Matthew Brost authored Oct 21, 2021

Update live.evict to wait on last request and idle GPU after each loop.
This not only enhances the test to fill the GGTT on each engine class
but also avoid timeouts from igt_flush_test when using GuC submission.
igt_flush_test (idle GPU) can take a long time with GuC submission if
losts of contexts are created due to H2G / G2H required to destroy
contexts.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211021214040.33292-1-matthew.brost@intel.com

393211e1

drm/i915/selftests: Increase timeout in requests perf selftest · 7c287113

Matthew Brost authored Oct 11, 2021

perf_parallel_engines is micro benchmark to test i915 request
scheduling. The test creates a thread per physical engine and submits
NOP requests and waits the requests to complete in a loop. In execlists
mode this works perfectly fine as powerful CPU has enough cores to feed
each engine and process the CSBs. With GuC submission the uC gets
overwhelmed as all threads feed into a single CTB channel and the GuC
gets bombarded with CSBs as contexts are immediately switched in and out
on the engines due to the zero runtime of the requests. When the GuC is
overwhelmed scheduling of contexts is unfair due to the nature of the
GuC scheduling algorithm. This behavior is understood and deemed
acceptable as this micro benchmark isn't close to real world use case.
Increasing the timeout of wait period for requests to complete. This
makes the test understand that is ok for contexts to get starved in this
scenario.

A future patch / cleanup may just delete these micro benchmark tests as
they basically mean nothing. We care about real workloads not made up
ones.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211011175704.28509-1-matthew.brost@intel.com

7c287113

drm/i915/ttm: enable shmem tt backend · 5d12ffe6

Matthew Auld authored Oct 18, 2021

Turn on the shmem tt backend, and enable shrinking.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-8-matthew.auld@intel.com

5d12ffe6

drm/i915/ttm: use cached system pages when evicting lmem · 2eda4fc6

Matthew Auld authored Oct 18, 2021

This should let us do an accelerated copy directly to the shmem pages
when temporarily moving lmem-only objects, where the i915-gem shrinker
can later kick in to swap out the pages, if needed.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-7-matthew.auld@intel.com

2eda4fc6

drm/i915/ttm: move shrinker management into adjust_lru · ebd4a8ec

Matthew Auld authored Oct 18, 2021

We currently just evict lmem objects to system memory when under memory
pressure. For this case we might lack the usual object mm.pages, which
effectively hides the pages from the i915-gem shrinker, until we
actually "attach" the TT to the object, or in the case of lmem-only
objects it just gets migrated back to lmem when touched again.

For all cases we can just adjust the i915 shrinker LRU each time we also
adjust the TTM LRU. The two cases we care about are:

  1) When something is moved by TTM, including when initially populating
     an object. Importantly this covers the case where TTM moves something from
     lmem <-> smem, outside of the normal get_pages() interface, which
     should still ensure the shmem pages underneath are reclaimable.

  2) When calling into i915_gem_object_unlock(). The unlock should
     ensure the object is removed from the shinker LRU, if it was indeed
     swapped out, or just purged, when the shrinker drops the object lock.

v2(Thomas):
  - Handle managing the shrinker LRU in adjust_lru, where it is always
    safe to touch the object.
v3(Thomas):
  - Pretty much a re-write. This time piggy back off the shrink_pin
    stuff, which actually seems to fit quite well for what we want here.
v4(Thomas):
  - Just use a simple boolean for tracking ttm_shrinkable.
v5:
  - Ensure we call adjust_lru when faulting the object, to ensure the
    pages are visible to the shrinker, if needed.
  - Add back the adjust_lru when in i915_ttm_move (Thomas)
v6(Reported-by: kernel test robot <lkp@intel.com>):
  - Remove unused i915_tt
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #v4
Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-6-matthew.auld@intel.com

ebd4a8ec

drm/i915: add some kernel-doc for shrink_pin and friends · e25d1ea4

Matthew Auld authored Oct 18, 2021

Attempt to document shrink_pin and the other relevant interfaces that
interact with it, before we start messing with it.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-5-matthew.auld@intel.com

e25d1ea4

drm/i915: drop unneeded make_unshrinkable in free_object · 893f11f0

Matthew Auld authored Oct 18, 2021

The comment here is no longer accurate, since the current shrinker code
requires a full ref before touching any objects. Also unset_pages()
should already do the required make_unshrinkable() for us, if needed,
which is also nicely balanced with set_pages().
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-4-matthew.auld@intel.com

893f11f0

drm/i915/gtt: drop unneeded make_unshrinkable · 5926ff80

Matthew Auld authored Oct 18, 2021

We already do this when mapping the pages.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-3-matthew.auld@intel.com

5926ff80

drm/i915/ttm: add tt shmem backend · 7ae03459

Matthew Auld authored Oct 18, 2021

For cached objects we can allocate our pages directly in shmem. This
should make it possible(in a later patch) to utilise the existing
i915-gem shrinker code for such objects. For now this is still disabled.

v2(Thomas):
  - Add optional try_to_writeback hook for objects. Importantly we need
    to check if the object is even still shrinkable; in between us
    dropping the shrinker LRU lock and acquiring the object lock it could for
    example have been moved. Also we need to differentiate between
    "lazy" shrinking and the immediate writeback mode. Also later we need to
    handle objects which don't even have mm.pages, so bundling this into
    put_pages() would require somehow handling that edge case, hence
    just letting the ttm backend handle everything in try_to_writeback
    doesn't seem too bad.
v3(Thomas):
  - Likely a bad idea to touch the object from the unpopulate hook,
    since it's not possible to hold a reference, without also creating
    circular dependency, so likely this is too fragile. For now just
    ensure we at least mark the pages as dirty/accessed when called from the
    shrinker on WILLNEED objects.
  - s/try_to_writeback/shrinker_release_pages, since this can do more
    than just writeback.
  - Get rid of do_backup boolean and just set the SWAPPED flag prior to
    calling unpopulate.
  - Keep shmem_tt as lowest priority for the TTM LRU bo_swapout walk, since
    these just get skipped anyway. We can try to come up with something
    better later.
v4(Thomas):
  - s/PCI_DMA/DMA/. Also drop NO_KERNEL_MAPPING and NO_WARN, which
    apparently doesn't do anything with streaming mappings.
  - Just pass along the error for ->truncate, and assume nothing.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Oak Zeng <oak.zeng@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-2-matthew.auld@intel.com

7ae03459

drm/i915/gem: Break out some shmem backend utils · f05b985e

Thomas Hellström authored Oct 18, 2021

Break out some shmem backend utils for future reuse by the TTM backend:
shmem_alloc_st(), shmem_free_st() and __shmem_writeback() which we can
use to provide a shmem-backed TTM page pool for cached-only TTM
buffer objects.

Main functional change here is that we now compute the page sizes using
the dma segments rather than using the physical page address segments.

v2(Reported-by: kernel test robot <lkp@intel.com>)
    - Make sure we initialise the mapping on the error path in
      shmem_get_pages()
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-1-matthew.auld@intel.com

f05b985e

Merge drm/drm-next into drm-intel-gt-next · ef3e6192

Joonas Lahtinen authored Oct 22, 2021

Backmerging to pull in the new dma_resv iterators requested by
Maarten and Matt.
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

ef3e6192

drm/i915/dmabuf: fix broken build · 777226da

Matthew Auld authored Oct 21, 2021

wbinvd_on_all_cpus() is only defined on x86 it seems, plus we need to
include asm/smp.h here.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211021125332.2455288-1-matthew.auld@intel.com

777226da

21 Oct, 2021 2 commits

Merge tag 'drm-intel-gt-next-2021-10-21' of... · 6f2f7c83

Dave Airlie authored Oct 22, 2021

Merge tag 'drm-intel-gt-next-2021-10-21' of git://anongit.freedesktop.org/drm/drm-intel into drm-next

UAPI Changes:

- Expose multi-LRC submission interface

  Similar to the bonded submission interface but simplified.
  Comes with GuC only implementation for now. See kerneldoc
  for more details.

  Userspace changes: https://github.com/intel/media-driver/pull/1252

- Expose logical engine instance to user

  Needed by the multi-LRC submission interface for GuC

  Userspace changes: https://github.com/intel/media-driver/pull/1252

Driver Changes:

- Fix blank screen booting crashes when CONFIG_CC_OPTIMIZE_FOR_SIZE=y (Hugh)
- Add support for multi-LRC submission in the GuC backend (Matt B)
- Add extra cache flushing before making pages userspace visible (Matt A, Thomas)
- Mark internal GPU object pages dirty so they will be flushed properly (Matt A)

- Move remaining debugfs interfaces i915_wedged/i915_forcewake_user into gt (Andi)
- Replace the unconditional clflushes with drm_clflush_virt_range() (Ville)
- Remove IS_ACTIVE macro completely (Lucas)
- Improve kerneldocs for cache_dirty (Matt A)

- Add missing includes (Lucas)
- Selftest improvements (Matt R, Ran, Matt A)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YXFmLKoq8Fg9JxSd@jlahtine-mobl.ger.corp.intel.com

6f2f7c83

Merge tag 'drm-intel-next-2021-10-15' of git://anongit.freedesktop.org/drm/drm-intel into drm-next · 94ff371e

Dave Airlie authored Oct 22, 2021

UAPI Changes:

- No Functional change, but a clarification around I915_TILING values (Matt).

Driver Changes:

- Changes around async flip VT-d w/a (Ville)
- Delete bogus NULL check in intel_ddi_encoder_destroy (Dan)
- DP link training improvements and DP per-lane driver settings (Ville)
- Free the returned object of acpi_evaluate_dsm (Zenghui)
- Fixes and improvements around DP's UHBR and MST (Jani)
- refactor plane config + pin out (Dave)
- remove unused include in intel_dsi_vbt.c (Lucas)
- some code clean up (Lucas, Jani)
- gracefully disable dual eDP (Jani)
- Remove memory frequency calculation (Jose)
- Fix oops on platforms w/o hpd support (Ville)
- Clean up PXP Kconfig info (Rodrigo)
Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YWnMORrixyw90O3/@intel.com

94ff371e

20 Oct, 2021 10 commits

drm/i915/selftests: mark up hugepages object with start_cpu_write · ab5d964c

Matthew Auld authored Oct 18, 2021

Just like we do for internal objects. Also just use
i915_gem_object_set_cache_coherency() here. No need for over-flushing on
LLC platforms.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-9-matthew.auld@intel.com

ab5d964c

drm/i915: mark up internal objects with start_cpu_write · 3884d8af

Matthew Auld authored Oct 18, 2021

While the pages can't be swapped out, they can be discarded by the shrinker.
Normally such objects are marked with __I915_MADV_PURGED, which can't be
unset, and therefore requires a new object. For kernel internal objects
this is not true, since the madv hint is reset for our special volatile
objects, such that we can re-acquire new pages, if so desired, without
needing a new object. As a result we should probably be paranoid here
and put the object back into the CPU domain when discarding the pages,
and also correctly set cache_dirty, if required.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-8-matthew.auld@intel.com

3884d8af

drm/i915: expand on the kernel-doc for cache_dirty · df94fd05

Matthew Auld authored Oct 18, 2021

Add some details around non-LLC platforms and cflushing, when dealing
with the flush-on-acquire, which is potentially security sensitive.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-7-matthew.auld@intel.com

df94fd05

drm/i915/shmem: ensure flush during swap-in on non-LLC · d70af579

Matthew Auld authored Oct 18, 2021

On non-LLC platforms, force the flush-on-acquire if this is ever
swapped-in. Our async flush path is not trust worthy enough yet(and
happens in the wrong order), and with some tricks it's conceivable for
userspace to change the cache-level to I915_CACHE_NONE after the pages
are swapped-in, and since execbuf binds the object before doing the
async flush, there is a potential race window.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-6-matthew.auld@intel.com

d70af579

drm/i915/userptr: add paranoid flush-on-acquire · 63430347

Matthew Auld authored Oct 18, 2021

Even though userptr objects are always coherent with the GPU, with no
way for userspace to change this with the set_caching ioctl, even on
non-LLC platforms, there is still the 'Bypass LCC' mocs setting, which
might permit reading the contents of main memory directly.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-5-matthew.auld@intel.com

63430347

drm/i915/dmabuf: add paranoid flush-on-acquire · a035154d

Matthew Auld authored Oct 18, 2021

As pointed out by Thomas, we likely need to flush the pages here if the
GPU can read the page contents directly from main memory. Underneath we
don't know what the sg_table is pointing to, so just add a
wbinvd_on_all_cpus() here, for now.
Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-4-matthew.auld@intel.com

a035154d

drm/i915: extract bypass-llc check into helper · 30f1dccd

Matthew Auld authored Oct 18, 2021

It looks like we will need this in some more places, so extract as a
helper.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-3-matthew.auld@intel.com

30f1dccd

drm/i915: mark userptr objects as ALLOC_USER · f7858cb4

Matthew Auld authored Oct 18, 2021

These are userspace objects, so mark them as such. In a later patch it's
useful to determine how paranoid we need to be when managing cache
flushes. In theory no functional changes.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-2-matthew.auld@intel.com

f7858cb4

drm/i915: mark dmabuf objects as ALLOC_USER · e1f17ea4

Matthew Auld authored Oct 18, 2021

e1f17ea4

drm/i915/selftests: remove duplicate include in mock_region.c · c654dc37

Ran Jianping authored Oct 19, 2021

'drm/ttm/ttm_placement.h' included in
'drivers/gpu/drm/i915/selftests/mock_region.c' is duplicated.
It is also included on the 9 line.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Ran Jianping <ran.jianping@zte.com.cn>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211019090205.1003458-1-ran.jianping@zte.com.cn

c654dc37

18 Oct, 2021 3 commits

drm/i915: Catch yet another unconditioal clflush · 9ced1218

Ville Syrjälä authored Oct 14, 2021

Replace the unconditional clflush() with drm_clflush_virt_range()
which does the wbinvd() fallback when clflush is not available.

This time no justification is given for the clflush in the
offending commit.

Cc: stable@vger.kernel.org
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Fixes: 2c8ab333 ("drm/i915: Pin timeline map after first timeline pin, v4.")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014090941.12159-4-ville.syrjala@linux.intel.comReviewed-by: Dave Airlie <airlied@redhat.com>

9ced1218

drm/i915: Convert unconditional clflush to drm_clflush_virt_range() · af7b6d23

Ville Syrjälä authored Oct 14, 2021

This one is apparently a "clflush for good measure", so bit more
justification (if you can call it that) than some of the others.
Convert to drm_clflush_virt_range() again so that machines without
clflush will survive the ordeal.

Cc: stable@vger.kernel.org
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com> #v1
Fixes: 12ca695d ("drm/i915: Do not share hwsp across contexts any more, v8.")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014090941.12159-3-ville.syrjala@linux.intel.comReviewed-by: Dave Airlie <airlied@redhat.com>

af7b6d23

drm/i915: Replace the unconditional clflush with drm_clflush_virt_range() · ef7ec41f

Ville Syrjälä authored Oct 14, 2021

Not all machines have clflush, so don't go assuming they do.
Not really sure why the clflush is even here since hwsp
is supposed to get snooped I thought.

Although in my case we're talking about a i830 machine where
render/blitter snooping is definitely busted. But it might
work for the hswp perhaps. Haven't really reverse engineered
that one fully.

Cc: stable@vger.kernel.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Fixes: b436a5f8 ("drm/i915/gt: Track all timelines created using the HWSP")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014090941.12159-2-ville.syrjala@linux.intel.comReviewed-by: Dave Airlie <airlied@redhat.com>

ef7ec41f

15 Oct, 2021 6 commits

drm/i915: Clean up PXP Kconfig info. · c974cf01

Rodrigo Vivi authored Oct 14, 2021

During the review I focused on stop the using of the "+"
to reference the newer platforms, but I forgot that we are
in a process of making things more clear and differentiate
graphics and display versions. So, let me to clean up this
a bit. Also, we don't need any version mentioned in the
config menu entry, only in the help.

Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211015090916.82968-1-rodrigo.vivi@intel.com

c974cf01

drm/i915: Enable multi-bb execbuf · 4eb61ddc

Matthew Brost authored Oct 14, 2021

Enable multi-bb execbuf by enabling the set_parallel extension.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-25-matthew.brost@intel.com

4eb61ddc

drm/i915: Update I915_GEM_BUSY IOCTL to understand composite fences · 7647f009

Matthew Brost authored Oct 14, 2021

Parallel submission create composite fences (dma_fence_array) for excl /
shared slots in objects. The I915_GEM_BUSY IOCTL checks these slots to
determine the busyness of the object. Prior to patch it only check if
the fence in the slot was a i915_request. Update the check to understand
composite fences and correctly report the busyness.

v2:
 (Tvrtko)
  - Remove duplicate BUILD_BUG_ON
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-24-matthew.brost@intel.com

7647f009

drm/i915: Make request conflict tracking understand parallel submits · afc76f30

Matthew Brost authored Oct 14, 2021

If an object in the excl or shared slot is a composite fence from a
parallel submit and the current request in the conflict tracking is from
the same parallel context there is no need to enforce ordering as the
ordering is already implicit. Make the request conflict tracking
understand this by comparing a parallel submit's parent context and
skipping conflict insertion if the values match.

v2:
 (John Harrison)
  - Reword commit message
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-23-matthew.brost@intel.com

afc76f30

drm/i915/guc: Handle errors in multi-lrc requests · 28c70233

Matthew Brost authored Oct 14, 2021

If an error occurs in the front end when multi-lrc requests are getting
generated we need to skip these in the backend but we still need to
emit the breadcrumbs seqno. An issues arises because with multi-lrc
breadcrumbs there is a handshake between the parent and children to make
forward progress. If all the requests are not present this handshake
doesn't work. To work around this, if multi-lrc request has an error we
skip the handshake but still emit the breadcrumbs seqno.

v2:
 (John Harrison)
  - Add comment explaining the skipping of the handshake logic
  - Fix typos in the commit message
v3:
 (John Harrison)
  - Fix up some comments about the math to NOP the ring
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-22-matthew.brost@intel.com

28c70233

drm/i915: Multi-BB execbuf · 544460c3

Matthew Brost authored Oct 14, 2021

Allow multiple batch buffers to be submitted in a single execbuf IOCTL
after a context has been configured with the 'set_parallel' extension.
The number batches is implicit based on the contexts configuration.

This is implemented with a series of loops. First a loop is used to find
all the batches, a loop to pin all the HW contexts, a loop to create all
the requests, a loop to submit (emit BB start, etc...) all the requests,
a loop to tie the requests to the VMAs they touch, and finally a loop to
commit the requests to the backend.

A composite fence is also created for the generated requests to return
to the user and to stick in dma resv slots.

No behavior from the existing IOCTL should be changed aside from when
throttling because the ring for a context is full. In this situation,
i915 will now wait while holding the object locks. This change was done
because the code is much simpler to wait while holding the locks and we
believe there isn't a huge benefit of dropping these locks. If this
proves false we can restructure the code to drop the locks during the
wait.

IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1
media UMD: https://github.com/intel/media-driver/pull/1252

v2:
 (Matthew Brost)
  - Return proper error value if i915_request_create fails
v3:
 (John Harrison)
  - Add comment explaining create / add order loops + locking
  - Update commit message explaining different in IOCTL behavior
  - Line wrap some comments
  - eb_add_request returns void
  - Return -EINVAL rather triggering BUG_ON if cmd parser used
 (Checkpatch)
  - Check eb->batch_len[*current_batch]
v4:
 (CI)
  - Set batch len if passed if via execbuf args
  - Call __i915_request_skip after __i915_request_commit
 (Kernel test robot)
  - Initialize rq to NULL in eb_pin_timeline
v5:
 (John Harrison)
  - Fix typo in comments near bb order loops
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-21-matthew.brost@intel.com

544460c3