- 22 Oct, 2021 7 commits
-
-
Matthew Auld authored
Attempt to document shrink_pin and the other relevant interfaces that interact with it, before we start messing with it. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-5-matthew.auld@intel.com
-
Matthew Auld authored
The comment here is no longer accurate, since the current shrinker code requires a full ref before touching any objects. Also unset_pages() should already do the required make_unshrinkable() for us, if needed, which is also nicely balanced with set_pages(). Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-4-matthew.auld@intel.com
-
Matthew Auld authored
We already do this when mapping the pages. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-3-matthew.auld@intel.com
-
Matthew Auld authored
For cached objects we can allocate our pages directly in shmem. This should make it possible(in a later patch) to utilise the existing i915-gem shrinker code for such objects. For now this is still disabled. v2(Thomas): - Add optional try_to_writeback hook for objects. Importantly we need to check if the object is even still shrinkable; in between us dropping the shrinker LRU lock and acquiring the object lock it could for example have been moved. Also we need to differentiate between "lazy" shrinking and the immediate writeback mode. Also later we need to handle objects which don't even have mm.pages, so bundling this into put_pages() would require somehow handling that edge case, hence just letting the ttm backend handle everything in try_to_writeback doesn't seem too bad. v3(Thomas): - Likely a bad idea to touch the object from the unpopulate hook, since it's not possible to hold a reference, without also creating circular dependency, so likely this is too fragile. For now just ensure we at least mark the pages as dirty/accessed when called from the shrinker on WILLNEED objects. - s/try_to_writeback/shrinker_release_pages, since this can do more than just writeback. - Get rid of do_backup boolean and just set the SWAPPED flag prior to calling unpopulate. - Keep shmem_tt as lowest priority for the TTM LRU bo_swapout walk, since these just get skipped anyway. We can try to come up with something better later. v4(Thomas): - s/PCI_DMA/DMA/. Also drop NO_KERNEL_MAPPING and NO_WARN, which apparently doesn't do anything with streaming mappings. - Just pass along the error for ->truncate, and assume nothing. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Christian König <christian.koenig@amd.com> Cc: Oak Zeng <oak.zeng@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Acked-by: Oak Zeng <oak.zeng@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-2-matthew.auld@intel.com
-
Thomas Hellström authored
Break out some shmem backend utils for future reuse by the TTM backend: shmem_alloc_st(), shmem_free_st() and __shmem_writeback() which we can use to provide a shmem-backed TTM page pool for cached-only TTM buffer objects. Main functional change here is that we now compute the page sizes using the dma segments rather than using the physical page address segments. v2(Reported-by: kernel test robot <lkp@intel.com>) - Make sure we initialise the mapping on the error path in shmem_get_pages() Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-1-matthew.auld@intel.com
-
Joonas Lahtinen authored
Backmerging to pull in the new dma_resv iterators requested by Maarten and Matt. Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
Matthew Auld authored
wbinvd_on_all_cpus() is only defined on x86 it seems, plus we need to include asm/smp.h here. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211021125332.2455288-1-matthew.auld@intel.com
-
- 21 Oct, 2021 2 commits
-
-
Dave Airlie authored
Merge tag 'drm-intel-gt-next-2021-10-21' of git://anongit.freedesktop.org/drm/drm-intel into drm-next UAPI Changes: - Expose multi-LRC submission interface Similar to the bonded submission interface but simplified. Comes with GuC only implementation for now. See kerneldoc for more details. Userspace changes: https://github.com/intel/media-driver/pull/1252 - Expose logical engine instance to user Needed by the multi-LRC submission interface for GuC Userspace changes: https://github.com/intel/media-driver/pull/1252 Driver Changes: - Fix blank screen booting crashes when CONFIG_CC_OPTIMIZE_FOR_SIZE=y (Hugh) - Add support for multi-LRC submission in the GuC backend (Matt B) - Add extra cache flushing before making pages userspace visible (Matt A, Thomas) - Mark internal GPU object pages dirty so they will be flushed properly (Matt A) - Move remaining debugfs interfaces i915_wedged/i915_forcewake_user into gt (Andi) - Replace the unconditional clflushes with drm_clflush_virt_range() (Ville) - Remove IS_ACTIVE macro completely (Lucas) - Improve kerneldocs for cache_dirty (Matt A) - Add missing includes (Lucas) - Selftest improvements (Matt R, Ran, Matt A) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/YXFmLKoq8Fg9JxSd@jlahtine-mobl.ger.corp.intel.com
-
git://anongit.freedesktop.org/drm/drm-intelDave Airlie authored
UAPI Changes: - No Functional change, but a clarification around I915_TILING values (Matt). Driver Changes: - Changes around async flip VT-d w/a (Ville) - Delete bogus NULL check in intel_ddi_encoder_destroy (Dan) - DP link training improvements and DP per-lane driver settings (Ville) - Free the returned object of acpi_evaluate_dsm (Zenghui) - Fixes and improvements around DP's UHBR and MST (Jani) - refactor plane config + pin out (Dave) - remove unused include in intel_dsi_vbt.c (Lucas) - some code clean up (Lucas, Jani) - gracefully disable dual eDP (Jani) - Remove memory frequency calculation (Jose) - Fix oops on platforms w/o hpd support (Ville) - Clean up PXP Kconfig info (Rodrigo) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/YWnMORrixyw90O3/@intel.com
-
- 20 Oct, 2021 10 commits
-
-
Matthew Auld authored
Just like we do for internal objects. Also just use i915_gem_object_set_cache_coherency() here. No need for over-flushing on LLC platforms. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-9-matthew.auld@intel.com
-
Matthew Auld authored
While the pages can't be swapped out, they can be discarded by the shrinker. Normally such objects are marked with __I915_MADV_PURGED, which can't be unset, and therefore requires a new object. For kernel internal objects this is not true, since the madv hint is reset for our special volatile objects, such that we can re-acquire new pages, if so desired, without needing a new object. As a result we should probably be paranoid here and put the object back into the CPU domain when discarding the pages, and also correctly set cache_dirty, if required. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-8-matthew.auld@intel.com
-
Matthew Auld authored
Add some details around non-LLC platforms and cflushing, when dealing with the flush-on-acquire, which is potentially security sensitive. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-7-matthew.auld@intel.com
-
Matthew Auld authored
On non-LLC platforms, force the flush-on-acquire if this is ever swapped-in. Our async flush path is not trust worthy enough yet(and happens in the wrong order), and with some tricks it's conceivable for userspace to change the cache-level to I915_CACHE_NONE after the pages are swapped-in, and since execbuf binds the object before doing the async flush, there is a potential race window. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-6-matthew.auld@intel.com
-
Matthew Auld authored
Even though userptr objects are always coherent with the GPU, with no way for userspace to change this with the set_caching ioctl, even on non-LLC platforms, there is still the 'Bypass LCC' mocs setting, which might permit reading the contents of main memory directly. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-5-matthew.auld@intel.com
-
Matthew Auld authored
As pointed out by Thomas, we likely need to flush the pages here if the GPU can read the page contents directly from main memory. Underneath we don't know what the sg_table is pointing to, so just add a wbinvd_on_all_cpus() here, for now. Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-4-matthew.auld@intel.com
-
Matthew Auld authored
It looks like we will need this in some more places, so extract as a helper. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-3-matthew.auld@intel.com
-
Matthew Auld authored
These are userspace objects, so mark them as such. In a later patch it's useful to determine how paranoid we need to be when managing cache flushes. In theory no functional changes. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-2-matthew.auld@intel.com
-
Matthew Auld authored
These are userspace objects, so mark them as such. In a later patch it's useful to determine how paranoid we need to be when managing cache flushes. In theory no functional changes. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-1-matthew.auld@intel.com
-
Ran Jianping authored
'drm/ttm/ttm_placement.h' included in 'drivers/gpu/drm/i915/selftests/mock_region.c' is duplicated. It is also included on the 9 line. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Ran Jianping <ran.jianping@zte.com.cn> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211019090205.1003458-1-ran.jianping@zte.com.cn
-
- 18 Oct, 2021 3 commits
-
-
Ville Syrjälä authored
Replace the unconditional clflush() with drm_clflush_virt_range() which does the wbinvd() fallback when clflush is not available. This time no justification is given for the clflush in the offending commit. Cc: stable@vger.kernel.org Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: 2c8ab333 ("drm/i915: Pin timeline map after first timeline pin, v4.") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014090941.12159-4-ville.syrjala@linux.intel.comReviewed-by: Dave Airlie <airlied@redhat.com>
-
Ville Syrjälä authored
This one is apparently a "clflush for good measure", so bit more justification (if you can call it that) than some of the others. Convert to drm_clflush_virt_range() again so that machines without clflush will survive the ordeal. Cc: stable@vger.kernel.org Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Hellström <thomas.hellstrom@intel.com> #v1 Fixes: 12ca695d ("drm/i915: Do not share hwsp across contexts any more, v8.") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014090941.12159-3-ville.syrjala@linux.intel.comReviewed-by: Dave Airlie <airlied@redhat.com>
-
Ville Syrjälä authored
Not all machines have clflush, so don't go assuming they do. Not really sure why the clflush is even here since hwsp is supposed to get snooped I thought. Although in my case we're talking about a i830 machine where render/blitter snooping is definitely busted. But it might work for the hswp perhaps. Haven't really reverse engineered that one fully. Cc: stable@vger.kernel.org Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Fixes: b436a5f8 ("drm/i915/gt: Track all timelines created using the HWSP") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014090941.12159-2-ville.syrjala@linux.intel.comReviewed-by: Dave Airlie <airlied@redhat.com>
-
- 15 Oct, 2021 18 commits
-
-
Rodrigo Vivi authored
During the review I focused on stop the using of the "+" to reference the newer platforms, but I forgot that we are in a process of making things more clear and differentiate graphics and display versions. So, let me to clean up this a bit. Also, we don't need any version mentioned in the config menu entry, only in the help. Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211015090916.82968-1-rodrigo.vivi@intel.com
-
Matthew Brost authored
Enable multi-bb execbuf by enabling the set_parallel extension. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-25-matthew.brost@intel.com
-
Matthew Brost authored
Parallel submission create composite fences (dma_fence_array) for excl / shared slots in objects. The I915_GEM_BUSY IOCTL checks these slots to determine the busyness of the object. Prior to patch it only check if the fence in the slot was a i915_request. Update the check to understand composite fences and correctly report the busyness. v2: (Tvrtko) - Remove duplicate BUILD_BUG_ON Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-24-matthew.brost@intel.com
-
Matthew Brost authored
If an object in the excl or shared slot is a composite fence from a parallel submit and the current request in the conflict tracking is from the same parallel context there is no need to enforce ordering as the ordering is already implicit. Make the request conflict tracking understand this by comparing a parallel submit's parent context and skipping conflict insertion if the values match. v2: (John Harrison) - Reword commit message Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-23-matthew.brost@intel.com
-
Matthew Brost authored
If an error occurs in the front end when multi-lrc requests are getting generated we need to skip these in the backend but we still need to emit the breadcrumbs seqno. An issues arises because with multi-lrc breadcrumbs there is a handshake between the parent and children to make forward progress. If all the requests are not present this handshake doesn't work. To work around this, if multi-lrc request has an error we skip the handshake but still emit the breadcrumbs seqno. v2: (John Harrison) - Add comment explaining the skipping of the handshake logic - Fix typos in the commit message v3: (John Harrison) - Fix up some comments about the math to NOP the ring Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-22-matthew.brost@intel.com
-
Matthew Brost authored
Allow multiple batch buffers to be submitted in a single execbuf IOCTL after a context has been configured with the 'set_parallel' extension. The number batches is implicit based on the contexts configuration. This is implemented with a series of loops. First a loop is used to find all the batches, a loop to pin all the HW contexts, a loop to create all the requests, a loop to submit (emit BB start, etc...) all the requests, a loop to tie the requests to the VMAs they touch, and finally a loop to commit the requests to the backend. A composite fence is also created for the generated requests to return to the user and to stick in dma resv slots. No behavior from the existing IOCTL should be changed aside from when throttling because the ring for a context is full. In this situation, i915 will now wait while holding the object locks. This change was done because the code is much simpler to wait while holding the locks and we believe there isn't a huge benefit of dropping these locks. If this proves false we can restructure the code to drop the locks during the wait. IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1 media UMD: https://github.com/intel/media-driver/pull/1252 v2: (Matthew Brost) - Return proper error value if i915_request_create fails v3: (John Harrison) - Add comment explaining create / add order loops + locking - Update commit message explaining different in IOCTL behavior - Line wrap some comments - eb_add_request returns void - Return -EINVAL rather triggering BUG_ON if cmd parser used (Checkpatch) - Check eb->batch_len[*current_batch] v4: (CI) - Set batch len if passed if via execbuf args - Call __i915_request_skip after __i915_request_commit (Kernel test robot) - Initialize rq to NULL in eb_pin_timeline v5: (John Harrison) - Fix typo in comments near bb order loops Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-21-matthew.brost@intel.com
-
Matthew Brost authored
For some users of multi-lrc, e.g. split frame, it isn't safe to preempt mid BB. To safely enable preemption at the BB boundary, a handshake between parent and child is needed, syncing the set of BBs at the beginning and end of each batch. This is implemented via custom emit_bb_start & emit_fini_breadcrumb functions and enabled by default if a context is configured by set parallel extension. Lastly, this patch updates the process descriptor to the correct size as the memory used in the handshake is directly after the process descriptor. v2: (John Harrison) - Fix a few comments wording - Add struture for parent page layout v3: (John Harrison) - A structure for sync semaphore - Use offsetof to calc address - Update commit message v4: (John Harrison) - Fix typos in comment explaining memory map of scratch page Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-20-matthew.brost@intel.com
-
Matthew Brost authored
Add very basic (single submission) multi-lrc selftest. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-19-matthew.brost@intel.com
-
Matthew Brost authored
Update parallel submit doc to point to i915_drm.h Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-18-matthew.brost@intel.com
-
Matthew Brost authored
Introduce 'set parallel submit' extension to connect UAPI to GuC multi-lrc interface. Kernel doc in new uAPI should explain it all. IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1 media UMD: https://github.com/intel/media-driver/pull/1252 v2: (Daniel Vetter) - Add IGT link and placeholder for media UMD link v3: (Kernel test robot) - Fix warning in unpin engines call (John Harrison) - Reword a bunch of the kernel doc v4: (John Harrison) - Add comment why perma-pin is done after setting gem context - Update some comments / docs for proto contexts v5: (John Harrison) - Rework perma-pin comment - Add BUG_IN if context is pinned when setting gem context Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-17-matthew.brost@intel.com
-
Matthew Brost authored
Display the workqueue status in debugfs for GuC contexts that are in parent-child relationship. v2: (John Harrison) - Output number children in debugfs Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-16-matthew.brost@intel.com
-
Matthew Brost authored
Update context and full GPU reset to work with multi-lrc. The idea is parent context tracks all the active requests inflight for itself and its children. The parent context owns the reset replaying / canceling requests as needed. v2: (John Harrison) - Simply loop in find active request - Add comments to find ative request / reset loop v3: (John Harrison) - s/its'/its/g - Fix comment when searching for active request - Reorder if state in __guc_reset_context v4: (Kernel test robot) - Delete unused is_multi_lrc function Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-15-matthew.brost@intel.com
-
Matthew Brost authored
The GuC must receive requests in the order submitted for contexts in a parent-child relationship to function correctly. To ensure this, insert a submit fence between the current request and last request submitted for requests / contexts in a parent child relationship. This is conceptually similar to a single timeline. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-14-matthew.brost@intel.com
-
Matthew Brost authored
Implement multi-lrc submission via a single workqueue entry and single H2G. The workqueue entry contains an updated tail value for each request, of all the contexts in the multi-lrc submission, and updates these values simultaneously. As such, the tasklet and bypass path have been updated to coalesce requests into a single submission. v2: (John Harrison) - s/wqe/wqi - Use FIELD_PREP macros - Add GEM_BUG_ONs ensures length fits within field - Add comment / white space to intel_guc_write_barrier (Kernel test robot) - Make need_tasklet a static function v3: (Docs) - A comment for submission_stall_reason v4: (Kernel test robot) - Initialize return value in bypass tasklt submit function (John Harrison) - Add comment near work queue defs - Add BUILD_BUG_ON to ensure WQ_SIZE is a power of 2 - Update write_barrier comment to talk about work queue v5: (John Harrison) - Fix typo in work queue comment Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-13-matthew.brost@intel.com
-
Matthew Brost authored
Parallel contexts are perma-pinned by the upper layers which makes the backend implementation rather simple. The parent pins the guc_id and children increment the parent's pin count on pin to ensure all the contexts are unpinned before we disable scheduling with the GuC / or deregister the context. v2: (Daniel Vetter) - Perma-pin parallel contexts Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-12-matthew.brost@intel.com
-
Matthew Brost authored
Assign contexts in parent-child relationship consecutive guc_ids. This is accomplished by partitioning guc_id space between ones that need to be consecutive (1/16 available guc_ids) and ones that do not (15/16 of available guc_ids). The consecutive search is implemented via the bitmap API. This is a precursor to the full GuC multi-lrc implementation but aligns to how GuC mutli-lrc interface is defined - guc_ids must be consecutive when using the GuC multi-lrc interface. v2: (Daniel Vetter) - Explicitly state why we assign consecutive guc_ids v3: (John Harrison) - Bring back in spin lock Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-11-matthew.brost@intel.com
-
Matthew Brost authored
In GuC parent-child contexts the parent context controls the scheduling, ensure only the parent does the scheduling operations. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-10-matthew.brost@intel.com
-
Matthew Brost authored
Add multi-lrc context registration H2G. In addition a workqueue and process descriptor are setup during multi-lrc context registration as these data structures are needed for multi-lrc submission. v2: (John Harrison) - Move GuC specific fields into sub-struct - Clean up WQ defines - Add comment explaining math to derive WQ / PD address v3: (John Harrison) - Add PARENT_SCRATCH_SIZE define - Update comment explaining multi-lrc register v4: (John Harrison) - Move PARENT_SCRATCH_SIZE to common file Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-9-matthew.brost@intel.com
-