- 13 May, 2019 3 commits
-
-
Chris Wilson authored
In all likelihood, the priority and node are already in the CPU cache and by checking them first, we can avoid having to chase the *request->hwsp for the current breadcrumb. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190513120102.29660-3-chris@chris-wilson.co.uk
-
Chris Wilson authored
To simplify the next patch, update bump_priority and schedule to accept the internal i915_sched_ndoe directly and not expect a request pointer. add/remove: 0/0 grow/shrink: 2/1 up/down: 8/-15 (-7) Function old new delta i915_schedule_bump_priority 109 113 +4 i915_schedule 50 54 +4 __i915_schedule 922 907 -15 v2: Adopt node for the old rq local, since it no longer is a request but the origin node. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190513120102.29660-2-chris@chris-wilson.co.uk
-
Chris Wilson authored
To avoid pulling in a forward declaration in the next patch, move the i915_sched_node handling to after the main dfs of the scheduler. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190513120102.29660-1-chris@chris-wilson.co.uk
-
- 09 May, 2019 9 commits
-
-
Ville Syrjälä authored
Convert the HSW pch_pfit.force_thru to a proper state variable with readout and accompanying pipe conf check. Makes the logic a bit more straightforward, and hopefully prevents some breakage in the future. 'force_thru' is probably not the best name for this, but I didn't manage to come up with anything better either, so I left it alone. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190425162906.5242-2-ville.syrjala@linux.intel.comReviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
-
Ville Syrjälä authored
On HSW the pipe A panel fitter lives inside the display power well, and the input MUX for the EDP transcoder needs to be configured appropriately to route the data through the power well as needed. Changing the MUX setting is not allowed while the pipe is active, so we need to force a full modeset whenever we need to change it. Currently we may end up doing a fastset which won't change the MUX settings, but it will drop the power well reference, and that kills the pipe. Cc: stable@vger.kernel.org Cc: Hans de Goede <hdegoede@redhat.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Fixes: d19f958d ("drm/i915: Enable fastset for non-boot modesets.") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190425162906.5242-1-ville.syrjala@linux.intel.comReviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
-
Daniel Drake authored
On many (all?) the Gemini Lake systems we work with, there is frequent momentary graphical corruption at the top of the screen, and it seems that disabling framebuffer compression can avoid this. The ticket was reported 6 months ago and has already affected a multitude of users, without any real progress being made. So, lets disable framebuffer compression on GeminiLake until a solution is found. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108085 Fixes: fd7d6c5c ("drm/i915: enable FBC on gen9+ too") Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: <stable@vger.kernel.org> # v4.11+ Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Drake <drake@endlessm.com> Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190423092810.28359-1-jian-hong@endlessm.com
-
Ramalingam C authored
Considering the significant size of hdcp related code in drm, all hdcp related codes are moved into separate file called drm_hdcp.c. v2: Rebased. v2: Rebased. Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Suggested-by: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Dave Airlie <airlied@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20190507162745.25600-7-ramalingam.c@intel.com
-
Ramalingam C authored
DRM HDCP SRM revocation check services are used from I915 for HDCP1.4 and 2.2 revocation check during the respective authentication flow. v2: Rebased. v3: %s/*_ksvs_revocated/*_check_ksvs_revoked [Daniel] unwanted noise is removed. Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20190507162745.25600-6-ramalingam.c@intel.com
-
Ramalingam C authored
On every hdcp revocation check request SRM is read from fw file /lib/firmware/display_hdcp_srm.bin SRM table is parsed and stored at drm_hdcp.c, with functions exported for the services for revocation check from drivers (which implements the HDCP authentication) This patch handles the HDCP1.4 and 2.2 versions of SRM table. v2: moved the uAPI to request_firmware_direct() [Daniel] v3: kdoc added. [Daniel] srm_header unified and bit field definitions are removed. [Daniel] locking improved. [Daniel] vrl length violation is fixed. [Daniel] v4: s/__swab16/be16_to_cpu [Daniel] be24_to_cpu is done through a global func [Daniel] Unused variables are removed. [Daniel] unchecked return values are dropped from static funcs [Daniel] Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Acked-by: Satyeshwar Singh <satyeshwar.singh@intel.com> Reviewed-by: Daniel Vetter <daniel@ffwll.ch> Acked-by: Dave Airlie <airlied@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20190507162745.25600-5-ramalingam.c@intel.com
-
Ramalingam C authored
Existing functions for converting a 3bytes(be24) of big endian value into u32 of little endian and vice versa are renamed as s/drm_hdcp2_seq_num_to_u32/drm_hdcp_be24_to_cpu s/drm_hdcp2_u32_to_seq_num/drm_hdcp_cpu_to_be24 Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Suggested-by: Daniel Vetter <daniel@ffwll.ch> cc: Tomas Winkler <tomas.winkler@intel.com> Acked-by: Dave Airlie <airlied@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20190507162745.25600-4-ramalingam.c@intel.com
-
Ramalingam C authored
Adding the HDCP2.2 capability of HDCP src and sink info into debugfs entry "i915_hdcp_sink_capability" This helps the userspace tests to skip the HDCP2.2 test on non HDCP2.2 sinks. v2: Rebased. Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20190507162745.25600-3-ramalingam.c@intel.com
-
Ramalingam C authored
Content protection property is created once and stored in drm_mode_config. And attached to all HDCP capable connectors. Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Dave Airlie <airlied@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20190507162745.25600-2-ramalingam.c@intel.com
-
- 08 May, 2019 3 commits
-
-
Chris Wilson authored
Currently there is an underlying assumption that i915_request_unsubmit() is synchronous wrt the GPU -- that is the request is no longer in flight as we remove it. In the near future that may change, and this may upset our signaling as we can process an interrupt for that request while it is no longer in flight. CPU0 CPU1 intel_engine_breadcrumbs_irq (queue request completion) i915_request_cancel_signaling ... ... i915_request_enable_signaling dma_fence_signal Hence in the time it took us to drop the lock to signal the request, a preemption event may have occurred and re-queued the request. In the process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and so reused the rq->signal_link that was in use on CPU0, leading to bad pointer chasing in intel_engine_breadcrumbs_irq. A related issue was that if someone started listening for a signal on a completed but no longer in-flight request, we missed the opportunity to immediately signal that request. Furthermore, as intel_contexts may be immediately released during request retirement, in order to be entirely sure that intel_engine_breadcrumbs_irq may no longer dereference the intel_context (ce->signals and ce->signal_link), we must wait for irq spinlock. In order to prevent the race, we use a bit in the fence.flags to signal the transfer onto the signal list inside intel_engine_breadcrumbs_irq. For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then quickly signals to any outside observer that the fence is indeed signaled. v2: Sketch out potential dma-fence API for manual signaling v3: And the test_and_set_bit() Fixes: 52c0fdb2 ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190508112452.18942-1-chris@chris-wilson.co.uk
-
Chris Wilson authored
After realising we need to sample RING_START to detect context switches from preemption events that do not allow for the seqno to advance, we can also realise that the seqno itself is just a distance along the ring and so can be replaced by sampling RING_HEAD. v2: Bonus comment for the mystery separate CS_STALL before MI_USER_INTERRUPT Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190508080704.24223-1-chris@chris-wilson.co.uk
-
Chris Wilson authored
If the HW fails to ack a change in forcewake status, the machine is as good as dead -- it may recover, but in reality it missed the mmio updates and is now in a very inconsistent state. If it happens, we can't trust the CI results (or at least the fails may be genuine but due to the HW being dead and not the actual test!) so reboot the machine (CI checks for a kernel taint in between each test and reboots if the machine is tainted). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190508115245.27790-1-chris@chris-wilson.co.uk
-
- 07 May, 2019 13 commits
-
-
Aditya Swarup authored
There is a bug in hdmi_deep_color_possible() - we compare pipe_bpp <= 8*3 which returns true every time for hdmi_deep_color_possible 12 bit deep color mode test in intel_hdmi_compute_config().(Even when the requested color mode is 10 bit through max bpc property) Comparing pipe_bpp with bpc * 3 takes care of this condition where requested max bpc is 10 bit, so hdmi_deep_color_possible with 12 bit returns false when requested max bpc is 10.(Ville) v2:Add suggested by Ville Syrjälä Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Aditya Swarup <aditya.swarup@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Manasi Navare <manasi.d.navare@intel.com> Cc: Clinton Taylor <Clinton.A.Taylor@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190507181856.16091-1-aditya.swarup@intel.com
-
Ville Syrjälä authored
For us KBP is 100% identical to SPT. Kill the redundant enum value. Also bspec doesn't talk about KBP either, so this might avoid some confusion when cross checking the code against the spec. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190506152627.20283-1-ville.syrjala@linux.intel.comReviewed-by: Jani Nikula <jani.nikula@intel.com>
-
Chris Wilson authored
Do not treat reset as a normal preemption event and avoid giving the guilty request a priority boost for simply being active at the time of reset. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190507122954.6299-1-chris@chris-wilson.co.uk
-
Chris Wilson authored
If we couple the scheduler more tightly with the execlists policy, we can apply the preemption policy to the question of whether we need to kick the tasklet at all for this priority bump. v2: Rephrase it as a core i915 policy and not an execlists foible. v3: Pull the kick together. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190507122544.12698-1-chris@chris-wilson.co.uk
-
Chris Wilson authored
If the user is racing a call to debugfs/i915_drop_caches with ongoing submission from another thread/process, we may never end up idling the GPU and be uninterruptibly spinning in debugfs/i915_drop_caches trying to catch an idle moment. Just flush the work once, that should be enough to park the system under correct conditions. Outside of those we either have a driver bug or the user is racing themselves. Sadly, because the user may be provoking the unwanted situation we can't put a warn here to attract attention to a probable bug. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190507121108.18377-4-chris@chris-wilson.co.uk
-
Chris Wilson authored
Replace the racy continuation check within retire_work with a definite kill-switch on idling. The race was being exposed by gem_concurrent_blit where the retire_worker would be terminated too early leaving us spinning in debugfs/i915_drop_caches with nothing flushing the retirement queue. Although that the igt is trying to idle from one child while submitting from another may be a contributing factor as to why it runs so slowly... v2: Use the non-sync version of cancel_delayed_work(), we only need to stop it from being scheduled as we independently check whether now is the right time to be parking. Testcase: igt/gem_concurrent_blit Fixes: 79ffac85 ("drm/i915: Invert the GEM wakeref hierarchy") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190507121108.18377-3-chris@chris-wilson.co.uk
-
Chris Wilson authored
The original intent for the delay before running the idle_work was to provide a hysteresis to avoid ping-ponging the device runtime-pm. Since then we have also pulled in some memory management and general device management for parking. But with the inversion of the wakeref handling, GEM is no longer responsible for the wakeref and by the time we call the idle_work, the device is asleep. It seems appropriate now to drop the delay and just run the worker immediately to flush the cached GEM state before sleeping. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190507121108.18377-2-chris@chris-wilson.co.uk
-
Chris Wilson authored
To complete the idle worker, we must complete 2 passes of wait-for-idle. At the end of the first pass, we queue a switch-to-kernel-context and may only idle after waiting for its completion. Speed up the flush_work by doing the wait explicitly, which then allows us to remove the unbounded loop trying to complete the flush_work in the next patch. References: 79ffac85 ("drm/i915: Invert the GEM wakeref hierarchy") Testcase: igt/gem_ppgtt/flind-and-close-vma-leak Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190507121108.18377-1-chris@chris-wilson.co.uk
-
Clinton Taylor authored
v2: Fix commit msg to reflect why issue occurs(Jani) Set GCP_COLOR_INDICATION only when we set 10/12 bit deep color. Changing settings from 10/12 bit deep color to 8 bit(& vice versa) doesn't work correctly using xrandr max bpc property. When we connect a monitor which supports deep color, the highest deep color setting is selected; which sets GCP_COLOR_INDICATION. When we change the setting to 8 bit color, we still set GCP_COLOR_INDICATION which doesn't allow the switch back to 8 bit color. v3,4: Add comments & drop changes in intel_hdmi_compute_config(Ville) Since HSW+, GCP_COLOR_INDICATION is not required for 8bpc. Drop the changes in intel_hdmi_compute_config as desired_bpp is needed to change values for pipe_bpp based on bw_constrained flag. v5: Fix missing logical && in condition for setting GCP_COLOR_INDICATION. v6: Fix comment formatting (Ville) v7: Add reviewed by Ville v8: Set GCP_COLOR_INDICATION based on spec: For Gen 7.5 or later platforms, indicate color depth only for deep color modes. Bspec: 8135,7751,50524 Pre DDI platforms, indicate color depth if deep color is supported by sink. Bspec: 7854 Exception: CHERRYVIEW behaves like Pre DDI platforms. Bspec: 15975 Check pipe_bpp is less than bpp * 3 in hdmi_deep_color_possible, to not set 12 bit deep color for every modeset. This fixes the issue where 12 bit color was selected even when user selected 10 bit.(Ville) v9: Maintain a consistent behavior for all platforms and support GCP_COLOR_INDICATION only when we are in deep color mode. Remove hdmi_sink_is_deep_color() - no longer needed as checking pipe_bpp > 24 takes care of the deep color mode scenario. Separate patch for fixing switch from 12 bit to 10 bit deep color mode. Co-developed-by: Aditya Swarup <aditya.swarup@intel.com> Signed-off-by: Clinton Taylor <Clinton.A.Taylor@intel.com> Signed-off-by: Aditya Swarup <aditya.swarup@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Manasi Navare <manasi.d.navare@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190429230811.9983-1-aditya.swarup@intel.com
-
Chris Wilson authored
Due to the asynchronous tasklet and recursive GT wakeref, it may happen that we submit to the engine (underneath it's own wakeref) prior to the central wakeref being marked as taken. Switch to checking the local wakeref for greater consistency. Fixes: 79ffac85 ("drm/i915: Invert the GEM wakeref hierarchy") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190503115225.30831-3-chris@chris-wilson.co.uk
-
Chris Wilson authored
The counter goes to zero at the start of the parking cycle, but the wakeref itself is held until the end. Likewise, the counter becomes one at the end of the unparking, but the wakeref is taken first. If we check the wakeref instead of the counter, we include the unpark/unparking time as intel_wakeref_is_active(), and do not spuriously declare inactive if we fail to park (i.e. the parking and wakeref drop is postponed). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190503115225.30831-2-chris@chris-wilson.co.uk
-
Chris Wilson authored
Inside the signal handler, we expect the requests to be ordered by their breadcrumb such that no later request may be complete if we find an earlier incomplete. Add an assert to check that the next breadcrumb should not be logically before the current. v2: Move the overhanging line into its own function and reuse it after doing the insertion. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190503152214.26517-1-chris@chris-wilson.co.uk
-
Chris Wilson authored
Acquiring the signaler's timeline takes an active reference to their HWSP that we would like to avoid if possible, so take it after performing all of our allocations required to set up the fencing. The acquisition also provides the final check that the target has not already signaled allowing us to avoid the semaphore at the last moment. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190503140239.32668-1-chris@chris-wilson.co.uk
-
- 06 May, 2019 5 commits
-
-
Ville Syrjälä authored
hsw_enable_pc8()/hsw_disable_pc8() are more less equivalent to the display core init/unit functions of later platforms. Relocate the hsw/bdw code into intel_runtime_pm.c so that it sits next to its cousins. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190503193143.28240-2-ville.syrjala@linux.intel.comReviewed-by: Imre Deak <imre.deak@intel.com>
-
Ville Syrjälä authored
intel_ddi_pll_init() is an anachronism. Rename it to hsw_assert_cdclk() and move it to the power domain init code. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190503193143.28240-1-ville.syrjala@linux.intel.comReviewed-by: Imre Deak <imre.deak@intel.com>
-
Ville Syrjälä authored
Move the w/a to disable IPC on SKL closer to the actual code that implements IPS. Otherwise I just end up confused as to what is excluding SKL from considerations. IMO this makes more sense anyway since the hw does have the feature, we're just not supposed to use it. And this also makes us actually disable IPC in case eg. the BIOS enabled it when it shouldn't have. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190503173807.10834-3-ville.syrjala@linux.intel.com
-
Ville Syrjälä authored
Drop WaIncreaseLatencyIPCEnabled/Display w/a #1140 for early cnl steppings. v2: Drop the IS_GEN9_BC() change since other related parts of the code also use the KBL||CFL pattern Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190503173807.10834-2-ville.syrjala@linux.intel.com
-
Ville Syrjälä authored
Display w/a #1141 is also known as WaIncreaseLatencyIPCEnabled. Add that to the comment. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190503173807.10834-1-ville.syrjala@linux.intel.com
-
- 04 May, 2019 1 commit
-
-
Chris Wilson authored
Asking the GPU to busywait on a memory address, perhaps not unexpectedly in hindsight for a shared system, leads to bus contention that affects CPU programs trying to concurrently access memory. This can manifest as a drop in transcode throughput on highly over-saturated workloads. The only clue offered by perf, is that the bus-cycles (perf stat -e bus-cycles) jumped by 50% when enabling semaphores. This corresponds with extra CPU active cycles being attributed to intel_idle's mwait. This patch introduces a heuristic to try and detect when more than one client is submitting to the GPU pushing it into an oversaturated state. As we already keep track of when the semaphores are signaled, we can inspect their state on submitting the busywait batch and if we planned to use a semaphore but were too late, conclude that the GPU is overloaded and not try to use semaphores in future requests. In practice, this means we optimistically try to use semaphores for the first frame of a transcode job split over multiple engines, and fail if there are multiple clients active and continue not to use semaphores for the subsequent frames in the sequence. Periodically, we try to optimistically switch semaphores back on whenever the client waits to catch up with the transcode results. With 1 client, on Broxton J3455, with the relative fps normalized by %cpu: x no semaphores + drm-tip * patched +------------------------------------------------------------------------+ | * | | *+ | | **+ | | **+ x | | x * +**+ x | | x x * * +***x xx | | x x * * *+***x *x | | x x* + * * *****x *x x | | + x xx+x* + *** * ********* x * | | + x xx+x* * *** +** ********* xx * | | * + ++++* + x*x****+*+* ***+*************+x* * | |*+ +** *+ + +* + *++****** *xxx**********x***+*****************+*++ *| | |__________A_____M_____| | | |_______________A____M_________| | | |____________A___M________| | +------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 120 2.60475 3.50941 3.31123 3.2143953 0.21117399 + 120 2.3826 3.57077 3.25101 3.1414161 0.28146407 Difference at 95.0% confidence -0.0729792 +/- 0.0629585 -2.27039% +/- 1.95864% (Student's t, pooled s = 0.248814) * 120 2.35536 3.66713 3.2849 3.2059917 0.24618565 No difference proven at 95.0% confidence With 10 clients over-saturating the pipeline: x no semaphores + drm-tip * patched +------------------------------------------------------------------------+ | ++ ** | | ++ ** | | ++ ** | | ++ ** | | ++ xx *** | | ++ xx *** | | ++ xxx*** | | ++ xxx*** | | +++ xxx*** | | +++ xx**** | | +++ xx**** | | +++ xx**** | | +++ xx**** | | ++++ xx**** | | +++++ xx**** | | +++++ x x****** | | ++++++ xxx******* | | ++++++ xxx******* | | ++++++ xxx******* | | ++++++ xx******** | | ++++++ xxxx******** | | ++++++ xxxx******** | | ++++++++ xxxxx********* | |+ + + + ++++++++ xxx*xx**********x* *| | |__A__| | | |__AM__| | | |__A_| | +------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 120 2.47855 2.8972 2.72376 2.7193402 0.074604933 + 120 1.17367 1.77459 1.71977 1.6966782 0.085850697 Difference at 95.0% confidence -1.02266 +/- 0.0203502 -37.607% +/- 0.748352% (Student's t, pooled s = 0.0804246) * 120 2.57868 3.00821 2.80142 2.7923878 0.058646477 Difference at 95.0% confidence 0.0730476 +/- 0.0169791 2.68622% +/- 0.624383% (Student's t, pooled s = 0.0671018) Indicating that we've recovered the regression from enabling semaphores on this saturated setup, with a hint towards an overall improvement. Very similar, but of smaller magnitude, results are observed on both Skylake(gt2) and Kabylake(gt4). This may be due to the reduced impact of bus-cycles, where we see a 50% hit on Broxton, it is only 10% on the big core, in this particular test. One observation to make here is that for a greedy client trying to maximise its own throughput, using semaphores is the right choice. It is only the holistic system-wide view that semaphores of one client impacts another and reduces the overall throughput where we would choose to disable semaphores. The most noticeable negactive impact this has is on the no-op microbenchmarks, which are also very notable for having no cpu bus load. In particular, this increases the runtime and energy consumption of gem_exec_whisper. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Dmitry Ermilov <dmitry.ermilov@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190504070707.30902-1-chris@chris-wilson.co.uk
-
- 03 May, 2019 6 commits
-
-
Ville Syrjälä authored
We have a lot of '(u64)foo * bar' everywhere. Replace with mul_u32_u32() to avoid gcc failing to use a regular 32x32->64 multiply for this. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190408152702.4153-1-ville.syrjala@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-
Ville Syrjälä authored
Turns out the cursor is compatible with the pipe "HDR mode". It's only the actual SDR planes that get entirely bypassed during blending. So let's ignore the cursor when checking if we have any planes active that aren't HDR compatible. This fixes the regressions in the kms_cursor_crc and kms_plane_cursor tests. Cc: Uma Shankar <uma.shankar@intel.com> Cc: Shashank Sharma <shashank.sharma@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110579 Fixes: 09b25812 ("drm/i915: Enable pipe HDR mode on ICL if only HDR planes are used") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190502200607.14504-2-ville.syrjala@linux.intel.comReviewed-by: Uma Shankar <uma.shankar@intel.com>
-
Ville Syrjälä authored
I fumbled the PIPEMISC write into the wrong place. It only gets called for fastsets, but since value needs to be updated based on the set of active planes it needs to be done for all plane updates. Move it to the correct spot. The symptoms include SDR planes never showing up if a previous modeset/fastset left the pipe in HDR mode. This was immediately obvious when running the kms_plane pixel format tests. Unfortunately the test didn't realize it was scanning out pure black all the time and declared success anyway. Cc: Uma Shankar <uma.shankar@intel.com> Cc: Shashank Sharma <shashank.sharma@intel.com> Fixes: 09b25812 ("drm/i915: Enable pipe HDR mode on ICL if only HDR planes are used") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190502200607.14504-1-ville.syrjala@linux.intel.comReviewed-by: Uma Shankar <uma.shankar@intel.com>
-
Chris Wilson authored
Currently we submit the semaphore busywait as soon as the signaler is submitted to HW. However, we may submit the signaler as the tail of a batch of requests, and even not as the first context in the HW list, i.e. the busywait may start spinning far in advance of the signaler even starting. If we wait until the request before the signaler is completed before submitting the busywait, we prevent the busywait from starting too early, if the signaler is not first in submission port. To handle the case where the signaler is at the start of the second (or later) submission port, we will need to delay the execution callback until we know the context is promoted to port0. A challenge for later. Fixes: e8861964 ("drm/i915: Use HW semaphores for inter-engine synchroni sation on gen8+") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190501114541.10077-9-chris@chris-wilson.co.uk
-
Chris Wilson authored
Given sufficient preemption, we may see a busy system that doesn't advance seqno while performing work across multiple contexts, and given sufficient pathology not even notice a change in ACTHD. What does change between the preempting contexts is their RING, so take note of that and treat a change in the ring address as being an indication of forward progress. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190501114541.10077-1-chris@chris-wilson.co.uk
-
Chris Wilson authored
Drop the check in GEM parking that the engines were already parked. The intention here was that before we dropped the GT wakeref, we were sure that no more interrupts could be raised -- however, we have already dropped the wakeref by this point and the warning is no longer valid. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190502150024.16636-2-chris@chris-wilson.co.uk
-