drm/i915: Track frontbuffer invalidation/flushing

So these are the guts of the new beast. This tracks when a frontbuffer gets invalidated (due to frontbuffer rendering) and hence should be constantly scaned out, and when it's flushed again and can be compressed/one-shot-upload. Rules for flushing are simple: The frontbuffer needs one more full upload starting from the next vblank. Which means that the flushing can _only_ be called once the frontbuffer update has been latched. But this poses a problem for pageflips: We can't just delay the flushing until the pageflip is latched, since that would pose the risk that we override frontbuffer rendering that has been scheduled in-between the pageflip ioctl and the actual latching. To handle this track asynchronous invalidations (and also pageflip) state per-ring and delay any in-between flushing until the rendering has completed. And also cancel any delayed flushing if we get a new invalidation request (whether delayed or not). Also call intel_mark_fb_busy in both cases in all cases to make sure that we keep the screen at the highest refresh rate both on flips, synchronous plane updates and for frontbuffer rendering. v2: Lots of improvements Suggestions from Chris: - Move invalidate/flush in flush_*_domain and set_to_*_domain. - Drop the flush in busy_ioctl since it's redundant. Was a leftover from an earlier concept to track flips/delayed flushes. - Don't forget about the initial modeset enable/final disable. Suggested by Chris. Track flips accurately, too. Since flips complete independently of rendering we need to track pending flips in a separate mask. Again if an invalidate happens we need to cancel the evenutal flush to avoid races. v3: Provide correct header declarations for flip functions. Currently not needed outside of intel_display.c, but part of the proper interface. v4: Add proper domain management to fbcon so that the fbcon buffer is also tracked correctly. v5: Fixup locking around the fbcon set_to_gtt_domain call. v6: More comments from Chris: - Split out fbcon changes. - Drop superflous checks for potential scanout before calling intel_fb functions - we can micro-optimize this later. - s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem object. We already have precedence for fb_obj in the pin_and_fence functions. v7: Clarify the semantics of the flip flush handling by renaming things a bit: - Don't go through a gem object but take the relevant frontbuffer bits directly. These functions center on the plane, the actual object is irrelevant - even a flip to the same object as already active should cause a flush. - Add a new intel_frontbuffer_flip for synchronous plane updates. It currently just calls intel_frontbuffer_flush since the implemenation differs. This way we achieve a clear split between one-shot update events on one side and frontbuffer rendering with potentially a very long delay between the invalidate and flush. Chris and I also had some discussions about mark_busy and whether it is appropriate to call from flush. But mark busy is a state which should be derived from the 3 events (invalidate, flush, flip) we now have by the users, like psr does by tracking relevant information in psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for frontbuffer) needs to have similar logic. With that the overall mark_busy in the core could be removed. v8: Only when retiring gpu buffers only flush frontbuffer bits we actually invalidated in a batch. Just for safety since before any additional usage/invalidate we should always retire current rendering. Suggested by Chris Wilson. v9: Actually use intel_frontbuffer_flip in all appropriate places. Spotted by Chris. v10: Address more comments from Chris: - Don't call _flip in set_base when the crtc is inactive, avoids redunancy in the modeset case with the initial enabling of all planes. - Add comments explaining that the initial/final plane enable/disable still has work left to do before it's fully generic. v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris. v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Track frontbuffer invalidation/flushing
So these are the guts of the new beast. This tracks when a frontbuffer gets invalidated (due to frontbuffer rendering) and hence should be constantly scaned out, and when it's flushed again and can be compressed/one-shot-upload. Rules for flushing are simple: The frontbuffer needs one more full upload starting from the next vblank. Which means that the flushing can _only_ be called once the frontbuffer update has been latched. But this poses a problem for pageflips: We can't just delay the flushing until the pageflip is latched, since that would pose the risk that we override frontbuffer rendering that has been scheduled in-between the pageflip ioctl and the actual latching. To handle this track asynchronous invalidations (and also pageflip) state per-ring and delay any in-between flushing until the rendering has completed. And also cancel any delayed flushing if we get a new invalidation request (whether delayed or not). Also call intel_mark_fb_busy in both cases in all cases to make sure that we keep the screen at the highest refresh rate both on flips, synchronous plane updates and for frontbuffer rendering. v2: Lots of improvements Suggestions from Chris: - Move invalidate/flush in flush_*_domain and set_to_*_domain. - Drop the flush in busy_ioctl since it's redundant. Was a leftover from an earlier concept to track flips/delayed flushes. - Don't forget about the initial modeset enable/final disable. Suggested by Chris. Track flips accurately, too. Since flips complete independently of rendering we need to track pending flips in a separate mask. Again if an invalidate happens we need to cancel the evenutal flush to avoid races. v3: Provide correct header declarations for flip functions. Currently not needed outside of intel_display.c, but part of the proper interface. v4: Add proper domain management to fbcon so that the fbcon buffer is also tracked correctly. v5: Fixup locking around the fbcon set_to_gtt_domain call. v6: More comments from Chris: - Split out fbcon changes. - Drop superflous checks for potential scanout before calling intel_fb functions - we can micro-optimize this later. - s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem object. We already have precedence for fb_obj in the pin_and_fence functions. v7: Clarify the semantics of the flip flush handling by renaming things a bit: - Don't go through a gem object but take the relevant frontbuffer bits directly. These functions center on the plane, the actual object is irrelevant - even a flip to the same object as already active should cause a flush. - Add a new intel_frontbuffer_flip for synchronous plane updates. It currently just calls intel_frontbuffer_flush since the implemenation differs. This way we achieve a clear split between one-shot update events on one side and frontbuffer rendering with potentially a very long delay between the invalidate and flush. Chris and I also had some discussions about mark_busy and whether it is appropriate to call from flush. But mark busy is a state which should be derived from the 3 events (invalidate, flush, flip) we now have by the users, like psr does by tracking relevant information in psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for frontbuffer) needs to have similar logic. With that the overall mark_busy in the core could be removed. v8: Only when retiring gpu buffers only flush frontbuffer bits we actually invalidated in a batch. Just for safety since before any additional usage/invalidate we should always retire current rendering. Suggested by Chris Wilson. v9: Actually use intel_frontbuffer_flip in all appropriate places. Spotted by Chris. v10: Address more comments from Chris: - Don't call _flip in set_base when the crtc is inactive, avoids redunancy in the modeset case with the initial enabling of all planes. - Add comments explaining that the initial/final plane enable/disable still has work left to do before it's fully generic. v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris. v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
f99d7069 · Daniel Vetter · cc36513c · f99d7069 · f99d7069 · f99d7069
Commit f99d7069 authored Jun 19, 2014 by Daniel Vetter
7 changed files
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1330,6 +1330,17 @@ struct intel_pipe_crc {
 	wait_queue_head_t wq;
 };

+struct i915_frontbuffer_tracking {
+	struct mutex lock;
+
+	/*
+	 * Tracking bits for delayed frontbuffer flushing du to gpu activity or
+	 * scheduled flips.
+	 */
+	unsigned busy_bits;
+	unsigned flip_bits;
+};
+
 struct drm_i915_private {
 	struct drm_device *dev;
 	struct kmem_cache *slab;
@@ -1476,6 +1487,9 @@ struct drm_i915_private {
 	bool lvds_downclock_avail;
 	/* indicates the reduced downclock for LVDS*/
 	int lvds_downclock;
+
+	struct i915_frontbuffer_tracking fb_tracking;
+
 	u16 orig_clock;

 	bool mchbar_need_disable;

--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1395,8 +1395,6 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 		goto unlock;
 	}

-	intel_edp_psr_exit(dev);
-
 	/* Try to flush the object off the GPU without holding the lock.
 	 * We will repeat the flush holding the lock in the normal manner
 	 * to catch cases where we are gazumped.
@@ -1442,8 +1440,6 @@ i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		return ret;

-	intel_edp_psr_exit(dev);
-
 	obj = to_intel_bo(drm_gem_object_lookup(dev, file, args->handle));
 	if (&obj->base == NULL) {
 		ret = -ENOENT;
@@ -2223,6 +2219,8 @@ i915_gem_object_move_to_inactive(struct drm_i915_gem_object *obj)
 			list_move_tail(&vma->mm_list, &vm->inactive_list);
 	}

+	intel_fb_obj_flush(obj, true);
+
 	list_del_init(&obj->ring_list);
 	obj->ring = NULL;

@@ -3552,6 +3550,8 @@ i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj)
 	old_write_domain = obj->base.write_domain;
 	obj->base.write_domain = 0;

+	intel_fb_obj_flush(obj, false);
+
 	trace_i915_gem_object_change_domain(obj,
 					    obj->base.read_domains,
 					    old_write_domain);
@@ -3573,6 +3573,8 @@ i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj,
 	old_write_domain = obj->base.write_domain;
 	obj->base.write_domain = 0;

+	intel_fb_obj_flush(obj, false);
+
 	trace_i915_gem_object_change_domain(obj,
 					    obj->base.read_domains,
 					    old_write_domain);
@@ -3626,6 +3628,9 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 		obj->dirty = 1;
 	}

+	if (write)
+		intel_fb_obj_invalidate(obj, NULL);
+
 	trace_i915_gem_object_change_domain(obj,
 					    old_read_domains,
 					    old_write_domain);
@@ -3962,6 +3967,9 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 		obj->base.write_domain = I915_GEM_DOMAIN_CPU;
 	}

+	if (write)
+		intel_fb_obj_invalidate(obj, NULL);
+
 	trace_i915_gem_object_change_domain(obj,
 					    old_read_domains,
 					    old_write_domain);
@@ -4236,8 +4244,6 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		return ret;

-	intel_edp_psr_exit(dev);
-
 	obj = to_intel_bo(drm_gem_object_lookup(dev, file, args->handle));
 	if (&obj->base == NULL) {
 		ret = -ENOENT;
@@ -4937,6 +4943,8 @@ i915_gem_load(struct drm_device *dev)

 	dev_priv->mm.oom_notifier.notifier_call = i915_gem_shrinker_oom;
 	register_oom_notifier(&dev_priv->mm.oom_notifier);
+
+	mutex_init(&dev_priv->fb_tracking.lock);
 }

 void i915_gem_release(struct drm_device *dev, struct drm_file *file)

--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -975,10 +975,8 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 		if (obj->base.write_domain) {
 			obj->dirty = 1;
 			obj->last_write_seqno = intel_ring_get_seqno(ring);
-			/* check for potential scanout */
-			if (i915_gem_obj_ggtt_bound(obj) &&
-			    i915_gem_obj_to_ggtt(obj)->pin_count)
-				intel_mark_fb_busy(obj, ring);
+
+			intel_fb_obj_invalidate(obj, ring);

 			/* update for the implicit flush after a batch */
 			obj->base.write_domain &= ~I915_GEM_GPU_DOMAINS;

--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2756,6 +2756,9 @@ intel_pipe_set_base(struct drm_crtc *crtc, int x, int y,

 	dev_priv->display.update_primary_plane(crtc, fb, x, y);

+	if (intel_crtc->active)
+		intel_frontbuffer_flip(dev, INTEL_FRONTBUFFER_PRIMARY(pipe));
+
 	crtc->primary->fb = fb;
 	crtc->x = x;
 	crtc->y = y;
@@ -3950,6 +3953,13 @@ static void intel_crtc_enable_planes(struct drm_crtc *crtc)
 	mutex_lock(&dev->struct_mutex);
 	intel_update_fbc(dev);
 	mutex_unlock(&dev->struct_mutex);
+
+	/*
+	 * FIXME: Once we grow proper nuclear flip support out of this we need
+	 * to compute the mask of flip planes precisely. For the time being
+	 * consider this a flip from a NULL plane.
+	 */
+	intel_frontbuffer_flip(dev, INTEL_FRONTBUFFER_ALL_MASK(pipe));
 }

 static void intel_crtc_disable_planes(struct drm_crtc *crtc)
@@ -3972,6 +3982,13 @@ static void intel_crtc_disable_planes(struct drm_crtc *crtc)
 	intel_disable_planes(crtc);
 	intel_disable_primary_hw_plane(dev_priv, plane, pipe);

+	/*
+	 * FIXME: Once we grow proper nuclear flip support out of this we need
+	 * to compute the mask of flip planes precisely. For the time being
+	 * consider this a flip to a NULL plane.
+	 */
+	intel_frontbuffer_flip(dev, INTEL_FRONTBUFFER_ALL_MASK(pipe));
+
 	drm_vblank_off(dev, pipe);
 }

@@ -8212,6 +8229,8 @@ static int intel_crtc_cursor_set_obj(struct drm_crtc *crtc,
 		intel_crtc_update_cursor(crtc, intel_crtc->cursor_bo != NULL);
 	}

+	intel_frontbuffer_flip(dev, INTEL_FRONTBUFFER_CURSOR(pipe));
+
 	return 0;
 fail_unpin:
 	i915_gem_object_unpin_from_display_plane(obj);
@@ -8827,20 +8846,26 @@ void intel_mark_idle(struct drm_device *dev)
 }


-void intel_mark_fb_busy(struct drm_i915_gem_object *obj,
-			struct intel_engine_cs *ring)
+/**
+ * intel_mark_fb_busy - mark given planes as busy
+ * @dev: DRM device
+ * @frontbuffer_bits: bits for the affected planes
+ * @ring: optional ring for asynchronous commands
+ *
+ * This function gets called every time the screen contents change. It can be
+ * used to keep e.g. the update rate at the nominal refresh rate with DRRS.
+ */
+static void intel_mark_fb_busy(struct drm_device *dev,
+			       unsigned frontbuffer_bits,
+			       struct intel_engine_cs *ring)
 {
-	struct drm_device *dev = obj->base.dev;
 	enum pipe pipe;

-	intel_edp_psr_exit(dev);
-
 	if (!i915.powersave)
 		return;

 	for_each_pipe(pipe) {
-		if (!(obj->frontbuffer_bits &
-		      INTEL_FRONTBUFFER_ALL_MASK(pipe)))
+		if (!(frontbuffer_bits & INTEL_FRONTBUFFER_ALL_MASK(pipe)))
 			continue;

 		intel_increase_pllclock(dev, pipe);
@@ -8849,6 +8874,150 @@ void intel_mark_fb_busy(struct drm_i915_gem_object *obj,
 	}
 }

+/**
+ * intel_fb_obj_invalidate - invalidate frontbuffer object
+ * @obj: GEM object to invalidate
+ * @ring: set for asynchronous rendering
+ *
+ * This function gets called every time rendering on the given object starts and
+ * frontbuffer caching (fbc, low refresh rate for DRRS, panel self refresh) must
+ * be invalidated. If @ring is non-NULL any subsequent invalidation will be delayed
+ * until the rendering completes or a flip on this frontbuffer plane is
+ * scheduled.
+ */
+void intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
+			     struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = obj->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
+
+	if (!obj->frontbuffer_bits)
+		return;
+
+	if (ring) {
+		mutex_lock(&dev_priv->fb_tracking.lock);
+		dev_priv->fb_tracking.busy_bits
+			|= obj->frontbuffer_bits;
+		dev_priv->fb_tracking.flip_bits
+			&= ~obj->frontbuffer_bits;
+		mutex_unlock(&dev_priv->fb_tracking.lock);
+	}
+
+	intel_mark_fb_busy(dev, obj->frontbuffer_bits, ring);
+
+	intel_edp_psr_exit(dev);
+}
+
+/**
+ * intel_frontbuffer_flush - flush frontbuffer
+ * @dev: DRM device
+ * @frontbuffer_bits: frontbuffer plane tracking bits
+ *
+ * This function gets called every time rendering on the given planes has
+ * completed and frontbuffer caching can be started again. Flushes will get
+ * delayed if they're blocked by some oustanding asynchronous rendering.
+ *
+ * Can be called without any locks held.
+ */
+void intel_frontbuffer_flush(struct drm_device *dev,
+			     unsigned frontbuffer_bits)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	/* Delay flushing when rings are still busy.*/
+	mutex_lock(&dev_priv->fb_tracking.lock);
+	frontbuffer_bits &= ~dev_priv->fb_tracking.busy_bits;
+	mutex_unlock(&dev_priv->fb_tracking.lock);
+
+	intel_mark_fb_busy(dev, frontbuffer_bits, NULL);
+
+	intel_edp_psr_exit(dev);
+}
+
+/**
+ * intel_fb_obj_flush - flush frontbuffer object
+ * @obj: GEM object to flush
+ * @retire: set when retiring asynchronous rendering
+ *
+ * This function gets called every time rendering on the given object has
+ * completed and frontbuffer caching can be started again. If @retire is true
+ * then any delayed flushes will be unblocked.
+ */
+void intel_fb_obj_flush(struct drm_i915_gem_object *obj,
+			bool retire)
+{
+	struct drm_device *dev = obj->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	unsigned frontbuffer_bits;
+
+	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
+
+	if (!obj->frontbuffer_bits)
+		return;
+
+	frontbuffer_bits = obj->frontbuffer_bits;
+
+	if (retire) {
+		mutex_lock(&dev_priv->fb_tracking.lock);
+		/* Filter out new bits since rendering started. */
+		frontbuffer_bits &= dev_priv->fb_tracking.busy_bits;
+
+		dev_priv->fb_tracking.busy_bits &= ~frontbuffer_bits;
+		mutex_unlock(&dev_priv->fb_tracking.lock);
+	}
+
+	intel_frontbuffer_flush(dev, frontbuffer_bits);
+}
+
+/**
+ * intel_frontbuffer_flip_prepare - prepare asnychronous frontbuffer flip
+ * @dev: DRM device
+ * @frontbuffer_bits: frontbuffer plane tracking bits
+ *
+ * This function gets called after scheduling a flip on @obj. The actual
+ * frontbuffer flushing will be delayed until completion is signalled with
+ * intel_frontbuffer_flip_complete. If an invalidate happens in between this
+ * flush will be cancelled.
+ *
+ * Can be called without any locks held.
+ */
+void intel_frontbuffer_flip_prepare(struct drm_device *dev,
+				    unsigned frontbuffer_bits)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	mutex_lock(&dev_priv->fb_tracking.lock);
+	dev_priv->fb_tracking.flip_bits
+		|= frontbuffer_bits;
+	mutex_unlock(&dev_priv->fb_tracking.lock);
+}
+
+/**
+ * intel_frontbuffer_flip_complete - complete asynchronous frontbuffer flush
+ * @dev: DRM device
+ * @frontbuffer_bits: frontbuffer plane tracking bits
+ *
+ * This function gets called after the flip has been latched and will complete
+ * on the next vblank. It will execute the fush if it hasn't been cancalled yet.
+ *
+ * Can be called without any locks held.
+ */
+void intel_frontbuffer_flip_complete(struct drm_device *dev,
+				     unsigned frontbuffer_bits)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	mutex_lock(&dev_priv->fb_tracking.lock);
+	/* Mask any cancelled flips. */
+	frontbuffer_bits &= dev_priv->fb_tracking.flip_bits;
+	dev_priv->fb_tracking.flip_bits &= ~frontbuffer_bits;
+	mutex_unlock(&dev_priv->fb_tracking.lock);
+
+	intel_frontbuffer_flush(dev, frontbuffer_bits);
+}
+
 static void intel_crtc_destroy(struct drm_crtc *crtc)
 {
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
@@ -8876,6 +9045,7 @@ static void intel_unpin_work_fn(struct work_struct *__work)
 	struct intel_unpin_work *work =
 		container_of(__work, struct intel_unpin_work, work);
 	struct drm_device *dev = work->crtc->dev;
+	enum pipe pipe = to_intel_crtc(work->crtc)->pipe;

 	mutex_lock(&dev->struct_mutex);
 	intel_unpin_fb_obj(work->old_fb_obj);
@@ -8885,6 +9055,8 @@ static void intel_unpin_work_fn(struct work_struct *__work)
 	intel_update_fbc(dev);
 	mutex_unlock(&dev->struct_mutex);

+	intel_frontbuffer_flip_complete(dev, INTEL_FRONTBUFFER_PRIMARY(pipe));
+
 	BUG_ON(atomic_read(&to_intel_crtc(work->crtc)->unpin_work_count) == 0);
 	atomic_dec(&to_intel_crtc(work->crtc)->unpin_work_count);

@@ -9441,9 +9613,6 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	if (work == NULL)
 		return -ENOMEM;

-	/* Exit PSR early in page flip */
-	intel_edp_psr_exit(dev);
-
 	work->event = event;
 	work->crtc = crtc;
 	work->old_fb_obj = to_intel_framebuffer(old_fb)->obj;
@@ -9519,7 +9688,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 			  INTEL_FRONTBUFFER_PRIMARY(pipe));

 	intel_disable_fbc(dev);
-	intel_mark_fb_busy(obj, NULL);
+	intel_frontbuffer_flip_prepare(dev, INTEL_FRONTBUFFER_PRIMARY(pipe));
 	mutex_unlock(&dev->struct_mutex);

 	trace_i915_flip_request(intel_crtc->plane, obj);

--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -724,8 +724,33 @@ bool intel_has_pending_fb_unpin(struct drm_device *dev);
 int intel_pch_rawclk(struct drm_device *dev);
 int valleyview_cur_cdclk(struct drm_i915_private *dev_priv);
 void intel_mark_busy(struct drm_device *dev);
-void intel_mark_fb_busy(struct drm_i915_gem_object *obj,
-			struct intel_engine_cs *ring);
+void intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
+			     struct intel_engine_cs *ring);
+void intel_frontbuffer_flip_prepare(struct drm_device *dev,
+				    unsigned frontbuffer_bits);
+void intel_frontbuffer_flip_complete(struct drm_device *dev,
+				     unsigned frontbuffer_bits);
+void intel_frontbuffer_flush(struct drm_device *dev,
+			     unsigned frontbuffer_bits);
+/**
+ * intel_frontbuffer_flip - prepare frontbuffer flip
+ * @dev: DRM device
+ * @frontbuffer_bits: frontbuffer plane tracking bits
+ *
+ * This function gets called after scheduling a flip on @obj. This is for
+ * synchronous plane updates which will happen on the next vblank and which will
+ * not get delayed by pending gpu rendering.
+ *
+ * Can be called without any locks held.
+ */
+static inline
+void intel_frontbuffer_flip(struct drm_device *dev,
+			    unsigned frontbuffer_bits)
+{
+	intel_frontbuffer_flush(dev, frontbuffer_bits);
+}
+
+void intel_fb_obj_flush(struct drm_i915_gem_object *obj, bool retire);
 void intel_mark_idle(struct drm_device *dev);
 void intel_crtc_restore_mode(struct drm_crtc *crtc);
 void intel_crtc_update_dpms(struct drm_crtc *crtc);

--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -787,6 +787,9 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 	overlay->old_vid_bo = overlay->vid_bo;
 	overlay->vid_bo = new_bo;

+	intel_frontbuffer_flip(dev,
+			       INTEL_FRONTBUFFER_OVERLAY(pipe));
+
 	return 0;

 out_unpin:

--- a/drivers/gpu/drm/i915/intel_sprite.c
+++ b/drivers/gpu/drm/i915/intel_sprite.c
@@ -1034,6 +1034,8 @@ intel_update_plane(struct drm_plane *plane, struct drm_crtc *crtc,
 		else
 			intel_plane->disable_plane(plane, crtc);

+		intel_frontbuffer_flip(dev, INTEL_FRONTBUFFER_SPRITE(pipe));
+
 		if (!primary_was_enabled && primary_enabled)
 			intel_post_enable_primary(crtc);
 	}
@@ -1054,8 +1056,6 @@ intel_update_plane(struct drm_plane *plane, struct drm_crtc *crtc,
 		mutex_unlock(&dev->struct_mutex);
 	}

-	intel_edp_psr_exit(dev);
-
 	return 0;
 }