sched/deadline: Base GRUB reclaiming on the inactive utilization

Instead of decreasing the runtime as "dq = -Uact dt" (eventually divided by the maximum utilization available for deadline tasks), decrease it as "dq = -max{u, (1 - Uinact)} dt", where u is the task utilization and Uinact is the "inactive utilization". In this way, the maximum fraction of CPU time that can be reclaimed is given by the total utilization of deadline tasks. This approach solves a fairness issue with "traditional" global GRUB reclaiming: using the traditional GRUB algorithm, if tasks are allocated to the various cores in a non-uniform way, the reclaiming mechanism allows some tasks to reclaim more time than others. This issue is visible starting 11 time-consuming tasks with runtime 10ms and period 30ms (total utilization 3.666) on a 4-cores system: some tasks will receive much more than the reserved runtime (thanks to the reclaiming mechanism), while other tasks will receive less than the reserved runtime. Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Luca Abeni <luca.abeni@santannapisa.it> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Claudio Scordino <claudio@evidence.eu.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it> Link: http://lkml.kernel.org/r/1495138417-6203-9-git-send-email-luca.abeni@santannapisa.itSigned-off-by: Ingo Molnar <mingo@kernel.org>

sched/deadline: Base GRUB reclaiming on the inactive utilization
Instead of decreasing the runtime as "dq = -Uact dt" (eventually divided by the maximum utilization available for deadline tasks), decrease it as "dq = -max{u, (1 - Uinact)} dt", where u is the task utilization and Uinact is the "inactive utilization". In this way, the maximum fraction of CPU time that can be reclaimed is given by the total utilization of deadline tasks. This approach solves a fairness issue with "traditional" global GRUB reclaiming: using the traditional GRUB algorithm, if tasks are allocated to the various cores in a non-uniform way, the reclaiming mechanism allows some tasks to reclaim more time than others. This issue is visible starting 11 time-consuming tasks with runtime 10ms and period 30ms (total utilization 3.666) on a 4-cores system: some tasks will receive much more than the reserved runtime (thanks to the reclaiming mechanism), while other tasks will receive less than the reserved runtime. Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Luca Abeni <luca.abeni@santannapisa.it> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Claudio Scordino <claudio@evidence.eu.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it> Link: http://lkml.kernel.org/r/1495138417-6203-9-git-send-email-luca.abeni@santannapisa.itSigned-off-by: Ingo Molnar <mingo@kernel.org>
9f0d1a50 · Luca Abeni · Ingo Molnar · 8fd27231 · 9f0d1a50
Commit 9f0d1a50 authored May 18, 2017 by Luca Abeni Committed by Ingo Molnar Jun 08, 2017
Hide whitespace changes
Inline Side-by-side

Showing with 22 additions and 18 deletions

kernel/sched/deadline.c kernel/sched/deadline.c +22 -18

No files found.
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -955,26 +955,30 @@ extern bool sched_rt_bandwidth_account(struct rt_rq *rt_rq);
 /*
 * This function implements the GRUB accounting rule:
 * according to the GRUB reclaiming algorithm, the runtime is
- * not decreased as "dq = -dt", but as "dq = -Uact dt", where
- * Uact is the (per-runqueue) active utilization.
- * Since rq->dl.running_bw contains Uact * 2^BW_SHIFT, the result
- * has to be shifted right by BW_SHIFT.
- * To reclaim only a fraction Umax of the CPU time, the
- * runtime accounting rule is modified as
- * "dq = -Uact / Umax dt"; since rq->dl.bw_ratio contains
- * 2^RATIO_SHIFT / Umax, delta is multiplied by bw_ratio and shifted
- * right by RATIO_SHIFT.
- * Since delta is a 64 bit variable, to have an overflow its value
- * should be larger than 2^(64 - 20 - 8), which is more than 64 seconds.
- * So, overflow is not an issue here.
+ * not decreased as "dq = -dt", but as "dq = -max{u, (1 - Uinact)} dt",
+ * where u is the utilization of the task and Uinact is the
+ * (per-runqueue) inactive utilization, computed as the difference
+ * between the "total runqueue utilization" and the runqueue
+ * active utilization.
+ * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations
+ * multiplied by 2^BW_SHIFT, the result has to be shifted right by BW_SHIFT.
 */
-u64 grub_reclaim(u64 delta, struct rq *rq)
+u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
 {
-	delta *= rq->dl.running_bw;
-	delta *= rq->dl.bw_ratio;
-	delta >>= BW_SHIFT + RATIO_SHIFT;
+	u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
+	u64 u_act;

-	return delta;
+	/*
+	 * Instead of computing max{u, (1 - u_inact)}, we compare
+	 * u_inact with 1 - u, because u_inact can be larger than 1
+	 * (so, 1 - u_inact would be negative leading to wrong results)
+	 */
+	if (u_inact > BW_UNIT - dl_se->dl_bw)
+		u_act = dl_se->dl_bw;
+	else
+		u_act = BW_UNIT - u_inact;
+
+	return (delta * u_act) >> BW_SHIFT;
 }

 /*
@@ -1020,7 +1024,7 @@ static void update_curr_dl(struct rq *rq)
 	sched_rt_avg_update(rq, delta_exec);

 	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM))
-		delta_exec = grub_reclaim(delta_exec, rq);
+		delta_exec = grub_reclaim(delta_exec, rq, &curr->dl);
 	dl_se->runtime -= delta_exec;

 throttle: