Commit 77577558 authored by Rafael J. Wysocki

cpuidle: teo: Rework most recent idle duration values treatment

The TEO (Timer Events Oriented) cpuidle governor uses several most
recent idle duration values for a given CPU to refine the idle state
selection in case the previous long-term trends have not been
followed recently and a new trend appears to be forming.  That is
done by computing the average of the most recent idle duration
values falling below the time till the next timer event ("sleep
length"), provided that they are the majority of the most recent
idle duration values taken into account, and using it as the new
expected idle duration value.
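
As a rough illustration of the averaging scheme described above, here is a simplified user-space sketch modeled on the code this commit removes. The helper name recent_average() and the plain uint64_t types are assumptions made for the example; the kernel code additionally applies checks such as teo_time_ok() that are left out here.

```c
#include <stdbool.h>
#include <stdint.h>

#define INTERVALS 8	/* number of saved idle duration values (old scheme) */

/*
 * Hypothetical helper: count and sum the saved idle durations that are
 * below duration_ns.  If they form the majority of the saved values,
 * store their average in *avg_ns and return true; otherwise return false
 * and leave *avg_ns untouched.
 */
static bool recent_average(const uint64_t intervals[INTERVALS],
			   uint64_t duration_ns, uint64_t *avg_ns)
{
	unsigned int count = 0;
	uint64_t sum = 0;

	for (int i = 0; i < INTERVALS; i++) {
		if (intervals[i] >= duration_ns)
			continue;

		count++;
		sum += intervals[i];
	}

	/* Give up unless the majority of the values are in range. */
	if (count <= INTERVALS / 2)
		return false;

	*avg_ns = sum / count;
	return true;
}
```

The single average is all that survives of the recent history in this scheme, which is why it says little about how those values were spread over the shallower idle states.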

However, idle state selection based on that value may not be optimal,
because the average does not really indicate which of the idle states
with target residencies less than or equal to it is likely to be the
best fit.

Thus, instead of computing the average, make the governor carry out
computations based on the distribution of the most recent idle
duration values among the bins corresponding to different idle
states.  Namely, if the majority of the most recent idle duration
values taken into consideration are less than the current sleep
length (which means that the CPU is likely to wake up early), find
the idle state closest to the "candidate" one "matching" the sleep
length whose target residency is less than or equal to the majority
of the most recent idle duration values that have fallen below the
current sleep length (which means that it is likely to be "shallow
enough" this time).
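
The bin-based approach described above can be sketched in user space roughly as follows. This is a simplified illustration, not the governor code: the names recent_tracker, tracker_init(), tracker_update() and tracker_select() and the NR_STATES value are assumptions, and the sketch ignores the long-term "hits" and "intercepts" metrics, disabled states and latency constraints that the real selection also takes into account.

```c
#define NR_RECENT 9	/* number of recent wakeups tracked, as in the patch */
#define NR_STATES 4	/* hypothetical number of idle states (bins) */

struct recent_tracker {
	unsigned int recent[NR_STATES];	/* recent intercepts per bin */
	int recent_idx[NR_RECENT];	/* bin of each recent intercept, or -1 */
	int next_recent_idx;		/* next recent_idx[] slot to reuse */
};

static void tracker_init(struct recent_tracker *t)
{
	for (int i = 0; i < NR_STATES; i++)
		t->recent[i] = 0;
	for (int i = 0; i < NR_RECENT; i++)
		t->recent_idx[i] = -1;
	t->next_recent_idx = 0;
}

/* Record one wakeup: a "hit" if both bins match, an intercept otherwise. */
static void tracker_update(struct recent_tracker *t, int idx_timer,
			   int idx_duration)
{
	int i = t->next_recent_idx++;

	if (t->next_recent_idx >= NR_RECENT)
		t->next_recent_idx = 0;

	/* Forget the oldest event stored in the slot being reused. */
	if (t->recent_idx[i] >= 0)
		t->recent[t->recent_idx[i]]--;

	if (idx_timer == idx_duration) {
		t->recent_idx[i] = -1;
	} else {
		t->recent[idx_duration]++;
		t->recent_idx[i] = idx_duration;
	}
}

/* Pick a state given the candidate matching the sleep length. */
static int tracker_select(const struct recent_tracker *t, int candidate)
{
	unsigned int idx_recent_sum = 0, recent_sum = 0;

	for (int i = 0; i < candidate; i++)
		idx_recent_sum += t->recent[i];

	/* No majority of recent intercepts below the candidate: keep it. */
	if (idx_recent_sum <= NR_RECENT / 2)
		return candidate;

	/* Deepest shallower state covering over a half of those intercepts. */
	for (int i = candidate - 1; i >= 0; i--) {
		recent_sum += t->recent[i];
		if (2 * recent_sum > idx_recent_sum)
			return i;
	}

	return candidate;
}
```

For example, after five consecutive wakeups whose measured idle duration fell into bin 1 while the sleep length fell into bin 3, tracker_select(&t, 3) in this sketch returns 1 instead of the candidate 3.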

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
parent c410a9a1
@@ -47,15 +47,20 @@
  * length). In turn, the "intercepts" metric reflects the relative frequency of
  * situations in which the measured idle duration is so much shorter than the
  * sleep length that the bin it falls into corresponds to an idle state
- * shallower than the one whose bin is fallen into by the sleep length.
+ * shallower than the one whose bin is fallen into by the sleep length (these
+ * situations are referred to as "intercepts" below).
+ *
+ * In addition to the metrics described above, the governor counts recent
+ * intercepts (that is, intercepts that have occurred during the last NR_RECENT
+ * invocations of it for the given CPU) for each bin.
  *
  * In order to select an idle state for a CPU, the governor takes the following
  * steps (modulo the possible latency constraint that must be taken into account
  * too):
  *
  * 1. Find the deepest CPU idle state whose target residency does not exceed
- *    the current sleep length (the candidate idle state) and compute two sums
- *    as follows:
+ *    the current sleep length (the candidate idle state) and compute 3 sums as
+ *    follows:
  *
  *    - The sum of the "hits" and "intercepts" metrics for the candidate state
  *      and all of the deeper idle states (it represents the cases in which the
@@ -67,25 +72,29 @@
  *      idle long enough to avoid being intercepted if the sleep length had been
  *      equal to the current one).
  *
- * 2. If the second sum is greater than the first one, look for an alternative
- *    idle state to select.
+ *    - The sum of the numbers of recent intercepts for all of the idle states
+ *      shallower than the candidate one.
+ *
+ * 2. If the second sum is greater than the first one or the third sum is
+ *    greater than NR_RECENT / 2, the CPU is likely to wake up early, so look
+ *    for an alternative idle state to select.
  *
  *    - Traverse the idle states shallower than the candidate one in the
  *      descending order.
  *
- *    - For each of them compute the sum of the "intercepts" metrics over all of
- *      the idle states between it and the candidate one (including the former
- *      and excluding the latter).
+ *    - For each of them compute the sum of the "intercepts" metrics and the sum
+ *      of the numbers of recent intercepts over all of the idle states between
+ *      it and the candidate one (including the former and excluding the
+ *      latter).
  *
- *    - If that sum is greater than a half of the second sum computed in step 1
- *      (which means that the target residency of the state in question had not
- *      exceeded the idle duration in over a half of the relevant cases), select
- *      the given idle state instead of the candidate one.
+ *    - If each of these sums that needs to be taken into account (because the
+ *      check related to it has indicated that the CPU is likely to wake up
+ *      early) is greater than a half of the corresponding sum computed in step
+ *      1 (which means that the target residency of the state in question had
+ *      not exceeded the idle duration in over a half of the relevant cases),
+ *      select the given idle state instead of the candidate one.
  *
- * 3. If the majority of the most recent idle duration values are below the
- *    current anticipated idle duration, use those values to compute the new
- *    expected idle duration and find an idle state matching it (which has to
- *    be shallower than the current candidate one).
+ * 3. By default, select the candidate state.
  */
 
 #include <linux/cpuidle.h>
@@ -103,18 +112,20 @@
 
 /*
  * Number of the most recent idle duration values to take into consideration for
- * the detection of wakeup patterns.
+ * the detection of recent early wakeup patterns.
  */
-#define INTERVALS	8
+#define NR_RECENT	9
 
 /**
  * struct teo_bin - Metrics used by the TEO cpuidle governor.
  * @intercepts: The "intercepts" metric.
  * @hits: The "hits" metric.
+ * @recent: The number of recent "intercepts".
  */
 struct teo_bin {
 	unsigned int intercepts;
 	unsigned int hits;
+	unsigned int recent;
 };
 
 /**
@@ -123,16 +134,16 @@ struct teo_bin {
  * @sleep_length_ns: Time till the closest timer event (at the selection time).
  * @state_bins: Idle state data bins for this CPU.
  * @total: Grand total of the "intercepts" and "hits" mertics for all bins.
- * @interval_idx: Index of the most recent saved idle interval.
- * @intervals: Saved idle duration values.
+ * @next_recent_idx: Index of the next @recent_idx entry to update.
+ * @recent_idx: Indices of bins corresponding to recent "intercepts".
  */
 struct teo_cpu {
 	s64 time_span_ns;
 	s64 sleep_length_ns;
 	struct teo_bin state_bins[CPUIDLE_STATE_MAX];
 	unsigned int total;
-	int interval_idx;
-	u64 intervals[INTERVALS];
+	int next_recent_idx;
+	int recent_idx[NR_RECENT];
 };
 
 static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
@@ -201,26 +212,29 @@ static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 		}
 	}
 
+	i = cpu_data->next_recent_idx++;
+	if (cpu_data->next_recent_idx >= NR_RECENT)
+		cpu_data->next_recent_idx = 0;
+
+	if (cpu_data->recent_idx[i] >= 0)
+		cpu_data->state_bins[cpu_data->recent_idx[i]].recent--;
+
 	/*
 	 * If the measured idle duration falls into the same bin as the sleep
 	 * length, this is a "hit", so update the "hits" metric for that bin.
 	 * Otherwise, update the "intercepts" metric for the bin fallen into by
 	 * the measured idle duration.
 	 */
-	if (idx_timer == idx_duration)
+	if (idx_timer == idx_duration) {
 		cpu_data->state_bins[idx_timer].hits += PULSE;
-	else
+		cpu_data->recent_idx[i] = -1;
+	} else {
 		cpu_data->state_bins[idx_duration].intercepts += PULSE;
+		cpu_data->state_bins[idx_duration].recent++;
+		cpu_data->recent_idx[i] = idx_duration;
+	}
 
 	cpu_data->total += PULSE;
-
-	/*
-	 * Save idle duration values corresponding to non-timer wakeups for
-	 * pattern detection.
-	 */
-	cpu_data->intervals[cpu_data->interval_idx++] = measured_ns;
-	if (cpu_data->interval_idx >= INTERVALS)
-		cpu_data->interval_idx = 0;
 }
 
 static bool teo_time_ok(u64 interval_ns)
@@ -271,10 +285,13 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	s64 latency_req = cpuidle_governor_latency_req(dev->cpu);
 	unsigned int idx_intercept_sum = 0;
 	unsigned int intercept_sum = 0;
+	unsigned int idx_recent_sum = 0;
+	unsigned int recent_sum = 0;
 	unsigned int idx_hit_sum = 0;
 	unsigned int hit_sum = 0;
 	int constraint_idx = 0;
 	int idx0 = 0, idx = -1;
+	bool alt_intercepts, alt_recent;
 	ktime_t delta_tick;
 	s64 duration_ns;
 	int i;
@@ -317,6 +334,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 		 */
 		intercept_sum += prev_bin->intercepts;
 		hit_sum += prev_bin->hits;
+		recent_sum += prev_bin->recent;
 
 		if (dev->states_usage[i].disable)
 			continue;
@@ -336,6 +354,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 
 		idx_intercept_sum = intercept_sum;
 		idx_hit_sum = hit_sum;
+		idx_recent_sum = recent_sum;
 	}
 
 	/* Avoid unnecessary overhead. */
@@ -350,27 +369,36 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	 * If the sum of the intercepts metric for all of the idle states
 	 * shallower than the current candidate one (idx) is greater than the
 	 * sum of the intercepts and hits metrics for the candidate state and
-	 * all of the deeper states, the CPU is likely to wake up early, so find
-	 * an alternative idle state to select.
+	 * all of the deeper states, or the sum of the numbers of recent
+	 * intercepts over all of the states shallower than the candidate one
+	 * is greater than a half of the number of recent events taken into
+	 * account, the CPU is likely to wake up early, so find an alternative
+	 * idle state to select.
 	 */
-	if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
+	alt_intercepts = 2 * idx_intercept_sum > cpu_data->total - idx_hit_sum;
+	alt_recent = idx_recent_sum > NR_RECENT / 2;
+	if (alt_recent || alt_intercepts) {
 		s64 last_enabled_span_ns = duration_ns;
 		int last_enabled_idx = idx;
 
 		/*
 		 * Look for the deepest idle state whose target residency had
 		 * not exceeded the idle duration in over a half of the relevant
-		 * cases in the past.
+		 * cases (both with respect to intercepts overall and with
+		 * respect to the recent intercepts only) in the past.
 		 *
 		 * Take the possible latency constraint and duration limitation
 		 * present if the tick has been stopped already into account.
 		 */
 		intercept_sum = 0;
+		recent_sum = 0;
 
 		for (i = idx - 1; i >= idx0; i--) {
+			struct teo_bin *bin = &cpu_data->state_bins[i];
 			s64 span_ns;
 
-			intercept_sum += cpu_data->state_bins[i].intercepts;
+			intercept_sum += bin->intercepts;
+			recent_sum += bin->recent;
 
 			if (dev->states_usage[i].disable)
 				continue;
@@ -386,7 +414,9 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 				break;
 			}
 
-			if (2 * intercept_sum > idx_intercept_sum) {
+			if ((!alt_recent || 2 * recent_sum > idx_recent_sum) &&
+			    (!alt_intercepts ||
+			     2 * intercept_sum > idx_intercept_sum)) {
 				idx = i;
 				duration_ns = span_ns;
 				break;
@@ -404,49 +434,6 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	if (idx > constraint_idx)
 		idx = constraint_idx;
 
-	if (idx > idx0) {
-		unsigned int count = 0;
-		u64 sum = 0;
-
-		/*
-		 * The target residencies of at least two different enabled idle
-		 * states are less than or equal to the current expected idle
-		 * duration. Try to refine the selection using the most recent
-		 * measured idle duration values.
-		 *
-		 * Count and sum the most recent idle duration values less than
-		 * the current expected idle duration value.
-		 */
-		for (i = 0; i < INTERVALS; i++) {
-			u64 val = cpu_data->intervals[i];
-
-			if (val >= duration_ns)
-				continue;
-
-			count++;
-			sum += val;
-		}
-
-		/*
-		 * Give up unless the majority of the most recent idle duration
-		 * values are in the interesting range.
-		 */
-		if (count > INTERVALS / 2) {
-			u64 avg_ns = div64_u64(sum, count);
-
-			/*
-			 * Avoid spending too much time in an idle state that
-			 * would be too shallow.
-			 */
-			if (teo_time_ok(avg_ns)) {
-				duration_ns = avg_ns;
-				if (drv->states[idx].target_residency_ns > avg_ns)
-					idx = teo_find_shallower_state(drv, dev,
-								       idx, avg_ns);
-			}
-		}
-	}
-
 end:
 	/*
 	 * Don't stop the tick if the selected state is a polling one or if the
@@ -507,8 +494,8 @@ static int teo_enable_device(struct cpuidle_driver *drv,
 
 	memset(cpu_data, 0, sizeof(*cpu_data));
 
-	for (i = 0; i < INTERVALS; i++)
-		cpu_data->intervals[i] = U64_MAX;
+	for (i = 0; i < NR_RECENT; i++)
+		cpu_data->recent_idx[i] = -1;
 
 	return 0;
 }