Commit e24f1c24, authored by Paolo Valente, committed by Jens Axboe

block, bfq: remove slow-system class

BFQ computes the duration of weight raising for interactive
applications automatically, using some reference parameters. In
particular, BFQ uses the best durations (see comments in the code for
how these durations have been assessed) for two classes of systems:
slow and fast ones. Examples of slow systems are old phones or systems
using micro HDDs. Fast systems are all the remaining ones. Using these
parameters, BFQ computes the actual duration of the weight raising,
for the system at hand, as a function of the relative speed of the
system w.r.t. the speed of a reference system, belonging to the same
class of systems as the system at hand.
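
The scaling described above can be sketched as follows. This is only an illustration, not the kernel code: the reference values are the ones that appear in the diff below, the helper name `wr_duration_ms` is hypothetical, durations are kept in milliseconds rather than jiffies, and any bounding that the real `bfq_wr_duration()` may apply after the division is ignored.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reference values copied from the diff: index 0 is a rotational
 * device, index 1 a non-rotational one. Rates are in sectors/usec,
 * left-shifted by BFQ_RATE_SHIFT.
 */
static const uint64_t ref_rate[2] = {14000, 33000};

/* Assumption: milliseconds here; the kernel stores jiffies. */
static const uint64_t ref_wr_duration_ms[2] = {7000, 2500};

/*
 * Hypothetical helper implementing
 *   duration = (ref_rate / peak_rate) * ref_wr_duration
 * The product is computed first and then divided, as the kernel does
 * with its cached product, to avoid losing precision in the integer
 * division.
 */
static uint64_t wr_duration_ms(int nonrot, uint64_t peak_rate)
{
	uint64_t rate_dur_prod;

	assert(nonrot == 0 || nonrot == 1);
	assert(peak_rate > 0);

	rate_dur_prod = ref_rate[nonrot] * ref_wr_duration_ms[nonrot];
	return rate_dur_prod / peak_rate;
}
```

With these values, a rotational device running at exactly the reference rate gets the reference duration, `wr_duration_ms(0, 14000)` being 7000 ms, while a device half as fast gets twice as long, `wr_duration_ms(0, 7000)` being 14000 ms.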

This slow vs fast differentiation proved useful in the past, but
has little meaning with current hardware. Even worse, it causes
problems in virtual systems, where the speed of the system can vary
so frequently and so widely as to confuse the class-detection
mechanism and, as we have verified experimentally, to make BFQ
compute nonsensical weight-raising durations.
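
One way to see how a fluctuating peak rate could upset the removed mechanism is the sketch below. It mirrors the class-switching branch that this commit deletes from `update_thr_responsiveness_params()`, using the old rotational reference values from the diff. The type and function names (`old_state`, `old_update_class`, `old_wr_duration_ms`) are hypothetical, durations are in milliseconds rather than jiffies, and any bounding applied by the real code is ignored, so the numbers are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

enum speed_class { CLASS_FAST, CLASS_SLOW };

/* Old reference values for a rotational device (index 0 in the diff). */
static const uint64_t R_slow = 1000, R_fast = 14000;
/* Assumption: milliseconds here; the kernel stores jiffies. */
static const uint64_t T_slow_ms = 3500, T_fast_ms = 7000;

struct old_state {
	enum speed_class cls;
	uint64_t rt_prod;	/* cached R * T for the current class */
};

/* Re-detect the class from the latest peak-rate estimate. */
static void old_update_class(struct old_state *s, uint64_t peak_rate)
{
	uint64_t thresh = (4 * R_slow) / 3;	/* 1333 with integer division */

	if (s->cls == CLASS_FAST && peak_rate < thresh) {
		s->cls = CLASS_SLOW;
		s->rt_prod = R_slow * T_slow_ms;
	} else if (s->cls == CLASS_SLOW && peak_rate > thresh) {
		s->cls = CLASS_FAST;
		s->rt_prod = R_fast * T_fast_ms;
	}
}

/* duration = (R / r) * T, computed from the cached product. */
static uint64_t old_wr_duration_ms(const struct old_state *s,
				   uint64_t peak_rate)
{
	assert(peak_rate > 0);
	return s->rt_prod / peak_rate;
}
```

Under these assumptions, an estimate of 1300 puts the device in the slow class and yields about 2.7 seconds, whereas an estimate of 1400, only slightly higher, flips it to the fast class and yields 70 seconds. A peak rate that oscillates around the threshold, as it may in a virtual machine, would therefore swing the computed duration by more than an order of magnitude.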

This commit addresses this issue by removing the slow class and the
class-detection mechanism.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
parent 4029eef1
...@@ -251,55 +251,43 @@ static struct kmem_cache *bfq_pool; ...@@ -251,55 +251,43 @@ static struct kmem_cache *bfq_pool;
* When configured for computing the duration of the weight-raising * When configured for computing the duration of the weight-raising
* for interactive queues automatically (see the comments at the * for interactive queues automatically (see the comments at the
* beginning of this file), BFQ does it using the following formula: * beginning of this file), BFQ does it using the following formula:
* duration = (R / r) * T, * duration = (ref_rate / r) * ref_wr_duration,
* where r is the peak rate of the device, and R * where r is the peak rate of the device, and ref_rate and
* and T are two reference parameters. In particular, * ref_wr_duration are two reference parameters. In particular,
* R is the peak rate of the reference device (see below), and * ref_rate is the peak rate of the reference storage device (see
* T is a reference time: given the systems that are likely * below), and ref_wr_duration is about the maximum time needed, with
* to be installed on the reference device according to its speed * BFQ and while reading two files in parallel, to load typical large
* class, T is about the maximum time needed, under BFQ and * applications on the reference device (see the comments on
* while reading two files in parallel, to load typical large * max_service_from_wr below, for more details on how ref_wr_duration
* applications on these systems (see the comments on * is obtained). In practice, the slower/faster the device at hand
* max_service_from_wr below, for more details on how T is * is, the more/less it takes to load applications with respect to the
* obtained). In practice, the slower/faster the device at hand is,
* the more/less it takes to load applications with respect to the
* reference device. Accordingly, the longer/shorter BFQ grants * reference device. Accordingly, the longer/shorter BFQ grants
* weight raising to interactive applications. * weight raising to interactive applications.
* *
* BFQ uses four different reference pairs (R, T), depending on: * BFQ uses two different reference pairs (ref_rate, ref_wr_duration),
* . whether the device is rotational or non-rotational; * depending on whether the device is rotational or non-rotational.
* . whether the device is slow, such as old or portable HDDs, as well as
* SD cards, or fast, such as newer HDDs and SSDs.
* *
* The device's speed class is dynamically (re)detected in * In the following definitions, ref_rate[0] and ref_wr_duration[0]
* bfq_update_peak_rate() every time the estimated peak rate is updated. * are the reference values for a rotational device, whereas
* ref_rate[1] and ref_wr_duration[1] are the reference values for a
* non-rotational device. The reference rates are not the actual peak
* rates of the devices used as a reference, but slightly lower
* values. The reason for using slightly lower values is that the
* peak-rate estimator tends to yield slightly lower values than the
* actual peak rate (it can yield the actual peak rate only if there
* is only one process doing I/O, and the process does sequential
* I/O).
* *
* In the following definitions, R_slow[0]/R_fast[0] and * The reference peak rates are measured in sectors/usec, left-shifted
* T_slow[0]/T_fast[0] are the reference values for a slow/fast * by BFQ_RATE_SHIFT.
* rotational device, whereas R_slow[1]/R_fast[1] and
* T_slow[1]/T_fast[1] are the reference values for a slow/fast
* non-rotational device. Finally, device_speed_thresh are the
* thresholds used to switch between speed classes. The reference
* rates are not the actual peak rates of the devices used as a
* reference, but slightly lower values. The reason for using these
* slightly lower values is that the peak-rate estimator tends to
* yield slightly lower values than the actual peak rate (it can yield
* the actual peak rate only if there is only one process doing I/O,
* and the process does sequential I/O).
*
* Both the reference peak rates and the thresholds are measured in
* sectors/usec, left-shifted by BFQ_RATE_SHIFT.
*/ */
static int R_slow[2] = {1000, 10700}; static int ref_rate[2] = {14000, 33000};
static int R_fast[2] = {14000, 33000};
/* /*
* To improve readability, a conversion function is used to initialize the * To improve readability, a conversion function is used to initialize
* following arrays, which entails that they can be initialized only in a * the following array, which entails that the array can be
* function. * initialized only in a function.
*/ */
static int T_slow[2]; static int ref_wr_duration[2];
static int T_fast[2];
static int device_speed_thresh[2];
/* /*
* BFQ uses the above-detailed, time-based weight-raising mechanism to * BFQ uses the above-detailed, time-based weight-raising mechanism to
...@@ -884,7 +872,7 @@ static unsigned int bfq_wr_duration(struct bfq_data *bfqd) ...@@ -884,7 +872,7 @@ static unsigned int bfq_wr_duration(struct bfq_data *bfqd)
if (bfqd->bfq_wr_max_time > 0) if (bfqd->bfq_wr_max_time > 0)
return bfqd->bfq_wr_max_time; return bfqd->bfq_wr_max_time;
dur = bfqd->RT_prod; dur = bfqd->rate_dur_prod;
do_div(dur, bfqd->peak_rate); do_div(dur, bfqd->peak_rate);
/* /*
...@@ -2492,37 +2480,15 @@ static unsigned long bfq_calc_max_budget(struct bfq_data *bfqd) ...@@ -2492,37 +2480,15 @@ static unsigned long bfq_calc_max_budget(struct bfq_data *bfqd)
/* /*
* Update parameters related to throughput and responsiveness, as a * Update parameters related to throughput and responsiveness, as a
* function of the estimated peak rate. See comments on * function of the estimated peak rate. See comments on
* bfq_calc_max_budget(), and on T_slow and T_fast arrays. * bfq_calc_max_budget(), and on the ref_wr_duration array.
*/ */
static void update_thr_responsiveness_params(struct bfq_data *bfqd) static void update_thr_responsiveness_params(struct bfq_data *bfqd)
{ {
int dev_type = blk_queue_nonrot(bfqd->queue); if (bfqd->bfq_user_max_budget == 0) {
if (bfqd->bfq_user_max_budget == 0)
bfqd->bfq_max_budget = bfqd->bfq_max_budget =
bfq_calc_max_budget(bfqd); bfq_calc_max_budget(bfqd);
bfq_log(bfqd, "new max_budget = %d", bfqd->bfq_max_budget);
if (bfqd->device_speed == BFQ_BFQD_FAST &&
bfqd->peak_rate < device_speed_thresh[dev_type]) {
bfqd->device_speed = BFQ_BFQD_SLOW;
bfqd->RT_prod = R_slow[dev_type] *
T_slow[dev_type];
} else if (bfqd->device_speed == BFQ_BFQD_SLOW &&
bfqd->peak_rate > device_speed_thresh[dev_type]) {
bfqd->device_speed = BFQ_BFQD_FAST;
bfqd->RT_prod = R_fast[dev_type] *
T_fast[dev_type];
} }
bfq_log(bfqd,
"dev_type %s dev_speed_class = %s (%llu sects/sec), thresh %llu setcs/sec",
dev_type == 0 ? "ROT" : "NONROT",
bfqd->device_speed == BFQ_BFQD_FAST ? "FAST" : "SLOW",
bfqd->device_speed == BFQ_BFQD_FAST ?
(USEC_PER_SEC*(u64)R_fast[dev_type])>>BFQ_RATE_SHIFT :
(USEC_PER_SEC*(u64)R_slow[dev_type])>>BFQ_RATE_SHIFT,
(USEC_PER_SEC*(u64)device_speed_thresh[dev_type])>>
BFQ_RATE_SHIFT);
} }
static void bfq_reset_rate_computation(struct bfq_data *bfqd, static void bfq_reset_rate_computation(struct bfq_data *bfqd,
...@@ -5311,14 +5277,12 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e) ...@@ -5311,14 +5277,12 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
bfqd->wr_busy_queues = 0; bfqd->wr_busy_queues = 0;
/* /*
* Begin by assuming, optimistically, that the device is a * Begin by assuming, optimistically, that the device peak
* high-speed one, and that its peak rate is equal to 2/3 of * rate is equal to 2/3 of the highest reference rate.
* the highest reference rate.
*/ */
bfqd->RT_prod = R_fast[blk_queue_nonrot(bfqd->queue)] * bfqd->rate_dur_prod = ref_rate[blk_queue_nonrot(bfqd->queue)] *
T_fast[blk_queue_nonrot(bfqd->queue)]; ref_wr_duration[blk_queue_nonrot(bfqd->queue)];
bfqd->peak_rate = R_fast[blk_queue_nonrot(bfqd->queue)] * 2 / 3; bfqd->peak_rate = ref_rate[blk_queue_nonrot(bfqd->queue)] * 2 / 3;
bfqd->device_speed = BFQ_BFQD_FAST;
spin_lock_init(&bfqd->lock); spin_lock_init(&bfqd->lock);
...@@ -5626,8 +5590,8 @@ static int __init bfq_init(void) ...@@ -5626,8 +5590,8 @@ static int __init bfq_init(void)
/* /*
* Times to load large popular applications for the typical * Times to load large popular applications for the typical
* systems installed on the reference devices (see the * systems installed on the reference devices (see the
* comments before the definitions of the next two * comments before the definition of the next
* arrays). Actually, we use slightly slower values, as the * array). Actually, we use slightly lower values, as the
* estimated peak rate tends to be smaller than the actual * estimated peak rate tends to be smaller than the actual
* peak rate. The reason for this last fact is that estimates * peak rate. The reason for this last fact is that estimates
* are computed over much shorter time intervals than the long * are computed over much shorter time intervals than the long
...@@ -5636,25 +5600,8 @@ static int __init bfq_init(void) ...@@ -5636,25 +5600,8 @@ static int __init bfq_init(void)
* scheduler cannot rely on a peak-rate-evaluation workload to * scheduler cannot rely on a peak-rate-evaluation workload to
* be run for a long time. * be run for a long time.
*/ */
T_slow[0] = msecs_to_jiffies(3500); /* actually 4 sec */ ref_wr_duration[0] = msecs_to_jiffies(7000); /* actually 8 sec */
T_slow[1] = msecs_to_jiffies(6000); /* actually 6.5 sec */ ref_wr_duration[1] = msecs_to_jiffies(2500); /* actually 3 sec */
T_fast[0] = msecs_to_jiffies(7000); /* actually 8 sec */
T_fast[1] = msecs_to_jiffies(2500); /* actually 3 sec */
/*
* Thresholds that determine the switch between speed classes
* (see the comments before the definition of the array
* device_speed_thresh). These thresholds are biased towards
* transitions to the fast class. This is safer than the
* opposite bias. In fact, a wrong transition to the slow
* class results in short weight-raising periods, because the
* speed of the device then tends to be higher that the
* reference peak rate. On the opposite end, a wrong
* transition to the fast class tends to increase
* weight-raising periods, because of the opposite reason.
*/
device_speed_thresh[0] = (4 * R_slow[0]) / 3;
device_speed_thresh[1] = (4 * R_slow[1]) / 3;
ret = elv_register(&iosched_bfq_mq); ret = elv_register(&iosched_bfq_mq);
if (ret) if (ret)
......
...@@ -399,11 +399,6 @@ struct bfq_io_cq { ...@@ -399,11 +399,6 @@ struct bfq_io_cq {
struct bfq_ttime saved_ttime; struct bfq_ttime saved_ttime;
}; };
enum bfq_device_speed {
BFQ_BFQD_FAST,
BFQ_BFQD_SLOW,
};
/** /**
* struct bfq_data - per-device data structure. * struct bfq_data - per-device data structure.
* *
...@@ -611,12 +606,11 @@ struct bfq_data { ...@@ -611,12 +606,11 @@ struct bfq_data {
/* Max service-rate for a soft real-time queue, in sectors/sec */ /* Max service-rate for a soft real-time queue, in sectors/sec */
unsigned int bfq_wr_max_softrt_rate; unsigned int bfq_wr_max_softrt_rate;
/* /*
* Cached value of the product R*T, used for computing the * Cached value of the product ref_rate*ref_wr_duration, used
* maximum duration of weight raising automatically. * for computing the maximum duration of weight raising
* automatically.
*/ */
u64 RT_prod; u64 rate_dur_prod;
/* device-speed class for the low-latency heuristic */
enum bfq_device_speed device_speed;
/* fallback dummy bfqq for extreme OOM conditions */ /* fallback dummy bfqq for extreme OOM conditions */
struct bfq_queue oom_bfqq; struct bfq_queue oom_bfqq;
......