Commit ba7aeae5 authored by Paolo Valente's avatar Paolo Valente Committed by Jens Axboe

block, bfq: fix decrement of num_active_groups

Since commit '2d29c9f8 ("block, bfq: improve asymmetric scenarios
detection")', if there are process groups with I/O requests waiting for
completion, then BFQ tags the scenario as 'asymmetric'. This detection
is needed for preserving service guarantees (for details, see comments
on the computation * of the variable asymmetric_scenario in the
function bfq_better_to_idle).

Unfortunately, commit '2d29c9f8 ("block, bfq: improve asymmetric
scenarios detection")' contains an error exactly in the updating of
the number of groups with I/O requests waiting for completion: if a
group has more than one descendant process, then the above number of
groups, which is renamed from num_active_groups to a more appropriate
num_groups_with_pending_reqs by this commit, may happen to be wrongly
decremented multiple times, namely every time one of the descendant
processes gets all its pending I/O requests completed.

A correct, complete solution should work as follows. Consider a group
that is inactive, i.e., that has no descendant process with pending
I/O inside BFQ queues. Then suppose that num_groups_with_pending_reqs
is still accounting for this group, because the group still has some
descendant process with some I/O request still in
flight. num_groups_with_pending_reqs should be decremented when the
in-flight request of the last descendant process is finally completed
(assuming that nothing else has changed for the group in the meantime,
in terms of composition of the group and active/inactive state of
child groups and processes). To accomplish this, an additional
pending-request counter must be added to entities, and must be
updated correctly.

To avoid this additional field and operations, this commit resorts to
the following tradeoff between simplicity and accuracy: for an
inactive group that is still counted in num_groups_with_pending_reqs,
this commit decrements num_groups_with_pending_reqs when the first
descendant process of the group remains with no request waiting for
completion.

This simplified scheme provides a fix to the unbalanced decrements
introduced by 2d29c9f8. Since this error was also caused by lack
of comments on this non-trivial issue, this commit also adds related
comments.

Fixes: 2d29c9f8 ("block, bfq: improve asymmetric scenarios detection")
Reported-by: default avatarSteven Barrett <steven@liquorix.net>
Tested-by: default avatarSteven Barrett <steven@liquorix.net>
Tested-by: default avatarLucjan Lucjanov <lucjan.lucjanov@gmail.com>
Reviewed-by: default avatarFederico Motta <federico@willer.it>
Signed-off-by: default avatarPaolo Valente <paolo.valente@linaro.org>
Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
parent ffe81d45
...@@ -638,7 +638,7 @@ static bool bfq_varied_queue_weights_or_active_groups(struct bfq_data *bfqd) ...@@ -638,7 +638,7 @@ static bool bfq_varied_queue_weights_or_active_groups(struct bfq_data *bfqd)
bfqd->queue_weights_tree.rb_node->rb_right) bfqd->queue_weights_tree.rb_node->rb_right)
#ifdef CONFIG_BFQ_GROUP_IOSCHED #ifdef CONFIG_BFQ_GROUP_IOSCHED
) || ) ||
(bfqd->num_active_groups > 0 (bfqd->num_groups_with_pending_reqs > 0
#endif #endif
); );
} }
...@@ -802,7 +802,21 @@ void bfq_weights_tree_remove(struct bfq_data *bfqd, ...@@ -802,7 +802,21 @@ void bfq_weights_tree_remove(struct bfq_data *bfqd,
*/ */
break; break;
} }
bfqd->num_active_groups--;
/*
* The decrement of num_groups_with_pending_reqs is
* not performed immediately upon the deactivation of
* entity, but it is delayed to when it also happens
* that the first leaf descendant bfqq of entity gets
* all its pending requests completed. The following
* instructions perform this delayed decrement, if
* needed. See the comments on
* num_groups_with_pending_reqs for details.
*/
if (entity->in_groups_with_pending_reqs) {
entity->in_groups_with_pending_reqs = false;
bfqd->num_groups_with_pending_reqs--;
}
} }
} }
...@@ -3529,27 +3543,44 @@ static bool bfq_better_to_idle(struct bfq_queue *bfqq) ...@@ -3529,27 +3543,44 @@ static bool bfq_better_to_idle(struct bfq_queue *bfqq)
* fact, if there are active groups, then, for condition (i) * fact, if there are active groups, then, for condition (i)
* to become false, it is enough that an active group contains * to become false, it is enough that an active group contains
* more active processes or sub-groups than some other active * more active processes or sub-groups than some other active
* group. We address this issue with the following bi-modal * group. More precisely, for condition (i) to hold because of
* behavior, implemented in the function * such a group, it is not even necessary that the group is
* (still) active: it is sufficient that, even if the group
* has become inactive, some of its descendant processes still
* have some request already dispatched but still waiting for
* completion. In fact, requests have still to be guaranteed
* their share of the throughput even after being
* dispatched. In this respect, it is easy to show that, if a
* group frequently becomes inactive while still having
* in-flight requests, and if, when this happens, the group is
* not considered in the calculation of whether the scenario
* is asymmetric, then the group may fail to be guaranteed its
* fair share of the throughput (basically because idling may
* not be performed for the descendant processes of the group,
* but it had to be). We address this issue with the
* following bi-modal behavior, implemented in the function
* bfq_symmetric_scenario(). * bfq_symmetric_scenario().
* *
* If there are active groups, then the scenario is tagged as * If there are groups with requests waiting for completion
* (as commented above, some of these groups may even be
* already inactive), then the scenario is tagged as
* asymmetric, conservatively, without checking any of the * asymmetric, conservatively, without checking any of the
* conditions (i) and (ii). So the device is idled for bfqq. * conditions (i) and (ii). So the device is idled for bfqq.
* This behavior matches also the fact that groups are created * This behavior matches also the fact that groups are created
* exactly if controlling I/O (to preserve bandwidth and * exactly if controlling I/O is a primary concern (to
* latency guarantees) is a primary concern. * preserve bandwidth and latency guarantees).
* *
* On the opposite end, if there are no active groups, then * On the opposite end, if there are no groups with requests
* only condition (i) is actually controlled, i.e., provided * waiting for completion, then only condition (i) is actually
* that condition (i) holds, idling is not performed, * controlled, i.e., provided that condition (i) holds, idling
* regardless of whether condition (ii) holds. In other words, * is not performed, regardless of whether condition (ii)
* only if condition (i) does not hold, then idling is * holds. In other words, only if condition (i) does not hold,
* allowed, and the device tends to be prevented from queueing * then idling is allowed, and the device tends to be
* many requests, possibly of several processes. Since there * prevented from queueing many requests, possibly of several
* are no active groups, then, to control condition (i) it is * processes. Since there are no groups with requests waiting
* enough to check whether all active queues have the same * for completion, then, to control condition (i) it is enough
* weight. * to check just whether all the queues with requests waiting
* for completion also have the same weight.
* *
* Not checking condition (ii) evidently exposes bfqq to the * Not checking condition (ii) evidently exposes bfqq to the
* risk of getting less throughput than its fair share. * risk of getting less throughput than its fair share.
...@@ -3607,10 +3638,11 @@ static bool bfq_better_to_idle(struct bfq_queue *bfqq) ...@@ -3607,10 +3638,11 @@ static bool bfq_better_to_idle(struct bfq_queue *bfqq)
* bfqq is weight-raised is checked explicitly here. More * bfqq is weight-raised is checked explicitly here. More
* precisely, the compound condition below takes into account * precisely, the compound condition below takes into account
* also the fact that, even if bfqq is being weight-raised, * also the fact that, even if bfqq is being weight-raised,
* the scenario is still symmetric if all active queues happen * the scenario is still symmetric if all queues with requests
* to be weight-raised. Actually, we should be even more * waiting for completion happen to be
* precise here, and differentiate between interactive weight * weight-raised. Actually, we should be even more precise
* raising and soft real-time weight raising. * here, and differentiate between interactive weight raising
* and soft real-time weight raising.
* *
* As a side note, it is worth considering that the above * As a side note, it is worth considering that the above
* device-idling countermeasures may however fail in the * device-idling countermeasures may however fail in the
...@@ -5417,7 +5449,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e) ...@@ -5417,7 +5449,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
bfqd->idle_slice_timer.function = bfq_idle_slice_timer; bfqd->idle_slice_timer.function = bfq_idle_slice_timer;
bfqd->queue_weights_tree = RB_ROOT; bfqd->queue_weights_tree = RB_ROOT;
bfqd->num_active_groups = 0; bfqd->num_groups_with_pending_reqs = 0;
INIT_LIST_HEAD(&bfqd->active_list); INIT_LIST_HEAD(&bfqd->active_list);
INIT_LIST_HEAD(&bfqd->idle_list); INIT_LIST_HEAD(&bfqd->idle_list);
......
...@@ -196,6 +196,9 @@ struct bfq_entity { ...@@ -196,6 +196,9 @@ struct bfq_entity {
/* flag, set to request a weight, ioprio or ioprio_class change */ /* flag, set to request a weight, ioprio or ioprio_class change */
int prio_changed; int prio_changed;
/* flag, set if the entity is counted in groups_with_pending_reqs */
bool in_groups_with_pending_reqs;
}; };
struct bfq_group; struct bfq_group;
...@@ -448,10 +451,54 @@ struct bfq_data { ...@@ -448,10 +451,54 @@ struct bfq_data {
* bfq_weights_tree_[add|remove] for further details). * bfq_weights_tree_[add|remove] for further details).
*/ */
struct rb_root queue_weights_tree; struct rb_root queue_weights_tree;
/* /*
* number of groups with requests still waiting for completion * Number of groups with at least one descendant process that
* has at least one request waiting for completion. Note that
* this accounts for also requests already dispatched, but not
* yet completed. Therefore this number of groups may differ
* (be larger) than the number of active groups, as a group is
* considered active only if its corresponding entity has
* descendant queues with at least one request queued. This
* number is used to decide whether a scenario is symmetric.
* For a detailed explanation see comments on the computation
* of the variable asymmetric_scenario in the function
* bfq_better_to_idle().
*
* However, it is hard to compute this number exactly, for
* groups with multiple descendant processes. Consider a group
* that is inactive, i.e., that has no descendant process with
* pending I/O inside BFQ queues. Then suppose that
* num_groups_with_pending_reqs is still accounting for this
* group, because the group has descendant processes with some
* I/O request still in flight. num_groups_with_pending_reqs
* should be decremented when the in-flight request of the
* last descendant process is finally completed (assuming that
* nothing else has changed for the group in the meantime, in
* terms of composition of the group and active/inactive state of child
* groups and processes). To accomplish this, an additional
* pending-request counter must be added to entities, and must
* be updated correctly. To avoid this additional field and operations,
* we resort to the following tradeoff between simplicity and
* accuracy: for an inactive group that is still counted in
* num_groups_with_pending_reqs, we decrement
* num_groups_with_pending_reqs when the first descendant
* process of the group remains with no request waiting for
* completion.
*
* Even this simpler decrement strategy requires a little
* carefulness: to avoid multiple decrements, we flag a group,
* more precisely an entity representing a group, as still
* counted in num_groups_with_pending_reqs when it becomes
* inactive. Then, when the first descendant queue of the
* entity remains with no request waiting for completion,
* num_groups_with_pending_reqs is decremented, and this flag
* is reset. After this flag is reset for the entity,
* num_groups_with_pending_reqs won't be decremented any
* longer in case a new descendant queue of the entity remains
* with no request waiting for completion.
*/ */
unsigned int num_active_groups; unsigned int num_groups_with_pending_reqs;
/* /*
* Number of bfq_queues containing requests (including the * Number of bfq_queues containing requests (including the
......
...@@ -1012,7 +1012,10 @@ static void __bfq_activate_entity(struct bfq_entity *entity, ...@@ -1012,7 +1012,10 @@ static void __bfq_activate_entity(struct bfq_entity *entity,
container_of(entity, struct bfq_group, entity); container_of(entity, struct bfq_group, entity);
struct bfq_data *bfqd = bfqg->bfqd; struct bfq_data *bfqd = bfqg->bfqd;
bfqd->num_active_groups++; if (!entity->in_groups_with_pending_reqs) {
entity->in_groups_with_pending_reqs = true;
bfqd->num_groups_with_pending_reqs++;
}
} }
#endif #endif
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment