Commit c65e6fd4 authored by Jan Kara's avatar Jan Kara Committed by Jens Axboe

bfq: Do not let waker requests skip proper accounting

Commit 7cc4ffc5 ("block, bfq: put reqs of waker and woken in
dispatch list") added a condition to bfq_insert_request() which added
waker's requests directly to dispatch list. The rationale was that
completing waker's IO is needed to get more IO for the current queue.
Although this rationale is valid, there is a hole in it. The waker does
not necessarily serve the IO only for the current queue and maybe it's
current IO is not needed for current queue to make progress. Furthermore
injecting IO like this completely bypasses any service accounting within
bfq and thus we do not properly track how much service is waker's queue
getting or that the waker is actually doing any IO. Depending on the
conditions this can result in the waker getting too much or too few
service.

Consider for example the following job file:

[global]
directory=/mnt/repro/
rw=write
size=8g
time_based
runtime=30
ramp_time=10
blocksize=1m
direct=0
ioengine=sync

[slowwriter]
numjobs=1
prioclass=2
prio=7
fsync=200

[fastwriter]
numjobs=1
prioclass=2
prio=0
fsync=200

Despite processes have very different IO priorities, they get the same
about of service. The reason is that bfq identifies these processes as
having waker-wakee relationship and once that happens, IO from
fastwriter gets injected during slowwriter's time slice. As a result bfq
is not aware that fastwriter has any IO to do and constantly schedules
only slowwriter's queue. Thus fastwriter is forced to compete with
slowwriter's IO all the time instead of getting its share of time based
on IO priority.

Drop the special injection condition from bfq_insert_request(). As a
result, requests will be tracked and queued in a normal way and on next
dispatch bfq_select_queue() can decide whether the waker's inserted
requests should be injected during the current queue's timeslice or not.

Fixes: 7cc4ffc5 ("block, bfq: put reqs of waker and woken in dispatch list")
Acked-by: default avatarPaolo Valente <paolo.valente@linaro.org>
Signed-off-by: default avatarJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20211125133645.27483-8-jack@suse.czSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
parent 1eb17f5e
...@@ -6132,48 +6132,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, ...@@ -6132,48 +6132,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
spin_lock_irq(&bfqd->lock); spin_lock_irq(&bfqd->lock);
bfqq = bfq_init_rq(rq); bfqq = bfq_init_rq(rq);
if (!bfqq || at_head) {
/*
* Reqs with at_head or passthrough flags set are to be put
* directly into dispatch list. Additional case for putting rq
* directly into the dispatch queue: the only active
* bfq_queues are bfqq and either its waker bfq_queue or one
* of its woken bfq_queues. The rationale behind this
* additional condition is as follows:
* - consider a bfq_queue, say Q1, detected as a waker of
* another bfq_queue, say Q2
* - by definition of a waker, Q1 blocks the I/O of Q2, i.e.,
* some I/O of Q1 needs to be completed for new I/O of Q2
* to arrive. A notable example of waker is journald
* - so, Q1 and Q2 are in any respect the queues of two
* cooperating processes (or of two cooperating sets of
* processes): the goal of Q1's I/O is doing what needs to
* be done so that new Q2's I/O can finally be
* issued. Therefore, if the service of Q1's I/O is delayed,
* then Q2's I/O is delayed too. Conversely, if Q2's I/O is
* delayed, the goal of Q1's I/O is hindered.
* - as a consequence, if some I/O of Q1/Q2 arrives while
* Q2/Q1 is the only queue in service, there is absolutely
* no point in delaying the service of such an I/O. The
* only possible result is a throughput loss
* - so, when the above condition holds, the best option is to
* have the new I/O dispatched as soon as possible
* - the most effective and efficient way to attain the above
* goal is to put the new I/O directly in the dispatch
* list
* - as an additional restriction, Q1 and Q2 must be the only
* busy queues for this commit to put the I/O of Q2/Q1 in
* the dispatch list. This is necessary, because, if also
* other queues are waiting for service, then putting new
* I/O directly in the dispatch list may evidently cause a
* violation of service guarantees for the other queues
*/
if (!bfqq ||
(bfqq != bfqd->in_service_queue &&
bfqd->in_service_queue != NULL &&
bfq_tot_busy_queues(bfqd) == 1 + bfq_bfqq_busy(bfqq) &&
(bfqq->waker_bfqq == bfqd->in_service_queue ||
bfqd->in_service_queue->waker_bfqq == bfqq)) || at_head) {
if (at_head) if (at_head)
list_add(&rq->queuelist, &bfqd->dispatch); list_add(&rq->queuelist, &bfqd->dispatch);
else else
...@@ -6200,7 +6159,6 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, ...@@ -6200,7 +6159,6 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
* merge). * merge).
*/ */
cmd_flags = rq->cmd_flags; cmd_flags = rq->cmd_flags;
spin_unlock_irq(&bfqd->lock); spin_unlock_irq(&bfqd->lock);
bfq_update_insert_stats(q, bfqq, idle_timer_disabled, bfq_update_insert_stats(q, bfqq, idle_timer_disabled,
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment