Commits · 22ed8a93adc7a9cbb2c0a0fc1d7f10068a1f84c1 · nexedi / linux

28 Sep, 2018 9 commits

blk-iolatency: deal with small samples · 22ed8a93

Josef Bacik authored Sep 28, 2018

There is logic to keep cgroups that haven't done a lot of IO in the most
recent scale window from being punished for over-active higher priority
groups. However for things like ssd's where the windows are pretty
short we'll end up with small numbers of samples, so 5% of samples will
come out to 0 if there aren't enough. Make the floor 1 sample to keep
us from improperly bailing out of scaling down.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

22ed8a93

blk-iolatency: deal with nr_requests == 1 · 9f60511a

Josef Bacik authored Sep 28, 2018

Hitting the case where blk_queue_depth() returned 1 uncovered the fact
that iolatency doesn't actually handle this case properly, it simply
doesn't scale down anybody. For this case we should go straight into
applying the time delay, which we weren't doing. Since we already limit
the floor at 1 request this if statement is not needed, and this allows
us to set our depth to 1 which allows us to apply the delay if needed.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9f60511a

blk-iolatency: use q->nr_requests directly · ff4cee08

Josef Bacik authored Sep 28, 2018

We were using blk_queue_depth() assuming that it would return
nr_requests, but we hit a case in production on drives that had to have
NCQ turned off in order for them to not shit the bed which resulted in a
qd of 1, even though the nr_requests was much larger. iolatency really
only cares about requests we are allowed to queue up, as any io that
get's onto the request list is going to be serviced soonish, so we want
to be throttling before the bio gets onto the request list. To make
iolatency work as expected, simply use q->nr_requests instead of
blk_queue_depth() as that is what we actually care about.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ff4cee08

kyber: fix integer overflow of latency targets on 32-bit · f0a0cddd

Omar Sandoval authored Sep 28, 2018

NSEC_PER_SEC has type long, so 5 * NSEC_PER_SEC is calculated as a long.
However, 5 seconds is 5,000,000,000 nanoseconds, which overflows a
32-bit long. Make sure all of the targets are calculated as 64-bit
values.

Fixes: 6e25cb01 ("kyber: implement improved heuristics")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f0a0cddd

virtio-blk: modernize sysfs attribute creation · e982c4d0

Hannes Reinecke authored Sep 28, 2018

Use new-style DEVICE_ATTR_RO/DEVICE_ATTR_RW to create the sysfs attributes
and register the disk with default sysfs attribute groups.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e982c4d0

zram: register default groups with device_add_disk() · 98af4d4d

Hannes Reinecke authored Sep 28, 2018

Register default sysfs groups during device_add_disk() to avoid a
race condition with udev during startup.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

98af4d4d

aoe: register default groups with device_add_disk() · 95cf7809

Hannes Reinecke authored Sep 28, 2018

Register default sysfs groups during device_add_disk() to avoid a
race condition with udev during startup.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ed L. Cachin <ed.cashin@acm.org>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

95cf7809

nvme: register ns_id attributes as default sysfs groups · 33b14f67

Hannes Reinecke authored Sep 28, 2018

We should be registering the ns_id attribute as default sysfs
attribute groups, otherwise we have a race condition between
the uevent and the attributes appearing in sysfs.
Suggested-by: Bart van Assche <bvanassche@acm.org>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

33b14f67

block: genhd: add 'groups' argument to device_add_disk · fef912bf

Hannes Reinecke authored Sep 28, 2018

Update device_add_disk() to take an 'groups' argument so that
individual drivers can register a device with additional sysfs
attributes.
This avoids race condition the driver would otherwise have if these
groups were to be created with sysfs_add_groups().
Signed-off-by: Martin Wilck <martin.wilck@suse.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

fef912bf

27 Sep, 2018 5 commits

kyber: add tracepoints · 6c3b7af1

Omar Sandoval authored Sep 27, 2018

When debugging Kyber, it's really useful to know what latencies we've
been having, how the domain depths have been adjusted, and if we've
actually been throttling. Add three tracepoints, kyber_latency,
kyber_adjust, and kyber_throttled, to record that.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6c3b7af1

kyber: implement improved heuristics · 6e25cb01

Omar Sandoval authored Sep 27, 2018

Kyber's current heuristics have a few flaws:

- It's based on the mean latency, but p99 latency tends to be more
  meaningful to anyone who cares about latency. The mean can also be
  skewed by rare outliers that the scheduler can't do anything about.
- The statistics calculations are purely time-based with a short window.
  This works for steady, high load, but is more sensitive to outliers
  with bursty workloads.
- It only considers the latency once an I/O has been submitted to the
  device, but the user cares about the time spent in the kernel, as
  well.

These are shortcomings of the generic blk-stat code which doesn't quite
fit the ideal use case for Kyber. So, this replaces the statistics with
a histogram used to calculate percentiles of total latency and I/O
latency, which we then use to adjust depths in a slightly more
intelligent manner:

- Sync and async writes are now the same domain.
- Discards are a separate domain.
- Domain queue depths are scaled by the ratio of the p99 total latency
  to the target latency (e.g., if the p99 latency is double the target
  latency, we will double the queue depth; if the p99 latency is half of
  the target latency, we can halve the queue depth).
- We use the I/O latency to determine whether we should scale queue
  depths down: we will only scale down if any domain's I/O latency
  exceeds the target latency, which is an indicator of congestion in the
  device.

These new heuristics are just as scalable as the heuristics they
replace.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6e25cb01

kyber: don't make domain token sbitmap larger than necessary · fa2a1f60

Omar Sandoval authored Sep 27, 2018

The domain token sbitmaps are currently initialized to the device queue
depth or 256, whichever is larger, and immediately resized to the
maximum depth for that domain (256, 128, or 64 for read, write, and
other, respectively). The sbitmap is never resized larger than that, so
it's unnecessary to allocate a bitmap larger than the maximum depth.
Let's just allocate it to the maximum depth to begin with. This will use
marginally less memory, and more importantly, give us a more appropriate
number of bits per sbitmap word.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

fa2a1f60

block: export blk_stat_enable_accounting() · f8232f29

Omar Sandoval authored Sep 27, 2018

Kyber will need this in a future change if it is built as a module.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f8232f29

block: move call of scheduler's ->completed_request() hook · ed88660a

Omar Sandoval authored Sep 27, 2018

Commit 4bc6339a ("block: move blk_stat_add() to
__blk_mq_end_request()") consolidated some calls using ktime_get() so
we'd only need to call it once. Kyber's ->completed_request() hook also
calls ktime_get(), so let's move it to the same place, too.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ed88660a

26 Sep, 2018 13 commits

blk-mq: Enable support for runtime power management · 986d413b

Bart Van Assche authored Sep 26, 2018

Now that the blk-mq core processes power management requests
(marked with RQF_PREEMPT) in other states than RPM_ACTIVE, enable
runtime power management for blk-mq.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

986d413b

block: Make blk_get_request() block for non-PM requests while suspended · 7cedffec

Bart Van Assche authored Sep 26, 2018

Instead of allowing requests that are not power management requests
to enter the queue in runtime suspended status (RPM_SUSPENDED), make
the blk_get_request() caller block. This change fixes a starvation
issue: it is now guaranteed that power management requests will be
executed no matter how many blk_get_request() callers are waiting.
For blk-mq, instead of maintaining the q->nr_pending counter, rely
on q->q_usage_counter. Call pm_runtime_mark_last_busy() every time a
request finishes instead of only if the queue depth drops to zero.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7cedffec

block: Allow unfreezing of a queue while requests are in progress · bdd63160

Bart Van Assche authored Sep 26, 2018

A later patch will call blk_freeze_queue_start() followed by
blk_mq_unfreeze_queue() without waiting for q_usage_counter to drop
to zero. Make sure that this doesn't cause a kernel warning to appear
by switching from percpu_ref_reinit() to percpu_ref_resurrect(). The
former namely requires that the refcount it operates on is zero.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bdd63160

percpu-refcount: Introduce percpu_ref_resurrect() · 18c9a6bb

Bart Van Assche authored Sep 26, 2018

This function will be used in a later patch to switch the struct
request_queue q_usage_counter from killed back to live. In contrast
to percpu_ref_reinit(), this new function does not require that the
refcount is zero.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

18c9a6bb

block: Schedule runtime resume earlier · 0d25bd07

Bart Van Assche authored Sep 26, 2018

Instead of scheduling runtime resume of a request queue after a
request has been queued, schedule asynchronous resume during request
allocation. The new pm_request_resume() calls occur after
blk_queue_enter() has increased the q_usage_counter request queue
member. This change is needed for a later patch that will make request
allocation block while the queue status is not RPM_ACTIVE.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0d25bd07

block: Split blk_pm_add_request() and blk_pm_put_request() · 154b00d5

Bart Van Assche authored Sep 26, 2018

Move the pm_request_resume() and pm_runtime_mark_last_busy() calls into
two new functions and thereby separate legacy block layer code from code
that works for both the legacy block layer and blk-mq. A later patch will
add calls to the new functions in the blk-mq code.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

154b00d5

block, scsi: Change the preempt-only flag into a counter · cd84a62e

Bart Van Assche authored Sep 26, 2018

The RQF_PREEMPT flag is used for three purposes:
- In the SCSI core, for making sure that power management requests
  are executed even if a device is in the "quiesced" state.
- For domain validation by SCSI drivers that use the parallel port.
- In the IDE driver, for IDE preempt requests.
Rename "preempt-only" into "pm-only" because the primary purpose of
this mode is power management. Since the power management core may
but does not have to resume a runtime suspended device before
performing system-wide suspend and since a later patch will set
"pm-only" mode as long as a block device is runtime suspended, make
it possible to set "pm-only" mode from more than one context. Since
with this change scsi_device_quiesce() is no longer idempotent, make
that function return early if it is called for a quiesced queue.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

cd84a62e

block: Move power management code into a new source file · bca6b067

Bart Van Assche authored Sep 26, 2018

Move the code for runtime power management from blk-core.c into the
new source file blk-pm.c. Move the corresponding declarations from
<linux/blkdev.h> into <linux/blk-pm.h>. For CONFIG_PM=n, leave out
the declarations of the functions that are not used in that mode.
This patch not only reduces the number of #ifdefs in the block layer
core code but also reduces the size of header file <linux/blkdev.h>
and hence should help to reduce the build time of the Linux kernel
if CONFIG_PM is not defined.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bca6b067

xen: don't include <xen/xen.h> from <asm/io.h> and <asm/dma-mapping.h> · 3cfa210b

Christoph Hellwig authored Sep 25, 2018

Nothing Xen specific in these headers, which get included from a lot
of code in the kernel.  So prune the includes and move them to the
Xen-specific files that actually use them instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3cfa210b

block: remove ARCH_BIOVEC_PHYS_MERGEABLE · c39ae60d

Christoph Hellwig authored Sep 25, 2018

Take the Xen check into the core code instead of delegating it to
the architectures.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c39ae60d

xen: provide a prototype for xen_biovec_phys_mergeable in xen.h · 20e32676

Christoph Hellwig authored Sep 25, 2018

Having multiple externs in arch headers is not a good way to provide
a common interface.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

20e32676

xen: remove the xen_biovec_phys_mergeable export · cd11b1d3

Christoph Hellwig authored Sep 25, 2018

BIOVEC_PHYS_MERGEABLE is only called from core block code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

cd11b1d3

arm: remove the unused BIOVEC_MERGEABLE define · a5bb207a
Christoph Hellwig authored Sep 25, 2018
```
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
```
a5bb207a

24 Sep, 2018 9 commits

block: don't include bug.h from bio.h · 65969e5c

Christoph Hellwig authored Sep 24, 2018

No need to pull in the BUG() defintion.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

65969e5c

block: don't include io.h from bio.h · bceacbfa

Christoph Hellwig authored Sep 24, 2018

Now that we don't need an override for BIOVEC_PHYS_MERGEABLE there is
no need to drag this header in.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bceacbfa

block: remove bvec_to_phys · 6e768461

Christoph Hellwig authored Sep 24, 2018

We only use it in biovec_phys_mergeable and a m68k paravirt driver,
so just opencode it there.  Also remove the pointless unsigned long cast
for the offset in the opencoded instances.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6e768461

block: merge BIOVEC_SEG_BOUNDARY into biovec_phys_mergeable · 3dccdae5

Christoph Hellwig authored Sep 24, 2018

These two checks should always be performed together, so merge them into
a single helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3dccdae5

block: add a missing BIOVEC_SEG_BOUNDARY check in bio_add_pc_page · 0e253391

Christoph Hellwig authored Sep 24, 2018

The actual recaculation of segments in __blk_recalc_rq_segments will
do this check, so there is no point in forcing it if we know it won't
succeed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0e253391

block: simplify BIOVEC_PHYS_MERGEABLE · 6a9f5f24

Christoph Hellwig authored Sep 24, 2018

Turn the macro into an inline, move it to blk.h and simplify the
arch hooks a bit.

Also rename the function to biovec_phys_mergeable as there is no need
to shout.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6a9f5f24

block: move req_gap_back_merge to blk.h · 27ca1d4e

Christoph Hellwig authored Sep 24, 2018

No need to expose these helpers outside the block layer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

27ca1d4e

block: move req_gap_{back,front}_merge to blk-merge.c · e9907009

Christoph Hellwig authored Sep 24, 2018

Keep it close to the actual users instead of exposing the function to all
drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e9907009

block: move integrity_req_gap_{back,front}_merge to blk.h · 43b729bf

Christoph Hellwig authored Sep 24, 2018

No need to expose these to drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

43b729bf

22 Sep, 2018 4 commits

blk-mq: Document the functions that iterate over requests · c7b1bf5c

Bart Van Assche authored Sep 21, 2018

Make it easier to understand the purpose of the functions that iterate
over requests by documenting their purpose. Fix several minor spelling
and grammer mistakes in comments in these functions.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c7b1bf5c

blkcg: rename blkg_try_get to blkg_tryget · 101246ec

Dennis Zhou (Facebook) authored Sep 11, 2018

blkg reference counting now uses percpu_ref rather than atomic_t. Let's
make this consistent with css_tryget. This renames blkg_try_get to
blkg_tryget and now returns a bool rather than the blkg or NULL.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

101246ec

blkcg: change blkg reference counting to use percpu_ref · b3b9f24f

Dennis Zhou (Facebook) authored Sep 11, 2018

Now that every bio is associated with a blkg, this puts the use of
blkg_get, blkg_try_get, and blkg_put on the hot path. This switches over
the refcnt in blkg to use percpu_ref.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b3b9f24f

blkcg: cleanup and make blk_get_rl use blkg_lookup_create · e2b09899

Dennis Zhou (Facebook) authored Sep 11, 2018

blk_get_rl is responsible for identifying which request_list a request
should be allocated to. Try get logic was added earlier, but
semantically the logic was not changed.

This patch makes better use of the bio already having a reference to the
blkg in the hot path. The cold path uses a better fallback of
blkg_lookup_create rather than just blkg_lookup and then falling back to
the q->root_rl. If lookup_create fails with anything but -ENODEV, it
falls back to q->root_rl.

A clarifying comment is added to explain why q->root_rl is used rather
than the root blkg's rl.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e2b09899