- 31 May, 2022 7 commits
-
-
Xiaoguang Wang authored
io_file_bitmap_get() returns a free bitmap slot, but if that slot ends up not being used, for example because io_queue_rsrc_removal() returns an error, alloc_hint should not be updated at all: the slot is still a valid candidate for subsequent io_file_bitmap_get() calls. To fix this issue, only update alloc_hint in io_file_bitmap_set(). Signed-off-by:
Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Link: https://lore.kernel.org/r/20220528015109.48039-1-xiaoguang.wang@linux.alibaba.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
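As an illustration (not part of the commit itself), a minimal C sketch of the resulting pattern follows; the helper names match the commit, but the bodies are simplified and illustrative rather than the verbatim kernel code.

/* Simplified sketch of the fixed-file bitmap helpers after this fix. */
static int io_file_bitmap_get(struct io_ring_ctx *ctx)
{
	struct io_file_table *table = &ctx->file_table;
	int bit;

	bit = find_next_zero_bit(table->bitmap, ctx->nr_user_files,
				 table->alloc_hint);
	if (bit == ctx->nr_user_files)
		return -ENFILE;
	/* alloc_hint is deliberately NOT updated here; the slot may go unused */
	return bit;
}

static void io_file_bitmap_set(struct io_file_table *table, int bit)
{
	__set_bit(bit, table->bitmap);
	/* the hint only advances once the slot is actually consumed */
	table->alloc_hint = bit + 1;
}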
-
Xiaoguang Wang authored
io_fixed_fd_install() may fail when there is no free slot in the fixed file bitmap; in that case, fput() needs to be called correspondingly. Signed-off-by:
Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Link: https://lore.kernel.org/r/20220527025400.51048-1-xiaoguang.wang@linux.alibaba.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
The socket support was merged in an earlier branch that didn't yet have support for allocating direct descriptors, hence only open and accept got support for that. Do the one-liner to enable it now, so we have consistent support for any request that can instantiate a file/direct descriptor. Reviewed-by:
Hao Xu <howeyxu@tencent.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If we use a buffer group ID that is large enough to require io_uring to allocate it, then we don't correctly free it if the cleanup is deferred to the ring exit. The explicit removal paths are fine. Fixes: 9cfc7e94 ("io_uring: get rid of hashed provided buffer groups") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Gets rid of some ifdefs and enables use of the net defines for when CONFIG_NET isn't set. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Make them consistent in preparation for defining a req async prep handler. The readv/writev requests share a prep handler; move it one level down so the initial one is consistent with the others. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Define and set it when appropriate, and use it consistently in the function rather than using io_op_defs[opcode]. Reviewed-by:
Kanchan Joshi <joshi.k@samsung.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 25 May, 2022 2 commits
-
-
Jens Axboe authored
Almost all of them are; the odd ones out are the poll remove and the files update requests. Name them like the others, which is: io_#cmdname_prep for request preparation and io_#cmdname for request issue. Reviewed-by:
Kanchan Joshi <joshi.k@samsung.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
All other opcodes take a {req, sqe} set for prep handling; split out a timeout prep handler so that timeouts and linked timeouts can use the same one. Reviewed-by:
Kanchan Joshi <joshi.k@samsung.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 21 May, 2022 1 commit
-
-
Jens Axboe authored
Rather than pass in a bool for whether or not this work item needs to go into the priority list, provide separate helpers for it. For most use cases, this also gets rid of the branch for non-priority task work. While at it, rename prior_task_list to prio_task_list. Prior is a confusing name for it, as it would seem to indicate that this is the previous task_work list. prio makes it clear that this is a priority task_work list. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 18 May, 2022 13 commits
-
-
Jens Axboe authored
It's nonsensical to register a provided buffer ring if a classic provided buffer group with the same ID exists. Depending on the order in which we decide which type to pick, the other type will never get used. Explicitly disallow it and return an error if this is attempted. Fixes: c7fb1942 ("io_uring: add support for ring mapped supplied buffers") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We use ->buf_pages != 0 to tell if this is a shared buffer ring or a classic provided buffer group. If we unregister the shared ring and then attempt to use it, buf_pages is zero yet the classic list head isn't properly initialized. This causes io_buffer_select() to think that we have classic buffers available, but then we crash when we try and get one from the list. Just initialize the list if we unregister a shared buffer ring, leaving it in a sane state for either re-registration or for attempting to use it. And do the same for the initial setup from the classic path. Fixes: c7fb1942 ("io_uring: add support for ring mapped supplied buffers") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Honour IORING_RSRC_REGISTER_SPARSE not only for direct files but fixed buffers as well. It makes the rsrc API more consistent. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/66f429e4912fe39fb3318217ff33a2853d4544be.1652879898.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Accessing the file table needs a rcu_dereference_protected(). Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-7-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
POLL* are unannotated values for the userspace ABI, while everything in-kernel should use EPOLL* and the __poll_t type. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-6-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
apoll_events is fed to vfs_poll and the poll tables, so it should be a __poll_t. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-5-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
io_file_get_normal isn't marked inline, so don't claim it as such in the forward declaration. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-4-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
ERR_PTR abuses the high bits of a pointer to transport error information. This is only safe for kernel pointers and not user pointers. Fix io_buffer_select and its helpers to just return NULL for failure and get rid of this abuse. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-3-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Use the proper type. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-2-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Provided buffers allow an application to supply io_uring with buffers that can then be grabbed for a read/receive request when the data source is ready to deliver data. The existing scheme relies on using IORING_OP_PROVIDE_BUFFERS to do that, but it can be difficult to use in real-world applications. It's pretty efficient if the application is able to supply back batches of provided buffers when they have been consumed and the application is ready to recycle them, but if fragmentation occurs in the buffer space, it can become difficult to supply enough buffers at a time. This hurts efficiency. Add a register op, IORING_REGISTER_PBUF_RING, which allows an application to set up a shared queue for each buffer group of provided buffers. The application can then supply buffers simply by adding them to this ring, and the kernel can consume them just as easily. The ring shares the head with the application; the tail remains private to the kernel. Provided buffers set up with IORING_REGISTER_PBUF_RING cannot use IORING_OP_{PROVIDE,REMOVE}_BUFFERS for adding or removing entries from the ring, they must use the mapped ring. Mapped provided buffer rings can co-exist with normal provided buffers, just not within the same group ID. To gauge the overhead of the existing scheme and evaluate the mapped ring approach, a simple NOP benchmark was written. It uses a ring of 128 entries and submits/completes 32 at a time. 'Replenish' is how many buffers are provided back at a time after they have been consumed:

Test                   Replenish    NOPs/sec
================================================================
No provided buffers    NA           ~30M
Provided buffers       32           ~16M
Provided buffers       1            ~10M
Ring buffers           32           ~27M
Ring buffers           1            ~27M

The ring mapped buffers perform almost as well as not using provided buffers at all, and they don't care whether you provide 1 or more back at the same time. This means applications can just replenish as they go, rather than needing to batch and compact, further reducing overhead in the application. The NOP benchmark above doesn't need to do any compaction, so that overhead isn't even reflected in the above test. Co-developed-by:
Dylan Yudaken <dylany@fb.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
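As a usage illustration (not part of the commit): a sketch of registering and populating a mapped provided-buffer ring, assuming liburing 2.2+ helpers (io_uring_register_buf_ring() and friends); BGID and setup_pbuf_ring() are hypothetical names and error handling is omitted.

#include <liburing.h>
#include <stdlib.h>

#define BGID		7	/* arbitrary buffer group ID for this example */
#define ENTRIES		128	/* ring size, must be a power of two */
#define BUF_SIZE	4096

/* 'bufs' points at ENTRIES * BUF_SIZE bytes of application data buffers */
static struct io_uring_buf_ring *setup_pbuf_ring(struct io_uring *ring, char *bufs)
{
	struct io_uring_buf_ring *br;
	struct io_uring_buf_reg reg = { };
	int i;

	/* the shared ring must be page aligned; the kernel maps these pages */
	if (posix_memalign((void **)&br, 4096, ENTRIES * sizeof(struct io_uring_buf)))
		return NULL;

	reg.ring_addr = (unsigned long)br;
	reg.ring_entries = ENTRIES;
	reg.bgid = BGID;
	if (io_uring_register_buf_ring(ring, &reg, 0))
		return NULL;

	/* hand every buffer to the kernel: fill the ring, then publish the tail */
	io_uring_buf_ring_init(br);
	for (i = 0; i < ENTRIES; i++)
		io_uring_buf_ring_add(br, bufs + i * BUF_SIZE, BUF_SIZE, i,
				      io_uring_buf_ring_mask(ENTRIES), i);
	io_uring_buf_ring_advance(br, ENTRIES);
	return br;
}

A read/receive SQE then opts in with IOSQE_BUFFER_SELECT and sqe->buf_group = BGID; the application recycles consumed buffers by adding them back to the ring and advancing the tail again.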
-
Jens Axboe authored
Abstract this out from io_sqe_buffer_register() so we can use it elsewhere too without duplicating this code. No intended functional changes in this patch. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Obviously not really useful since it's not transferring data, but it is helpful in benchmarking overhead of provided buffers. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
io_provided_buffer_select() must drop the submit lock, if needed, even in the error handling case. Failure to do so will leave us with the ctx->uring_lock held, causing spew like:

====================================
WARNING: iou-wrk-366/368 still has locks held!
5.18.0-rc6-00294-gdf8dc7004331 #994 Not tainted
------------------------------------
1 lock held by iou-wrk-366/368:
#0: ffff0000c72598a8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_ring_submit_lock+0x20/0x48

stack backtrace:
CPU: 4 PID: 368 Comm: iou-wrk-366 Not tainted 5.18.0-rc6-00294-gdf8dc7004331 #994
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace.part.0+0xa4/0xd4
show_stack+0x14/0x5c
dump_stack_lvl+0x88/0xb0
dump_stack+0x14/0x2c
debug_check_no_locks_held+0x84/0x90
try_to_freeze.isra.0+0x18/0x44
get_signal+0x94/0x6ec
io_wqe_worker+0x1d8/0x2b4
ret_from_fork+0x10/0x20

and triggering later hangs off get_signal() because we attempt to re-grab the lock. Reported-by: syzbot+987d7bb19195ae45208c@syzkaller.appspotmail.com Fixes: 149c69b0 ("io_uring: abstract out provided buffer list selection") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 17 May, 2022 1 commit
-
-
Jens Axboe authored
We gate whether to IOPOLL for a request on whether the opcode is allowed on a ring set up for IOPOLL and whether it has a file assigned. MSG_RING is the only one that allows a file yet isn't pollable; it's merely supported to allow communication on an IOPOLL ring, not because we can poll for its completion. Put the assigned file early and clear it, so we don't attempt to poll for it. Reported-by: syzbot+1a0a53300ce782f8b3ad@syzkaller.appspotmail.com Fixes: 3f1d52ab ("io_uring: defer msg-ring file validity check until command issue") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 14 May, 2022 3 commits
-
-
Hao Xu authored
Refactor io_accept() to support multishot mode.

Theoretical analysis:

1) When connections come in fast:
   - singleshot:
         add accept sqe (userspace) --> accept inline
                         ^                    |
                         |--------------------|
   - multishot:
         add accept sqe (userspace) --> accept inline
                                          ^      |
                                          |--*---|
     We do the accept repeatedly at * until we get EAGAIN.

2) When connections come in at low pressure:
   Similar to 1); we save a lot of userspace-kernel context switches and
   useless vfs_poll() calls.

Tests: Did some tests, which go this way:

   server      client (multiple)
   accept      connect
   read        write
   write       read
   close       close

Basically, spin up a number of clients (on the same machine as the server) to connect to the server and write some data to it; the server writes the data back to each client after receiving it and closes the connection once the write returns, and the client then reads the data and closes the connection. Here I test 10000 clients connecting to one server, data size 128 bytes, and each client runs in its own goroutine, so they hit the server in a short time. Tested 20 times before/after this patchset, time spent (unit: cycles, i.e. the return value of clock()):

before: (1930136+1940725+1907981+1947601+1923812+1928226+1911087+1905897+1941075
+1934374+1906614+1912504+1949110+1908790+1909951+1941672+1969525+1934984
+1934226+1914385)/20.0 = 1927633.75

after: (1858905+1917104+1895455+1963963+1892706+1889208+1874175+1904753+1874112
+1874985+1882706+1884642+1864694+1906508+1916150+1924250+1869060+1889506
+1871324+1940803)/20.0 = 1894750.45

(1927633.75 - 1894750.45) / 1927633.75 ≈ 1.7%

Signed-off-by:
Hao Xu <howeyxu@tencent.com> Link: https://lore.kernel.org/r/20220514142046.58072-5-haoxu.linux@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
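For context (not part of the patch): with a liburing release that has multishot accept support (2.2+, assuming io_uring_prep_multishot_accept() is available), the usage pattern looks roughly like this sketch; listen_fd is assumed to be a listening socket and error handling is trimmed.

#include <liburing.h>
#include <stdio.h>

static void serve(struct io_uring *ring, int listen_fd)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
	struct io_uring_cqe *cqe;

	/* one SQE keeps producing a CQE per accepted connection */
	io_uring_prep_multishot_accept(sqe, listen_fd, NULL, NULL, 0);
	io_uring_submit(ring);

	for (;;) {
		if (io_uring_wait_cqe(ring, &cqe))
			break;
		if (cqe->res >= 0)
			printf("accepted fd %d\n", cqe->res);
		/* IORING_CQE_F_MORE clear means the multishot request has
		 * terminated (e.g. on error) and must be re-armed if desired */
		if (!(cqe->flags & IORING_CQE_F_MORE)) {
			io_uring_cqe_seen(ring, cqe);
			break;
		}
		io_uring_cqe_seen(ring, cqe);
	}
}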
-
Hao Xu authored
For operations like accept, multishot is a useful feature, since it reduces the number of accept SQEs needed. Let's integrate it with fast poll; it may be good for other operations in the future. Signed-off-by:
Hao Xu <howeyxu@tencent.com> Link: https://lore.kernel.org/r/20220514142046.58072-4-haoxu.linux@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Hao Xu authored
Add a flag to indicate multishot mode for fast poll. Currently only accept uses it, but there may be more operations leveraging it in the future. Also add a mask, IO_APOLL_MULTI_POLLED, which stands for REQ_F_APOLL_MULTI | REQ_F_POLLED, to make the code shorter and cleaner. Signed-off-by:
Hao Xu <howeyxu@tencent.com> Link: https://lore.kernel.org/r/20220514142046.58072-3-haoxu.linux@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 13 May, 2022 8 commits
-
-
Dylan Yudaken authored
The check for waking up a request compares the poll_t bits; however, these will always contain some common flags, so the check always wakes up. For files with single wait queues, such as sockets, this can cause the request to be sent to the async worker unnecessarily. Further, if it is non-blocking, it will complete the request with EAGAIN, which is not desired. Here, exclude these common events, making sure not to exclude POLLERR, which might be important. Fixes: d7718a9d ("io_uring: use poll driven retry for files that support it") Signed-off-by:
Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220512091834.728610-3-dylany@fb.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
If an opcode handler semi-reliably returns -EAGAIN, io_wq_submit_work() might continue to busily hammer the same handler over and over again, which is not ideal. The -EAGAIN handling in question was put there only for IOPOLL, so restrict it to IOPOLL mode only, where there is no recourse other than to retry as we cannot wait. Fixes: def596e9 ("io_uring: support for IO polling") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f168b4f24181942f3614dd8ff648221736f572e6.1652433740.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Currently, to set up a fully sparse descriptor space upfront, the app needs to allocate an array of the full size, memset it to -1, and then pass that in. Make this a bit easier by allowing a flag that simply does this internally rather than needing to copy each slot separately. This works with IORING_REGISTER_FILES2, as the flag is set in struct io_uring_rsrc_register, and is only allowed when the type is IORING_RSRC_FILE, as this doesn't make sense for registered buffers. Reviewed-by:
Hao Xu <howeyxu@tencent.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
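As an illustration (not part of the commit): a sketch of the raw registration described above. The struct and flag names come from the io_uring uapi (5.19+ headers), register_sparse_files() is a hypothetical helper, and error handling is omitted; liburing also grew a wrapper for this (io_uring_register_files_sparse() in 2.2+), which is the more convenient route.

#include <linux/io_uring.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>

static int register_sparse_files(int ring_fd, unsigned int nr_slots)
{
	struct io_uring_rsrc_register rr;

	memset(&rr, 0, sizeof(rr));
	rr.nr = nr_slots;			/* size of the fixed file table */
	rr.flags = IORING_RSRC_REGISTER_SPARSE;	/* kernel marks every slot empty */
	/* rr.data / rr.tags stay 0: no descriptor array is copied from userspace */

	return (int)syscall(__NR_io_uring_register, ring_fd,
			    IORING_REGISTER_FILES2, &rr, sizeof(rr));
}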
-
Jens Axboe authored
We currently limit these to 32K, but since we're now backing the table space with vmalloc when needed, there's no reason why we can't make it bigger. The total space is limited by RLIMIT_NOFILE as well. Reviewed-by:
Hao Xu <howeyxu@tencent.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If the application passes in IORING_FILE_INDEX_ALLOC as the file_slot, then that's a hint to allocate a fixed file descriptor rather than have one be passed in directly. This can be useful for having io_uring manage the direct descriptor space, and also allows multi-shot support to work with fixed files. Normal accept direct requests will complete with 0 for success, and < 0 in case of error. If io_uring is asked to allocate the direct descriptor, then the direct descriptor is returned in case of success. Reviewed-by:
Hao Xu <howeyxu@tencent.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If the application passes in IORING_FILE_INDEX_ALLOC as the file_slot, then that's a hint to allocate a fixed file descriptor rather than have one be passed in directly. This can be useful for having io_uring manage the direct descriptor space. Normal open direct requests will complete with 0 for success, and < 0 in case of error. If io_uring is asked to allocate the direct descriptor, then the direct descriptor is returned in case of success. Reviewed-by:
Hao Xu <howeyxu@tencent.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
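As a usage illustration (not part of the commit): a sketch of an open that lets the kernel pick the fixed-file slot. The sqe field is set directly to mirror the commit text, open_direct_alloc() is a hypothetical helper, a fixed file table (e.g. a sparse one) must already be registered on the ring, and error handling is omitted.

#include <liburing.h>
#include <fcntl.h>

static int open_direct_alloc(struct io_uring *ring, const char *path)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
	struct io_uring_cqe *cqe;
	int slot;

	io_uring_prep_openat(sqe, AT_FDCWD, path, O_RDONLY, 0);
	/* ask the kernel to allocate a free fixed-file slot for the result */
	sqe->file_index = IORING_FILE_INDEX_ALLOC;

	io_uring_submit(ring);
	io_uring_wait_cqe(ring, &cqe);
	slot = cqe->res;	/* >= 0: the allocated direct descriptor */
	io_uring_cqe_seen(ring, cqe);
	return slot;
}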
-
Jens Axboe authored
Applications currently always pick where they want fixed files to go. In preparation for allowing these types of commands with multishot support, add a basic allocator in the fixed file table. Reviewed-by:
Hao Xu <howeyxu@tencent.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
In preparation for adding a basic allocator for direct descriptors, add helpers that set/clear whether a file slot is used. Reviewed-by:
Hao Xu <howeyxu@tencent.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 11 May, 2022 1 commit
-
-
Jens Axboe authored
file_operations->uring_cmd is a file private handler. This is somewhat similar to ioctl but hopefully a lot more sane and useful, as it can be used to enable many io_uring capabilities for the underlying operation. IORING_OP_URING_CMD is a file private kind of request. io_uring doesn't know what is in this command type; it's for the provider of ->uring_cmd() to deal with. Co-developed-by:
Kanchan Joshi <joshi.k@samsung.com> Signed-off-by:
Kanchan Joshi <joshi.k@samsung.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220511054750.20432-2-joshi.k@samsung.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
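For context (not part of the commit): a kernel-side sketch of a driver wiring up ->uring_cmd. The mydrv_* names and MYDRV_CMD_PING are hypothetical, and the completion contract described in the comments reflects this series; later kernels changed some details.

#include <linux/fs.h>
#include <linux/io_uring.h>
#include <linux/module.h>

#define MYDRV_CMD_PING	0x1	/* hypothetical driver-private opcode */

static int mydrv_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
{
	switch (ioucmd->cmd_op) {
	case MYDRV_CMD_PING:
		/*
		 * A driver that finishes the work asynchronously would queue
		 * it here, return -EIOCBQUEUED, and call io_uring_cmd_done()
		 * from its completion path. Completing inline, the return
		 * value ends up in cqe->res.
		 */
		return 0;
	default:
		return -ENOTTY;
	}
}

static const struct file_operations mydrv_fops = {
	.owner		= THIS_MODULE,
	.uring_cmd	= mydrv_uring_cmd,
	/* .open, .read, .release, ... */
};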
-
- 09 May, 2022 4 commits
-
-
Stefan Roesch authored
This adds support for filling the extra1 and extra2 fields for large CQE's. Co-developed-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Stefan Roesch <shr@fb.com> Reviewed-by:
Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220426182134.136504-13-shr@fb.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Stefan Roesch authored
This enables large CQE's in the uring setup. Co-developed-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Stefan Roesch <shr@fb.com> Reviewed-by:
Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220426182134.136504-12-shr@fb.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
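For context (not part of the commit): a sketch of how an application sees the big CQE fields, assuming a liburing release and kernel headers with CQE32 support (2.2+ / 5.19+); which requests actually fill extra1/extra2 depends on the provider (e.g. a ->uring_cmd implementation). Error handling omitted.

#include <liburing.h>
#include <stdio.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;

	/* double-sized CQEs: each completion carries two extra u64 values */
	io_uring_queue_init(8, &ring, IORING_SETUP_CQE32);

	/* ... submit requests whose handlers fill the extra CQE fields ... */

	if (!io_uring_wait_cqe(&ring, &cqe)) {
		printf("res=%d extra1=%llu extra2=%llu\n", cqe->res,
		       (unsigned long long)cqe->big_cqe[0],
		       (unsigned long long)cqe->big_cqe[1]);
		io_uring_cqe_seen(&ring, cqe);
	}
	io_uring_queue_exit(&ring);
	return 0;
}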
-
Stefan Roesch authored
This exposes the extra1 and extra2 fields in the /proc output. Signed-off-by:
Stefan Roesch <shr@fb.com> Reviewed-by:
Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220426182134.136504-11-shr@fb.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Stefan Roesch authored
This adds tracing for the extra1 and extra2 fields. Co-developed-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Stefan Roesch <shr@fb.com> Reviewed-by:
Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220426182134.136504-10-shr@fb.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-