Commits · 39220e8d4a2aaab045ea03cc16d737e85d0817bf · Kirill Smelkov / linux

29 Jan, 2020 8 commits

eventpoll: support non-blocking do_epoll_ctl() calls · 39220e8d

Jens Axboe authored Jan 08, 2020

Also make it available outside of epoll, along with the helper that
decides if we need to copy the passed-in epoll_event.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

eventpoll: abstract out epoll_ctl() handler · 58e41a44

Jens Axboe authored Jan 08, 2020

No functional changes in this patch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: fix linked command file table usage · f86cd20c

Jens Axboe authored Jan 29, 2020

We're not consistent in how the file table is grabbed and assigned if we
have a command linked that requires the use of it.

Add ->file_table to the io_op_defs[] array, and use that to determine
when to grab the table instead of having the handlers set it if they
need to defer. This also means we can kill the IO_WQ_WORK_NEEDS_FILES
flag. We always initialize work->files, so io-wq can just check for
that.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: support using a registered personality for commands · 75c6a039

Jens Axboe authored Jan 28, 2020

For personalities previously registered via IORING_REGISTER_PERSONALITY,
allow any command to select them. This is done through setting
sqe->personality to the id returned from registration, and then flagging
sqe->flags with IOSQE_PERSONALITY.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: allow registering credentials · 071698e1

Jens Axboe authored Jan 28, 2020

If an application wants to use a ring with different kinds of
credentials, it can register them upfront. We don't look up credentials;
the credentials of the task calling IORING_REGISTER_PERSONALITY are used.

An 'id' is returned, which the application can use with the subsequent
personality support to select these credentials.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
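
The id scheme described above can be modeled in user space as a small table that hands out ids and maps them back to credentials. This is a minimal sketch assuming a fixed-size table in place of whatever allocator the kernel uses. `struct fake_cred`, `personality_register()` and `personality_lookup()` are hypothetical names for illustration only. In the real code, registration captures the calling task's credentials.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct cred. */
struct fake_cred {
	unsigned int uid;
	unsigned int gid;
};

#define MAX_PERSONALITIES 16 /* assumed table size for this sketch */

struct personality_table {
	/* Slot 0 is never handed out, so id 0 can mean "no personality". */
	const struct fake_cred *creds[MAX_PERSONALITIES];
};

/* Register the caller's credentials; returns the new id, or -1 if full. */
static int personality_register(struct personality_table *t,
				const struct fake_cred *current_creds)
{
	assert(current_creds != NULL);
	for (int id = 1; id < MAX_PERSONALITIES; id++) {
		if (!t->creds[id]) {
			t->creds[id] = current_creds;
			return id;
		}
	}
	return -1;
}

/* Map an id back to the registered credentials; NULL if unknown. */
static const struct fake_cred *
personality_lookup(const struct personality_table *t, int id)
{
	if (id <= 0 || id >= MAX_PERSONALITIES)
		return NULL;
	return t->creds[id];
}
```

Reserving id 0 in this sketch leaves it free to mean "no personality selected" for a command.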

io_uring: add io-wq workqueue sharing · 24369c2e

Pavel Begunkov authored Jan 28, 2020

If IORING_SETUP_ATTACH_WQ is set, wq_fd in io_uring_params is expected to
be a valid io_uring fd whose io-wq will be shared with the newly created
io_uring instance. If the flag is set but the io-wq can't be shared, setup
fails.

This allows creation of "sibling" io_urings, where we prefer to keep the
SQ/CQ private, but want to share the async backend to minimize the amount
of overhead associated with having multiple rings that belong to the same
backend.
Reported-by: Jens Axboe <axboe@kernel.dk>
Reported-by: Daurnimator <quae@daurnimator.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io-wq: allow grabbing existing io-wq · eba6f5a3

Pavel Begunkov authored Jan 28, 2020

Export a helper to attach to an existing io-wq, rather than setting up
a new one. This is doable now that io_wqs are reference counted.
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/io-wq: don't use static creds/mm assignments · cccf0ee8

Jens Axboe authored Jan 27, 2020

We currently set up the io_wq with a static set of mm and creds. Even for
a single-use io-wq per io_uring, this is suboptimal, as we may have
multiple enters of the ring. For sharing the io-wq backend, it doesn't
work at all.

Switch to passing in the creds and mm when the work item is set up. This
means that async work is no longer deferred to the io_uring mm and creds;
it is done with the current mm and creds.

Flag this behavior with IORING_FEAT_CUR_PERSONALITY, so applications know
they can rely on the current personality (mm and creds) being the same
for direct issue and async issue.
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

27 Jan, 2020 4 commits

io-wq: make the io_wq ref counted · 848f7e18

Jens Axboe authored Jan 23, 2020

In preparation for sharing an io-wq across different users, add a
reference count that manages destruction of it.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
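
This is a minimal sketch of reference-count-managed destruction that uses C11 atomics in place of the kernel's refcount API. `struct wq`, `wq_create()`, `wq_get()`, `wq_put()` and `wq_destroy()` are hypothetical. Each user holds a reference, and whoever drops the last one tears the instance down.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct io_wq; only the reference count is modeled. */
struct wq {
	atomic_uint refs;
};

static struct wq *wq_create(void)
{
	struct wq *wq = malloc(sizeof(*wq));

	if (!wq)
		return NULL;
	atomic_init(&wq->refs, 1); /* the creator holds the first reference */
	return wq;
}

/* Another user starts sharing this io-wq. */
static void wq_get(struct wq *wq)
{
	atomic_fetch_add(&wq->refs, 1);
}

static void wq_destroy(struct wq *wq)
{
	/* A real io-wq would stop its workers here before freeing. */
	free(wq);
}

/* Drop one reference; returns true if this was the last one and the
   io-wq was destroyed. */
static bool wq_put(struct wq *wq)
{
	unsigned int old = atomic_fetch_sub(&wq->refs, 1);

	assert(old > 0); /* a put without a matching get is a bug */
	if (old == 1) {
		wq_destroy(wq);
		return true;
	}
	return false;
}
```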

io_uring: fix refcounting with batched allocations at OOM · 9466f437

Pavel Begunkov authored Jan 25, 2020

In the out-of-memory case, the second argument of percpu_ref_put_many()
in io_submit_sqes() may evaluate to "nr - (-EAGAIN)", which is clearly
wrong.

Fixes: 2b85edfc ("io_uring: batch getting pcpu references")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
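
The arithmetic can be illustrated with a small model. `refs_to_return_buggy()` and `refs_to_return_fixed()` are hypothetical helpers, not kernel functions. Treating any negative `submitted` value as "no request consumed a reference" is this sketch's assumption.

```c
#include <assert.h>
#include <errno.h>

/* nr: references grabbed up front for the batch.
   submitted: requests actually submitted, or -EAGAIN if the very first
   allocation failed. Returns how many references to put back. */
static int refs_to_return_buggy(int nr, int submitted)
{
	return nr - submitted; /* with -EAGAIN this is nr + EAGAIN: too many */
}

static int refs_to_return_fixed(int nr, int submitted)
{
	/* An error code means no request consumed a reference. */
	int used = submitted < 0 ? 0 : submitted;

	assert(used <= nr);
	return nr - used;
}
```

With `nr = 8` and `submitted = -EAGAIN`, the buggy version returns more references than were ever taken, while the fixed version returns exactly the 8 that were grabbed.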

io_uring: add comment for drain_next · 8cdf2193

Pavel Begunkov authored Jan 25, 2020

Draining the middle of a link is tricky, so leave a comment there.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: don't attempt to copy iovec for READ/WRITE · 980ad263

Jens Axboe authored Jan 24, 2020

For the non-vectored variants of READV/WRITEV, we don't need to set up an
async io context, and we flag that appropriately in the io_op_defs
array. However, when this was fixed for the 5.5 kernel in commit 74566df3,
these opcodes didn't exist yet, so the check there was added just for the
READ_FIXED and WRITE_FIXED opcodes. Replace that check with a single
check for needing async context, which covers all four of the
read/write variants that don't use an iovec.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

22 Jan, 2020 2 commits

io_uring: honor IOSQE_ASYNC for linked reqs · 86a761f8

Pavel Begunkov authored Jan 22, 2020

REQ_F_FORCE_ASYNC is checked only for the head of a link. Fix it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: prep req when do IOSQE_ASYNC · 1118591a

Pavel Begunkov authored Jan 22, 2020

Whenever IOSQE_ASYNC is set, requests are punted to async without
going through io_issue_req() and without the proper preparation (e.g.
io_req_defer_prep()) being done. Hence they are left uninitialised.

Prepare them before punting.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

21 Jan, 2020 26 commits

io_uring: use labeled array init in io_op_defs · 0463b6c5

Pavel Begunkov authored Jan 18, 2020

Don't rely on the implicit ordering of IORING_OP_*; explicitly place each
entry at its index in io_op_defs. The former comments are now part of
the code and can't go out of date.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
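
Designated array initializers are the C feature this relies on. The sketch below uses a hypothetical `enum demo_op` and `struct demo_op_def` rather than the real opcodes and fields. The `static_assert` only catches entries missing from the end of the table, because gaps in the middle are silently zero-filled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcode list; the real values live in the io_uring UAPI header. */
enum demo_op {
	DEMO_OP_NOP,
	DEMO_OP_READV,
	DEMO_OP_WRITEV,
	DEMO_OP_FSYNC,
	DEMO_OP_LAST,
};

struct demo_op_def {
	bool needs_file;
	bool needs_mm;
};

/* Each entry is tied to its opcode by name, so reordering the
   initializers cannot silently shift the table. */
static const struct demo_op_def demo_op_defs[] = {
	[DEMO_OP_NOP]    = { 0 },
	[DEMO_OP_READV]  = { .needs_file = true, .needs_mm = true },
	[DEMO_OP_WRITEV] = { .needs_file = true, .needs_mm = true },
	[DEMO_OP_FSYNC]  = { .needs_file = true },
};

/* Catches an opcode whose entry is missing from the end of the table. */
static_assert(sizeof(demo_op_defs) / sizeof(demo_op_defs[0]) == DEMO_OP_LAST,
	      "demo_op_defs must cover every opcode");
```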

io_uring: optimise sqe-to-req flags translation · 6b47ee6e

Pavel Begunkov authored Jan 18, 2020

For each IOSQE_* flag there is a corresponding REQ_F_* flag, and they are
translated with the same repetitive pattern, e.g.
if (sqe->flags & SQE_FLAG*) req->flags |= REQ_F_FLAG*

Use the same numeric values/bits for both and copy the flags instead of
handling each one manually.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
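
The sketch below shows the code before and after the change. The flag names (`SQE_*`, `OLD_REQ_*`, `REQ_*`) are made up and stand in for the real IOSQE_* and REQ_F_* definitions, whose actual bit values are not reproduced here. It assumes SQE flags fit in 8 bits, as the SQE flags field does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical SQE flag bits. */
#define SQE_FIXED_FILE (1U << 0)
#define SQE_IO_DRAIN   (1U << 1)
#define SQE_IO_LINK    (1U << 2)
#define SQE_VALID_MASK (SQE_FIXED_FILE | SQE_IO_DRAIN | SQE_IO_LINK)

/* Before: request flags use unrelated bit positions, so each one is
   translated by hand. */
#define OLD_REQ_FIXED_FILE (1U << 4)
#define OLD_REQ_IO_DRAIN   (1U << 7)
#define OLD_REQ_LINK       (1U << 9)

static unsigned int translate_by_hand(uint8_t sqe_flags)
{
	unsigned int req_flags = 0;

	if (sqe_flags & SQE_FIXED_FILE)
		req_flags |= OLD_REQ_FIXED_FILE;
	if (sqe_flags & SQE_IO_DRAIN)
		req_flags |= OLD_REQ_IO_DRAIN;
	if (sqe_flags & SQE_IO_LINK)
		req_flags |= OLD_REQ_LINK;
	return req_flags;
}

/* After: the low request-flag bits are defined to equal the SQE bits,
   so one masked copy replaces all the branches. */
#define REQ_FIXED_FILE SQE_FIXED_FILE
#define REQ_IO_DRAIN   SQE_IO_DRAIN
#define REQ_LINK       SQE_IO_LINK
#define REQ_INTERNAL   (1U << 8) /* kernel-only flags sit above the SQE bits */

static_assert((REQ_INTERNAL & 0xFFU) == 0,
	      "internal flags must not overlap the 8 SQE flag bits");

static unsigned int translate_by_copy(uint8_t sqe_flags)
{
	return sqe_flags & SQE_VALID_MASK;
}
```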

io_uring: remove REQ_F_IO_DRAINED · 87987898

Pavel Begunkov authored Jan 18, 2020

A request can get into the defer list only once, so there is no need to
mark it as drained; remove the flag. It was probably left over from
extracting __need_defer() for use in timeouts.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: file switch work needs to get flushed on exit · e46a7950

Jens Axboe authored Jan 17, 2020

We currently flush early, but if we have something in progress and a
new switch is scheduled, we need to make sure we flush after our teardown
as well.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: hide uring_fd in ctx · b14cca0c

Pavel Begunkov authored Jan 17, 2020

req->ring_fd and req->ring_file are used only during the prep stage
of submission, which is protected by a mutex. There is no need to
store them per request; place them in ctx.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: remove extra check in __io_commit_cqring · 07910158

Pavel Begunkov authored Jan 17, 2020

__io_commit_cqring() is almost always called when there is a change in
the rings, so the check is rather pessimising.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: optimise use of ctx->drain_next · 711be031

Pavel Begunkov authored Jan 17, 2020

Move setting ctx->drain_next to the only place where it can be set: when
handling non-head requests of a link. The same goes for checking it, which
only matters for the head of a link or for a non-linked request.

No functional changes here. This removes some code from the common path
and also removes the REQ_F_DRAIN_LINK flag, which is no longer needed.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: add support for probing opcodes · 66f4af93

Jens Axboe authored Jan 16, 2020

The application currently has no way of knowing if a given opcode is
supported or not without having to try and issue one and see if we get
-EINVAL or not. And even this approach is fraught with peril, as maybe
we're getting -EINVAL due to some fields being missing, or maybe it's
just not that easy to issue that particular command without doing some
other leg work in terms of setup first.

This adds IORING_REGISTER_PROBE, which fills in a structure with info
on what is supported and what is not. This works even with sparse opcode
fields, which may happen in the future, or even today if someone
backports specific features to older kernels.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
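
Once the probe has been filled in, the application can check individual opcodes with a simple lookup. The struct layouts below are intended to mirror `struct io_uring_probe` and `struct io_uring_probe_op` from the UAPI header. They are renamed locally so the sketch does not depend on installed kernel headers, so check them against your kernel's header. The actual `io_uring_register()` call is left out, which means this only models reading the result. `probe_alloc()` and `probe_op_supported()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirrors of the probe structures; names are this sketch's own. */
#define PROBE_OP_SUPPORTED (1U << 0)

struct probe_op {
	uint8_t op;
	uint8_t resv;
	uint16_t flags;
	uint32_t resv2;
};

struct probe {
	uint8_t last_op;  /* highest opcode the kernel knows about */
	uint8_t ops_len;  /* number of ops[] entries filled in */
	uint16_t resv;
	uint32_t resv2[3];
	struct probe_op ops[]; /* indexed by opcode */
};

/* Allocate a zeroed probe with room for nr_ops entries. */
static struct probe *probe_alloc(unsigned int nr_ops)
{
	return calloc(1, sizeof(struct probe) + nr_ops * sizeof(struct probe_op));
}

/* True if the probe reports opcode op as supported. */
static bool probe_op_supported(const struct probe *p, unsigned int op)
{
	assert(p != NULL);
	if (op > p->last_op || op >= p->ops_len)
		return false;
	return (p->ops[op].flags & PROBE_OP_SUPPORTED) != 0;
}
```

Because each opcode has its own flags entry, a sparse set of supported opcodes (for example, a backport that adds op 2 but not op 1) is represented naturally.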

io_uring: account fixed file references correctly in batch · 10fef4be

Jens Axboe authored Jan 09, 2020

We can't assume that the whole batch has fixed files in it. If it's a
mix, or none at all, then we can end up doing a ref put that either
messes up accounting, or causes an oops if we have no fixed files at
all.

Also ensure we free requests correctly, distinguishing between
inflight-accounted and normal requests.

Fixes: 82c721577011 ("io_uring: extend batch freeing to cover more cases")
Reported-by: Dmitrii Dolgov <9erthalion6@gmail.com>
Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Dmitrii Dolgov <9erthalion6@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: add opcode to issue trace event · 354420f7

Jens Axboe authored Jan 08, 2020

For some test apps at least, user_data is just zeroes. So it's not a
good way to tell what the command actually is. Add the opcode to the
issue trace point.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: add support for IORING_OP_OPENAT2 · cebdb986

Jens Axboe authored Jan 08, 2020

Add support for the new openat2(2) system call. It's trivial to do, as
openat(2) can simply be wrapped around it.
Suggested-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: remove 'fname' from io_open structure · f8748881

Jens Axboe authored Jan 08, 2020

We only use it internally in the prep functions for both statx and
openat, so we don't need it to be persistent across the request.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: add 'struct open_how' to the openat request context · c12cedf2

Jens Axboe authored Jan 08, 2020

We'll need this for openat2(2) support. Remove the flags and mode fields
from the existing io_open struct, as struct open_how carries them.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: enable option to only trigger eventfd for async completions · f2842ab5

Jens Axboe authored Jan 08, 2020

If an application is using eventfd notifications with poll to know when
new SQEs can be issued, it expects the following reads/writes to
complete inline. It therefore already knows that events are available,
and doesn't want spurious wakeups on the eventfd for those requests.

This adds IORING_REGISTER_EVENTFD_ASYNC, which works just like
IORING_REGISTER_EVENTFD, except it only triggers notifications for events
that happen from async completions (IRQ, or io-wq worker completions).
Any completions inline from the submission itself will not trigger
notifications.
Suggested-by: Mark Papadakis <markuspapadakis@icloud.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: change io_ring_ctx bool fields into bit fields · 69b3e546

Jens Axboe authored Jan 08, 2020

In preparation for adding another one, which would make us spill into
another long (and hence bump the size of the ctx), change them to
bit fields.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: file set registration should use interruptible waits · c150368b

Jens Axboe authored Jan 08, 2020

If an application attempts to register a set with unbounded requests
pending, we can be stuck here forever if they don't complete. We can
make this wait interruptible, and just abort if we get signaled.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: Remove unnecessary null check · 96fd84d8

YueHaibing authored Jan 07, 2020

A NULL check before kfree() is redundant, since kfree(NULL) is a no-op,
so remove it. This was detected by coccinelle.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: add support for send(2) and recv(2) · fddaface

Jens Axboe authored Jan 04, 2020

This adds IORING_OP_SEND for send(2) support, and IORING_OP_RECV for
recv(2) support.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: remove extra io_wq_current_is_worker() · 2550878f

Pavel Begunkov authored Dec 30, 2019

io_wq workers use io_issue_sqe() to forward sqes and never call
io_queue_sqe(). Remove the extra check for io_wq_current_is_worker().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: optimise commit_sqring() for common case · caf582c6

Pavel Begunkov authored Dec 30, 2019

It should be pretty rare to not submit anything when there is
something in the ring. No need to keep heuristics for this case.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: optimise head checks in io_get_sqring() · ee7d46d9

Pavel Begunkov authored Dec 30, 2019

A user may ask to submit more than there is in the ring, and then
io_uring will submit as much as it can. However, in the last iteration
it will allocate an io_kiocb and immediately free it. It could do
better and adjust @to_submit to what is in the ring.

And since the ring's head is already checked here, there is no need to
do it in the loop as well, issuing an smp_load_acquire() barrier each time.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
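
The clamping can be sketched as below. It assumes the head and tail indices have been read once up front and passed in. `clamp_to_submit()` is a hypothetical helper. Because the indices are free-running 32-bit counters, their unsigned difference is the number of pending entries even after wraparound.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp the number of SQEs the caller asked to submit to what the ring
   actually holds, so no request is allocated only to be freed again. */
static unsigned int clamp_to_submit(unsigned int to_submit,
				    uint32_t sq_head, uint32_t sq_tail,
				    unsigned int sq_entries)
{
	uint32_t pending = sq_tail - sq_head; /* wraps correctly modulo 2^32 */

	assert(pending <= sq_entries); /* a sane ring never over-fills */
	return to_submit < pending ? to_submit : pending;
}
```

For example, with head at 0xFFFFFFFE and tail at 2, four entries are pending, so a request to submit five is clamped to four.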

io_uring: clamp to_submit in io_submit_sqes() · 9ef4f124

Pavel Begunkov authored Dec 30, 2019

Make io_submit_sqes() clamp @to_submit itself. This removes duplicated
code and prepares for the following changes.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: add support for IORING_SETUP_CLAMP · 8110c1a6

Jens Axboe authored Dec 28, 2019

Some applications like to start small in terms of ring size, and then
ramp up as needed. This is a bit tricky to do currently, since we don't
advertise the max ring size.

This adds IORING_SETUP_CLAMP. If set, and the values for SQ or CQ ring
size exceed what we support, then clamp them at the max values instead
of returning -EINVAL. Since we return the chosen ring sizes after setup,
no further changes are needed on the application side. io_uring already
changes the ring sizes if the application doesn't ask for power-of-two
sizes, for example.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
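
The SQ-size decision can be sketched as below. `MAX_SQ_ENTRIES` is an assumed limit chosen for illustration, so check the kernel source for the real maximum. `choose_sq_entries()` and `roundup_pow2()` are hypothetical helpers. Only the SQ side is modeled; the text above describes the same clamping for the CQ size.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumed upper limit for this sketch. */
#define MAX_SQ_ENTRIES 32768U

/* Round v (> 0) up to the next power of two. */
static unsigned int roundup_pow2(unsigned int v)
{
	unsigned int p = 1;

	assert(v > 0 && v <= (1U << 31));
	while (p < v)
		p <<= 1;
	return p;
}

/* Pick the SQ size: returns the chosen size, or -EINVAL if rejected.
   clamp models IORING_SETUP_CLAMP being set. */
static int choose_sq_entries(unsigned int requested, bool clamp)
{
	if (requested == 0)
		return -EINVAL;
	if (requested > MAX_SQ_ENTRIES) {
		if (!clamp)
			return -EINVAL;
		requested = MAX_SQ_ENTRIES;
	}
	return (int)roundup_pow2(requested);
}
```

Since the chosen size is reported back either way, the caller handles a clamped size the same way it already handles a size rounded up to a power of two.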

io_uring: extend batch freeing to cover more cases · c6ca97b3

Jens Axboe authored Dec 28, 2019

Currently we only batch free if fixed files are used, there are no links,
no aux data, etc. This extends the batch freeing to exclude only the
linked and fallback cases, and makes io_free_req_many() handle the other
cases just fine.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: wrap multi-req freeing in struct req_batch · 8237e045

Jens Axboe authored Dec 28, 2019

This cleans up the code a bit, and it allows us to build on top of the
multi-req freeing.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: batch getting pcpu references · 2b85edfc

Pavel Begunkov authored Dec 28, 2019

percpu_ref_tryget() has its own overhead. Instead of getting a reference
for each request, grab a batch of them once per io_submit_sqes().

~5% throughput boost for a "submit and wait 128 nops" benchmark.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>

__io_req_free_empty() -> __io_req_do_free()
Signed-off-by: Jens Axboe <axboe@kernel.dk>
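
The batching pattern can be modeled with a plain counter that stands in for the context's percpu_ref. The kernel's percpu_ref_get_many() and percpu_ref_put_many() play the roles of `ref_get_many()` and `ref_put_many()` here. `struct ctx_refs`, `submit_batch()` and the `updates` counter are hypothetical. `updates` only illustrates how many times the shared counter is touched.

```c
#include <assert.h>

/* Hypothetical stand-in for the ctx reference count. */
struct ctx_refs {
	long count;   /* references currently held */
	long updates; /* number of updates made to the shared counter */
};

static void ref_get_many(struct ctx_refs *r, long nr)
{
	r->count += nr;
	r->updates++;
}

static void ref_put_many(struct ctx_refs *r, long nr)
{
	assert(nr >= 0 && nr <= r->count);
	r->count -= nr;
	r->updates++;
}

/* Submit up to nr requests: take all references with one get, then hand
   back those belonging to requests that were never submitted.
   can_submit models how many requests actually make it. */
static long submit_batch(struct ctx_refs *r, long nr, long can_submit)
{
	long submitted = can_submit < nr ? can_submit : nr;

	ref_get_many(r, nr);
	if (submitted != nr)
		ref_put_many(r, nr - submitted);
	return submitted;
}
```

When every request is submitted, the whole batch costs one counter update instead of one per request.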
