Commits · a141dd896f544df9627502cfb3fc1a73fb6587e4 · Kirill Smelkov / linux

23 Aug, 2021 40 commits

io_uring: correct __must_hold annotation · a141dd89

Jens Axboe authored Aug 12, 2021

io_req_free_batch() has a __must_hold annotation referencing a
request being passed in, but we're passing in the context.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: code clean for completion_lock in io_arm_poll_handler() · 41a5169c

Hao Xu authored Aug 12, 2021

We can merge two spin_unlock() operations into one since we removed some
code not long ago.
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: remove files pointer in cancellation functions · f552a27a

Hao Xu authored Aug 12, 2021

When doing cancellation, we use a parameter to indicate whether it comes
from do_exit or exec. A boolean value is good enough for this, so remove
the struct files* as it is not necessary.
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
[axboe: fixup io_uring_files_cancel for !CONFIG_IO_URING]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: extract io_uring_files_cancel() in io_uring_task_cancel() · a4aadd11

Hao Xu authored Aug 12, 2021

Extract io_uring_files_cancel() call in io_uring_task_cancel() to make
io_uring_files_cancel() and io_uring_task_cancel() coherent and easy to
read.
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: optimise hot path of ltimeout prep · fd08e530

Pavel Begunkov authored Aug 11, 2021

io_prep_linked_timeout() grew too heavy and the compiler now refuses to
inline the function. Help it by splitting it in two and annotating with
inline.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/560636717a32e9513724f09b9ecaace942dde4d4.1628705069.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
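
The hot/slow split described in this commit can be sketched in userspace C as follows. This is an illustration of the pattern only, not the kernel code: `struct req`, `REQ_F_HAS_LTIMEOUT`, `prep_linked_timeout()` and `prep_linked_timeout_slow()` are hypothetical names, and the `noinline` attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>

#define REQ_F_HAS_LTIMEOUT 0x1u  /* hypothetical flag: a linked timeout is pending */

struct req {
	unsigned int flags;
	struct req *link;   /* next request in the link chain, if any */
	int slow_calls;     /* instrumentation for this example only */
};

/* Cold part: kept out of line so the inlined caller stays tiny. */
static __attribute__((noinline)) struct req *prep_linked_timeout_slow(struct req *req)
{
	req->slow_calls++;
	req->flags &= ~REQ_F_HAS_LTIMEOUT;
	return req->link;
}

/* Hot part: a single flag test that is cheap to inline everywhere. */
static inline struct req *prep_linked_timeout(struct req *req)
{
	assert(req != NULL);
	if (!(req->flags & REQ_F_HAS_LTIMEOUT))
		return NULL;
	return prep_linked_timeout_slow(req);
}
```

Requests without the flag pay only for the test and never reach the out-of-line function.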

io_uring: skip request refcounting · 20e60a38

Pavel Begunkov authored Aug 11, 2021

As submission references are gone, there is only one initial reference
left. Instead of actually doing atomic refcounting, add a flag
indicating whether we're going to take more refs or do any other sync
magic. The flag should be set before the request may get used in
parallel.

Together with the previous patch it saves 2 refcount atomics per request
for IOPOLL and IRQ completions, and 1 atomic per req for inline
completions, with some exceptions. In particular, currently, there are
three cases when refcounting has to be enabled:
- Polling, including apoll, because double poll entries take a ref.
  Might get relaxed in the near future.
- Link timeouts, enabled for both the timeout and the request it's
  bound to, because they work in parallel and we need to synchronise
  to cancel one of them on completion.
- When a request gets into io-wq, because it doesn't hold uring_lock and
  we need guarantees of submission references.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8b204b6c5f6643062270a1913d6d3a7f8f795fd9.1628705069.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
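
The flag-gated refcounting described above can be sketched in userspace C11. This is a simplified model under stated assumptions, not the kernel implementation: the structure layout and the helper names are illustrative, and the flag is assumed to be set while the request is still owned by a single context.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define REQ_F_REFCOUNT 0x1u  /* illustrative flag: atomic refcounting is enabled */

struct req {
	unsigned int flags;
	atomic_int refs;
};

/* A fresh request has one implicit reference and no atomic accounting. */
static void req_init(struct req *req)
{
	req->flags = 0;
	atomic_init(&req->refs, 0);
}

/* Enable refcounting; must run before the request can be used in parallel. */
static void req_set_refcount(struct req *req)
{
	if (!(req->flags & REQ_F_REFCOUNT)) {
		req->flags |= REQ_F_REFCOUNT;
		atomic_store(&req->refs, 1);  /* the initial reference */
	}
}

static void req_ref_get(struct req *req)
{
	assert(req->flags & REQ_F_REFCOUNT);
	atomic_fetch_add(&req->refs, 1);
}

/* Returns true when the caller dropped the last reference. */
static bool req_ref_put_and_test(struct req *req)
{
	if (!(req->flags & REQ_F_REFCOUNT))
		return true;  /* sole owner, so no atomic is needed */
	return atomic_fetch_sub(&req->refs, 1) == 1;
}
```

A request that never enables the flag is freed on its first put without touching the atomic counter at all.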

io_uring: remove submission references · 5d5901a3

Pavel Begunkov authored Aug 11, 2021

Requests are by default given two references, submission and
completion. Completion references are straightforward: they represent
request ownership and are put when a request is completed or so.
Submission references are a bit trickier. They're needed because once
io_issue_sqe() has gone deep into the submission stack (e.g. into fs,
block, drivers, etc.), the request may have been given away for
concurrent execution or already completed, while the code unwinding back
to io_issue_sqe() may still be accessing some pieces of the request,
e.g. file or iov.

Now, we prevent such async/in-depth completions by pushing requests
through task_work. Punting to io-wq is also done through task_works,
apart from a couple of cases with a pretty well known context. So,
there are two cases:
1) io_issue_sqe() from the task context and protected by ->uring_lock.
Requests either return to io_uring or are handed to task_work, which
won't be executed because we're currently controlling that task. So,
we can be sure that requests stay alive all the time and we don't
need submission references to pin them.

2) io_issue_sqe() from io-wq, which doesn't hold the mutex. The role of
the submission reference is played by the io-wq reference, which is put
by io_wq_submit_work(). Hence, it should be fine.

Considering that, we can carefully kill the submission reference.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6b68f1c763229a590f2a27148aee77767a8d7750.1628705069.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: remove req_ref_sub_and_test() · 91c2f697

Pavel Begunkov authored Aug 11, 2021

Soon, we won't need to put several references at once, so remove
req_ref_sub_and_test() and the @nr argument from io_put_req_deferred(),
and put the rest of the references by hand.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1868c7554108bff9194fb5757e77be23fadf7fc0.1628705069.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: move req_ref_get() and friends · 21c843d5

Pavel Begunkov authored Aug 11, 2021

Move all request refcount helpers to avoid forward declarations in the
future.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/89fd36f6f3fe5b733dfe4546c24725eee40df605.1628705069.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: remove IRQ aspect of io_ring_ctx completion lock · 79ebeaee

Jens Axboe authored Aug 10, 2021

We have no hard/soft IRQ users of this lock left, remove any IRQ
disabling/saving and restoring when grabbing this lock.

This is straightforward with no users entering with IRQs disabled
anymore. The only thing to look out for is the waitqueue poll head
lock, which nests inside the completion lock. That needs IRQs disabled,
and hence we have to do that now instead of relying on the outer lock
doing so.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: run regular file completions from task_work · 8ef12efe

Jens Axboe authored Aug 10, 2021

This is in preparation for making the completion lock work outside of
hard/soft IRQ context.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: run linked timeouts from task_work · 89b263f6

Jens Axboe authored Aug 10, 2021

This is in preparation for making the completion lock work outside of
hard/soft IRQ context.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: run timeouts from task_work · 89850fce

Jens Axboe authored Aug 10, 2021

This is in preparation for making the completion lock work outside of
hard/soft IRQ context.

Add a timeout_lock to handle the ordering of timeout completions or
cancellations with the timeouts actually triggering.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: remove file batch-get optimisation · 62906e89

Pavel Begunkov authored Aug 10, 2021

For requests with non-fixed files, instead of grabbing just one
reference, we grab as many as there are requests left, so the following
requests using the same file can take it without atomics.

However, it's not a pure win. If there is one request in the middle
not using files or having a fixed file, we'll need to put back the
leftover references. Even worse, if an application submits requests
dealing with different files, it will do a put for each new request,
doubling the number of atomics needed. Also, even if not used, it still
takes some cycles in the submission path.

If a file is used many times, it makes more sense to pre-register it;
if not, we may fall into the described pitfall. So, this optimisation
is a matter of use case. Go with the simplest way code-wise and remove
it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
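
The trade-off described in this message can be checked with a small userspace cost model. It only counts submission-side get/put operations for a batch of file descriptors and is a sketch of the behaviour described above, not the removed kernel code; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count atomic operations when references are taken in batches. */
static unsigned long atomics_batched(const int *fds, unsigned int n)
{
	unsigned long atomics = 0;
	unsigned int cached_refs = 0;
	int cached_fd = -1;

	assert(fds != NULL || n == 0);
	for (unsigned int i = 0; i < n; i++) {
		unsigned int ios_left = n - i;

		if (cached_refs && cached_fd == fds[i]) {
			cached_refs--;      /* reuse a pre-taken reference */
			continue;
		}
		if (cached_refs) {
			atomics++;          /* put back the unused references */
			cached_refs = 0;
		}
		atomics++;              /* one get covering all requests left */
		cached_fd = fds[i];
		cached_refs = ios_left - 1;
	}
	if (cached_refs)
		atomics++;              /* final put of any leftovers */
	return atomics;
}

/* Without batching there is exactly one get per request. */
static unsigned long atomics_simple(unsigned int n)
{
	return n;
}
```

With four requests on the same file the batched variant needs a single atomic, but with files alternating on every request it needs seven where the simple variant needs four.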

io_uring: clean up tctx_task_work() · 6294f368

Pavel Begunkov authored Aug 10, 2021

After recent fixes, tctx_task_work() always does proper spinlocking
before looking into ->task_list, so we no longer need atomics for
->task_state. Replace it with a non-atomic task_running protected by the
critical section.

Tidy it up, combine the two separate spinlocked blocks, and always try
to splice in there, so we do less locking when new requests arrive
during the function execution.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
[axboe: fix missing ->task_running reset on task_work_add() failure]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: inline io_poll_remove_waitqs · 5d709043

Pavel Begunkov authored Aug 09, 2021

Inline io_poll_remove_waitqs() into its only user and clean it up.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2f1a91a19ffcd591531dc4c61e2f11c64a2d6a6d.1628536684.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: remove extra argument for overflow flush · 90f67366

Pavel Begunkov authored Aug 09, 2021

Unlike __io_cqring_overflow_flush(), nobody does forced flushing with
io_cqring_overflow_flush(), so remove the argument from it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7594f869ca41b7cfb5a35a3c7c2d402242834e9e.1628536684.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: inline struct io_comp_state · cd0ca2e0

Pavel Begunkov authored Aug 09, 2021

Inline struct io_comp_state into struct io_submit_state. They are
already tightly coupled, and with their mixed responsibilities, keeping
them separate only brings confusion.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e55bba77426b399e3a2e54e3c6c267c6a0fc4b57.1628536684.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: use inflight_entry instead of compl.list · bb943b82

Pavel Begunkov authored Aug 09, 2021

req->compl.list is used to cache freed requests, and so can't overlap in
time with req->inflight_entry. So, use inflight_entry to link requests
and remove compl.list.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e430e79d22d70a190d718831bda7bfed1daf8976.1628536684.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: remove redundant args from cache_free · 7255834e

Pavel Begunkov authored Aug 09, 2021

We don't use the @tsk argument of io_req_cache_free(), remove it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6a28b4a58ee0aaf0db98e2179b9c9f06f9b0cca1.1628536684.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: cache __io_free_req()'d requests · c34b025f

Pavel Begunkov authored Aug 09, 2021

Don't kfree requests in __io_free_req() but put them back into the
internal request cache. That makes allocations more sustainable and will
be used for refcounting optimisations.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9f4950fbe7771c8d41799366d0a3a08ac3040236.1628536684.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
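
The idea of recycling freed requests instead of releasing them can be sketched as a simple free list in userspace C. This is an illustration under assumptions, not the kernel's request cache: `struct req_cache` and its helpers are hypothetical, and `malloc()`/`free()` stand in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>

struct req {
	struct req *next_free;  /* links requests while they sit in the cache */
	int data;
};

struct req_cache {
	struct req *free_list;
	unsigned int nr_cached;
};

/* Prefer a cached request; fall back to the allocator when the cache is empty. */
static struct req *req_alloc(struct req_cache *cache)
{
	struct req *req = cache->free_list;

	if (req) {
		cache->free_list = req->next_free;
		cache->nr_cached--;
		return req;
	}
	return malloc(sizeof(*req));
}

/* "Free" a request by putting it back into the cache for later reuse. */
static void req_free(struct req_cache *cache, struct req *req)
{
	assert(req != NULL);
	req->next_free = cache->free_list;
	cache->free_list = req;
	cache->nr_cached++;
}

/* Release everything still cached, e.g. on teardown. */
static void req_cache_destroy(struct req_cache *cache)
{
	while (cache->free_list) {
		struct req *req = cache->free_list;

		cache->free_list = req->next_free;
		free(req);
	}
	cache->nr_cached = 0;
}
```

A request freed this way is handed straight back by the next allocation without going through the allocator.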

io_uring: move io_fallback_req_func() · f56165e6

Pavel Begunkov authored Aug 09, 2021

Move io_fallback_req_func() to kill yet another forward declaration.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d0a8f9d9a0057ed761d6237167d51c9378798d2d.1628536684.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: optimise putting task struct · e9dbe221

Pavel Begunkov authored Aug 09, 2021

We cache all the references to task + tctx, so if io_put_task() is
called by the corresponding task itself, we can save on atomics and
return the refs right back into the cache.

It's beneficial for all inline completions, and also iopolling, when
polling and submissions are done by the same task, including
SQPOLL|IOPOLL.

Note: io_uring_cancel_generic() can return refs to the cache as well,
so those should be flushed in the loop for tctx_inflight() to work
right.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6fe9646b3cb70e46aca1f58426776e368c8926b3.1628471125.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
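
A rough userspace model of the reference cache described above, written in C11. All names (`struct task_ctx`, `REFS_BATCH`, the helpers) are hypothetical, the calling task is passed in explicitly as `current`, and the atomic counter stands in for whatever accounting the kernel really uses.

```c
#include <assert.h>
#include <stdatomic.h>

#define REFS_BATCH 8  /* hypothetical refill size */

struct task_ctx {
	atomic_long inflight;  /* references accounted with atomics */
	long cached_refs;      /* pre-taken references, touched only by the owner */
};

static void task_ctx_init(struct task_ctx *t)
{
	atomic_init(&t->inflight, 0);
	t->cached_refs = 0;
}

/* Owner-only: take one reference, refilling the cache in bulk when empty. */
static void task_get_ref(struct task_ctx *t)
{
	if (t->cached_refs == 0) {
		atomic_fetch_add(&t->inflight, REFS_BATCH);
		t->cached_refs = REFS_BATCH;
	}
	t->cached_refs--;
}

/* Drop nr references; 'current' is the context of the calling task. */
static void task_put_refs(struct task_ctx *t, const struct task_ctx *current, long nr)
{
	assert(nr > 0);
	if (t == current)
		t->cached_refs += nr;               /* same task: no atomic needed */
	else
		atomic_fetch_sub(&t->inflight, nr); /* foreign task: atomic put */
}

/* Owner-only: return cached references so 'inflight' shows real usage. */
static void task_refs_flush(struct task_ctx *t)
{
	atomic_fetch_sub(&t->inflight, t->cached_refs);
	t->cached_refs = 0;
}
```

The flush step mirrors the note in the message: until cached references are returned, the inflight count overstates how many requests are really outstanding.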

io_uring: drop exec checks from io_req_task_submit · af066f31

Pavel Begunkov authored Aug 09, 2021

In case of on-exec io_uring cancellations, tasks already wait for all
submitted requests to get completed/cancelled, so we don't need to check
for ->in_execve separately.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/be8707049f10df9d20ca03dc4ca3316239b5e8e0.1628471125.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: kill unused IO_IOPOLL_BATCH · bbbca094

Pavel Begunkov authored Aug 09, 2021

IO_IOPOLL_BATCH is not used, delete it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b2bdf19dbee2c9fc8865bbab9412135a14e24a64.1628471125.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: improve ctx hang handling · 58d3be2c

Pavel Begunkov authored Aug 09, 2021

If io_ring_exit_work() can't get it done in 5 minutes, something is
going very wrong. Don't keep spinning at a HZ / 20 rate: it doesn't help,
and it may take a lot of CPU time if there are a lot of workers stuck
like that.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9e2d1ca81d569f6bc628af1a42ff6663bff7ce9c.1628471125.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
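
The back-off idea can be sketched as a small interval helper in userspace C. `TICKS_PER_SEC` is a hypothetical stand-in for HZ, and the slow rate of one attempt per minute is an assumption made for this example; the message above only states that the HZ / 20 rate should not be kept.

```c
#include <assert.h>

#define TICKS_PER_SEC 100UL  /* hypothetical tick rate, stands in for HZ */

/* Pick the delay, in ticks, before the next exit attempt. */
static unsigned long exit_retry_interval(unsigned long elapsed_ticks)
{
	const unsigned long hang_after = 5UL * 60UL * TICKS_PER_SEC;  /* 5 minutes */

	if (elapsed_ticks > hang_after)
		return 60UL * TICKS_PER_SEC;  /* assumed slow rate: once a minute */
	return TICKS_PER_SEC / 20;        /* normal rate: 20 attempts a second */
}
```

With many stuck workers, the slow branch turns twenty wakeups per second per worker into one per minute.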

io_uring: deduplicate open iopoll check · d3fddf6d

Pavel Begunkov authored Aug 09, 2021

Move IORING_SETUP_IOPOLL check into __io_openat_prep(), so both openat
and openat2 reuse it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9a73ce83e4ee60d011180ef177eecef8e87ff2a2.1628471125.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: inline io_free_req_deferred · 543af3a1

Pavel Begunkov authored Aug 09, 2021

Inline io_free_req_deferred(), there is no reason to keep it separated.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ce04b7180d4eac0d69dd00677b227eefe80c2cc5.1628471125.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: move io_rsrc_node_alloc() definition · b9bd2bea

Pavel Begunkov authored Aug 09, 2021

Move the function together with io_rsrc_node_ref_zero() in the source
file as it is to get rid of forward declarations.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4d81f6f833e7d017860b24463a9a68b14a8a5ed2.1628471125.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: move io_put_task() definition · 6a290a14

Pavel Begunkov authored Aug 09, 2021

Move the function in the source file as it is to get rid of forward
declarations.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/33d917d69e4206557c75a5b98fe22bcdf77ce47d.1628471125.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: extract a helper for ctx quiesce · e73c5c7c

Pavel Begunkov authored Aug 09, 2021

Refactor __io_uring_register() by extracting a helper responsible for
ctx quiesce. Looks better and will make it easier to add more
optimisations.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0339e0027504176be09237eefa7945bf9a6f153d.1628471125.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: optimise io_cqring_wait() hot path · 90291099

Pavel Begunkov authored Aug 09, 2021

Turns out we always init struct io_wait_queue in io_cqring_wait(), even
if it's not used afterwards, i.e. when there are already enough CQEs.
Often that's exactly what happens: for instance, requests may have been
completed inline, or in case of io_uring_enter(submit=N, wait=1).

It shows up in my profiler, so optimise it by delaying the struct init.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6f1b81c60b947d165583dc333947869c3d85d037.1628471125.git.asml.silence@gmail.com
[axboe: fixed up for new cqring wait]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
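
Delaying the initialisation until after the fast-path check can be sketched like this in userspace C. `struct ring`, `struct wait_queue` and `cqring_wait()` are simplified, hypothetical stand-ins; no sleeping is modelled, and the `wq_inits` counter exists only to make the effect observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ring {
	unsigned int cq_head;    /* consumer position */
	unsigned int cq_tail;    /* producer position */
	unsigned long wq_inits;  /* instrumentation for this example only */
};

struct wait_queue {
	struct ring *ring;
	unsigned int cq_tail_target;
};

static unsigned int cq_ready(const struct ring *r)
{
	return r->cq_tail - r->cq_head;
}

/* Returns true if enough CQEs are ready, false if the caller would sleep. */
static bool cqring_wait(struct ring *r, unsigned int min_events)
{
	struct wait_queue wq;

	assert(r != NULL);
	/* Fast path: enough CQEs already, so never touch the wait queue. */
	if (cq_ready(r) >= min_events)
		return true;

	/* Slow path only: pay for the initialisation when it is needed. */
	wq.ring = r;
	wq.cq_tail_target = r->cq_head + min_events;
	r->wq_inits++;

	/* A real implementation would sleep on wq; the sketch re-checks once. */
	return cq_ready(wq.ring) >= wq.cq_tail_target - r->cq_head;
}
```

Callers that find their completions already posted return before any wait-queue state is set up.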

io_uring: add more locking annotations for submit · 282cdc86

Pavel Begunkov authored Aug 09, 2021

Add more annotations for submission path functions holding ->uring_lock.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/128ec4185e26fbd661dd3a424aa66108ee8ff951.1628471125.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: don't halt iopoll too early · a2416e1e

Pavel Begunkov authored Aug 09, 2021

IOPOLL users should care more about getting completions for the requests
they submitted than about "device did/completed something". Currently,
io_do_iopoll() may return a positive number, which will instruct
io_iopoll_check() to break the loop and end the syscall, even if there
are not enough CQEs or none at all.

Don't return positive numbers, so io_iopoll_check() exits only when it
gets an actual error, needs to reschedule, or has enough CQEs.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/641a88f751623b6758303b3171f0a4141f06726e.1628471125.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
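
The loop contract described above can be modelled in userspace C with a scripted device. `struct poll_dev`, `do_iopoll()` and `iopoll_check()` are hypothetical simplifications: the reschedule condition is not modelled, and running out of script is reported as -EAGAIN purely for the sake of the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Scripted device: each poll yields per_poll[pos] completions or a -errno. */
struct poll_dev {
	const int *per_poll;
	size_t n;
	size_t pos;
};

/* Returns 0 or a negative error, never a positive "did something" count. */
static int do_iopoll(struct poll_dev *dev, unsigned int *nr_events)
{
	int v;

	if (dev->pos >= dev->n)
		return -EAGAIN;  /* script exhausted */
	v = dev->per_poll[dev->pos++];
	if (v < 0)
		return v;
	*nr_events += (unsigned int)v;
	return 0;
}

/* Keep polling until enough CQEs arrived or a real error shows up. */
static int iopoll_check(struct poll_dev *dev, unsigned int min, unsigned int *nr_events)
{
	int ret = 0;

	assert(dev != NULL && nr_events != NULL);
	*nr_events = 0;
	while (*nr_events < min) {
		ret = do_iopoll(dev, nr_events);
		if (ret)
			break;
	}
	return ret;
}
```

Because a poll that reaps something still returns 0, the loop ends only on an error or once the requested number of CQEs is available.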

io_uring: refactor io_alloc_req · 864ea921

Pavel Begunkov authored Aug 09, 2021

Replace the main if of io_flush_cached_reqs() with inverted condition +
goto, so all the cases are handled in the same way. And also extract
io_preinit_req() to make it cleaner and easier to refer to.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1abcba1f7b55dc53bf1dbe95036e345ffb1d5b01.1628471125.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io-wq: improve wq_list_add_tail() · 8724dd8c

Pavel Begunkov authored Aug 09, 2021

Prepare nodes that we're going to add before actually linking them. It's
always safer and costs us nothing.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f7e53f0c84c02ed6748c488ed0789b98f8cc6185.1628471125.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
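
"Prepare before linking" can be shown on a minimal singly linked tail list in userspace C. The types and the function name below are hypothetical and only illustrate the ordering; this is not the io-wq list code.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *next;
};

struct list {
	struct node *first;
	struct node *last;
};

static void list_add_tail(struct node *node, struct list *list)
{
	assert(node != NULL && list != NULL);

	/* Fully initialise the node first, while nothing else can reach it. */
	node->next = NULL;

	/* Only then make it visible through the list. */
	if (!list->first) {
		list->last = node;
		list->first = node;
	} else {
		list->last->next = node;
		list->last = node;
	}
}
```

Clearing `next` up front means that anyone who can already see the node through the list never observes a stale pointer left over from a previous use.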

io_uring: remove unnecessary PF_EXITING check · 2215bed9

Pavel Begunkov authored Aug 09, 2021

We prefer normal task_works even if it would fail requests inside. Kill
the PF_EXITING check in io_req_task_work_add(); task_work_add() handles
dying tasks well, i.e. it returns an error when it can't enqueue due to
late stages of do_exit().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/fc14297e8441cd8f5d1743a2488cf0df09bf48ac.1628471125.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: clean io-wq callbacks · ebc11b6c

Pavel Begunkov authored Aug 09, 2021

Move io-wq callbacks closer to each other, so it's easier to work with
them, and rename io_free_work() into io_wq_free_work() for consistency.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/851bbc7f0f86f206d8c1333efee8bcb9c26e419f.1628471125.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: avoid touching inode in rw prep · c97d8a0f

Pavel Begunkov authored Aug 09, 2021

If we use fixed files, we can be sure (almost) that REQ_F_ISREG is set.
However, for non-reg files io_prep_rw() will still look into the inode
to double check, and that's expensive and can be avoided.

The only caveat is that it only currently works with 64+ bit
architectures, see FFS_ISREG, so we should consider that.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0a62780c491ca2522cd52db4ae3f16e03aafed0f.1628471125.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
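
One way to see why such a flag depends on pointer width is pointer tagging: flags are kept in the low bits of an aligned pointer, and a third bit is only free when objects are at least 8-byte aligned. The userspace C11 sketch below assumes that scheme. Only the name FFS_ISREG comes from the message above; `FFS_NOWAIT_READ`, `FFS_NOWAIT_WRITE`, the bit values, `struct file` and the helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FFS_NOWAIT_READ  ((uintptr_t)0x1)  /* hypothetical tag bit */
#define FFS_NOWAIT_WRITE ((uintptr_t)0x2)  /* hypothetical tag bit */
#if UINTPTR_MAX > 0xffffffffUL
#define FFS_ISREG ((uintptr_t)0x4)  /* third bit is free with 8-byte alignment */
#else
#define FFS_ISREG ((uintptr_t)0x0)  /* no spare bit: the flag is never cached */
#endif
#define FFS_FLAGS (FFS_NOWAIT_READ | FFS_NOWAIT_WRITE | FFS_ISREG)

struct file {
	_Alignas(8) long long pad;  /* stand-in object with 8-byte alignment */
};

/* Store the pointer and its flags in a single word. */
static uintptr_t fixed_file_pack(struct file *f, uintptr_t flags)
{
	assert(((uintptr_t)f & FFS_FLAGS) == 0);  /* low bits must be free */
	return (uintptr_t)f | (flags & FFS_FLAGS);
}

static struct file *fixed_file_ptr(uintptr_t slot)
{
	return (struct file *)(slot & ~FFS_FLAGS);
}

/* True only where the bit exists and was set when the file was registered. */
static bool fixed_file_isreg(uintptr_t slot)
{
	return FFS_ISREG != 0 && (slot & FFS_ISREG) != 0;
}
```

On narrower targets the helper always reports false, so a caller would have to fall back to the slower check.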

io_uring: rename io_file_supports_async() · b191e2df

Pavel Begunkov authored Aug 09, 2021

io_file_supports_async() checks whether a file supports nowait
operations, so "async" in the name is misleading. Rename it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/33d55b5ce43aa1884c637c1957f1e30d30dc3bec.1628471125.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
