- 24 Apr, 2022 5 commits
-
-
Pavel Begunkov authored
There is a new (req->flags & REQ_F_POLLED) check in __io_submit_flush_completions() for poll recycling, however io_free_batch_list() is a much better place for it. First, we prefer it after putting the last req ref just to avoid potential problems in the future. Second, it enables the recycling for IOPOLL and places the check closer to the rest of the req->flags cleanup. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/31dfe1dafda66ba3ce36b301884ec7e162c777d1.1647897811.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
We do several req->flags checks in the fast path of io_free_batch_list(). One explicit check of REQ_F_REFCOUNT, and two others hidden in io_queue_next() and io_dismantle_req(). Moreover, there is an io_req_put_rsrc_locked() call in between, so there is no hope req->flags will be preserved in registers. All those flags mark, if not a slow path, then definitely a slower path, so put them all under a single flags mask check and save several mem reloads and ifs. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0fb493f73f2009aea395c570c2932fecaa4e1244.1647897811.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Move the fast path from io_req_find_next() into callers. It prepares us for further changes. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/10bd0e564472dde0c7f8d90ae317d05356cd565a.1647897811.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Now io_commit_cqring() is simple and it tolerates being called without a new CQE filled, so kill a bunch of guards that are no longer needed. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/36aed692dff402bba00a444a63a9cd2e97a340ea.1647897811.git.asml.silence@gmail.com [axboe: fold in followup fix] Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
There should be no completions stashed when we first get into tctx_task_work(), so move completion flushing checks a bit later after we had a chance to execute some task works. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c6765c804f3c438591b9825ab9c43d22039073c4.1647897811.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 17 Apr, 2022 2 commits
-
-
Pavel Begunkov authored
If all completed requests in io_do_iopoll() were marked with REQ_F_CQE_SKIP, we'll not only skip CQE posting but also io_free_batch_list(), leaking memory and resources. Move @nr_events increment before REQ_F_CQE_SKIP check. We'll potentially return a value greater than the real one, but iopolling will deal with it and the userspace will re-iopoll if needed. Anyway, I don't think there are many use cases for REQ_F_CQE_SKIP + IOPOLL. Fixes: 83a13a41 ("io_uring: tweak iopoll CQE_SKIP event counting") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5072fc8693fbfd595f89e5d4305bfcfd5d2f0a64.1650186611.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
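The counting fix above can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: `DEMO_F_CQE_SKIP`, `struct demo_req`, `struct demo_stats`, and `demo_reap()` are hypothetical stand-ins, and freeing is modelled as a counter rather than real resource release.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_F_CQE_SKIP (1u << 0)   /* hypothetical stand-in for REQ_F_CQE_SKIP */

struct demo_req {
    uint32_t flags;
};

struct demo_stats {
    int cqes_posted;    /* completions made visible to userspace */
    int reqs_freed;     /* requests handed to the batch free step */
};

/* Walks completed requests and returns the number of events seen. */
int demo_reap(const struct demo_req *reqs, int n, struct demo_stats *st)
{
    int nr_events = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        /* Count first, so a batch made only of skipped CQEs is still freed. */
        nr_events++;
        if (reqs[i].flags & DEMO_F_CQE_SKIP)
            continue;
        st->cqes_posted++;
    }

    /* The free step is gated on nr_events, as in the message above. */
    if (nr_events)
        st->reqs_freed += nr_events;
    return nr_events;
}
```

With two requests that both carry `DEMO_F_CQE_SKIP`, no CQE is posted but both requests are still counted and freed. Incrementing after the skip check would leave `nr_events` at zero and skip the free step.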
-
Jens Axboe authored
We just return failure in this case, but we need to release the iovec first. If we're doing IO with more than FAST_IOV segments, then the iovec is allocated and must be freed. Reported-by: syzbot+96b43810dfe9c3bb95ed@syzkaller.appspotmail.com Fixes: 584b0180 ("io_uring: move read/write file prep state into actual opcode handler") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
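The leak described above follows a common pattern: a small inline array serves short requests, and a heap allocation takes over beyond that. A minimal sketch, assuming hypothetical names throughout (`DEMO_FAST_IOV`, `struct demo_iovec`, `demo_import()`, `demo_rw()`), with a counter standing in for leak detection:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_FAST_IOV 8     /* assumed inline capacity, for illustration only */

struct demo_iovec {
    void *base;
    size_t len;
};

int demo_live_allocs;       /* number of heap iovecs not yet freed */

/* Uses the inline array when it fits, otherwise allocates on the heap. */
struct demo_iovec *demo_import(struct demo_iovec *inline_vecs, size_t nr_segs,
                               struct demo_iovec **heap)
{
    assert(heap != NULL);
    *heap = NULL;
    if (nr_segs <= DEMO_FAST_IOV)
        return inline_vecs;
    *heap = calloc(nr_segs, sizeof(**heap));
    if (*heap)
        demo_live_allocs++;
    return *heap;
}

/* Returns 0 on success and -1 on failure; neither path leaks the iovec. */
int demo_rw(size_t nr_segs, int fail_late)
{
    struct demo_iovec inline_vecs[DEMO_FAST_IOV];
    struct demo_iovec *heap;
    struct demo_iovec *iov = demo_import(inline_vecs, nr_segs, &heap);
    int ret = fail_late ? -1 : 0;   /* a check that fails after the import */

    if (!iov)
        return -1;

    /* The failure path falls through to the same release as success. */
    if (heap) {
        free(heap);
        demo_live_allocs--;
    }
    return ret;
}
```

Calling `demo_rw(16, 1)` takes the heap path and fails late; `demo_live_allocs` is back at zero afterwards because the error return does not bypass the release.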
-
- 15 Apr, 2022 1 commit
-
-
Jens Axboe authored
We need to either restore creds properly if we fail on the file assignment, or just do the file assignment first instead. Let's do the latter as it's simpler, should make no difference here for file assignment. Link: https://lore.kernel.org/lkml/000000000000a7edb305dca75a50@google.com/ Reported-by: syzbot+60c52ca98513a8760a91@syzkaller.appspotmail.com Fixes: 6bf9c47a ("io_uring: defer file assignment") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 13 Apr, 2022 3 commits
-
-
Pavel Begunkov authored
We should not return an error code in req->result in io_poll_check_events(), because it may get mangled and returned as success. Just return the error code directly, the callers will fail the request or proceed accordingly. Fixes: 6bf9c47a ("io_uring: defer file assignment") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5f03514ee33324dc811fb93df84aee0f695fb044.1649862516.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
We pass "unlocked" into io_assign_file() in io_poll_check_events(), which can lead to double locking. Fixes: 6bf9c47a ("io_uring: defer file assignment") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2476d4ae46554324b599ee4055447b105f20a75a.1649862516.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Pass the right issue_flags into io_file_get_fixed() instead of IO_URING_F_UNLOCKED. It's probably not a problem at the moment, but let's play it safe. Fixes: 6bf9c47a ("io_uring: defer file assignment") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7d242daa9df5d776907686977cd29fbceb4a2d8d.1649862516.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 12 Apr, 2022 5 commits
-
-
Dylan Yudaken authored
Ensure that only 0 is passed for pad here. Fixes: c73ebb68 ("io_uring: add timeout support for io_uring_enter()") Signed-off-by:
Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220412163042.2788062-5-dylany@fb.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Dylan Yudaken authored
Only allow resv field to be 0 in struct io_uring_rsrc_update user arguments. Fixes: e7a6c00d ("io_uring: add support for registering ring file descriptors") Signed-off-by:
Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220412163042.2788062-4-dylany@fb.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Dylan Yudaken authored
Verify that the user does not pass in anything but 0 for this field. Fixes: 992da01a ("io_uring: change registration/upd/rsrc tagging ABI") Signed-off-by:
Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220412163042.2788062-3-dylany@fb.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
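The reserved-field checks in the commits above share one shape: copy the user argument, then reject anything non-zero in padding or reserved fields so those bits can be given meaning later. A minimal userspace sketch, assuming a hypothetical `struct demo_update_arg` and using `memcpy()` as a stand-in for `copy_from_user()`:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical argument layout; not the real io_uring ABI structure. */
struct demo_update_arg {
    uint32_t offset;
    uint32_t resv;      /* must be zero */
    uint64_t data;
};

/* Returns 0 on success or -EINVAL for a bad size or a non-zero resv. */
int demo_parse_update(const void *user_buf, size_t len,
                      struct demo_update_arg *out)
{
    assert(user_buf != NULL && out != NULL);

    if (len != sizeof(*out))
        return -EINVAL;

    /* Stand-in for copy_from_user(); validation follows straight after. */
    memcpy(out, user_buf, sizeof(*out));

    if (out->resv)
        return -EINVAL;
    return 0;
}
```

Rejecting non-zero values now means an old kernel will not silently ignore a field that a newer application expects to be honoured.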
-
Dylan Yudaken authored
Move validation to be more consistently straight after copy_from_user. This is already done in io_register_rsrc_update and so this removes that redundant check. Signed-off-by:
Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220412163042.2788062-2-dylany@fb.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
The io-wq work cancellation path can't take uring_lock the way file assignment does, so we have to handle IO_WQ_WORK_CANCEL first. This fixes the hangs that were encountered. Fixes: 6bf9c47a ("io_uring: defer file assignment") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0d9b9f37841645518503f6a207e509d14a286aba.1649773463.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 11 Apr, 2022 4 commits
-
-
Jens Axboe authored
There are two reasons why this isn't the best idea: (1) it's an odd area to grab a bit of storage space from, and (2) it puts the 3rd io_kiocb cacheline into the hot path, where the normal hot path just needs the first two. Use 'cflags' for joint fd/cflags storage. We only need fd until we successfully issue, and we only need cflags once a request is done and is completed. Fixes: 6bf9c47a ("io_uring: defer file assignment") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
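Joint storage for two fields with non-overlapping lifetimes is typically expressed as a union. The sketch below is an illustration under stated assumptions, not the actual io_kiocb layout: `struct demo_req`, the `completed` marker, and the three helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_req {
    union {
        int32_t  fd;        /* valid from prep until the request is issued */
        uint32_t cflags;    /* valid once the request has completed */
    };
    bool completed;         /* hypothetical marker for which member is live */
};

void demo_prep(struct demo_req *req, int32_t fd)
{
    req->fd = fd;
    req->completed = false;
}

/* Reads the fd at issue time; it must not have been overwritten yet. */
int32_t demo_issue_fd(const struct demo_req *req)
{
    assert(!req->completed);
    return req->fd;
}

/* From here on the same four bytes hold the completion flags. */
void demo_complete(struct demo_req *req, uint32_t cflags)
{
    req->cflags = cflags;
    req->completed = true;
}
```

Both members occupy the same four bytes, so the request needs no extra field as long as nothing reads `fd` after completion.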
-
Jens Axboe authored
In preparation for fixing a regression with pulling in an extra cacheline for IO that doesn't usually touch the last cacheline of the io_kiocb, move the cached location of apoll->events to space shared with some other completion data. Like cflags, this isn't used until after the request has been completed, so we can piggy back on top of comp_list. Fixes: 81459350 ("io_uring: cache req->apoll->events in req->cflags") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
-1 tells us to use the current position, but we check if the file is a stream regardless of that. Fix up io_kiocb_update_pos() to only dip into file if we need to. This is both more efficient and also drops 12 bytes of text on aarch64 and 64 bytes on x86-64. Fixes: b4aec400 ("io_uring: do not recalculate ppos unnecessarily") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
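The reordering described above can be sketched as follows. This is a simplified userspace model, assuming hypothetical types (`struct demo_file`, `struct demo_kiocb`) and an `inspected` counter that exists only to show when the file is touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_file {
    bool is_stream;     /* streams have no meaningful position */
    long long f_pos;    /* current file position */
    int inspected;      /* how many times the file was looked at */
};

struct demo_kiocb {
    struct demo_file *file;
    long long pos;      /* -1 means "use the current position" */
};

/* Returns a pointer to the position to use, or NULL for stream files. */
long long *demo_update_pos(struct demo_kiocb *kiocb)
{
    assert(kiocb != NULL && kiocb->file != NULL);

    /* Explicit offset: return early without touching the file at all. */
    if (kiocb->pos != -1)
        return &kiocb->pos;

    kiocb->file->inspected++;
    if (kiocb->file->is_stream) {
        kiocb->pos = 0;
        return NULL;
    }
    kiocb->pos = kiocb->file->f_pos;
    return &kiocb->pos;
}
```

With an explicit offset the function never dereferences the file, which is the saving the message refers to; only the `-1` case pays for the stream check.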
-
Jens Axboe authored
Give applications a way to tell if the kernel supports sane linked files, as in files being assigned at the right time to be able to reliably do <open file direct into slot X><read file from slot X> while using IOSQE_IO_LINK to order them. Not really a bug fix, but flag it as such so that it gets pulled in with backports of the deferred file assignment. Fixes: 6bf9c47a ("io_uring: defer file assignment") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 08 Apr, 2022 1 commit
-
-
Jens Axboe authored
io_flush_timeouts() assumes the timeout isn't in progress of triggering or being removed/canceled, so it unconditionally removes it from the timeout list and attempts to cancel it. Leave it on the list and let the normal timeout cancelation take care of it. Cc: stable@vger.kernel.org # 5.5+ Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 07 Apr, 2022 9 commits
-
-
Pavel Begunkov authored
There are still several places that use pre-array_index_nospec() indexes, fix them up. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b01ef5ee83f72ed35ad525912370b729f5d145f4.1649336342.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
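The point of the fix above is that once an index has been sanitised, every later access should use the sanitised value rather than the original one. The sketch below is a userspace approximation only: `demo_index_nospec()` imitates the branch-free masking idea behind array_index_nospec() as recalled from the generic kernel fallback, it relies on arithmetic right shift of negative values (implementation-defined in C, though common), and it gives no real speculation guarantees outside the kernel. All `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Returns index if index < size, otherwise 0, without a branch. */
unsigned long demo_index_nospec(unsigned long index, unsigned long size)
{
    /* All ones when index < size, zero otherwise (sizes up to LONG_MAX). */
    long mask = ~(long)(index | (size - 1UL - index)) >>
                (sizeof(long) * CHAR_BIT - 1);

    return index & (unsigned long)mask;
}

/* Stores a tag at a user-supplied index; returns -1 if out of range. */
int demo_update_tag(unsigned long *tags, unsigned long nr,
                    unsigned long user_idx, unsigned long tag)
{
    unsigned long i;

    assert(tags != NULL);
    if (user_idx >= nr)
        return -1;

    i = demo_index_nospec(user_idx, nr);
    tags[i] = tag;      /* use the clamped i from here on, never user_idx */
    return 0;
}
```

The bounds check and the clamp are separate steps: the check decides whether the call is valid, the clamp keeps a mispredicted branch from using an out-of-range index.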
-
Pavel Begunkov authored
Automatically default rsrc tag in io_queue_rsrc_removal(), it's safer than leaving it there and relying on the rest of the code to behave and not use it. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1cf262a50df17478ea25b22494dcc19f3a80301f.1649336342.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
It's safer to not touch scm_fp_list after we queued an skb to which it was assigned, there might be races lurking if we screw subtle sync guarantees on the io_uring side. Fixes: 6b06314c ("io_uring: add file set registration") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Don't forget to array_index_nospec() for indexes before updating rsrc tags in __io_sqe_files_update(), just use already safe and precalculated index @i. Fixes: c3bdad02 ("io_uring: add generic rsrc update with tags") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Eugene Syromiatnikov authored
Similarly to the way it is done in the mbind syscall. Cc: stable@vger.kernel.org # 5.14 Fixes: fe76421d ("io_uring: allow user configurable IO thread CPU affinity") Signed-off-by:
Eugene Syromiatnikov <esyr@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This reverts commit adc8682e. There's some discussion on the API not being as good as it can be. Rather than ship something and be stuck with it forever, let's revert the NAPI support for now and work on getting something sorted out for the next kernel release instead. Link: https://lore.kernel.org/io-uring/b7bbc124-8502-0ee9-d4c8-7c41b4487264@kernel.dk/ Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
io_uring tracks requests that are referencing an io_uring descriptor to be able to cancel without worrying about loops in the references. Since we now assign the file at execution time, the easier approach is to drop a potentially problematic reference before we punt the request. This eliminates the need to special case these types of files beyond just marking them as such, and simplifies cancelation quite a bit. This also fixes a recent issue where an async punted tee operation with the io_uring descriptor as the output file would crash when attempting to get a reference to the file from the io-wq worker. We could have worked around that, but this is the much cleaner fix. Fixes: 6bf9c47a ("io_uring: defer file assignment") Reported-by: syzbot+c4b9303500a21750b250@syzkaller.appspotmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If an application uses direct open or accept, it knows in advance what direct descriptor value it will get as it picks it itself. This allows combined requests such as:

    sqe = io_uring_get_sqe(ring);
    io_uring_prep_openat_direct(sqe, ..., file_slot);
    sqe->flags |= IOSQE_IO_LINK | IOSQE_CQE_SKIP_SUCCESS;

    sqe = io_uring_get_sqe(ring);
    io_uring_prep_read(sqe, file_slot, buf, buf_size, 0);
    sqe->flags |= IOSQE_FIXED_FILE;

    io_uring_submit(ring);

where we prepare both a file open and read, and only get a completion event for the read when both have completed successfully. Currently links are fully prepared before the head is issued, but that fails if the dependent link needs a file assigned that isn't valid until the head has completed. Conversely, if the same chain is performed but the fixed file slot is already valid, then we would be unexpectedly returning data from the old file slot rather than the newly opened one. Make sure we're consistent here. Allow deferral of file setup, which makes this documented case work. Cc: stable@vger.kernel.org # v5.15+ Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We'll need this in a future patch, when we could be assigning the file after the prep stage. While at it, get rid of the io_file_get() helper, it just makes the code harder to read. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 04 Apr, 2022 2 commits
-
-
Jens Axboe authored
In preparation for not necessarily having a file assigned at prep time, defer any initialization associated with the file to when the opcode handler is run. Cc: stable@vger.kernel.org # v5.15+ Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
In preparation for not using the file at prep time, defer checking if this file refers to a valid io_uring instance until issue time. This also means we can get rid of the cleanup flag for splice and tee. Cc: stable@vger.kernel.org # v5.15+ Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 03 Apr, 2022 1 commit
-
-
Jens Axboe authored
This is a leftover from the really old days where we weren't able to track and error early if we need a file and it wasn't assigned. Kill the check. Cc: stable@vger.kernel.org # v5.15+ Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 29 Mar, 2022 2 commits
-
-
Jens Axboe authored
In preparation for not using the file at prep time, defer checking if this file refers to a valid io_uring instance until issue time. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We must always call req_set_fail() if the request is failed, otherwise we won't sever links for dependent chains correctly. Fixes: 4f57f06c ("io_uring: add support for IORING_OP_MSG_RING command") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 25 Mar, 2022 5 commits
-
-
Pavel Begunkov authored
When there are no files for __io_sqe_files_scm() to process in the range, it'll free everything and return. However, it forgets to put uid. Fixes: 08a45173 ("io_uring: allow sparse fixed file sets") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/accee442376f33ce8aaebb099d04967533efde92.1648226048.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
io_put_kbuf_comp() should only be called while holding ->completion_lock, however there is no such assumption in io_clean_op() and thus it can corrupt ->io_buffer_comp. Take the lock there, and work around the only user of io_clean_op() calling it with locks. Not the prettiest solution, but it's easier to refactor it for-next. Fixes: cc3cec83 ("io_uring: speedup provided buffer handling") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/743e2130b73ec6d48c4c5dd15db896c433431e6d.1648212967.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
io_req_complete_failed() doesn't require callers to hold ->uring_lock, use IO_URING_F_UNLOCKED version of io_put_kbuf(). The only affected place is the fail path of io_apoll_task_func(). Also add a lockdep annotation to catch such bugs in the future. Fixes: 3b2b78a8 ("io_uring: extend provided buf return to fails") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ccf602dbf8df3b6a8552a262d8ee0a13a086fbc7.1648212967.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Move a misplaced comment about req->creds and add a line with assumptions about req->link. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1e51d1e6b1f3708c2d4127b4e371f9daa4c5f859.1648209006.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Dylan Yudaken authored
When polling sockets for accept, use EPOLLEXCLUSIVE. This is helpful when multiple accept SQEs are submitted. For O_NONBLOCK sockets multiple queued SQEs would previously have all completed at once, but most with -EAGAIN as the result. Now only one wakes up and completes. For sockets without O_NONBLOCK there is no user facing change, but internally the extra requests would previously be queued onto a worker thread as they would wake up with no connection waiting, and be punted. Now they do not wake up unnecessarily. Co-developed-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220325093755.4123343-1-dylany@fb.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-