- 31 May, 2022 7 commits
-
-
Xiaoguang Wang authored
io_file_bitmap_get() returns a free bitmap slot, but if that slot ends up not being used, for example because io_queue_rsrc_removal() returns an error, alloc_hint should not be updated at all: the slot is still a valid candidate for subsequent io_file_bitmap_get() calls. To fix this issue, only update alloc_hint in io_file_bitmap_set(). Signed-off-by:
Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Link: https://lore.kernel.org/r/20220528015109.48039-1-xiaoguang.wang@linux.alibaba.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
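As an illustration (not part of the commit itself), a minimal C sketch of the resulting pattern follows; the helper names match the commit, but the bodies are simplified and illustrative rather than the verbatim kernel code.

/* Simplified sketch of the fixed-file bitmap helpers after this fix. */
static int io_file_bitmap_get(struct io_ring_ctx *ctx)
{
	struct io_file_table *table = &ctx->file_table;
	int bit;

	bit = find_next_zero_bit(table->bitmap, ctx->nr_user_files,
				 table->alloc_hint);
	if (bit == ctx->nr_user_files)
		return -ENFILE;
	/* alloc_hint is deliberately NOT updated here; the slot may go unused */
	return bit;
}

static void io_file_bitmap_set(struct io_file_table *table, int bit)
{
	__set_bit(bit, table->bitmap);
	/* the hint only advances once the slot is actually consumed */
	table->alloc_hint = bit + 1;
}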
-
Xiaoguang Wang authored
io_fixed_fd_install() may fail when there is no free slot in the fixed file bitmap; in that case, fput() needs to be called correspondingly. Signed-off-by:
Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Link: https://lore.kernel.org/r/20220527025400.51048-1-xiaoguang.wang@linux.alibaba.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
The socket support was merged in an earlier branch that didn't yet have support for allocating direct descriptors, hence only open and accept got support for that. Do the one-liner to enable it now, so we have consistent support for any request that can instantiate a file/direct descriptor. Reviewed-by:
Hao Xu <howeyxu@tencent.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If we use a buffer group ID that is large enough to require io_uring to allocate it, then we don't correctly free it if the cleanup is deferred to the ring exit. The explicit removal paths are fine. Fixes: 9cfc7e94 ("io_uring: get rid of hashed provided buffer groups") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Gets rid of some ifdefs and enables use of the net defines for when CONFIG_NET isn't set. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Make them consistent in preparation for defining a req async prep handler. The readv/writev requests share a prep handler; move it one level down so the initial one is consistent with the others. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Define and set it when appropriate, and use it consistently in the function rather than using io_op_defs[opcode]. Reviewed-by:
Kanchan Joshi <joshi.k@samsung.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 25 May, 2022 2 commits
-
-
Jens Axboe authored
Almost all of them are; the odd ones out are the poll remove and the files update requests. Name them like the others, which is: io_#cmdname_prep for request preparation and io_#cmdname for request issue. Reviewed-by:
Kanchan Joshi <joshi.k@samsung.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
All other opcodes take a {req, sqe} set for prep handling; split out a timeout prep handler so that timeouts and linked timeouts can use the same one. Reviewed-by:
Kanchan Joshi <joshi.k@samsung.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 21 May, 2022 1 commit
-
-
Jens Axboe authored
Rather than pass in a bool for whether or not this work item needs to go into the priority list, provide separate helpers for it. For most use cases, this also gets rid of the branch for non-priority task work. While at it, rename prior_task_list to prio_task_list. Prior is a confusing name for it, as it would seem to indicate that this is the previous task_work list. prio makes it clear that this is a priority task_work list. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 18 May, 2022 13 commits
-
-
Jens Axboe authored
It's nonsensical to register a provided buffer ring if a classic provided buffer group with the same ID exists. Depending on the order in which we decide which type to pick, the other type will never get used. Explicitly disallow it and return an error if this is attempted. Fixes: c7fb1942 ("io_uring: add support for ring mapped supplied buffers") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We use ->buf_pages != 0 to tell if this is a shared buffer ring or a classic provided buffer group. If we unregister the shared ring and then attempt to use it, buf_pages is zero yet the classic list head isn't properly initialized. This causes io_buffer_select() to think that we have classic buffers available, but then we crash when we try and get one from the list. Just initialize the list if we unregister a shared buffer ring, leaving it in a sane state for either re-registration or for attempting to use it. And do the same for the initial setup from the classic path. Fixes: c7fb1942 ("io_uring: add support for ring mapped supplied buffers") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Honour IORING_RSRC_REGISTER_SPARSE not only for direct files but fixed buffers as well. It makes the rsrc API more consistent. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/66f429e4912fe39fb3318217ff33a2853d4544be.1652879898.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Accessing the file table needs a rcu_dereference_protected(). Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-7-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
POLL* are unannotated values for the userspace ABI, while everything in-kernel should use EPOLL* and the __poll_t type. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-6-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
apoll_events is fed to vfs_poll and the poll tables, so it should be a __poll_t. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-5-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
io_file_get_normal isn't marked inline, so don't claim it as such in the forward declaration. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-4-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
ERR_PTR abuses the high bits of a pointer to transport error information. This is only safe for kernel pointers and not user pointers. Fix io_buffer_select and its helpers to just return NULL for failure and get rid of this abuse. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-3-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Use the proper type. Signed-off-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-2-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Provided buffers allow an application to supply io_uring with buffers that can then be grabbed for a read/receive request when the data source is ready to deliver data. The existing scheme relies on using IORING_OP_PROVIDE_BUFFERS to do that, but it can be difficult to use in real-world applications. It's pretty efficient if the application is able to supply back batches of provided buffers when they have been consumed and the application is ready to recycle them, but if fragmentation occurs in the buffer space, it can become difficult to supply enough buffers at a time. This hurts efficiency. Add a register op, IORING_REGISTER_PBUF_RING, which allows an application to set up a shared queue for each buffer group of provided buffers. The application can then supply buffers simply by adding them to this ring, and the kernel can consume them just as easily. The ring shares the head with the application; the tail remains private to the kernel. Provided buffers set up with IORING_REGISTER_PBUF_RING cannot use IORING_OP_{PROVIDE,REMOVE}_BUFFERS for adding or removing entries from the ring, they must use the mapped ring. Mapped provided buffer rings can co-exist with normal provided buffers, just not within the same group ID. To gauge the overhead of the existing scheme and evaluate the mapped ring approach, a simple NOP benchmark was written. It uses a ring of 128 entries and submits/completes 32 at a time. 'Replenish' is how many buffers are provided back at a time after they have been consumed:

Test                   Replenish    NOPs/sec
================================================================
No provided buffers    NA           ~30M
Provided buffers       32           ~16M
Provided buffers       1            ~10M
Ring buffers           32           ~27M
Ring buffers           1            ~27M

The ring mapped buffers perform almost as well as not using provided buffers at all, and they don't care whether you provide 1 or more back at the same time. This means applications can just replenish as they go, rather than needing to batch and compact, further reducing overhead in the application. The NOP benchmark above doesn't need to do any compaction, so that overhead isn't even reflected in the above test. Co-developed-by:
Dylan Yudaken <dylany@fb.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
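As a usage illustration (not part of the commit): a sketch of registering and populating a mapped provided-buffer ring, assuming liburing 2.2+ helpers (io_uring_register_buf_ring() and friends); BGID and setup_pbuf_ring() are hypothetical names and error handling is omitted.

#include <liburing.h>
#include <stdlib.h>

#define BGID		7	/* arbitrary buffer group ID for this example */
#define ENTRIES		128	/* ring size, must be a power of two */
#define BUF_SIZE	4096

/* 'bufs' points at ENTRIES * BUF_SIZE bytes of application data buffers */
static struct io_uring_buf_ring *setup_pbuf_ring(struct io_uring *ring, char *bufs)
{
	struct io_uring_buf_ring *br;
	struct io_uring_buf_reg reg = { };
	int i;

	/* the shared ring must be page aligned; the kernel maps these pages */
	if (posix_memalign((void **)&br, 4096, ENTRIES * sizeof(struct io_uring_buf)))
		return NULL;

	reg.ring_addr = (unsigned long)br;
	reg.ring_entries = ENTRIES;
	reg.bgid = BGID;
	if (io_uring_register_buf_ring(ring, &reg, 0))
		return NULL;

	/* hand every buffer to the kernel: fill the ring, then publish the tail */
	io_uring_buf_ring_init(br);
	for (i = 0; i < ENTRIES; i++)
		io_uring_buf_ring_add(br, bufs + i * BUF_SIZE, BUF_SIZE, i,
				      io_uring_buf_ring_mask(ENTRIES), i);
	io_uring_buf_ring_advance(br, ENTRIES);
	return br;
}

A read/receive SQE then opts in with IOSQE_BUFFER_SELECT and sqe->buf_group = BGID; the application recycles consumed buffers by adding them back to the ring and advancing the tail again.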
-
Jens Axboe authored
Abstract this out from io_sqe_buffer_register() so we can use it elsewhere too without duplicating this code. No intended functional changes in this patch. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Obviously not really useful since it's not transferring data, but it is helpful in benchmarking overhead of provided buffers. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
io_provided_buffer_select() must drop the submit lock, if needed, even in the error handling case. Failure to do so will leave us with the ctx->uring_lock held, causing spew like:

====================================
WARNING: iou-wrk-366/368 still has locks held!
5.18.0-rc6-00294-gdf8dc7004331 #994 Not tainted
------------------------------------
1 lock held by iou-wrk-366/368:
#0: ffff0000c72598a8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_ring_submit_lock+0x20/0x48

stack backtrace:
CPU: 4 PID: 368 Comm: iou-wrk-366 Not tainted 5.18.0-rc6-00294-gdf8dc7004331 #994
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace.part.0+0xa4/0xd4
show_stack+0x14/0x5c
dump_stack_lvl+0x88/0xb0
dump_stack+0x14/0x2c
debug_check_no_locks_held+0x84/0x90
try_to_freeze.isra.0+0x18/0x44
get_signal+0x94/0x6ec
io_wqe_worker+0x1d8/0x2b4
ret_from_fork+0x10/0x20

and triggering later hangs off get_signal() because we attempt to re-grab the lock. Reported-by: syzbot+987d7bb19195ae45208c@syzkaller.appspotmail.com Fixes: 149c69b0 ("io_uring: abstract out provided buffer list selection") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 17 May, 2022 1 commit
-
-
Jens Axboe authored
We gate whether to IOPOLL for a request on whether the opcode is allowed on a ring set up for IOPOLL and whether it has a file assigned. MSG_RING is the only one that allows a file yet isn't pollable; it's merely supported to allow communication on an IOPOLL ring, not because we can poll for its completion. Put the assigned file early and clear it, so we don't attempt to poll for it. Reported-by: syzbot+1a0a53300ce782f8b3ad@syzkaller.appspotmail.com Fixes: 3f1d52ab ("io_uring: defer msg-ring file validity check until command issue") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 14 May, 2022 3 commits
-
-
Hao Xu authored
Refactor io_accept() to support multishot mode.

Theoretical analysis:

1) When connections come in fast:
   - singleshot:
         add accept sqe (userspace) --> accept inline
                         ^                    |
                         |--------------------|
   - multishot:
         add accept sqe (userspace) --> accept inline
                                          ^      |
                                          |--*---|
     We do the accept repeatedly at * until we get EAGAIN.

2) When connections come in at low pressure:
   Similar to 1); we save a lot of userspace-kernel context switches and
   useless vfs_poll() calls.

Tests: Did some tests, which go this way:

   server      client (multiple)
   accept      connect
   read        write
   write       read
   close       close

Basically, spin up a number of clients (on the same machine as the server) to connect to the server and write some data to it; the server writes the data back to each client after receiving it and closes the connection once the write returns, and the client then reads the data and closes the connection. Here I test 10000 clients connecting to one server, data size 128 bytes, and each client runs in its own goroutine, so they hit the server in a short time. Tested 20 times before/after this patchset, time spent (unit: cycles, i.e. the return value of clock()):

before: (1930136+1940725+1907981+1947601+1923812+1928226+1911087+1905897+1941075
+1934374+1906614+1912504+1949110+1908790+1909951+1941672+1969525+1934984
+1934226+1914385)/20.0 = 1927633.75

after: (1858905+1917104+1895455+1963963+1892706+1889208+1874175+1904753+1874112
+1874985+1882706+1884642+1864694+1906508+1916150+1924250+1869060+1889506
+1871324+1940803)/20.0 = 1894750.45

(1927633.75 - 1894750.45) / 1927633.75 ≈ 1.7%

Signed-off-by:
Hao Xu <howeyxu@tencent.com> Link: https://lore.kernel.org/r/20220514142046.58072-5-haoxu.linux@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
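For context (not part of the patch): with a liburing release that has multishot accept support (2.2+, assuming io_uring_prep_multishot_accept() is available), the usage pattern looks roughly like this sketch; listen_fd is assumed to be a listening socket and error handling is trimmed.

#include <liburing.h>
#include <stdio.h>

static void serve(struct io_uring *ring, int listen_fd)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
	struct io_uring_cqe *cqe;

	/* one SQE keeps producing a CQE per accepted connection */
	io_uring_prep_multishot_accept(sqe, listen_fd, NULL, NULL, 0);
	io_uring_submit(ring);

	for (;;) {
		if (io_uring_wait_cqe(ring, &cqe))
			break;
		if (cqe->res >= 0)
			printf("accepted fd %d\n", cqe->res);
		/* IORING_CQE_F_MORE clear means the multishot request has
		 * terminated (e.g. on error) and must be re-armed if desired */
		if (!(cqe->flags & IORING_CQE_F_MORE)) {
			io_uring_cqe_seen(ring, cqe);
			break;
		}
		io_uring_cqe_seen(ring, cqe);
	}
}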
-
Hao Xu authored
For operations like accept, multishot is a useful feature, since it reduces the number of accept SQEs needed. Let's integrate it with fast poll; it may be good for other operations in the future. Signed-off-by:
Hao Xu <howeyxu@tencent.com> Link: https://lore.kernel.org/r/20220514142046.58072-4-haoxu.linux@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Hao Xu authored
Add a flag to indicate multishot mode for fast poll. Currently only accept uses it, but there may be more operations leveraging it in the future. Also add a mask, IO_APOLL_MULTI_POLLED, which stands for REQ_F_APOLL_MULTI | REQ_F_POLLED, to make the code shorter and cleaner. Signed-off-by:
Hao Xu <howeyxu@tencent.com> Link: https://lore.kernel.org/r/20220514142046.58072-3-haoxu.linux@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 13 May, 2022 8 commits
-
-
Dylan Yudaken authored
The check for waking up a request compares the poll_t bits; however, these will always contain some common flags, so the check always wakes up. For files with single wait queues, such as sockets, this can cause the request to be sent to the async worker unnecessarily. Further, if it is non-blocking, it will complete the request with EAGAIN, which is not desired. Here, exclude these common events, making sure not to exclude POLLERR, which might be important. Fixes: d7718a9d ("io_uring: use poll driven retry for files that support it") Signed-off-by:
Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220512091834.728610-3-dylany@fb.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
If an opcode handler semi-reliably returns -EAGAIN, io_wq_submit_work() might continue to busily hammer the same handler over and over again, which is not ideal. The -EAGAIN handling in question was put there only for IOPOLL, so restrict it to IOPOLL mode only, where there is no recourse other than to retry as we cannot wait. Fixes: def596e9 ("io_uring: support for IO polling") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f168b4f24181942f3614dd8ff648221736f572e6.1652433740.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Currently, to set up a fully sparse descriptor space upfront, the app needs to allocate an array of the full size, memset it to -1, and then pass that in. Make this a bit easier by allowing a flag that simply does this internally rather than needing to copy each slot separately. This works with IORING_REGISTER_FILES2, as the flag is set in struct io_uring_rsrc_register, and is only allowed when the type is IORING_RSRC_FILE, as this doesn't make sense for registered buffers. Reviewed-by:
Hao Xu <howeyxu@tencent.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
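As an illustration (not part of the commit): a sketch of the raw registration described above. The struct and flag names come from the io_uring uapi (5.19+ headers), register_sparse_files() is a hypothetical helper, and error handling is omitted; liburing also grew a wrapper for this (io_uring_register_files_sparse() in 2.2+), which is the more convenient route.

#include <linux/io_uring.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>

static int register_sparse_files(int ring_fd, unsigned int nr_slots)
{
	struct io_uring_rsrc_register rr;

	memset(&rr, 0, sizeof(rr));
	rr.nr = nr_slots;			/* size of the fixed file table */
	rr.flags = IORING_RSRC_REGISTER_SPARSE;	/* kernel marks every slot empty */
	/* rr.data / rr.tags stay 0: no descriptor array is copied from userspace */

	return (int)syscall(__NR_io_uring_register, ring_fd,
			    IORING_REGISTER_FILES2, &rr, sizeof(rr));
}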
-
Jens Axboe authored
We currently limit these to 32K, but since we're now backing the table space with vmalloc when needed, there's no reason why we can't make it bigger. The total space is limited by RLIMIT_NOFILE as well. Reviewed-by:
Hao Xu <howeyxu@tencent.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If the application passes in IORING_FILE_INDEX_ALLOC as the file_slot, then that's a hint to allocate a fixed file descriptor rather than have one be passed in directly. This can be useful for having io_uring manage the direct descriptor space, and also allows multi-shot support to work with fixed files. Normal accept direct requests will complete with 0 for success, and < 0 in case of error. If io_uring is asked to allocate the direct descriptor, then the direct descriptor is returned in case of success. Reviewed-by:
Hao Xu <howeyxu@tencent.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If the application passes in IORING_FILE_INDEX_ALLOC as the file_slot, then that's a hint to allocate a fixed file descriptor rather than have one be passed in directly. This can be useful for having io_uring manage the direct descriptor space. Normal open direct requests will complete with 0 for success, and < 0 in case of error. If io_uring is asked to allocate the direct descriptor, then the direct descriptor is returned in case of success. Reviewed-by:
Hao Xu <howeyxu@tencent.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
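As a usage illustration (not part of the commit): a sketch of an open that lets the kernel pick the fixed-file slot. The sqe field is set directly to mirror the commit text, open_direct_alloc() is a hypothetical helper, a fixed file table (e.g. a sparse one) must already be registered on the ring, and error handling is omitted.

#include <liburing.h>
#include <fcntl.h>

static int open_direct_alloc(struct io_uring *ring, const char *path)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
	struct io_uring_cqe *cqe;
	int slot;

	io_uring_prep_openat(sqe, AT_FDCWD, path, O_RDONLY, 0);
	/* ask the kernel to allocate a free fixed-file slot for the result */
	sqe->file_index = IORING_FILE_INDEX_ALLOC;

	io_uring_submit(ring);
	io_uring_wait_cqe(ring, &cqe);
	slot = cqe->res;	/* >= 0: the allocated direct descriptor */
	io_uring_cqe_seen(ring, cqe);
	return slot;
}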
-
Jens Axboe authored
Applications currently always pick where they want fixed files to go. In preparation for allowing these types of commands with multishot support, add a basic allocator in the fixed file table. Reviewed-by:
Hao Xu <howeyxu@tencent.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
In preparation for adding a basic allocator for direct descriptors, add helpers that set/clear whether a file slot is used. Reviewed-by:
Hao Xu <howeyxu@tencent.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 11 May, 2022 1 commit
-
-
Jens Axboe authored
file_operations->uring_cmd is a file private handler. This is somewhat similar to ioctl but hopefully a lot more sane and useful, as it can be used to enable many io_uring capabilities for the underlying operation. IORING_OP_URING_CMD is a file private kind of request. io_uring doesn't know what is in this command type; it's for the provider of ->uring_cmd() to deal with. Co-developed-by:
Kanchan Joshi <joshi.k@samsung.com> Signed-off-by:
Kanchan Joshi <joshi.k@samsung.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220511054750.20432-2-joshi.k@samsung.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
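For context (not part of the commit): a kernel-side sketch of a driver wiring up ->uring_cmd. The mydrv_* names and MYDRV_CMD_PING are hypothetical, and the completion contract described in the comments reflects this series; later kernels changed some details.

#include <linux/fs.h>
#include <linux/io_uring.h>
#include <linux/module.h>

#define MYDRV_CMD_PING	0x1	/* hypothetical driver-private opcode */

static int mydrv_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
{
	switch (ioucmd->cmd_op) {
	case MYDRV_CMD_PING:
		/*
		 * A driver that finishes the work asynchronously would queue
		 * it here, return -EIOCBQUEUED, and call io_uring_cmd_done()
		 * from its completion path. Completing inline, the return
		 * value ends up in cqe->res.
		 */
		return 0;
	default:
		return -ENOTTY;
	}
}

static const struct file_operations mydrv_fops = {
	.owner		= THIS_MODULE,
	.uring_cmd	= mydrv_uring_cmd,
	/* .open, .read, .release, ... */
};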
-
- 09 May, 2022 4 commits
-
-
Stefan Roesch authored
This adds support for filling the extra1 and extra2 fields for large CQE's. Co-developed-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Stefan Roesch <shr@fb.com> Reviewed-by:
Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220426182134.136504-13-shr@fb.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Stefan Roesch authored
This enables large CQE's in the uring setup. Co-developed-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Stefan Roesch <shr@fb.com> Reviewed-by:
Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220426182134.136504-12-shr@fb.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
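For context (not part of the commit): a sketch of how an application sees the big CQE fields, assuming a liburing release and kernel headers with CQE32 support (2.2+ / 5.19+); which requests actually fill extra1/extra2 depends on the provider (e.g. a ->uring_cmd implementation). Error handling omitted.

#include <liburing.h>
#include <stdio.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;

	/* double-sized CQEs: each completion carries two extra u64 values */
	io_uring_queue_init(8, &ring, IORING_SETUP_CQE32);

	/* ... submit requests whose handlers fill the extra CQE fields ... */

	if (!io_uring_wait_cqe(&ring, &cqe)) {
		printf("res=%d extra1=%llu extra2=%llu\n", cqe->res,
		       (unsigned long long)cqe->big_cqe[0],
		       (unsigned long long)cqe->big_cqe[1]);
		io_uring_cqe_seen(&ring, cqe);
	}
	io_uring_queue_exit(&ring);
	return 0;
}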
-
Stefan Roesch authored
This exposes the extra1 and extra2 fields in the /proc output. Signed-off-by:
Stefan Roesch <shr@fb.com> Reviewed-by:
Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220426182134.136504-11-shr@fb.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Stefan Roesch authored
This adds tracing for the extra1 and extra2 fields. Co-developed-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Stefan Roesch <shr@fb.com> Reviewed-by:
Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220426182134.136504-10-shr@fb.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-