Commits · 3fe07bcd800d6e5e4e4263ca2564d69095c157bf · Kirill Smelkov / linux

21 May, 2022 1 commit

io_uring: cleanup handling of the two task_work lists · 3fe07bcd

Jens Axboe authored May 21, 2022

Rather than pass in a bool for whether or not this work item needs to go
into the priority list, provide separate helpers for it. For most
use cases, this also then gets rid of the branch for non-priority task
work.

While at it, rename the prior_task_list to prio_task_list. Prior is
a confusing name for it, as it would seem to indicate that this is the
previous task_work list. prio makes it clear that this is a priority
task_work list.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
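
The shape of this cleanup can be sketched in plain C. This is only an illustration: the types and function names below (work_item, task_ctx, task_work_add, task_work_add_prio) are hypothetical stand-ins, not the kernel's own definitions. Only the prio_task_list name comes from the message above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's real types. */
struct work_item {
    struct work_item *next;
};

struct task_ctx {
    struct work_item *task_list;       /* normal task_work */
    struct work_item *prio_task_list;  /* priority task_work */
};

/* Shared push helper: the caller picks the list, so no bool is tested. */
static void push_work(struct work_item **list, struct work_item *item)
{
    assert(item != NULL);
    item->next = *list;
    *list = item;
}

/* Common case: no priority branch at all. */
static void task_work_add(struct task_ctx *ctx, struct work_item *item)
{
    push_work(&ctx->task_list, item);
}

/* Priority work gets its own entry point. */
static void task_work_add_prio(struct task_ctx *ctx, struct work_item *item)
{
    push_work(&ctx->prio_task_list, item);
}
```

Callers that never queue priority work only ever reach task_work_add(), so the choice of list is made at the call site rather than at run time.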

20 May, 2022 2 commits

nvme: enable uring-passthrough for admin commands · 58e5bdeb

Kanchan Joshi authored May 20, 2022

Add two new opcodes that userspace can use for admin commands:
NVME_URING_CMD_ADMIN : non-vectored
NVME_URING_CMD_ADMIN_VEC : vectored variant

Wire up support when these are issued on controller node (/dev/nvmeX).
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220520090630.70394-3-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: helper for uring-passthrough checks · 00fc2eeb

Kanchan Joshi authored May 20, 2022

Factor out a helper consolidating the error checks, and fix typo in a
comment too. This is in preparation to support admin commands on this
path.
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220520090630.70394-2-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

12 May, 2022 1 commit

blk-mq: fix passthrough plugging · a327c341

Ming Lei authored May 12, 2022

First, we can't add a request to the plug list in
blk_mq_request_bypass_insert, which may be called while the plug list is
being flushed, because that causes a nested plug.

Second, if a polled passthrough request is inserted via blk_execute_rq(),
it can't be added to the plug list either, since io polling needs the
request to be issued to the driver.

Fix both by moving plugging into blk_execute_rq_no_wait().

Cc: Christoph Hellwig <hch@lst.de>
Fixes: 1c2d2fff ("block: wire-up support for passthrough plugging")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220512140010.1458645-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

11 May, 2022 5 commits

nvme: add vectored-io support for uring-cmd · f569add4

Anuj Gupta authored May 11, 2022

Wire up support for async passthru that takes an array of buffers (using
iovec). It is exposed via a new op, NVME_URING_CMD_IO_VEC. The same
'struct nvme_uring_cmd' is used, with:

1. cmd.addr as base address of user iovec array
2. cmd.data_len as count of iovec array elements
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220511054750.20432-6-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: wire-up uring-cmd support for io-passthru on char-device. · 456cba38

Kanchan Joshi authored May 11, 2022

Introduce handler for fops->uring_cmd(), implementing async passthru
on char device (/dev/ngX). The handler supports newly introduced
operation NVME_URING_CMD_IO. This operates on a new structure
nvme_uring_cmd, which is similar to struct nvme_passthru_cmd64 but
without the embedded 8b result field. This field is not needed since
uring-cmd can return the additional result via a big CQE.
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220511054750.20432-5-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: refactor nvme_submit_user_cmd() · bcad2565

Christoph Hellwig authored May 11, 2022

Divide the work into two helpers, namely nvme_alloc_user_request and
nvme_execute_user_rq. This is a prep patch, to help wiring up
uring-cmd support in nvme.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[axboe: fold in fix for assuming bio is non-NULL]
Link: https://lore.kernel.org/r/20220511054750.20432-4-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: wire-up support for passthrough plugging · 1c2d2fff

Jens Axboe authored May 11, 2022

Add support for plugging in passthrough path. When plugging is enabled, the
requests are added to a plug instead of getting dispatched to the driver.
And when the plug is finished, the whole batch gets dispatched via
->queue_rqs which turns out to be more efficient. Otherwise dispatching
used to happen via ->queue_rq, one request at a time.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220511054750.20432-3-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
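
The batching idea can be modelled with a toy plug. Everything below is a hypothetical user-space sketch (demo_plug, demo_submit, demo_finish_plug and the stats counters are invented for illustration). It mirrors only the behaviour described above: plugged requests are held back, then handed over in one batch, while unplugged requests go out one at a time.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PLUG_MAX 16   /* arbitrary cap for this sketch */

struct demo_plug {
    int reqs[DEMO_PLUG_MAX];
    size_t count;
};

struct demo_stats {
    unsigned int driver_calls;  /* how often the "driver" was entered */
    unsigned int reqs_seen;     /* how many requests it received in total */
};

/* One request per driver call, in the spirit of ->queue_rq. */
static void demo_queue_rq(struct demo_stats *st, int req)
{
    (void)req;
    st->driver_calls++;
    st->reqs_seen++;
}

/* A whole batch in one driver call, in the spirit of ->queue_rqs. */
static void demo_queue_rqs(struct demo_stats *st, const int *reqs, size_t n)
{
    assert(reqs != NULL && n > 0);
    st->driver_calls++;
    st->reqs_seen += (unsigned int)n;
}

/* Hold the request in the plug if there is room, else dispatch it now. */
static void demo_submit(struct demo_stats *st, struct demo_plug *plug, int req)
{
    if (plug && plug->count < DEMO_PLUG_MAX) {
        plug->reqs[plug->count++] = req;
        return;
    }
    demo_queue_rq(st, req);
}

/* Finishing the plug flushes everything it collected as one batch. */
static void demo_finish_plug(struct demo_stats *st, struct demo_plug *plug)
{
    if (plug->count == 0)
        return;
    demo_queue_rqs(st, plug->reqs, plug->count);
    plug->count = 0;
}
```

Submitting four requests under a plug and then finishing it enters the toy driver once, whereas four unplugged submissions would enter it four times.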

fs,io_uring: add infrastructure for uring-cmd · ee692a21

Jens Axboe authored May 11, 2022

file_operations->uring_cmd is a file private handler.
This is somewhat similar to ioctl but hopefully a lot more sane and
useful as it can be used to enable many io_uring capabilities for the
underlying operation.

IORING_OP_URING_CMD is a file private kind of request. io_uring doesn't
know what is in this command type, it's for the provider of ->uring_cmd()
to deal with.
Co-developed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220511054750.20432-2-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
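
A minimal user-space model of a file-private handler is sketched below. All names here are hypothetical (demo_cmd, demo_file_ops, demo_issue_cmd, demo_provider_cmd), and returning -EOPNOTSUPP when no handler is present is an assumption made for the sketch. What it tries to show is that the generic layer forwards the command without interpreting it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical types; the kernel's real structures differ. */
struct demo_cmd {
    unsigned int cmd_op;   /* meaning is private to the provider */
    const void *payload;   /* opaque to the generic layer */
};

struct demo_file_ops {
    int (*uring_cmd)(const struct demo_cmd *cmd);
};

struct demo_file {
    const struct demo_file_ops *ops;
};

/* Generic layer: forwards the command without looking inside it. */
static int demo_issue_cmd(const struct demo_file *file,
                          const struct demo_cmd *cmd)
{
    assert(file != NULL && cmd != NULL);
    if (!file->ops || !file->ops->uring_cmd)
        return -EOPNOTSUPP;   /* assumed: this file type has no handler */
    return file->ops->uring_cmd(cmd);
}

/* Example provider: only it knows that op 1 is valid. */
static int demo_provider_cmd(const struct demo_cmd *cmd)
{
    return cmd->cmd_op == 1 ? 0 : -EINVAL;
}
```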

09 May, 2022 26 commits

io_uring: support CQE32 for nop operation · 2bb04df7

Stefan Roesch authored Apr 26, 2022

This adds support for filling the extra1 and extra2 fields for large
CQE's.
Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-13-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: enable CQE32 · 76c68fbf

Stefan Roesch authored Apr 26, 2022

This enables large CQE's in the uring setup.
Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-12-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: support CQE32 in /proc info · f9b3dfcc

Stefan Roesch authored Apr 26, 2022

This exposes the extra1 and extra2 fields in the /proc output.
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-11-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: add tracing for additional CQE32 fields · c4bb964f

Stefan Roesch authored Apr 26, 2022

This adds tracing for the extra1 and extra2 fields.
Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-10-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: overflow processing for CQE32 · e45a3e05

Stefan Roesch authored Apr 26, 2022

This adds the overflow processing for large CQE's.

This adds two parameters to the io_cqring_event_overflow function and
uses these fields to initialize the large CQE fields.

Allocate enough space for large CQE's in the overflow structure. If no
large CQE's are used, the size of the allocation is unchanged.

The cqe field can have a different size depending on whether it's a large
CQE or not. To be able to allocate different sizes, the two fields
in the structure are re-ordered.
Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-9-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
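
Placing the variable-size part last is what makes a single allocation of either size possible. The sketch below shows that layout idea with a C flexible array member. The structure and helper names (demo_overflow, demo_overflow_size, demo_overflow_alloc) are hypothetical, and the layout is simplified rather than copied from the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical overflow entry: fixed part first, variable part last. */
struct demo_overflow {
    struct demo_overflow *next;  /* list linkage */
    uint64_t user_data;
    int32_t res;
    uint32_t flags;
    uint64_t extra[];            /* extra1 and extra2, large CQEs only */
};

/* The allocation grows by two 64-bit values only when CQEs are large. */
static size_t demo_overflow_size(int large_cqe)
{
    size_t size = sizeof(struct demo_overflow);

    if (large_cqe)
        size += 2 * sizeof(uint64_t);
    return size;
}

/* Allocate and fill one entry; the caller frees it with free(). */
static struct demo_overflow *demo_overflow_alloc(int large_cqe,
                                                 uint64_t user_data,
                                                 int32_t res,
                                                 uint64_t extra1,
                                                 uint64_t extra2)
{
    struct demo_overflow *ocqe;

    /* The extra values carry no meaning for normal-sized CQEs. */
    assert(large_cqe || (extra1 == 0 && extra2 == 0));

    ocqe = malloc(demo_overflow_size(large_cqe));
    if (!ocqe)
        return NULL;
    ocqe->next = NULL;
    ocqe->user_data = user_data;
    ocqe->res = res;
    ocqe->flags = 0;
    if (large_cqe) {
        ocqe->extra[0] = extra1;
        ocqe->extra[1] = extra2;
    }
    return ocqe;
}
```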

io_uring: flush completions for CQE32 · 0e2e5c47

Stefan Roesch authored Apr 26, 2022

This flushes the completions according to their CQE type: the same
processing is done for the default CQE size, but for large CQE's the
extra1 and extra2 fields are filled in.
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-8-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: modify io_get_cqe for CQE32 · 2fee6bc6

Stefan Roesch authored Apr 26, 2022

Modify accesses to the CQE array to take large CQE's into account. The
index needs to be shifted by one for large CQE's.
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-7-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
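
The index adjustment amounts to a left shift by one, since a large CQE occupies two normal-sized slots. A small sketch follows, assuming a power-of-two ring and a hypothetical helper name demo_cqe_slot. The result is the position counted in normal-sized (16-byte) slots.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Position, in normal-sized CQE slots, of the entry for `tail` in a ring
 * with `entries` entries. `entries` must be a power of two.
 */
static uint32_t demo_cqe_slot(uint32_t tail, uint32_t entries, int large_cqe)
{
    uint32_t idx;

    assert(entries != 0 && (entries & (entries - 1)) == 0);
    idx = tail & (entries - 1);       /* wrap around the ring */
    return large_cqe ? idx << 1 : idx;
}
```

For an 8-entry ring, tail 5 maps to slot 5 with normal CQEs and to slot 10 with large ones.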

io_uring: add CQE32 completion processing · effcf8bd

Stefan Roesch authored Apr 26, 2022

This adds the completion processing for the large CQE's and makes sure
that the extra1 and extra2 fields are passed through.
Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-6-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: add CQE32 setup processing · 91658798

Stefan Roesch authored Apr 26, 2022

This adds two new functions to set up and fill the CQE32 result structure.
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-5-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: change ring size calculation for CQE32 · baf9cb64

Stefan Roesch authored Apr 26, 2022

This changes the function rings_size to take large CQE's into account.
Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-4-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
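
In terms of sizing, taking large CQEs into account means doubling the space reserved per completion entry. The helper below is a hypothetical sketch of just that part of the calculation. The real rings_size also covers the ring header and the SQ index array, which are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CQE_SIZE 16u   /* size of a normal CQE in bytes */

/* Bytes needed for the CQ array alone, or SIZE_MAX on overflow. */
static size_t demo_cq_array_bytes(size_t cq_entries, int large_cqe)
{
    size_t per_cqe = large_cqe ? 2 * DEMO_CQE_SIZE : DEMO_CQE_SIZE;

    assert(cq_entries > 0);
    if (cq_entries > SIZE_MAX / per_cqe)
        return SIZE_MAX;
    return cq_entries * per_cqe;
}
```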

io_uring: store add. return values for CQE32 · 4e5bc0a9

Stefan Roesch authored Apr 26, 2022

This reuses the hash list node for the storage we need to hold the two
64-bit values that must be passed back.
Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-3-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: support CQE32 in io_uring_cqe · 7a51e5b4

Stefan Roesch authored Apr 26, 2022

This adds the big_cqe array to the struct io_uring_cqe to support large
CQE's.
Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-2-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: add support for 128-byte SQEs · ebdeb7c0

Jens Axboe authored Mar 31, 2022

Normal SQEs are 64 bytes in length, which is fine for all the commands
we support. However, in preparation for supporting passthrough IO,
provide an option for setting up a ring with 128-byte SQEs.

We continue to use the same type for io_uring_sqe, it's marked and
commented with a zero sized array pad at the end. This provides up
to 80 bytes of data for a passthrough command - 64 bytes for the
extra added data, and 16 bytes available at the end of the existing
SQE.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
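
The 80-byte figure is the sum of the two parts named above. As a small worked check, with hypothetical constant and function names and sizes taken only from the message:

```c
#include <assert.h>
#include <stddef.h>

enum {
    DEMO_SQE_SIZE      = 64,   /* normal SQE */
    DEMO_SQE128_SIZE   = 128,  /* large SQE */
    DEMO_SQE_TAIL_FREE = 16    /* bytes free at the end of a normal SQE */
};

static_assert(DEMO_SQE128_SIZE == 2 * DEMO_SQE_SIZE,
              "a large SQE is twice the size of a normal one");

/* Bytes available for a passthrough command payload. */
static size_t demo_cmd_space(int sqe128)
{
    size_t space = DEMO_SQE_TAIL_FREE;

    if (sqe128)
        space += DEMO_SQE128_SIZE - DEMO_SQE_SIZE;  /* the added 64 bytes */
    return space;
}
```

With 128-byte SQEs this gives 16 + 64 = 80 bytes. Without them only the 16 trailing bytes remain.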

Merge branch 'for-5.19/io_uring-socket' into for-5.19/io_uring-passthrough · b5ba65df

Jens Axboe authored May 09, 2022

* for-5.19/io_uring-socket:
  io_uring: use the text representation of ops in trace
  io_uring: rename op -> opcode
  io_uring: add io_uring_get_opcode
  io_uring: add type to op enum
  io_uring: add socket(2) support
  net: add __sys_socket_file()
  io_uring: fix trace for reduced sqe padding
  io_uring: add fgetxattr and getxattr support
  io_uring: add fsetxattr and setxattr support
  fs: split off do_getxattr from getxattr
  fs: split off setxattr_copy and do_setxattr function from setxattr

Merge branch 'for-5.19/io_uring' into for-5.19/io_uring-passthrough · 13086899

Jens Axboe authored May 09, 2022

* for-5.19/io_uring: (85 commits)
  io_uring: don't clear req->kbuf when buffer selection is done
  io_uring: eliminate the need to track provided buffer ID separately
  io_uring: move provided buffer state closer to submit state
  io_uring: move provided and fixed buffers into the same io_kiocb area
  io_uring: abstract out provided buffer list selection
  io_uring: never call io_buffer_select() for a buffer re-select
  io_uring: get rid of hashed provided buffer groups
  io_uring: always use req->buf_index for the provided buffer group
  io_uring: ignore ->buf_index if REQ_F_BUFFER_SELECT isn't set
  io_uring: kill io_rw_buffer_select() wrapper
  io_uring: make io_buffer_select() return the user address directly
  io_uring: kill io_recv_buffer_select() wrapper
  io_uring: use 'sr' vs 'req->sr_msg' consistently
  io_uring: add POLL_FIRST support for send/sendmsg and recv/recvmsg
  io_uring: check IOPOLL/ioprio support upfront
  io_uring: replace smp_mb() with smp_mb__after_atomic() in io_sq_thread()
  io_uring: add IORING_SETUP_TASKRUN_FLAG
  io_uring: use TWA_SIGNAL_NO_IPI if IORING_SETUP_COOP_TASKRUN is used
  io_uring: set task_work notify method at init time
  io-wq: use __set_notify_signal() to wake workers
  ...

io_uring: don't clear req->kbuf when buffer selection is done · 7ccba24d

Jens Axboe authored May 01, 2022

It's not needed as the REQ_F_BUFFER_SELECTED flag tracks the state of
whether or not kbuf is valid, so just drop it.
Suggested-by: Dylan Yudaken <dylany@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: eliminate the need to track provided buffer ID separately · 1dbd023e

Jens Axboe authored May 01, 2022

We have io_kiocb->buf_index which is used for either fixed buffers, or
for provided buffers. For the latter, it's used to hold the buffer group
ID for buffer selection. Post selection, req->kbuf->bid is used to get
the buffer ID.

Store the buffer ID, when selected, in req->buf_index. If we do end up
recycling the buffer, reset it back to the buffer group ID.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
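
The dual use of buf_index can be pictured as follows. This is a simplified model with hypothetical types (demo_buf, demo_req) and helpers. Only the field names buf_index, kbuf and bid come from the message, and bgid is an assumed name for the buffer's group ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical provided buffer: knows its own ID and its group. */
struct demo_buf {
    uint16_t bid;    /* buffer ID */
    uint16_t bgid;   /* buffer group ID (assumed field name) */
};

/* Hypothetical request: buf_index is the group ID until a buffer is picked. */
struct demo_req {
    uint16_t buf_index;
    const struct demo_buf *kbuf;
};

/* On selection, buf_index switches from group ID to buffer ID. */
static void demo_select(struct demo_req *req, const struct demo_buf *buf)
{
    assert(req->kbuf == NULL && buf->bgid == req->buf_index);
    req->kbuf = buf;
    req->buf_index = buf->bid;
}

/* On recycle, buf_index goes back to the group ID for a later re-select. */
static void demo_recycle(struct demo_req *req)
{
    if (!req->kbuf)
        return;
    req->buf_index = req->kbuf->bgid;
    req->kbuf = NULL;
}
```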

io_uring: move provided buffer state closer to submit state · 660cbfa2

Jens Axboe authored May 01, 2022

The timeout and other items that follow are less hot, so let's move the
provided buffer state above that.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: move provided and fixed buffers into the same io_kiocb area · a4f8d94c

Jens Axboe authored Apr 30, 2022

These are mutually exclusive - if you use provided buffers, then you
cannot use fixed buffers and vice versa. Move them into the same spot
in the io_kiocb, which is also advantageous for provided buffers as
they get near the submit side hot cacheline.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: abstract out provided buffer list selection · 149c69b0

Jens Axboe authored Apr 30, 2022

In preparation for providing another way to select a buffer, move the
existing logic into a helper.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: never call io_buffer_select() for a buffer re-select · b66e65f4

Jens Axboe authored Apr 30, 2022

Callers already have room to store the addr and length information,
clean it up by having the caller just assign the previously provided
data.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: get rid of hashed provided buffer groups · 9cfc7e94

Jens Axboe authored May 01, 2022

Use a plain array for any group ID that's less than 64, and punt
anything beyond that to an xarray. 64 fits in a page even for 4KB
page sizes and with the planned additions.

This makes the expected group usage faster by avoiding a hash and lookup
to find our list, and it uses less memory upfront by not allocating any
memory for provided buffers unless it's actually being used.
Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
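
The two-tier lookup can be sketched in user space as below. A linked list stands in for the xarray purely to keep the example self-contained, and all names (demo_group, demo_ctx, demo_add, demo_lookup) are hypothetical. Only the threshold of 64 comes from the message.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_FAST 64   /* group IDs below this use the plain array */

struct demo_group {
    unsigned int id;
    struct demo_group *next;   /* used only on the slow path */
};

struct demo_ctx {
    struct demo_group *fast[DEMO_NR_FAST];  /* direct index, no hashing */
    struct demo_group *slow;                /* stand-in for the xarray */
};

static void demo_add(struct demo_ctx *ctx, struct demo_group *g)
{
    assert(g != NULL);
    if (g->id < DEMO_NR_FAST) {
        ctx->fast[g->id] = g;
        return;
    }
    g->next = ctx->slow;
    ctx->slow = g;
}

static struct demo_group *demo_lookup(struct demo_ctx *ctx, unsigned int id)
{
    struct demo_group *g;

    if (id < DEMO_NR_FAST)
        return ctx->fast[id];      /* expected common case */
    for (g = ctx->slow; g; g = g->next)
        if (g->id == id)
            return g;
    return NULL;
}
```

Small group IDs cost one array index, and larger ones fall back to the slower structure.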

io_uring: always use req->buf_index for the provided buffer group · 4e906702

Jens Axboe authored Apr 28, 2022

The read/write opcodes use it already, but the recv/recvmsg do not. If
we switch them over and read and validate this at init time while we're
checking if the opcode supports it anyway, then we can do it in one spot
and we don't have to pass in a separate group ID for io_buffer_select().
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: ignore ->buf_index if REQ_F_BUFFER_SELECT isn't set · bb68d504

Jens Axboe authored Apr 29, 2022

There's no point in validity checking buf_index if the request doesn't
have REQ_F_BUFFER_SELECT set, as we will never use it for that case.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: kill io_rw_buffer_select() wrapper · e5b00349

Jens Axboe authored Apr 28, 2022

After the recent changes, this is a direct call to io_buffer_select()
anyway. With this change, there are no wrappers left for provided
buffer selection.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: make io_buffer_select() return the user address directly · c54d52c2

Jens Axboe authored Apr 28, 2022

There's no point in having callers provide a kbuf, we're just returning
the address anyway.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

08 May, 2022 5 commits

Linux 5.18-rc6 · c5eb0a61

Linus Torvalds authored May 08, 2022

Merge tag 'for-5.18/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · f002488d

Linus Torvalds authored May 08, 2022

Pull parisc architecture fixes from Helge Deller:
 "Some reverts of existing patches, which were necessary because of boot
  issues due to wrong CPU clock handling and cache issues which led to
  userspace segfaults with 32bit kernels. Dave has a whole bunch of
  upcoming cache fixes which I then plan to push in the next merge
  window.

  Other than that just small updates and fixes, e.g. defconfig updates,
  spelling fixes, a clocksource fix, boot topology fixes and a fix for
  /proc/cpuinfo output to satisfy lscpu"

* tag 'for-5.18/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  Revert "parisc: Increase parisc_cache_flush_threshold setting"
  parisc: Mark cr16 clock unstable on all SMP machines
  parisc: Fix typos in comments
  parisc: Change MAX_ADDRESS to become unsigned long long
  parisc: Merge model and model name into one line in /proc/cpuinfo
  parisc: Re-enable GENERIC_CPU_DEVICES for !SMP
  parisc: Update 32- and 64-bit defconfigs
  parisc: Only list existing CPUs in cpu_possible_mask
  Revert "parisc: Fix patch code locking and flushing"
  Revert "parisc: Mark sched_clock unstable only if clocks are not syncronized"
  Revert "parisc: Mark cr16 CPU clocksource unstable on all SMP machines"

Merge tag 'powerpc-5.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · e3de3a1c

Linus Torvalds authored May 08, 2022

Pull powerpc fixes from Michael Ellerman:

 - Fix the DWARF CFI in our VDSO time functions, allowing gdb to
   backtrace through them correctly.

 - Fix a buffer overflow in the papr_scm driver, only triggerable by
   hypervisor input.

 - A fix in the recently added QoS handling for VAS (used for
   communicating with coprocessors).

Thanks to Alan Modra, Haren Myneni, Kajol Jain, and Segher Boessenkool.

* tag 'powerpc-5.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/papr_scm: Fix buffer overflow issue with CONFIG_FORTIFY_SOURCE
  powerpc/vdso: Fix incorrect CFI in gettimeofday.S
  powerpc/pseries/vas: Use QoS credits from the userspace

Merge tag 'x86-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 27b5d61c

Linus Torvalds authored May 08, 2022

Pull x86 fix from Thomas Gleixner:
 "A fix and an email address update:

   - Prevent FPU state corruption.

     The condition in irq_fpu_usable() grants FPU usage when the FPU is
     not used in the kernel. That's just wrong as it does not take the
     fpregs_lock()'ed regions into account. If FPU usage happens within
     such a region from interrupt context, then the FPU state gets
     corrupted.

     That's a long standing bug, which got unearthed by the recent
     changes to the random code.

   - Josh wants to use his kernel.org email address"

* tag 'x86-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Prevent FPU state corruption
  MAINTAINERS: Update Josh Poimboeuf's email address

Merge tag 'timers-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ea82593b

Linus Torvalds authored May 08, 2022

Pull timer fix from Thomas Gleixner:
 "A fix and an email address update:

   - Mark the NMI safe time accessors notrace to prevent tracer
     recursion when they are selected as trace clocks.

   - John Stultz has a new email address"

* tag 'timers-urgent-2022-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Mark NMI safe time accessors as notrace
  MAINTAINERS: Update email address for John Stultz
