Commits · 7b53d59859bc932b37895d2d37388e7fa29af7a5 · Kirill Smelkov / linux

30 May, 2020 3 commits

io_uring: fix overflowed reqs cancellation · 7b53d598

Pavel Begunkov authored May 30, 2020

Overflowed requests in io_uring_cancel_files() should be shed only of
inflight and overflowed refs. All other left references are owned by
someone else.

If refcount_sub_and_test() fails, it will go further and put put extra
ref, don't do that. Also, don't need to do io_wq_cancel_work()
for overflowed reqs, they will be let go shortly anyway.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7b53d598

io_uring: off timeouts based only on completions · bfe68a22

Pavel Begunkov authored May 30, 2020

Offset timeouts wait not for sqe->off non-timeout CQEs, but rather
sqe->off + number of prior inflight requests. Wait exactly for
sqe->off non-timeout completions
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bfe68a22

io_uring: move timeouts flushing to a helper · 360428f8

Pavel Begunkov authored May 30, 2020

Separate flushing offset timeouts io_commit_cqring() by moving it into a
helper. Just a preparation, makes following patches clearer.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

360428f8

26 May, 2020 9 commits

statx: hide interfaces no longer used by io_uring · 6f88cc17

Bijan Mottahedeh authored May 22, 2020

The io_uring interfaces have been replaced by do_statx() and are no
longer needed.
Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6f88cc17

io_uring: call statx directly · e62753e4

Bijan Mottahedeh authored May 22, 2020

Calling statx directly both simplifies the interface and avoids potential
incompatibilities between sync and async invokations.
Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e62753e4

statx: allow system call to be invoked from io_uring · 0018784f

Bijan Mottahedeh authored May 22, 2020

This is a prepatory patch to allow io_uring to invoke statx directly.
Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0018784f

io_uring: add io_statx structure · 1d9e1288

Bijan Mottahedeh authored May 22, 2020

Separate statx data from open in io_kiocb. No functional changes.
Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1d9e1288

io_uring: get rid of manual punting in io_close · 0bf0eefd

Pavel Begunkov authored May 26, 2020

io_close() was punting async manually to skip grabbing files. Use
REQ_F_NO_FILE_TABLE instead, and pass it through the generic path
with -EAGAIN.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0bf0eefd

io_uring: separate DRAIN flushing into a cold path · 04518945

Pavel Begunkov authored May 26, 2020

io_commit_cqring() assembly doesn't look good with extra code handling
drained requests. IOSQE_IO_DRAIN is slow and discouraged to be used in
a hot path, so try to minimise its impact by putting it into a helper
and doing a fast check.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

04518945

io_uring: don't re-read sqe->off in timeout_prep() · 56080b02

Pavel Begunkov authored May 26, 2020

SQEs are user writable, don't read sqe->off twice in io_timeout_prep()
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

56080b02

io_uring: simplify io_timeout locking · 733f5c95

Pavel Begunkov authored May 26, 2020

Move spin_lock_irq() earlier to have only 1 call site of it in
io_timeout(). It makes the flow easier.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

733f5c95

io_uring: fix flush req->refs underflow · 4518a3cc

Pavel Begunkov authored May 26, 2020

In io_uring_cancel_files(), after refcount_sub_and_test() leaves 0
req->refs, it calls io_put_req(), which would also put a ref. Call
io_free_req() instead.

Cc: stable@vger.kernel.org
Fixes: 2ca10259 ("io_uring: prune request from overflow list on flush")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4518a3cc

20 May, 2020 1 commit

io_uring: don't submit sqes when ctx->refs is dying · 6b668c9b

Xiaoguang Wang authored May 20, 2020

When IORING_SETUP_SQPOLL is enabled, io_ring_ctx_wait_and_kill() will wait
for sq thread to idle by busy loop:

    while (ctx->sqo_thread && !wq_has_sleeper(&ctx->sqo_wait))
        cond_resched();

Above loop isn't very CPU friendly, it may introduce a short cpu burst on
the current cpu.

If ctx->refs is dying, we forbid sq_thread from submitting any further
SQEs. Instead they just get discarded when we exit.
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6b668c9b

17 May, 2020 7 commits

io_uring: async task poll trigger cleanup · 31067255

Jens Axboe authored May 17, 2020

If the request is still hashed in io_async_task_func(), then it cannot
have been canceled and it's pointless to check. So save that check.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

31067255

io_uring: add tee(2) support · f2a8d5c7

Pavel Begunkov authored May 17, 2020

Add IORING_OP_TEE implementing tee(2) support. Almost identical to
splice bits, but without offsets.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f2a8d5c7

splice: export do_tee() · 9dafdfc2

Pavel Begunkov authored May 17, 2020

export do_tee() for use in io_uring
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9dafdfc2

io_uring: don't repeat valid flag list · c11368a5

Pavel Begunkov authored May 17, 2020

req->flags stores all sqe->flags. After checking that sqe->flags are
valid set if IOSQE* flags, no need to double check it, just forward them
all.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c11368a5

io_uring: rename io_file_put() · 9f13c35b

Pavel Begunkov authored May 17, 2020

io_file_put() deals with flushing state's file refs, adding "state" to
its name makes it a bit clearer. Also, avoid double check of
state->file in __io_file_get() in some cases.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9f13c35b

io_uring: remove req->needs_fixed_files · 0cdaf760

Pavel Begunkov authored May 17, 2020

A submission is "async" IIF it's done by SQPOLL thread. Instead of
passing @async flag into io_submit_sqes(), deduce it from ctx->flags.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0cdaf760

io_uring: cleanup io_poll_remove_one() logic · 3bfa5bcb

Jens Axboe authored May 17, 2020

We only need apoll in the one section, do the juggling with the work
restoration there. This removes a special case further down as well.

No functional changes in this patch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3bfa5bcb

15 May, 2020 5 commits

io_uring: file registration list and lock optimization · 6a4d07cd

Jens Axboe authored May 15, 2020

There's no point in using list_del_init() on entries that are going
away, and the associated lock is always used in process context so
let's not use the IRQ disabling+saving variant of the spinlock.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6a4d07cd

io_uring: add IORING_CQ_EVENTFD_DISABLED to the CQ ring flags · 7e55a19c

Stefano Garzarella authored May 15, 2020

This new flag should be set/clear from the application to
disable/enable eventfd notifications when a request is completed
and queued to the CQ ring.

Before this patch, notifications were always sent if an eventfd is
registered, so IORING_CQ_EVENTFD_DISABLED is not set during the
initialization.

It will be up to the application to set the flag after initialization
if no notifications are required at the beginning.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7e55a19c

io_uring: add 'cq_flags' field for the CQ ring · 0d9b5b3a

Stefano Garzarella authored May 15, 2020

This patch adds the new 'cq_flags' field that should be written by
the application and read by the kernel.

This new field is available to the userspace application through
'cq_off.flags'.
We are using 4-bytes previously reserved and set to zero. This means
that if the application finds this field to zero, then the new
functionality is not supported.

In the next patch we will introduce the first flag available.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0d9b5b3a

io_uring: allow POLL_ADD with double poll_wait() users · 18bceab1

Jens Axboe authored May 15, 2020

Some file descriptors use separate waitqueues for their f_ops->poll()
handler, most commonly one for read and one for write. The io_uring
poll implementation doesn't work with that, as the 2nd poll_wait()
call will cause the io_uring poll request to -EINVAL.

This affects (at least) tty devices and /dev/random as well. This is a
big problem for event loops where some file descriptors work, and others
don't.

With this fix, io_uring handles multiple waitqueues.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

18bceab1

io_uring: batch reap of dead file registrations · 4a38aed2

Jens Axboe authored May 14, 2020

We currently embed and queue a work item per fixed_file_ref_node that
we update, but if the workload does a lot of these, then the associated
kworker-events overhead can become quite noticeable.

Since we rarely need to wait on these, batch them at 1 second intervals
instead. If we do need to wait for them, we just flush the pending
delayed work.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4a38aed2

14 May, 2020 1 commit

io_uring: name sq thread and ref completions · 0f158b4c

Jens Axboe authored May 14, 2020

We used to have three completions, now we just have two. With the two,
let's not allocate them dynamically, just embed then in the ctx and
name them appropriately.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0f158b4c

11 May, 2020 1 commit

io_uring: remove duplicate semicolon at the end of line · 84695089

Xiaoming Ni authored May 11, 2020

Remove duplicate semicolon at the end of line in io_file_from_index()
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

84695089

09 May, 2020 2 commits

io_uring: remove obsolete 'state' parameter · 7d01bd74

Xiaoguang Wang authored May 08, 2020

The "struct io_submit_state *state" parameter is not used, remove it.
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7d01bd74

io_uring: remove 'fd is io_uring' from close path · 904fbcb1

Jens Axboe authored May 08, 2020

The attempt protecting us from closing the ring itself wasn't really
complete, and we actually don't need it. The referencing of requests
themselve, and the references they hold on the ring, ensures that the
life time of the ring is sane. With the check removed, we can also
remove the need to have the close operation fget() the file.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

904fbcb1

07 May, 2020 2 commits

io_uring: don't use 'fd' for openat/openat2/statx · 63ff8223

Jens Axboe authored May 07, 2020

We currently make some guesses as when to open this fd, but in reality
we have no business (or need) to do so at all. In fact, it makes certain
things fail, like O_PATH.

Remove the fd lookup from these opcodes, we're just passing the 'fd' to
generic helpers anyway. With that, we can also remove the special casing
of fd values in io_req_needs_file(), and the 'fd_non_neg' check that
we have. And we can ensure that we only read sqe->fd once.

This fixes O_PATH usage with openat/openat2, and ditto statx path side
oddities.

Cc: stable@vger.kernel.org: # v5.6
Reported-by: Max Kellermann <mk@cm4all.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

63ff8223

splice: move f_mode checks to do_{splice,tee}() · 90da2e3f

Pavel Begunkov authored May 04, 2020

do_splice() is used by io_uring, as will be do_tee(). Move f_mode
checks from sys_{splice,tee}() to do_{splice,tee}(), so they're
enforced for io_uring as well.

Fixes: 7d67af2c ("io_uring: add splice(2) support")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

90da2e3f

05 May, 2020 1 commit

io_uring: handle -EFAULT properly in io_uring_setup() · 7f13657d

Xiaoguang Wang authored May 05, 2020

If copy_to_user() in io_uring_setup() failed, we'll leak many kernel
resources, which will be recycled until process terminates. This bug
can be reproduced by using mprotect to set params to PROT_READ. To fix
this issue, refactor io_uring_create() a bit to add a new 'struct
io_uring_params __user *params' parameter and move the copy_to_user()
in io_uring_setup() to io_uring_setup(), if copy_to_user() failed,
we can free kernel resource properly.
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7f13657d

04 May, 2020 1 commit

io_uring: fix mismatched finish_wait() calls in io_uring_cancel_files() · d8f1b971

Xiaoguang Wang authored Apr 26, 2020

The prepare_to_wait() and finish_wait() calls in io_uring_cancel_files()
are mismatched. Currently I don't see any issues related this bug, just
find it by learning codes.
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d8f1b971

03 May, 2020 4 commits

Linux 5.7-rc4 · 0e698dfa
Linus Torvalds authored May 03, 2020

0e698dfa

Merge tag 'for-5.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 262f7a6b

Linus Torvalds authored May 03, 2020

Pull more btrfs fixes from David Sterba:
 "A few more stability fixes, minor build warning fixes and git url
  fixup:

   - fix partial loss of prealloc extent past i_size after fsync

   - fix potential deadlock due to wrong transaction handle passing via
     journal_info

   - fix gcc 4.8 struct intialization warning

   - update git URL in MAINTAINERS entry"

* tag 'for-5.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  MAINTAINERS: btrfs: fix git repo URL
  btrfs: fix gcc-4.8 build warning for struct initializer
  btrfs: transaction: Avoid deadlock due to bad initialization timing of fs_info::journal_info
  btrfs: fix partial loss of prealloc extent past i_size after fsync

262f7a6b

Merge tag 'iommu-fixes-v5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · ea915933

Linus Torvalds authored May 03, 2020

Pull IOMMU fixes from Joerg Roedel:

 - Fix a memory leak when dev_iommu gets freed and a sub-pointer does
   not

 - Build dependency fixes for Mediatek, spapr_tce, and Intel IOMMU
   driver

 - Export iommu_group_get_for_dev() only for GPLed modules

 - Fix AMD IOMMU interrupt remapping when x2apic is enabled

 - Fix error path in the QCOM IOMMU driver probe function

* tag 'iommu-fixes-v5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/qcom: Fix local_base status check
  iommu: Properly export iommu_group_get_for_dev()
  iommu/vt-d: Use right Kconfig option name
  iommu/amd: Fix legacy interrupt remapping for x2APIC-enabled system
  iommu: spapr_tce: Disable compile testing to fix build on book3s_32 config
  iommu/mediatek: Fix MTK_IOMMU dependencies
  iommu: Fix the memory leak in dev_iommu_free()

ea915933

MAINTAINERS: btrfs: fix git repo URL · eb91db63

Eric Biggers authored May 01, 2020

The git repo listed for btrfs hasn't been updated in over a year.
List the current one instead.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

eb91db63

02 May, 2020 3 commits

Merge tag 'pm-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 743f0573

Linus Torvalds authored May 02, 2020

Pull power management fixes from Rafael Wysocki:

 - prevent the intel_pstate driver from printing excessive diagnostic
   messages in some cases (Chris Wilson)

 - make the hibernation restore kernel freeze kernel threads as well as
   user space tasks (Dexuan Cui)

 - fix the ACPI device PM disagnostic messages to include the correct
   power state name (Kai-Heng Feng).

* tag 'pm-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: ACPI: Output correct message on target power state
  PM: hibernate: Freeze kernel threads in software_resume()
  cpufreq: intel_pstate: Only mention the BIOS disabling turbo mode once

743f0573

Merge branches 'pm-cpufreq' and 'pm-sleep' · a5383996

Rafael J. Wysocki authored May 02, 2020

* pm-cpufreq:
  cpufreq: intel_pstate: Only mention the BIOS disabling turbo mode once

* pm-sleep:
  PM: hibernate: Freeze kernel threads in software_resume()

a5383996

Merge tag 'iomap-5.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · f66ed1eb

Linus Torvalds authored May 02, 2020

Pull iomap fix from Darrick Wong:
 "Hoist the check for an unrepresentable FIBMAP return value into
  ioctl_fibmap.

  The internal kernel function can handle 64-bit values (and is needed
  to fix a regression on ext4 + jbd2). It is only the userspace ioctl
  that is so old that it cannot deal"

* tag 'iomap-5.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  fibmap: Warn and return an error in case of block > INT_MAX

f66ed1eb