- 16 Jan, 2020 2 commits
-
-
Jens Axboe authored
A previous commit moved the locking for the async sqthread, but didn't take into account that the io-wq workers still need it. We can't use req->in_async for this anymore, as both the sqthread and io-wq workers set it. Gate the need for locking on io_wq_current_is_worker() instead. Fixes: 8a4955ff ("io_uring: sqthread should grab ctx->uring_lock for submissions") Reported-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bijan Mottahedeh authored
req->result is cleared when io_issue_sqe() calls io_read/write_pre() routines. Those routines however are not called when the sqe argument is NULL, which is the case when io_issue_sqe() is called from io_wq_submit_work(). io_issue_sqe() may then examine a stale result if a polled request had previously failed with -EAGAIN: if (ctx->flags & IORING_SETUP_IOPOLL) { if (req->result == -EAGAIN) return -EAGAIN; io_iopoll_req_issued(req); } and in turn cause a subsequently completed request to be re-issued in io_wq_submit_work(). Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
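The stale-result problem described above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `demo_req`, `demo_prep()` and `demo_issue()` are hypothetical stand-ins, not kernel code, and the `clear_first` switch simply shows one way the stale value could be avoided.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few request fields this sketch needs. */
struct demo_req {
    int result;   /* may still hold -EAGAIN from an earlier failed attempt */
    int issued;   /* how many times the request was handed to the "device" */
};

/* Models the prep routine, which only runs when an sqe is supplied. */
static void demo_prep(struct demo_req *req)
{
    req->result = 0;
}

/*
 * Models the polled issue path. `sqe` is NULL on the worker retry path,
 * so prep is skipped there. `clear_first` models clearing the result
 * unconditionally before the request is issued.
 */
static int demo_issue(struct demo_req *req, const void *sqe, bool clear_first)
{
    assert(req != NULL);

    if (sqe)
        demo_prep(req);
    if (clear_first)
        req->result = 0;

    req->issued++;          /* completion is asynchronous in this model */

    if (req->result == -EAGAIN)
        return -EAGAIN;     /* a stale value is mistaken for a fresh failure */
    return 0;
}
```

In this model, a request that previously failed with -EAGAIN and is retried with a NULL sqe reports -EAGAIN again even though the new attempt was issued, unless the result is cleared first.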
-
- 15 Jan, 2020 2 commits
-
-
Jens Axboe authored
If we pass back dependent work in case of links, we need to always ensure that we call the link setup and work prep handler. If not, we might be missing some setup for the next work item. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If we require mm and user context, mark the request for cancellation if we fail to acquire the desired mm. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 14 Jan, 2020 1 commit
-
-
Jens Axboe authored
We don't need it, and if we have it, then the retry handler will attempt to copy the non-existent iovec with the inline iovec, with a segment count that doesn't make sense. Fixes: f67676d1 ("io_uring: ensure async punted read/write requests copy iovec") Reported-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 07 Jan, 2020 1 commit
-
-
Jens Axboe authored
We currently punt any short read on a regular file to async context, but this fails if the short read is due to running into EOF. This is especially problematic since we only do the single prep for commands now, as we don't reset kiocb->ki_pos. This can result in a 4k read on a 1k file returning zero, as we detect the short read and then retry from async context. At the time of retry, the position is now 1k, and we end up reading nothing, and hence return 0. Instead of trying to patch around the fact that short reads can be legitimate and won't succeed in case of retry, remove the logic to punt a short read to async context. Simply return it. Signed-off-by: Jens Axboe <axboe@kernel.dk>
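The 4k-read-on-a-1k-file case can be reproduced with a small userspace model. This is a sketch, not the kernel read path: `demo_file` and the `demo_read*()` helpers are hypothetical names, and the position field merely plays the role of kiocb->ki_pos.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a file with a size and a current position. */
struct demo_file {
    size_t size;
    size_t pos;   /* advances on every read, like ki_pos */
};

/* Read up to len bytes from the current position and advance it. */
static size_t demo_read(struct demo_file *f, size_t len)
{
    assert(f != NULL);
    size_t avail = f->size > f->pos ? f->size - f->pos : 0;
    size_t n = len < avail ? len : avail;
    f->pos += n;
    return n;
}

/*
 * Old behaviour in the model: a short read is retried without resetting
 * the position, and the caller sees the result of the retry.
 */
static size_t demo_read_retry_on_short(struct demo_file *f, size_t len)
{
    size_t n = demo_read(f, len);
    if (n < len)
        n = demo_read(f, len);   /* position already sits at EOF */
    return n;
}

/* New behaviour in the model: the short read is returned as-is. */
static size_t demo_read_return_short(struct demo_file *f, size_t len)
{
    return demo_read(f, len);
}
```

With a 1024-byte file and a 4096-byte request, the retrying variant ends up returning 0, while the variant that simply returns the short read reports 1024.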
-
- 24 Dec, 2019 1 commit
-
-
Hillf Danton authored
Reschedule the current IO worker to reduce the risk of it becoming a CPU hog. Signed-off-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 23 Dec, 2019 1 commit
-
-
Hillf Danton authored
Commit e61df66c ("io-wq: ensure free/busy list browsing see all items") added a list for io workers in addition to the free and busy lists, not only making worker walk cleaner, but leaving the busy list unused. Let's remove it. Signed-off-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 20 Dec, 2019 7 commits
-
-
Jens Axboe authored
This moves the prep handlers outside of the opcode handlers, and allows us to pass in the sqe directly. If the sqe is non-NULL, it means that the request should be prepared for the first time. With the opcode handlers not having access to the sqe at all, we are guaranteed that the prep handler has set up the request fully by the time we get there. As before, for opcodes that need to copy in more data than the io_kiocb allows for, the io_async_ctx holds that info. If a prep handler is invoked with req->io set, it must use that to retain information for later. Finally, we can remove io_kiocb->sqe as well. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We currently have a mix of use cases. Most of the newer ones are pretty uniform, but we have some older ones that use different calling conventions. This is confusing. For the opcodes that currently rely on the req->io->sqe copy saving them from reuse, add a request type struct in the io_kiocb command union to store the data they need. Prepare for all opcodes having a standard prep method, so we can call it in a uniform fashion and outside of the opcode handler. This is in preparation for passing in the 'sqe' pointer, rather than storing it in the io_kiocb. Once we have uniform prep handlers, we can leave all the prep work to that part, and not even pass in the sqe to the opcode handler. This ensures that we don't reuse sqe data inadvertently. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Add the count field to struct io_timeout, and ensure the prep handler has read it. Timeout also needs an async context always, set it up in the prep handler if we don't have one. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Add struct io_sr_msg in our io_kiocb per-command union, and ensure that the send/recvmsg prep handlers have grabbed what they need from the SQE by the time prep is done. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Add struct io_connect in our io_kiocb per-command union, and ensure that io_connect_prep() has grabbed what it needs from the SQE. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Put the kiocb in struct io_rw, and add the addr/len for the request as well. Use the kiocb->private field for the buffer index for fixed reads and writes. Any use of kiocb->ki_filp is flipped to req->file. It's the same thing, and less confusing. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We use it in some spots, but not consistently. Convert the rest over, makes it easier to read as well. No functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 18 Dec, 2019 12 commits
-
-
Jens Axboe authored
I've been chasing a weird and obscure crash that was userspace stack corruption, and finally narrowed it down to a bit flip that made a stack address invalid. io_wq_submit_work() unconditionally flips the req->rw.ki_flags IOCB_NOWAIT bit, but since it's a generic work handler, this isn't valid. Normal read/write operations own that part of the request, on other types it could be something else. Move the IOCB_NOWAIT clear to the read/write handlers where it belongs. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
There is no reliable way to submit and wait in a single syscall, as io_submit_sqes() may under-consume sqes (in case of an early error). Then it will wait for not-yet-submitted requests, deadlocking the user in most cases. Don't wait/poll if we can't submit all sqes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
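The rule in this message boils down to a single decision, sketched here in userspace C. `demo_should_wait()` and its parameters are hypothetical names chosen for illustration; the real check lives in the kernel's enter path and may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether the caller may go on to wait for completions.
 * to_submit:    number of sqes the caller asked to submit
 * submitted:    number of sqes actually consumed
 * min_complete: number of completions the caller wants to wait for
 */
static bool demo_should_wait(unsigned to_submit, unsigned submitted,
                             unsigned min_complete)
{
    assert(submitted <= to_submit);

    /* Under-consumed: waiting could block on requests never submitted. */
    if (submitted != to_submit)
        return false;

    return min_complete > 0;
}
```

For example, asking for 4 submissions and getting only 2 consumed means the wait is skipped, so the caller gets the short count back instead of blocking.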
-
Jens Axboe authored
Now that we have all the opcodes handled in terms of command prep and SQE reuse, add a printk_once() to warn about any potentially new and unhandled ones. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If we defer a request, we can't be reading the opcode again. Ensure that the user_data and opcode fields are stable. For the user_data we already have a place for it, for the opcode we can fill a one byte hole and store that as well. For both of them, assign them when we originally read the SQE in io_get_sqring(). Any code that uses sqe->opcode or sqe->user_data is switched to req->opcode and req->user_data. Signed-off-by: Jens Axboe <axboe@kernel.dk>
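The idea of reading the SQE once and keeping a stable copy can be sketched as follows. `demo_sqe`, `demo_req` and `demo_fetch_sqe()` are hypothetical simplifications of the real structures and of io_get_sqring(); only the two fields named in the message are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced submission queue entry. */
struct demo_sqe {
    uint8_t  opcode;
    uint64_t user_data;
};

/* Hypothetical request that keeps its own copy of the fields. */
struct demo_req {
    uint8_t  opcode;
    uint64_t user_data;
};

/* Copy the fields once, at the point where the SQE is first read. */
static void demo_fetch_sqe(struct demo_req *req, const struct demo_sqe *sqe)
{
    assert(req != NULL && sqe != NULL);
    req->opcode = sqe->opcode;
    req->user_data = sqe->user_data;
}
```

After the fetch, the application is free to reuse the SQE slot: later code in this model only looks at `req->opcode` and `req->user_data`, so a deferred request still sees the values it was submitted with.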
-
Jens Axboe authored
If we defer this command as part of a link, we have to make sure that the SQE data has been read upfront. Integrate the timeout remove op into the prep handling to make it safe for SQE reuse. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If we defer this command as part of a link, we have to make sure that the SQE data has been read upfront. Integrate the async cancel op into the prep handling to make it safe for SQE reuse. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If we defer these commands as part of a link, we have to make sure that the SQE data has been read upfront. Integrate the poll add/remove into the prep handling to make it safe for SQE reuse. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
The rules are as follows: if IOSQE_IO_HARDLINK is specified, then it's a link, and there is no need to set IOSQE_IO_LINK separately, though it could be there. Add a proper check and ensure that IOSQE_IO_HARDLINK implies IOSQE_IO_LINK. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
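The "hard link implies link" rule can be expressed as a small flag check. This is a sketch with made-up names and bit values: `DEMO_IO_LINK` and `DEMO_IO_HARDLINK` are hypothetical stand-ins for the real IOSQE_* flags, whose actual values are defined in the kernel UAPI header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define DEMO_IO_LINK      (1U << 0)
#define DEMO_IO_HARDLINK  (1U << 1)

/* A request starts or continues a link if either flag is set. */
static bool demo_is_link(uint32_t flags)
{
    return (flags & (DEMO_IO_LINK | DEMO_IO_HARDLINK)) != 0;
}

/* Make the implication explicit: a hard link is always also a link. */
static uint32_t demo_normalize_flags(uint32_t flags)
{
    assert((DEMO_IO_LINK & DEMO_IO_HARDLINK) == 0);
    if (flags & DEMO_IO_HARDLINK)
        flags |= DEMO_IO_LINK;
    return flags;
}
```

Normalizing once up front means later checks can test a single bit, and setting both flags remains harmless.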
-
Jens Axboe authored
We're currently not retaining sqe data for accept, fsync, and sync_file_range. None of these commands need data outside of what is directly provided, hence it can't go stale when the request is deferred. However, it can get reused, if an application reuses SQE entries. Ensure that we retain the information we need and only read the sqe contents once, off the submission path. Most of this is just moving code into a prep and finish function. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We pass in req->sqe for all of them, no need to pass it in as the request is always passed in. This is a necessary prep patch to be able to cleanup/fix the request prep path. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Some of these code paths assume that any force_nonblock == true issue is not prepped, but that's not true if we did prep as part of link setup earlier. Check if we already have an async context allocated before setting up a new one. Clean up the async context setup in general, as we have a lot of duplicated code there. Fixes: 03b1230c ("io_uring: ensure async punted sendmsg/recvmsg requests copy data") Fixes: f67676d1 ("io_uring: ensure async punted read/write requests copy iovec") Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This reverts commit 8cdda87a, we now have several use cases for this helper. Reinstate it. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 16 Dec, 2019 1 commit
-
-
Jens Axboe authored
If we have to punt the recvmsg to async context, we copy all the context. But since the iovec used can be either on-stack (if small) or dynamically allocated, if it's on-stack, then we need to ensure we reset the iov pointer. If we don't, then we're reusing old stack data, and that can lead to -EFAULTs if things get overwritten. Ensure we retain the right pointers for the iov, and free it as well if we end up having to go beyond UIO_FASTIOV number of vectors. Fixes: 03b1230c ("io_uring: ensure async punted sendmsg/recvmsg requests copy data") Reported-by: 李通洲 <carter.li@eoitek.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
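The pointer problem described here is a general C pitfall: copying a struct that contains a pointer into its own inline array leaves the copy pointing at the original. The sketch below shows the fix-up in userspace; `demo_msg`, `demo_iovec`, `DEMO_FASTIOV` and `demo_copy_to_async()` are hypothetical names, with `DEMO_FASTIOV` standing in for UIO_FASTIOV.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_FASTIOV 8   /* stand-in for UIO_FASTIOV in this sketch */

/* Hypothetical I/O vector, mirroring the shape of struct iovec. */
struct demo_iovec {
    void  *base;
    size_t len;
};

/* Hypothetical message context with a small inline vector array. */
struct demo_msg {
    struct demo_iovec  fast_iov[DEMO_FASTIOV];
    struct demo_iovec *iov;      /* fast_iov, or a larger external array */
    size_t             nr_segs;
};

/* Copy the context and repair the pointer if it referred to inline storage. */
static void demo_copy_to_async(struct demo_msg *dst, const struct demo_msg *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof(*dst));

    /*
     * After memcpy, dst->iov still points into src. If src used its
     * inline array, re-point dst at its own copy of that array.
     */
    if (src->iov == src->fast_iov)
        dst->iov = dst->fast_iov;
}
```

When the source used an external array instead (more than `DEMO_FASTIOV` segments), the pointer is carried over unchanged, and whoever owns the copy then becomes responsible for freeing it.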
-
- 15 Dec, 2019 1 commit
-
-
Brian Gianforcaro authored
- Fix a few typos found while reading the code. - Fix stale io_get_sqring comment referencing s->sqe, the 's' parameter was renamed to 'req', but the comment still holds. Signed-off-by: Brian Gianforcaro <b.gianfo@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 13 Dec, 2019 11 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2
Linus Torvalds authored
Pull nios2 fix from Ley Foon Tan: "Fix nios2 ioremap regression" * tag 'nios2-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2: nios2: Fix ioremap
-
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Linus Torvalds authored
Pull Devicetree fixes from Rob Herring:
 - Fix for dependency tracking caused by unittest interaction
 - Fix some schema errors in Tegra memory controller schema
 - Update Maxime Ripard's email address
 - Review fixes to TI cpsw-switch
 - Add wakeup-source prop for STM32 rproc. Got dropped in the schema conversion.
* tag 'devicetree-fixes-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  of/platform: Unconditionally pause/resume sync state during kernel init
  dt-bindings: memory-controllers: tegra: Fix type references
  dt-bindings: Change maintainer address
  dt-bindings: net: ti: cpsw-switch: update to fix comments
  dt-bindings: remoteproc: stm32: add wakeup-source property
-
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds authored
Pull virtio fixes from Michael Tsirkin: "Some fixes and cleanup patches" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_balloon: divide/multiply instead of shifts virtio_balloon: name cleanups virtio-balloon: fix managed page counts when migrating pages between zones
-
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Linus Torvalds authored
Pull PCI fix from Bjorn Helgaas: "Fix rockchip outbound ATU issue that prevented Google Kevin Chromebooks from booting (Enric Balletbo i Serra)" * tag 'pci-v5.5-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: rockchip: Fix IO outbound ATU register number
-
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Linus Torvalds authored
Pull i2c updates from Wolfram Sang:
 - removal of an old API where all in-kernel users have been converted as of this merge window.
 - a kdoc fix
 - a new helper that will make dependencies for the next API conversion a tad easier
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: add helper to check if a client has a driver attached
  i2c: fix header file kernel-doc warning
  i2c: remove i2c_new_dummy() API
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds authored
Pull power management fixes from Rafael Wysocki:
 "These add PM QoS support to devfreq and fix a few issues in that subsystem, fix two cpuidle issues and do one minor cleanup in there, and address an ACPI power management problem related to devices with special power management requirements, like fans.
  Specifics:
   - Add PM QoS support, based on the frequency QoS introduced during the 5.4 cycle, to devfreq (Leonard Crestez).
   - Fix some assorted devfreq issues (Leonard Crestez).
   - Fix an unintentional cpuidle behavior change (introduced during the 5.4 cycle) related to the active polling time limit (Marcelo Tosatti).
   - Fix a recently introduced cpuidle helper function and do a minor cleanup in the cpuidle core (Rafael Wysocki).
   - Avoid adding devices with special power management requirements, like fans, to the generic ACPI PM domain (Rafael Wysocki)"
* tag 'pm-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpuidle: Drop unnecessary type cast in cpuidle_poll_time()
  cpuidle: Fix cpuidle_driver_state_disabled()
  ACPI: PM: Avoid attaching ACPI PM domain to certain devices
  cpuidle: use first valid target residency as poll time
  PM / devfreq: Use PM QoS for sysfs min/max_freq
  PM / devfreq: Add PM QoS support
  PM / devfreq: Don't fail devfreq_dev_release if not in list
  PM / devfreq: Introduce get_freq_range helper
  PM / devfreq: Set scaling_max_freq to max on OPP notifier error
  PM / devfreq: Fix devfreq_notifier_call returning errno
-
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Linus Torvalds authored
Pull sound fixes from Takashi Iwai: "A small collection of fixes. The main changes are fixes for a couple of regressions in AMD HD-audio and FireWire that were introduced in 5.5-rc1. The rest are small fixes for echoaudio and FireWire, as well as a usual Dell HD-audio fixup" * tag 'sound-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/realtek - Line-out jack doesn't work on a Dell AIO ALSA: hda/hdmi - Fix duplicate unref of pci_dev ALSA: fireface: fix return value in error path of isochronous resources reservation ALSA: oxfw: fix return value in error path of isochronous resources reservation ALSA: firewire-motu: fix double unlocked 'motu->mutex' ALSA: echoaudio: simplify get_audio_levels
-
git://anongit.freedesktop.org/drm/drm
Linus Torvalds authored
Pull drm fixes from Dave Airlie:
 "Usual round of rc2 fixes. i915 and amdgpu leading the charge, but a few others in here, including some nouveau fixes, all seems pretty for rc2, but hey it's a Fri 13th pull so I'm sure it'll cause untold bad fortune.
  dma-buf:
   - memory leak fix
   - expand MAINTAINERS scope
  core:
   - fix mode matching for drivers not using picture_aspect_ratio
  nouveau:
   - panel scaling fix
   - MST BPC fix
   - atomic fixes
  i915:
   - GPU hang on idle transition
   - GLK+ FBC corruption fix
   - non-priv OA access on Tigerlake
   - HDCP state fix
   - CI found race fixes
  amdgpu:
   - renoir DC fixes
   - GFX8 fence flush alignment with userspace
   - Arcturus power profile fix
   - DC aux + i2c over aux fixes
   - GPUVM invalidation semaphore fixes
   - gfx10 golden registers update
  mgag200:
   - expand startadd fix
  panfrost:
   - devfreq fix
   - memory fixes
  mcde:
   - DSI pointer deref fix"
* tag 'drm-fixes-2019-12-13' of git://anongit.freedesktop.org/drm/drm: (51 commits)
  drm/amdgpu: add invalidate semaphore limit for SRIOV in gmc10
  drm/amdgpu: add invalidate semaphore limit for SRIOV and picasso in gmc9
  drm/amdgpu: avoid using invalidate semaphore for picasso
  Revert "drm/amdgpu: dont schedule jobs while in reset"
  drm/amdgpu: fix license on Kconfig and Makefiles
  drm/amdgpu/gfx10: update gfx golden settings for navi14
  drm/amdgpu/gfx10: update gfx golden settings
  drm/amdgpu/gfx10: update gfx golden settings for navi14
  drm/amdgpu/gfx10: update gfx golden settings
  drm/i915: Serialise with remote retirement
  drm/amd/display: include linux/slab.h where needed
  drm/amd/display: fix undefined struct member reference
  drm/nouveau/kms/nv50-: fix panel scaling
  drm/nouveau/kms/nv50-: Limit MST BPC to 8
  drm/nouveau/kms/nv50-: Store the bpc we're using in nv50_head_atom
  drm/nouveau/kms/nv50-: Call outp_atomic_check_view() before handling PBN
  drm/nouveau: Fix drm-core using atomic code-paths on pre-nv50 hardware
  drm/nouveau: Move the declaration of struct nouveau_conn_atom up a bit
  drm/i915/gt: Detect if we miss WaIdleLiteRestore
  drm/i915/hdcp: Nuke intel_hdcp_transcoder_config()
  ...
-
git://git.kernel.dk/linux-block
Linus Torvalds authored
Pull block fixes from Jens Axboe:
 - stable fix for the bi_size overflow. Not a corruption issue, but a case where we could merge but disallowed (Andreas)
 - NVMe pull request via Keith, with various fixes.
 - MD pull request from Song.
 - Merge window regression fix for the rq passthrough stats (Logan)
 - Remove unused blkcg_drain_queue() function (Guoqing)
* tag 'for-linus-20191212' of git://git.kernel.dk/linux-block:
  blk-cgroup: remove blkcg_drain_queue
  block: fix NULL pointer dereference in account statistics with IDE
  md: make sure desc_nr less than MD_SB_DISKS
  md: raid1: check rdev before reference in raid1_sync_request func
  raid5: need to set STRIPE_HANDLE for batch head
  block: fix "check bi_size overflow before merge"
  nvme/pci: Fix read queue count
  nvme/pci Limit write queue sizes to possible cpus
  nvme/pci: Fix write and poll queue types
  nvme/pci: Remove last_cq_head
  nvme: Namepace identification descriptor list is optional
  nvme-fc: fix double-free scenarios on hw queues
  nvme: else following return is not needed
  nvme: add error message on mismatching controller ids
  nvme_fc: add module to ops template to allow module references
  nvmet-loop: Avoid preallocating big SGL for data
  nvme-fc: Avoid preallocating big SGL for data
  nvme-rdma: Avoid preallocating big SGL for data
-
git://git.kernel.dk/linux-block
Linus Torvalds authored
Pull io_uring fixes from Jens Axboe:
 - A tweak to IOSQE_IO_LINK (also marked for stable) to allow links that don't sever if the result is < 0. This is mostly for linked timeouts, where if we ask for a pure timeout we always get -ETIME. This makes links useless for that case, hence allow a case where it works.
 - Five minor optimizations to fix and improve cases that regressed since v5.4.
 - An SQTHREAD locking fix.
 - A sendmsg/recvmsg iov assignment fix.
 - Net fix where read_iter/write_iter don't honor IOCB_NOWAIT, and subsequently ensuring that works for io_uring.
 - Fix a case where for an invalid opcode we might return -EBADF instead of -EINVAL, if the ->fd of that sqe was set to an invalid fd value.
* tag 'io_uring-5.5-20191212' of git://git.kernel.dk/linux-block:
  io_uring: ensure we return -EINVAL on unknown opcode
  io_uring: add sockets to list of files that support non-blocking issue
  net: make socket read/write_iter() honor IOCB_NOWAIT
  io_uring: only hash regular files for async work execution
  io_uring: run next sqe inline if possible
  io_uring: don't dynamically allocate poll data
  io_uring: deferred send/recvmsg should assign iov
  io_uring: sqthread should grab ctx->uring_lock for submissions
  io-wq: briefly spin for new work after finishing work
  io-wq: remove worker->wait waitqueue
  io_uring: allow unbreakable links
-
Linus Torvalds authored
Merge tag 'for-5.5/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
 - Fix DM multipath by restoring full path selector functionality for bio-based configurations that don't have a SCSI device handler.
 - Fix dm-btree removal to ensure non-root btree nodes have at least (max_entries / 3) entries. This resolves userspace thin_check utility's report of "too few entries in btree_node".
 - Fix both the DM thin-provisioning and dm-clone targets to properly flush the data device prior to metadata commit. This resolves the potential for inconsistency across a power loss event when the data device has a volatile writeback cache.
 - Small documentation fixes to dm-clone and dm-integrity.
* tag 'for-5.5/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  docs: dm-integrity: remove reference to ARC4
  dm thin: Flush data device before committing metadata
  dm thin metadata: Add support for a pre-commit callback
  dm clone: Flush destination device before committing metadata
  dm clone metadata: Use a two phase commit
  dm clone metadata: Track exact changes per transaction
  dm btree: increase rebalance threshold in __rebalance2()
  dm: add dm-clone to the documentation index
  dm mpath: remove harmful bio-based optimization
-