- 01 Feb, 2021 28 commits
-
-
Jens Axboe authored
We currently split the close into two, in case we have a ->flush op that we can't safely handle from non-blocking context. This requires us to flag the op as uncancelable if we do need to punt it async, and that means special handling for just this op type. Use __close_fd_get_file() and grab the files lock so we can get the file and check if we need to go async in one atomic operation. That gets rid of the need for splitting this into two steps, and hence the need for IO_WQ_WORK_NO_CANCEL. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Assumes current->files->file_lock is already held on invocation. Helps the caller check the file before removing the fd, if it needs to. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
When a request is completed with comp_state, its completion reference put is deferred to io_submit_flush_completions(), but the submission is put not far from there, so do it together to save one atomic dec per request. That targets requests that complete inline, e.g. buffered rw, send/recv. Proper benchmarking haven't been conducted but for nops(batch=32) it was around 7901 vs 8117 KIOPS (~2.7%), or ~4% per perf profiling. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
io_submit_flush_completions() is called down the stack in the _state version of io_req_complete(), that's ok because is only called by io_uring opcode handler functions directly. Move it up to __io_queue_sqe() as preparation. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
__io_req_complete() inlining is a bit weird, some compilers don't optimise out the non-NULL branch of it even when called as io_req_complete(). Help it a bit by extracting state and stateless helpers out of __io_req_complete(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Deduplicates translation of timeout flags into hrtimer_mode. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
When io_req_task_work_add() fails, the request will be cancelled by enqueueing via task_works of io-wq. Extract a function for that. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
The check in io_state_file_put() is optimised pretty well when called from __io_file_get(). Don't pollute the code with all these variants. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Get rid of a label in io_alloc_req(), it's cleaner to do return directly. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Apparently, there is one more place hand coded calculation of number of CQ events in the ring. Use __io_cqring_events() helper in io_get_cqring() as well. Naturally, assembly stays identical. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Inline it in its only user, that's cleaner Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
The name is confusing and it's used only in one place. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
personality_idr is usually synchronised by uring_lock, the exception would be removing personalities in io_ring_ctx_wait_and_kill(), which is legit as refs are killed by that point but still would be more resilient to do it under the lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
It's awkward to pass return a value into a function for it to return it back. Check it at the caller site and clean up io_resubmit_prep() a bit. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
The hot path is IO completing on the first try. Reshuffle io_rw_reissue() so it's checked first. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bijan Mottahedeh authored
Make the percpu ref release function names consistent between rsrc data and nodes. Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bijan Mottahedeh authored
Create common alloc/free fixed_rsrc_data routines for both files and buffers. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> [remove buffer part] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bijan Mottahedeh authored
Create common routines to be used for both files/buffers registration. [remove io_sqe_rsrc_set_node substitution] Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> [merge, quiesce only for files] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
A simple prep patch allowing to set refnode callbacks after it was allocated. This needed to 1) keep ourself off of hi-level functions where it's not pretty and they are not necessary 2) amortise ref_node allocation in the future, e.g. for updates. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bijan Mottahedeh authored
Split alloc_fixed_file_ref_node into resource generic/specific parts, to be leveraged for fixed buffers. Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bijan Mottahedeh authored
Encapsulate resource reference locking into separate routines. Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bijan Mottahedeh authored
Uplevel ref_list and make it common to all resources. This is to allow one common ref_list to be used for both files, and buffers in upcoming patches. Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bijan Mottahedeh authored
Generalize io_queue_rsrc_removal to handle both files and buffers. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> [remove io_mapped_ubuf from rsrc tables/etc. for now] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bijan Mottahedeh authored
This is a prep rename patch for subsequent patches to generalize file registration. [io_uring_rsrc_update:: rename fds -> data] Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> [leave io_uring_files_update as struct] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bijan Mottahedeh authored
Move allocation of buffer management structures, and validation of buffers into separate routines. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bijan Mottahedeh authored
Split io_sqe_buffer_register into two routines: - io_sqe_buffer_register() registers a single buffer - io_sqe_buffers_register iterates over all user specified buffers Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Instead of being pessimistic and assume that path lookup will block, use LOOKUP_CACHED to attempt just a cached lookup. This ensures that the fast path is always done inline, and we only punt to async context if IO is needed to satisfy the lookup. For forced nonblock open attempts, mark the file O_NONBLOCK over the actual ->open() call as well. We can safely clear this again before doing fd_install(), so it'll never be user visible that we fiddled with it. This greatly improves the performance of file open where the dentry is already cached: ached 5.10-git 5.10-git+LOOKUP_CACHED Speedup --------------------------------------------------------------- 33% 1,014,975 900,474 1.1x 89% 545,466 292,937 1.9x 100% 435,636 151,475 2.9x The more cache hot we are, the faster the inline LOOKUP_CACHED optimization helps. This is unsurprising and expected, as a thread offload becomes a more dominant part of the total overhead. If we look at io_uring tracing, doing an IORING_OP_OPENAT on a file that isn't in the dentry cache will yield: 275.550481: io_uring_create: ring 00000000ddda6278, fd 3 sq size 8, cq size 16, flags 0 275.550491: io_uring_submit_sqe: ring 00000000ddda6278, op 18, data 0x0, non block 1, sq_thread 0 275.550498: io_uring_queue_async_work: ring 00000000ddda6278, request 00000000c0267d17, flags 69760, normal queue, work 000000003d683991 275.550502: io_uring_cqring_wait: ring 00000000ddda6278, min_events 1 275.550556: io_uring_complete: ring 00000000ddda6278, user_data 0x0, result 4 which shows a failed nonblock lookup, then punt to worker, and then we complete with fd == 4. This takes 65 usec in total. Re-running the same test case again: 281.253956: io_uring_create: ring 0000000008207252, fd 3 sq size 8, cq size 16, flags 0 281.253967: io_uring_submit_sqe: ring 0000000008207252, op 18, data 0x0, non block 1, sq_thread 0 281.253973: io_uring_complete: ring 0000000008207252, user_data 0x0, result 4 shows the same request completing inline, also returning fd == 4. This takes 6 usec. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into for-5.12/io_uring Merge RESOLVE_CACHED bits from Al, as the io_uring changes will build on top of that. * 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED fs: add support for LOOKUP_CACHED saner calling conventions for unlazy_child() fs: make unlazy_walk() error handling consistent fs/namei.c: Remove unlikely of status being -ECHILD in lookup_fast() do_tmpfile(): don't mess with finish_open()
-
- 31 Jan, 2021 12 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull EFI fix from Borislav Petkov: "A single fix from Lukas: handle boolean device properties imported from Apple firmware correctly" * tag 'efi-urgent-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/apple-properties: Reinstate support for boolean properties
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fix from Borislav Petkov: "A single fix for objtool to generate proper unwind info for newer toolchains which do not generate section symbols anymore. And a cleanup ontop. This was originally going to go during the next merge window but people can already trigger a build error with binutils-2.36 which doesn't emit section symbols - something which objtool relies on - so let's expedite it" * tag 'x86_entry_for_v5.11_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry: Remove put_ret_addr_in_rdi THUNK macro argument x86/entry: Emit a symbol for register restoring thunk
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull timer fix from Thomas Gleixner: "A fix for handling advertised, but non-existent 146818 RTCs correctly. With the recent UIP handling changes the time readout of non-existent RTCs hangs forever as the read returns always 0xFF which means the UIP bit is set. Sanity check the RTC before registering by checking the RTC_VALID register for correctness" * tag 'timers-urgent-2021-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rtc: mc146818: Detect and handle broken RTCs
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull single stepping fix from Thomas Gleixner: "A single fix for the single step reporting regression caused by getting the condition wrong when moving SYSCALL_EMU away from TIF flags" [ There's apparently another problem too, fix pending ] * tag 'core-urgent-2021-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: entry: Unbreak single step reporting behaviour
-
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds authored
Pull powerpc fix from Michael Ellerman: "One fix for a bug in our soft interrupt masking, which could lead to interrupt replaying recursing, causing spurious interrupts. Thanks to Nicholas Piggin" * tag 'powerpc-5.11-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: prevent recursive replay_soft_interrupts causing superfluous interrupt
-
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linuxLinus Torvalds authored
Pull i2c fix from Wolfram Sang: "Just one I2C driver update this time" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: mediatek: Move suspend and resume handling to NOIRQ phase
-
git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-ledsLinus Torvalds authored
Pull LED fixes from Pavel Machek: "This pull is due to 'leds: trigger: fix potential deadlock with libata' -- people find the warn annoying. It also contains new driver and two trivial fixes" * 'for-rc-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds: leds: rt8515: Add Richtek RT8515 LED driver dt-bindings: leds: Add DT binding for Richtek RT8515 leds: trigger: fix potential deadlock with libata leds: leds-ariel: convert comma to semicolon leds: leds-lm3533: convert comma to semicolon
-
git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds authored
Pull NFS client fixes from Trond Myklebust: - SUNRPC: Handle 0 length opaque XDR object data properly - Fix a layout segment leak in pnfs_layout_process() - pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn - pNFS/NFSv4: Improve rejection of out-of-order layouts - pNFS/NFSv4: Try to return invalid layout in pnfs_layout_process() * tag 'nfs-for-5.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Handle 0 length opaque XDR object data properly SUNRPC: Move simple_get_bytes and simple_get_netobj into private header pNFS/NFSv4: Improve rejection of out-of-order layouts pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn pNFS/NFSv4: Try to return invalid layout in pnfs_layout_process() pNFS/NFSv4: Fix a layout segment leak in pnfs_layout_process()
-
Linus Walleij authored
This adds a driver for the Richtek RT8515 dual channel torch/flash white LED driver. This LED driver is found in some mobile phones from Samsung such as the GT-S7710 and GT-I8190. A V4L interface is added. We do not have a proper datasheet for the RT8515 but it turns out that RT9387A has a public datasheet and is essentially the same chip. We designed the driver in accordance with this datasheet. The day someone needs to drive a RT9387A this driver can probably easily be augmented to handle that chip too. Sakari Ailus, Pavel Machek and Andy Shevchenko helped significantly in getting this driver right. Cc: Sakari Ailus <sakari.ailus@iki.fi> Cc: newbytee@protonmail.com Cc: Stephan Gerhold <stephan@gerhold.net> Cc: linux-media@vger.kernel.org Cc: phone-devel@vger.kernel.org Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Pavel Machek <pavel@ucw.cz>
-
Linus Walleij authored
Add a YAML devicetree binding for the Richtek RT8515 dual channel flash/torch LED driver. Cc: Sakari Ailus <sakari.ailus@iki.fi> Cc: newbytee@protonmail.com Cc: Stephan Gerhold <stephan@gerhold.net> Cc: phone-devel@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: devicetree@vger.kernel.org Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Pavel Machek <pavel@ucw.cz>
-
Andrea Righi authored
We have the following potential deadlock condition: ======================================================== WARNING: possible irq lock inversion dependency detected 5.10.0-rc2+ #25 Not tainted -------------------------------------------------------- swapper/3/0 just changed the state of lock: ffff8880063bd618 (&host->lock){-...}-{2:2}, at: ata_bmdma_interrupt+0x27/0x200 but this lock took another, HARDIRQ-READ-unsafe lock in the past: (&trig->leddev_list_lock){.+.?}-{2:2} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&trig->leddev_list_lock); local_irq_disable(); lock(&host->lock); lock(&trig->leddev_list_lock); <Interrupt> lock(&host->lock); *** DEADLOCK *** no locks held by swapper/3/0. the shortest dependencies between 2nd lock and 1st lock: -> (&trig->leddev_list_lock){.+.?}-{2:2} ops: 46 { HARDIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 IN-SOFTIRQ-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 kbd_bh+0x9e/0xc0 tasklet_action_common.constprop.0+0xe9/0x100 tasklet_action+0x22/0x30 __do_softirq+0xcc/0x46d run_ksoftirqd+0x3f/0x70 smpboot_thread_fn+0x116/0x1f0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 SOFTIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 INITIAL READ USE at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83da4c00>] __key.0+0x0/0x10 ... acquired at: _raw_read_lock+0x42/0x90 led_trigger_blink_oneshot+0x3b/0x90 ledtrig_disk_activity+0x3c/0xa0 ata_qc_complete+0x26/0x450 ata_do_link_abort+0xa3/0xe0 ata_port_freeze+0x2e/0x40 ata_hsm_qc_complete+0x94/0xa0 ata_sff_hsm_move+0x177/0x7a0 ata_sff_pio_task+0xc7/0x1b0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 -> (&host->lock){-...}-{2:2} ops: 69 { IN-HARDIRQ-W at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab INITIAL USE at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_dev_init+0x54/0xe0 ata_link_init+0x8b/0xd0 ata_port_alloc+0x1f1/0x210 ata_host_alloc+0xf1/0x130 ata_host_alloc_pinfo+0x14/0xb0 ata_pci_sff_prepare_host+0x41/0xa0 ata_pci_bmdma_prepare_host+0x14/0x30 piix_init_one+0x21f/0x600 local_pci_probe+0x48/0x80 pci_device_probe+0x105/0x1c0 really_probe+0x221/0x490 driver_probe_device+0xe9/0x160 device_driver_attach+0xb2/0xc0 __driver_attach+0x91/0x150 bus_for_each_dev+0x81/0xc0 driver_attach+0x1e/0x20 bus_add_driver+0x138/0x1f0 driver_register+0x91/0xf0 __pci_register_driver+0x73/0x80 piix_init+0x1e/0x2e do_one_initcall+0x5f/0x2d0 kernel_init_freeable+0x26f/0x2cf kernel_init+0xe/0x113 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83d9fdc0>] __key.6+0x0/0x10 ... acquired at: __lock_acquire+0x9da/0x2370 lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab This lockdep splat is reported after: commit e9181886 ("locking: More accurate annotations for read_lock()") To clarify: - read-locks are recursive only in interrupt context (when in_interrupt() returns true) - after acquiring host->lock in CPU1, another cpu (i.e. CPU2) may call write_lock(&trig->leddev_list_lock) that would be blocked by CPU0 that holds trig->leddev_list_lock in read-mode - when CPU1 (ata_ac_complete()) tries to read-lock trig->leddev_list_lock, it would be blocked by the write-lock waiter on CPU2 (because we are not in interrupt context, so the read-lock is not recursive) - at this point if an interrupt happens on CPU0 and ata_bmdma_interrupt() is executed it will try to acquire host->lock, that is held by CPU1, that is currently blocked by CPU2, so: * CPU0 blocked by CPU1 * CPU1 blocked by CPU2 * CPU2 blocked by CPU0 *** DEADLOCK *** The deadlock scenario is better represented by the following schema (thanks to Boqun Feng <boqun.feng@gmail.com> for the schema and the detailed explanation of the deadlock condition): CPU 0: CPU 1: CPU 2: ----- ----- ----- led_trigger_event(): read_lock(&trig->leddev_list_lock); <workqueue> ata_hsm_qc_complete(): spin_lock_irqsave(&host->lock); write_lock(&trig->leddev_list_lock); ata_port_freeze(): ata_do_link_abort(): ata_qc_complete(): ledtrig_disk_activity(): led_trigger_blink_oneshot(): read_lock(&trig->leddev_list_lock); // ^ not in in_interrupt() context, so could get blocked by CPU 2 <interrupt> ata_bmdma_interrupt(): spin_lock_irqsave(&host->lock); Fix by using read_lock_irqsave/irqrestore() in led_trigger_event(), so that no interrupt can happen in between, preventing the deadlock condition. Apply the same change to led_trigger_blink_setup() as well, since the same deadlock scenario can also happen in power_supply_update_bat_leds() -> led_trigger_blink() -> led_trigger_blink_setup() (workqueue context), and potentially prevent other similar usages. Link: https://lore.kernel.org/lkml/20201101092614.GB3989@xps-13-7390/ Fixes: eb25cb99 ("leds: convert IDE trigger to common disk trigger") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Pavel Machek <pavel@ucw.cz>
-