- 09 Dec, 2020 10 commits
-
-
Waiman Long authored
Reader optimistic spinning is helpful when the reader critical section is short and there aren't that many readers around. It also improves the chance that a reader can get the lock as writer optimistic spinning disproportionally favors writers much more than readers. Since commit d3681e26 ("locking/rwsem: Wake up almost all readers in wait queue"), all the waiting readers are woken up so that they can all get the read lock and run in parallel. When the number of contending readers is large, allowing reader optimistic spinning will likely cause reader fragmentation where multiple smaller groups of readers can get the read lock in a sequential manner separated by writers. That reduces reader parallelism. One possible way to address that drawback is to limit the number of readers (preferably one) that can do optimistic spinning. These readers act as representatives of all the waiting readers in the wait queue as they will wake up all those waiting readers once they get the lock. Alternatively, as reader optimistic lock stealing has already enhanced fairness to readers, it may be easier to just remove reader optimistic spinning and simplifying the optimistic spinning code as a result. Performance measurements (locking throughput kops/s) using a locking microbenchmark with 50/50 reader/writer distribution and turbo-boost disabled was done on a 2-socket Cascade Lake system (48-core 96-thread) to see the impacts of these changes: 1) Vanilla - 5.10-rc3 kernel 2) Before - 5.10-rc3 kernel with previous patches in this series 2) limit-rspin - 5.10-rc3 kernel with limited reader spinning patch 3) no-rspin - 5.10-rc3 kernel with reader spinning disabled # of threads CS Load Vanilla Before limit-rspin no-rspin ------------ ------- ------- ------ ----------- -------- 2 1 5,185 5,662 5,214 5,077 4 1 5,107 4,983 5,188 4,760 8 1 4,782 4,564 4,720 4,628 16 1 4,680 4,053 4,567 3,402 32 1 4,299 1,115 1,118 1,098 64 1 3,218 983 1,001 957 96 1 1,938 944 957 930 2 20 2,008 2,128 2,264 1,665 4 20 1,390 1,033 1,046 1,101 8 20 1,472 1,155 1,098 1,213 16 20 1,332 1,077 1,089 1,122 32 20 967 914 917 980 64 20 787 874 891 858 96 20 730 836 847 844 2 100 372 356 360 355 4 100 492 425 434 392 8 100 533 537 529 538 16 100 548 572 568 598 32 100 499 520 527 537 64 100 466 517 526 512 96 100 406 497 506 509 The column "CS Load" represents the number of pause instructions issued in the locking critical section. A CS load of 1 is extremely short and is not likey in real situations. A load of 20 (moderate) and 100 (long) are more realistic. It can be seen that the previous patches in this series have reduced performance in general except in highly contended cases with moderate or long critical sections that performance improves a bit. This change is mostly caused by the "Prevent potential lock starvation" patch that reduce reader optimistic spinning and hence reduce reader fragmentation. The patch that further limit reader optimistic spinning doesn't seem to have too much impact on overall performance as shown in the benchmark data. The patch that disables reader optimistic spinning shows reduced performance at lightly loaded cases, but comparable or slightly better performance on with heavier contention. This patch just removes reader optimistic spinning for now. As readers are not going to do optimistic spinning anymore, we don't need to consider if the OSQ is empty or not when doing lock stealing. 
Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Link: https://lkml.kernel.org/r/20201121041416.12285-6-longman@redhat.com
-
Waiman Long authored
If the optimistic spinning queue is empty and the rwsem does not have the handoff or write-lock bits set, it is actually not necessary to call rwsem_optimistic_spin() to spin on it. Instead, it can steal the lock directly as its reader bias is in the count already. If it is the first reader in this state, it will try to wake up other readers in the wait queue.

With this patch applied, the following were the lock event counts after rebooting a 2-socket system and a "make -j96" kernel rebuild:

  rwsem_opt_rlock   = 4437
  rwsem_rlock       = 29
  rwsem_rlock_steal = 19

So lock stealing represents about 0.4% of all the read locks acquired in the slow path. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Link: https://lkml.kernel.org/r/20201121041416.12285-4-longman@redhat.com
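A minimal sketch of the stealing condition described above. The helper name is hypothetical, and the flag defines mirror private constants in kernel/locking/rwsem.c; treat this as an illustration of the idea, not the patch itself:

	#include <linux/rwsem.h>
	#include <linux/osq_lock.h>

	/* Mirrors of private definitions in kernel/locking/rwsem.c (illustrative). */
	#define RWSEM_WRITER_LOCKED	(1UL << 0)
	#define RWSEM_FLAG_HANDOFF	(1UL << 2)

	/*
	 * The reader bias has already been added to sem->count by the caller,
	 * so if nobody is spinning (OSQ empty; the osq field exists only with
	 * CONFIG_RWSEM_SPIN_ON_OWNER) and neither the write-lock nor the
	 * handoff bit is set, the reader can keep the lock it has in effect
	 * already taken instead of calling rwsem_optimistic_spin().
	 */
	static inline bool rwsem_can_steal_rlock(struct rw_semaphore *sem, long count)
	{
		return !(count & (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF)) &&
		       !osq_is_locked(&sem->osq);
	}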
-
Waiman Long authored
The lock handoff bit was added in commit 4f23dbc1 ("locking/rwsem: Implement lock handoff to prevent lock starvation") to avoid lock starvation. However, allowing readers to do optimistic spinning does introduce an unlikely scenario where lock starvation can happen.

The lock handoff bit may only be set when a waiter is being woken up. In the case of reader unlock, wakeup happens only when the reader count reaches 0. If there is a continuous stream of incoming readers acquiring the read lock via optimistic spinning, it is possible that the reader count never reaches 0 and so the handoff bit is never asserted.

One way to prevent this scenario from happening is to disallow optimistic spinning if the rwsem is currently owned by readers. If the previous or current owner is a writer, optimistic spinning will be allowed. If the previous owner is a reader but the reader count has reached 0 before, a wakeup should have been issued, so the handoff mechanism will kick in to prevent lock starvation. As a result, it should be OK to do optimistic spinning in this case.

This patch may have some impact on reader performance as it reduces reader optimistic spinning, especially if the lock critical sections are short and the number of contending readers is small. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Link: https://lkml.kernel.org/r/20201121041416.12285-3-longman@redhat.com
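A hedged sketch of the policy being described, using the reader-owned bit that kernel/locking/rwsem.c keeps in sem->owner. The helper name is made up and the flag define mirrors a private constant; the real patch is more involved:

	#include <linux/rwsem.h>

	/* Mirror of a private flag in kernel/locking/rwsem.c (illustrative). */
	#define RWSEM_READER_OWNED	(1UL << 0)

	/*
	 * Allow a reader to optimistically spin only when the rwsem is not
	 * currently reader-owned: a writer owner (or no owner) is fine, but a
	 * continuous stream of spinning readers could otherwise keep the
	 * reader count from ever hitting zero, so the handoff bit would never
	 * be set and the waiters could starve.
	 */
	static inline bool rwsem_reader_may_spin(struct rw_semaphore *sem)
	{
		return !(atomic_long_read(&sem->owner) & RWSEM_READER_OWNED);
	}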
-
Waiman Long authored
The atomic count value right after reader count increment can be useful to determine the rwsem state at trylock time. So the count value is passed down to rwsem_down_read_slowpath() to be used when appropriate. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Link: https://lkml.kernel.org/r/20201121041416.12285-2-longman@redhat.com
-
Peter Zijlstra authored
There's a lot of needless duplication in __down_{read,write}*(); cure that with a helper. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201207090243.GE3040@hirez.programming.kicks-ass.net
-
Peter Zijlstra authored
One copy of this logic is better than three. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201207090243.GE3040@hirez.programming.kicks-ass.net
-
Peter Zijlstra authored
All users of rwsem_read_trylock() do rwsem_set_reader_owned(sem) on success, move it into rwsem_read_trylock() proper. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201207090243.GE3040@hirez.programming.kicks-ass.net
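A simplified sketch of the refactor; the real helper lives in kernel/locking/rwsem.c and uses the private RWSEM_READER_BIAS / RWSEM_READ_FAILED_MASK constants and rwsem_set_reader_owned(), so treat this as an outline rather than the exact code:

	static inline bool rwsem_read_trylock(struct rw_semaphore *sem)
	{
		long cnt = atomic_long_add_return_acquire(RWSEM_READER_BIAS,
							  &sem->count);

		if (!(cnt & RWSEM_READ_FAILED_MASK)) {
			rwsem_set_reader_owned(sem);	/* moved in from every caller */
			return true;
		}
		return false;
	}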
-
Peter Zijlstra authored
-
Eric W. Biederman authored
In preparation for converting exec_update_mutex to a rwsem so that multiple readers can execute in parallel and not deadlock, add down_read_interruptible. This is needed for perf_event_open to be converted (with no semantic changes) from working on a mutex to working on a rwsem. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/87k0tybqfy.fsf@x220.int.ebiederm.org
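A minimal usage sketch of the new primitive (the semaphore and function names here are hypothetical): down_read_interruptible() returns 0 on success and -EINTR if a signal arrived while sleeping.

	#include <linux/rwsem.h>

	static DECLARE_RWSEM(example_sem);	/* hypothetical lock for illustration */

	static int example_read_section(void)
	{
		int err;

		err = down_read_interruptible(&example_sem);
		if (err)			/* -EINTR: interrupted by a signal */
			return err;

		/* ... read-side critical section ... */

		up_read(&example_sem);
		return 0;
	}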
-
Eric W. Biederman authored
In preparation for converting exec_update_mutex to a rwsem so that multiple readers can execute in parallel and not deadlock, add down_read_killable_nested. This is needed so that kcmp_lock can be converted from working on mutexes to working on rw_semaphores. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/87o8jabqh3.fsf@x220.int.ebiederm.org
-
- 03 Dec, 2020 9 commits
-
-
Mauro Carvalho Chehab authored
The kernel-doc markup is wrong: it is asking the tool to document struct refcount_struct, instead of documenting typedef refcount_t. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/afb9bb1e675bf5f72a34a55d780779d7d5916b4c.1606823973.git.mchehab+huawei@kernel.org
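For reference, the corrected markup looks roughly like the sketch below; the exact description text lives in include/linux/refcount.h and may differ from this paraphrase. Documenting the typedef rather than the struct makes kernel-doc emit refcount_t:

	/**
	 * typedef refcount_t - variant of atomic_t specialized for reference counts
	 * @refs: atomic_t counter field
	 *
	 * The counter saturates at REFCOUNT_SATURATED and will not move once there.
	 */
	typedef struct refcount_struct {
		atomic_t refs;
	} refcount_t;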
-
Mauro Carvalho Chehab authored
Changeset cd8084f9 ("locking/lockdep: Apply crossrelease to completions") added a CONFIG_LOCKDEP_COMPLETE option (later renamed to CONFIG_LOCKDEP_COMPLETIONS). That changeset renamed init_completion() and added a macro that would run either a modified version or the original code. However, that code reported too many false positives, so it ended up being dropped later on by changeset e966eaee ("locking/lockdep: Remove the cross-release locking checks"). Yet the define remained there as just:

  #define init_completion(x) __init_completion(x)

Get rid of the define and return the __init_completion() function to its original name. Fixes: e966eaee ("locking/lockdep: Remove the cross-release locking checks") Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/e657bfc533545c185b1c3c55926a449ead56a88b.1606823973.git.mchehab+huawei@kernel.org
-
Peter Zijlstra authored
Update the files list to include refcount.h and the Documentation/ Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
-
Peter Zijlstra authored
It's been superseded by Documentation/atomic_*.txt. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
-
Peter Zijlstra authored
More consistent naming should make it easier to untangle the _Generic token pasting maze called __seqprop(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201110115358.GE2594@hirez.programming.kicks-ass.net
-
Boqun Feng authored
Add a self test case to test the behavior for the following case:

  lock(A);
  lock_nest_lock(C1, A);
  lock(B);
  lock_nest_lock(C2, A);

This is a reproducer for a problem[1] reported by Chris Wilson, and is helpful to prevent this. [1]: https://lore.kernel.org/lkml/160390684819.31966.12048967113267928793@build.alporthouse.com/ Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201102053743.450459-2-boqun.feng@gmail.com
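Outside the selftest, the same nesting pattern can be expressed with the ordinary nest-lock API; a hedged sketch using mutexes, with lock names made up for illustration:

	#include <linux/mutex.h>

	static DEFINE_MUTEX(A);		/* hypothetical locks for illustration */
	static DEFINE_MUTEX(B);
	static DEFINE_MUTEX(C1);
	static DEFINE_MUTEX(C2);

	static void nest_lock_pattern(void)
	{
		mutex_lock(&A);
		mutex_lock_nest_lock(&C1, &A);	/* C1 nests under A */
		mutex_lock(&B);
		mutex_lock_nest_lock(&C2, &A);	/* C2 also nests under A, taken after B */

		mutex_unlock(&C2);
		mutex_unlock(&B);
		mutex_unlock(&C1);
		mutex_unlock(&A);
	}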
-
Peter Zijlstra authored
The selftest nests rwlock_t inside raw_spinlock_t, this is invalid. Reported-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
-
Arnd Bergmann authored
When building with W=2, there is a flood of warnings about the seqlock macros shadowing local variables:

  19806 linux/seqlock.h:331:11: warning: declaration of 'seq' shadows a previous local [-Wshadow]
     48 linux/seqlock.h:348:11: warning: declaration of 'seq' shadows a previous local [-Wshadow]
      8 linux/seqlock.h:379:11: warning: declaration of 'seq' shadows a previous local [-Wshadow]

Prefix the local variables to make the warning useful elsewhere again. Fixes: 52ac39e5 ("seqlock: seqcount_t: Implement all read APIs as statement expressions") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201026165044.3722931-1-arnd@kernel.org
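The pattern being fixed, sketched with hypothetical macros rather than the actual seqlock.h code: a statement expression whose local is named seq shadows any caller variable of the same name under -Wshadow, so the local gets a prefix.

	/* Before: the local "seq" shadows a caller's "seq" and trips -Wshadow. */
	#define demo_read_seqbegin(s)				\
	({							\
		unsigned seq = READ_ONCE((s)->sequence);	\
		seq;						\
	})

	/* After: a prefixed local avoids the shadowing warning. */
	#define demo_read_seqbegin_fixed(s)			\
	({							\
		unsigned __seq = READ_ONCE((s)->sequence);	\
		__seq;						\
	})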
-
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Linus Torvalds authored
Pull gfs2 fixes from Andreas Gruenbacher: "Various gfs2 fixes"

* tag 'gfs2-v5.10-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix deadlock between gfs2_{create_inode,inode_lookup} and delete_work_func
  gfs2: Upgrade shared glocks for atime updates
  gfs2: Don't freeze the file system during unmount
  gfs2: check for empty rgrp tree in gfs2_ri_update
  gfs2: set lockdep subclass for iopen glocks
  gfs2: Fix deadlock dumping resource group glocks
-
- 02 Dec, 2020 6 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds authored
Pull arm64 fixes from Will Deacon:
 "I'm sad to say that we've got an unusually large arm64 fixes pull for rc7 which addresses numerous significant instrumentation issues with our entry code. Without these patches, lockdep is hopelessly unreliable in some configurations [1,2] and syzkaller is therefore not a lot of use because it's so noisy.

 Although much of this has always been broken, it appears to have been exposed more readily by other changes such as 044d0d6d ("lockdep: Only trace IRQ edges") and general lockdep improvements around IRQ tracing and NMIs.

 Fixing this properly required moving much of the instrumentation hooks from our entry assembly into C, which Mark has been working on for the last few weeks. We're not quite ready to move to the recently added generic functions yet, but the code here has been deliberately written to mimic that closely so we can look at cleaning things up once we have a bit more breathing room.

 Having said all that, the second version of these patches was posted last week and I pushed it into our CI (kernelci and cki) along with a commit which forced on PROVE_LOCKING, NOHZ_FULL and CONTEXT_TRACKING_FORCE. The result? We found a real bug in the md/raid10 code [3].

 Oh, and there's also a really silly typo patch that's unrelated.

 Summary:
  - Fix numerous issues with instrumentation and exception entry
  - Fix hideous typo in unused register field definition"

[1] https://lore.kernel.org/r/CACT4Y+aAzoJ48Mh1wNYD17pJqyEcDnrxGfApir=-j171TnQXhw@mail.gmail.com
[2] https://lore.kernel.org/r/20201119193819.GA2601289@elver.google.com
[3] https://lore.kernel.org/r/94c76d5e-466a-bc5f-e6c2-a11b65c39f83@redhat.com

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mte: Fix typo in macro definition
  arm64: entry: fix EL1 debug transitions
  arm64: entry: fix NMI {user, kernel}->kernel transitions
  arm64: entry: fix non-NMI kernel<->kernel transitions
  arm64: ptrace: prepare for EL1 irq/rcu tracking
  arm64: entry: fix non-NMI user<->kernel transitions
  arm64: entry: move el1 irq/nmi logic to C
  arm64: entry: prepare ret_to_user for function call
  arm64: entry: move enter_from_user_mode to entry-common.c
  arm64: entry: mark entry code as noinstr
  arm64: mark idle code as noinstr
  arm64: syscall: exit userspace before unmasking exceptions
-
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds authored
Pull vdpa fixes from Michael Tsirkin: "A couple of fixes that surfaced at the last minute"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost_vdpa: return -EFAULT if copy_to_user() fails
  vdpa: mlx5: fix vdpa/vhost dependencies
-
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Linus Torvalds authored
Pull sound fixes from Takashi Iwai: "Here are the pending sound fixes for 5.10: all small device-specific fixes, and nothing particular stands out, so far"

* tag 'sound-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek: Add mute LED quirk to yet another HP x360 model
  ALSA: hda/realtek: Fix bass speaker DAC assignment on Asus Zephyrus G14
  ALSA: hda/generic: Add option to enforce preferred_dacs pairs
  ALSA: usb-audio: US16x08: fix value count for level meters
  ALSA: hda/realtek - Add new codec supported for ALC897
  ASoC: rt5682: change SAR voltage threshold
  ASoC: wm_adsp: fix error return code in wm_adsp_load()
  ALSA: hda/realtek: Enable headset of ASUS UX482EG & B9400CEA with ALC294
  ASoC: qcom: Fix enabling BCLK and LRCLK in LPAIF invalid state
  ALSA: hda/realtek - Fixed Dell AIO wrong sound tone
  ASoC: Intel: bytcr_rt5640: Fix HP Pavilion x2 Detachable quirks
-
Linus Torvalds authored
Merge tag 'trace-v5.10-rc6-bootconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull bootconfig fixes from Steven Rostedt:
 "Have bootconfig size and checksum be little endian

  In case the bootconfig is created on one kind of endian machine, and then read on the other kind of endian kernel, the size and checksum will be incorrect. Instead, have both the size and checksum always be little endian and have the tool and the kernel convert it from little endian to or from the host endian"

* tag 'trace-v5.10-rc6-bootconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  docs: bootconfig: Add the endianness of fields
  tools/bootconfig: Store size and checksum in footer as le32
  bootconfig: Load size and checksum in the footer as le32
-
Dan Carpenter authored
The copy_to_user() function returns the number of bytes remaining to be copied, but this code should return -EFAULT to the user instead. Fixes: 1b48dc03 ("vhost: vdpa: report iova range") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/X8c32z5EtDsMyyIL@mwanda Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
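A minimal sketch of the pattern; the struct and function names are hypothetical stand-ins for the vdpa iova-range ioctl. copy_to_user() returns the number of bytes it could not copy, which must be translated to -EFAULT rather than handed back to userspace as a return value.

	#include <linux/uaccess.h>
	#include <linux/types.h>

	struct demo_range {		/* hypothetical stand-in */
		__u64 first;
		__u64 last;
	};

	static long demo_get_range(void __user *argp, const struct demo_range *range)
	{
		if (copy_to_user(argp, range, sizeof(*range)))
			return -EFAULT;	/* not the leftover byte count */
		return 0;
	}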
-
Randy Dunlap authored
drivers/vdpa/mlx5/ uses vhost_iotlb*() interfaces, so select VHOST_IOTLB to make them be built. However, if VHOST_IOTLB is the only VHOST symbol that is set/enabled, the object file still won't be built because drivers/Makefile won't descend into drivers/vhost/ to build it, so make drivers/Makefile build the needed binary whenever VHOST_IOTLB is set, like it does for VHOST_RING.

Fixes these build errors:

  ERROR: modpost: "vhost_iotlb_itree_next" [drivers/vdpa/mlx5/mlx5_vdpa.ko] undefined!
  ERROR: modpost: "vhost_iotlb_itree_first" [drivers/vdpa/mlx5/mlx5_vdpa.ko] undefined!

Fixes: 29064bfd ("vdpa/mlx5: Add support library for mlx5 VDPA implementation") Fixes: aff90770 ("vdpa/mlx5: Fix dependency on MLX5_CORE") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Eli Cohen <eli@mellanox.com> Cc: Parav Pandit <parav@mellanox.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtualization@lists.linux-foundation.org Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Leon Romanovsky <leonro@nvidia.com> Cc: netdev@vger.kernel.org Link: https://lore.kernel.org/r/20201128213905.27409-1-rdunlap@infradead.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
-
- 01 Dec, 2020 12 commits
-
-
git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds authored
Pull cifs fixes from Steve French: "Two smb3 fixes for stable"

* tag '5.10-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix potential use-after-free in cifs_echo_request()
  cifs: allow syscalls to be restarted in __smb_send_rqst()
-
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Linus Torvalds authored
Pull tracing fixes from Steven Rostedt:

 - Use correct timestamp variable for ring buffer write stamp update
 - Fix up before stamp and write stamp when crossing ring buffer sub buffers
 - Keep a zero delta in ring buffer in slow path if cmpxchg fails
 - Fix trace_printk static buffer for archs that care
 - Fix ftrace record accounting for ftrace ops with trampolines
 - Fix DYNAMIC_FTRACE_WITH_DIRECT_CALLS dependency
 - Remove WARN_ON in hwlat tracer that triggers on something that is OK
 - Make "my_tramp" trampoline in ftrace direct sample code global
 - Fixes in the bootconfig tool for better alignment management

* tag 'trace-v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ring-buffer: Always check to put back before stamp when crossing pages
  ftrace: Fix DYNAMIC_FTRACE_WITH_DIRECT_CALLS dependency
  ftrace: Fix updating FTRACE_FL_TRAMP
  tracing: Fix alignment of static buffer
  tracing: Remove WARN_ON in start_thread()
  samples/ftrace: Mark my_tramp[12]? global
  ring-buffer: Set the right timestamp in the slow path of __rb_reserve_next()
  ring-buffer: Update write stamp with the correct ts
  docs: bootconfig: Update file format on initrd image
  tools/bootconfig: Align the bootconfig applied initrd image size to 4
  tools/bootconfig: Fix to check the write failure correctly
  tools/bootconfig: Fix errno reference after printf()
-
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds authored
Pull vhost fixes from Michael Tsirkin: "A couple of minor fixes"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost-vdpa: fix page pinning leakage in error path (rework)
  vringh: fix vringh_iov_push_*() documentation
  vhost scsi: fix lun reset completion handling
-
Masami Hiramatsu authored
Add a description about the endianness of the size and the checksum fields. Those must be stored as le32 instead of u32. This will allow us to apply bootconfig to a cross-built initrd without caring about the endianness. Link: https://lkml.kernel.org/r/160583936246.547349.10964204130590955409.stgit@devnote2 Reported-by: Steven Rostedt <rostedt@goodmis.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Masami Hiramatsu authored
Store the size and the checksum fields in the footer as le32 instead of u32. This will allow us to apply bootconfig to a cross-built initrd without caring about the endianness. Link: https://lkml.kernel.org/r/160583935332.547349.5897811300636587426.stgit@devnote2 Reported-by: Steven Rostedt <rostedt@goodmis.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Masami Hiramatsu authored
Load the size and the checksum fields in the footer as le32 instead of u32. This will allow us to apply bootconfig to a cross-built initrd without caring about the endianness. Link: https://lkml.kernel.org/r/160583934457.547349.10504070298990791074.stgit@devnote2 Reported-by: Steven Rostedt <rostedt@goodmis.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
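A hedged sketch of the convention the three bootconfig patches above establish (the struct and helper names are illustrative, not the actual bootconfig code): the footer always carries little-endian 32-bit values, and both the tool and the kernel convert explicitly.

	#include <linux/types.h>
	#include <asm/byteorder.h>

	struct demo_bootconfig_footer {		/* illustrative layout */
		__le32 size;
		__le32 csum;
	};

	static void demo_footer_store(struct demo_bootconfig_footer *f,
				      u32 size, u32 csum)
	{
		f->size = cpu_to_le32(size);	/* tool side: host -> le32 */
		f->csum = cpu_to_le32(csum);
	}

	static void demo_footer_load(const struct demo_bootconfig_footer *f,
				     u32 *size, u32 *csum)
	{
		*size = le32_to_cpu(f->size);	/* kernel side: le32 -> host */
		*csum = le32_to_cpu(f->csum);
	}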
-
Steven Rostedt (VMware) authored
The current ring buffer logic checks to see if the updating of the event buffer was interrupted, and if it is, it will try to fix up the before stamp with the write stamp to make them equal again. This logic is flawed, because if it is not interrupted, the two are guaranteed to be different, as the current event just updated the before stamp before allocation. This guarantees that the next event (this one or another interrupting one) will think it interrupted the time updates of a previous event and inject an absolute time stamp to compensate. The correct logic is to always update the timestamps when traversing to a new sub buffer. Cc: stable@vger.kernel.org Fixes: a389d86f ("ring-buffer: Have nested events still record running time stamp") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Naveen N. Rao authored
DYNAMIC_FTRACE_WITH_DIRECT_CALLS should depend on DYNAMIC_FTRACE_WITH_REGS since we need ftrace_regs_caller(). Link: https://lkml.kernel.org/r/fc4b257ea8689a36f086d2389a9ed989496ca63a.1606412433.git.naveen.n.rao@linux.vnet.ibm.com Cc: stable@vger.kernel.org Fixes: 763e34e7 ("ftrace: Add register_ftrace_direct()") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Naveen N. Rao authored
On powerpc, kprobe-direct.tc triggered FTRACE_WARN_ON() in ftrace_get_addr_new() followed by the below message:

  Bad trampoline accounting at: 000000004222522f (wake_up_process+0xc/0x20) (f0000001)

The set of steps leading to this involved:
  - modprobe ftrace-direct-too
  - enable_probe
  - modprobe ftrace-direct
  - rmmod ftrace-direct <-- trigger

The problem turned out to be that we were not updating flags in the ftrace record properly. From the above message about the trampoline accounting being bad, it can be seen that the ftrace record still has FTRACE_FL_TRAMP set though ftrace-direct module is going away. This happens because we are checking if any ftrace_ops has the FTRACE_FL_TRAMP flag set _before_ updating the filter hash. The fix for this is to look for any _other_ ftrace_ops that also needs FTRACE_FL_TRAMP. Link: https://lkml.kernel.org/r/56c113aa9c3e10c19144a36d9684c7882bf09af5.1606412433.git.naveen.n.rao@linux.vnet.ibm.com Cc: stable@vger.kernel.org Fixes: a124692b ("ftrace: Enable trampoline when rec count returns back to one") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Minchan Kim authored
With a 5.9 kernel on ARM64, I found that the ftrace_dump output was broken while the normal output via "cat /sys/kernel/debug/tracing/trace" was fine. On investigation, copying the data into a temporary buffer breaks the alignment that binary printf expects when the static buffer is not 4-byte aligned. IIUC, get_arg() in bstr_printf() expects that the args are already correctly aligned to be decoded, and seq_buf_bprintf() says ``the arguments are saved in a 32bit word array that is defined by the format string constraints``. So if we don't preserve the alignment when copying into the temporary buffer, the output will be broken by shifting some bytes. This patch fixes it. Link: https://lkml.kernel.org/r/20201125225654.1618966-1-minchan@kernel.org Cc: <stable@vger.kernel.org> Fixes: 8e99cf91 ("tracing: Do not allocate buffer in trace_find_next_entry() in atomic") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
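The shape of the fix, sketched with a hypothetical buffer name and size: the temporary buffer the entry is copied into must keep the 4-byte alignment that the saved 32-bit argument words were written with.

	#include <linux/compiler.h>

	/* Before: no alignment guarantee, bstr_printf() may mis-decode the args. */
	static char demo_iter_buf_broken[128 + 1];

	/* After: force 4-byte alignment so get_arg() sees the word array intact. */
	static char demo_iter_buf[128 + 1] __aligned(4);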
-
Vasily Averin authored
This patch reverts commit 978defee ("tracing: Do a WARN_ON() if start_thread() in hwlat is called when thread exists"). The .start hook can legally be called several times if the corresponding tracer is stopped.

screen window 1:
  [root@localhost ~]# echo 1 > /sys/kernel/tracing/events/kmem/kfree/enable
  [root@localhost ~]# echo 1 > /sys/kernel/tracing/options/pause-on-trace
  [root@localhost ~]# less -F /sys/kernel/tracing/trace

screen window 2:
  [root@localhost ~]# cat /sys/kernel/debug/tracing/tracing_on
  0
  [root@localhost ~]# echo hwlat > /sys/kernel/debug/tracing/current_tracer
  [root@localhost ~]# echo 1 > /sys/kernel/debug/tracing/tracing_on
  [root@localhost ~]# cat /sys/kernel/debug/tracing/tracing_on
  0
  [root@localhost ~]# echo 2 > /sys/kernel/debug/tracing/tracing_on

This triggers a warning in dmesg:
  WARNING: CPU: 3 PID: 1403 at kernel/trace/trace_hwlat.c:371 hwlat_tracer_start+0xc9/0xd0

Link: https://lkml.kernel.org/r/bd4d3e70-400d-9c82-7b73-a2d695e86b58@virtuozzo.com Cc: Ingo Molnar <mingo@redhat.com> Cc: stable@vger.kernel.org Fixes: 978defee ("tracing: Do a WARN_ON() if start_thread() in hwlat is called when thread exists") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Sami Tolvanen authored
my_tramp[12]? are declared as global functions in C, but they are not marked global in the inline assembly definition. This mismatch confuses Clang's Control-Flow Integrity checking. Fix the definitions by adding .globl. Link: https://lkml.kernel.org/r/20201113183414.1446671-1-samitolvanen@google.com Fixes: 9d907f1a ("ftrace/samples: Add a sample module that implements modify_ftrace_direct()") Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
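A condensed sketch of the mismatch (x86-64 assembly assumed; the real sample trampolines also save registers and call the direct-call target): the C declaration is global, so the assembly definition needs a matching .globl, otherwise Clang's CFI treats them as different symbols.

	extern void my_tramp(void);	/* global as far as C (and CFI) is concerned */

	asm (
	"	.pushsection	.text, \"ax\", @progbits\n"
	"	.type		my_tramp, @function\n"
	"	.globl		my_tramp\n"	/* the directive the fix adds */
	"my_tramp:\n"
	"	ret\n"
	"	.size		my_tramp, .-my_tramp\n"
	"	.popsection\n"
	);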
-
- 30 Nov, 2020 3 commits
-
-
Andreas Gruenbacher authored
In gfs2_create_inode and gfs2_inode_lookup, make sure to cancel any pending delete work before taking the inode glock. Otherwise, gfs2_cancel_delete_work may block waiting for delete_work_func to complete, and delete_work_func may block trying to acquire the inode glock in gfs2_inode_lookup. Reported-by: Alexander Aring <aahringo@redhat.com> Fixes: a0e3cc65 ("gfs2: Turn gl_delete into a delayed work") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Paulo Alcantara authored
This patch fixes a potential use-after-free bug in cifs_echo_request(). For instance:

  thread 1
  --------
  cifs_demultiplex_thread()
  clean_demultiplex_info()
  kfree(server)

  thread 2 (workqueue)
  --------
  apic_timer_interrupt()
  smp_apic_timer_interrupt()
  irq_exit()
  __do_softirq()
  run_timer_softirq()
  call_timer_fn()
  cifs_echo_request() <- use-after-free in server ptr

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> CC: Stable <stable@vger.kernel.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
-
Paulo Alcantara authored
A customer has reported that several files in their multi-threaded app were left with a size of 0 because most of the read(2) calls returned -EINTR and they assumed no bytes were read. Obviously, they could have fixed it by simply retrying on -EINTR. We noticed that most of the -EINTR returns on read(2) were due to real-time signals sent by glibc to process wide credential changes (SIGRT_1), and its signal handler had been established with SA_RESTART, in which case those calls could have been automatically restarted by the kernel. Let the kernel decide whether or not to restart the syscalls when there is a signal pending in __smb_send_rqst() by returning -ERESTARTSYS. If it can't, it will return -EINTR anyway. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> CC: Stable <stable@vger.kernel.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
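A minimal sketch of the idea (hypothetical function, not the actual __smb_send_rqst() change): bail out with -ERESTARTSYS when a signal is pending, so the kernel can transparently restart the syscall for SA_RESTART handlers and fall back to -EINTR otherwise.

	#include <linux/sched/signal.h>
	#include <linux/errno.h>

	static int demo_send_request(void)
	{
		if (signal_pending(current))
			return -ERESTARTSYS;	/* was: return -EINTR */

		/* ... transmit the SMB request ... */
		return 0;
	}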
-