- 01 Oct, 2020 40 commits
-
-
Zhuang Yanying authored
[ Upstream commit 7df003c8 ] We are testing Virtual Machine with KSM on v5.4-rc2 kernel, and found the zero_page refcount overflow. The cause of refcount overflow is increased in try_async_pf (get_user_page) without being decreased in mmu_set_spte() while handling ept violation. In kvm_release_pfn_clean(), only unreserved page will call put_page. However, zero page is reserved. So, as well as creating and destroy vm, the refcount of zero page will continue to increase until it overflows. step1: echo 10000 > /sys/kernel/pages_to_scan/pages_to_scan echo 1 > /sys/kernel/pages_to_scan/run echo 1 > /sys/kernel/pages_to_scan/use_zero_pages step2: just create several normal qemu kvm vms. And destroy it after 10s. Repeat this action all the time. After a long period of time, all domains hang because of the refcount of zero page overflow. Qemu print error log as follow: … error: kvm run failed Bad address EAX=00006cdc EBX=00000008 ECX=80202001 EDX=078bfbfd ESI=ffffffff EDI=00000000 EBP=00000008 ESP=00006cc4 EIP=000efd75 EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA] CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA] SS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA] DS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA] FS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA] GS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA] LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy GDT= 000f7070 00000037 IDT= 000f70ae 00000000 CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000ffff0ff0 DR7=0000000000000400 EFER=0000000000000000 Code=00 01 00 00 00 e9 e8 00 00 00 c7 05 4c 55 0f 00 01 00 00 00 <8b> 35 00 00 01 00 8b 3d 04 00 01 00 b8 d8 d3 00 00 c1 e0 08 0c ea a3 00 00 01 00 c7 05 04 … Meanwhile, a kernel warning is departed. [40914.836375] WARNING: CPU: 3 PID: 82067 at ./include/linux/mm.h:987 try_get_page+0x1f/0x30 [40914.836412] CPU: 3 PID: 82067 Comm: CPU 0/KVM Kdump: loaded Tainted: G OE 5.2.0-rc2 #5 [40914.836415] RIP: 0010:try_get_page+0x1f/0x30 [40914.836417] Code: 40 00 c3 0f 1f 84 00 00 00 00 00 48 8b 47 08 a8 01 75 11 8b 47 34 85 c0 7e 10 f0 ff 47 34 b8 01 00 00 00 c3 48 8d 78 ff eb e9 <0f> 0b 31 c0 c3 66 90 66 2e 0f 1f 84 00 0 0 00 00 00 48 8b 47 08 a8 [40914.836418] RSP: 0018:ffffb4144e523988 EFLAGS: 00010286 [40914.836419] RAX: 0000000080000000 RBX: 0000000000000326 RCX: 0000000000000000 [40914.836420] RDX: 0000000000000000 RSI: 00004ffdeba10000 RDI: ffffdf07093f6440 [40914.836421] RBP: ffffdf07093f6440 R08: 800000424fd91225 R09: 0000000000000000 [40914.836421] R10: ffff9eb41bfeebb8 R11: 0000000000000000 R12: ffffdf06bbd1e8a8 [40914.836422] R13: 0000000000000080 R14: 800000424fd91225 R15: ffffdf07093f6440 [40914.836423] FS: 00007fb60ffff700(0000) GS:ffff9eb4802c0000(0000) knlGS:0000000000000000 [40914.836425] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [40914.836426] CR2: 0000000000000000 CR3: 0000002f220e6002 CR4: 00000000003626e0 [40914.836427] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [40914.836427] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [40914.836428] Call Trace: [40914.836433] follow_page_pte+0x302/0x47b [40914.836437] __get_user_pages+0xf1/0x7d0 [40914.836441] ? irq_work_queue+0x9/0x70 [40914.836443] get_user_pages_unlocked+0x13f/0x1e0 [40914.836469] __gfn_to_pfn_memslot+0x10e/0x400 [kvm] [40914.836486] try_async_pf+0x87/0x240 [kvm] [40914.836503] tdp_page_fault+0x139/0x270 [kvm] [40914.836523] kvm_mmu_page_fault+0x76/0x5e0 [kvm] [40914.836588] vcpu_enter_guest+0xb45/0x1570 [kvm] [40914.836632] kvm_arch_vcpu_ioctl_run+0x35d/0x580 [kvm] [40914.836645] kvm_vcpu_ioctl+0x26e/0x5d0 [kvm] [40914.836650] do_vfs_ioctl+0xa9/0x620 [40914.836653] ksys_ioctl+0x60/0x90 [40914.836654] __x64_sys_ioctl+0x16/0x20 [40914.836658] do_syscall_64+0x5b/0x180 [40914.836664] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [40914.836666] RIP: 0033:0x7fb61cb6bfc7 Signed-off-by: LinFeng <linfeng23@huawei.com> Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Hillf Danton authored
[ Upstream commit 2a154903 ] Prefetch channel before killing sock in order to fix UAF like BUG: KASAN: use-after-free in l2cap_sock_release+0x24c/0x290 net/bluetooth/l2cap_sock.c:1212 Read of size 8 at addr ffff8880944904a0 by task syz-fuzzer/9751 Reported-by: syzbot+c3c5bdea7863886115dc@syzkaller.appspotmail.com Fixes: 6c08fc89 ("Bluetooth: Fix refcount use-after-free issue") Cc: Manish Mandlik <mmandlik@google.com> Signed-off-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Steven Price authored
[ Upstream commit c02a9875 ] If walk_pte_range() is called with a 'end' argument that is beyond the last page of memory (e.g. ~0UL) then the comparison between 'addr' and 'end' will always fail and the loop will be infinite. Instead change the comparison to >= while accounting for overflow. Link: http://lkml.kernel.org/r/20191218162402.45610-15-steven.price@arm.comSigned-off-by: Steven Price <steven.price@arm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <jhogan@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: "Liang, Kan" <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Burton <paul.burton@mips.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Zong Li <zong.li@sifive.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Vasily Averin authored
[ Upstream commit 10c8d69f ] If seq_file .next fuction does not change position index, read after some lseek can generate unexpected output. In Aug 2018 NeilBrown noticed commit 1f4aace6 ("fs/seq_file.c: simplify seq_file iteration code and interface") "Some ->next functions do not increment *pos when they return NULL... Note that such ->next functions are buggy and should be fixed. A simple demonstration is dd if=/proc/swaps bs=1000 skip=1 Choose any block size larger than the size of /proc/swaps. This will always show the whole last line of /proc/swaps" Described problem is still actual. If you make lseek into middle of last output line following read will output end of last line and whole last line once again. $ dd if=/proc/swaps bs=1 # usual output Filename Type Size Used Priority /dev/dm-0 partition 4194812 97536 -2 104+0 records in 104+0 records out 104 bytes copied $ dd if=/proc/swaps bs=40 skip=1 # last line was generated twice dd: /proc/swaps: cannot skip to specified offset v/dm-0 partition 4194812 97536 -2 /dev/dm-0 partition 4194812 97536 -2 3+1 records in 3+1 records out 131 bytes copied https://bugzilla.kernel.org/show_bug.cgi?id=206283 Link: http://lkml.kernel.org/r/bd8cfd7b-ac95-9b91-f9e7-e8438bd5047d@virtuozzo.comSigned-off-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Jann Horn <jannh@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Manish Mandlik authored
[ Upstream commit 6c08fc89 ] There is no lock preventing both l2cap_sock_release() and chan->ops->close() from running at the same time. If we consider Thread A running l2cap_chan_timeout() and Thread B running l2cap_sock_release(), expected behavior is: A::l2cap_chan_timeout()->l2cap_chan_close()->l2cap_sock_teardown_cb() A::l2cap_chan_timeout()->l2cap_sock_close_cb()->l2cap_sock_kill() B::l2cap_sock_release()->sock_orphan() B::l2cap_sock_release()->l2cap_sock_kill() where, sock_orphan() clears "sk->sk_socket" and l2cap_sock_teardown_cb() marks socket as SOCK_ZAPPED. In l2cap_sock_kill(), there is an "if-statement" that checks if both sock_orphan() and sock_teardown() has been run i.e. sk->sk_socket is NULL and socket is marked as SOCK_ZAPPED. Socket is killed if the condition is satisfied. In the race condition, following occurs: A::l2cap_chan_timeout()->l2cap_chan_close()->l2cap_sock_teardown_cb() B::l2cap_sock_release()->sock_orphan() B::l2cap_sock_release()->l2cap_sock_kill() A::l2cap_chan_timeout()->l2cap_sock_close_cb()->l2cap_sock_kill() In this scenario, "if-statement" is true in both B::l2cap_sock_kill() and A::l2cap_sock_kill() and we hit "refcount: underflow; use-after-free" bug. Similar condition occurs at other places where teardown/sock_kill is happening: l2cap_disconnect_rsp()->l2cap_chan_del()->l2cap_sock_teardown_cb() l2cap_disconnect_rsp()->l2cap_sock_close_cb()->l2cap_sock_kill() l2cap_conn_del()->l2cap_chan_del()->l2cap_sock_teardown_cb() l2cap_conn_del()->l2cap_sock_close_cb()->l2cap_sock_kill() l2cap_disconnect_req()->l2cap_chan_del()->l2cap_sock_teardown_cb() l2cap_disconnect_req()->l2cap_sock_close_cb()->l2cap_sock_kill() l2cap_sock_cleanup_listen()->l2cap_chan_close()->l2cap_sock_teardown_cb() l2cap_sock_cleanup_listen()->l2cap_sock_kill() Protect teardown/sock_kill and orphan/sock_kill by adding hold_lock on l2cap channel to ensure that the socket is killed only after marked as zapped and orphan. Signed-off-by: Manish Mandlik <mmandlik@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Doug Smythies authored
[ Upstream commit e749e09d ] Some syntax needs to be more rigorous for python 3. Backwards compatibility tested with python 2.7 Signed-off-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Sven Schnelle authored
[ Upstream commit af4ddd60 ] test.d/ftrace/func-filter-glob.tc is failing on s390 because it has ARCH_INLINE_SPIN_LOCK and friends set to 'y'. So the usual __raw_spin_lock symbol isn't in the ftrace function list. Change '*aw*lock' to '*spin*lock' which would hopefully match some of the locking functions on all platforms. Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Jeff Layton authored
[ Upstream commit 9a6bed4f ] If the caller passes in a NULL cap_reservation, and we can't allocate one then ensure that we fail gracefully. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Mert Dirik authored
[ Upstream commit 5b362498 ] Add the required USB ID for running SMCWUSBT-G2 wireless adapter (SMC "EZ Connect g"). This device uses ar5523 chipset and requires firmware to be loaded. Even though pid of the device is 4507, this patch adds it as 4506 so that AR5523_DEVICE_UG macro can set the AR5523_FLAG_PRE_FIRMWARE flag for pid 4507. Signed-off-by: Mert Dirik <mertdirik@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Vincent Whitchurch authored
[ Upstream commit 40ff1ddb ] The stacktrace code can read beyond the stack size, when it attempts to read pt_regs from exception frames. This can happen on normal, non-corrupt stacks. Since the unwind information in the extable is not correct for function prologues, the unwinding code can return data from the stack which is not actually the caller function address, and if in_entry_text() happens to succeed on this value, we can end up reading data from outside the task's stack when attempting to read pt_regs, since there is no bounds check. Example: [<8010e729>] (unwind_backtrace) from [<8010a9c9>] (show_stack+0x11/0x14) [<8010a9c9>] (show_stack) from [<8057d8d7>] (dump_stack+0x87/0xac) [<8057d8d7>] (dump_stack) from [<8012271d>] (tasklet_action_common.constprop.4+0xa5/0xa8) [<8012271d>] (tasklet_action_common.constprop.4) from [<80102333>] (__do_softirq+0x11b/0x31c) [<80102333>] (__do_softirq) from [<80122485>] (irq_exit+0xad/0xd8) [<80122485>] (irq_exit) from [<8015f3d7>] (__handle_domain_irq+0x47/0x84) [<8015f3d7>] (__handle_domain_irq) from [<8036a523>] (gic_handle_irq+0x43/0x78) [<8036a523>] (gic_handle_irq) from [<80101a49>] (__irq_svc+0x69/0xb4) Exception stack(0xeb491f58 to 0xeb491fa0) 1f40: 7eb14794 00000000 1f60: ffffffff 008dd32c 008dd324 ffffffff 008dd314 0000002a 801011e4 eb490000 1f80: 0000002a 7eb1478c 50c5387d eb491fa8 80101001 8023d09c 40080033 ffffffff [<80101a49>] (__irq_svc) from [<8023d09c>] (do_pipe2+0x0/0xac) [<8023d09c>] (do_pipe2) from [<ffffffff>] (0xffffffff) Exception stack(0xeb491fc8 to 0xeb492010) 1fc0: 008dd314 0000002a 00511ad8 008de4c8 7eb14790 7eb1478c 1fe0: 00511e34 7eb14774 004c8557 76f44098 60080030 7eb14794 00000000 00000000 2000: 00000001 00000000 ea846c00 ea847cc0 In this example, the stack limit is 0xeb492000, but 16 bytes outside the stack have been read. Fix it by adding bounds checks. Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Josef Bacik authored
[ Upstream commit cbc3b92c ] I noticed when trying to use the trace-cmd python interface that reading the raw buffer wasn't working for kernel_stack events. This is because it uses a stubbed version of __dynamic_array that doesn't do the __data_loc trick and encode the length of the array into the field. Instead it just shows up as a size of 0. So change this to __array and set the len to FTRACE_STACK_ENTRIES since this is what we actually do in practice and matches how user_stack_trace works. Link: http://lkml.kernel.org/r/1411589652-1318-1-git-send-email-jbacik@fb.comSigned-off-by: Josef Bacik <jbacik@fb.com> [ Pulled from the archeological digging of my INBOX ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Maxim Mikityanskiy authored
[ Upstream commit 268d3636 ] Currently, kmemdup is applied to the firmware data, and it invokes kmalloc under the hood. The firmware size and patch_length are big (more than PAGE_SIZE), and on some low-end systems (like ASUS E202SA) kmalloc may fail to allocate a contiguous chunk under high memory usage and fragmentation: Bluetooth: hci0: RTL: examining hci_ver=06 hci_rev=000a lmp_ver=06 lmp_subver=8821 Bluetooth: hci0: RTL: rom_version status=0 version=1 Bluetooth: hci0: RTL: loading rtl_bt/rtl8821a_fw.bin kworker/u9:2: page allocation failure: order:4, mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0 <stack trace follows> As firmware load happens on each resume, Bluetooth will stop working after several iterations, when the kernel fails to allocate an order-4 page. This patch replaces kmemdup with kvmalloc+memcpy. It's not required to have a contiguous chunk here, because it's not mapped to the device directly. Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Oliver O'Halloran authored
[ Upstream commit 4e0942c0 ] Many drivers don't check for errors when they get a 0xFFs response from an MMIO load. As a result after an EEH event occurs a driver can get stuck in a polling loop unless it some kind of internal timeout logic. Currently EEH tries to detect and report stuck drivers by dumping a stack trace after eeh_dev_check_failure() is called EEH_MAX_FAILS times on an already frozen PE. The value of EEH_MAX_FAILS was chosen so that a dump would occur every few seconds if the driver was spinning in a loop. This results in a lot of spurious stack traces in the kernel log. Fix this by limiting it to printing one stack trace for each PE freeze. If the driver is truely stuck the kernel's hung task detector is better suited to reporting the probelm anyway. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Tested-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191016012536.22588-1-oohall@gmail.comSigned-off-by: Sasha Levin <sashal@kernel.org>
-
Thomas Richter authored
[ Upstream commit 32dab682 ] Use kzalloc() to allocate auxiliary buffer structure initialized with all zeroes to avoid random value in trace output. Avoid double access to SBD hardware flags. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Matthias Fend authored
[ Upstream commit cc88525e ] Since the dma engine expects the burst length register content as power of 2 value, the burst length needs to be converted first. Additionally add a burst length range check to avoid corrupting unrelated register bits. Signed-off-by: Matthias Fend <matthias.fend@wolfvision.net> Link: https://lore.kernel.org/r/20200115102249.24398-1-matthias.fend@wolfvision.netSigned-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Bart Van Assche authored
[ Upstream commit eacf36f5 ] Starting execution of a command before tracing a command may cause the completion handler to free data while it is being traced. Fix this race by tracing a command before it is submitted. Cc: Bean Huo <beanhuo@micron.com> Cc: Can Guo <cang@codeaurora.org> Cc: Avri Altman <avri.altman@wdc.com> Cc: Stanley Chu <stanley.chu@mediatek.com> Cc: Tomas Winkler <tomas.winkler@intel.com> Link: https://lore.kernel.org/r/20191224220248.30138-5-bvanassche@acm.orgReviewed-by: Alim Akhtar <alim.akhtar@samsung.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Bart Van Assche authored
[ Upstream commit e4d2add7 ] Since the lrbp->cmd expression occurs multiple times, introduce a new local variable to hold that pointer. This patch does not change any functionality. Cc: Bean Huo <beanhuo@micron.com> Cc: Can Guo <cang@codeaurora.org> Cc: Avri Altman <avri.altman@wdc.com> Cc: Stanley Chu <stanley.chu@mediatek.com> Cc: Tomas Winkler <tomas.winkler@intel.com> Link: https://lore.kernel.org/r/20191224220248.30138-3-bvanassche@acm.orgReviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Rafael J. Wysocki authored
[ Upstream commit 3df663a1 ] There is a race condition in acpi_ec_get_query_handler() theoretically allowing query handlers to go away before refernce counting them. In order to avoid it, call kref_get() on query handlers under ec->mutex. Also simplify the code a bit while at it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Kevin Kou authored
[ Upstream commit f643ee29 ] The original patch bringed in the "SCTP ACK tracking trace event" feature was committed at Dec.20, 2017, it replaced jprobe usage with trace events, and bringed in two trace events, one is TRACE_EVENT(sctp_probe), another one is TRACE_EVENT(sctp_probe_path). The original patch intended to trigger the trace_sctp_probe_path in TRACE_EVENT(sctp_probe) as below code, +TRACE_EVENT(sctp_probe, + + TP_PROTO(const struct sctp_endpoint *ep, + const struct sctp_association *asoc, + struct sctp_chunk *chunk), + + TP_ARGS(ep, asoc, chunk), + + TP_STRUCT__entry( + __field(__u64, asoc) + __field(__u32, mark) + __field(__u16, bind_port) + __field(__u16, peer_port) + __field(__u32, pathmtu) + __field(__u32, rwnd) + __field(__u16, unack_data) + ), + + TP_fast_assign( + struct sk_buff *skb = chunk->skb; + + __entry->asoc = (unsigned long)asoc; + __entry->mark = skb->mark; + __entry->bind_port = ep->base.bind_addr.port; + __entry->peer_port = asoc->peer.port; + __entry->pathmtu = asoc->pathmtu; + __entry->rwnd = asoc->peer.rwnd; + __entry->unack_data = asoc->unack_data; + + if (trace_sctp_probe_path_enabled()) { + struct sctp_transport *sp; + + list_for_each_entry(sp, &asoc->peer.transport_addr_list, + transports) { + trace_sctp_probe_path(sp, asoc); + } + } + ), But I found it did not work when I did testing, and trace_sctp_probe_path had no output, I finally found that there is trace buffer lock operation(trace_event_buffer_reserve) in include/trace/trace_events.h: static notrace void \ trace_event_raw_event_##call(void *__data, proto) \ { \ struct trace_event_file *trace_file = __data; \ struct trace_event_data_offsets_##call __maybe_unused __data_offsets;\ struct trace_event_buffer fbuffer; \ struct trace_event_raw_##call *entry; \ int __data_size; \ \ if (trace_trigger_soft_disabled(trace_file)) \ return; \ \ __data_size = trace_event_get_offsets_##call(&__data_offsets, args); \ \ entry = trace_event_buffer_reserve(&fbuffer, trace_file, \ sizeof(*entry) + __data_size); \ \ if (!entry) \ return; \ \ tstruct \ \ { assign; } \ \ trace_event_buffer_commit(&fbuffer); \ } The reason caused no output of trace_sctp_probe_path is that trace_sctp_probe_path written in TP_fast_assign part of TRACE_EVENT(sctp_probe), and it will be placed( { assign; } ) after the trace_event_buffer_reserve() when compiler expands Macro, entry = trace_event_buffer_reserve(&fbuffer, trace_file, \ sizeof(*entry) + __data_size); \ \ if (!entry) \ return; \ \ tstruct \ \ { assign; } \ so trace_sctp_probe_path finally can not acquire trace_event_buffer and return no output, that is to say the nest of tracepoint entry function is not allowed. The function call flow is: trace_sctp_probe() -> trace_event_raw_event_sctp_probe() -> lock buffer -> trace_sctp_probe_path() -> trace_event_raw_event_sctp_probe_path() --nested -> buffer has been locked and return no output. This patch is to remove trace_sctp_probe_path from the TP_fast_assign part of TRACE_EVENT(sctp_probe) to avoid the nest of entry function, and trigger sctp_probe_path_trace in sctp_outq_sack. After this patch, you can enable both events individually, # cd /sys/kernel/debug/tracing # echo 1 > events/sctp/sctp_probe/enable # echo 1 > events/sctp/sctp_probe_path/enable Or, you can enable all the events under sctp. # echo 1 > events/sctp/enable Signed-off-by: Kevin Kou <qdkevin.kou@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Nikhil Devshatwar authored
[ Upstream commit 6e72eab2 ] When setting DMA for video capture from CSI channel, if the DMA size is not given, it ends up writing as much data as sent by the camera. This may lead to overwriting the buffers causing memory corruption. Observed green lines on the default framebuffer. Restrict the DMA to maximum height as specified in the S_FMT ioctl. Signed-off-by: Nikhil Devshatwar <nikhil.nd@ti.com> Signed-off-by: Benoit Parrot <bparrot@ti.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Marco Elver authored
[ Upstream commit bf07132f ] This patch proposes to require marked atomic accesses surrounding raw_write_seqcount_barrier. We reason that otherwise there is no way to guarantee propagation nor atomicity of writes before/after the barrier [1]. For example, consider the compiler tears stores either before or after the barrier; in this case, readers may observe a partial value, and because readers are unaware that writes are going on (writes are not in a seq-writer critical section), will complete the seq-reader critical section while having observed some partial state. [1] https://lwn.net/Articles/793253/ This came up when designing and implementing KCSAN, because KCSAN would flag these accesses as data-races. After careful analysis, our reasoning as above led us to conclude that the best thing to do is to propose an amendment to the raw_seqcount_barrier usage. Signed-off-by: Marco Elver <elver@google.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Vasily Averin authored
[ Upstream commit 4fc427e0 ] if seq_file .next fuction does not change position index, read after some lseek can generate unexpected output. https://bugzilla.kernel.org/show_bug.cgi?id=206283Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Vasily Averin authored
[ Upstream commit a3ea8673 ] if seq_file .next fuction does not change position index, read after some lseek can generate unexpected output. https://bugzilla.kernel.org/show_bug.cgi?id=206283Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Vasily Averin authored
[ Upstream commit 1e3f9f07 ] if seq_file .next fuction does not change position index, read after some lseek can generate unexpected output. https://bugzilla.kernel.org/show_bug.cgi?id=206283Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Darrick J. Wong authored
[ Upstream commit b1de6fc7 ] Omar Sandoval reported that a 4G fallocate on the realtime device causes filesystem shutdowns due to a log reservation overflow that happens when we log the rtbitmap updates. Factor rtbitmap/rtsummary updates into the the tr_write and tr_itruncate log reservation calculation. "The following reproducer results in a transaction log overrun warning for me: mkfs.xfs -f -r rtdev=/dev/vdc -d rtinherit=1 -m reflink=0 /dev/vdb mount -o rtdev=/dev/vdc /dev/vdb /mnt fallocate -l 4G /mnt/foo Reported-by: Omar Sandoval <osandov@osandov.com> Tested-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Miaohe Lin authored
[ Upstream commit 0bda9498 ] In kvm_vgic_dist_init() called from kvm_vgic_map_resources(), if dist->vgic_model is invalid, dist->spis will be freed without set dist->spis = NULL. And in vgicv2 resources clean up path, __kvm_vgic_destroy() will be called to free allocated resources. And dist->spis will be freed again in clean up chain because we forget to set dist->spis = NULL in kvm_vgic_dist_init() failed path. So double free would happen. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/1574923128-19956-1-git-send-email-linmiaohe@huawei.comSigned-off-by: Sasha Levin <sashal@kernel.org>
-
Joe Perches authored
[ Upstream commit 5e1aada0 ] Initialization is not guaranteed to zero padding bytes so use an explicit memset instead to avoid leaking any kernel content in any possible padding bytes. Link: http://lkml.kernel.org/r/dfa331c00881d61c8ee51577a082d8bebd61805c.camel@perches.comSigned-off-by: Joe Perches <joe@perches.com> Cc: Dan Carpenter <error27@gmail.com> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Tzung-Bi Shih authored
[ Upstream commit acb874a7 ] It was observed Baytrail-based chromebooks could cause continuous PLL unlocked when using playback stream and capture stream simultaneously. Specifically, starting a capture stream after started a playback stream. As a result, the audio data could corrupt or turn completely silent. As the datasheet suggested, the maximum PLL lock time should be 7 msec. The workaround resets the codec softly by toggling SHDN off and on if PLL failed to lock for 10 msec. Notably, there is no suggested hold time for SHDN off. On Baytrail-based chromebooks, it would easily happen continuous PLL unlocked if there is a 10 msec delay between SHDN off and on. Removes the msleep(). Signed-off-by: Tzung-Bi Shih <tzungbi@google.com> Link: https://lore.kernel.org/r/20191122073114.219945-2-tzungbi@google.comReviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Pavel Shilovsky authored
[ Upstream commit 9bd45408 ] Currenly we doesn't assume that a server may break a lease from RWH to RW which causes us setting a wrong lease state on a file and thus mistakenly flushing data and byte-range locks and purging cached data on the client. This leads to performance degradation because subsequent IOs go directly to the server. Fix this by propagating new lease state and epoch values to the oplock break handler through cifsFileInfo structure and removing the use of cifsInodeInfo flags for that. It allows to avoid some races of several lease/oplock breaks using those flags in parallel. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Kusanagi Kouichi authored
[ Upstream commit 4250b047 ] If DEBUG_FS=n, compile fails with the following error: kernel/trace/trace.c: In function 'tracing_init_dentry': kernel/trace/trace.c:8658:9: error: passing argument 3 of 'debugfs_create_automount' from incompatible pointer type [-Werror=incompatible-pointer-types] 8658 | trace_automount, NULL); | ^~~~~~~~~~~~~~~ | | | struct vfsmount * (*)(struct dentry *, void *) In file included from kernel/trace/trace.c:24: ./include/linux/debugfs.h:206:25: note: expected 'struct vfsmount * (*)(void *)' but argument is of type 'struct vfsmount * (*)(struct dentry *, void *)' 206 | struct vfsmount *(*f)(void *), | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~ Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp> Link: https://lore.kernel.org/r/20191121102021787.MLMY.25002.ppp.dion.ne.jp@dmta0003.auone-net.jpSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
peter chang authored
[ Upstream commit 51c1c5f6 ] Added the fix so the if driver properly sent the abort it tries to remove it from the firmware's list of outstanding commands regardless of the abort status. This means that the task gets freed 'now' rather than possibly getting freed later when the scsi layer thinks it's leaked but still valid. Link: https://lore.kernel.org/r/20191114100910.6153-10-deepak.ukey@microchip.comAcked-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: peter chang <dpf@google.com> Signed-off-by: Deepak Ukey <deepak.ukey@microchip.com> Signed-off-by: Viswas G <Viswas.G@microchip.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Bob Peterson authored
[ Upstream commit 2c47c1be ] Before this patch, gfs2_create_inode had a use-after-free for the iopen glock in some error paths because it did this: gfs2_glock_put(io_gl); fail_gunlock2: if (io_gl) clear_bit(GLF_INODE_CREATING, &io_gl->gl_flags); In some cases, the io_gl was used for create and only had one reference, so the glock might be freed before the clear_bit(). This patch tries to straighten it out by only jumping to the error paths where iopen is properly set, and moving the gfs2_glock_put after the clear_bit. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Bradley Bolen authored
[ Upstream commit f3d7c229 ] With large eMMC cards, it is possible to create general purpose partitions that are bigger than 4GB. The size member of the mmc_part struct is only an unsigned int which overflows for gp partitions larger than 4GB. Change this to a u64 to handle the overflow. Signed-off-by: Bradley Bolen <bradleybolen@gmail.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Sascha Hauer authored
[ Upstream commit f9c34bb5 ] When a new fastmap is about to be written UBI must make sure it has a free block for a fastmap anchor available. For this ubi_update_fastmap() calls ubi_ensure_anchor_pebs(). This stopped working with 2e8f08de ("ubi: Fix races around ubi_refill_pools()"), with this commit the wear leveling code is blocked and can no longer produce free PEBs. UBI then more often than not falls back to write the new fastmap anchor to the same block it was already on which means the same erase block gets erased during each fastmap write and wears out quite fast. As the locking prevents us from producing the anchor PEB when we actually need it, this patch changes the strategy for creating the anchor PEB. We no longer create it on demand right before we want to write a fastmap, but instead we create an anchor PEB right after we have written a fastmap. This gives us enough time to produce a new anchor PEB before it is needed. To make sure we have an anchor PEB for the very first fastmap write we call ubi_ensure_anchor_pebs() during initialisation as well. Fixes: 2e8f08de ("ubi: Fix races around ubi_refill_pools()") Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Christophe JAILLET authored
[ Upstream commit 9067f2f0 ] We should jump to fail3 in order to undo the 'xa_insert_irq()' call. Link: https://lore.kernel.org/r/20190923190746.10964-1-christophe.jaillet@wanadoo.frSigned-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Brian Foster authored
[ Upstream commit 2a2b5932 ] The leaf format xattr addition helper xfs_attr3_leaf_add_work() adjusts the block freemap in a couple places. The first update drops the size of the freemap that the caller had already selected to place the xattr name/value data. Before the function returns, it also checks whether the entries array has encroached on a freemap range by virtue of the new entry addition. This is necessary because the entries array grows from the start of the block (but end of the block header) towards the end of the block while the name/value data grows from the end of the block in the opposite direction. If the associated freemap is already empty, however, size is zero and the subtraction underflows the field and causes corruption. This is reproduced rarely by generic/070. The observed behavior is that a smaller sized freemap is aligned to the end of the entries list, several subsequent xattr additions land in larger freemaps and the entries list expands into the smaller freemap until it is fully consumed and then underflows. Note that it is not otherwise a corruption for the entries array to consume an empty freemap because the nameval list (i.e. the firstused pointer in the xattr header) starts beyond the end of the corrupted freemap. Update the freemap size modification to account for the fact that the freemap entry can be empty and thus stale. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Al Viro authored
[ Upstream commit e8400933 ] We are overoptimistic about taking the fast path there; seeing the same value in ->d_parent after having grabbed a reference to that parent does *not* mean that it has remained our parent all along. That wouldn't be a big deal (in the end it is our parent and we have grabbed the reference we are about to return), but... the situation with barriers is messed up. We might have hit the following sequence: d is a dentry of /tmp/a/b CPU1: CPU2: parent = d->d_parent (i.e. dentry of /tmp/a) rename /tmp/a/b to /tmp/b rmdir /tmp/a, making its dentry negative grab reference to parent, end up with cached parent->d_inode (NULL) mkdir /tmp/a, rename /tmp/b to /tmp/a/b recheck d->d_parent, which is back to original decide that everything's fine and return the reference we'd got. The trouble is, caller (on CPU1) will observe dget_parent() returning an apparently negative dentry. It actually is positive, but CPU1 has stale ->d_inode cached. Use d->d_seq to see if it has been moved instead of rechecking ->d_parent. NOTE: we are *NOT* going to retry on any kind of ->d_seq mismatch; we just go into the slow path in such case. We don't wait for ->d_seq to become even either - again, if we are racing with renames, we can bloody well go to slow path anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Pan Bian authored
[ Upstream commit da046d5f ] Release variable dst after logging dst->error to avoid possible use after free. Link: https://lore.kernel.org/r/1573022651-37171-1-git-send-email-bianpan2016@163.comSigned-off-by: Pan Bian <bianpan2016@163.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Pan Bian authored
[ Upstream commit 960657b7 ] Move the release operation after error log to avoid possible use after free. Link: https://lore.kernel.org/r/1573021434-18768-1-git-send-email-bianpan2016@163.comSigned-off-by: Pan Bian <bianpan2016@163.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Satendra Singh Thakur authored
[ Upstream commit 1ff95243 ] When devm_request_irq fails, currently, the function dma_async_device_unregister gets called. This doesn't free the resources allocated by of_dma_controller_register. Therefore, we have called of_dma_controller_free for this purpose. Signed-off-by: Satendra Singh Thakur <sst2005@gmail.com> Link: https://lore.kernel.org/r/20191109113523.6067-1-sst2005@gmail.comSigned-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-