- 18 Mar, 2018 12 commits
-
-
Alex Deucher authored
commit 545b0bcd upstream. Always set the graphics values to the max for the asic type. E.g., some 1 RB chips are actually 1 RB chips, others are actually harvested 2 RB chips. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=99353Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Deucher authored
commit 0b58d90f upstream. Always set the graphics values to the max for the asic type. E.g., some 1 RB chips are actually 1 RB chips, others are actually harvested 2 RB chips. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=99353Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Rex Zhu authored
commit 1bced75f upstream. it is required if a platform supports PCIe root complex core voltage reduction. After receiving this notification, SBIOS can apply default PCIe root complex power policy. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lukas Wunner authored
commit aa0aad57 upstream. amdgpu's ->runtime_suspend hook calls drm_kms_helper_poll_disable(), which waits for the output poll worker to finish if it's running. The output poll worker meanwhile calls pm_runtime_get_sync() in amdgpu's ->detect hooks, which waits for the ongoing suspend to finish, causing a deadlock. Fix by not acquiring a runtime PM ref if the ->detect hooks are called in the output poll worker's context. This is safe because the poll worker is only enabled while runtime active and we know that ->runtime_suspend waits for it to finish. Fixes: d38ceaf9 ("drm/amdgpu: add core driver (v4)") Cc: stable@vger.kernel.org # v4.2+: 27d4ee03: workqueue: Allow retrieval of current task's work struct Cc: stable@vger.kernel.org # v4.2+: 25c058cc: drm: Allow determining if current task is output poll worker Cc: Alex Deucher <alexander.deucher@amd.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Link: https://patchwork.freedesktop.org/patch/msgid/4c9bf72aacae1eef062bd134cd112e0770a7f121.1518338789.git.lukas@wunner.deSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lukas Wunner authored
commit 15734fef upstream. radeon's ->runtime_suspend hook calls drm_kms_helper_poll_disable(), which waits for the output poll worker to finish if it's running. The output poll worker meanwhile calls pm_runtime_get_sync() in radeon's ->detect hooks, which waits for the ongoing suspend to finish, causing a deadlock. Fix by not acquiring a runtime PM ref if the ->detect hooks are called in the output poll worker's context. This is safe because the poll worker is only enabled while runtime active and we know that ->runtime_suspend waits for it to finish. Stack trace for posterity: INFO: task kworker/0:3:31847 blocked for more than 120 seconds Workqueue: events output_poll_execute [drm_kms_helper] Call Trace: schedule+0x3c/0x90 rpm_resume+0x1e2/0x690 __pm_runtime_resume+0x3f/0x60 radeon_lvds_detect+0x39/0xf0 [radeon] output_poll_execute+0xda/0x1e0 [drm_kms_helper] process_one_work+0x14b/0x440 worker_thread+0x48/0x4a0 INFO: task kworker/2:0:10493 blocked for more than 120 seconds. Workqueue: pm pm_runtime_work Call Trace: schedule+0x3c/0x90 schedule_timeout+0x1b3/0x240 wait_for_common+0xc2/0x180 wait_for_completion+0x1d/0x20 flush_work+0xfc/0x1a0 __cancel_work_timer+0xa5/0x1d0 cancel_delayed_work_sync+0x13/0x20 drm_kms_helper_poll_disable+0x1f/0x30 [drm_kms_helper] radeon_pmops_runtime_suspend+0x3d/0xa0 [radeon] pci_pm_runtime_suspend+0x61/0x1a0 vga_switcheroo_runtime_suspend+0x21/0x70 __rpm_callback+0x32/0x70 rpm_callback+0x24/0x80 rpm_suspend+0x12b/0x640 pm_runtime_work+0x6f/0xb0 process_one_work+0x14b/0x440 worker_thread+0x48/0x4a0 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94147 Fixes: 10ebc0bc ("drm/radeon: add runtime PM support (v2)") Cc: stable@vger.kernel.org # v3.13+: 27d4ee03: workqueue: Allow retrieval of current task's work struct Cc: stable@vger.kernel.org # v3.13+: 25c058cc: drm: Allow determining if current task is output poll worker Cc: Ismo Toijala <ismo.toijala@gmail.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Dave Airlie <airlied@redhat.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Link: https://patchwork.freedesktop.org/patch/msgid/64ea02c44f91dda19bc563902b97bbc699040392.1518338789.git.lukas@wunner.deSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lukas Wunner authored
commit d61a5c10 upstream. nouveau's ->runtime_suspend hook calls drm_kms_helper_poll_disable(), which waits for the output poll worker to finish if it's running. The output poll worker meanwhile calls pm_runtime_get_sync() in nouveau_connector_detect() which waits for the ongoing suspend to finish, causing a deadlock. Fix by not acquiring a runtime PM ref if nouveau_connector_detect() is called in the output poll worker's context. This is safe because the poll worker is only enabled while runtime active and we know that ->runtime_suspend waits for it to finish. Other contexts calling nouveau_connector_detect() do require a runtime PM ref, these comprise: status_store() drm sysfs interface ->fill_modes drm callback drm_fb_helper_probe_connector_modes() drm_mode_getconnector() nouveau_connector_hotplug() nouveau_display_hpd_work() nv17_tv_set_property() Stack trace for posterity: INFO: task kworker/0:1:58 blocked for more than 120 seconds. Workqueue: events output_poll_execute [drm_kms_helper] Call Trace: schedule+0x28/0x80 rpm_resume+0x107/0x6e0 __pm_runtime_resume+0x47/0x70 nouveau_connector_detect+0x7e/0x4a0 [nouveau] nouveau_connector_detect_lvds+0x132/0x180 [nouveau] drm_helper_probe_detect_ctx+0x85/0xd0 [drm_kms_helper] output_poll_execute+0x11e/0x1c0 [drm_kms_helper] process_one_work+0x184/0x380 worker_thread+0x2e/0x390 INFO: task kworker/0:2:252 blocked for more than 120 seconds. Workqueue: pm pm_runtime_work Call Trace: schedule+0x28/0x80 schedule_timeout+0x1e3/0x370 wait_for_completion+0x123/0x190 flush_work+0x142/0x1c0 nouveau_pmops_runtime_suspend+0x7e/0xd0 [nouveau] pci_pm_runtime_suspend+0x5c/0x180 vga_switcheroo_runtime_suspend+0x1e/0xa0 __rpm_callback+0xc1/0x200 rpm_callback+0x1f/0x70 rpm_suspend+0x13c/0x640 pm_runtime_work+0x6e/0x90 process_one_work+0x184/0x380 worker_thread+0x2e/0x390 Bugzilla: https://bugs.archlinux.org/task/53497 Bugzilla: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=870523 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70388#c33 Fixes: 5addcf0a ("nouveau: add runtime PM support (v0.9)") Cc: stable@vger.kernel.org # v3.12+: 27d4ee03: workqueue: Allow retrieval of current task's work struct Cc: stable@vger.kernel.org # v3.12+: 25c058cc: drm: Allow determining if current task is output poll worker Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Dave Airlie <airlied@redhat.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Link: https://patchwork.freedesktop.org/patch/msgid/b7d2cbb609a80f59ccabfdf479b9d5907c603ea1.1518338789.git.lukas@wunner.deSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lukas Wunner authored
commit 25c058cc upstream. Introduce a helper to determine if the current task is an output poll worker. This allows us to fix a long-standing deadlock in several DRM drivers wherein the ->runtime_suspend callback waits for the output poll worker to finish and the worker in turn calls a ->detect callback which waits for runtime suspend to finish. The ->detect callback is invoked from multiple call sites and waiting for runtime suspend to finish is the correct thing to do except if it's executing in the context of the worker. v2: Expand kerneldoc to specifically mention deadlock between output poll worker and autosuspend worker as use case. (Lyude) Cc: Dave Airlie <airlied@redhat.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Link: https://patchwork.freedesktop.org/patch/msgid/3549ce32e7f1467102e70d3e9cbf70c46bfe108e.1518593424.git.lukas@wunner.deSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lukas Wunner authored
commit 27d4ee03 upstream. Introduce a helper to retrieve the current task's work struct if it is a workqueue worker. This allows us to fix a long-standing deadlock in several DRM drivers wherein the ->runtime_suspend callback waits for a specific worker to finish and that worker in turn calls a function which waits for runtime suspend to finish. That function is invoked from multiple call sites and waiting for runtime suspend to finish is the correct thing to do except if it's executing in the context of the worker. Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Link: https://patchwork.freedesktop.org/patch/msgid/2d8f603074131eb87e588d2b803a71765bd3a2fd.1518338788.git.lukas@wunner.deSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
himanshu.madhani@cavium.com authored
commit 1514839b upstream. This patch fixes NULL pointer crash due to active timer running for abort IOCB. From crash dump analysis it was discoverd that get_next_timer_interrupt() encountered a corrupted entry on the timer list. #9 [ffff95e1f6f0fd40] page_fault at ffffffff914fe8f8 [exception RIP: get_next_timer_interrupt+440] RIP: ffffffff90ea3088 RSP: ffff95e1f6f0fdf0 RFLAGS: 00010013 RAX: ffff95e1f6451028 RBX: 000218e2389e5f40 RCX: 00000001232ad600 RDX: 0000000000000001 RSI: ffff95e1f6f0fdf0 RDI: 0000000001232ad6 RBP: ffff95e1f6f0fe40 R8: ffff95e1f6451188 R9: 0000000000000001 R10: 0000000000000016 R11: 0000000000000016 R12: 00000001232ad5f6 R13: ffff95e1f6450000 R14: ffff95e1f6f0fdf8 R15: ffff95e1f6f0fe10 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 Looking at the assembly of get_next_timer_interrupt(), address came from %r8 (ffff95e1f6451188) which is pointing to list_head with single entry at ffff95e5ff621178. 0xffffffff90ea307a <get_next_timer_interrupt+426>: mov (%r8),%rdx 0xffffffff90ea307d <get_next_timer_interrupt+429>: cmp %r8,%rdx 0xffffffff90ea3080 <get_next_timer_interrupt+432>: je 0xffffffff90ea30a7 <get_next_timer_interrupt+471> 0xffffffff90ea3082 <get_next_timer_interrupt+434>: nopw 0x0(%rax,%rax,1) 0xffffffff90ea3088 <get_next_timer_interrupt+440>: testb $0x1,0x18(%rdx) crash> rd ffff95e1f6451188 10 ffff95e1f6451188: ffff95e5ff621178 ffff95e5ff621178 x.b.....x.b..... ffff95e1f6451198: ffff95e1f6451198 ffff95e1f6451198 ..E.......E..... ffff95e1f64511a8: ffff95e1f64511a8 ffff95e1f64511a8 ..E.......E..... ffff95e1f64511b8: ffff95e77cf509a0 ffff95e77cf509a0 ...|.......|.... ffff95e1f64511c8: ffff95e1f64511c8 ffff95e1f64511c8 ..E.......E..... crash> rd ffff95e5ff621178 10 ffff95e5ff621178: 0000000000000001 ffff95e15936aa00 ..........6Y.... ffff95e5ff621188: 0000000000000000 00000000ffffffff ................ ffff95e5ff621198: 00000000000000a0 0000000000000010 ................ ffff95e5ff6211a8: ffff95e5ff621198 000000000000000c ..b............. ffff95e5ff6211b8: 00000f5800000000 ffff95e751f8d720 ....X... ..Q.... ffff95e5ff621178 belongs to freed mempool object at ffff95e5ff621080. CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE ffff95dc7fd74d00 mnt_cache 384 19785 24948 594 16k SLAB MEMORY NODE TOTAL ALLOCATED FREE ffffdc5dabfd8800 ffff95e5ff620000 1 42 29 13 FREE / [ALLOCATED] ffff95e5ff621080 (cpu 6 cache) Examining the contents of that memory reveals a pointer to a constant string in the driver, "abort\0", which is set by qla24xx_async_abort_cmd(). crash> rd ffffffffc059277c 20 ffffffffc059277c: 6e490074726f6261 0074707572726574 abort.Interrupt. ffffffffc059278c: 00676e696c6c6f50 6920726576697244 Polling.Driver i ffffffffc059279c: 646f6d207325206e 6974736554000a65 n %s mode..Testi ffffffffc05927ac: 636976656420676e 786c252074612065 ng device at %lx ffffffffc05927bc: 6b63656843000a2e 646f727020676e69 ...Checking prod ffffffffc05927cc: 6f20444920746375 0a2e706968632066 uct ID of chip.. ffffffffc05927dc: 5120646e756f4600 204130303232414c .Found QLA2200A ffffffffc05927ec: 43000a2e70696843 20676e696b636568 Chip...Checking ffffffffc05927fc: 65786f626c69616d 6c636e69000a2e73 mailboxes...incl ffffffffc059280c: 756e696c2f656475 616d2d616d642f78 ude/linux/dma-ma crash> struct -ox srb_iocb struct srb_iocb { union { struct {...} logio; struct {...} els_logo; struct {...} tmf; struct {...} fxiocb; struct {...} abt; struct ct_arg ctarg; struct {...} mbx; struct {...} nack; [0x0 ] } u; [0xb8] struct timer_list timer; [0x108] void (*timeout)(void *); } SIZE: 0x110 crash> ! bc ibase=16 obase=10 B8+40 F8 The object is a srb_t, and at offset 0xf8 within that structure (i.e. ffff95e5ff621080 + f8 -> ffff95e5ff621178) is a struct timer_list. Cc: <stable@vger.kernel.org> #4.4+ Fixes: 4440e46d ("[SCSI] qla2xxx: Add IOCB Abort command asynchronous handling.") Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Leon Romanovsky authored
commit 28e9091e upstream. The user can provide very large cqe_size which will cause to integer overflow as it can be seen in the following UBSAN warning: ======================================================================= UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx5/cq.c:1192:53 signed integer overflow: 64870 * 65536 cannot be represented in type 'int' CPU: 0 PID: 267 Comm: syzkaller605279 Not tainted 4.15.0+ #90 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 Call Trace: dump_stack+0xde/0x164 ? dma_virt_map_sg+0x22c/0x22c ubsan_epilogue+0xe/0x81 handle_overflow+0x1f3/0x251 ? __ubsan_handle_negate_overflow+0x19b/0x19b ? lock_acquire+0x440/0x440 mlx5_ib_resize_cq+0x17e7/0x1e40 ? cyc2ns_read_end+0x10/0x10 ? native_read_msr_safe+0x6c/0x9b ? cyc2ns_read_end+0x10/0x10 ? mlx5_ib_modify_cq+0x220/0x220 ? sched_clock_cpu+0x18/0x200 ? lookup_get_idr_uobject+0x200/0x200 ? rdma_lookup_get_uobject+0x145/0x2f0 ib_uverbs_resize_cq+0x207/0x3e0 ? ib_uverbs_ex_create_cq+0x250/0x250 ib_uverbs_write+0x7f9/0xef0 ? cyc2ns_read_end+0x10/0x10 ? print_irqtrace_events+0x280/0x280 ? ib_uverbs_ex_create_cq+0x250/0x250 ? uverbs_devnode+0x110/0x110 ? sched_clock_cpu+0x18/0x200 ? do_raw_spin_trylock+0x100/0x100 ? __lru_cache_add+0x16e/0x290 __vfs_write+0x10d/0x700 ? uverbs_devnode+0x110/0x110 ? kernel_read+0x170/0x170 ? sched_clock_cpu+0x18/0x200 ? security_file_permission+0x93/0x260 vfs_write+0x1b0/0x550 SyS_write+0xc7/0x1a0 ? SyS_read+0x1a0/0x1a0 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x1e/0x8b RIP: 0033:0x433549 RSP: 002b:00007ffe63bd1ea8 EFLAGS: 00000217 ======================================================================= Cc: syzkaller <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> # 3.13 Fixes: bde51583 ("IB/mlx5: Add support for resize CQ") Reported-by: Noa Osherovich <noaos@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Leon Romanovsky authored
commit a5880b84 upstream. The QP state is limited and declared in enum ib_qp_state, but ucma user was able to supply any possible (u32) value. Reported-by: syzbot+0df1ab766f8924b1edba@syzkaller.appspotmail.com Fixes: 75216638 ("RDMA/cma: Export rdma cm interface to userspace") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Leon Romanovsky authored
commit 6a21dfc0 upstream. Users of ucma are supposed to provide size of option level, in most paths it is supposed to be equal to u8 or u16, but it is not the case for the IB path record, where it can be multiple of struct ib_path_rec_data. This patch takes simplest possible approach and prevents providing values more than possible to allocate. Reported-by: syzbot+a38b0e9f694c379ca7ce@syzkaller.appspotmail.com Fixes: 7ce86409 ("RDMA/ucma: Allow user space to set service type") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 11 Mar, 2018 28 commits
-
-
Greg Kroah-Hartman authored
-
Ernesto A. Fernández authored
commit d7d82496 upstream. When changing a file's acl mask, btrfs_set_acl() will first set the group bits of i_mode to the value of the mask, and only then set the actual extended attribute representing the new acl. If the second part fails (due to lack of space, for example) and the file had no acl attribute to begin with, the system will from now on assume that the mask permission bits are actual group permission bits, potentially granting access to the wrong users. Prevent this by restoring the original mode bits if __btrfs_set_acl fails. Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Daniel Borkmann authored
[ upstream commit a493a87f ] Implement a retpoline [0] for the BPF tail call JIT'ing that converts the indirect jump via jmp %rax that is used to make the long jump into another JITed BPF image. Since this is subject to speculative execution, we need to control the transient instruction sequence here as well when CONFIG_RETPOLINE is set, and direct it into a pause + lfence loop. The latter aligns also with what gcc / clang emits (e.g. [1]). JIT dump after patch: # bpftool p d x i 1 0: (18) r2 = map[id:1] 2: (b7) r3 = 0 3: (85) call bpf_tail_call#12 4: (b7) r0 = 2 5: (95) exit With CONFIG_RETPOLINE: # bpftool p d j i 1 [...] 33: cmp %edx,0x24(%rsi) 36: jbe 0x0000000000000072 |* 38: mov 0x24(%rbp),%eax 3e: cmp $0x20,%eax 41: ja 0x0000000000000072 | 43: add $0x1,%eax 46: mov %eax,0x24(%rbp) 4c: mov 0x90(%rsi,%rdx,8),%rax 54: test %rax,%rax 57: je 0x0000000000000072 | 59: mov 0x28(%rax),%rax 5d: add $0x25,%rax 61: callq 0x000000000000006d |+ 66: pause | 68: lfence | 6b: jmp 0x0000000000000066 | 6d: mov %rax,(%rsp) | 71: retq | 72: mov $0x2,%eax [...] * relative fall-through jumps in error case + retpoline for indirect jump Without CONFIG_RETPOLINE: # bpftool p d j i 1 [...] 33: cmp %edx,0x24(%rsi) 36: jbe 0x0000000000000063 |* 38: mov 0x24(%rbp),%eax 3e: cmp $0x20,%eax 41: ja 0x0000000000000063 | 43: add $0x1,%eax 46: mov %eax,0x24(%rbp) 4c: mov 0x90(%rsi,%rdx,8),%rax 54: test %rax,%rax 57: je 0x0000000000000063 | 59: mov 0x28(%rax),%rax 5d: add $0x25,%rax 61: jmpq *%rax |- 63: mov $0x2,%eax [...] * relative fall-through jumps in error case - plain indirect jump as before [0] https://support.google.com/faqs/answer/7625886 [1] https://github.com/gcc-mirror/gcc/commit/a31e654fa107be968b802786d747e962c2fcdb2bSigned-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mike Snitzer authored
commit feb7695f upstream. If only a subset of the devices associated with multiple regions support a given special operation (eg. DISCARD) then the dec_count() that is used to set error for the region must increment the io->count. Otherwise, when the dec_count() is called it can cause the dm-io caller's bio to be completed multiple times. As was reported against the dm-mirror target that had mirror legs with a mix of discard capabilities. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=196077Reported-by: Zhang Yi <yizhan@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Williams authored
commit 3968523f upstream. mpls_label_ok() validates that the 'platform_label' array index from a userspace netlink message payload is valid. Under speculation the mpls_label_ok() result may not resolve in the CPU pipeline until after the index is used to access an array element. Sanitize the index to zero to prevent userspace-controlled arbitrary out-of-bounds speculation, a precursor for a speculative execution side channel vulnerability. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 4.4: - mpls_label_ok() doesn't take an extack parameter - Drop change in mpls_getroute()] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Ahern authored
commit b7b386f4 upstream. mpls_route_add and mpls_route_del have the same checks on the label. Move to a helper. Avoid duplicate extack messages in the next patch. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexey Kodanev authored
[ Upstream commit 07f2c7ab ] When SCTP makes INIT or INIT_ACK packet the total chunk length can exceed SCTP_MAX_CHUNK_LEN which leads to kernel panic when transmitting these packets, e.g. the crash on sending INIT_ACK: [ 597.804948] skbuff: skb_over_panic: text:00000000ffae06e4 len:120168 put:120156 head:000000007aa47635 data:00000000d991c2de tail:0x1d640 end:0xfec0 dev:<NULL> ... [ 597.976970] ------------[ cut here ]------------ [ 598.033408] kernel BUG at net/core/skbuff.c:104! [ 600.314841] Call Trace: [ 600.345829] <IRQ> [ 600.371639] ? sctp_packet_transmit+0x2095/0x26d0 [sctp] [ 600.436934] skb_put+0x16c/0x200 [ 600.477295] sctp_packet_transmit+0x2095/0x26d0 [sctp] [ 600.540630] ? sctp_packet_config+0x890/0x890 [sctp] [ 600.601781] ? __sctp_packet_append_chunk+0x3b4/0xd00 [sctp] [ 600.671356] ? sctp_cmp_addr_exact+0x3f/0x90 [sctp] [ 600.731482] sctp_outq_flush+0x663/0x30d0 [sctp] [ 600.788565] ? sctp_make_init+0xbf0/0xbf0 [sctp] [ 600.845555] ? sctp_check_transmitted+0x18f0/0x18f0 [sctp] [ 600.912945] ? sctp_outq_tail+0x631/0x9d0 [sctp] [ 600.969936] sctp_cmd_interpreter.isra.22+0x3be1/0x5cb0 [sctp] [ 601.041593] ? sctp_sf_do_5_1B_init+0x85f/0xc30 [sctp] [ 601.104837] ? sctp_generate_t1_cookie_event+0x20/0x20 [sctp] [ 601.175436] ? sctp_eat_data+0x1710/0x1710 [sctp] [ 601.233575] sctp_do_sm+0x182/0x560 [sctp] [ 601.284328] ? sctp_has_association+0x70/0x70 [sctp] [ 601.345586] ? sctp_rcv+0xef4/0x32f0 [sctp] [ 601.397478] ? sctp6_rcv+0xa/0x20 [sctp] ... Here the chunk size for INIT_ACK packet becomes too big, mostly because of the state cookie (INIT packet has large size with many address parameters), plus additional server parameters. Later this chunk causes the panic in skb_put_data(): skb_packet_transmit() sctp_packet_pack() skb_put_data(nskb, chunk->skb->data, chunk->skb->len); 'nskb' (head skb) was previously allocated with packet->size from u16 'chunk->chunk_hdr->length'. As suggested by Marcelo we should check the chunk's length in _sctp_make_chunk() before trying to allocate skb for it and discard a chunk if its size bigger than SCTP_MAX_CHUNK_LEN. Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leinter@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Julian Wiedmann authored
[ Upstream commit d22ffb5a ] If multiple IPA commands are build & sent out concurrently, fill_ipacmd_header() may assign a seqno value to a command that's different from what send_control_data() later assigns to this command's reply. This is due to other commands passing through send_control_data(), and incrementing card->seqno.ipa along the way. So one IPA command has no reply that's waiting for its seqno, while some other IPA command has multiple reply objects waiting for it. Only one of those waiting replies wins, and the other(s) times out and triggers a recovery via send_ipa_cmd(). Fix this by making sure that the same seqno value is assigned to a command and its reply object. Do so immediately before submitting the command & while holding the irq_pending "lock", to produce nicely ascending seqnos. As a side effect, *all* IPA commands now use a reply object that's waiting for its actual seqno. Previously, early IPA commands that were submitted while the card was still DOWN used the "catch-all" IDX seqno. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Julian Wiedmann authored
[ Upstream commit 1c5b2216 ] send_control_data() applies some special handling to SETIP v4 IPA commands. But current code parses *all* command types for the SETIP command code. Limit the command code check to IPA commands. Fixes: 5b54e16f ("qeth: do not spin for SETIP ip assist command") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexey Kodanev authored
[ Upstream commit 957d761c ] When going through the bind address list in sctp_v6_get_dst() and the previously found address is better ('matchlen > bmatchlen'), the code continues to the next iteration without releasing currently held destination. Fix it by releasing 'bdst' before continue to the next iteration, and instead of introducing one more '!IS_ERR(bdst)' check for dst_release(), move the already existed one right after ip6_dst_lookup_flow(), i.e. we shouldn't proceed further if we get an error for the route lookup. Fixes: dbc2b5e9 ("sctp: fix src address selection if using secondary addresses for ipv6") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tommi Rantala authored
[ Upstream commit 4a31a6b1 ] Fix dst reference count leak in sctp_v4_get_dst() introduced in commit 410f0383 ("sctp: add routing output fallback"): When walking the address_list, successive ip_route_output_key() calls may return the same rt->dst with the reference incremented on each call. The code would not decrement the dst refcount when the dst pointer was identical from the previous iteration, causing the dst refcnt leak. Testcase: ip netns add TEST ip netns exec TEST ip link set lo up ip link add dummy0 type dummy ip link add dummy1 type dummy ip link add dummy2 type dummy ip link set dev dummy0 netns TEST ip link set dev dummy1 netns TEST ip link set dev dummy2 netns TEST ip netns exec TEST ip addr add 192.168.1.1/24 dev dummy0 ip netns exec TEST ip link set dummy0 up ip netns exec TEST ip addr add 192.168.1.2/24 dev dummy1 ip netns exec TEST ip link set dummy1 up ip netns exec TEST ip addr add 192.168.1.3/24 dev dummy2 ip netns exec TEST ip link set dummy2 up ip netns exec TEST sctp_test -H 192.168.1.2 -P 20002 -h 192.168.1.1 -p 20000 -s -B 192.168.1.3 ip netns del TEST In 4.4 and 4.9 kernels this results to: [ 354.179591] unregister_netdevice: waiting for lo to become free. Usage count = 1 [ 364.419674] unregister_netdevice: waiting for lo to become free. Usage count = 1 [ 374.663664] unregister_netdevice: waiting for lo to become free. Usage count = 1 [ 384.903717] unregister_netdevice: waiting for lo to become free. Usage count = 1 [ 395.143724] unregister_netdevice: waiting for lo to become free. Usage count = 1 [ 405.383645] unregister_netdevice: waiting for lo to become free. Usage count = 1 ... Fixes: 410f0383 ("sctp: add routing output fallback") Fixes: 0ca50d12 ("sctp: fix src address selection if using secondary addresses") Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexey Kodanev authored
[ Upstream commit 15f35d49 ] Since UDP-Lite is always using checksum, the following path is triggered when calculating pseudo header for it: udp4_csum_init() or udp6_csum_init() skb_checksum_init_zero_check() __skb_checksum_validate_complete() The problem can appear if skb->len is less than CHECKSUM_BREAK. In this particular case __skb_checksum_validate_complete() also invokes __skb_checksum_complete(skb). If UDP-Lite is using partial checksum that covers only part of a packet, the function will return bad checksum and the packet will be dropped. It can be fixed if we skip skb_checksum_init_zero_check() and only set the required pseudo header checksum for UDP-Lite with partial checksum before udp4_csum_init()/udp6_csum_init() functions return. Fixes: ed70fcfc ("net: Call skb_checksum_init in IPv4") Fixes: e4f45b7f ("net: Call skb_checksum_init in IPv6") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Guillaume Nault authored
[ Upstream commit 77f840e3 ] PPP units don't hold any reference on the channels connected to it. It is the channel's responsibility to ensure that it disconnects from its unit before being destroyed. In practice, this is ensured by ppp_unregister_channel() disconnecting the channel from the unit before dropping a reference on the channel. However, it is possible for an unregistered channel to connect to a PPP unit: register a channel with ppp_register_net_channel(), attach a /dev/ppp file to it with ioctl(PPPIOCATTCHAN), unregister the channel with ppp_unregister_channel() and finally connect the /dev/ppp file to a PPP unit with ioctl(PPPIOCCONNECT). Once in this situation, the channel is only held by the /dev/ppp file, which can be released at anytime and free the channel without letting the parent PPP unit know. Then the ppp structure ends up with dangling pointers in its ->channels list. Prevent this scenario by forbidding unregistered channels from connecting to PPP units. This maintains the code logic by keeping ppp_unregister_channel() responsible from disconnecting the channel if necessary and avoids modification on the reference counting mechanism. This issue seems to predate git history (successfully reproduced on Linux 2.6.26 and earlier PPP commits are unrelated). Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nicolas Dichtel authored
[ Upstream commit cb9f7a9a ] Nowadays, nlmsg_multicast() returns only 0 or -ESRCH but this was not the case when commit 134e6375 was pushed. However, there was no reason to stop the loop if a netns does not have listeners. Returns -ESRCH only if there was no listeners in all netns. To avoid having the same problem in the future, I didn't take the assumption that nlmsg_multicast() returns only 0 or -ESRCH. Fixes: 134e6375 ("genetlink: make netns aware") CC: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sabrina Dubroca authored
[ Upstream commit c7272c2f ] According to RFC 1191 sections 3 and 4, ICMP frag-needed messages indicating an MTU below 68 should be rejected: A host MUST never reduce its estimate of the Path MTU below 68 octets. and (talking about ICMP frag-needed's Next-Hop MTU field): This field will never contain a value less than 68, since every router "must be able to forward a datagram of 68 octets without fragmentation". Furthermore, by letting net.ipv4.route.min_pmtu be set to negative values, we can end up with a very large PMTU when (-1) is cast into u32. Let's also make ip_rt_min_pmtu a u32, since it's only ever compared to unsigned ints. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jakub Kicinski authored
[ Upstream commit ac5b7019 ] netif_set_real_num_tx_queues() can be called when netdev is up. That usually happens when user requests change of number of channels/rings with ethtool -L. The procedure for changing the number of queues involves resetting the qdiscs and setting dev->num_tx_queues to the new value. When the new value is lower than the old one, extra care has to be taken to ensure ordering of accesses to the number of queues vs qdisc reset. Currently the queues are reset before new dev->num_tx_queues is assigned, leaving a window of time where packets can be enqueued onto the queues going down, leading to a likely crash in the drivers, since most drivers don't check if TX skbs are assigned to an active queue. Fixes: e6484930 ("net: allocate tx queues in register_netdevice") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Arnd Bergmann authored
[ Upstream commit ca79bec2 ] gcc-8 has a new warning that detects overlapping input and output arguments in memcpy(). It triggers for sit_init_net() calling ipip6_tunnel_clone_6rd(), which is actually correct: net/ipv6/sit.c: In function 'sit_init_net': net/ipv6/sit.c:192:3: error: 'memcpy' source argument is the same as destination [-Werror=restrict] The problem here is that the logic detecting the memcpy() arguments finds them to be the same, but the conditional that tests for the input and output of ipip6_tunnel_clone_6rd() to be identical is not a compile-time constant. We know that netdev_priv(t->dev) is the same as t for a tunnel device, and comparing "dev" directly here lets the compiler figure out as well that 'dev == sitn->fb_tunnel_dev' when called from sit_init_net(), so it no longer warns. This code is old, so Cc stable to make sure that we don't get the warning for older kernels built with new gcc. Cc: Martin Sebor <msebor@gmail.com> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83456Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Denis Du authored
[ Upstream commit b6c3bad1 ] Sometimes when physical lines have a just good noise to make the protocol handshaking fail, but the carrier detect still good. Then after remove of the noise, nobody will trigger this protocol to be start again to cause the link to never come back. The fix is when the carrier is still on, not terminate the protocol handshaking. Signed-off-by: Denis Du <dudenis2000@yahoo.ca> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stefano Brivio authored
[ Upstream commit a8c6db1d ] In fib_nh_match(), if output interface or gateway are passed in the FIB configuration, we don't have to check next hops of multipath routes to conclude whether we have a match or not. However, we might still have routes with different realms matching the same output interface and gateway configuration, and this needs to cause the match to fail. Otherwise the first route inserted in the FIB will match, regardless of the realms: # ip route add 1.1.1.1 dev eth0 table 1234 realms 1/2 # ip route append 1.1.1.1 dev eth0 table 1234 realms 3/4 # ip route list table 1234 1.1.1.1 dev eth0 scope link realms 1/2 1.1.1.1 dev eth0 scope link realms 3/4 # ip route del 1.1.1.1 dev ens3 table 1234 realms 3/4 # ip route list table 1234 1.1.1.1 dev ens3 scope link realms 3/4 whereas route with realms 3/4 should have been deleted instead. Explicitly check for fc_flow passed in the FIB configuration (this comes from RTA_FLOW extracted by rtm_to_fib_config()) and fail matching if it differs from nh_tclassid. The handling of RTA_FLOW for multipath routes later in fib_nh_match() is still needed, as we can have multiple RTA_FLOW attributes that need to be matched against the tclassid of each next hop. v2: Check that fc_flow is set before discarding the match, so that the user can still select the first matching rule by not specifying any realm, as suggested by David Ahern. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Xin Long authored
[ Upstream commit 1b12580a ] Now br_sysfs_if file flush doesn't have attr show. To read it will cause kernel panic after users chmod u+r this file. Xiong found this issue when running the commands: ip link add br0 type bridge ip link add type veth ip link set veth0 master br0 chmod u+r /sys/devices/virtual/net/veth0/brport/flush timeout 3 cat /sys/devices/virtual/net/veth0/brport/flush kernel crashed with NULL a pointer dereference call trace. This patch is to fix it by return -EINVAL when brport_attr->show is null, just the same as the check for brport_attr->store in brport_store(). Fixes: 9cf63747 ("bridge: add sysfs hook to flush forwarding table") Reported-by: Xiong Zhou <xzhou@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ben Hutchings authored
This reverts commit 20ac8f72, which was commit 2b83ff96 upstream. The bug that it should fix was only introduced in Linux 4.7, and in 4.4 it causes a regression. Reported-by: Jacek Anaszewski <jacek.anaszewski@gmail.com> Cc: Matthieu CASTET <matthieu.castet@parrot.com> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Carpenter authored
commit 9de29eac upstream. If i == ARRAY_SIZE(mitigation_options) then we accidentally print garbage from one space beyond the end of the mitigation_options[] array. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: KarimAllah Ahmed <karahmed@amazon.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Fixes: 9005c683 ("x86/spectre: Simplify spectre_v2 command line parsing") Link: http://lkml.kernel.org/r/20180214071416.GA26677@mwandaSigned-off-by: Ingo Molnar <mingo@kernel.org> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nathan Sullivan authored
commit 3b9b9536 upstream. Per the documentation, use scnprintf instead of sprintf to ensure there is never more than PAGE_SIZE bytes of trigger names put into the buffer. Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com> Signed-off-by: Zach Brown <zach.brown@ni.com> Signed-off-by: Jacek Anaszewski <j.anaszewski@samsung.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Gleixner authored
The backport of upstream commit 45d55e7b ("x86/apic/vector: Fix off by one in error path") missed to fixup the legacy interrupt data which is not longer available upstream. Handle legacy irq data correctly by clearing the legacy storage to prevent use after free. Fixes: 7fd13353 ("x86/apic/vector: Fix off by one in error path") - 4.4.y Fixes: c557481a ("x86/apic/vector: Fix off by one in error path") - 4.9.y Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Adam Ford authored
commit 74402055 upstream. The pinmuxing was missing for I2C1 which was causing intermittent issues with the PMIC which is connected to I2C1. The bootloader did not quite configure the I2C1 either, so when running at 2.6MHz, it was generating errors at time. This correctly sets the I2C1 pinmuxing so it can operate at 2.6MHz Fixes: 687c2767 ("ARM: dts: Add minimal support for LogicPD Torpedo DM3730 devkit") Signed-off-by: Adam Ford <aford173@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jan Kara authored
commit b7f8a09f upstream. When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit set, DIR1 is expected to have SGID bit set (and owning group equal to the owning group of 'DIR0'). However when 'DIR0' also has some default ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on 'DIR1' to get cleared if user is not member of the owning group. Fix the problem by moving posix_acl_update_mode() out of __btrfs_set_acl() into btrfs_set_acl(). That way the function will not be called when inheriting ACLs which is what we want as it prevents SGID bit clearing and the mode has been properly set by posix_acl_create() anyway. Fixes: 07393101 CC: stable@vger.kernel.org CC: linux-btrfs@vger.kernel.org CC: David Sterba <dsterba@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jiri Slaby authored
In 4.4.118, we have commit c8961332 (x86/syscall: Sanitize syscall table de-references under speculation), which is a backport of upstream commit 2fbd7af5. But it fixed only the C part of the upstream patch -- the IA32 sysentry. So it ommitted completely the assembly part -- the 64bit sysentry. Fix that in this patch by explicit array_index_mask_nospec written in assembly. The same was used in lib/getuser.S. However, to have "sbb" working properly, we have to switch from "cmp" against (NR_syscalls-1) to (NR_syscalls), otherwise the last syscall number would be "and"ed by 0. It is because the original "ja" relies on "CF" or "ZF", but we rely only on "CF" in "sbb". That means: switch to "jae" conditional jump too. Final note: use rcx for mask as this is exactly what is overwritten by the 4th syscall argument (r10) right after. Reported-by: Jan Beulich <JBeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: Andy Lutomirski <luto@kernel.org> Cc: alan@linux.intel.com Cc: Jinpu Wang <jinpu.wang@profitbricks.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Wanpeng Li authored
commit b28676bb upstream. Reported by syzkaller: pte_list_remove: ffff9714eb1f8078 0->BUG ------------[ cut here ]------------ kernel BUG at arch/x86/kvm/mmu.c:1157! invalid opcode: 0000 [#1] SMP RIP: 0010:pte_list_remove+0x11b/0x120 [kvm] Call Trace: drop_spte+0x83/0xb0 [kvm] mmu_page_zap_pte+0xcc/0xe0 [kvm] kvm_mmu_prepare_zap_page+0x81/0x4a0 [kvm] kvm_mmu_invalidate_zap_all_pages+0x159/0x220 [kvm] kvm_arch_flush_shadow_all+0xe/0x10 [kvm] kvm_mmu_notifier_release+0x6c/0xa0 [kvm] ? kvm_mmu_notifier_release+0x5/0xa0 [kvm] __mmu_notifier_release+0x79/0x110 ? __mmu_notifier_release+0x5/0x110 exit_mmap+0x15a/0x170 ? do_exit+0x281/0xcb0 mmput+0x66/0x160 do_exit+0x2c9/0xcb0 ? __context_tracking_exit.part.5+0x4a/0x150 do_group_exit+0x50/0xd0 SyS_exit_group+0x14/0x20 do_syscall_64+0x73/0x1f0 entry_SYSCALL64_slow_path+0x25/0x25 The reason is that when creates new memslot, there is no guarantee for new memslot not overlap with private memslots. This can be triggered by the following program: #include <fcntl.h> #include <pthread.h> #include <setjmp.h> #include <signal.h> #include <stddef.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/ioctl.h> #include <sys/stat.h> #include <sys/syscall.h> #include <sys/types.h> #include <unistd.h> #include <linux/kvm.h> long r[16]; int main() { void *p = valloc(0x4000); r[2] = open("/dev/kvm", 0); r[3] = ioctl(r[2], KVM_CREATE_VM, 0x0ul); uint64_t addr = 0xf000; ioctl(r[3], KVM_SET_IDENTITY_MAP_ADDR, &addr); r[6] = ioctl(r[3], KVM_CREATE_VCPU, 0x0ul); ioctl(r[3], KVM_SET_TSS_ADDR, 0x0ul); ioctl(r[6], KVM_RUN, 0); ioctl(r[6], KVM_RUN, 0); struct kvm_userspace_memory_region mr = { .slot = 0, .flags = KVM_MEM_LOG_DIRTY_PAGES, .guest_phys_addr = 0xf000, .memory_size = 0x4000, .userspace_addr = (uintptr_t) p }; ioctl(r[3], KVM_SET_USER_MEMORY_REGION, &mr); return 0; } This patch fixes the bug by not adding a new memslot even if it overlaps with private memslots. Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
-