- 11 Feb, 2020 40 commits
-
-
Bart Van Assche authored
commit 3f5f7335 upstream. Since qla82xx_get_fw_size() returns a number in CPU-endian format, change its return type from __le32 into u32. This patch does not change any functionality. Fixes: 9c2b2975 ("[SCSI] qla2xxx: Support for loading Unified ROM Image (URI) format firmware file.") Cc: Himanshu Madhani <hmadhani@marvell.com> Cc: Quinn Tran <qutran@marvell.com> Cc: Martin Wilck <mwilck@suse.com> Cc: Daniel Wagner <dwagner@suse.de> Cc: Roman Bolshakov <r.bolshakov@yadro.com> Link: https://lore.kernel.org/r/20191219004905.39586-1-bvanassche@acm.orgReviewed-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Erdem Aktas authored
commit 264b0d2b upstream. CONFIG_VIRTUALIZATION may not be enabled for memory encrypted guests. If disabled, decrypted per-CPU variables may end up sharing the same page with variables that should be left encrypted. Always separate per-CPU variables that should be decrypted into their own page anytime memory encryption can be enabled in the guest rather than rely on any other config option that may not be enabled. Fixes: ac26963a ("percpu: Introduce DEFINE_PER_CPU_DECRYPTED") Cc: stable@vger.kernel.org # 4.15+ Signed-off-by: Erdem Aktas <erdemaktas@google.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lyude Paul authored
commit 58fe03d6 upstream. Disabling a display on MST can potentially happen after the entire MST topology has been removed, which means that we can't communicate with the topology at all in this scenario. Likewise, this also means that we can't properly update payloads on the topology and as such, it's a good idea to ignore payload update failures when disabling displays. Currently, amdgpu makes the mistake of halting the payload update process when any payload update failures occur, resulting in leaving DC's local copies of the payload tables out of date. This ends up causing problems with hotplugging MST topologies, and causes modesets on the second hotplug to fail like so: [drm] Failed to updateMST allocation table forpipe idx:1 ------------[ cut here ]------------ WARNING: CPU: 5 PID: 1511 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link.c:2677 update_mst_stream_alloc_table+0x11e/0x130 [amdgpu] Modules linked in: cdc_ether usbnet fuse xt_conntrack nf_conntrack nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 nft_counter nft_compat nf_tables nfnetlink tun bridge stp llc sunrpc vfat fat wmi_bmof uvcvideo snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi videobuf2_vmalloc snd_hda_intel videobuf2_memops videobuf2_v4l2 snd_intel_dspcfg videobuf2_common crct10dif_pclmul snd_hda_codec videodev crc32_pclmul snd_hwdep snd_hda_core ghash_clmulni_intel snd_seq mc joydev pcspkr snd_seq_device snd_pcm sp5100_tco k10temp i2c_piix4 snd_timer thinkpad_acpi ledtrig_audio snd wmi soundcore video i2c_scmi acpi_cpufreq ip_tables amdgpu(O) rtsx_pci_sdmmc amd_iommu_v2 gpu_sched mmc_core i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm crc32c_intel serio_raw hid_multitouch r8152 mii nvme r8169 nvme_core rtsx_pci pinctrl_amd CPU: 5 PID: 1511 Comm: gnome-shell Tainted: G O 5.5.0-rc7Lyude-Test+ #4 Hardware name: LENOVO FA495SIT26/FA495SIT26, BIOS R12ET22W(0.22 ) 01/31/2019 RIP: 0010:update_mst_stream_alloc_table+0x11e/0x130 [amdgpu] Code: 28 00 00 00 75 2b 48 8d 65 e0 5b 41 5c 41 5d 41 5e 5d c3 0f b6 06 49 89 1c 24 41 88 44 24 08 0f b6 46 01 41 88 44 24 09 eb 93 <0f> 0b e9 2f ff ff ff e8 a6 82 a3 c2 66 0f 1f 44 00 00 0f 1f 44 00 RSP: 0018:ffffac428127f5b0 EFLAGS: 00010202 RAX: 0000000000000002 RBX: ffff8d1e166eee80 RCX: 0000000000000000 RDX: ffffac428127f668 RSI: ffff8d1e166eee80 RDI: ffffac428127f610 RBP: ffffac428127f640 R08: ffffffffc03d94a8 R09: 0000000000000000 R10: ffff8d1e24b02000 R11: ffffac428127f5b0 R12: ffff8d1e1b83d000 R13: ffff8d1e1bea0b08 R14: 0000000000000002 R15: 0000000000000002 FS: 00007fab23ffcd80(0000) GS:ffff8d1e28b40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f151f1711e8 CR3: 00000005997c0000 CR4: 00000000003406e0 Call Trace: ? mutex_lock+0xe/0x30 dc_link_allocate_mst_payload+0x9a/0x210 [amdgpu] ? dm_read_reg_func+0x39/0xb0 [amdgpu] ? core_link_enable_stream+0x656/0x730 [amdgpu] core_link_enable_stream+0x656/0x730 [amdgpu] dce110_apply_ctx_to_hw+0x58e/0x5d0 [amdgpu] ? dcn10_verify_allow_pstate_change_high+0x1d/0x280 [amdgpu] ? dcn10_wait_for_mpcc_disconnect+0x3c/0x130 [amdgpu] dc_commit_state+0x292/0x770 [amdgpu] ? add_timer+0x101/0x1f0 ? ttm_bo_put+0x1a1/0x2f0 [ttm] amdgpu_dm_atomic_commit_tail+0xb59/0x1ff0 [amdgpu] ? amdgpu_move_blit.constprop.0+0xb8/0x1f0 [amdgpu] ? amdgpu_bo_move+0x16d/0x2b0 [amdgpu] ? ttm_bo_handle_move_mem+0x118/0x570 [ttm] ? ttm_bo_validate+0x134/0x150 [ttm] ? dm_plane_helper_prepare_fb+0x1b9/0x2a0 [amdgpu] ? _cond_resched+0x15/0x30 ? wait_for_completion_timeout+0x38/0x160 ? _cond_resched+0x15/0x30 ? wait_for_completion_interruptible+0x33/0x190 commit_tail+0x94/0x130 [drm_kms_helper] drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper] drm_atomic_helper_set_config+0x70/0xb0 [drm_kms_helper] drm_mode_setcrtc+0x194/0x6a0 [drm] ? _cond_resched+0x15/0x30 ? mutex_lock+0xe/0x30 ? drm_mode_getcrtc+0x180/0x180 [drm] drm_ioctl_kernel+0xaa/0xf0 [drm] drm_ioctl+0x208/0x390 [drm] ? drm_mode_getcrtc+0x180/0x180 [drm] amdgpu_drm_ioctl+0x49/0x80 [amdgpu] do_vfs_ioctl+0x458/0x6d0 ksys_ioctl+0x5e/0x90 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x55/0x1b0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fab2121f87b Code: 0f 1e fa 48 8b 05 0d 96 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 95 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffd045f9068 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffd045f90a0 RCX: 00007fab2121f87b RDX: 00007ffd045f90a0 RSI: 00000000c06864a2 RDI: 000000000000000b RBP: 00007ffd045f90a0 R08: 0000000000000000 R09: 000055dbd2985d10 R10: 000055dbd2196280 R11: 0000000000000246 R12: 00000000c06864a2 R13: 000000000000000b R14: 0000000000000000 R15: 000055dbd2196280 ---[ end trace 6ea888c24d2059cd ]--- Note as well, I have only been able to reproduce this on setups with 2 MST displays. Changes since v1: * Don't return false when part 1 or part 2 of updating the payloads fails, we don't want to abort at any step of the process even if things fail Reviewed-by: Mikita Lipski <Mikita.Lipski@amd.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Acked-by: Harry Wentland <harry.wentland@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Warren authored
commit bf83b96f upstream. For a little over a year, U-Boot on Tegra124 has configured the flow controller to perform automatic RAM re-repair on off->on power transitions of the CPU rail[1]. This is mandatory for correct operation of Tegra124. However, RAM re-repair relies on certain clocks, which the kernel must enable and leave running. The fuse clock is one of those clocks. Mark this clock as critical so that LP1 power mode (system suspend) operates correctly. [1] 3cc7942a4ae5 ARM: tegra: implement RAM repair Reported-by: Jonathan Hunter <jonathanh@nvidia.com> Cc: stable@vger.kernel.org Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christian Borntraeger authored
commit 55680890 upstream. The initial CPU reset clobbers the userspace fpc and the store status ioctl clobbers the guest acrs + fpr. As these calls are only done via ioctl (and not via vcpu_run), no CPU context is loaded, so we can (and must) act directly on the sync regs, not on the thread context. Cc: stable@kernel.org Fixes: e1788bb9 ("KVM: s390: handle floating point registers in the run ioctl not in vcpu_put/load") Fixes: 31d8b8d4 ("KVM: s390: handle access registers in the run ioctl not in vcpu_put/load") Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20200131100205.74720-2-frankja@linux.ibm.comSigned-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sean Christopherson authored
commit 16be9dde upstream. Free the vCPU's wbinvd_dirty_mask if vCPU creation fails after kvm_arch_vcpu_init(), e.g. when installing the vCPU's file descriptor. Do the freeing by calling kvm_arch_vcpu_free() instead of open coding the freeing. This adds a likely superfluous, but ultimately harmless, call to kvmclock_reset(), which only clears vcpu->arch.pv_time_enabled. Using kvm_arch_vcpu_free() allows for additional cleanup in the future. Fixes: f5f48ee1 ("KVM: VMX: Execute WBINVD to keep data consistency with assigned devices") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sean Christopherson authored
commit b11306b5 upstream. Calculate the host-reserved cr4 bits at runtime based on the system's capabilities (using logic similar to __do_cpuid_func()), and use the dynamically generated mask for the reserved bit check in kvm_set_cr4() instead using of the static CR4_RESERVED_BITS define. This prevents userspace from "enabling" features in cr4 that are not supported by the system, e.g. by ignoring KVM_GET_SUPPORTED_CPUID and specifying a bogus CPUID for the vCPU. Allowing userspace to set unsupported bits in cr4 can lead to a variety of undesirable behavior, e.g. failed VM-Enter, and in general increases KVM's attack surface. A crafty userspace can even abuse CR4.LA57 to induce an unchecked #GP on a WRMSR. On a platform without LA57 support: KVM_SET_CPUID2 // CPUID_7_0_ECX.LA57 = 1 KVM_SET_SREGS // CR4.LA57 = 1 KVM_SET_MSRS // KERNEL_GS_BASE = 0x0004000000000000 KVM_RUN leads to a #GP when writing KERNEL_GS_BASE into hardware: unchecked MSR access error: WRMSR to 0xc0000102 (tried to write 0x0004000000000000) at rIP: 0xffffffffa00f239a (vmx_prepare_switch_to_guest+0x10a/0x1d0 [kvm_intel]) Call Trace: kvm_arch_vcpu_ioctl_run+0x671/0x1c70 [kvm] kvm_vcpu_ioctl+0x36b/0x5d0 [kvm] do_vfs_ioctl+0xa1/0x620 ksys_ioctl+0x66/0x70 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x4c/0x170 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fc08133bf47 Note, the above sequence fails VM-Enter due to invalid guest state. Userspace can allow VM-Enter to succeed (after the WRMSR #GP) by adding a KVM_SET_SREGS w/ CR4.LA57=0 after KVM_SET_MSRS, in which case KVM will technically leak the host's KERNEL_GS_BASE into the guest. But, as KERNEL_GS_BASE is a userspace-defined value/address, the leak is largely benign as a malicious userspace would simply be exposing its own data to the guest, and attacking a benevolent userspace would require multiple bugs in the userspace VMM. Cc: stable@vger.kernel.org Cc: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Boris Ostrovsky authored
commit 8c6de56a upstream. kvm_steal_time_set_preempted() may accidentally clear KVM_VCPU_FLUSH_TLB bit if it is called more than once while VCPU is preempted. This is part of CVE-2019-3016. (This bug was also independently discovered by Jim Mattson <jmattson@google.com>) Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sean Christopherson authored
commit cb10bf91 upstream. Explicitly free the shared page if kvmppc_mmu_init() fails during kvmppc_core_vcpu_create(), as the page is freed only in kvmppc_core_vcpu_free(), which is not reached via kvm_vcpu_uninit(). Fixes: 96bc451a ("KVM: PPC: Introduce shared page") Cc: stable@vger.kernel.org Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sean Christopherson authored
commit 1a978d9d upstream. Call kvm_vcpu_uninit() if vcore creation fails to avoid leaking any resources allocated by kvm_vcpu_init(), i.e. the vcpu->run page. Fixes: 371fefd6 ("KVM: PPC: Allow book3s_hv guests to use SMT processor modes") Cc: stable@vger.kernel.org Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sean Christopherson authored
commit f958bd23 upstream. Unlike most state managed by XSAVE, MPX is initialized to zero on INIT. Because INITs are usually recognized in the context of a VCPU_RUN call, kvm_vcpu_reset() puts the guest's FPU so that the FPU state is resident in memory, zeros the MPX state, and reloads FPU state to hardware. But, in the unlikely event that an INIT is recognized during kvm_arch_vcpu_ioctl_get_mpstate() via kvm_apic_accept_events(), kvm_vcpu_reset() will call kvm_put_guest_fpu() without a preceding kvm_load_guest_fpu() and corrupt the guest's FPU state (and possibly userspace's FPU state as well). Given that MPX is being removed from the kernel[*], fix the bug with the simple-but-ugly approach of loading the guest's FPU during KVM_GET_MP_STATE. [*] See commit f240652b ("x86/mpx: Remove MPX APIs"). Fixes: f775b13e ("x86,kvm: move qemu/guest FPU switching out to vcpu_run") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marios Pomonis authored
KVM: x86: Protect MSR-based index computations in fixed_msr_to_seg_unit() from Spectre-v1/L1TF attacks commit 25a5edea upstream. This fixes a Spectre-v1/L1TF vulnerability in fixed_msr_to_seg_unit(). This function contains index computations based on the (attacker-controlled) MSR number. Fixes: de9aef5e ("KVM: MTRR: introduce fixed_mtrr_segment table") Signed-off-by: Nick Finco <nifi@google.com> Signed-off-by: Marios Pomonis <pomonis@google.com> Reviewed-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marios Pomonis authored
commit 3c9053a2 upstream. This fixes a Spectre-v1/L1TF vulnerability in x86_decode_insn(). kvm_emulate_instruction() (an ancestor of x86_decode_insn()) is an exported symbol, so KVM should treat it conservatively from a security perspective. Fixes: 045a282c ("KVM: emulator: implement fninit, fnstsw, fnstcw") Signed-off-by: Nick Finco <nifi@google.com> Signed-off-by: Marios Pomonis <pomonis@google.com> Reviewed-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marios Pomonis authored
commit 6ec4c5ee upstream. This fixes a Spectre-v1/L1TF vulnerability in set_msr_mce() and get_msr_mce(). Both functions contain index computations based on the (attacker-controlled) MSR number. Fixes: 890ca9ae ("KVM: Add MCE support") Signed-off-by: Nick Finco <nifi@google.com> Signed-off-by: Marios Pomonis <pomonis@google.com> Reviewed-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marios Pomonis authored
commit 8c86405f upstream. This fixes a Spectre-v1/L1TF vulnerability in ioapic_read_indirect(). This function contains index computations based on the (attacker-controlled) IOREGSEL register. Fixes: a2c118bf ("KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798)") Signed-off-by: Nick Finco <nifi@google.com> Signed-off-by: Marios Pomonis <pomonis@google.com> Reviewed-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marios Pomonis authored
commit 13c5183a upstream. This fixes a Spectre-v1/L1TF vulnerability in the get_gp_pmc() and get_fixed_pmc() functions. They both contain index computations based on the (attacker-controlled) MSR number. Fixes: 25462f7f ("KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch") Signed-off-by: Nick Finco <nifi@google.com> Signed-off-by: Marios Pomonis <pomonis@google.com> Reviewed-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marios Pomonis authored
commit 67056455 upstream. This fixes a Spectre-v1/L1TF vulnerability in ioapic_write_indirect(). This function contains index computations based on the (attacker-controlled) IOREGSEL register. This patch depends on patch "KVM: x86: Protect ioapic_read_indirect() from Spectre-v1/L1TF attacks". Fixes: 70f93dae ("KVM: Use temporary variable to shorten lines.") Signed-off-by: Nick Finco <nifi@google.com> Signed-off-by: Marios Pomonis <pomonis@google.com> Reviewed-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marios Pomonis authored
commit 86187937 upstream. This fixes Spectre-v1/L1TF vulnerabilities in kvm_hv_msr_get_crash_data() and kvm_hv_msr_set_crash_data(). These functions contain index computations that use the (attacker-controlled) MSR number. Fixes: e7d9513b ("kvm/x86: added hyper-v crash msrs into kvm hyperv context") Signed-off-by: Nick Finco <nifi@google.com> Signed-off-by: Marios Pomonis <pomonis@google.com> Reviewed-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marios Pomonis authored
commit 4bf79cb0 upstream. This fixes a Spectre-v1/L1TF vulnerability in kvm_lapic_reg_write(). This function contains index computations based on the (attacker-controlled) MSR number. Fixes: 0105d1a5 ("KVM: x2apic interface to lapic") Signed-off-by: Nick Finco <nifi@google.com> Signed-off-by: Marios Pomonis <pomonis@google.com> Reviewed-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marios Pomonis authored
commit ea740059 upstream. This fixes a Spectre-v1/L1TF vulnerability in __kvm_set_dr() and kvm_get_dr(). Both kvm_get_dr() and kvm_set_dr() (a wrapper of __kvm_set_dr()) are exported symbols so KVM should tream them conservatively from a security perspective. Fixes: 020df079 ("KVM: move DR register access handling into generic code") Signed-off-by: Nick Finco <nifi@google.com> Signed-off-by: Marios Pomonis <pomonis@google.com> Reviewed-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marios Pomonis authored
commit 66061740 upstream. This fixes Spectre-v1/L1TF vulnerabilities in intel_find_fixed_event() and intel_rdpmc_ecx_to_pmc(). kvm_rdpmc() (ancestor of intel_find_fixed_event()) and reprogram_fixed_counter() (ancestor of intel_rdpmc_ecx_to_pmc()) are exported symbols so KVM should treat them conservatively from a security perspective. Fixes: 25462f7f ("KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch") Signed-off-by: Nick Finco <nifi@google.com> Signed-off-by: Marios Pomonis <pomonis@google.com> Reviewed-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marios Pomonis authored
commit 125ffc5e upstream. This fixes Spectre-v1/L1TF vulnerabilities in vmx_read_guest_seg_selector(), vmx_read_guest_seg_base(), vmx_read_guest_seg_limit() and vmx_read_guest_seg_ar(). When invoked from emulation, these functions contain index computations based on the (attacker-influenced) segment value. Using constants prevents the attack. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marios Pomonis authored
commit 14e32321 upstream. This fixes a Spectre-v1/L1TF vulnerability in picdev_write(). It replaces index computations based on the (attacked-controlled) port number with constants through a minor refactoring. Fixes: 85f455f7 ("KVM: Add support for in-kernel PIC emulation") Signed-off-by: Nick Finco <nifi@google.com> Signed-off-by: Marios Pomonis <pomonis@google.com> Reviewed-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jens Axboe authored
commit 01d7a356 upstream. If we have nested or circular eventfd wakeups, then we can deadlock if we run them inline from our poll waitqueue wakeup handler. It's also possible to have very long chains of notifications, to the extent where we could risk blowing the stack. Check the eventfd recursion count before calling eventfd_signal(). If it's non-zero, then punt the signaling to async context. This is always safe, as it takes us out-of-line in terms of stack and locking context. Cc: stable@vger.kernel.org # 4.19+ Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jens Axboe authored
commit b5e683d5 upstream. eventfd use cases from aio and io_uring can deadlock due to circular or resursive calling, when eventfd_signal() tries to grab the waitqueue lock. On top of that, it's also possible to construct notification chains that are deep enough that we could blow the stack. Add a percpu counter that tracks the percpu recursion depth, warn if we exceed it. The counter is also exposed so that users of eventfd_signal() can do the right thing if it's non-zero in the context where it is called. Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Coly Li authored
commit 038ba8cc upstream. In year 2007 high performance SSD was still expensive, in order to save more space for real workload or meta data, the readahead I/Os for non-meta data was bypassed and not cached on SSD. In now days, SSD price drops a lot and people can find larger size SSD with more comfortable price. It is unncessary to alway bypass normal readahead I/Os to save SSD space for now. This patch adds options for readahead data cache policies via sysfs file /sys/block/bcache<N>/readahead_cache_policy, the options are, - "all": cache all readahead data I/Os. - "meta-only": only cache meta data, and bypass other regular I/Os. If users want to make bcache continue to only cache readahead request for metadata and bypass regular data readahead, please set "meta-only" to this sysfs file. By default, bcache will back to cache all read- ahead requests now. Cc: stable@vger.kernel.org Signed-off-by: Coly Li <colyli@suse.de> Acked-by: Eric Wheeler <bcache@linux.ewheeler.net> Cc: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vladis Dronov authored
commit 69503e58 upstream. After the commit 44ea3942 ("drivers/watchdog: make use of devm_register_reboot_notifier()") the struct notifier_block reboot_nb in the struct watchdog_device is removed from the reboot notifiers chain at the time watchdog's chardev is closed. But at least in i6300esb.c case reboot_nb is embedded in the struct esb_dev which can be freed on its device removal and before the chardev is closed, thus UAF at reboot: [ 7.728581] esb_probe: esb_dev.watchdog_device ffff91316f91ab28 ts# uname -r note the address ^^^ 5.5.0-rc5-ae6088-wdog ts# ./openwdog0 & [1] 696 ts# opened /dev/watchdog0, sleeping 10s... ts# echo 1 > /sys/devices/pci0000\:00/0000\:00\:09.0/remove [ 178.086079] devres:rel_nodes: dev ffff91317668a0b0 data ffff91316f91ab28 esb_dev.watchdog_device.reboot_nb memory is freed here ^^^ ts# ...woken up [ 181.459010] devres:rel_nodes: dev ffff913171781000 data ffff913174a1dae8 [ 181.460195] devm_unreg_reboot_notifier: res ffff913174a1dae8 nb ffff91316f91ab78 attempt to use memory already freed ^^^ [ 181.461063] devm_unreg_reboot_notifier: nb->call 6b6b6b6b6b6b6b6b [ 181.461243] devm_unreg_reboot_notifier: nb->next 6b6b6b6b6b6b6b6b freed memory is filled with a slub poison ^^^ [1]+ Done ./openwdog0 ts# reboot [ 229.921862] systemd-shutdown[1]: Rebooting. [ 229.939265] notifier_call_chain: nb ffffffff9c6c2f20 nb->next ffffffff9c6d50c0 [ 229.943080] notifier_call_chain: nb ffffffff9c6d50c0 nb->next 6b6b6b6b6b6b6b6b [ 229.946054] notifier_call_chain: nb 6b6b6b6b6b6b6b6b INVAL [ 229.957584] general protection fault: 0000 [#1] SMP [ 229.958770] CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 5.5.0-rc5-ae6088-wdog [ 229.960224] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ... [ 229.963288] RIP: 0010:notifier_call_chain+0x66/0xd0 [ 229.969082] RSP: 0018:ffffb20dc0013d88 EFLAGS: 00010246 [ 229.970812] RAX: 000000000000002e RBX: 6b6b6b6b6b6b6b6b RCX: 00000000000008b3 [ 229.972929] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffffff9ccc46ac [ 229.975028] RBP: 0000000000000001 R08: 0000000000000000 R09: 00000000000008b3 [ 229.977039] R10: 0000000000000001 R11: ffffffff9c26c740 R12: 0000000000000000 [ 229.979155] R13: 6b6b6b6b6b6b6b6b R14: 0000000000000000 R15: 00000000fffffffa ... slub_debug=FZP poison ^^^ [ 229.989089] Call Trace: [ 229.990157] blocking_notifier_call_chain+0x43/0x59 [ 229.991401] kernel_restart_prepare+0x14/0x30 [ 229.992607] kernel_restart+0x9/0x30 [ 229.993800] __do_sys_reboot+0x1d2/0x210 [ 230.000149] do_syscall_64+0x3d/0x130 [ 230.001277] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 230.002639] RIP: 0033:0x7f5461bdd177 [ 230.016402] Modules linked in: i6300esb [ 230.050261] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b Fix the crash by reverting 44ea3942 so unregister_reboot_notifier() is called when watchdog device is removed. This also makes handling of the reboot notifier unified with the handling of the restart handler, which is freed with unregister_restart_handler() in the same place. Fixes: 44ea3942 ("drivers/watchdog: make use of devm_register_reboot_notifier()") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Vladis Dronov <vdronov@redhat.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20200108125347.6067-1-vdronov@redhat.comSigned-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Juergen Gross authored
commit eda4eabf upstream. Commit 3aa6c19d ("xen/balloon: Support xend-based toolstack") tried to fix a regression with running on rather ancient Xen versions. Unfortunately the fix was based on the assumption that xend would just use another Xenstore node, but in reality only some downstream versions of xend are doing that. The upstream xend does not write that Xenstore node at all, so the problem must be fixed in another way. The easiest way to achieve that is to fall back to the behavior before commit 96edd61d ("xen/balloon: don't online new memory initially") in case the static memory maximum can't be read. This is achieved by setting static_max to the current number of memory pages known by the system resulting in target_diff becoming zero. Fixes: 3aa6c19d ("xen/balloon: Support xend-based toolstack") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: <stable@vger.kernel.org> # 4.13 Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Gavin Shan authored
commit 5fcf3a55 upstream. The filter name is fixed to "exit_reason" for some kvm_exit events, no matter what architect we have. Actually, the filter name ("exit_reason") is only applicable to x86, meaning it's broken on other architects including aarch64. This fixes the issue by providing various kvm_exit filter names, depending on architect we're on. Afterwards, the variable filter name is picked and applied through ioctl(fd, SET_FILTER). Reported-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Gavin Shan <gshan@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sean Young authored
commit 080d89f5 upstream. Once rc_open is called on the input device, lirc events can be delivered. Ensure lirc is ready to do so else we might get this: Registered IR keymap rc-hauppauge rc rc0: Hauppauge WinTV PVR-350 as /devices/pci0000:00/0000:00:1e.0/0000:04:00.0/i2c-0/0-0018/rc/rc0 input: Hauppauge WinTV PVR-350 as /devices/pci0000:00/0000:00:1e.0/0000:04:00.0/i2c-0/0-0018/rc/rc0/input9 BUG: kernel NULL pointer dereference, address: 0000000000000038 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 17 Comm: kworker/1:0 Not tainted 5.3.11-300.fc31.x86_64 #1 Hardware name: /DG43NB, BIOS NBG4310H.86A.0096.2009.0903.1845 09/03/2009 Workqueue: events ir_work [ir_kbd_i2c] RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 Code: a6 b4 07 00 00 49 81 c6 b8 07 00 00 55 53 e8 ba a7 9d ff 4c 89 e7 49 89 45 00 e8 5e 7a 25 00 49 8b 1e 48 89 c5 4c 39 f3 74 58 <8b> 43 38 8b 53 40 89 c1 2b 4b 3c 39 ca 72 41 21 d0 49 8b 7d 00 49 RSP: 0018:ffffaae2000b3d88 EFLAGS: 00010017 RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000019 RDX: 0000000000000001 RSI: 006e801b1f26ce6a RDI: ffff9e39797c37b4 RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000001 R12: ffff9e39797c37b4 R13: ffffaae2000b3db8 R14: ffff9e39797c37b8 R15: ffff9e39797c33d8 FS: 0000000000000000(0000) GS:ffff9e397b680000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000038 CR3: 0000000035844000 CR4: 00000000000006e0 Call Trace: ir_do_keydown+0x8e/0x2b0 rc_keydown+0x52/0xc0 ir_work+0xb8/0x130 [ir_kbd_i2c] process_one_work+0x19d/0x340 worker_thread+0x50/0x3b0 kthread+0xfb/0x130 ? process_one_work+0x340/0x340 ? kthread_park+0x80/0x80 ret_from_fork+0x35/0x40 Modules linked in: rc_hauppauge tuner msp3400 saa7127 saa7115 ivtv(+) tveeprom cx2341x v4l2_common videodev mc i2c_algo_bit ir_kbd_i2c ip_tables firewire_ohci e1000e serio_raw firewire_core ata_generic crc_itu_t pata_acpi pata_jmicron fuse CR2: 0000000000000038 ---[ end trace c67c2697a99fa74b ]--- RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0 Code: a6 b4 07 00 00 49 81 c6 b8 07 00 00 55 53 e8 ba a7 9d ff 4c 89 e7 49 89 45 00 e8 5e 7a 25 00 49 8b 1e 48 89 c5 4c 39 f3 74 58 <8b> 43 38 8b 53 40 89 c1 2b 4b 3c 39 ca 72 41 21 d0 49 8b 7d 00 49 RSP: 0018:ffffaae2000b3d88 EFLAGS: 00010017 RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000019 RDX: 0000000000000001 RSI: 006e801b1f26ce6a RDI: ffff9e39797c37b4 RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000001 R12: ffff9e39797c37b4 R13: ffffaae2000b3db8 R14: ffff9e39797c37b8 R15: ffff9e39797c33d8 FS: 0000000000000000(0000) GS:ffff9e397b680000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000038 CR3: 0000000035844000 CR4: 00000000000006e0 rc rc0: lirc_dev: driver ir_kbd_i2c registered at minor = 0, scancode receiver, no transmitter tuner-simple 0-0061: creating new instance tuner-simple 0-0061: type set to 2 (Philips NTSC (FI1236,FM1236 and compatibles)) ivtv0: Registered device video0 for encoder MPG (4096 kB) ivtv0: Registered device video32 for encoder YUV (2048 kB) ivtv0: Registered device vbi0 for encoder VBI (1024 kB) ivtv0: Registered device video24 for encoder PCM (320 kB) ivtv0: Registered device radio0 for encoder radio ivtv0: Registered device video16 for decoder MPG (1024 kB) ivtv0: Registered device vbi8 for decoder VBI (64 kB) ivtv0: Registered device vbi16 for decoder VOUT ivtv0: Registered device video48 for decoder YUV (1024 kB) Cc: stable@vger.kernel.org Tested-by: Nick French <nickfrench@gmail.com> Reported-by: Nick French <nickfrench@gmail.com> Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ville Syrjälä authored
commit 433480c1 upstream. Check for zero width/height destination rectangle in drm_rect_clip_scaled() to avoid a division by zero. Cc: stable@vger.kernel.org Fixes: f96bdf56 ("drm/rect: Handle rounding errors in drm_rect_clip_scaled, v3.") Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Benjamin Gaignard <benjamin.gaignard@st.com> Cc: Daniel Vetter <daniel@ffwll.ch> Testcase: igt/kms_selftest/drm_rect_clip_scaled_div_by_zero Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191122175623.13565-2-ville.syrjala@linux.intel.comReviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Benjamin Gaignard <benjamin.gaignard@st.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andreas Gruenbacher authored
commit 6e5e41e2 upstream. In gfs2_file_write_iter, for direct writes, the error checking in the buffered write fallback case is incomplete. This can cause inode write errors to go undetected. Fix and clean up gfs2_file_write_iter along the way. Based on a proposed fix by Christoph Hellwig <hch@lst.de>. Fixes: 967bcc91 ("gfs2: iomap direct I/O support") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christoph Hellwig authored
commit 4c0e8dda upstream. Set current->backing_dev_info just around the buffered write calls to prepare for the next fix. Fixes: 967bcc91 ("gfs2: iomap direct I/O support") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Roberto Bergantinos Corpas authored
commit 3d96208c upstream. When upcalling gssproxy, cache_head.expiry_time is set as a timeval, not seconds since boot. As such, RPC cache expiry logic will not clean expired objects created under auth.rpcsec.context cache. This has proven to cause kernel memory leaks on field. Using 64 bit variants of getboottime/timespec Expiration times have worked this way since 2010's c5b29f88 "sunrpc: use seconds since boot in expiry cache". The gssproxy code introduced in 2012 added gss_proxy_save_rsc and introduced the bug. That's a while for this to lurk, but it required a bit of an extreme case to make it obvious. Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com> Cc: stable@vger.kernel.org Fixes: 030d794b "SUNRPC: Use gssproxy upcall for server..." Tested-By: Frank Sorenson <sorenson@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Brian Norris authored
commit 65b1aae0 upstream. We called rcu_read_lock(), so we need to call rcu_read_unlock() before we return. Fixes: 3d94a4a8 ("mwifiex: fix possible heap overflow in mwifiex_process_country_ie()") Cc: stable@vger.kernel.org Cc: huangwen <huangwenabc@gmail.com> Cc: Ganapathi Bhat <ganapathi.bhat@nxp.com> Signed-off-by: Brian Norris <briannorris@chromium.org> Acked-by: Ganapathi Bhat <ganapathi.bhat@nxp.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Luca Coelho authored
commit 197288d5 upstream. The IGTK keys are only removed by mac80211 after it has already removed the AP station. This causes the driver to throw an error because mac80211 is trying to remove the IGTK when the station doesn't exist anymore. The firmware is aware that the station has been removed and can deal with it the next time we try to add an IGTK for a station, so we shouldn't try to remove the key if the station ID is IWL_MVM_INVALID_STA. Do this by removing the check for mvm_sta before calling iwl_mvm_send_sta_igtk() and check return from that function gracefully if the station ID is invalid. Cc: stable@vger.kernel.org # 4.12+ Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Warren authored
commit 1a3388d5 upstream. For a little over a year, U-Boot has configured the flow controller to perform automatic RAM re-repair on off->on power transitions of the CPU rail[1]. This is mandatory for correct operation of Tegra124. However, RAM re-repair relies on certain clocks, which the kernel must enable and leave running. PLLP is one of those clocks. This clock is shut down during LP1 in order to save power. Enable bypass (which I believe routes osc_div_clk, essentially the crystal clock, to the PLL output) so that this clock signal toggles even though the PLL is not active. This is required so that LP1 power mode (system suspend) operates correctly. The bypass configuration must then be undone when resuming from LP1, so that all peripheral clocks run at the expected rate. Without this, many peripherals won't work correctly; for example, the UART baud rate would be incorrect. NVIDIA's downstream kernel code only does this if not compiled for Tegra30, so the added code is made conditional upon the chip ID. NVIDIA's downstream code makes this change conditional upon the active CPU cluster. The upstream kernel currently doesn't support cluster switching, so this patch doesn't test the active CPU cluster ID. [1] 3cc7942a4ae5 ARM: tegra: implement RAM repair Reported-by: Jonathan Hunter <jonathanh@nvidia.com> Cc: stable@vger.kernel.org Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Filipe Manana authored
commit 7227ff4d upstream. There is a race between adding and removing elements to the tree mod log list and rbtree that can lead to use-after-free problems. Consider the following example that explains how/why the problems happens: 1) Task A has mod log element with sequence number 200. It currently is the only element in the mod log list; 2) Task A calls btrfs_put_tree_mod_seq() because it no longer needs to access the tree mod log. When it enters the function, it initializes 'min_seq' to (u64)-1. Then it acquires the lock 'tree_mod_seq_lock' before checking if there are other elements in the mod seq list. Since the list it empty, 'min_seq' remains set to (u64)-1. Then it unlocks the lock 'tree_mod_seq_lock'; 3) Before task A acquires the lock 'tree_mod_log_lock', task B adds itself to the mod seq list through btrfs_get_tree_mod_seq() and gets a sequence number of 201; 4) Some other task, name it task C, modifies a btree and because there elements in the mod seq list, it adds a tree mod elem to the tree mod log rbtree. That node added to the mod log rbtree is assigned a sequence number of 202; 5) Task B, which is doing fiemap and resolving indirect back references, calls btrfs get_old_root(), with 'time_seq' == 201, which in turn calls tree_mod_log_search() - the search returns the mod log node from the rbtree with sequence number 202, created by task C; 6) Task A now acquires the lock 'tree_mod_log_lock', starts iterating the mod log rbtree and finds the node with sequence number 202. Since 202 is less than the previously computed 'min_seq', (u64)-1, it removes the node and frees it; 7) Task B still has a pointer to the node with sequence number 202, and it dereferences the pointer itself and through the call to __tree_mod_log_rewind(), resulting in a use-after-free problem. This issue can be triggered sporadically with the test case generic/561 from fstests, and it happens more frequently with a higher number of duperemove processes. When it happens to me, it either freezes the VM or it produces a trace like the following before crashing: [ 1245.321140] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI [ 1245.321200] CPU: 1 PID: 26997 Comm: pool Not tainted 5.5.0-rc6-btrfs-next-52 #1 [ 1245.321235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 [ 1245.321287] RIP: 0010:rb_next+0x16/0x50 [ 1245.321307] Code: .... [ 1245.321372] RSP: 0018:ffffa151c4d039b0 EFLAGS: 00010202 [ 1245.321388] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8ae221363c80 RCX: 6b6b6b6b6b6b6b6b [ 1245.321409] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8ae221363c80 [ 1245.321439] RBP: ffff8ae20fcc4688 R08: 0000000000000002 R09: 0000000000000000 [ 1245.321475] R10: ffff8ae20b120910 R11: 00000000243f8bb1 R12: 0000000000000038 [ 1245.321506] R13: ffff8ae221363c80 R14: 000000000000075f R15: ffff8ae223f762b8 [ 1245.321539] FS: 00007fdee1ec7700(0000) GS:ffff8ae236c80000(0000) knlGS:0000000000000000 [ 1245.321591] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1245.321614] CR2: 00007fded4030c48 CR3: 000000021da16003 CR4: 00000000003606e0 [ 1245.321642] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1245.321668] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1245.321706] Call Trace: [ 1245.321798] __tree_mod_log_rewind+0xbf/0x280 [btrfs] [ 1245.321841] btrfs_search_old_slot+0x105/0xd00 [btrfs] [ 1245.321877] resolve_indirect_refs+0x1eb/0xc60 [btrfs] [ 1245.321912] find_parent_nodes+0x3dc/0x11b0 [btrfs] [ 1245.321947] btrfs_check_shared+0x115/0x1c0 [btrfs] [ 1245.321980] ? extent_fiemap+0x59d/0x6d0 [btrfs] [ 1245.322029] extent_fiemap+0x59d/0x6d0 [btrfs] [ 1245.322066] do_vfs_ioctl+0x45a/0x750 [ 1245.322081] ksys_ioctl+0x70/0x80 [ 1245.322092] ? trace_hardirqs_off_thunk+0x1a/0x1c [ 1245.322113] __x64_sys_ioctl+0x16/0x20 [ 1245.322126] do_syscall_64+0x5c/0x280 [ 1245.322139] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 1245.322155] RIP: 0033:0x7fdee3942dd7 [ 1245.322177] Code: .... [ 1245.322258] RSP: 002b:00007fdee1ec6c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 1245.322294] RAX: ffffffffffffffda RBX: 00007fded40210d8 RCX: 00007fdee3942dd7 [ 1245.322314] RDX: 00007fded40210d8 RSI: 00000000c020660b RDI: 0000000000000004 [ 1245.322337] RBP: 0000562aa89e7510 R08: 0000000000000000 R09: 00007fdee1ec6d44 [ 1245.322369] R10: 0000000000000073 R11: 0000000000000246 R12: 00007fdee1ec6d48 [ 1245.322390] R13: 00007fdee1ec6d40 R14: 00007fded40210d0 R15: 00007fdee1ec6d50 [ 1245.322423] Modules linked in: .... [ 1245.323443] ---[ end trace 01de1e9ec5dff3cd ]--- Fix this by ensuring that btrfs_put_tree_mod_seq() computes the minimum sequence number and iterates the rbtree while holding the lock 'tree_mod_log_lock' in write mode. Also get rid of the 'tree_mod_seq_lock' lock, since it is now redundant. Fixes: bd989ba3 ("Btrfs: add tree modification log functions") Fixes: 097b8a7c ("Btrfs: join tree mod log code with the code holding back delayed refs") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Josef Bacik authored
commit d62b23c9 upstream. If we abort a transaction we have the following sequence if (!trans->dirty && list_empty(&trans->new_bgs)) return; WRITE_ONCE(trans->transaction->aborted, err); The idea being if we didn't modify anything with our trans handle then we don't really need to abort the whole transaction, maybe the other trans handles are fine and we can carry on. However in the case of create_snapshot we add a pending_snapshot object to our transaction and then commit the transaction. We don't actually modify anything. sync() behaves the same way, attach to an existing transaction and commit it. This means that if we have an IO error in the right places we could abort the committing transaction with our trans->dirty being not set and thus not set transaction->aborted. This is a problem because in the create_snapshot() case we depend on pending->error being set to something, or btrfs_commit_transaction returning an error. If we are not the trans handle that gets to commit the transaction, and we're waiting on the commit to happen we get our return value from cur_trans->aborted. If this was not set to anything because sync() hit an error in the transaction commit before it could modify anything then cur_trans->aborted would be 0. Thus we'd return 0 from btrfs_commit_transaction() in create_snapshot. This is a problem because we then try to do things with pending_snapshot->snap, which will be NULL because we didn't create the snapshot, and then we'll get a NULL pointer dereference like the following "BUG: kernel NULL pointer dereference, address: 00000000000001f0" RIP: 0010:btrfs_orphan_cleanup+0x2d/0x330 Call Trace: ? btrfs_mksubvol.isra.31+0x3f2/0x510 btrfs_mksubvol.isra.31+0x4bc/0x510 ? __sb_start_write+0xfa/0x200 ? mnt_want_write_file+0x24/0x50 btrfs_ioctl_snap_create_transid+0x16c/0x1a0 btrfs_ioctl_snap_create_v2+0x11e/0x1a0 btrfs_ioctl+0x1534/0x2c10 ? free_debug_processing+0x262/0x2a3 do_vfs_ioctl+0xa6/0x6b0 ? do_sys_open+0x188/0x220 ? syscall_trace_enter+0x1f8/0x330 ksys_ioctl+0x60/0x90 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x4a/0x1b0 In order to fix this we need to make sure anybody who calls commit_transaction has trans->dirty set so that they properly set the trans->transaction->aborted value properly so any waiters know bad things happened. This was found while I was running generic/475 with my modified fsstress, it reproduced within a few runs. I ran with this patch all night and didn't see the problem again. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Filipe Manana authored
commit 0e56315c upstream. When using the NO_HOLES feature, if we punch a hole into a file and then fsync it, there are cases where a subsequent fsync will miss the fact that a hole was punched, resulting in the holes not existing after replaying the log tree. Essentially these cases all imply that, tree-log.c:copy_items(), is not invoked for the leafs that delimit holes, because nothing changed those leafs in the current transaction. And it's precisely copy_items() where we currenly detect and log holes, which works as long as the holes are between file extent items in the input leaf or between the beginning of input leaf and the previous leaf or between the last item in the leaf and the next leaf. First example where we miss a hole: *) The extent items of the inode span multiple leafs; *) The punched hole covers a range that affects only the extent items of the first leaf; *) The fsync operation is done in full mode (BTRFS_INODE_NEEDS_FULL_SYNC is set in the inode's runtime flags). That results in the hole not existing after replaying the log tree. For example, if the fs/subvolume tree has the following layout for a particular inode: Leaf N, generation 10: [ ... INODE_ITEM INODE_REF EXTENT_ITEM (0 64K) EXTENT_ITEM (64K 128K) ] Leaf N + 1, generation 10: [ EXTENT_ITEM (128K 64K) ... ] If at transaction 11 we punch a hole coverting the range [0, 128K[, we end up dropping the two extent items from leaf N, but we don't touch the other leaf, so we end up in the following state: Leaf N, generation 11: [ ... INODE_ITEM INODE_REF ] Leaf N + 1, generation 10: [ EXTENT_ITEM (128K 64K) ... ] A full fsync after punching the hole will only process leaf N because it was modified in the current transaction, but not leaf N + 1, since it was not modified in the current transaction (generation 10 and not 11). As a result the fsync will not log any holes, because it didn't process any leaf with extent items. Second example where we will miss a hole: *) An inode as its items spanning 5 (or more) leafs; *) A hole is punched and it covers only the extents items of the 3rd leaf. This resulsts in deleting the entire leaf and not touching any of the other leafs. So the only leaf that is modified in the current transaction, when punching the hole, is the first leaf, which contains the inode item. During the full fsync, the only leaf that is passed to copy_items() is that first leaf, and that's not enough for the hole detection code in copy_items() to determine there's a hole between the last file extent item in the 2nd leaf and the first file extent item in the 3rd leaf (which was the 4th leaf before punching the hole). Fix this by scanning all leafs and punch holes as necessary when doing a full fsync (less common than a non-full fsync) when the NO_HOLES feature is enabled. The lack of explicit file extent items to mark holes makes it necessary to scan existing extents to determine if holes exist. A test case for fstests follows soon. Fixes: 16e7549f ("Btrfs: incompatible format change to remove hole extents") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-