- 09 Aug, 2018 1 commit
-
-
Thomas Gleixner authored
Josh reported that the late SMT evaluation in cpu_smt_state_init() sets cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case 'nosmt' was supplied on the kernel command line, as it cannot differentiate between SMT disabled by BIOS and SMT soft-disabled via 'nosmt'. That wrecks the state and makes the sysfs interface unusable. Rework this so that during bringup of the non-boot CPUs the availability of SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a 'primary' thread, set the local cpu_smt_available marker and evaluate it explicitly right after the initial SMP bringup has finished. SMT evaluation on x86 is a trainwreck, as the firmware has all the information _before_ booting the kernel, but there is no interface to query it. Fixes: 73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS") Reported-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 [smb: Context and also adjust to alternative booted_once scheme, including a move of the smt check into _cpu_up()] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
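A minimal sketch of the reworked check, assuming the upstream names (cpu_smt_allowed(), cpu_smt_available, topology_is_primary_thread()); the backport note above says the check moved into _cpu_up(), so details here may differ:

    static bool cpu_smt_available __read_mostly;

    static inline bool cpu_smt_allowed(unsigned int cpu)
    {
            /* Primary threads are always allowed up. */
            if (topology_is_primary_thread(cpu))
                    return true;

            /*
             * A non-primary sibling is being brought up, so the platform
             * does support SMT. Record that for the explicit evaluation
             * right after the initial SMP bringup, instead of guessing
             * in cpu_smt_state_init().
             */
            cpu_smt_available = true;

            return cpu_smt_control == CPU_SMT_ENABLED;
    }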
-
- 08 Aug, 2018 39 commits
-
-
Paolo Bonzini authored
When nested virtualization is in use, VMENTER operations from the nested hypervisor into the nested guest will always be processed by the bare metal hypervisor, and KVM's "conditional cache flushes" mode in particular does a flush on nested vmentry. Therefore, include the "skip L1D flush on vmentry" bit in KVM's suggested ARCH_CAPABILITIES setting. Add the relevant Documentation. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 [tyhicks: Adjust for the missing MSR_F10H_DECFG and MSR_IA32_UCODE_REV feature MSRs which do not exist in 4.15] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> [smb: Minor context and adjusted documentation path] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
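A sketch of the suggested-capabilities helper, assuming the upstream name ARCH_CAP_SKIP_L1DFL_VMENTRY for the "skip L1D flush on vmentry" bit and the l1tf_vmx_mitigation status variable introduced elsewhere in this series:

    u64 kvm_get_arch_capabilities(void)
    {
            u64 data;

            rdmsrl_safe(MSR_IA32_ARCH_CAPABILITIES, &data);

            /*
             * KVM flushes L1D on vmentry (unless flushing is disabled),
             * so a nested hypervisor running as our guest never needs
             * to flush again for its own nested vmentries: tell it so.
             */
            if (l1tf_vmx_mitigation != VMENTER_L1D_FLUSH_NEVER)
                    data |= ARCH_CAP_SKIP_L1DFL_VMENTRY;

            return data;
    }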
-
Paolo Bonzini authored
This lets userspace read the MSR_IA32_ARCH_CAPABILITIES and check that all requested features are available on the host. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-3620 CVE-2018-3646 (backported from commit cd283252) [tyhicks: Adjust for the missing MSR_F10H_DECFG and MSR_IA32_UCODE_REV feature MSRs which do not exist in 4.15] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
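A userspace sketch of the resulting interface: KVM_GET_MSRS issued on the /dev/kvm system fd (not a vCPU fd) returns feature MSR values. MSR_IA32_ARCH_CAPABILITIES is defined locally in case older headers lack it, and error handling is minimal:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    #define MSR_IA32_ARCH_CAPABILITIES 0x10a

    int main(void)
    {
            struct { struct kvm_msrs hdr; struct kvm_msr_entry e[1]; } m;
            int kvm = open("/dev/kvm", O_RDWR);

            memset(&m, 0, sizeof(m));
            m.hdr.nmsrs = 1;
            m.e[0].index = MSR_IA32_ARCH_CAPABILITIES;

            /* KVM_GET_MSRS returns the number of MSRs successfully read. */
            if (kvm >= 0 && ioctl(kvm, KVM_GET_MSRS, &m) == 1)
                    printf("ARCH_CAPABILITIES: 0x%llx\n",
                           (unsigned long long)m.e[0].data);
            return 0;
    }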
-
Wanpeng Li authored
Introduce kvm_get_msr_feature() to handle the MSRs which are supported by different vendors and share the same emulation logic. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> CVE-2018-3620 CVE-2018-3646 (cherry picked from commit 66421c1e) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
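A sketch of the dispatch helper; the ARCH_CAPABILITIES case is an assumption here (it arrives with the neighbouring backports), while vendor-specific MSRs fall through to the kvm_x86_ops callback:

    static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
    {
            switch (msr->index) {
            case MSR_IA32_ARCH_CAPABILITIES:
                    /* Shared emulation, identical for Intel and AMD. */
                    msr->data = kvm_get_arch_capabilities();
                    break;
            default:
                    /* Defer vendor-specific feature MSRs to the vendor module. */
                    if (kvm_x86_ops->get_msr_feature(msr))
                            return 1;
            }
            return 0;
    }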
-
Tom Lendacky authored
Provide a new KVM capability that allows bits within MSRs to be recognized as features. Two new ioctls are added to the /dev/kvm ioctl routine to retrieve the list of these MSRs and then retrieve their values. A kvm_x86_ops callback is used to determine support for the listed MSR-based features. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [Tweaked documentation. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> CVE-2018-3620 CVE-2018-3646 (backported from commit 801e459a) [tyhicks: Adjust context for missing mem_enc_* kvm_x86_ops] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> [smb: Further context adjustments for two hunks] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
KarimAllah Ahmed authored
commit 28c1c9fa Intel processors use the MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the contents will come directly from the hardware, but user-space can still override it. [dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional] Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jun Nakajima <jun.nakajima@intel.com> Cc: kvm@vger.kernel.org Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Link: https://lkml.kernel.org/r/1517522386-18410-4-git-send-email-karahmed@amazon.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com> CVE-2018-3620 CVE-2018-3646 (backported from commit a6005a792e24c30cb5fa8525b91af67ca0bcc1e7 bionic) [smb: Context adjustments to work around already applied spectre patches. Also introduced specific guest_cpuid_has_arch_capabilities as replacement for non-existing guest_cpuid_has function.] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Paolo Bonzini authored
Bit 3 of ARCH_CAPABILITIES tells a hypervisor that L1D flush on vmentry is not needed. Add a new value to enum vmx_l1d_flush_state, which is used either if there is no L1TF bug at all, or if bit 3 is set in ARCH_CAPABILITIES. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
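The enum as it looks with the new state appended; value names follow upstream and the exact set in this backport may vary:

    enum vmx_l1d_flush_state {
            VMENTER_L1D_FLUSH_AUTO,
            VMENTER_L1D_FLUSH_NEVER,
            VMENTER_L1D_FLUSH_COND,
            VMENTER_L1D_FLUSH_ALWAYS,
            VMENTER_L1D_FLUSH_EPT_DISABLED,
            VMENTER_L1D_FLUSH_NOT_REQUIRED, /* no L1TF bug, or ARCH_CAPABILITIES bit 3 set */
    };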
-
Paolo Bonzini authored
Three changes to the content of the sysfs file: - If EPT is disabled, L1TF cannot be exploited even across threads on the same core, and SMT is irrelevant. - If mitigation is completely disabled, and SMT is enabled, print "vulnerable" instead of "vulnerable, SMT vulnerable" - Reorder the two parts so that the main vulnerability state comes first and the detail on SMT is second. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Thomas Gleixner authored
Dave reported that it is not confirmed that Yonah processors are unaffected. Remove them from the list. Reported-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 [smb: Adjusted document path] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Nicolai Stange authored
For VMEXITs caused by external interrupts, vmx_handle_external_intr() indirectly calls into the interrupt handlers through the host's IDT. It follows that these interrupts get accounted for in the kvm_cpu_l1tf_flush_l1d per-cpu flag. The subsequently executed vmx_l1d_flush() will thus be aware that some interrupts have happened and conduct an L1d flush anyway. Setting l1tf_flush_l1d from vmx_handle_external_intr() isn't needed anymore. Drop it. Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 [smb: Minor context adjustment] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Nicolai Stange authored
The last missing piece to having vmx_l1d_flush() take interrupts after VMEXIT into account is to set the kvm_cpu_l1tf_flush_l1d per-cpu flag on irq entry. Issue calls to kvm_set_cpu_l1tf_flush_l1d() from entering_irq(), ipi_entering_ack_irq(), smp_reschedule_interrupt() and uv_bau_message_interrupt(). Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 [smb: Minor context adjustments] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
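A sketch of the hook in entering_irq(); per the changelog, the same one-liner is added to ipi_entering_ack_irq(), smp_reschedule_interrupt() and uv_bau_message_interrupt():

    static inline void entering_irq(void)
    {
            irq_enter();
            /* Any interrupt may have pulled data into L1D; note it for KVM. */
            kvm_set_cpu_l1tf_flush_l1d();
    }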
-
Wanpeng Li authored
===============================
[ INFO: suspicious RCU usage. ]
4.8.0-rc6+ #5 Not tainted
-------------------------------
./arch/x86/include/asm/msr-trace.h:47 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
no locks held by swapper/2/0.

stack backtrace:
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.8.0-rc6+ #5
Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
 0000000000000000 ffff8d1bd6003f10 ffffffff94446949 ffff8d1bd4a68000
 0000000000000001 ffff8d1bd6003f40 ffffffff940e9247 ffff8d1bbdfcf3d0
 000000000000080b 0000000000000000 0000000000000000 ffff8d1bd6003f70
Call Trace:
 <IRQ>
 [<ffffffff94446949>] dump_stack+0x99/0xd0
 [<ffffffff940e9247>] lockdep_rcu_suspicious+0xe7/0x120
 [<ffffffff9448e0d5>] do_trace_write_msr+0x135/0x140
 [<ffffffff9406e750>] native_write_msr+0x20/0x30
 [<ffffffff9406503d>] native_apic_msr_eoi_write+0x1d/0x30
 [<ffffffff9405b17e>] smp_trace_call_function_interrupt+0x1e/0x270
 [<ffffffff948cb1d6>] trace_call_function_interrupt+0x96/0xa0
 <EOI>
 [<ffffffff947200f4>] ? cpuidle_enter_state+0xe4/0x360
 [<ffffffff947200df>] ? cpuidle_enter_state+0xcf/0x360
 [<ffffffff947203a7>] cpuidle_enter+0x17/0x20
 [<ffffffff940df008>] cpu_startup_entry+0x338/0x4d0
 [<ffffffff9405bfc4>] start_secondary+0x154/0x180

This can be reproduced readily by running the ftrace test case of kselftest. Move the irq_enter() call before ack_APIC_irq(), because irq_enter() tells the RCU subsystems to end the extended quiescent state, so that the following trace call in ack_APIC_irq() works correctly. The same applies to exiting_ack_irq(), which calls ack_APIC_irq() after irq_exit(). [ tglx: Massaged changelog ] Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1474198491-3738-1-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 (cherry picked from commit b0f48706) Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Nicolai Stange authored
The next patch in this series will have to make the definition of irq_cpustat_t available to entering_irq(). Inclusion of asm/hardirq.h into asm/apic.h would cause circular header dependencies like

asm/smp.h -> asm/apic.h -> asm/hardirq.h -> linux/irq.h -> linux/topology.h -> linux/smp.h -> asm/smp.h

or

linux/gfp.h -> linux/mmzone.h -> asm/mmzone.h -> asm/mmzone_64.h -> asm/smp.h -> asm/apic.h -> asm/hardirq.h -> linux/irq.h -> linux/irqdesc.h -> linux/kobject.h -> linux/sysfs.h -> linux/kernfs.h -> linux/idr.h -> linux/gfp.h

and others. This causes compilation errors because the header guards become effective in the second inclusion: symbols/macros that had been defined before wouldn't be available to intermediate headers in the #include chain anymore. A possible workaround would be to move the definition of irq_cpustat_t into its own header and include that from both asm/hardirq.h and asm/apic.h. However, this wouldn't solve the real problem, namely asm/hardirq.h unnecessarily pulling in all the linux/irq.h cruft: nothing in asm/hardirq.h itself requires it. Also, note that some other archs, e.g. arm64, don't have that #include in their asm/hardirq.h. Remove the linux/irq.h #include from x86's asm/hardirq.h. Fix the resulting compilation errors by adding appropriate #includes to *.c files as needed. Note that some of these *.c files could be cleaned up a bit with respect to their set of #includes, but that should better be done in separate patches, if at all. Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 [smb: Heavily modified by cycles of compile-and-fix] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Nicolai Stange authored
Part of the L1TF mitigation for vmx includes flushing the L1D cache upon VMENTRY. L1D flushes are costly and two modes of operation are provided to users: "always" and the more selective "conditional" mode. If operating in the latter, the cache would get flushed only if a host side code path considered unconfined had been traversed. "Unconfined" in this context means that it might have pulled in sensitive data like user data or kernel crypto keys. The need for L1D flushes is tracked by means of the per-vcpu flag l1tf_flush_l1d. KVM exit handlers considered unconfined set it. A vmx_l1d_flush() subsequently invoked before the next VMENTER will conduct an L1d flush based on its value and reset that flag again. Currently, interrupts delivered "normally" while in root operation between VMEXIT and VMENTER are not taken into account. Part of the reason is that these don't leave any traces and thus the vmx code is unable to tell whether any such interrupt has happened. As proposed by Paolo Bonzini, prepare for tracking all interrupts by introducing a new per-cpu flag, "kvm_cpu_l1tf_flush_l1d". It is the per-cpu analogue of the per-vcpu ->l1tf_flush_l1d. A later patch will make interrupt handlers set it. For the sake of cache locality, group kvm_cpu_l1tf_flush_l1d into x86's per-cpu irq_cpustat_t as suggested by Peter Zijlstra. Provide the helpers kvm_set_cpu_l1tf_flush_l1d(), kvm_clear_cpu_l1tf_flush_l1d() and kvm_get_cpu_l1tf_flush_l1d(). Make them trivial or non-existent for !CONFIG_KVM_INTEL, as appropriate. Let vmx_l1d_flush() handle kvm_cpu_l1tf_flush_l1d in the same way as l1tf_flush_l1d. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-3620 CVE-2018-3646 Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
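A sketch of the helpers, assuming the flag sits in x86's per-cpu irq_stat as described; the !CONFIG_KVM_INTEL variants collapse to no-ops and a constant false:

    static inline void kvm_set_cpu_l1tf_flush_l1d(void)
    {
            __this_cpu_write(irq_stat.kvm_cpu_l1tf_flush_l1d, 1);
    }

    static inline void kvm_clear_cpu_l1tf_flush_l1d(void)
    {
            __this_cpu_write(irq_stat.kvm_cpu_l1tf_flush_l1d, 0);
    }

    static inline bool kvm_get_cpu_l1tf_flush_l1d(void)
    {
            return __this_cpu_read(irq_stat.kvm_cpu_l1tf_flush_l1d);
    }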
-
Nicolai Stange authored
An upcoming patch will extend KVM's L1TF mitigation in conditional mode to also cover interrupts after VMEXITs. For tracking those, stores to a new per-cpu flag from interrupt handlers will become necessary. In order to improve cache locality, this new flag will be added to x86's irq_cpustat_t. Make some space available there by shrinking the ->softirq_pending bitfield from 32 to 16 bits: the number of bits actually used is only NR_SOFTIRQS, i.e. 10. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-3620 CVE-2018-3646 Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
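A sketch of the resulting layout in arch/x86/include/asm/hardirq.h; NR_SOFTIRQS is 10, so 16 bits suffice, and the freed half-word makes room for the u8 flag added by the follow-up patch:

    typedef struct {
            u16 __softirq_pending;          /* was: unsigned int */
    #if IS_ENABLED(CONFIG_KVM_INTEL)
            u8 kvm_cpu_l1tf_flush_l1d;      /* added by the next patch */
    #endif
            unsigned int __nmi_count;       /* arch dependent */
            /* ... remaining counters unchanged ... */
    } ____cacheline_aligned irq_cpustat_t;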
-
Nicolai Stange authored
Currently, vmx_vcpu_run() checks if l1tf_flush_l1d is set and invokes vmx_l1d_flush() if so. This test is unnecessary for the "always flush L1D" mode. Move the check to vmx_l1d_flush()'s conditional-mode code path. Notes: - vmx_l1d_flush() is likely to get inlined anyway and thus, there's no extra function call. - This inverts the (static) branch prediction, but there hadn't been any explicit likely()/unlikely() annotations before and so it stays as is. Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 [smb: Some minor context adjustments in second hunk] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Nicolai Stange authored
The vmx_l1d_flush_always static key is only ever evaluated if vmx_l1d_should_flush is enabled. In that case however, there are only two L1d flushing modes possible: "always" and "conditional". The "conditional" mode's implementation tends to require more sophisticated logic than the "always" mode. Avoid inverted logic by replacing the 'vmx_l1d_flush_always' static key with a 'vmx_l1d_flush_cond' one. There is no change in functionality. Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Nicolai Stange authored
vmx_l1d_flush() gets invoked only if l1tf_flush_l1d is true. There's no point in setting l1tf_flush_l1d to true from there again. Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Josh Poimboeuf authored
If SMT is disabled in BIOS, the CPU code doesn't properly detect it. The /sys/devices/system/cpu/smt/control file shows 'on', and the 'l1tf' vulnerabilities file shows SMT as vulnerable. Fix it by forcing 'cpu_smt_control' to CPU_SMT_NOT_SUPPORTED in such a case. Unfortunately the detection can only be done after bringing all the CPUs online, so we have to overwrite any previous writes to the variable. Reported-by: Joe Mario <jmario@redhat.com> Tested-by: Jiri Kosina <jkosina@suse.cz> Fixes: f048c399 ("x86/topology: Provide topology_smt_supported()") Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> CVE-2018-3620 CVE-2018-3646 Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
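A sketch of the late check, assuming the upstream helper names; it runs only after all present CPUs were brought up, because the sibling masks that topology_smt_supported() relies on are not populated earlier:

    void __init cpu_smt_check_topology(void)
    {
            /*
             * Only now does topology_smt_supported() reflect whether
             * the BIOS left any siblings to bring up; overwrite any
             * previous write to the variable.
             */
            if (!topology_smt_supported())
                    cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
    }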
-
Tony Luck authored
Fix spelling and other typos Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 [smb: Location changed to Documentation/security/l1tf.txt] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Nicolai Stange authored
The slow path in vmx_l1d_flush() reads from vmx_l1d_flush_pages in order to evict the L1d cache. However, these pages are never cleared and, in theory, their data could be leaked. More importantly, KSM could merge a nested hypervisor's vmx_l1d_flush_pages to fewer than 1 << L1D_CACHE_ORDER host physical pages and this would break the L1d flushing algorithm: L1D on x86_64 is tagged by physical addresses. Fix this by initializing the individual vmx_l1d_flush_pages with a different pattern each. Rename the "empty_zp" asm constraint identifier in vmx_l1d_flush() to "flush_pages" to reflect this change. Fixes: a47dd5f0 ("x86/KVM/VMX: Add L1D flush algorithm") Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
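A sketch of the fixed setup; filling each of the 1 << L1D_CACHE_ORDER pages with a distinct byte keeps KSM from merging them, since merged pages would alias in the physically tagged L1D:

    struct page *page;
    int i;

    page = alloc_pages(GFP_KERNEL, L1D_CACHE_ORDER);
    if (!page)
            return -ENOMEM;
    vmx_l1d_flush_pages = page_address(page);

    /* Write a different pattern into each page so no two are identical. */
    for (i = 0; i < 1u << L1D_CACHE_ORDER; i++)
            memset(vmx_l1d_flush_pages + i * PAGE_SIZE, i + 1, PAGE_SIZE);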
-
Jiri Kosina authored
pfn_modify_allowed() and arch_has_pfn_modify_check() are outside of the !__ASSEMBLY__ section in include/asm-generic/pgtable.h, which confuses the assembler on archs that don't have __HAVE_ARCH_PFN_MODIFY_ALLOWED (e.g. ia64) and breaks the build:

include/asm-generic/pgtable.h: Assembler messages:
include/asm-generic/pgtable.h:538: Error: Unknown opcode `static inline bool pfn_modify_allowed(unsigned long pfn,pgprot_t prot)'
include/asm-generic/pgtable.h:540: Error: Unknown opcode `return true'
include/asm-generic/pgtable.h:543: Error: Unknown opcode `static inline bool arch_has_pfn_modify_check(void)'
include/asm-generic/pgtable.h:545: Error: Unknown opcode `return false'
arch/ia64/kernel/entry.S:69: Error: `mov' does not fit into bundle

Move those two static inlines into the !__ASSEMBLY__ section so that they don't confuse the asm build pass. Fixes: 42e4089c ("x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings") Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 [smb: Reviewed and accepted two fuzz hunks] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
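The shape of the fix in include/asm-generic/pgtable.h: the C fallbacks move inside the !__ASSEMBLY__ guard so assembly files including the header never see them:

    #ifndef __ASSEMBLY__
    #ifndef __HAVE_ARCH_PFN_MODIFY_ALLOWED
    static inline bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot)
    {
            return true;
    }

    static inline bool arch_has_pfn_modify_check(void)
    {
            return false;
    }
    #endif /* !__HAVE_ARCH_PFN_MODIFY_ALLOWED */
    #endif /* !__ASSEMBLY__ */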
-
Thomas Gleixner authored
Add documentation for the L1TF vulnerability and the mitigation mechanisms: - Explain the problem and risks - Document the mitigation mechanisms - Document the command line controls - Document the sysfs files Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20180713142323.287429944@linutronix.de CVE-2018-3620 CVE-2018-3646 [smb: Added document as Documentation/security/l1tf.txt instead.] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Jiri Kosina authored
Introduce the 'l1tf=' kernel command line option to allow for boot-time switching of the mitigation that is used on processors affected by L1TF. The possible values are:

full
        Provides all available mitigations for the L1TF vulnerability. Disables SMT and enables all mitigations in the hypervisors. SMT control via /sys/devices/system/cpu/smt/control is still possible after boot. Hypervisors will issue a warning when the first VM is started in a potentially insecure configuration, i.e. SMT enabled or L1D flush disabled.

full,force
        Same as 'full', but disables SMT control. Implies the 'nosmt=force' command line option. sysfs control of SMT and the hypervisor flush control is disabled.

flush
        Leaves SMT enabled and enables the conditional hypervisor mitigation. Hypervisors will issue a warning when the first VM is started in a potentially insecure configuration, i.e. SMT enabled or L1D flush disabled.

flush,nosmt
        Disables SMT and enables the conditional hypervisor mitigation. SMT control via /sys/devices/system/cpu/smt/control is still possible after boot. If SMT is re-enabled or flushing disabled at runtime, hypervisors will issue a warning.

flush,nowarn
        Same as 'flush', but hypervisors will not warn when a VM is started in a potentially insecure configuration.

off
        Disables hypervisor mitigations and doesn't emit any warnings.

Default is 'flush'. Let KVM adhere to these semantics, which means:

- 'l1tf=full,force': Perform L1D flushes. No runtime control possible.
- 'l1tf=full', 'l1tf=flush', 'l1tf=flush,nosmt': Perform L1D flushes and warn on VM start if SMT has been runtime enabled or L1D flushing has been runtime disabled.
- 'l1tf=flush,nowarn': Perform L1D flushes and emit no warnings.
- 'l1tf=off': L1D flushes are not performed and no warnings are emitted.

KVM can always override the L1D flushing behavior using its 'vmentry_l1d_flush' module parameter, except when 'l1tf=full,force' is set. This makes KVM's private 'nosmt' option redundant, and as it is a bit non-systematic anyway (this is something to control globally, not at hypervisor level), remove that option. Add the missing Documentation entry for the l1tf vulnerability sysfs file while at it. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142323.202758176@linutronix.de CVE-2018-3620 CVE-2018-3646 [smb: Minor context adjustments and adapt location of l1tf doc.] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
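A sketch of the resulting early-parameter handler; the L1TF_MITIGATION_* names follow upstream and details of the backport may differ:

    static int __init l1tf_cmdline(char *str)
    {
            if (!boot_cpu_has_bug(X86_BUG_L1TF))
                    return 0;

            if (!str)
                    return -EINVAL;

            if (!strcmp(str, "off"))
                    l1tf_mitigation = L1TF_MITIGATION_OFF;
            else if (!strcmp(str, "flush,nowarn"))
                    l1tf_mitigation = L1TF_MITIGATION_FLUSH_NOWARN;
            else if (!strcmp(str, "flush"))
                    l1tf_mitigation = L1TF_MITIGATION_FLUSH;
            else if (!strcmp(str, "flush,nosmt"))
                    l1tf_mitigation = L1TF_MITIGATION_FLUSH_NOSMT;
            else if (!strcmp(str, "full"))
                    l1tf_mitigation = L1TF_MITIGATION_FULL;
            else if (!strcmp(str, "full,force"))
                    l1tf_mitigation = L1TF_MITIGATION_FULL_FORCE;

            return 0;
    }
    early_param("l1tf", l1tf_cmdline);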
-
Thomas Gleixner authored
The CPU_SMT_NOT_SUPPORTED state is set (if the processor does not support SMT) when the sysfs SMT control file is initialized. That was fine so far as this was only required to make the output of the control file correct and to prevent writes in that case. With the upcoming l1tf command line parameter, this needs to be set up before the L1TF mitigation selection and command line parsing happens. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142323.121795971@linutronix.de CVE-2018-3620 CVE-2018-3646 Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Jiri Kosina authored
The L1TF mitigation will gain a command line parameter which allows setting a combination of hypervisor mitigation and SMT control. Expose cpu_smt_disable() so the command line parser can tweak SMT settings. [ tglx: Split out of larger patch and made it preserve an already existing force off state ] Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142323.039715135@linutronix.de CVE-2018-3620 CVE-2018-3646 Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
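A sketch of the exposed helper; per the tglx note it preserves an already existing force-off (and not-supported) state:

    void __init cpu_smt_disable(bool force)
    {
            if (cpu_smt_control == CPU_SMT_FORCE_DISABLED ||
                cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
                    return;

            if (force) {
                    pr_info("SMT: Force disabled\n");
                    cpu_smt_control = CPU_SMT_FORCE_DISABLED;
            } else {
                    cpu_smt_control = CPU_SMT_DISABLED;
            }
    }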
-
Thomas Gleixner authored
All mitigation modes can be switched at run time with a static key now: - Use sysfs_streq() instead of strcmp() to handle the trailing new line from sysfs writes correctly. - Make the static key management handle multiple invocations properly. - Set the module parameter file to RW Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142322.954525119@linutronix.de CVE-2018-3620 CVE-2018-3646 [smb: Reviewed and accepted fuzz in last hunk] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Thomas Gleixner authored
Writes to the parameter files are not serialized at the sysfs core level, so local serialization is required. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142322.873642605@linutronix.de CVE-2018-3620 CVE-2018-3646 Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
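A sketch of the serialized setter; the mutex and the parse/setup helper names are assumptions modeled on the upstream patch:

    static DEFINE_MUTEX(vmx_l1d_flush_mutex);

    static int vmentry_l1d_flush_set(const char *s, const struct kernel_param *kp)
    {
            int l1tf, ret;

            l1tf = vmentry_l1d_flush_parse(s);
            if (l1tf < 0)
                    return l1tf;

            /* sysfs does not serialize concurrent writers, do it locally. */
            mutex_lock(&vmx_l1d_flush_mutex);
            ret = vmx_setup_l1d_flush(l1tf);
            mutex_unlock(&vmx_l1d_flush_mutex);
            return ret;
    }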
-
Thomas Gleixner authored
Avoid the conditional in the L1D flush control path. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142322.790914912@linutronix.de CVE-2018-3620 CVE-2018-3646 Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Thomas Gleixner authored
In preparation for allowing run-time control of L1D flushing, move the setup code to the module parameter handler. In case of pre-module-init parsing, just store the value and let vmx_init() do the actual setup after running kvm_init() so that enable_ept has the correct state. During run time, invoke it directly from the parameter setter to prepare for run-time control. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142322.694063239@linutronix.de CVE-2018-3620 CVE-2018-3646 [smb: Accept/reviewed fuzz] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Thomas Gleixner authored
If Extended Page Tables (EPT) are disabled or not supported, no L1D flushing is required. The setup function can just avoid setting up the L1D flush for the EPT=n case. Invoke it after the hardware setup has been done and enable_ept has the correct state, and expose the EPT-disabled state in the mitigation status as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142322.612160168@linutronix.de CVE-2018-3620 CVE-2018-3646 [smb: Adjusted to work around missing hyperv support in vmx_exit()] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
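A sketch of the early-out; the function runs after hardware setup so enable_ept is final, and the new status value is reported via the l1tf sysfs file:

    static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
    {
            if (!enable_ept) {
                    /* Without EPT there is nothing to mitigate. */
                    l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_EPT_DISABLED;
                    return 0;
            }

            /* ... allocate vmx_l1d_flush_pages, flip static keys, etc. ... */
            return 0;
    }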
-
Thomas Gleixner authored
The VMX module parameter to control the L1D flush should become writeable. The MSR list is set up at VM init per guest VCPU, but the run time switching is based on a static key which is global. Toggling the MSR list at run time might be feasible, but for now drop this optimization and use the regular MSR write to make run-time switching possible. The default mitigation is the conditional flush anyway, so for extra paranoid setups this will add some small overhead, but the extra code executed is in the noise compared to the flush itself. Aside of that the EPT disabled case is not handled correctly at the moment and the MSR list magic is in the way for fixing that as well. If it's really providing a significant advantage, then this needs to be revisited after the code is correct and the control is writable. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142322.516940445@linutronix.de CVE-2018-3620 CVE-2018-3646 [smb: Minor context adjustment in one hunk. FIXME: Should be merged with the patch that adds this and possibly dropped completely.] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Thomas Gleixner authored
Store the effective mitigation of VMX in a status variable and use it to report the VMX state in the l1tf sysfs file. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142322.433098358@linutronix.de CVE-2018-3620 CVE-2018-3646 [smb: Minor context adjustment in last hunk] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Thomas Gleixner authored
Writing 'off' to /sys/devices/system/cpu/smt/control offlines all SMT siblings. Writing 'on' merely enables the ability to online them, but does not online them automatically. Make 'on' more useful by onlining all offline siblings. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 [smb: _cpu_up() only has 2 arguments] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
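A sketch of the 'on' path, adjusted for this tree's two-argument _cpu_up() as noted above; names follow the upstream patch:

    static int cpuhp_smt_enable(void)
    {
            int cpu, ret = 0;

            cpu_maps_update_begin();
            cpu_smt_control = CPU_SMT_ENABLED;
            for_each_present_cpu(cpu) {
                    /* Skip online CPUs and CPUs on offline nodes. */
                    if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
                            continue;
                    ret = _cpu_up(cpu, 0);
                    if (ret)
                            break;
            }
            cpu_maps_update_end();
            return ret;
    }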
-
Konrad Rzeszutek Wilk authored
If the L1D flush module parameter is set to 'always' and the IA32_FLUSH_CMD MSR is available, optimize the VMENTER code with the MSR save list. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 [smb: Minor context adjustments and ensure hunk #2 does not get applied into the wrong place.] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Konrad Rzeszutek Wilk authored
The IA32_FLUSH_CMD MSR needs only to be written on VMENTER. Extend add_atomic_switch_msr() with an entry_only parameter to allow storing the MSR only in the guest (ENTRY) MSR array. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Konrad Rzeszutek Wilk authored
This allows loading a different number of MSRs depending on the context: VMEXIT or VMENTER. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 [smb: Minor context adjustments] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Konrad Rzeszutek Wilk authored
.. to help find the MSR on either the guest or host MSR list. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
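A sketch of the helper, assuming the vmx_msrs structure ('nr' count plus 'val' array of MSR entries) used by the surrounding patches in this series:

    static int find_msr(struct vmx_msrs *m, unsigned int msr)
    {
            unsigned int i;

            for (i = 0; i < m->nr; ++i) {
                    if (m->val[i].index == msr)
                            return i;
            }
            /* Not on this list. */
            return -ENOENT;
    }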
-
Konrad Rzeszutek Wilk authored
There is no semantic change, but this change allows an unbalanced number of MSRs to be loaded on VMEXIT and VMENTER, i.e. the number of MSRs to save or restore on VMEXIT or VMENTER may be different. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 [smb: Drop 2 hunks modifying nested which does not exist yet] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Paolo Bonzini authored
Add the logic for flushing L1D on VMENTER. The flush depends on the static key being enabled and the new l1tf_flush_l1d flag being set. The flag is set:

- Always, if the flush module parameter is 'always'
- Conditionally at:
  - Entry to vcpu_run(), i.e. after executing user space
  - From the sched_in notifier, i.e. when switching to a vCPU thread
  - From vmexit handlers which are considered unsafe, i.e. where sensitive data can be brought into L1D:
    - The emulator, which could be a good target for other speculative execution-based threats
    - The MMU, which can bring host page tables into the L1 cache
    - External interrupts
    - Nested operations that require the MMU (see above), i.e. vmptrld, vmptrst, vmclear, vmwrite, vmread
    - When handling invept, invvpid

[ tglx: Split out from combo patch and reduced to a single flag ] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CVE-2018-3620 CVE-2018-3646 [smb: Moved change to kvm/mmu.c(kvm_handle_page_fault) into kvm/vmx.c before calling kvm_mmu_page_fault(). Left kvm/svm.c unmodified as AMD is not said to be affected.] Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
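For orientation, a sketch of the flush path as it looks once the follow-ups above (the conditional static key and the per-cpu flag) are applied; the software-eviction fallback over vmx_l1d_flush_pages is elided:

    static void vmx_l1d_flush(struct kvm_vcpu *vcpu)
    {
            if (static_branch_likely(&vmx_l1d_flush_cond)) {
                    bool flush_l1d;

                    /* Flush if either the vcpu or the cpu flag was set. */
                    flush_l1d = vcpu->arch.l1tf_flush_l1d ||
                                kvm_get_cpu_l1tf_flush_l1d();
                    vcpu->arch.l1tf_flush_l1d = false;
                    kvm_clear_cpu_l1tf_flush_l1d();

                    if (!flush_l1d)
                            return;
            }

            if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
                    /* Hardware assist: one MSR write flushes L1D. */
                    wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
                    return;
            }

            /* ... otherwise read through vmx_l1d_flush_pages to evict L1D ... */
    }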
-