- 06 Nov, 2019 40 commits
-
-
Junaid Shahid authored
The page table pages corresponding to broken down large pages are zapped in FIFO order, so that the large page can potentially be recovered, if it is no longer being used for execution. This removes the performance penalty for walking deeper EPT page tables. By default, one large page will last about one hour once the guest reaches a steady state. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 [tyhicks: Backport to 4.4 - Minor context adjustments due to different members of struct kvm_mmu_page and kvm_arch and lack of per-vm debugfs functionality - kernel-parameters.txt is up one directory level] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Junaid Shahid authored
This adds a function to create a kernel thread associated with a given VM. In particular, it ensures that the worker thread inherits the priority and cgroups of the calling thread. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 [tyhicks: Backport to 4.4 - Fix up conflicts in #includes of kvm_main.c - Minor context adjustments in kvm_host.h] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Paolo Bonzini authored
With some Intel processors, putting the same virtual address in the TLB as both a 4 KiB and 2 MiB page can confuse the instruction fetch unit and cause the processor to issue a machine check. Unfortunately if EPT page tables use huge pages, it possible for a malicious guest to cause this situation. This patch adds a knob to mark huge pages as non-executable. When the nx_huge_pages parameter is enabled (and we are using EPT), all huge pages are marked as NX. If the guest attempts to execute in one of those pages, the page is broken down into 4K pages, which are then marked executable. This is not an issue for shadow paging (except nested EPT), because then the host is in control of TLB flushes and the problematic situation cannot happen. With nested EPT, again the nested guest can cause problems so we treat shadow and direct EPT the same. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 [tyhicks: Backport to 4.4 - Context adjustments due to missing mmio_cached and unsync members of struct kvm_mmu_page and missing kvm_set_mmio_spte_mask() - Call kvm_mmu_invalidate_zap_all_pages() instead of kvm_mmu_zap_all_fast() since the latter does not exist - Continue to use pfn_t in place of kvm_pfn_t - kernel-parameters.txt is up one directory level - Don't create a "nx_largepages_splitted" debugfs entry since per-VM debugfs entries are not yet supported] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Pawan Gupta authored
Some processors may incur a machine check error possibly resulting in an unrecoverable cpu hang when an instruction fetch encounters a TLB multi-hit in the instruction TLB. This can occur when the page size is changed along with either the physical address or cache type [1]. This issue affects both bare-metal x86 page tables and EPT. This can be mitigated by either eliminating the use of large pages or by using careful TLB invalidations when changing the page size in the page tables. Just like Spectre, Meltdown, L1TF and MDS, a new bit has been allocated in MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will be set on CPUs which are mitigated against this issue. [1] For example please refer to erratum SKL002 in "6th Generation Intel Processor Family Specification Update" https://www.intel.com/content/www/us/en/products/docs/processors/core/desktop-6th-gen-core-family-spec-update.html https://www.google.com/search?q=site:intel.com+SKL002 There are a lot of other affected processors outside of Skylake and that the erratum(referred above) does not fully disclose the issue and the impact, both on Skylake and across all the affected CPUs. Signed-off-by: Vineela Tummalapalli <vineela.tummalapalli@intel.com> Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 [tyhicks: Backport to 4.15 - ATOM_SILVERMONT_D is ATOM_SILVERMONT_X - ATOM_AIRMONT_NP does not yet exist - ATOM_GOLDMONT_D is ATOM_GOLDMONT_X - Hygon isn't supported to VULNWL_HYGON() does not exist] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Paolo Bonzini authored
VMX already does so if the host has SMEP, in order to support the combination of CR0.WP=1 and CR4.SMEP=1. However, it is perfectly safe to always do so, and in fact VMX already ends up running with EFER.NXE=1 on old processors that lack the "load EFER" controls, because it may help avoiding a slow MSR write. Removing all the conditionals simplifies the code. SVM does not have similar code, but it should since recent AMD processors do support SMEP. So this patch also makes the code for the two vendors more similar while fixing NPT=0, CR0.WP=1 and CR4.SMEP=1 on AMD processors. Cc: stable@vger.kernel.org Cc: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Joerg Roedel <jroedel@suse.de> CVE-2018-12207 [tyhicks: Backport to 4.15 - vmx.c is up one directory level] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Paolo Bonzini authored
These are useful in debugging shadow paging. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 (backported from commit 335e192a) [tyhicks: Backport to 4.4 - Continue to use pfn_t instead of kvm_pfn_t - Remove the use of shadow_present_mask in the kvm_mmu_set_spte trace point since we don't have commit ffb128c8 ("kvm: mmu: don't set the present bit unconditionally") - Open code is_executable_pte() in the kvm_mmu_set_spte() trace point since that function doesn't exit] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Paolo Bonzini authored
Note that in such a case it is quite likely that KVM will BUG_ON in __pte_list_remove when the VM is closed. However, there is no immediate risk of memory corruption in the host so a WARN_ON is enough and it lets you gather traces for debugging. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 (cherry picked from commit e9f2a760) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Paolo Bonzini authored
After the previous patch, the low bits of the gfn are masked in both FNAME(fetch) and __direct_map, so we do not need to clear them in transparent_hugepage_adjust. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 (backported from commit d679b326) [tyhicks: Backport to 4.4 - Continue to use a pointer to pfn_t for the type of the the pfnp parameter] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Paolo Bonzini authored
These two functions are basically doing the same thing through kvm_mmu_get_page, link_shadow_page and mmu_set_spte; yet, for historical reasons, their code looks very different. This patch tries to take the best of each and make them very similar, so that it is easy to understand changes that apply to both of them. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 (backported from commit 3fcf2d1b) [tyhicks: Backport to 4.4 - Minor context change due to mmu not being a pointer in the kvm_vcpu_arch struct - Continue to use pfn_t for the type of the pfn parameter of __direct_map()] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Junaid Shahid authored
Release the page at the call-site where it was originally acquired. This makes the exit code cleaner for most call sites, since they do not need to duplicate code between success and the failure label. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 (backportedfrom commit 43fdcda9) [tyhicks: Backport to 4.4 - Adjust for differences in call sites of __direct_map() in mmu.c] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Junaid Shahid authored
It doesn't seem as if there is any particular need for kvm_lock to be a spinlock, so convert the lock to a mutex so that sleepable functions (in particular cond_resched()) can be called while holding it. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 (backported from commit 0d9ce162) [tyhicks: Backport to 4.4 - kvm_hyperv_tsc_notifier() does not exist - Adjust for surrounding code changes kvm-s390.c and kvm_main.c] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Peter Xu authored
It's never used. Drop it. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 (backported from commit 42522d08) [tyhicks: Backport to 4.4 - Considerable context differences due to code changes but nothing too complex] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Paolo Bonzini authored
The x86 MMU if full of code that returns 0 and 1 for retry/emulate. Use the existing RET_MMIO_PF_RETRY/RET_MMIO_PF_EMULATE enum, renaming it to drop the MMIO part. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 (backported from commit 9b8ebbdb) [tyhicks: Backport to 4.4 - Some conditionals that checked for tracked pages did not exist due to the lack of commit 3d0c27ad ("KVM: MMU: let page fault handler be aware tracked page") - lower_32_bits(error_code) is not used in kvm_mmu_page_fault() due to the lack of commit 14727754 ("kvm: svm: Add support for additional SVM NPF error codes")] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Paolo Bonzini authored
Calling handle_mmio_page_fault() has been unnecessary since commit e9ee956e ("KVM: x86: MMU: Move handle_mmio_page_fault() call to kvm_mmu_page_fault()", 2016-02-22). handle_mmio_page_fault() can now be made static. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> CVE-2018-12207 (backported from commit e08d26f0) [tyhicks: Backport to 4.4 - Minor context change in handle_ept_misconfig() due to missing commit db1c056c ("kvm: vmx: Use the hardware provided GPA instead of page walk")] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Takuya Yoshikawa authored
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 (cherry picked from commit bb11c6c9) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Takuya Yoshikawa authored
Every time kvm_mmu_get_page() is called with a non-NULL parent_pte argument, link_shadow_page() follows that to set the parent entry so that the new mapping will point to the returned page table. Moving parent_pte handling there allows to clean up the code because parent_pte is passed to kvm_mmu_get_page() just for mark_unsync() and mmu_page_add_parent_pte(). In addition, the patch avoids calling mark_unsync() for other parents in the sp->parent_ptes chain than the newly added parent_pte, because they have been there since before the current page fault handling started. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 (cherry picked from commit 98bba238) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Paolo Bonzini authored
Commit 7a1638ce ("nEPT: Redefine EPT-specific link_shadow_page()", 2013-08-05) says: Since nEPT doesn't support A/D bit, we should not set those bit when building the shadow page table. but this is not necessary. Even though nEPT doesn't support A/D bits, and hence the vmcs12 EPT pointer will never enable them, we always use them for shadow page tables if available (see construct_eptp in vmx.c). So we can set the A/D bits freely in the shadow page table. This patch hence basically reverts commit 7a1638ce. Cc: Yang Zhang <yang.z.zhang@Intel.com> Cc: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 (cherry picked from commit 0e3d0648) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Takuya Yoshikawa authored
Make kvm_mmu_alloc_page() do just what its name tells to do, and remove the extra allocation error check and zero-initialization of parent_ptes: shadow page headers allocated by kmem_cache_zalloc() are always in the per-VCPU pools. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 (cherry picked from commit 47005792) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Takuya Yoshikawa authored
mmu_set_spte()'s code is based on the assumption that the emulate parameter has a valid pointer value if set_spte() returns true and write_fault is not zero. In other cases, emulate may be NULL, so a NULL-check is needed. Stop passing emulate pointer and make mmu_set_spte() return the emulate value instead to clean up this complex interface. Prefetch functions can just throw away the return value. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 (cherry picked from commit 029499b4) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Xiao Guangrong authored
Abstract the common operations from account_shadowed() and unaccount_shadowed(), then introduce kvm_mmu_gfn_disallow_lpage() and kvm_mmu_gfn_allow_lpage() These two functions will be used by page tracking in the later patch Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 (cherry picked from commit 547ffaed) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Xiao Guangrong authored
kvm_lpage_info->write_count is used to detect if the large page mapping for the gfn on the specified level is allowed, rename it to disallow_lpage to reflect its purpose, also we rename has_wrprotected_page() to mmu_gfn_lpage_is_disallowed() to make the code more clearer Later we will extend this mechanism for page tracking: if the gfn is tracked then large mapping for that gfn on any level is not allowed. The new name is more straightforward Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 (cherry picked from commit 92f94f1e) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Takuya Yoshikawa authored
Rather than placing a handle_mmio_page_fault() call in each vcpu->arch.mmu.page_fault() handler, moving it up to kvm_mmu_page_fault() makes the code better: - avoids code duplication - for kvm_arch_async_page_ready(), which is the other caller of vcpu->arch.mmu.page_fault(), removes an extra error_code check - avoids returning both RET_MMIO_PF_* values and raw integer values from vcpu->arch.mmu.page_fault() Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 (cherry picked from commit e9ee956e) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Takuya Yoshikawa authored
These two have only slight differences: - whether 'addr' is of type u64 or of type gva_t - whether they have 'direct' parameter or not Concerning the former, quickly_check_mmio_pf()'s u64 is better because 'addr' needs to be able to have both a guest physical address and a guest virtual address. The latter is just a stylistic issue as we can always calculate the mode from the 'vcpu' as is_mmio_page_fault() does. This patch keeps the parameter to make the following patch cleaner. In addition, the patch renames the function to mmio_info_in_cache() to make it clear what it actually checks for. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 (cherry picked from commit ded58749) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Takuya Yoshikawa authored
New struct kvm_rmap_head makes the code type-safe to some extent. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2018-12207 (cherry picked from commit 018aabb5) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Tyler Hicks authored
Turn on CONFIG_X86_INTEL_TSX_MODE_OFF to disable Intel's Transactional Synchronization Extensions (TSX) feature by default. TSX can only be disable on certain, newer processors that support the IA32_TSX_CTRL MSR via a microcode update. Intel says that future processors will also support the MSR. On processors that support the MSR, TSX will be disabled unless the system administrator overrides the configuration with the "tsx" kernel command line option. CVE-2019-11135 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Tyler Hicks authored
The linux-4.14.y backport of commit 286836a7 ("x86/cpu: Add a helper function x86_read_arch_cap_msr()") added a dependency on cpu.h from bugs.c so include the header file from bugs.c. CVE-2019-11135 Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Tyler Hicks authored
The linux-4.14.y backport of upstream commit 95c5824f ("x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default") incorrectly dropped the call to tsx_init(). Add the function call back to identify_boot_cpu() CVE-2019-11135 Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Michal Hocko authored
commit db616173 upstream There is a general consensus that TSX usage is not largely spread while the history shows there is a non trivial space for side channel attacks possible. Therefore the tsx is disabled by default even on platforms that might have a safe implementation of TSX according to the current knowledge. This is a fair trade off to make. There are, however, workloads that really do benefit from using TSX and updating to a newer kernel with TSX disabled might introduce a noticeable regressions. This would be especially a problem for Linux distributions which will provide TAA mitigations. Introduce config options X86_INTEL_TSX_MODE_OFF, X86_INTEL_TSX_MODE_ON and X86_INTEL_TSX_MODE_AUTO to control the TSX feature. The config setting can be overridden by the tsx cmdline options. [ bp: Text cleanups from Josh. ] Suggested-by: Borislav Petkov <bpetkov@suse.de> Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> CVE-2019-11135 [tyhicks: Backport to 4.4 - Minor context adjustment in arch/x86/Kconfig due to different surrounding Kconfig options] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Pawan Gupta authored
commit a7a248c5 upstream Add the documenation for TSX Async Abort. Include the description of the issue, how to check the mitigation state, control the mitigation, guidance for system administrators. [ bp: Add proper SPDX tags, touch ups by Josh and me. ] Co-developed-by: Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mark Gross <mgross@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> CVE-2019-11135 [tyhicks: Backport to 4.4 - kernel-parameters.txt is up one directory level] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Pawan Gupta authored
commit 7531a359 upstream Platforms which are not affected by X86_BUG_TAA may want the TSX feature enabled. Add "auto" option to the TSX cmdline parameter. When tsx=auto disable TSX when X86_BUG_TAA is present, otherwise enable TSX. More details on X86_BUG_TAA can be found here: https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html [ bp: Extend the arg buffer to accommodate "auto\0". ] Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> CVE-2019-11135 [tyhicks: Backport to 4.4 - kernel-parameters.txt is up one directory level] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Pawan Gupta authored
commit e1d38b63 upstream Export the IA32_ARCH_CAPABILITIES MSR bit MDS_NO=0 to guests on TSX Async Abort(TAA) affected hosts that have TSX enabled and updated microcode. This is required so that the guests don't complain, "Vulnerable: Clear CPU buffers attempted, no microcode" when the host has the updated microcode to clear CPU buffers. Microcode update also adds support for MSR_IA32_TSX_CTRL which is enumerated by the ARCH_CAP_TSX_CTRL bit in IA32_ARCH_CAPABILITIES MSR. Guests can't do this check themselves when the ARCH_CAP_TSX_CTRL bit is not exported to the guests. In this case export MDS_NO=0 to the guests. When guests have CPUID.MD_CLEAR=1, they deploy MDS mitigation which also mitigates TAA. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> CVE-2019-11135 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Pawan Gupta authored
commit 6608b45a upstream Add the sysfs reporting file for TSX Async Abort. It exposes the vulnerability and the mitigation state similar to the existing files for the other hardware vulnerabilities. Sysfs file path is: /sys/devices/system/cpu/vulnerabilities/tsx_async_abort Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com> Reviewed-by: Mark Gross <mgross@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> CVE-2019-11135 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Pawan Gupta authored
commit 1b42f017 upstream TSX Async Abort (TAA) is a side channel vulnerability to the internal buffers in some Intel processors similar to Microachitectural Data Sampling (MDS). In this case, certain loads may speculatively pass invalid data to dependent operations when an asynchronous abort condition is pending in a TSX transaction. This includes loads with no fault or assist condition. Such loads may speculatively expose stale data from the uarch data structures as in MDS. Scope of exposure is within the same-thread and cross-thread. This issue affects all current processors that support TSX, but do not have ARCH_CAP_TAA_NO (bit 8) set in MSR_IA32_ARCH_CAPABILITIES. On CPUs which have their IA32_ARCH_CAPABILITIES MSR bit MDS_NO=0, CPUID.MD_CLEAR=1 and the MDS mitigation is clearing the CPU buffers using VERW or L1D_FLUSH, there is no additional mitigation needed for TAA. On affected CPUs with MDS_NO=1 this issue can be mitigated by disabling the Transactional Synchronization Extensions (TSX) feature. A new MSR IA32_TSX_CTRL in future and current processors after a microcode update can be used to control the TSX feature. There are two bits in that MSR: * TSX_CTRL_RTM_DISABLE disables the TSX sub-feature Restricted Transactional Memory (RTM). * TSX_CTRL_CPUID_CLEAR clears the RTM enumeration in CPUID. The other TSX sub-feature, Hardware Lock Elision (HLE), is unconditionally disabled with updated microcode but still enumerated as present by CPUID(EAX=7).EBX{bit4}. The second mitigation approach is similar to MDS which is clearing the affected CPU buffers on return to user space and when entering a guest. Relevant microcode update is required for the mitigation to work. More details on this approach can be found here: https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html The TSX feature can be controlled by the "tsx" command line parameter. If it is force-enabled then "Clear CPU buffers" (MDS mitigation) is deployed. The effective mitigation state can be read from sysfs. [ bp: - massage + comments cleanup - s/TAA_MITIGATION_TSX_DISABLE/TAA_MITIGATION_TSX_DISABLED/g - Josh. - remove partial TAA mitigation in update_mds_branch_idle() - Josh. - s/tsx_async_abort_cmdline/tsx_async_abort_parse_cmdline/g ] Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> CVE-2019-11135 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Pawan Gupta authored
commit 95c5824f upstream Add a kernel cmdline parameter "tsx" to control the Transactional Synchronization Extensions (TSX) feature. On CPUs that support TSX control, use "tsx=on|off" to enable or disable TSX. Not specifying this option is equivalent to "tsx=off". This is because on certain processors TSX may be used as a part of a speculative side channel attack. Carve out the TSX controlling functionality into a separate compilation unit because TSX is a CPU feature while the TSX async abort control machinery will go to cpu/bugs.c. [ bp: - Massage, shorten and clear the arg buffer. - Clarifications of the tsx= possible options - Josh. - Expand on TSX_CTRL availability - Pawan. ] Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> CVE-2019-11135 [tyhicks: Backport to 4.4 - kernel-parameters.txt is up one directory level - Minor context adjustment in init_intel() because init_intel_misc_features() is not called] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Pawan Gupta authored
commit 286836a7 upstream Add a helper function to read the IA32_ARCH_CAPABILITIES MSR. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com> Reviewed-by: Mark Gross <mgross@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> CVE-2019-11135 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Pawan Gupta authored
commit c2955f27 upstream Transactional Synchronization Extensions (TSX) may be used on certain processors as part of a speculative side channel attack. A microcode update for existing processors that are vulnerable to this attack will add a new MSR - IA32_TSX_CTRL to allow the system administrator the option to disable TSX as one of the possible mitigations. The CPUs which get this new MSR after a microcode upgrade are the ones which do not set MSR_IA32_ARCH_CAPABILITIES.MDS_NO (bit 5) because those CPUs have CPUID.MD_CLEAR, i.e., the VERW implementation which clears all CPU buffers takes care of the TAA case as well. [ Note that future processors that are not vulnerable will also support the IA32_TSX_CTRL MSR. ] Add defines for the new IA32_TSX_CTRL MSR and its bits. TSX has two sub-features: 1. Restricted Transactional Memory (RTM) is an explicitly-used feature where new instructions begin and end TSX transactions. 2. Hardware Lock Elision (HLE) is implicitly used when certain kinds of "old" style locks are used by software. Bit 7 of the IA32_ARCH_CAPABILITIES indicates the presence of the IA32_TSX_CTRL MSR. There are two control bits in IA32_TSX_CTRL MSR: Bit 0: When set, it disables the Restricted Transactional Memory (RTM) sub-feature of TSX (will force all transactions to abort on the XBEGIN instruction). Bit 1: When set, it disables the enumeration of the RTM and HLE feature (i.e. it will make CPUID(EAX=7).EBX{bit4} and CPUID(EAX=7).EBX{bit11} read as 0). The other TSX sub-feature, Hardware Lock Elision (HLE), is unconditionally disabled by the new microcode but still enumerated as present by CPUID(EAX=7).EBX{bit4}, unless disabled by IA32_TSX_CTRL_MSR[1] - TSX_CTRL_CPUID_CLEAR. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com> Reviewed-by: Mark Gross <mgross@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> CVE-2019-11135 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Paolo Bonzini authored
Similar to AMD bits, set the Intel bits from the vendor-independent feature and bug flags, because KVM_GET_SUPPORTED_CPUID does not care about the vendor and they should be set on AMD processors as well. Suggested-by: Jim Mattson <jmattson@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2019-11135 (backported from commit 0c54914d) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Sean Christopherson authored
The CPUID flag ARCH_CAPABILITIES is unconditioinally exposed to host userspace for all x86 hosts, i.e. KVM advertises ARCH_CAPABILITIES regardless of hardware support under the pretense that KVM fully emulates MSR_IA32_ARCH_CAPABILITIES. Unfortunately, only VMX hosts handle accesses to MSR_IA32_ARCH_CAPABILITIES (despite KVM_GET_MSRS also reporting MSR_IA32_ARCH_CAPABILITIES for all hosts). Move the MSR_IA32_ARCH_CAPABILITIES handling to common x86 code so that it's emulated on AMD hosts. Fixes: 1eaafe91 ("kvm: x86: IA32_ARCH_CAPABILITIES is always supported") Cc: stable@vger.kernel.org Reported-by: Xiaoyao Li <xiaoyao.li@linux.intel.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> CVE-2019-11135 (backported from commit 0cf9135b) [tyhicks: Backport to 4.4 - vmx.c and vmx.h are up one directory level - Minor context adjustments in x86.c due to different surrounding MSR case statements and stack variable differences in kvm_arch_vcpu_setup() - Call guest_cpuid_has_arch_capabilities() instead of the non-existent guest_cpuid_has()] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Imre Deak authored
In some circumstances the RC6 context can get corrupted. We can detect this and take the required action, that is disable RC6 and runtime PM. The HW recovers from the corrupted state after a system suspend/resume cycle, so detect the recovery and re-enable RC6 and runtime PM. v2: rebase (Mika) v3: - Move intel_suspend_gt_powersave() to the end of the GEM suspend sequence. - Add commit message. v4: - Rebased on intel_uncore_forcewake_put(i915->uncore, ...) API change. v5: - Rebased on latest upstream gt_pm refactoring. Signed-off-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> CVE-2019-0154 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-
Imre Deak authored
In some circumstances the RC6 context can get corrupted. We can detect this and take the required action, that is disable RC6 and runtime PM. The HW recovers from the corrupted state after a system suspend/resume cycle, so detect the recovery and re-enable RC6 and runtime PM. v2: rebase (Mika) v3: - Move intel_suspend_gt_powersave() to the end of the GEM suspend sequence. - Add commit message. v4: - Rebased on intel_uncore_forcewake_put(i915->uncore, ...) API change. v5: - Rebased on latest upstream gt_pm refactoring. Signed-off-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> CVE-2019-0154 [tjaalton: backport to i915_bpo - Use type safe register definitions. - Modify NEEDS_WaRsDisableCoarsePowerGating to match all of GEN9] Signed-off-by: Timo Aaltonen <timo.aaltonen@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
-