1. 11 Jan, 2018 5 commits
    • David Woodhouse's avatar
      x86/spectre: Add boot time option to select Spectre v2 mitigation · da285121
      David Woodhouse authored
      Add a spectre_v2= option to select the mitigation used for the indirect
      branch speculation vulnerability.
      
      Currently, the only option available is retpoline, in its various forms.
      This will be expanded to cover the new IBRS/IBPB microcode features.
      
      The RETPOLINE_AMD feature relies on a serializing LFENCE for speculation
      control. For AMD hardware, only set RETPOLINE_AMD if LFENCE is a
      serializing instruction, which is indicated by the LFENCE_RDTSC feature.
      
      [ tglx: Folded back the LFENCE/AMD fixes and reworked it so IBRS
        	integration becomes simple ]
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: thomas.lendacky@amd.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: https://lkml.kernel.org/r/1515707194-20531-5-git-send-email-dwmw@amazon.co.uk
      da285121
    • David Woodhouse's avatar
      x86/retpoline: Add initial retpoline support · 76b04384
      David Woodhouse authored
      Enable the use of -mindirect-branch=thunk-extern in newer GCC, and provide
      the corresponding thunks. Provide assembler macros for invoking the thunks
      in the same way that GCC does, from native and inline assembler.
      
      This adds X86_FEATURE_RETPOLINE and sets it by default on all CPUs. In
      some circumstances, IBRS microcode features may be used instead, and the
      retpoline can be disabled.
      
      On AMD CPUs if lfence is serialising, the retpoline can be dramatically
      simplified to a simple "lfence; jmp *\reg". A future patch, after it has
      been verified that lfence really is serialising in all circumstances, can
      enable this by setting the X86_FEATURE_RETPOLINE_AMD feature bit in addition
      to X86_FEATURE_RETPOLINE.
      
      Do not align the retpoline in the altinstr section, because there is no
      guarantee that it stays aligned when it's copied over the oldinstr during
      alternative patching.
      
      [ Andi Kleen: Rename the macros, add CONFIG_RETPOLINE option, export thunks]
      [ tglx: Put actual function CALL/JMP in front of the macros, convert to
        	symbolic labels ]
      [ dwmw2: Convert back to numeric labels, merge objtool fixes ]
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: thomas.lendacky@amd.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: https://lkml.kernel.org/r/1515707194-20531-4-git-send-email-dwmw@amazon.co.uk
      76b04384
    • Josh Poimboeuf's avatar
      objtool: Allow alternatives to be ignored · 258c7605
      Josh Poimboeuf authored
      Getting objtool to understand retpolines is going to be a bit of a
      challenge.  For now, take advantage of the fact that retpolines are
      patched in with alternatives.  Just read the original (sane)
      non-alternative instruction, and ignore the patched-in retpoline.
      
      This allows objtool to understand the control flow *around* the
      retpoline, even if it can't yet follow what's inside.  This means the
      ORC unwinder will fail to unwind from inside a retpoline, but will work
      fine otherwise.
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: thomas.lendacky@amd.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: https://lkml.kernel.org/r/1515707194-20531-3-git-send-email-dwmw@amazon.co.uk
      258c7605
    • Josh Poimboeuf's avatar
      objtool: Detect jumps to retpoline thunks · 39b73533
      Josh Poimboeuf authored
      A direct jump to a retpoline thunk is really an indirect jump in
      disguise.  Change the objtool instruction type accordingly.
      
      Objtool needs to know where indirect branches are so it can detect
      switch statement jump tables.
      
      This fixes a bunch of warnings with CONFIG_RETPOLINE like:
      
        arch/x86/events/intel/uncore_nhmex.o: warning: objtool: nhmex_rbox_msr_enable_event()+0x44: sibling call from callable instruction with modified stack frame
        kernel/signal.o: warning: objtool: copy_siginfo_to_user()+0x91: sibling call from callable instruction with modified stack frame
        ...
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: thomas.lendacky@amd.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: https://lkml.kernel.org/r/1515707194-20531-2-git-send-email-dwmw@amazon.co.uk
      39b73533
    • Dave Hansen's avatar
      x86/pti: Make unpoison of pgd for trusted boot work for real · 445b69e3
      Dave Hansen authored
      The inital fix for trusted boot and PTI potentially misses the pgd clearing
      if pud_alloc() sets a PGD.  It probably works in *practice* because for two
      adjacent calls to map_tboot_page() that share a PGD entry, the first will
      clear NX, *then* allocate and set the PGD (without NX clear).  The second
      call will *not* allocate but will clear the NX bit.
      
      Defer the NX clearing to a point after it is known that all top-level
      allocations have occurred.  Add a comment to clarify why.
      
      [ tglx: Massaged changelog ]
      
      Fixes: 262b6b30 ("x86/tboot: Unbreak tboot with PTI enabled")
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: "Tim Chen" <tim.c.chen@linux.intel.com>
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: peterz@infradead.org
      Cc: ning.sun@intel.com
      Cc: tboot-devel@lists.sourceforge.net
      Cc: andi@firstfloor.org
      Cc: luto@kernel.org
      Cc: law@redhat.com
      Cc: pbonzini@redhat.com
      Cc: torvalds@linux-foundation.org
      Cc: gregkh@linux-foundation.org
      Cc: dwmw@amazon.co.uk
      Cc: nickc@redhat.com
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180110224939.2695CD47@viggo.jf.intel.com
      445b69e3
  2. 10 Jan, 2018 1 commit
  3. 09 Jan, 2018 3 commits
  4. 08 Jan, 2018 4 commits
    • Jike Song's avatar
      x86/mm/pti: Remove dead logic in pti_user_pagetable_walk*() · 8d56eff2
      Jike Song authored
      The following code contains dead logic:
      
       162 if (pgd_none(*pgd)) {
       163         unsigned long new_p4d_page = __get_free_page(gfp);
       164         if (!new_p4d_page)
       165                 return NULL;
       166
       167         if (pgd_none(*pgd)) {
       168                 set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
       169                 new_p4d_page = 0;
       170         }
       171         if (new_p4d_page)
       172                 free_page(new_p4d_page);
       173 }
      
      There can't be any difference between two pgd_none(*pgd) at L162 and L167,
      so it's always false at L171.
      
      Dave Hansen explained:
      
       Yes, the double-test was part of an optimization where we attempted to
       avoid using a global spinlock in the fork() path.  We would check for
       unallocated mid-level page tables without the lock.  The lock was only
       taken when we needed to *make* an entry to avoid collisions.
       
       Now that it is all single-threaded, there is no chance of a collision,
       no need for a lock, and no need for the re-check.
      
      As all these functions are only called during init, mark them __init as
      well.
      
      Fixes: 03f4424f ("x86/mm/pti: Add functions to clone kernel PMDs")
      Signed-off-by: default avatarJike Song <albcamus@gmail.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Jiri Koshina <jikos@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Kees Cook <keescook@google.com>
      Cc: Andi Lutomirski <luto@amacapital.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Greg KH <gregkh@linux-foundation.org>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Paul Turner <pjt@google.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180108160341.3461-1-albcamus@gmail.com
      8d56eff2
    • Dave Hansen's avatar
      x86/tboot: Unbreak tboot with PTI enabled · 262b6b30
      Dave Hansen authored
      This is another case similar to what EFI does: create a new set of
      page tables, map some code at a low address, and jump to it.  PTI
      mistakes this low address for userspace and mistakenly marks it
      non-executable in an effort to make it unusable for userspace.
      
      Undo the poison to allow execution.
      
      Fixes: 385ce0ea ("x86/mm/pti: Add Kconfig")
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jeff Law <law@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Cc: David" <dwmw@amazon.co.uk>
      Cc: Nick Clifton <nickc@redhat.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180108102805.GK25546@redhat.com
      262b6b30
    • Thomas Gleixner's avatar
      x86/cpu: Implement CPU vulnerabilites sysfs functions · 61dc0f55
      Thomas Gleixner authored
      Implement the CPU vulnerabilty show functions for meltdown, spectre_v1 and
      spectre_v2.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linuxfoundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Link: https://lkml.kernel.org/r/20180107214913.177414879@linutronix.de
      61dc0f55
    • Thomas Gleixner's avatar
      sysfs/cpu: Add vulnerability folder · 87590ce6
      Thomas Gleixner authored
      As the meltdown/spectre problem affects several CPU architectures, it makes
      sense to have common way to express whether a system is affected by a
      particular vulnerability or not. If affected the way to express the
      mitigation should be common as well.
      
      Create /sys/devices/system/cpu/vulnerabilities folder and files for
      meltdown, spectre_v1 and spectre_v2.
      
      Allow architectures to override the show function.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linuxfoundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Link: https://lkml.kernel.org/r/20180107214913.096657732@linutronix.de
      87590ce6
  5. 06 Jan, 2018 3 commits
  6. 05 Jan, 2018 2 commits
  7. 04 Jan, 2018 5 commits
    • Thomas Gleixner's avatar
      x86/tlb: Drop the _GPL from the cpu_tlbstate export · 1e547681
      Thomas Gleixner authored
      The recent changes for PTI touch cpu_tlbstate from various tlb_flush
      inlines. cpu_tlbstate is exported as GPL symbol, so this causes a
      regression when building out of tree drivers for certain graphics cards.
      
      Aside of that the export was wrong since it was introduced as it should
      have been EXPORT_PER_CPU_SYMBOL_GPL().
      
      Use the correct PER_CPU export and drop the _GPL to restore the previous
      state which allows users to utilize the cards they payed for.
      
      As always I'm really thrilled to make this kind of change to support the
      #friends (or however the hot hashtag of today is spelled) from that closet
      sauce graphics corp.
      
      Fixes: 1e02ce4c ("x86: Store a per-cpu shadow copy of CR4")
      Fixes: 6fd166aa ("x86/mm: Use/Fix PCID to optimize user/kernel switches")
      Reported-by: default avatarKees Cook <keescook@google.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: stable@vger.kernel.org
      1e547681
    • Peter Zijlstra's avatar
      x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers · 42f3bdc5
      Peter Zijlstra authored
      Thomas reported the following warning:
      
       BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4498
       caller is native_flush_tlb_single+0x57/0xc0
       native_flush_tlb_single+0x57/0xc0
       __set_pte_vaddr+0x2d/0x40
       set_pte_vaddr+0x2f/0x40
       cea_set_pte+0x30/0x40
       ds_update_cea.constprop.4+0x4d/0x70
       reserve_ds_buffers+0x159/0x410
       x86_reserve_hardware+0x150/0x160
       x86_pmu_event_init+0x3e/0x1f0
       perf_try_init_event+0x69/0x80
       perf_event_alloc+0x652/0x740
       SyS_perf_event_open+0x3f6/0xd60
       do_syscall_64+0x5c/0x190
      
      set_pte_vaddr is used to map the ds buffers into the cpu entry area, but
      there are two problems with that:
      
       1) The resulting flush is not supposed to be called in preemptible context
      
       2) The cpu entry area is supposed to be per CPU, but the debug store
          buffers are mapped for all CPUs so these mappings need to be flushed
          globally.
      
      Add the necessary preemption protection across the mapping code and flush
      TLBs globally.
      
      Fixes: c1961a46 ("x86/events/intel/ds: Map debug buffers in cpu_entry_area")
      Reported-by: default avatarThomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarThomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180104170712.GB3040@hirez.programming.kicks-ass.net
      42f3bdc5
    • Thomas Gleixner's avatar
      x86/kaslr: Fix the vaddr_end mess · 1dddd251
      Thomas Gleixner authored
      vaddr_end for KASLR is only documented in the KASLR code itself and is
      adjusted depending on config options. So it's not surprising that a change
      of the memory layout causes KASLR to have the wrong vaddr_end. This can map
      arbitrary stuff into other areas causing hard to understand problems.
      
      Remove the whole ifdef magic and define the start of the cpu_entry_area to
      be the end of the KASLR vaddr range.
      
      Add documentation to that effect.
      
      Fixes: 92a0f81d ("x86/cpu_entry_area: Move it out of the fixmap")
      Reported-by: default avatarBenjamin Gilbert <benjamin.gilbert@coreos.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarBenjamin Gilbert <benjamin.gilbert@coreos.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: stable <stable@vger.kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Garnier <thgarnie@google.com>,
      Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos
      1dddd251
    • Thomas Gleixner's avatar
      x86/mm: Map cpu_entry_area at the same place on 4/5 level · f2078904
      Thomas Gleixner authored
      There is no reason for 4 and 5 level pagetables to have a different
      layout. It just makes determining vaddr_end for KASLR harder than
      necessary.
      
      Fixes: 92a0f81d ("x86/cpu_entry_area: Move it out of the fixmap")
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Gilbert <benjamin.gilbert@coreos.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: stable <stable@vger.kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Garnier <thgarnie@google.com>,
      Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos
      f2078904
    • Andrey Ryabinin's avatar
      x86/mm: Set MODULES_END to 0xffffffffff000000 · f5a40711
      Andrey Ryabinin authored
      Since f06bdd40 ("x86/mm: Adapt MODULES_END based on fixmap section size")
      kasan_mem_to_shadow(MODULES_END) could be not aligned to a page boundary.
      
      So passing page unaligned address to kasan_populate_zero_shadow() have two
      possible effects:
      
      1) It may leave one page hole in supposed to be populated area. After commit
        21506525 ("x86/kasan/64: Teach KASAN about the cpu_entry_area") that
        hole happens to be in the shadow covering fixmap area and leads to crash:
      
       BUG: unable to handle kernel paging request at fffffbffffe8ee04
       RIP: 0010:check_memory_region+0x5c/0x190
      
       Call Trace:
        <NMI>
        memcpy+0x1f/0x50
        ghes_copy_tofrom_phys+0xab/0x180
        ghes_read_estatus+0xfb/0x280
        ghes_notify_nmi+0x2b2/0x410
        nmi_handle+0x115/0x2c0
        default_do_nmi+0x57/0x110
        do_nmi+0xf8/0x150
        end_repeat_nmi+0x1a/0x1e
      
      Note, the crash likely disappeared after commit 92a0f81d, which
      changed kasan_populate_zero_shadow() call the way it was before
      commit 21506525.
      
      2) Attempt to load module near MODULES_END will fail, because
         __vmalloc_node_range() called from kasan_module_alloc() will hit the
         WARN_ON(!pte_none(*pte)) in the vmap_pte_range() and bail out with error.
      
      To fix this we need to make kasan_mem_to_shadow(MODULES_END) page aligned
      which means that MODULES_END should be 8*PAGE_SIZE aligned.
      
      The whole point of commit f06bdd40 was to move MODULES_END down if
      NR_CPUS is big, so the cpu_entry_area takes a lot of space.
      But since 92a0f81d ("x86/cpu_entry_area: Move it out of the fixmap")
      the cpu_entry_area is no longer in fixmap, so we could just set
      MODULES_END to a fixed 8*PAGE_SIZE aligned address.
      
      Fixes: f06bdd40 ("x86/mm: Adapt MODULES_END based on fixmap section size")
      Reported-by: default avatarJakub Kicinski <kubakici@wp.pl>
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Link: https://lkml.kernel.org/r/20171228160620.23818-1-aryabinin@virtuozzo.com
      f5a40711
  8. 03 Jan, 2018 7 commits
  9. 31 Dec, 2017 4 commits
    • Thomas Gleixner's avatar
      x86/ldt: Make LDT pgtable free conditional · 7f414195
      Thomas Gleixner authored
      Andy prefers to be paranoid about the pagetable free in the error path of
      write_ldt(). Make it conditional and warn whenever the installment of a
      secondary LDT fails.
      Requested-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      7f414195
    • Thomas Gleixner's avatar
      x86/ldt: Plug memory leak in error path · a62d6985
      Thomas Gleixner authored
      The error path in write_ldt() tries to free 'old_ldt' instead of the newly
      allocated 'new_ldt', resulting in a memory leak. It also misses to clean up a
      half populated LDT pagetable, which is not a leak as it gets cleaned up
      when the process exits.
      
      Free both the potentially half populated LDT pagetable and the newly
      allocated LDT struct. This can be done unconditionally because once an LDT
      is mapped subsequent maps will succeed, because the PTE page is already
      populated and the two LDTs fit into that single page.
      Reported-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linuxfoundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Fixes: f55f0501 ("x86/pti: Put the LDT in its own PGD if PTI is on")
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1712311121340.1899@nanosSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      a62d6985
    • Thomas Gleixner's avatar
      x86/mm: Remove preempt_disable/enable() from __native_flush_tlb() · decab088
      Thomas Gleixner authored
      The preempt_disable/enable() pair in __native_flush_tlb() was added in
      commit:
      
        5cf0791d ("x86/mm: Disable preemption during CR3 read+write")
      
      ... to protect the UP variant of flush_tlb_mm_range().
      
      That preempt_disable/enable() pair should have been added to the UP variant
      of flush_tlb_mm_range() instead.
      
      The UP variant was removed with commit:
      
        ce4a4e56 ("x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code")
      
      ... but the preempt_disable/enable() pair stayed around.
      
      The latest change to __native_flush_tlb() in commit:
      
        6fd166aa ("x86/mm: Use/Fix PCID to optimize user/kernel switches")
      
      ... added an access to a per CPU variable outside the preempt disabled
      regions, which makes no sense at all. __native_flush_tlb() must always
      be called with at least preemption disabled.
      
      Remove the preempt_disable/enable() pair and add a WARN_ON_ONCE() to catch
      bad callers independent of the smp_processor_id() debugging.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linuxfoundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20171230211829.679325424@linutronix.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      decab088
    • Thomas Gleixner's avatar
      x86/smpboot: Remove stale TLB flush invocations · 322f8b8b
      Thomas Gleixner authored
      smpboot_setup_warm_reset_vector() and smpboot_restore_warm_reset_vector()
      invoke local_flush_tlb() for no obvious reason.
      
      Digging in history revealed that the original code in the 2.1 era added
      those because the code manipulated a swapper_pg_dir pagetable entry. The
      pagetable manipulation was removed long ago in the 2.3 timeframe, but the
      TLB flush invocations stayed around forever.
      
      Remove them along with the pointless pr_debug()s which come from the same 2.1
      change.
      Reported-by: default avatarDominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linuxfoundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20171230211829.586548655@linutronix.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      322f8b8b
  10. 23 Dec, 2017 6 commits
    • Thomas Gleixner's avatar
      x86/ldt: Make the LDT mapping RO · 9f5cb6b3
      Thomas Gleixner authored
      Now that the LDT mapping is in a known area when PAGE_TABLE_ISOLATION is
      enabled its a primary target for attacks, if a user space interface fails
      to validate a write address correctly. That can never happen, right?
      
      The SDM states:
      
          If the segment descriptors in the GDT or an LDT are placed in ROM, the
          processor can enter an indefinite loop if software or the processor
          attempts to update (write to) the ROM-based segment descriptors. To
          prevent this problem, set the accessed bits for all segment descriptors
          placed in a ROM. Also, remove operating-system or executive code that
          attempts to modify segment descriptors located in ROM.
      
      So its a valid approach to set the ACCESS bit when setting up the LDT entry
      and to map the table RO. Fixup the selftest so it can handle that new mode.
      
      Remove the manual ACCESS bit setter in set_tls_desc() as this is now
      pointless. Folded the patch from Peter Ziljstra.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      9f5cb6b3
    • Thomas Gleixner's avatar
      x86/mm/dump_pagetables: Allow dumping current pagetables · a4b51ef6
      Thomas Gleixner authored
      Add two debugfs files which allow to dump the pagetable of the current
      task.
      
      current_kernel dumps the regular page table. This is the page table which
      is normally shared between kernel and user space. If kernel page table
      isolation is enabled this is the kernel space mapping.
      
      If kernel page table isolation is enabled the second file, current_user,
      dumps the user space page table.
      
      These files allow to verify the resulting page tables for page table
      isolation, but even in the normal case its useful to be able to inspect
      user space page tables of current for debugging purposes.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: linux-mm@kvack.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      a4b51ef6
    • Thomas Gleixner's avatar
      x86/mm/dump_pagetables: Check user space page table for WX pages · b4bf4f92
      Thomas Gleixner authored
      ptdump_walk_pgd_level_checkwx() checks the kernel page table for WX pages,
      but does not check the PAGE_TABLE_ISOLATION user space page table.
      
      Restructure the code so that dmesg output is selected by an explicit
      argument and not implicit via checking the pgd argument for !NULL.
      
      Add the check for the user space page table.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: linux-mm@kvack.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b4bf4f92
    • Borislav Petkov's avatar
      x86/mm/dump_pagetables: Add page table directory to the debugfs VFS hierarchy · 75298aa1
      Borislav Petkov authored
      The upcoming support for dumping the kernel and the user space page tables
      of the current process would create more random files in the top level
      debugfs directory.
      
      Add a page table directory and move the existing file to it.
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      75298aa1
    • Dave Hansen's avatar
      x86/mm/pti: Add Kconfig · 385ce0ea
      Dave Hansen authored
      Finally allow CONFIG_PAGE_TABLE_ISOLATION to be enabled.
      
      PARAVIRT generally requires that the kernel not manage its own page tables.
      It also means that the hypervisor and kernel must agree wholeheartedly
      about what format the page tables are in and what they contain.
      PAGE_TABLE_ISOLATION, unfortunately, changes the rules and they
      can not be used together.
      
      I've seen conflicting feedback from maintainers lately about whether they
      want the Kconfig magic to go first or last in a patch series.  It's going
      last here because the partially-applied series leads to kernels that can
      not boot in a bunch of cases.  I did a run through the entire series with
      CONFIG_PAGE_TABLE_ISOLATION=y to look for build errors, though.
      
      [ tglx: Removed SMP and !PARAVIRT dependencies as they not longer exist ]
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: linux-mm@kvack.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      385ce0ea
    • Vlastimil Babka's avatar
      x86/dumpstack: Indicate in Oops whether PTI is configured and enabled · 5f26d76c
      Vlastimil Babka authored
      CONFIG_PAGE_TABLE_ISOLATION is relatively new and intrusive feature that may
      still have some corner cases which could take some time to manifest and be
      fixed. It would be useful to have Oops messages indicate whether it was
      enabled for building the kernel, and whether it was disabled during boot.
      
      Example of fully enabled:
      
      	Oops: 0001 [#1] SMP PTI
      
      Example of enabled during build, but disabled during boot:
      
      	Oops: 0001 [#1] SMP NOPTI
      
      We can decide to remove this after the feature has been tested in the field
      long enough.
      
      [ tglx: Made it use boot_cpu_has() as requested by Borislav ]
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarEduardo Valentin <eduval@amazon.com>
      Acked-by: default avatarDave Hansen <dave.hansen@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirsky <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: bpetkov@suse.de
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: jkosina@suse.cz
      Cc: keescook@google.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5f26d76c