1. 19 Jan, 2016 1 commit
  2. 12 Jan, 2016 21 commits
    • x86/vdso: Disallow vvar access to vclock IO for never-used vclocks · bd902c53
      Andy Lutomirski authored
      It makes me uncomfortable that even modern systems grant every
      process direct read access to the HPET.
      
      While fixing this for real without regressing anything is a mess
      (unmapping the HPET is tricky because we don't adequately track
      all the mappings), we can do almost as well by tracking which
      vclocks have ever been used and only allowing pages associated
      with used vclocks to be faulted in.
      
      This will cause rogue programs that try to peek at the HPET to
      get SIGBUS instead on most systems.
      
      We can't restrict faults to vclock pages that are associated
      with the currently selected vclock due to a race: a process
      could start to access the HPET for the first time and race
      against a switch away from the HPET as the current clocksource.
      We can't segfault the process trying to peek at the HPET in this
      case, even though the process isn't going to do anything useful
      with the data.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/e79d06295625c02512277737ab55085a498ac5d8.1451446564.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      bd902c53
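The "ever used" tracking described in this commit can be illustrated with a toy model. This is a Python sketch of the idea only, not the kernel code: the class, the method names, the vclock ids and the string results are all made up for illustration.

```python
# Toy model: a vclock page may be faulted in only if that vclock has
# ever been selected as the clocksource. All names are illustrative.
VCLOCK_TSC, VCLOCK_HPET, VCLOCK_PVCLOCK = 1, 2, 3


class VclockTracker:
    def __init__(self):
        self.used_mask = 0   # one bit per vclock that was ever selected
        self.current = None

    def select(self, vclock):
        """Switch clocksource; the 'used' bit is set and never cleared."""
        self.used_mask |= 1 << vclock
        self.current = vclock

    def was_used(self, vclock):
        return bool(self.used_mask & (1 << vclock))

    def fault(self, vclock):
        """Decide what a fault on this vclock's page results in."""
        return "MAP" if self.was_used(vclock) else "SIGBUS"
```

Checking "ever used" rather than "currently selected" is what avoids the race from the commit message: a process that starts reading the HPET just as the clocksource switches away still gets its page mapped.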
    • x86/vdso: Use ->fault() instead of remap_pfn_range() for the vvar mapping · a48a7042
      Andy Lutomirski authored
      This is IMO much less ugly, and it also opens the door to
      disallowing unprivileged userspace HPET access on systems with
      usable TSCs.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/c19c2909e5ee3c3d8742f916586676bb7c40345f.1451446564.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a48a7042
    • x86/vdso: Use .fault for the vDSO text mapping · 05ef76b2
      Andy Lutomirski authored
      The old scheme for mapping the vDSO text is rather complicated.
      vdso2c generates a struct vm_special_mapping and a blank .pages
      array of the correct size for each vdso image.  Init code in
      vdso/vma.c populates the .pages array for each vDSO image, and
      the mapping code selects the appropriate struct
      vm_special_mapping.
      
      With .fault, we can use a less roundabout approach: vdso_fault()
      just returns the appropriate page for the selected vDSO image.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/f886954c186bafd74e1b967c8931d852ae199aa2.1451446564.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      05ef76b2
    • x86/vdso: Track each mm's loaded vDSO image as well as its base · 352b78c6
      Andy Lutomirski authored
      As we start to do more intelligent things with the vDSO at
      runtime (as opposed to just at mm initialization time), we'll
      need to know which vDSO is in use.
      
      In principle, we could guess based on the mm type, but that's
      over-complicated and error-prone.  Instead, just track it in the
      mmu context.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/c99ac48681bad709ca7ad5ee899d9042a3af6b00.1451446564.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      352b78c6
    • mm: Add vm_insert_pfn_prot() · 1745cbc5
      Andy Lutomirski authored
      The x86 vvar vma contains pages with differing cacheability
      flags.  x86 currently implements this by manually inserting all
      the ptes using (io_)remap_pfn_range when the vma is set up.
      
      x86 wants to move to using .fault with VM_FAULT_NOPAGE to set up
      the mappings as needed.  The correct API to use to insert a pfn
      in .fault is vm_insert_pfn(), but vm_insert_pfn() can't override the
      vma's cache mode, and the HPET page in particular needs to be
      uncached despite the fact that the rest of the VMA is cached.
      
      Add vm_insert_pfn_prot() to support varying cacheability within
      the same non-COW VMA in a more sane manner.
      
      x86 could alternatively use multiple VMAs, but that's messy,
      would break CRIU, and would create unnecessary VMAs that would
      waste memory.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/d2938d1eb37be7a5e4f86182db646551f11e45aa.1451446564.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1745cbc5
    • mm: Add a vm_special_mapping.fault() method · f872f540
      Andy Lutomirski authored
      Requiring special mappings to give a list of struct pages is
      inflexible: it prevents sane use of IO memory in a special
      mapping, it's inefficient (it requires arch code to initialize a
      list of struct pages, and it requires the mm core to walk the
      entire list just to figure out how long it is), and it prevents
      arch code from doing anything fancy when a special mapping fault
      occurs.
      
      Add a .fault method as an alternative to filling in a .pages
      array.
      
      Looks-OK-to: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/a26d1677c0bc7e774c33f469451a78ca31e9e6af.1451446564.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f872f540
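The trade-off between a fixed .pages array and a .fault callback can be sketched with a small model. This is illustrative Python, not the mm API: `SpecialMapping`, `handle_fault` and the use of `None` for "no page" are hypothetical stand-ins.

```python
from typing import Callable, Optional, Sequence


class SpecialMapping:
    """Toy special mapping backed by either a page list or a callback."""

    def __init__(self,
                 name: str,
                 pages: Optional[Sequence[str]] = None,
                 fault: Optional[Callable[[int], Optional[str]]] = None):
        # Exactly one backing method must be supplied.
        if (pages is None) == (fault is None):
            raise ValueError("provide exactly one of pages or fault")
        self.name = name
        self.pages = pages
        self.fault = fault

    def handle_fault(self, pgoff: int) -> Optional[str]:
        """Return the page for this offset, or None for 'no page'."""
        if self.fault is not None:
            # The callback can compute, pick, or refuse a page at fault time.
            return self.fault(pgoff)
        if 0 <= pgoff < len(self.pages):
            return self.pages[pgoff]
        return None
```

With the callback form nothing has to be pre-populated at init time, and the handler is free to return different pages depending on runtime state, which is what the vDSO and vvar commits above rely on.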
    • x86/boot: Hide local labels in verify_cpu() · aa042141
      Borislav Petkov authored
      ... from the final ELF image's symbol table as they're not
      really needed there.
      
      Before:
      
      $ readelf -a vmlinux | grep verify_cpu
          43: ffffffff810001a9     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu
          45: ffffffff8100028f     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu_no_longmode
          46: ffffffff810001de     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu_noamd
          47: ffffffff8100022b     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu_check
          48: ffffffff8100021c     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu_clear_xd
          49: ffffffff81000263     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu_sse_test
          50: ffffffff81000296     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu_sse_ok
      
      After:
      
      $ readelf -a vmlinux | grep verify_cpu
          43: ffffffff810001a9     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu
      
      No functionality change.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1451860733-21163-1-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      aa042141
    • x86/fpu: Disable AVX when eagerfpu is off · 394db20c
      yu-cheng yu authored
      When "eagerfpu=off" is given as a command-line input, the kernel
      should disable AVX support.
      
      The Task Switched bit used for lazy context switching does not
      support AVX. If AVX is enabled without eagerfpu context
      switching, one task's AVX state could become corrupted or leak
      to other tasks. This is a bug and has bad security implications.
      
      This only affects systems that have AVX/AVX2/AVX512 and this
      issue will be found only when one actually uses AVX/AVX2/AVX512
      _AND_ does eagerfpu=off.
      
      Reference: Intel Software Developer's Manual Vol. 3A
      
      Sec. 2.5 Control Registers:
      TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the
      x87 FPU/ MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch
      to be delayed until an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4
      instruction is actually executed by the new task.
      
      Sec. 13.4.1 Using the TS Flag to Control the Saving of the X87
      FPU and SSE State
      When the TS flag is set, the processor monitors the instruction
      stream for x87 FPU, MMX, SSE instructions. When the processor
      detects one of these instructions, it raises a
      device-not-available exception (#NM) prior to executing the
      instruction.
      Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/1452119094-7252-5-git-send-email-yu-cheng.yu@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      394db20c
    • x86/fpu: Disable MPX when eagerfpu is off · a5fe93a5
      yu-cheng yu authored
      This issue is a fallout from the command-line parsing move.
      
      When "eagerfpu=off" is given as a command-line input, the kernel
      should disable MPX support. The decision for turning off MPX was
      made in fpu__init_system_ctx_switch(), which is after the
      selection of the XSAVE format. This patch fixes it by getting
      that decision done earlier in fpu__init_system_xstate().
      Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/1452119094-7252-4-git-send-email-yu-cheng.yu@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a5fe93a5
    • x86/fpu: Disable XGETBV1 when no XSAVE · eb7c5f87
      yu-cheng yu authored
      When "noxsave" is given as a command-line input, the kernel
      should disable XGETBV1. This issue currently does not cause any
      actual problems. XGETBV1 is only useful if we have something
      using the 'init optimization' (i.e. xsaveopt, xsaves). We
      already clear both of those in fpu__xstate_clear_all_cpu_caps().
      But this is good for completeness.
      Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/1452119094-7252-3-git-send-email-yu-cheng.yu@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      eb7c5f87
    • x86/fpu: Fix early FPU command-line parsing · 4f81cbaf
      yu-cheng yu authored
      The function fpu__init_system() is executed before
      parse_early_param(). This causes wrong FPU configuration. This
      patch fixes this issue by parsing boot_command_line in the
      beginning of fpu__init_system().
      
      With all four patches in this series, each parameter disables
      the following features:
      
      eagerfpu=off: eagerfpu, avx, avx2, avx512, mpx
      no387: fpu
      nofxsr: fxsr, fxsropt, xmm
      noxsave: xsave, xsaveopt, xsaves, xsavec, avx, avx2, avx512,
               mpx, xgetbv1
      noxsaveopt: xsaveopt
      noxsaves: xsaves
      Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/1452119094-7252-2-git-send-email-yu-cheng.yu@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4f81cbaf
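The parameter-to-feature table in this commit message can be turned into a small lookup to see what a given command line would disable. This is a Python sketch for illustration: the table contents come from the message above, but `DISABLES`, `disabled_features` and the whitespace-split parsing are assumptions and do not reflect how the kernel scans boot_command_line.

```python
# Feature names each boot parameter disables, as listed in the commit message.
DISABLES = {
    "eagerfpu=off": {"eagerfpu", "avx", "avx2", "avx512", "mpx"},
    "no387": {"fpu"},
    "nofxsr": {"fxsr", "fxsropt", "xmm"},
    "noxsave": {"xsave", "xsaveopt", "xsaves", "xsavec",
                "avx", "avx2", "avx512", "mpx", "xgetbv1"},
    "noxsaveopt": {"xsaveopt"},
    "noxsaves": {"xsaves"},
}


def disabled_features(cmdline: str) -> set:
    """Union of the features disabled by the parameters on a command line.

    Simplified parsing: tokens are split on whitespace and matched exactly.
    """
    disabled = set()
    for token in cmdline.split():
        disabled |= DISABLES.get(token, set())
    return disabled
```

For example, `disabled_features("ro quiet noxsaves")` yields only `xsaves`, while a command line containing `eagerfpu=off` also drops the AVX variants and MPX.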
    • x86/mm: Use PAGE_ALIGNED instead of IS_ALIGNED · b500f77b
      Kefeng Wang authored
      Use the PAGE_ALIGNED() macro from <linux/mm.h> to simplify the code.
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: <guohanjun@huawei.com>
      Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1452565170-11083-1-git-send-email-wangkefeng.wang@huawei.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b500f77b
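The simplification amounts to replacing a generic alignment test that names the page size explicitly with a page-specific helper. Below is a Python rendering of the arithmetic, assuming 4 KiB pages; the lowercase function names merely mirror the kernel macros and are not the macros themselves.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def is_aligned(x: int, a: int) -> bool:
    """True if x is a multiple of a, where a is a power of two."""
    return (x & (a - 1)) == 0


def page_aligned(addr: int) -> bool:
    """Page-specific shorthand for is_aligned(addr, PAGE_SIZE)."""
    return is_aligned(addr, PAGE_SIZE)
```

Both forms give the same answer for any address; the page-specific helper just removes the repeated `PAGE_SIZE` argument at each call site.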
    • selftests/x86: Disable the ldt_gdt_64 test for now · 0f672809
      Andy Lutomirski authored
      ldt_gdt.c relies on cross-cpu invalidation of SS to do one of
      its tests.  On 32-bit builds, this works fine, but on 64-bit
      builds, it only works if the kernel has proper SS sigcontext
      handling for 64-bit user programs.
      
      Since the SS fixes are currently reverted, restrict the test
      case to 32 bits for now.
      
      In principle, I could change the test to use a different segment
      register, but it would be messy: CS can't point to the LDT for
      64-bit code, and the other registers don't result in immediate
      faults because they aren't reloaded on kernel -> user
      transitions.
      
      When we fix sigcontext (in 4.6?), we can revert this.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/231591d9122d282402d8f53175134f8db5b3bc73.1452561752.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0f672809
    • x86/mm/pat: Make split_page_count() check for empty levels to fix /proc/meminfo output · c9e0d391
      Dave Jones authored
      In CONFIG_PAGEALLOC_DEBUG=y builds, we disable 2M pages.
      
      Unfortunately, when we split up mappings during boot,
      split_page_count() doesn't take this into account, and
      starts decrementing an empty direct_pages_count[] level.
      
      This results in /proc/meminfo showing crazy things like:
      
        DirectMap2M:    18446744073709543424 kB
      Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c9e0d391
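The huge DirectMap2M value quoted above is what an unsigned 64-bit counter looks like after being decremented below zero. The Python sketch below reproduces the arithmetic under stated assumptions: the counter is 64 bits wide, a 2M page count is converted to kB with a left shift by 11, and the counter went four below zero (inferred from 8192 kB / 2048 kB per page). The level numbering, `PTRS_PER_PTE = 512` and the `check_empty` flag are illustrative, not copied from the patch.

```python
MASK64 = (1 << 64) - 1      # emulate 64-bit unsigned wraparound
PTRS_PER_PTE = 512          # smaller pages produced by one split (assumed)
LEVEL_4K, LEVEL_2M = 1, 2   # illustrative level ids


def split_page_count(counts, level, check_empty=True):
    """Account for splitting one page at `level` into smaller pages."""
    if check_empty and counts[level] == 0:
        return  # nothing was ever counted at this level; skip
    counts[level] = (counts[level] - 1) & MASK64
    counts[level - 1] = (counts[level - 1] + PTRS_PER_PTE) & MASK64


def directmap_2m_kb(counts):
    """2M page count expressed in kB, truncated to 64 bits."""
    return (counts[LEVEL_2M] << 11) & MASK64


buggy = {LEVEL_4K: 0, LEVEL_2M: 0}
for _ in range(4):
    split_page_count(buggy, LEVEL_2M, check_empty=False)
print(directmap_2m_kb(buggy))   # 18446744073709543424
```

With `check_empty=True` the same four calls leave both counters at zero, so the reported value stays at 0 kB.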
    • Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ae8a5218
      Linus Torvalds authored
      Pull x86 platform updates from Ingo Molnar:
       "Two changes:
      
         - one to quirk-save/restore certain system MSRs across
           suspend/resume, to make certain Intel systems work better
           (Chen Yu)
      
         - and also to constify a read only structure (Julia Lawall)"
      
      * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/platform/calgary: Constify cal_chipset_ops structures
        x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume
      ae8a5218
    • Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0ffedcda
      Linus Torvalds authored
      Pull x86 mm updates from Ingo Molnar:
       "The main changes in this cycle were:
      
         - make the debugfs 'kernel_page_tables' file read-only, as it only
           has read ops.  (Borislav Petkov)
      
         - micro-optimize clflush_cache_range() (Chris Wilson)
      
         - swiotlb enhancements, which fixes certain KVM emulated devices
           (Igor Mammedov)
      
         - fix an LDT related debug message (Jan Beulich)
      
         - modularize CONFIG_X86_PTDUMP (Kees Cook)
      
         - tone down an overly alarming warning (Laura Abbott)
      
         - Mark variable __initdata (Rasmus Villemoes)
      
         - PAT additions (Toshi Kani)"
      
      * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mm: Micro-optimise clflush_cache_range()
        x86/mm/pat: Change free_memtype() to support shrinking case
        x86/mm/pat: Add untrack_pfn_moved for mremap
        x86/mm: Drop WARN from multi-BAR check
        x86/LDT: Print the real LDT base address
        x86/mm/64: Enable SWIOTLB if system has SRAT memory regions above MAX_DMA32_PFN
        x86/mm: Introduce max_possible_pfn
        x86/mm/ptdump: Make (debugfs)/kernel_page_tables read-only
        x86/mm/mtrr: Mark the 'range_new' static variable in mtrr_calc_range_state() as __initdata
        x86/mm: Turn CONFIG_X86_PTDUMP into a module
      0ffedcda
    • Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6896d9f7
      Linus Torvalds authored
      Pull x86 fpu updates from Ingo Molnar:
       "This cleans up the FPU fault handling methods to be more robust, and
        moves eligible variables to .init.data"
      
      * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/fpu: Put a few variables in .init.data
        x86/fpu: Get rid of xstate_fault()
        x86/fpu: Add an XSTATE_OP() macro
      6896d9f7
    • Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 671d5532
      Linus Torvalds authored
      Pull x86 cpu updates from Ingo Molnar:
       "The main changes in this cycle were:
      
         - Improved CPU ID handling code and related enhancements (Borislav
           Petkov)
      
         - RDRAND fix (Len Brown)"
      
      * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Replace RDRAND forced-reseed with simple sanity check
        x86/MSR: Chop off lower 32-bit value
        x86/cpu: Fix MSR value truncation issue
        x86/cpu/amd, kvm: Satisfy guest kernel reads of IC_CFG MSR
        kvm: Add accessors for guest CPU's family, model, stepping
        x86/cpu: Unify CPU family, model, stepping calculation
      671d5532
    • Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 67c707e4
      Linus Torvalds authored
      Pull x86 cleanups from Ingo Molnar:
       "The main changes in this cycle were:
      
         - code patching and cpu_has cleanups (Borislav Petkov)
      
         - paravirt cleanups (Juergen Gross)
      
         - TSC cleanup (Thomas Gleixner)
      
         - ptrace cleanup (Chen Gang)"
      
      * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        arch/x86/kernel/ptrace.c: Remove unused arg_offs_table
        x86/mm: Align macro defines
        x86/cpu: Provide a config option to disable static_cpu_has
        x86/cpufeature: Remove unused and seldomly used cpu_has_xx macros
        x86/cpufeature: Cleanup get_cpu_cap()
        x86/cpufeature: Move some of the scattered feature bits to x86_capability
        x86/paravirt: Remove paravirt ops pmd_update[_defer] and pte_update_defer
        x86/paravirt: Remove unused pv_apic_ops structure
        x86/tsc: Remove unused tsc_pre_init() hook
        x86: Remove unused function cpu_has_ht_siblings()
        x86/paravirt: Kill some unused patching functions
      67c707e4
    • Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 463eb8ac
      Linus Torvalds authored
      Pull small x86 boot update from Ingo Molnar:
       "A single update to the MAINTAINERS file"
      
      * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/tboot: Update maintainer list for Intel TXT
      463eb8ac
  3. 11 Jan, 2016 12 commits
    • Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 88cbfd07
      Linus Torvalds authored
      Pull x86 asm updates from Ingo Molnar:
       "The main changes in this cycle were:
      
         - vDSO and asm entry improvements (Andy Lutomirski)
      
         - Xen paravirt entry enhancements (Boris Ostrovsky)
      
         - asm entry labels enhancement (Borislav Petkov)
      
         - and other misc changes (Thomas Gleixner, me)"
      
      * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n
        Revert "x86/kvm: On KVM re-enable (e.g. after suspend), update clocks"
        x86/entry/64_compat: Make labels local
        x86/platform/uv: Include clocksource.h for clocksource_touch_watchdog()
        x86/vdso: Enable vdso pvclock access on all vdso variants
        x86/vdso: Remove pvclock fixmap machinery
        x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap
        x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader
        x86/kvm: On KVM re-enable (e.g. after suspend), update clocks
        x86/entry/64: Bypass enter_from_user_mode on non-context-tracking boots
        x86/asm: Add asm macros for static keys/jump labels
        x86/asm: Error out if asm/jump_label.h is included inappropriately
        context_tracking: Switch to new static_branch API
        x86/entry, x86/paravirt: Remove the unused usergs_sysret32 PV op
        x86/paravirt: Remove the unused irq_enable_sysexit pv op
        x86/xen: Avoid fast syscall path for Xen PV guests
      88cbfd07
    • Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4f19b880
      Linus Torvalds authored
      Pull x86 apic updates from Ingo Molnar:
       "The main changes in this cycle were:
      
         - introduce optimized single IPI sending methods on modern APICs
           (Linus Torvalds, Thomas Gleixner)
      
         - kexec/crash APIC handling fixes and enhancements (Hidehiro Kawai)
      
         - extend lapic vector saving/restoring to the CMCI (MCE) vector as
           well (Juergen Gross)
      
         - various fixes and enhancements (Jake Oshins, Len Brown)"
      
      * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
        x86/irq: Export functions to allow MSI domains in modules
        Documentation: Document kernel.panic_on_io_nmi sysctl
        x86/nmi: Save regs in crash dump on external NMI
        x86/apic: Introduce apic_extnmi command line parameter
        kexec: Fix race between panic() and crash_kexec()
        panic, x86: Allow CPUs to save registers even if looping in NMI context
        panic, x86: Fix re-entrance problem due to panic on NMI
        x86/apic: Fix the saving and restoring of lapic vectors during suspend/resume
        x86/smpboot: Re-enable init_udelay=0 by default on modern CPUs
        x86/smp: Remove single IPI wrapper
        x86/apic: Use default send single IPI wrapper
        x86/apic: Provide default send single IPI wrapper
        x86/apic: Implement single IPI for apic_noop
        x86/apic: Wire up single IPI for apic_numachip
        x86/apic: Wire up single IPI for x2apic_uv
        x86/apic: Implement single IPI for x2apic_phys
        x86/apic: Wire up single IPI for bigsmp_apic
        x86/apic: Remove pointless indirections from bigsmp_apic
        x86/apic: Wire up single IPI for apic_physflat
        x86/apic: Remove pointless indirections from apic_physflat
        ...
      4f19b880
    • Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · af345201
      Linus Torvalds authored
      Pull scheduler updates from Ingo Molnar:
       "The main changes in this cycle were:
      
         - tickless load average calculation enhancements (Byungchul Park)
      
         - vtime handling enhancements (Frederic Weisbecker)
      
         - scalability improvement via properly aligning a key structure field
           (Jiri Olsa)
      
         - various stop_machine() fixes (Oleg Nesterov)
      
         - sched/numa enhancement (Rik van Riel)
      
         - various fixes and improvements (Andi Kleen, Dietmar Eggemann,
           Geliang Tang, Hiroshi Shimamoto, Joonwoo Park, Peter Zijlstra,
           Waiman Long, Wanpeng Li, Yuyang Du)"
      
      * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
        sched/fair: Fix new task's load avg removed from source CPU in wake_up_new_task()
        sched/core: Move sched_entity::avg into separate cache line
        x86/fpu: Properly align size in CHECK_MEMBER_AT_END_OF() macro
        sched/deadline: Fix the earliest_dl.next logic
        sched/fair: Disable the task group load_avg update for the root_task_group
        sched/fair: Move the cache-hot 'load_avg' variable into its own cacheline
        sched/fair: Avoid redundant idle_cpu() call in update_sg_lb_stats()
        sched/core: Move the sched_to_prio[] arrays out of line
        sched/cputime: Convert vtime_seqlock to seqcount
        sched/cputime: Introduce vtime accounting check for readers
        sched/cputime: Rename vtime_accounting_enabled() to vtime_accounting_cpu_enabled()
        sched/cputime: Correctly handle task guest time on housekeepers
        sched/cputime: Clarify vtime symbols and document them
        sched/cputime: Remove extra cost in task_cputime()
        sched/fair: Make it possible to account fair load avg consistently
        sched/fair: Modify the comment about lock assumptions in migrate_task_rq_fair()
        stop_machine: Clean up the usage of the preemption counter in cpu_stopper_thread()
        stop_machine: Shift the 'done != NULL' check from cpu_stop_signal_done() to callers
        stop_machine: Kill cpu_stop_done->executed
        stop_machine: Change __stop_cpus() to rely on cpu_stop_queue_work()
        ...
      af345201
    • Linus Torvalds's avatar
      Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4bd20db2
      Linus Torvalds authored
      Pull RAS updates from Ingo Molnar:
       "Various x86 MCE fixes and small enhancements"
      
      * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mce: Make usable address checks Intel-only
        x86/mce: Add the missing memory error check on AMD
        x86/RAS: Remove mce.usable_addr
        x86/mce: Do not enter deferred errors into the generic pool twice
      4bd20db2
    • Linus Torvalds's avatar
      Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5cb52b5e
      Linus Torvalds authored
      Pull perf updates from Ingo Molnar:
       "Kernel side changes:
      
         - Intel Knights Landing support.  (Harish Chegondi)
      
         - Intel Broadwell-EP uncore PMU support.  (Kan Liang)
      
         - Core code improvements.  (Peter Zijlstra.)
      
         - Event filter, LBR and PEBS fixes.  (Stephane Eranian)
      
         - Enable cycles:pp on Intel Atom.  (Stephane Eranian)
      
         - Add cycles:ppp support for Skylake.  (Andi Kleen)
      
         - Various x86 NMI overhead optimizations.  (Andi Kleen)
      
         - Intel PT enhancements.  (Takao Indoh)
      
         - AMD cache events fix.  (Vince Weaver)
      
        Tons of tooling changes:
      
         - Show random perf tool tips in the 'perf report' bottom line
           (Namhyung Kim)
      
         - perf report now defaults to --group if the perf.data file has
           grouped events, try it with:
      
            # perf record -e '{cycles,instructions}' -a sleep 1
            [ perf record: Woken up 1 times to write data ]
            [ perf record: Captured and wrote 1.093 MB perf.data (1247 samples) ]
            # perf report
            # Samples: 1K of event 'anon group { cycles, instructions }'
            # Event count (approx.): 1955219195
            #
            #       Overhead  Command     Shared Object      Symbol
      
               2.86%   0.22%  swapper     [kernel.kallsyms]  [k] intel_idle
               1.05%   0.33%  firefox     libxul.so          [.] js::SetObjectElement
               1.05%   0.00%  kworker/0:3 [kernel.kallsyms]  [k] gen6_ring_get_seqno
               0.88%   0.17%  chrome      chrome             [.] 0x0000000000ee27ab
               0.65%   0.86%  firefox     libxul.so          [.] js::ValueToId<(js::AllowGC)1>
               0.64%   0.23%  JS Helper   libxul.so          [.] js::SplayTree<js::jit::LiveRange*, js::jit::LiveRange>::splay
               0.62%   1.27%  firefox     libxul.so          [.] js::GetIterator
               0.61%   1.74%  firefox     libxul.so          [.] js::NativeSetProperty
               0.61%   0.31%  firefox     libxul.so          [.] js::SetPropertyByDefining
      
         - Introduce the 'perf stat record/report' workflow:
      
           Generate perf.data files from 'perf stat', to tap into the
           scripting capabilities perf has instead of defining a 'perf stat'
           specific scripting support to calculate event ratios, etc.
      
           Simple example:
      
              $ perf stat record -e cycles usleep 1
      
               Performance counter stats for 'usleep 1':
      
                     1,134,996      cycles
      
                   0.000670644 seconds time elapsed
      
              $ perf stat report
      
               Performance counter stats for '/home/acme/bin/perf stat record -e cycles usleep 1':
      
                     1,134,996      cycles
      
                   0.000670644 seconds time elapsed
      
              $
      
           It generates PERF_RECORD_ userspace records to store the details:
      
              $ perf report -D | grep PERF_RECORD
              0xf0 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 27637
              0x118 [0x12]: PERF_RECORD_CPU_MAP nr: 1 cpu: 65535
              0x12a [0x40]: PERF_RECORD_STAT_CONFIG
              0x16a [0x30]: PERF_RECORD_STAT
              -1 -1 0x19a [0x40]: PERF_RECORD_MMAP -1/0: [0xffffffff81000000(0x1f000000) @ 0xffffffff81000000]: x [kernel.kallsyms]_text
              0x1da [0x18]: PERF_RECORD_STAT_ROUND
              [acme@ssdandy linux]$
      
           An effort was made so that perf.data files generated like this do
           not produce cryptic messages when processed by older tools.
      
           The 'perf script' bits need rebasing, will go up later.
      
         - Make command line options always available, even when they depend
           on some feature being enabled, warning the user about use of such
           options (Wang Nan)
      
         - Support hw breakpoint events (mem:0xAddress) in the default output
           mode in 'perf script' (Wang Nan)
      
         - Fixes and improvements for supporting annotating ARM binaries,
           support ARM call and jump instructions, more work needed to have
           arch specific stuff separated into tools/perf/arch/*/annotate/
           (Russell King)
      
         - Add initial 'perf config' command, for now just with a --list
           option to show the contents of the configuration file in use and
           a basic man page describing its format; commands for doing edits
           and detailed documentation are being reviewed and proof-read.
           (Taeung Song)
      
         - Allow BPF scriptlets to specify arguments to be fetched using DWARF
           info, using a prologue generated at compile/build time (He Kuang,
           Wang Nan)
      
         - Allow attaching BPF scriptlets to module symbols (Wang Nan)
      
         - Allow attaching BPF scriptlets to userspace code using uprobe (Wang
           Nan)
      
         - BPF programs can now specify 'perf probe' tunables via their section
           names, separating key=val pairs using semicolons (Wang Nan)
      
           Testing some of these new BPF features:
      
              Use case: get callchains when receiving SSL packets, filter them in the
                        kernel, at an arbitrary place.
      
              # cat ssl.bpf.c
              #define SEC(NAME) __attribute__((section(NAME), used))
      
              struct pt_regs;
      
              SEC("func=__inet_lookup_established hnum")
              int func(struct pt_regs *ctx, int err, unsigned short port)
              {
                      return err == 0 && port == 443;
              }
      
              char _license[] SEC("license") = "GPL";
              int  _version   SEC("version") = LINUX_VERSION_CODE;
              #
              # perf record -a -g -e ssl.bpf.c
              ^C[ perf record: Woken up 1 times to write data ]
              [ perf record: Captured and wrote 0.787 MB perf.data (3 samples) ]
              # perf script | head -30
              swapper     0 [000] 58783.268118: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
                 8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
                 896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
                 8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
                 855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
                 8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
                 8572a8 process_backlog (/lib/modules/4.3.0+/build/vmlinux)
                 856b11 net_rx_action (/lib/modules/4.3.0+/build/vmlinux)
                 2a284b __do_softirq (/lib/modules/4.3.0+/build/vmlinux)
                 2a2ba3 irq_exit (/lib/modules/4.3.0+/build/vmlinux)
                 96b7a4 do_IRQ (/lib/modules/4.3.0+/build/vmlinux)
                 969807 ret_from_intr (/lib/modules/4.3.0+/build/vmlinux)
                 2dede5 cpu_startup_entry (/lib/modules/4.3.0+/build/vmlinux)
                 95d5bc rest_init (/lib/modules/4.3.0+/build/vmlinux)
                1163ffa start_kernel ([kernel.vmlinux].init.text)
                11634d7 x86_64_start_reservations ([kernel.vmlinux].init.text)
                1163623 x86_64_start_kernel ([kernel.vmlinux].init.text)
      
              qemu-system-x86  9178 [003] 58785.792417: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
                 8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
                 896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
                 8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
                 855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
                 8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
                 856660 netif_receive_skb_internal (/lib/modules/4.3.0+/build/vmlinux)
                 8566ec netif_receive_skb_sk (/lib/modules/4.3.0+/build/vmlinux)
                   430a br_handle_frame_finish ([bridge])
                   48bc br_handle_frame ([bridge])
                 855f44 __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
                 8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
              #
      
         - Use the various 'perf probe' options to list functions and to see
           which variables can be collected at any given point. Experiment
           by first collecting without a filter, then with one, and use it
           together with 'perf trace' or 'perf top', with or without
           callchains. If it explodes, please tell us!
      
         - Introduce a new callchain mode: "folded", that will list per line
           representations of all callchains for a given histogram entry,
           facilitating 'perf report' output processing by other tools, such
           as Brendan Gregg's flamegraph tools (Namhyung Kim)
      
           E.g:
      
              # perf report | grep -v ^# | head
                 18.37%     0.00%  swapper  [kernel.kallsyms]   [k] cpu_startup_entry
                                 |
                                 ---cpu_startup_entry
                                    |
                                    |--12.07%--start_secondary
                                    |
                                     --6.30%--rest_init
                                               start_kernel
                                               x86_64_start_reservations
                                               x86_64_start_kernel
               #
      
           Becomes, in "folded" mode:
      
              # perf report -g folded | grep -v ^# | head -5
                  18.37%     0.00%  swapper [kernel.kallsyms]   [k] cpu_startup_entry
                12.07% cpu_startup_entry;start_secondary
                 6.30% cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
                  16.90%     0.00%  swapper [kernel.kallsyms]   [k] call_cpuidle
                11.23% call_cpuidle;cpu_startup_entry;start_secondary
                 5.67% call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
                  16.90%     0.00%  swapper [kernel.kallsyms]   [k] cpuidle_enter
                11.23% cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
                 5.67% cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
                  15.12%     0.00%  swapper [kernel.kallsyms]   [k] cpuidle_enter_state
               #
      
           The user can also select one of "count", "period" or "percent" as
           the first column.
      
        ... and lots of infrastructure enhancements, plus fixes and other
        changes, features I failed to list - see the shortlog and the git log
        for details"
      
      * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (271 commits)
        perf evlist: Add --trace-fields option to show trace fields
        perf record: Store data mmaps for dwarf unwind
        perf libdw: Check for mmaps also in MAP__VARIABLE tree
        perf unwind: Check for mmaps also in MAP__VARIABLE tree
        perf unwind: Use find_map function in access_dso_mem
        perf evlist: Remove perf_evlist__(enable|disable)_event functions
        perf evlist: Make perf_evlist__open() open evsels with their cpus and threads (like perf record does)
        perf report: Show random usage tip on the help line
        perf hists: Export a couple of hist functions
        perf diff: Use perf_hpp__register_sort_field interface
        perf tools: Add overhead/overhead_children keys defaults via string
        perf tools: Remove list entry from struct sort_entry
        perf tools: Include all tools/lib directory for tags/cscope/TAGS targets
        perf script: Align event name properly
        perf tools: Add missing headers in perf's MANIFEST
        perf tools: Do not show trace command if it's not compiled in
        perf report: Change default to use event group view
        perf top: Decay periods in callchains
        tools lib: Move bitmap.[ch] from tools/perf/ to tools/{lib,include}/
        tools lib: Sync tools/lib/find_bit.c with the kernel
        ...
      5cb52b5e
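
The "folded" callchain mode described in the perf pull message above emits one line per callchain, which makes the output easy to consume from other tools. As a minimal sketch, assuming the default "percent" first column and the line shape shown in the example output, a consumer could parse each chain line as follows. The struct and function names are hypothetical and are not part of perf:

```c
#include <stdlib.h>

/* One parsed line of folded output: "<percent>% frame1;frame2;...;frameN". */
struct folded_chain {
    double percent;    /* value of the first column */
    int depth;         /* number of semicolon-separated frames */
    const char *leaf;  /* last frame; points into the caller's line buffer */
};

/*
 * Parse a folded callchain line such as
 *   "  12.07% cpu_startup_entry;start_secondary"
 * Returns 0 on success and -1 for anything else, including comment lines
 * and histogram entry lines (which carry a second percentage column).
 */
int parse_folded_line(const char *line, struct folded_chain *out)
{
    char *end, *end2;
    double pct = strtod(line, &end);

    if (end == line || *end != '%')
        return -1;                      /* no leading "<number>%" */
    end++;
    while (*end == ' ' || *end == '\t')
        end++;
    if (*end == '\0' || *end == '\n')
        return -1;                      /* nothing after the percentage */

    /* A second "<number>%" means this is a histogram entry, not a chain. */
    (void)strtod(end, &end2);
    if (end2 != end && *end2 == '%')
        return -1;

    out->percent = pct;
    out->depth = 1;
    out->leaf = end;
    for (const char *p = end; *p != '\0' && *p != '\n'; p++) {
        if (*p == ';') {
            out->depth++;
            out->leaf = p + 1;
        }
    }
    return 0;
}
```

If "count" or "period" is selected as the first column instead, the leading-percentage check in this sketch would need to change accordingly.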
    • Linus Torvalds's avatar
      Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 24af98c4
      Linus Torvalds authored
      Pull locking updates from Ingo Molnar:
       "So we have a laundry list of locking subsystem changes:
      
         - continuing barrier API and code improvements
      
         - futex enhancements
      
         - atomics API improvements
      
         - pvqspinlock enhancements: in particular lock stealing and adaptive
           spinning
      
         - qspinlock micro-enhancements"
      
      * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op
        futex: Cleanup the goto confusion in requeue_pi()
        futex: Remove pointless put_pi_state calls in requeue()
        futex: Document pi_state refcounting in requeue code
        futex: Rename free_pi_state() to put_pi_state()
        futex: Drop refcount if requeue_pi() acquired the rtmutex
        locking/barriers, arch: Remove ambiguous statement in the smp_store_mb() documentation
        locking/barriers, arch: Use smp barriers in smp_store_release()
        locking/cmpxchg, arch: Remove tas() definitions
        locking/pvqspinlock: Queue node adaptive spinning
        locking/pvqspinlock: Allow limited lock stealing
        locking/pvqspinlock: Collect slowpath lock statistics
        sched/core, locking: Document Program-Order guarantees
        locking, sched: Introduce smp_cond_acquire() and use it
        locking/pvqspinlock, x86: Optimize the PV unlock code path
        locking/qspinlock: Avoid redundant read of next pointer
        locking/qspinlock: Prefetch the next node cacheline
        locking/qspinlock: Use _acquire/_release() versions of cmpxchg() & xchg()
        atomics: Add test for atomic operations with _relaxed variants
      24af98c4
    • Linus Torvalds's avatar
      Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9061cbe6
      Linus Torvalds authored
      Pull RCU updates from Ingo Molnar:
       "The changes in this cycle were:
      
         - Adding transitivity uniformly to rcu_node structure ->lock
           acquisitions.  (This is implemented by the first two commits on top
           of v4.4-rc2 due to the pervasive nature of this change.)
      
         - Documentation updates, including RCU requirements.
      
         - Expedited grace-period changes.
      
         - Miscellaneous fixes.
      
         - Linked-list fixes, courtesy of KTSAN.
      
         - Torture-test updates.
      
         - Late-breaking fix to sysrq-generated crash.
      
        One thing I should note is that these pieces of documentation are
        fairly large files:
      
          .../RCU/Design/Requirements/Requirements.html      | 2897 ++++++++++++++++++++
          .../RCU/Design/Requirements/Requirements.htmlx     | 2741 ++++++++++++++++++
      
        and are written in HTML, not the usual .txt style.  I hope they are
        fine"
      
      Paul McKenney explains the html docs:
       "For whatever it is worth, the reason for this unconventional choice
        was that attempts to do the diagrams in ASCII art failed miserably.
      
        And attempts to do ASCII art for the upcoming documentation of the
        data structures failed even more miserably"
      
      * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
        sysrq: Fix warning in sysrq generated crash.
        list: Add lockless list traversal primitives
        rcu: Make rcu_gp_init() be bool rather than int
        rcu: Move wakeup out from under rnp->lock
        rcu: Fix comment for rcu_dereference_raw_notrace
        rcu: Don't redundantly disable irqs in rcu_irq_{enter,exit}()
        rcu: Make cpu_needs_another_gp() be bool
        rcu: Eliminate unused rcu_init_one() argument
        rcu: Remove TINY_RCU bloat from pointless boot parameters
        torture: Place console.log files correctly from the get-go
        torture: Abbreviate console error dump
        rcutorture: Print symbolic name for ->gp_state
        rcutorture: Print symbolic name for rcu_torture_writer_state
        rcutorture: Remove CONFIG_RCU_USER_QS from rcutorture selftest doc
        rcutorture: Default grace period to three minutes, allow override
        rcutorture:  Dump stack when GP kthread stalls
        rcutorture: Flag nonexistent RCU GP kthread
        rcutorture: Add batch number to script printout
        Documentation/memory-barriers.txt: Fix ACCESS_ONCE thinko
        documentation: Update RCU requirements based on expedited changes
        ...
      9061cbe6
    • Linus Torvalds's avatar
      Merge branch 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · ddf1d623
      Linus Torvalds authored
      Pull vfs xattr updates from Al Viro:
       "Andreas' xattr cleanup series.
      
        It's a followup to his xattr work that went in last cycle; -0.5KLoC"
      
      * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        xattr handlers: Simplify list operation
        ocfs2: Replace list xattr handler operations
        nfs: Move call to security_inode_listsecurity into nfs_listxattr
        xfs: Change how listxattr generates synthetic attributes
        tmpfs: listxattr should include POSIX ACL xattrs
        tmpfs: Use xattr handler infrastructure
        btrfs: Use xattr handler infrastructure
        vfs: Distinguish between full xattr names and proper prefixes
        posix acls: Remove duplicate xattr name definitions
        gfs2: Remove gfs2_xattr_acl_chmod
        vfs: Remove vfs_xattr_cmp
      ddf1d623
    • Linus Torvalds's avatar
      Merge branch 'work.symlinks' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 32fb3784
      Linus Torvalds authored
      Pull vfs RCU symlink updates from Al Viro:
       "Replacement of ->follow_link/->put_link, allowing to stay in RCU mode
        even if the symlink is not an embedded one.
      
        No changes since the mailbomb on Jan 1"
      
      * 'work.symlinks' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        switch ->get_link() to delayed_call, kill ->put_link()
        kill free_page_put_link()
        teach nfs_get_link() to work in RCU mode
        teach proc_self_get_link()/proc_thread_self_get_link() to work in RCU mode
        teach shmem_get_link() to work in RCU mode
        teach page_get_link() to work in RCU mode
        replace ->follow_link() with new method that could stay in RCU mode
        don't put symlink bodies in pagecache into highmem
        namei: page_getlink() and page_follow_link_light() are the same thing
        ufs: get rid of ->setattr() for symlinks
        udf: don't duplicate page_symlink_inode_operations
        logfs: don't duplicate page_symlink_inode_operations
        switch befs long symlinks to page_symlink_operations
      32fb3784
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 19ccb28e
      Linus Torvalds authored
      Pull vfs compat_ioctl fixes from Al Viro:
       "This is basically Jann's patches from last week.  I have _not_
        included the stuff like switching i2c to ->compat_ioctl() into this
        one - those need more testing.
      
        Ideally I would like fs/compat_ioctl.c shrunk a lot, but that's a
        separate story"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        compat_ioctl: don't call do_ioctl under set_fs(KERNEL_DS)
        compat_ioctl: don't pass fd around when not needed
        compat_ioctl: don't look up the fd twice
      19ccb28e
    • H.J. Lu's avatar
      x86/boot: Double BOOT_HEAP_SIZE to 64KB · 8c31902c
      H.J. Lu authored
      When decompressing the kernel image during x86 bootup, the malloc of
      memory for ELF program headers may run out of heap space, which halts
      the system.  This patch doubles BOOT_HEAP_SIZE to 64KB.
      
      Tested with 32-bit kernel which failed to boot without this patch.
      Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8c31902c
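
The failure mode in the BOOT_HEAP_SIZE commit above can be pictured with a toy bump allocator over a fixed-size heap. This is only a sketch: the struct, the function name, and the 40KB request used in the usage note are hypothetical, and the real decompressor's allocator differs. The 32KB figure follows from the commit doubling the heap to 64KB.

```c
#include <stddef.h>

/* Hypothetical fixed-size boot heap; not the kernel's actual data structure. */
struct boot_heap {
    unsigned char *base;  /* start of the backing memory */
    size_t size;          /* total bytes available */
    size_t used;          /* bytes handed out so far */
};

/* Bump-allocate n bytes, rounded up to 8; returns NULL once the heap is full. */
void *boot_malloc(struct boot_heap *h, size_t n)
{
    size_t aligned = (n + 7) & ~(size_t)7;

    if (aligned < n || aligned > h->size - h->used)
        return NULL;  /* overflow while rounding, or not enough space left */

    void *p = h->base + h->used;
    h->used += aligned;
    return p;
}
```

With a 32KB heap, a hypothetical 40KB request returns NULL, while the same request succeeds once the heap is 64KB. A boot-time caller that cannot recover from NULL has no option but to halt.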
    • Andy Lutomirski's avatar
      x86/mm: Add barriers and document switch_mm()-vs-flush synchronization · 71b3c126
      Andy Lutomirski authored
      When switch_mm() activates a new PGD, it also sets a bit that
      tells other CPUs that the PGD is in use so that TLB flush IPIs
      will be sent.  In order for that to work correctly, the bit
      needs to be visible prior to loading the PGD and therefore
      starting to fill the local TLB.
      
      Document all the barriers that make this work correctly and add
      a couple that were missing.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Cc: stable@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      71b3c126
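
The ordering requirement in the switch_mm() commit above is an instance of the store-then-load pattern on two CPUs: each side writes one location and then reads the other, and full barriers on both sides guarantee that at least one of them observes the other's write. The following is a userspace analogy using C11 atomics, not the kernel code. All names are hypothetical, and a single word stands in for the page tables.

```c
#include <stdatomic.h>

/* Toy stand-in for an mm: one "page-table entry" plus a mask of CPUs using it. */
struct mm_sketch {
    atomic_ulong cpumask;  /* bit n set: CPU n may hold TLB entries for this mm */
    atomic_ulong pte;      /* stands in for page-table contents */
};

/* Switching side: publish "this CPU uses the mm" BEFORE reading page tables. */
unsigned long switch_in(struct mm_sketch *mm, unsigned int cpu)
{
    atomic_fetch_or_explicit(&mm->cpumask, 1UL << cpu, memory_order_relaxed);
    /* Full barrier: the mask update is ordered before the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&mm->pte, memory_order_relaxed);
}

/* Flushing side: change the page table, THEN read the mask to pick IPI targets. */
unsigned long update_and_pick_targets(struct mm_sketch *mm, unsigned long new_pte)
{
    atomic_store_explicit(&mm->pte, new_pte, memory_order_relaxed);
    /* Full barrier: the PTE store is ordered before the mask load below. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&mm->cpumask, memory_order_relaxed);
}
```

With both fences in place, either the switching CPU reads the new entry, or the flushing CPU sees the bit and knows it must send a flush. Without them, both sides could read stale values and a stale translation could survive in the TLB.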
  4. 10 Jan, 2016 1 commit
  5. 09 Jan, 2016 5 commits
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · eac6f76a
      Linus Torvalds authored
      Pull SCSI fix from James Bottomley:
       "A single fix for machines with pages > 4k (PPC mostly).
      
        There's a bug in our optimal transfer size code where we don't account
        for pages > 4k and can set the transfer size to be less than the page
        size causing nasty failures"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        sd: Reject optimal transfer length smaller than page size
      eac6f76a
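
The check named in the SCSI fix above ("Reject optimal transfer length smaller than page size") can be sketched as a single comparison. This is an illustration under stated assumptions, not the sd driver's code: the function name and parameters are hypothetical, and the optimal length is assumed to be reported in logical blocks.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a device-reported optimal transfer length is usable.
 * opt_blocks: reported optimal length in logical blocks (assumption)
 * block_size: logical block size in bytes
 * page_size:  system page size in bytes (4096, 65536, ...)
 */
bool opt_xfer_usable(uint32_t opt_blocks, uint32_t block_size, uint32_t page_size)
{
    /* Widen before multiplying so large block counts cannot overflow. */
    uint64_t opt_bytes = (uint64_t)opt_blocks * block_size;

    return opt_bytes >= page_size;
}
```

For example, a reported length of 8 blocks of 512 bytes (4096 bytes) passes with 4k pages but is rejected with 64k pages, which is the case the pull message describes.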
    • Linus Torvalds's avatar
      Merge tag 'pci-v4.4-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · c0cb1393
      Linus Torvalds authored
      Pull PCI fixlet from Bjorn Helgaas:
       "This marks the TI DRA7xx host bridge driver as broken.  Apparently it
        has never worked without some additional out-of-tree code, so I'm
        going to mark it broken now and remove it completely next cycle unless
        it's fixed"
      
      * tag 'pci-v4.4-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: dra7xx: Mark driver as broken
      c0cb1393
    • Ingo Molnar's avatar
      Merge tag 'perf-core-for-mingo' of... · 3eb9ede2
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      New features:
      
      - Allow using trace event fields as sort order keys, making 'perf evlist --trace-fields'
        show those, and then the user can select a subset and use it like:
      
          perf top -e sched:sched_switch -s prev_comm,next_comm
      
        That works as well in 'perf report' when handling files containing
        tracepoints.
      
        The default when just tracepoint events are found in a perf.data file is to
        format it like ftrace, using the libtraceevent formatters, plugins, etc (Namhyung Kim)
      
      - Add support in 'perf script' to process 'perf stat record' generated files,
        culminating in a python perf script that calculates CPI (Cycles per
        Instruction) (Jiri Olsa)
      
      - Show random perf tool tips in the 'perf report' bottom line (Namhyung Kim)
      
      - perf report now defaults to --group if the perf.data file has grouped events, try it with:
      
        # perf record -e '{cycles,instructions}' -a sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 1.093 MB perf.data (1247 samples) ]
        # perf report
        # Samples: 1K of event 'anon group { cycles, instructions }'
        # Event count (approx.): 1955219195
        #
        #       Overhead  Command     Shared Object      Symbol
      
           2.86%   0.22%  swapper     [kernel.kallsyms]  [k] intel_idle
           1.05%   0.33%  firefox     libxul.so          [.] js::SetObjectElement
           1.05%   0.00%  kworker/0:3 [kernel.kallsyms]  [k] gen6_ring_get_seqno
           0.88%   0.17%  chrome      chrome             [.] 0x0000000000ee27ab
           0.65%   0.86%  firefox     libxul.so          [.] js::ValueToId<(js::AllowGC)1>
           0.64%   0.23%  JS Helper   libxul.so          [.] js::SplayTree<js::jit::LiveRange*, js::jit::LiveRange>::splay
           0.62%   1.27%  firefox     libxul.so          [.] js::GetIterator
           0.61%   1.74%  firefox     libxul.so          [.] js::NativeSetProperty
           0.61%   0.31%  firefox     libxul.so          [.] js::SetPropertyByDefining
      
      User visible fixes:
      
      - Collect data mmaps so that the DWARF unwinder can handle usecases needing them,
        like softice (Jiri Olsa)
      
      - Decay callchains in fractal mode, fixing up cases where 'perf top -g' would
        show entries with more than 100% (Namhyung Kim)
      
      Infrastructure changes:
      
      - Sync tools/lib with the lib/ in the kernel sources for find_bit.c and
        move bitmap.[ch] from tools/perf/util/ to tools/lib/ (Arnaldo Carvalho de Melo)
      
      - No need to set attr.sample_freq in some 'perf test' entries that only
        want to deal with PERF_RECORD_ meta-events; also slightly improve the
        error output for the CQM test (Arnaldo Carvalho de Melo)
      
      - Fix python binding build, adding some missing object files now required
        due to cpumap using find_bit stuff (Arnaldo Carvalho de Melo)
      
      - tools/build improvements (Jiri Olsa)
      
      - Add more files to cscope/ctags databases (Jiri Olsa)
      
      - Do not show 'trace' in 'perf help' if it is not compiled in (Jiri Olsa)
      
      - Make perf_evlist__open() open evsels with their cpus and threads,
        like perf record does, making them consistent (Adrian Hunter)
      
      - Fix pmu snapshot initialization bug (Stephane Eranian)
      
      - Add missing headers in perf's MANIFEST (Wang Nan)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3eb9ede2
    • Michal Hocko's avatar
      vmstat: allocate vmstat_wq before it is used · 751e5f5c
      Michal Hocko authored
      kernel test robot has reported the following crash:
      
        BUG: unable to handle kernel NULL pointer dereference at 00000100
        IP: [<c1074df6>] __queue_work+0x26/0x390
        *pdpt = 0000000000000000 *pde = f000ff53f000ff53 *pde = f000ff53f000ff53
        Oops: 0000 [#1] PREEMPT PREEMPT SMP SMP
        CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 4.4.0-rc4-00139-g373ccbe5 #1
        Workqueue: events vmstat_shepherd
        task: cb684600 ti: cb7ba000 task.ti: cb7ba000
        EIP: 0060:[<c1074df6>] EFLAGS: 00010046 CPU: 0
        EIP is at __queue_work+0x26/0x390
        EAX: 00000046 EBX: cbb37800 ECX: cbb37800 EDX: 00000000
        ESI: 00000000 EDI: 00000000 EBP: cb7bbe68 ESP: cb7bbe38
         DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
        CR0: 8005003b CR2: 00000100 CR3: 01fd5000 CR4: 000006b0
        Stack:
        Call Trace:
          __queue_delayed_work+0xa1/0x160
          queue_delayed_work_on+0x36/0x60
          vmstat_shepherd+0xad/0xf0
          process_one_work+0x1aa/0x4c0
          worker_thread+0x41/0x440
          kthread+0xb0/0xd0
          ret_from_kernel_thread+0x21/0x40
      
      The reason is that start_shepherd_timer schedules the shepherd work
      item (vmstat_shepherd), which uses vmstat_wq, before setup_vmstat
      allocates that workqueue. If the rest of the initialization takes
      longer than HZ jiffies (one second), we might end up scheduling on a
      NULL vmstat_wq.  This is really unlikely but not impossible.
      
      Fixes: 373ccbe5 ("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress")
      Reported-by: kernel test robot <ying.huang@linux.intel.com>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: stable@vger.kernel.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      751e5f5c
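
The ordering problem in the vmstat commit above reduces to "work that uses a resource was scheduled before the resource existed". A toy model, with hypothetical names and no real workqueue API, makes the two orderings explicit:

```c
#include <stddef.h>

/* Toy workqueue: it only counts queued items. */
struct wq_sketch {
    int queued;
};

static struct wq_sketch wq_storage;
static struct wq_sketch *vmstat_wq_sketch;  /* NULL until "allocated" */

/* Stands in for the allocation that setup_vmstat performs. */
void alloc_vmstat_wq_sketch(void)
{
    vmstat_wq_sketch = &wq_storage;
}

/*
 * Stands in for the shepherd queueing work on the workqueue.
 * Returns 0 on success and -1 where the real kernel would have
 * dereferenced a NULL workqueue pointer.
 */
int queue_on_vmstat_wq_sketch(void)
{
    if (vmstat_wq_sketch == NULL)
        return -1;
    vmstat_wq_sketch->queued++;
    return 0;
}
```

Calling the queueing function before the allocation hits the NULL case. Per its title, the commit allocates vmstat_wq before it is used, so that case cannot be reached.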
    • Jann Horn's avatar
      compat_ioctl: don't call do_ioctl under set_fs(KERNEL_DS) · a7f61e89
      Jann Horn authored
      This replaces all code in fs/compat_ioctl.c that translated
      ioctl arguments into an in-kernel structure, then performed
      do_ioctl under set_fs(KERNEL_DS), with code that allocates
      data on the user stack and can call the VFS ioctl handler
      under USER_DS.
      
      This is done as a hardening measure because the caller
      does not know what kind of ioctl handler will be invoked,
      only that no corresponding compat_ioctl handler exists and
      what the ioctl command number is. The accidental
      invocation of an unlocked_ioctl handler that unexpectedly
      calls copy_to_user could be a severe security issue.
      Signed-off-by: Jann Horn <jann@thejh.net>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      a7f61e89
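
The hardening rationale in the compat_ioctl commit above can be illustrated with a toy model of the address-limit check that user-copy helpers rely on. Under set_fs(KERNEL_DS) the limit is raised so that kernel pointers pass the check, which is why a handler that unexpectedly calls copy_to_user on an attacker-influenced pointer becomes dangerous. The constants and the function name below are hypothetical and do not reflect real x86 values or kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits for the toy model; real values are architecture-specific. */
#define USER_DS_LIMIT   UINT64_C(0x0000800000000000)
#define KERNEL_DS_LIMIT UINT64_MAX

/* Toy access check: the range [addr, addr + len) must end at or below limit. */
bool access_ok_sketch(uint64_t addr, uint64_t len, uint64_t limit)
{
    /* Written this way so that addr + len can never wrap around. */
    return len <= limit && addr <= limit - len;
}
```

In this model a kernel-range address such as 0xffffffff81000000 is rejected under USER_DS_LIMIT but accepted under KERNEL_DS_LIMIT, so code running with the raised limit loses the protection the check normally provides.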