1. 23 Jun, 2019 4 commits
    • x86/umwait: Add sysfs interface to control umwait maximum time · bd9a0c97
      Fenghua Yu authored
      IA32_UMWAIT_CONTROL[31:2] determines the maximum time in TSC-quanta
      that the processor can stay in C0.1 or C0.2. A zero value means no
      maximum time.
      
      Each instruction sets its own deadline in the instruction's implicit
      input EDX:EAX value. The instruction wakes up if the time-stamp counter
      reaches or exceeds the specified deadline, or the umwait maximum time
      expires, or a store happens in the monitored address range in umwait.
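
      The wakeup rule above can be sketched as a small model. This is only an
      illustration under simplifying assumptions: the function name and its
      parameters are hypothetical, the maximum time is counted from the TSC
      value at which the wait starts, and stores to the monitored range are
      not modeled.

```python
def wake_tsc(start_tsc, deadline_tsc, max_time):
    """Return the TSC value at which a timed wait ends in this simplified model.

    start_tsc    -- TSC value when the wait instruction starts waiting
    deadline_tsc -- per-instruction deadline (the EDX:EAX input)
    max_time     -- OS-imposed maximum wait in TSC quanta, 0 means no limit
    """
    if deadline_tsc <= start_tsc:
        # The deadline has already passed, so there is nothing to wait for.
        return start_tsc
    if max_time == 0:
        # No OS limit: only the instruction's own deadline applies.
        return deadline_tsc
    # Whichever limit is reached first ends the wait.
    return min(deadline_tsc, start_tsc + max_time)
```

      With a start of 1000, a deadline of 500000 and a maximum time of 100000,
      the model ends the wait at 101000 because the maximum time expires first.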
      
      The administrator can write an unsigned 32-bit number to
      /sys/devices/system/cpu/umwait_control/max_time to change the default
      value. Note that a value of zero means there is no limit. The lower two
      bits of the value must be zero.
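
      A minimal user-space sketch of the constraints described above. The
      sysfs path is the one named in this changelog; the helper names are
      hypothetical, and the checks only mirror the stated rules (unsigned
      32-bit value, lower two bits zero), not the kernel's actual code.

```python
MAX_TIME_PATH = "/sys/devices/system/cpu/umwait_control/max_time"


def validate_max_time(value):
    """Return value unchanged if it is acceptable as a umwait maximum time."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("max_time must fit in an unsigned 32-bit number")
    if value & 0x3:
        raise ValueError("the lower two bits of max_time must be zero")
    return value


def write_max_time(value, path=MAX_TIME_PATH):
    """Write a validated value to the sysfs file (requires root and WAITPKG)."""
    with open(path, "w") as f:
        f.write(f"{validate_max_time(value)}\n")
```

      For example, validate_max_time(100000) is accepted, while 100001 is
      rejected because bit 0 is set. A value of 0 is accepted and means no limit.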
      
      [ tglx: Simplify the write function. Massage changelog ]
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ashok Raj <ashok.raj@intel.com>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Andy Lutomirski" <luto@kernel.org>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-5-git-send-email-fenghua.yu@intel.com
    • x86/umwait: Add sysfs interface to control umwait C0.2 state · ff4b353f
      Fenghua Yu authored
      C0.2 state in umwait and tpause instructions can be enabled or disabled
      on a processor through IA32_UMWAIT_CONTROL MSR register.
      
      By default, C0.2 is enabled and the user wait instructions result in
      lower power consumption with slower wakeup time.
      
      Real-time systems, however, need faster wakeup even at the cost of
      smaller power savings. On such systems the administrator needs to
      disable C0.2 so that all umwait invocations from user applications use
      C0.1.
      
      Create a sysfs interface which allows the administrator to control C0.2
      state during run time.
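
      The effect of the toggle on the control value can be sketched as
      follows. The assumption here, not stated in this changelog, is that
      bit 0 of IA32_UMWAIT_CONTROL is the "C0.2 disable" bit; the function
      names are hypothetical.

```python
C02_DISABLE = 1 << 0  # assumed layout: bit 0 set means C0.2 is not allowed


def update_c02(ctrl, enable):
    """Return the control value with C0.2 enabled or disabled."""
    if enable:
        return ctrl & ~C02_DISABLE  # clear the disable bit
    return ctrl | C02_DISABLE       # set the disable bit


def c02_enabled(ctrl):
    """Report whether the control value allows C0.2."""
    return not ctrl & C02_DISABLE
```

      Only bit 0 changes in this sketch, so the maximum time stored in the
      upper bits of the same value is preserved across a toggle.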
      
      Andy Lutomirski suggested turning off local irqs before writing the MSR
      to ensure the cached control value is not changed by a concurrent sysfs
      write from a different CPU via IPI.
      
      [ tglx: Simplified the update logic in the write function and got rid of
        all the convoluted type casts. Added a shared update function and
        made the namespace consistent. Moved the sysfs create invocation.
        Massaged changelog ]
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ashok Raj <ashok.raj@intel.com>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Andy Lutomirski" <luto@kernel.org>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-4-git-send-email-fenghua.yu@intel.com
    • x86/umwait: Initialize umwait control values · bd688c69
      Fenghua Yu authored
      umwait or tpause allows the processor to enter a light-weight
      power/performance optimized state (C0.1 state) or an improved
      power/performance optimized state (C0.2 state) for a period specified by
      the instruction or until the system time limit or until a store to the
      monitored address range in umwait.
      
      IA32_UMWAIT_CONTROL MSR register allows the OS to enable/disable C0.2 on
      the processor and to set the maximum time the processor can reside in C0.1
      or C0.2.
      
      By default C0.2 is enabled so the user wait instructions can enter the
      C0.2 state to save more power with slower wakeup time.
      
      Andy Lutomirski proposed to set the maximum umwait time to 100000 cycles by
      default. A quote from Andy:
      
        "What I want to avoid is the case where it works dramatically differently
         on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may
         behave a bit differently if the max timeout is hit, and I'd like that
         path to get exercised widely by making it happen even on default
         configs."
      
      A sysfs interface to adjust the time and the C0.2 enablement is provided in
      a follow up change.
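
      As a rough illustration of how the two settings share one MSR value,
      the sketch below packs a maximum time and a C0.2 disable flag into a
      32-bit word. The time field in bits 31:2 follows the description of
      IA32_UMWAIT_CONTROL[31:2]; placing the C0.2 disable flag in bit 0 is an
      assumption, and the names are illustrative rather than the kernel's.

```python
TIME_MASK = 0xFFFFFFFC  # bits 31:2 hold the maximum time
C02_DISABLE = 0x1       # assumed: bit 0 set means C0.2 is disabled


def umwait_ctrl_val(max_time, c02_disable):
    """Pack the maximum time and the C0.2 disable flag into one value."""
    return (max_time & TIME_MASK) | (c02_disable & C02_DISABLE)


# Default described above: 100000 cycles maximum, C0.2 enabled.
DEFAULT_CTRL = umwait_ctrl_val(100000, 0)
```

      Because 100000 is a multiple of four, the default packs to 100000
      (0x186a0) unchanged; a time such as 100003 would be masked down to
      100000.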
      
      [ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to
        MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as
        mask throughout the code.
        Massaged comments and changelog ]
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ashok Raj <ashok.raj@intel.com>
      Reviewed-by: Andy Lutomirski <luto@kernel.org>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua.yu@intel.com
    • x86/cpufeatures: Enumerate user wait instructions · 6dbbf5ec
      Fenghua Yu authored
      umonitor, umwait, and tpause are a set of user wait instructions.
      
      umonitor arms address monitoring hardware using an address. The
      address range is determined by using CPUID.0x5. A store to
      an address within the specified address range triggers the
      monitoring hardware to wake up the processor waiting in umwait.
      
      umwait instructs the processor to enter an implementation-dependent
      optimized state while monitoring a range of addresses. The optimized
      state may be either a light-weight power/performance optimized state
      (C0.1 state) or an improved power/performance optimized state
      (C0.2 state).
      
      tpause instructs the processor to enter an implementation-dependent
      optimized state (C0.1 or C0.2) and to wake up when the time-stamp
      counter reaches the specified timeout.
      
      The three instructions may be executed at any privilege level.
      
      The instructions provide a power-saving method while waiting in
      user space. Additionally, they can allow a sibling hyperthread to
      make faster progress while this thread is waiting. One example of an
      application usage of umwait is when waiting for input data from another
      application, such as a user level multi-threaded packet processing
      engine.
      
      Availability of the user wait instructions is indicated by the presence
      of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5].
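
      A hedged sketch of two ways to test for the feature from user space.
      The ECX bit position comes from the text above; the helper names are
      hypothetical, and the assumption that the flag shows up as "waitpkg" in
      /proc/cpuinfo follows the usual lowercase naming rather than anything
      stated in this changelog.

```python
WAITPKG_ECX_BIT = 5  # CPUID.0x07.0x0:ECX[5]


def ecx_has_waitpkg(ecx):
    """Test the WAITPKG bit in a raw CPUID.7.0:ECX value."""
    return bool((ecx >> WAITPKG_ECX_BIT) & 1)


def cpuinfo_flags(text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags":
            return set(value.split())
    return set()


def cpu_has_waitpkg(path="/proc/cpuinfo"):
    """Check the running system (Linux only); flag name is an assumption."""
    with open(path) as f:
        return "waitpkg" in cpuinfo_flags(f.read())
```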
      
      Detailed information on the instructions and CPUID feature WAITPKG flag
      can be found in the latest Intel Architecture Instruction Set Extensions
      and Future Features Programming Reference and Intel 64 and IA-32
      Architectures Software Developer's Manual.
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ashok Raj <ashok.raj@intel.com>
      Reviewed-by: Andy Lutomirski <luto@kernel.org>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-2-git-send-email-fenghua.yu@intel.com
  2. 22 Jun, 2019 22 commits
  3. 20 Jun, 2019 3 commits
    • x86/cpufeatures: Enumerate the new AVX512 BFLOAT16 instructions · b302e4b1
      Fenghua Yu authored
      AVX512 BFLOAT16 instructions support 16-bit BFLOAT16 floating-point
      format (BF16) for deep learning optimization.
      
      BF16 is a short version of 32-bit single-precision floating-point
      format (FP32) and has several advantages over 16-bit half-precision
      floating-point format (FP16). BF16 keeps FP32 accumulation after
      multiplication without loss of precision, offers more than enough
      range for deep learning training tasks, and doesn't need to handle
      hardware exceptions.
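
      To make the "short version of FP32" relationship concrete, the sketch
      below derives BF16 bit patterns by keeping the upper 16 bits of the
      FP32 encoding (sign, 8 exponent bits, 7 mantissa bits). It truncates
      instead of rounding, which is a simplification for illustration and not
      a claim about how the AVX512 BFLOAT16 instructions round.

```python
import struct


def fp32_to_bf16_bits(x):
    """Return the 16-bit BF16 pattern for x by truncating its FP32 encoding."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16


def bf16_bits_to_fp32(b):
    """Expand a 16-bit BF16 pattern back to a float by zero-filling the low bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x
```

      Since the exponent field is unchanged, a value such as 3.0e38 survives
      the round trip with reduced precision, whereas FP16 cannot represent
      magnitudes above 65504.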
      
      AVX512 BFLOAT16 instructions are enumerated in CPUID.7.1:EAX[bit 5]
      AVX512_BF16.
      
      CPUID.7.1:EAX contains only feature bits. Reuse the currently empty
      word 12 as a pure features word to hold the feature bits including
      AVX512_BF16.
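
      The word and bit numbering can be illustrated with a small model. It
      assumes 32-bit feature words and a "word * 32 + bit" feature
      identifier; the helper names are hypothetical and this is not the
      kernel's implementation.

```python
BITS_PER_WORD = 32


def feature_id(word, bit):
    """Combine a feature word index and a bit position into one identifier."""
    if not 0 <= bit < BITS_PER_WORD:
        raise ValueError("bit must be in the range 0..31")
    return word * BITS_PER_WORD + bit


def feature_word_bit(fid):
    """Split an identifier back into (word, bit)."""
    return divmod(fid, BITS_PER_WORD)


def has_feature(words, fid):
    """Test a feature in a list of 32-bit capability words."""
    word, bit = feature_word_bit(fid)
    return bool((words[word] >> bit) & 1)


# AVX512_BF16 as described above: word 12, bit 5 of CPUID.7.1:EAX.
AVX512_BF16 = feature_id(12, 5)
```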
      
      Detailed information of the CPUID bit and AVX512 BFLOAT16 instructions
      can be found in the latest Intel Architecture Instruction Set Extensions
      and Future Features Programming Reference.
      
       [ bp: Check CPUID(7) subleaf validity before accessing subleaf 1. ]
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Cc: Robert Hoo <robert.hu@linux.intel.com>
      Cc: "Sean J Christopherson" <sean.j.christopherson@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
      Cc: x86 <x86@kernel.org>
      Link: https://lkml.kernel.org/r/1560794416-217638-3-git-send-email-fenghua.yu@intel.com
    • x86/cpufeatures: Combine word 11 and 12 into a new scattered features word · acec0ce0
      Fenghua Yu authored
      It's a waste for the four X86_FEATURE_CQM_* feature bits to occupy two
      whole feature bits words. To better utilize feature words, re-define
      word 11 to host scattered features and move the four X86_FEATURE_CQM_*
      features into Linux defined word 11. More scattered features can be
      added in word 11 in the future.
      
      Rename leaf 11 in cpuid_leafs to CPUID_LNX_4 to reflect it's a
      Linux-defined leaf.
      
      Rename leaf 12 as CPUID_DUMMY which will be replaced by a meaningful
      name in the next patch when CPUID.7.1:EAX occupies word 12.
      
      Maximum number of RMID and cache occupancy scale are retrieved from
      CPUID.0xf.1 after scattered CQM features are enumerated. Carve out the
      code into a separate function.
      
      KVM doesn't support resctrl now. So it's safe to move the
      X86_FEATURE_CQM_* features to scattered features word 11 for KVM.
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Aaron Lewis <aaronlewis@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Babu Moger <babu.moger@amd.com>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: "Sean J Christopherson" <sean.j.christopherson@intel.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
      Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
      Cc: x86 <x86@kernel.org>
      Link: https://lkml.kernel.org/r/1560794416-217638-2-git-send-email-fenghua.yu@intel.com
    • x86/cpufeatures: Carve out CQM features retrieval · 45fc56e6
      Borislav Petkov authored
      ... into a separate function for better readability. Split out from a
      patch from Fenghua Yu <fenghua.yu@intel.com> to keep the mechanical,
      sole code movement separate for easy review.
      
      No functional changes.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: x86@kernel.org
  4. 19 Jun, 2019 1 commit
  5. 14 Jun, 2019 2 commits
  6. 13 Jun, 2019 1 commit
  7. 09 Jun, 2019 1 commit
  8. 08 Jun, 2019 6 commits
    • Merge tag 'ceph-for-5.2-rc4' of git://github.com/ceph/ceph-client · 2759e05c
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "A change to call iput() asynchronously to avoid a possible deadlock
        when iput_final() needs to wait for in-flight I/O (e.g. readahead) and
        a fixup for a cleanup that went into -rc1"
      
      * tag 'ceph-for-5.2-rc4' of git://github.com/ceph/ceph-client:
        ceph: fix error handling in ceph_get_caps()
        ceph: avoid iput_final() while holding mutex or in dispatch thread
        ceph: single workqueue for inode related works
    • Merge tag 'for-linus-5.2b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 8e61f6f7
      Linus Torvalds authored
      Pull xen fix from Juergen Gross:
       "Just one fix for the Xen block frontend driver avoiding allocations
        with order > 0"
      
      * tag 'for-linus-5.2b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen-blkfront: switch kcalloc to kvcalloc for large array allocation
    • Merge tag 's390-5.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 3d4645bf
      Linus Torvalds authored
      Pull s390 fixes from Heiko Carstens:
      
       - fix stack unwinder: the stack unwinder rework has an off-by-one bug
         which prevents following stack backchains over more than one context
         (e.g. irq -> process).
      
       - fix address space detection in exception handler: if user space
         switches to access register mode, which is not supported anymore, the
         exception handler may resolve to the wrong address space.
      
      * tag 's390-5.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/unwind: correct stack switching during unwind
        s390/mm: fix address space detection in exception handling
    • Merge tag 'mips_fixes_5.2_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · d0cc617a
      Linus Torvalds authored
      Pull MIPS fixes from Paul Burton:
      
       - Declare ginvt() __always_inline due to its use of an argument as an
         inline asm immediate.
      
       - A VDSO build fix following Kbuild changes made this cycle.
      
       - A fix for boot failures on txx9 systems following memory
         initialization changes made this cycle.
      
       - Bounds check virt_addr_valid() to prevent it spuriously indicating
         that bogus addresses are valid, in turn fixing hardened usercopy
         failures that have been present since v4.12.
      
       - Build uImage.gz for pistachio systems by default, since this is the
         image we need in order to actually boot on a board.
      
       - Remove an unused variable in our uprobes code.
      
      * tag 'mips_fixes_5.2_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
        MIPS: uprobes: remove set but not used variable 'epc'
        MIPS: pistachio: Build uImage.gz by default
        MIPS: Make virt_addr_valid() return bool
        MIPS: Bounds check virt_addr_valid
        MIPS: TXx9: Fix boot crash in free_initmem()
        MIPS: remove a space after -I to cope with header search paths for VDSO
        MIPS: mark ginvt() as __always_inline
    • Merge tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core · 9331b674
      Linus Torvalds authored
      Pull yet more SPDX updates from Greg KH:
       "Another round of SPDX header file fixes for 5.2-rc4
      
        These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being
        added, based on the text in the files. We are slowly chipping away at
        the 700+ different ways people tried to write the license text. All of
        these were reviewed on the spdx mailing list by a number of different
        people.
      
        We now have over 60% of the kernel files covered with SPDX tags:
      	$ ./scripts/spdxcheck.py -v 2>&1 | grep Files
      	Files checked:            64533
      	Files with SPDX:          40392
      	Files with errors:            0
      
        I think the majority of the "easy" fixups are now done, it's now the
        start of the longer-tail of crazy variants to wade through"
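
      The "over 60%" figure can be checked from the numbers quoted above. The
      sketch below parses output in the shape shown and computes the
      coverage; the function name is hypothetical and the parsing assumes
      "key: integer" lines only.

```python
def spdx_coverage(report):
    """Return the percentage of checked files that carry an SPDX tag."""
    fields = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = int(value)
    return 100.0 * fields["Files with SPDX"] / fields["Files checked"]


REPORT = """\
Files checked:            64533
Files with SPDX:          40392
Files with errors:            0
"""

coverage = spdx_coverage(REPORT)  # 40392 / 64533, roughly 62.6 percent
```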
      
      * tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (159 commits)
        treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 450
        treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 449
        treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 448
        treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 446
        treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 445
        treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 444
        treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 443
        treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 442
        treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441
        treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 440
        treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 438
        treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437
        treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436
        treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 435
        treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 434
        treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 433
        treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 432
        treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 431
        treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430
        treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 429
        ...
    • Merge tag 'char-misc-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 1ce2c851
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are some small char and misc driver fixes for 5.2-rc4 to resolve
        a number of reported issues.
      
        The most "notable" one here is the kernel headers in proc^Wsysfs
        fixes. Those changes move the header file info into sysfs and fix
        the build issues that you reported.
      
        Other than that, a bunch of small habanalabs driver fixes, some fpga
        driver fixes, and a few other tiny driver fixes.
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'char-misc-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        habanalabs: Read upper bits of trace buffer from RWPHI
        habanalabs: Fix virtual address access via debugfs for 2MB pages
        fpga: zynqmp-fpga: Correctly handle error pointer
        habanalabs: fix bug in checking huge page optimization
        habanalabs: Avoid using a non-initialized MMU cache mutex
        habanalabs: fix debugfs code
        uapi/habanalabs: add opcode for enable/disable device debug mode
        habanalabs: halt debug engines on user process close
        test_firmware: Use correct snprintf() limit
        genwqe: Prevent an integer overflow in the ioctl
        parport: Fix mem leak in parport_register_dev_model
        fpga: dfl: expand minor range when registering chrdev region
        fpga: dfl: Add lockdep classes for pdata->lock
        fpga: dfl: afu: Pass the correct device to dma_mapping_error()
        fpga: stratix10-soc: fix use-after-free on s10_init()
        w1: ds2408: Fix typo after 49695ac4 (reset on output_write retry with readback)
        kheaders: Do not regenerate archive if config is not changed
        kheaders: Move from proc to sysfs
        lkdtm/bugs: Adjust recursion test to avoid elision
        lkdtm/usercopy: Moves the KERNEL_DS test to non-canonical