1. 05 Mar, 2016 16 commits
  2. 03 Mar, 2016 8 commits
  3. 02 Mar, 2016 11 commits
    • powerpc: Add the ability to save VSX without giving it up · bf6a4d5b
      Cyril Bur authored
      This patch adds the ability to save the VSX registers to the thread
      struct without giving up the facility (disabling it) the next time the
      process returns to userspace.
      
      This patch builds on a previous optimisation for the FPU and VEC registers
      in the thread copy path to avoid a possibly pointless reload of VSX state.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Add the ability to save Altivec without giving it up · 6f515d84
      Cyril Bur authored
      This patch adds the ability to save the VEC registers to the thread
      struct without giving up the facility (disabling it) the next time the
      process returns to userspace.
      
      This patch builds on a previous optimisation for the FPU registers in the
      thread copy path to avoid a possibly pointless reload of VEC state.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Add the ability to save FPU without giving it up · 8792468d
      Cyril Bur authored
      This patch adds the ability to save the FPU registers to the thread
      struct without giving up the facility (disabling it) the next time the
      process returns to userspace.
      
      This patch optimises the thread copy path (as a result of a fork() or
      clone()) so that the parent thread can return to userspace with hot
      registers, avoiding a possibly pointless reload of FPU register state.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Prepare for splitting giveup_{fpu, altivec, vsx} in two · de2a20aa
      Cyril Bur authored
      This prepares for the decoupling of saving {fpu,altivec,vsx} registers and
      marking {fpu,altivec,vsx} as being unused by a thread.
      
      Currently giveup_{fpu,altivec,vsx}() does both; however, optimisations to
      task switching can be made if these two operations are decoupled.
      save_all() will permit the saving of registers to thread structs while
      leaving the thread's MSR bits enabled.
      
      This patch introduces no functional change.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
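
The split described in de2a20aa (saving registers versus marking a facility unused) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct toy_thread`, the `TOY_MSR_*` bit values, `toy_hw_fp`, `toy_save_fpu()` and `toy_giveup_fpu()` are hypothetical names invented here, and the real kernel code operates on hardware registers and the real thread struct.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; the real MSR bit values differ. */
#define TOY_MSR_FP  (1u << 0)
#define TOY_MSR_VEC (1u << 1)
#define TOY_MSR_VSX (1u << 2)

struct toy_thread {
    uint32_t msr;         /* facility-enable bits for this thread */
    double saved_fp[32];  /* copy of the FP registers in the "thread struct" */
};

/* Stand-in for the live hardware FP registers. */
static double toy_hw_fp[32];

/* Save only: copy live registers to the thread struct, leave MSR bits set. */
static void toy_save_fpu(struct toy_thread *t)
{
    assert(t != NULL);
    if (t->msr & TOY_MSR_FP)
        memcpy(t->saved_fp, toy_hw_fp, sizeof(t->saved_fp));
}

/* Give up: save, then mark the facility unused so the next use must reload. */
static void toy_giveup_fpu(struct toy_thread *t)
{
    toy_save_fpu(t);
    t->msr &= ~TOY_MSR_FP;
}
```

In this model a caller that only needs an up-to-date copy of the registers (for example when copying a thread) would use the save-only variant, so the thread keeps its facility bit and does not have to reload on the way back to userspace.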
    • powerpc: Restore FPU/VEC/VSX if previously used · 70fe3d98
      Cyril Bur authored
      Currently the FPU, VEC and VSX facilities are lazily loaded. This is not
      a problem unless a process is using these facilities.
      
      Modern versions of GCC are very good at automatically vectorising code,
      new and modernised workloads make use of floating point and vector
      facilities, and even the kernel makes use of vectorised memcpy.
      
      All this combined greatly increases the cost of a syscall, since the
      kernel sometimes uses the facilities even in the syscall fast path,
      making it increasingly common for a thread to take an *_unavailable
      exception soon after a syscall, not to mention potentially taking all
      three.
      
      The obvious overcompensation for this problem is to simply always load
      all the facilities on every exit to userspace. Loading up all FPU, VEC
      and VSX registers every time can be expensive, and if a workload does
      avoid using them, it should not be forced to incur this penalty.
      
      An 8-bit counter is used to detect whether the registers have been used
      in the past, and the registers are always loaded until the value wraps
      back to zero.
      
      Several versions of the assembly in entry_64.S were tested:
      
        1. Always calling C.
        2. Performing a common case check and then calling C.
        3. A complex check in asm.
      
      After some benchmarking it was determined that avoiding C in the common
      case is a performance benefit (option 2). The full check in asm (option
      3) greatly complicated that codepath for a negligible performance gain
      and the trade-off was deemed not worth it.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      [mpe: Move load_vec in the struct to fill an existing hole, reword change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Explicitly disable math features when copying thread · d272f667
      Cyril Bur authored
      Currently when threads get scheduled off they always give up the FPU,
      Altivec (VMX) and Vector (VSX) units if they were using them. When they are
      scheduled back on, a fault is then taken to enable each facility and load
      registers. As a result, explicitly disabling FPU/VMX/VSX has not been
      necessary.
      
      Future changes and optimisations remove this mandatory giveup and fault,
      which could cause calls such as clone() and fork() to copy threads and run
      them later with FPU/VMX/VSX enabled but no registers loaded.
      
      This patch starts the process of having MSR_{FP,VEC,VSX} mean that a
      thread's registers are hot, while not having MSR_{FP,VEC,VSX} means that
      the registers must be loaded. This allows for a smarter return to userspace.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • selftests/powerpc: Test FPU and VMX regs in signal ucontext · 48e8c571
      Cyril Bur authored
      Load up the non-volatile FPU and VMX regs and ensure that they have the
      expected values in a signal handler.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • selftests/powerpc: Test preservation of FPU and VMX regs across preemption · e5ab8be6
      Cyril Bur authored
      Loop in assembly checking the registers with many threads.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • selftests/powerpc: Test the preservation of FPU and VMX regs across syscall · 01127f1e
      Cyril Bur authored
      Test that the non-volatile floating point and Altivec registers get
      correctly preserved across the fork() syscall.
      
      fork() works nicely for this purpose: the registers should be the same
      for both parent and child.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      [mpe: Add include guards to basic_asm.h, minor formatting]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • selftests/powerpc: Remove -flto from common CFLAGS · a4cf0a2e
      Suraj Jitindar Singh authored
      LTO can cause GCC to inline some functions which have attributes set.
      The act of inlining the functions can lead to GCC forgetting about the
      attributes which leads to incorrect tests.
      
      Notable example being: __attribute__((__target__("no-vsx")))
      
      LTO can also interact strangely with custom assembly functions and cause
      tests to intermittently fail.
      
      Both these cases are hard to detect and require manual inspection of
      binaries which is unlikely to happen for all tests. Furthermore, LTO
      optimisations are not necessary for selftests and correctness is
      paramount and as such it is best to disable LTO.
      
      LTO can be enabled on a per test basis.
      
      A pseries_le_defconfig kernel on a POWER8 was used to determine that the
      same subset of selftests pass and fail with and without -flto in the
      common Makefile.
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • selftests/powerpc: Fix out of bounds access in TM signal test · 501e279c
      Michael Ellerman authored
      GCC helpfully points out that we're accessing past the end of the gprs
      array:
      
        tm-signal-msr-resv.c: In function 'signal_usr1':
        tm-signal-msr-resv.c:43:37: error: array subscript is above array bounds [-Werror=array-bounds]
          ucp->uc_mcontext.regs->gpr[PT_MSR] |= (7ULL);
      
      We haven't noticed previously because -flto was hiding it somehow.
      
      The code is confused: PT_MSR isn't a gpr, it's in uc_regs->gregs, so
      fix it.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  4. 01 Mar, 2016 5 commits
    • powerpc/mm: Split hash page table sizing heuristic into a helper · 5c3c7ede
      David Gibson authored
      htab_get_table_size() either retrieves the size of the hash page table (HPT)
      from the device tree - if the HPT size is determined by firmware - or
      uses a heuristic to determine a good size based on RAM size if the kernel
      is responsible for allocating the HPT.
      
      To support a PAPR extension allowing resizing of the HPT, we're going to
      want the memory size -> HPT size logic elsewhere, so split it out into a
      helper function.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
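
A "memory size -> HPT size" helper of the kind 5c3c7ede describes might look roughly like the sketch below. The commit message does not spell out the heuristic, so everything specific here is an assumption: the target of about two pages per PTEG, the 128-byte PTEG size, the 2^18-byte minimum table, and the name `toy_htab_shift_for_mem_size()`. The function returns log2 of the table size in bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Returns log2 of the suggested HPT size in bytes (illustrative only). */
static unsigned toy_htab_shift_for_mem_size(uint64_t mem_size,
                                            unsigned page_shift)
{
    unsigned memshift = 0;
    int pteg_shift;

    assert(mem_size > 0);

    /* memshift = floor(log2(mem_size)) */
    while (memshift < 63 && (mem_size >> (memshift + 1)) != 0)
        memshift++;

    /* Round mem_size up to the next power of two. */
    if (((uint64_t)1 << memshift) < mem_size)
        memshift++;

    /* Assumed target: about two pages per PTEG. */
    pteg_shift = (int)memshift - (int)(page_shift + 1);

    /* Assumed: 128-byte (2^7) PTEGs and a 2^18-byte minimum table. */
    if (pteg_shift + 7 < 18)
        return 18;
    return (unsigned)(pteg_shift + 7);
}
```

With these assumptions, 1 GiB of RAM and 64 KiB pages (`page_shift` of 16) give a shift of 20, that is a 1 MiB table. Keeping the calculation in a standalone function is what lets a later resize path reuse it.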
    • powerpc/mm: Clean up memory hotplug failure paths · 1dace6c6
      David Gibson authored
      This makes a number of cleanups to handling of mapping failures during
      memory hotplug on Power:
      
      For errors creating the linear mapping for the hot-added region:
        * This is now reported with EFAULT which is more appropriate than the
          previous EINVAL (the failure is unlikely to be related to the
          function's parameters)
        * An error in this path now prints a warning message, rather than just
          silently failing to add the extra memory.
        * Previously a failure here could result in the region being partially
          mapped.  We now clean up any partial mapping before failing.
      
      For errors creating the vmemmap for the hot-added region:
         * This is now reported with EFAULT instead of causing a BUG() - this
           could happen for an external reason (e.g. full hash table) so it's
           better to handle this non-fatally
         * An error message is also printed, so the failure won't be silent
         * As above, a failure could leave a partially mapped region; we now
           clean this up. [mpe: move htab_remove_mapping() out of #ifdef
           CONFIG_MEMORY_HOTPLUG to enable this]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Paul Mackerras <paulus@samba.org>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
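
The failure handling described in 1dace6c6 (warn, undo any partial mapping, return EFAULT) follows a common pattern that can be sketched in user space. All names here (`toy_mapped`, `toy_fail_at`, `toy_map_one()`, `toy_create_section_mapping()`) are hypothetical and the "mapping" is just an array of flags; the real code works on hash page table entries.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_PAGES 8

static int toy_mapped[TOY_PAGES];       /* 1 = page currently mapped */
static size_t toy_fail_at = TOY_PAGES;  /* index at which mapping fails */

/* Map one page; fails at toy_fail_at to simulate e.g. a full hash table. */
static int toy_map_one(size_t i)
{
    if (i == toy_fail_at)
        return -1;
    toy_mapped[i] = 1;
    return 0;
}

/* Map [start, end). On failure, warn, undo the partial mapping, -EFAULT. */
static int toy_create_section_mapping(size_t start, size_t end)
{
    assert(start <= end && end <= TOY_PAGES);

    for (size_t i = start; i < end; i++) {
        if (toy_map_one(i) < 0) {
            fprintf(stderr, "toy: failed to map pages %zu-%zu\n",
                    start, end);
            for (size_t j = start; j < i; j++)
                toy_mapped[j] = 0;  /* clean up what was mapped */
            return -EFAULT;
        }
    }
    return 0;
}
```

The point of the pattern is that a caller seeing `-EFAULT` can rely on the range being fully unmapped rather than left in a half-mapped state.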
    • powerpc/mm: Handle removing maybe-present bolted HPTEs · 27828f98
      David Gibson authored
      At the moment the hpte_removebolted callback in ppc_md returns void and
      will BUG_ON() if the hpte it's asked to remove doesn't exist in the first
      place.  This is awkward for the case of cleaning up a mapping which was
      partially made before failing.
      
      So, we add a return value to hpte_removebolted, and have it return ENOENT
      in the case that the HPTE to remove didn't exist in the first place.
      
      In the (sole) caller, we propagate errors in hpte_removebolted to its
      caller to handle.  However, we handle ENOENT specially, continuing to
      complete the unmapping over the specified range before returning the error
      to the caller.
      
      This means that htab_remove_mapping() will work sanely on a partially
      present mapping, removing any HPTEs which are present, while also returning
      ENOENT to its caller in case it's important there.
      
      There are two callers of htab_remove_mapping():
         - In remove_section_mapping() we already WARN_ON() any error return,
           which is reasonable - in this case the mapping should be fully
           present
         - In vmemmap_remove_mapping() we BUG_ON() any error.  We change that to
           just a WARN_ON() in the case of ENOENT, since failing to remove a
           mapping that wasn't there in the first place probably shouldn't be
           fatal.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
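
The ENOENT handling described in 27828f98 can be sketched as a loop that keeps unmapping past missing entries but still reports that something was missing. This is a simplified model: `toy_removebolted()` and `toy_remove_mapping()` are hypothetical stand-ins, and an array of flags replaces the real hash page table.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* present[i] != 0 means a bolted entry exists for page i. */
static int toy_removebolted(int *present, size_t idx)
{
    if (!present[idx])
        return -ENOENT;  /* nothing to remove */
    present[idx] = 0;
    return 0;
}

/*
 * Remove entries for [start, end). A missing entry is remembered as
 * -ENOENT but does not stop the loop; any other error aborts at once.
 */
static int toy_remove_mapping(int *present, size_t start, size_t end)
{
    int rc = 0;

    assert(present != NULL);

    for (size_t i = start; i < end; i++) {
        int ret = toy_removebolted(present, i);

        if (ret == -ENOENT) {
            rc = -ENOENT;
            continue;
        }
        if (ret < 0)
            return ret;
    }
    return rc;
}
```

A caller cleaning up after a failed, partially completed mapping can ignore `-ENOENT` from this function, while a caller that expects the mapping to be fully present can warn on it.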
    • powerpc/mm: Clean up error handling for htab_remove_mapping · abd0a0e7
      David Gibson authored
      Currently, the only error that htab_remove_mapping() can report is -EINVAL,
      if removal of bolted HPTEs isn't implemented for this platform.  We make
      a few cleanups to the handling of this:
      
       * EINVAL isn't really the right code - there's nothing wrong with the
         function's arguments - use ENODEV instead
       * We were also printing a warning message, but that's a decision better
         left up to the callers, so remove it
       * One caller is vmemmap_remove_mapping(), which will just BUG_ON() on
         error, making the warning message redundant, so no change is needed
         there.
       * The other caller is remove_section_mapping().  This is called in the
         memory hot remove path at a point after vmemmap_remove_mapping() so
         if hpte_removebolted isn't implemented, we'd expect to have already
         BUG()ed anyway.  Put a WARN_ON() here, in lieu of a printk() since this
         really shouldn't be happening.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • Adam Buchbinder's avatar