  1. 15 Oct, 2015 16 commits
  2. 12 Oct, 2015 2 commits
  3. 08 Oct, 2015 2 commits
  4. 06 Oct, 2015 2 commits
  5. 05 Oct, 2015 6 commits
  6. 02 Oct, 2015 3 commits
  7. 01 Oct, 2015 8 commits
    • powerpc: Add ppc64le_defconfig · 2adc48a6
      Michael Ellerman authored
      Based directly on ppc64_defconfig using merge_config.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • scripts/kconfig/Makefile: Allow KBUILD_DEFCONFIG to be a target · d2036f30
      Michael Ellerman authored
      Arch Makefiles can set KBUILD_DEFCONFIG to tell kbuild the name of the
      defconfig that should be built by default.
      
      However currently there is an assumption that KBUILD_DEFCONFIG points to
      a file at arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG).
      
      We would like to use a target, using merge_config, as our defconfig, so
      adapt the logic in scripts/kconfig/Makefile to allow that.
      
      To minimise the chance of breaking anything, we first check if
      KBUILD_DEFCONFIG is a file, and if so we do the old logic. If it's not a
      file, then we call the top-level Makefile with KBUILD_DEFCONFIG as the
      target.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: Michal Marek <mmarek@suse.com>
    • powerpc/mm: Add virt_to_pfn and use this instead of opencoding · 65d3223a
      Aneesh Kumar K.V authored
      This adds a helper, virt_to_pfn(), and removes the open-coded usages
      of the same.
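
      Concretely, the helper just centralises the usual "physical address
      >> PAGE_SHIFT" calculation. A minimal user-space sketch of the shape
      of the change, with __pa() and PAGE_SHIFT stubbed out (in the kernel
      they are the real arch definitions):

        #include <stdio.h>

        #define PAGE_SHIFT 12                         /* assume 4K pages for the example */
        #define __pa(kaddr) ((unsigned long)(kaddr))  /* stub; the kernel's __pa() does the real translation */

        /* the helper: one place that encodes phys-addr >> PAGE_SHIFT */
        #define virt_to_pfn(kaddr) (__pa(kaddr) >> PAGE_SHIFT)

        int main(void)
        {
                unsigned long vaddr = 0x10002000UL;

                printf("pfn of %#lx is %#lx\n", vaddr, virt_to_pfn(vaddr));
                return 0;
        }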
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/vdso: Avoid link stack corruption in __get_datapage() · c974809a
      Michael Neuling authored
      powerpc has a link register (lr) used for calling functions. We "bl
      <func>" to call a function, and "blr" to return to the call site.
      
      The lr is only a single register, so if we call another function from
      inside this function (ie. nested calls), software must save away the
      lr on the software stack before calling the new function. Before
      returning (ie. before the "blr"), the lr is restored by software from
      the software stack.
      
      This makes branch prediction quite difficult for the processor as it
      will only know the branch target just before the "blr".
      
      To help with this, modern powerpc processors keep a (non-architected)
      hardware stack of lr values called a "link stack". When a "bl <func>"
      is executed, the lr is pushed onto this stack. When a "blr" is
      executed, the branch predictor pops the lr value from the top of the
      link stack and uses it to predict the branch target, so the processor
      pipeline knows the branch target much earlier.
      
      This works great but there are some cases where you call "bl" but
      without a matching "blr". One such case is when trying to determine
      the program counter (which can't be read directly). Here you use "bl
      +4; mflr" to get the program counter. If you do this, the link stack
      will get out of sync with reality, causing the branch predictor to
      mis-predict subsequent function returns.
      
      To avoid this, modern micro-architectures have a special case of bl:
      using the form "bcl 20,31,+4" ensures the processor doesn't push to
      the link stack.
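
      As a user-space illustration of the idiom (a powerpc-only sketch,
      not the VDSO code itself), here is the program counter read via the
      non-pushing form of bl:

        #include <stdio.h>

        /* "bcl 20,31,1f" branches to the next instruction just like
         * "bl +4", but the 20,31 operands tell the processor not to
         * push a return address onto the link stack, so later "blr"
         * predictions stay in sync. */
        static unsigned long current_pc(void)
        {
                unsigned long pc;

                asm volatile("bcl 20,31,1f\n"
                             "1: mflr %0\n"
                             : "=r" (pc) : : "lr");
                return pc;
        }

        int main(void)
        {
                printf("pc is near %#lx\n", current_pc());
                return 0;
        }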
      
      The 32-bit and 64-bit variants of __get_datapage() use a "bl; mflr"
      sequence to determine the loaded address of the VDSO, and the current
      versions of these attempt to use this special bl variant.
      
      Unfortunately they use +8 rather than the required +4, so the current
      code leaves the link stack out of sync with reality, degrading
      performance.
      
      This patch moves it to bcl+4 by moving __kernel_datapage_offset out of
      __get_datapage().
      
      With this patch, running a gettimeofday() microbenchmark
      (gettimeofday() uses __get_datapage()) we get a decent bump in
      performance on POWER7/8.
      
      For the benchmark in tools/testing/selftests/powerpc/benchmarks/gettimeofday.c
        POWER8:
          64bit gets ~4% improvement
          32bit gets ~9% improvement
        POWER7:
          64bit gets ~7% improvement
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Reported-by: Aaron Sawdey <sawdey@us.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/selftest: Add gettimeofday() benchmark · d17475d9
      Michael Neuling authored
      This adds a benchmark directory to the powerpc selftests and adds a
      gettimeofday() benchmark to it.
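
      The benchmark is essentially a tight loop of gettimeofday() calls;
      a minimal sketch of that shape (the in-tree selftest differs in
      detail):

        #include <stdio.h>
        #include <sys/time.h>

        int main(void)
        {
                struct timeval start, end, tv;
                unsigned long i, iterations = 10000000UL;
                double elapsed;

                gettimeofday(&start, NULL);
                for (i = 0; i < iterations; i++)
                        gettimeofday(&tv, NULL);
                gettimeofday(&end, NULL);

                elapsed = (end.tv_sec - start.tv_sec) +
                          (end.tv_usec - start.tv_usec) / 1e6;
                printf("%lu calls in %.2fs (%.0f calls/sec)\n",
                       iterations, elapsed, iterations / elapsed);
                return 0;
        }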
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/slb: Use a local to avoid multiple calls to get_slb_shadow() · 26cd835e
      Michael Ellerman authored
      For no reason other than it looks ugly.
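
      The change is the usual cache-the-accessor-in-a-local cleanup; a
      generic sketch of the pattern (names here are illustrative, not the
      slb.c code):

        #include <stdio.h>

        struct shadow { unsigned long esid, vsid; };

        static struct shadow the_shadow;

        /* stand-in for get_slb_shadow() */
        static struct shadow *get_shadow(void)
        {
                return &the_shadow;
        }

        static void update_shadow(unsigned long esid, unsigned long vsid)
        {
                /* one call, cached in a local, instead of repeating
                 * get_shadow()->esid = ...; get_shadow()->vsid = ...; */
                struct shadow *p = get_shadow();

                p->esid = esid;
                p->vsid = vsid;
        }

        int main(void)
        {
                update_shadow(0xc000, 0x400);
                printf("%#lx %#lx\n", the_shadow.esid, the_shadow.vsid);
                return 0;
        }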
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/slb: Define an enum for the bolted indexes · 1d15010c
      Anshuman Khandual authored
      This patch defines an enum for the three bolted SLB indexes we use,
      and switches the functions that take the indexes as an argument to
      use the enum.
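
      A sketch of the idea, with illustrative names (the in-tree enum and
      its comments may differ):

        /* replace magic bolted-slot numbers with named enum values */
        enum slb_index {
                LINEAR_INDEX  = 0,      /* kernel linear mapping  */
                VMALLOC_INDEX = 1,      /* kernel virtual mapping */
                KSTACK_INDEX  = 2,      /* kernel stack mapping   */
        };

        /* functions take the enum rather than a bare int, so call sites
         * read clear_slot(KSTACK_INDEX) instead of clear_slot(2) */
        static void clear_slot(enum slb_index index)
        {
                (void)index;            /* body elided in this sketch */
        }

        int main(void)
        {
                clear_slot(KSTACK_INDEX);
                return 0;
        }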
      Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/vdso: Emit GNU & SysV hashes · 787b393c
      Michael Ellerman authored
      Andy Lutomirski says:
      
        Some dynamic loaders may be slightly faster if a GNU hash is
        available.
      
        This is unlikely to have any measurable effect on the time it takes
        to resolve vdso symbols (since there are so few of them).  In some
        contexts, it can be a win for a different reason: if every DSO has a
        GNU hash section, then libc can avoid calculating SysV hashes at
        all. Both musl and glibc appear to have this optimization.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  8. 29 Sep, 2015 1 commit