1. 20 Oct, 2018 13 commits
  2. 19 Oct, 2018 3 commits
    • powerpc/time: Fix clockevent_decrementer initialisation for PR KVM · b4d16ab5
      Michael Ellerman authored
      In the recent commit 8b78fdb0 ("powerpc/time: Use
      clockevents_register_device(), fixing an issue with large
      decrementer") we changed the way we initialise the decrementer
      clockevent(s).
      
      We no longer initialise the mult & shift values of
      decrementer_clockevent itself.
      
      This has the effect of breaking PR KVM, because it uses those values
      in kvmppc_emulate_dec(). The symptom is that guest kernels spin forever
      mid-way through boot.
      
      For now, fix it by assigning the mult and shift values back to
      decrementer_clockevent (see the sketch after this entry).
      
      Fixes: 8b78fdb0 ("powerpc/time: Use clockevents_register_device(), fixing an issue with large decrementer")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
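      A minimal sketch of the stop-gap described above, assuming the per-cpu
      decrementer clock_event_device is set up in a registration helper roughly
      like the one below (the helper and variable names are illustrative, not a
      quote of the actual code):

          static void register_decrementer_clockevent(int cpu)
          {
                  struct clock_event_device *dec = &per_cpu(decrementers, cpu);

                  *dec = decrementer_clockevent;
                  dec->cpumask = cpumask_of(cpu);

                  /* mult/shift are now computed on the per-cpu device ... */
                  clockevents_config_and_register(dec, ppc_tb_freq, 2, decrementer_max);

                  /*
                   * ... so copy them back to decrementer_clockevent, which PR KVM's
                   * kvmppc_emulate_dec() still reads when converting guest
                   * decrementer ticks to nanoseconds.
                   */
                  decrementer_clockevent.mult = dec->mult;
                  decrementer_clockevent.shift = dec->shift;
          }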
    • powerpc/aout: Fix struct user definition to use user_pt_regs · 6ce7bff0
      Michael Ellerman authored
      I'm pretty sure this is dead code; it's only used by the a.out core
      dump code, and we don't support a.out. We should remove it.
      
      But while it's in the tree it should be using the ABI version of
      pt_regs, which is called user_pt_regs in the kernel, because the whole
      struct is written to the core dump and so its size shouldn't change.
      
      Note this isn't a uapi header, so we don't need an ifdef (a sketch of
      the change follows this entry).
      
      Fixes: 002af939 ("powerpc: Split user/kernel definitions of struct pt_regs")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
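      A sketch of the shape of the change, assuming the usual a.out-era struct
      user layout (the field list below is abbreviated and only indicative):

          /* arch/powerpc/include/asm/user.h -- sketch only */
          struct user {
                  struct user_pt_regs regs;  /* was struct pt_regs; user_pt_regs is the
                                                fixed-size ABI view, so the size of the
                                                dumped struct can't silently change */
                  size_t          u_tsize;   /* text size (pages) */
                  size_t          u_dsize;   /* data size (pages) */
                  size_t          u_ssize;   /* stack size (pages) */
                  /* ... remaining a.out core dump fields unchanged ... */
          };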
    • powerpc/uapi: Fix sigcontext definition to use user_pt_regs · 22a3d03d
      Michael Ellerman authored
      My recent patch to split pt_regs between user and kernel missed
      the usage in struct sigcontext.
      
      Because this is a user visible struct it should be using the user
      visible definition, which when we're building for the kernel is called
      struct user_pt_regs.
      
      As far as I can see this hasn't actually caused a bug (yet), because
      we don't use sizeof() on sigcontext->regs anywhere. But we should
      still fix it to avoid confusion and future bugs (see the sketch after
      this entry).
      
      Fixes: 002af939 ("powerpc: Split user/kernel definitions of struct pt_regs")
      Reported-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
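      A sketch of what the change presumably looks like: unlike asm/user.h in
      the previous entry, sigcontext.h is a uapi header, so the kernel-only
      spelling is guarded and userspace continues to see its own struct
      pt_regs. The surrounding fields are elided and the exact guard shown is
      an assumption:

          /* arch/powerpc/include/uapi/asm/sigcontext.h -- sketch only */
          struct sigcontext {
                  /* ... */
          #ifdef __KERNEL__
                  struct user_pt_regs __user *regs;  /* ABI-sized view inside the kernel */
          #else
                  struct pt_regs *regs;              /* userspace's own definition */
          #endif
                  /* ... */
          };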
  3. 18 Oct, 2018 19 commits
  4. 14 Oct, 2018 5 commits
    • powerpc/mm: Increase the max addressable memory to 2PB · 4ffe713b
      Aneesh Kumar K.V authored
      Currently we limit the max addressable memory to 128TB. This patch
      increases the limit to 2PB. We can have devices like nvdimm which add
      memory above the 512TB limit.
      
      We still don't support regular system RAM above 512TB. One of the
      challenges with that is the percpu allocator, which allocates per-node
      memory and uses the max distance between nodes as the percpu offset.
      This means that with a large gap in the address space (system RAM above
      1PB) we will run out of vmalloc space to map the percpu allocations.
      
      In order to support addressable memory above 512TB, the kernel should
      be able to linearly map this range. To do that with hash translation we
      now add 4 contexts to the kernel linear map region. Our per-context
      addressable range is 512TB. We still keep the VMALLOC and VMEMMAP
      regions at their old size. The SLB miss handlers are updated to
      validate these limits.
      
      We also limit this update to SPARSEMEM_VMEMMAP and SPARSEMEM_EXTREME
      configurations (see the sizing note after this entry).
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
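      For a sense of the sizing: 512TB of addressable range per hash context,
      times 4 contexts now dedicated to the kernel linear map, gives
      4 x 512TB = 2PB = 2^51 bytes. A hedged sketch of the resulting limit
      (the value follows from the arithmetic above; it is not a quote of the
      actual header):

          /* Illustrative sizing only */
          #if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME)
          #define MAX_PHYSMEM_BITS        51      /* 4 contexts x 512TB = 2PB */
          #endif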
    • powerpc/mm/hash: Rename get_ea_context to get_user_context · c9f80734
      Aneesh Kumar K.V authored
      We will be adding get_kernel_context later. Update the function name to
      indicate that this one handles context allocation for user space
      addresses.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: Add some SLB debugging tests · e15a4fea
      Nicholas Piggin authored
      This adds CONFIG_DEBUG_VM checks (one is sketched after this entry) to
      ensure:
        - The kernel stack is in the SLB after it's flushed and bolted.
        - We don't insert an SLB entry for an address that is already in the SLB.
        - The kernel SLB miss handler does not take an SLB miss.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
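      One illustrative example of the kind of check described, guarding against
      re-inserting an entry for an address that is already present. Note that
      slb_entry_present() is a hypothetical stand-in for the real SLB lookup,
      not an actual kernel function:

          #ifdef CONFIG_DEBUG_VM
                  /* Sketch only: warn before inserting a duplicate entry for 'ea'. */
                  WARN_ON(slb_entry_present(ea));
          #endif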
    • powerpc/64s/hash: Simplify slb_flush_and_rebolt() · 94ee4272
      Nicholas Piggin authored
      The name slb_flush_and_rebolt() is misleading: it is called in virtual
      mode, so it cannot possibly change the stack, and therefore it should
      not be touching the shadow area. And since vmalloc is no longer bolted,
      it should not change any bolted mappings at all.
      
      Change the name to slb_flush_and_restore_bolted(), and have it just
      load the kernel stack entry from what's currently in the shadow SLB
      area (see the sketch after this entry).
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
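      A hedged sketch of what the simplified helper boils down to, per the
      description above (the shadow-area accessors and index name are recalled
      from memory and may not match the actual code exactly):

          void slb_flush_and_restore_bolted(void)
          {
                  struct slb_shadow *p = get_slb_shadow();

                  /*
                   * Flush the SLB, then re-insert only the kernel stack entry,
                   * taken from what is already in the shadow SLB area. Nothing
                   * in the shadow area itself is modified.
                   */
                  asm volatile("isync\n"
                               "slbia\n"
                               "slbmte %0, %1\n"
                               "isync"
                               :: "r" (be64_to_cpu(p->save_area[KSTACK_INDEX].vsid)),
                                  "r" (be64_to_cpu(p->save_area[KSTACK_INDEX].esid))
                               : "memory");

                  get_paca()->slb_cache_ptr = 0;
          }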
    • powerpc/64s/hash: Add a SLB preload cache · 5434ae74
      Nicholas Piggin authored
      When switching processes, currently all user SLBEs are cleared, and a
      few (exec_base, pc, and stack) are preloaded. In trivial testing with
      small apps, this tends to miss the heap and low 256MB segments, and it
      will also miss commonly accessed segments on large memory workloads.
      
      Add a simple round-robin preload cache that just inserts the last SLB
      miss into the head of the cache and preloads those entries at context
      switch time. Every 256 context switches, the oldest entry is removed,
      to shrink the cache and require fewer slbmte instructions for entries
      that go unused (a sketch of the idea follows this entry).
      
      Much more could go into this, including on the SLB entry reclaim
      side to track some LRU information etc., which would require a study of
      large memory workloads. But this is a simple thing we can do now that
      is an obvious win for common workloads.
      
      With the full series, process switching speed on the context_switch
      benchmark on POWER9/hash (with kernel speculation security measures
      disabled) increases from 140K/s to 178K/s (27%).
      
      POWER8 does not change much (within 1%); it's unclear why it does not
      see a big gain like POWER9.
      
      Booting to a busybox init with 256MB segments sees SLB misses go down
      from 945 to 69, and with 1T segments from 900 to 21. These could almost
      all be eliminated by preloading a bit more carefully during ELF binary
      loading.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
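      A hedged sketch of the data structure the description implies: a small
      per-thread FIFO of recently missed segment numbers, refilled by the SLB
      miss handler and replayed with slbmte at context switch time. All names
      and sizes below are illustrative, not the kernel's actual fields:

          #define SLB_PRELOAD_NR  16              /* illustrative cache size */

          struct slb_preload {
                  unsigned char   nr;             /* valid entries */
                  unsigned char   tail;           /* index of the oldest entry */
                  unsigned long   esid[SLB_PRELOAD_NR];
          };

          /* Record the segment of a faulting address after an SLB miss. */
          static void preload_add(struct slb_preload *pc, unsigned long ea)
          {
                  unsigned long esid = ea >> SID_SHIFT;   /* 256MB segment number */
                  unsigned char idx = (pc->tail + pc->nr) % SLB_PRELOAD_NR;

                  if (pc->nr == SLB_PRELOAD_NR)   /* full: drop the oldest, which is
                                                     overwritten below */
                          pc->tail = (pc->tail + 1) % SLB_PRELOAD_NR;
                  else
                          pc->nr++;

                  pc->esid[idx] = esid;
          }

          /* Every 256 context switches, retire the oldest cached entry. */
          static void preload_age(struct slb_preload *pc)
          {
                  if (pc->nr) {
                          pc->nr--;
                          pc->tail = (pc->tail + 1) % SLB_PRELOAD_NR;
                  }
          }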