1. 20 Dec, 2018 13 commits
    • powerpc/dma: properly wire up the unmap_page and unmap_sg methods · 0aeba2d0
      Christoph Hellwig authored
      The unmap methods need to transfer memory ownership back from the
      device to the cpu by identical means as dma_sync_*_to_cpu. I'm not
      sure powerpc needs to do any work in this transfer direction, but
      given that it does invalidate the caches in dma_sync_*_to_cpu already
      we should make sure we also do so on unmapping.
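
      As a rough illustration of that ownership transfer (not the actual
      patch; the helper names and signatures below are assumptions based
      on the powerpc DMA code of that era), a non-coherent unmap hook
      would mirror the sync-for-cpu path:

        /* Sketch only: hand the buffer back to the CPU on unmap. */
        static void dma_nommu_unmap_page(struct device *dev,
                                         dma_addr_t dma_address, size_t size,
                                         enum dma_data_direction direction,
                                         unsigned long attrs)
        {
        #ifdef CONFIG_NOT_COHERENT_CACHE
                /* Same cache maintenance as the sync_single_for_cpu path */
                if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
                        __dma_sync(bus_to_virt(dma_address), size, direction);
        #endif
        }
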
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      [mpe: s/dir/direction in dma_nommu_unmap_page()]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      0aeba2d0
    • powerpc: allow NOT_COHERENT_CACHE for amigaone · 92863569
      Christoph Hellwig authored
      AMIGAONE selects NOT_COHERENT_CACHE, so we better allow it.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      92863569
    • powerpc/prom: fix early DEBUG messages · b18f0ae9
      Christophe Leroy authored
      This patch fixes early DEBUG messages in prom.c:
      - Use %px instead of %p to see the addresses
      - Cast memblock_phys_mem_size() with (unsigned long long) to
      avoid build failure when phys_addr_t is not 64 bits.
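
      For illustration only (the exact lines are assumptions, not quoted
      from the patch), the kind of change described is:

        /* %p hashes pointers; %px prints the real address for early debug */
        DBG("initial_boot_params @ %px\n", initial_boot_params);
        /* Cast so the format specifier works when phys_addr_t is 32-bit */
        DBG("memory size = 0x%llx\n",
            (unsigned long long)memblock_phys_mem_size());
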
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      b18f0ae9
    • ocxl: Fix endianness bug in ocxl_link_update_pe() · e1e71e20
      Greg Kurz authored
      All fields in the PE are big-endian. Use cpu_to_be32() like everywhere
      else something is written to the PE. Otherwise a wrong TID will be used
      by the NPU. If this TID happens to point to an existing thread sharing
      the same mm, it could be woken up in error. This is highly improbable
      though. The likely outcome of this is the NPU not finding the target
      thread and forcing the AFU into sending an interrupt, which userspace
      is supposed to handle anyway.
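
      A minimal sketch of the kind of fix described, assuming a process
      element structure with a 32-bit big-endian tid field (the field and
      variable names here are illustrative, not taken verbatim from the
      driver):

        /* All PE fields are big-endian: convert before writing the TID. */
        pe->tid = cpu_to_be32(tid);   /* previously stored without the swap */
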
      
      Fixes: e948e06f ("ocxl: Expose the thread_id needed for wait on POWER9")
      Cc: stable@vger.kernel.org      # v4.18
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      e1e71e20
    • powerpc/32: Avoid unsupported flags with clang · 72e7bcc2
      Joel Stanley authored
      When building for ppc32 with clang these flags are unsupported:
      
        -ffixed-r2 and -mmultiple
      
      llvm's lib/Target/PowerPC/PPCRegisterInfo.cpp marks r2 as reserved
      when building for SVR4ABI and !ppc64:
      
        // The SVR4 ABI reserves r2 and r13
        if (Subtarget.isSVR4ABI()) {
          // We only reserve r2 if we need to use the TOC pointer. If we have no
          // explicit uses of the TOC pointer (meaning we're a leaf function with
          // no constant-pool loads, etc.) and we have no potential uses inside an
          // inline asm block, then we can treat r2 has an ordinary callee-saved
          // register.
          const PPCFunctionInfo *FuncInfo = MF.getInfo<PPCFunctionInfo>();
          if (!TM.isPPC64() || FuncInfo->usesTOCBasePtr() || MF.hasInlineAsm())
            markSuperRegs(Reserved, PPC::R2);  // System-reserved register
          markSuperRegs(Reserved, PPC::R13); // Small Data Area pointer register
        }
      
      This means we can safely omit -ffixed-r2 when building for 32-bit
      targets.
      
      The -mmultiple/-mno-multiple flags are not supported by clang, so
      platforms that could otherwise use them miss out on the multiple-word
      load/store instructions.
      
      We wrap these flags in cc-option so that when Clang gains support the
      kernel will be able to use these flags.
      
      Clang 8 can then build a ppc44x_defconfig which boots in Qemu:
      
        make CC=clang-8 ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu-  ppc44x_defconfig
        ./scripts/config -e CONFIG_DEVTMPFS -d DEVTMPFS_MOUNT
        make CC=clang-8 ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu-
      
        qemu-system-ppc -M bamboo \
         -kernel arch/powerpc/boot/zImage \
         -dtb arch/powerpc/boot/dts/bamboo.dtb \
         -initrd ~/ppc32-440-rootfs.cpio \
         -nographic -serial stdio -monitor pty -append "console=ttyS0"
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/261
      Link: https://bugs.llvm.org/show_bug.cgi?id=39556
      Link: https://bugs.llvm.org/show_bug.cgi?id=39555
      Signed-off-by: Joel Stanley <joel@jms.id.au>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      72e7bcc2
    • raid6/ppc: Fix build for clang · e213574a
      Joel Stanley authored
      We cannot build these files with clang as it does not allow altivec
      instructions in assembly when -msoft-float is passed.
      
      Jinsong Ji <jji@us.ibm.com> wrote:
      > We currently disable Altivec/VSX support when enabling soft-float.  So
      > any usage of vector builtins will break.
      >
      > Enable Altivec/VSX with soft-float may need quite some clean up work, so
      > I guess this is currently a limitation.
      >
      > Removing -msoft-float will make it work (and we are lucky that no
      > floating point instructions will be generated as well).
      
      This is a workaround until the issue is resolved in clang.
      
      Link: https://bugs.llvm.org/show_bug.cgi?id=31177
      Link: https://github.com/ClangBuiltLinux/linux/issues/239
      Signed-off-by: Joel Stanley <joel@jms.id.au>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      e213574a
    • powerpc/perf: Remove l2 bus events from HW cache event array · 3757cba8
      Madhavan Srinivasan authored
      Remove PM_L2_ST_MISS and PM_L2_ST from the HW cache event array,
      since these are bus events and bus events need to be programmed in
      groups.
      
      Fixes: f1fb60bf ('powerpc/perf: Export Power9 generic and cache events to sysfs')
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      3757cba8
    • powerpc/perf: Add constraints for power9 l2/l3 bus events · 59029136
      Madhavan Srinivasan authored
      In previous generation processors, both bus events and direct events
      of the performance monitoring unit could be individually programmed
      and monitored in the PMCs.
      
      But in Power9, L2/L3 bus events are always available as a "bank" of
      4 events. To obtain the counts for any of the L2/L3 bus events in a
      given bank, the user has to program PMC4 with the corresponding
      L2/L3 bus event for that bank.
      
      This patch enforces two constraints for L2/L3 bus events:
      
      1) Any L2/L3 event, when programmed, is also expected to program the
      corresponding PMC4 event from that group.
      2) The PMC4 event should always be programmed first, due to a
      limitation in the group constraint logic.
      
      For example, consider these L3 bus events:
      
      PM_L3_PF_ON_CHIP_MEM (0x460A0),
      PM_L3_PF_MISS_L3 (0x160A0),
      PM_L3_CO_MEM (0x260A0),
      PM_L3_PF_ON_CHIP_CACHE (0x360A0),
      
      1) This is an INVALID group for L3 Bus event monitoring,
      since it is missing PMC4 event.
      	perf stat -e "{r160A0,r260A0,r360A0}" < >
      
      And this is a VALID group for L3 Bus events:
      	perf stat -e "{r460A0,r160A0,r260A0,r360A0}" < >
      
      2) This is an INVALID group for L3 Bus event monitoring,
      since it is missing PMC4 event.
      	perf stat -e "{r260A0,r360A0}" < >
      
      And this is a VALID group for L3 Bus events:
      	perf stat -e "{r460A0,r260A0,r360A0}" < >
      
      3) This is an INVALID group for L3 Bus event monitoring,
      since it is missing PMC4 event.
      	perf stat -e "{r360A0}" < >
      
      And this is a VALID group for L3 Bus events:
      	perf stat -e "{r460A0,r360A0}" < >
      
      This patch implements the group constraint logic suggested by
      Michael Ellerman.
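
      The kernel enforces this through the isa207 mask/value constraint
      encoding; as a self-contained illustration of rule (1) only (the
      struct, the bank count and the helper below are all invented for
      this sketch, not the kernel's code), a direct check would be:

        /* Simplified, standalone check: every L2/L3 bank used by the group
         * must also have its PMC4 event present in the group.
         */
        #define NUM_L2L3_BANKS 4        /* illustrative */

        struct p9_group_event {
                bool l2l3_bus_event;    /* is this an L2/L3 bus event?  */
                unsigned int bank;      /* which bank, 0..NUM_BANKS-1   */
                unsigned int pmc;       /* programmed counter, 1..6     */
        };

        static bool l2l3_group_is_valid(const struct p9_group_event *ev, int n)
        {
                bool used[NUM_L2L3_BANKS] = { false };
                bool has_pmc4[NUM_L2L3_BANKS] = { false };
                int i;

                for (i = 0; i < n; i++) {
                        if (!ev[i].l2l3_bus_event)
                                continue;
                        used[ev[i].bank] = true;
                        if (ev[i].pmc == 4)
                                has_pmc4[ev[i].bank] = true;
                }

                for (i = 0; i < NUM_L2L3_BANKS; i++)
                        if (used[i] && !has_pmc4[i])
                                return false;   /* bank lacks its PMC4 event */

                return true;
        }
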
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      59029136
    • powerpc/perf: Fix unit_sel/cache_sel checks · 2d46d487
      Madhavan Srinivasan authored
      The raw event code has a couple of fields, "unit" and "cache", which
      capture the unit to monitor for a given pmcxsel and the cache reload
      qualifier to program in MMCR1.
      
      isa207_get_constraint() refers to the "unit" field to update the
      MMCRC (L2/L3) event bus control fields with the "cache" bits of the
      raw event code. These are power8 specific and not supported by the
      PowerISA v3.0 PMU, so wrap the checks to make them power8 specific.
      Also, the "cache" bit field is used to update MMCR1[16:17], and this
      check can likewise be power8 specific.
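
      Conceptually the fix gates those updates on pre-ISA v3.0 CPUs. A
      hedged sketch of the gating idea (cpu_has_feature() and
      CPU_FTR_ARCH_300 are real kernel symbols; the wrapper helper is
      invented for illustration):

        /* The MMCRC bus-control and MMCR1[16:17] cache encodings are
         * power8-only, so skip them on ISA v3.0 (power9) and later.
         */
        static bool use_p8_unit_cache_encoding(void)
        {
                return !cpu_has_feature(CPU_FTR_ARCH_300);
        }
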
      
      Fixes: 7ffd948f ('powerpc/perf: factor out power8 pmu functions')
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      2d46d487
    • powerpc/perf: Cleanup cache_sel bits comment · 8c31459d
      Madhavan Srinivasan authored
      Update the raw event code comment in power9-pmu.c with respect to
      "cache" bits, since power9 MMCRC does not support these.
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      8c31459d
    • powerpc/perf: Update perf_regs structure to include SIER · 333804dc
      Madhavan Srinivasan authored
      On each sample, the Sample Instruction Event Register (SIER) content
      is saved in pt_regs. SIER does not have an entry of its own in
      pt_regs; instead, its content is saved in the "dar" field of pt_regs.
      
      This patch adds another entry to the perf_regs structure to expose
      "SIER", which internally maps to the "dar" of pt_regs.
      
      It also checks for SIER availability on the platform and presents
      the value accordingly.
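
      A rough sketch of how such a mapping looks in
      arch/powerpc/perf/perf_regs.c (the enum position, the availability
      check and the surrounding table are simplified assumptions, not the
      exact patch):

        /* New ABI register index, exposed to userspace via perf_regs. */
        enum perf_event_powerpc_regs {
                /* ... existing PERF_REG_POWERPC_* entries ... */
                PERF_REG_POWERPC_SIER,
                PERF_REG_POWERPC_MAX,
        };

        /* SIER is not a pt_regs member; it is stashed in the "dar" slot. */
        static unsigned int pt_regs_offset[PERF_REG_POWERPC_MAX] = {
                /* ... */
                PT_REGS_OFFSET(PERF_REG_POWERPC_SIER, dar),
        };

        u64 perf_reg_value(struct pt_regs *regs, int idx)
        {
                /* Platforms without a usable SIER report 0 (assumed check) */
                if (idx == PERF_REG_POWERPC_SIER &&
                    !cpu_has_feature(CPU_FTR_ARCH_207S))
                        return 0;

                return regs_get_register(regs, pt_regs_offset[idx]);
        }
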
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      333804dc
    • powerpc/perf: Fix thresholding counter data for unknown type · 17cfccc9
      Madhavan Srinivasan authored
      MMCRA[34:36] and MMCRA[38:44] expose the thresholding counter value.
      The thresholding counter can be used to count latency cycles such as
      load miss to reload. But the threshold counter value is not relevant
      when the sampled instruction type is unknown or reserved. Fix the
      thresholding counter value to be zero when the sampled instruction
      type is unknown or reserved.
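
      Illustratively (the field masks and helper names below are
      assumptions, not the exact isa207 code), the change amounts to:

        /* Report a zero threshold count for unknown/reserved sample types */
        static u64 sampled_thresh_count(u64 sier, u64 mmcra)
        {
                u64 type = (sier & SIER_TYPE_MASK) >> SIER_TYPE_SHIFT;

                if (type == SIER_TYPE_UNKNOWN || type == SIER_TYPE_RESERVED)
                        return 0;

                /* decode the MMCRA[34:36]/MMCRA[38:44] threshold fields */
                return mmcra_thresh_cnt(mmcra);
        }
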
      
      Fixes: 170a315f ('powerpc/perf: Support to export MMCRA[TEC*] field to userspace')
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      17cfccc9
    • powerpc/mm/hash: Handle user access of kernel address gracefully · 374f3f59
      Aneesh Kumar K.V authored
      In commit 2865d08d ("powerpc/mm: Move the DSISR_PROTFAULT sanity
      check") we moved the protection fault access check before the vma
      lookup. That means we hit that WARN_ON when user space accesses a
      kernel address. Before that commit this was handled by find_vma() not
      finding vma for the kernel address and considering that access as bad
      area access.
      
      Avoid the confusing WARN_ON and convert that to a ratelimited printk.
      
      With the patch we now get:
      
      for load:
        a.out[5997]: User access of kernel address (c00000000000dea0) - exploit attempt? (uid: 1000)
        a.out[5997]: segfault (11) at c00000000000dea0 nip 1317c0798 lr 7fff80d6441c code 1 in a.out[1317c0000+10000]
        a.out[5997]: code: 60000000 60420000 3c4c0002 38427790 4bffff20 3c4c0002 38427784 fbe1fff8
        a.out[5997]: code: f821ffc1 7c3f0b78 60000000 e9228030 <89290000> 993f002f 60000000 383f0040
      
      for exec:
        a.out[6067]: User access of kernel address (c00000000000dea0) - exploit attempt? (uid: 1000)
        a.out[6067]: segfault (11) at c00000000000dea0 nip c00000000000dea0 lr 129d507b0 code 1
        a.out[6067]: Bad NIP, not dumping instructions.
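
      A hedged sketch of the reporting change (the message text matches
      the output above; the surrounding control flow inside the powerpc
      fault handler is simplified here):

        /* In the DSISR_PROTFAULT path: a user access of a kernel address
         * is reported (ratelimited) and handled as a bad-area fault,
         * instead of tripping a WARN_ON().
         */
        if (is_user && address >= TASK_SIZE) {
                pr_crit_ratelimited("%s[%d]: User access of kernel address (%lx) - exploit attempt? (uid: %d)\n",
                                    current->comm, current->pid, address,
                                    from_kuid(&init_user_ns, current_uid()));
                return bad_area_nosemaphore(regs, address);
        }
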
      
      Fixes: 2865d08d ("powerpc/mm: Move the DSISR_PROTFAULT sanity check")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Tested-by: Breno Leitao <leitao@debian.org>
      [mpe: Don't split printk() string across lines]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      374f3f59
  2. 19 Dec, 2018 19 commits
  3. 17 Dec, 2018 4 commits
  4. 10 Dec, 2018 1 commit
  5. 09 Dec, 2018 3 commits
    • powerpc/mm: Fallback to RAM if the altmap is unusable · 9ef34630
      Oliver O'Halloran authored
      The "altmap" is used to provide a pool of memory that is reserved for
      the vmemmap backing of hot-plugged memory. This is useful when adding
      a large amount of ZONE_DEVICE memory to a system with a limited
      amount of normal memory.
      
      On ppc64 we use huge pages to map the vmemmap, which requires the
      backing storage to be contiguous and aligned to the hugepage size.
      The altmap implementation allows the altmap provider to reserve a few
      PFNs at the start of the range for its own use, and when this occurs
      the first chunk of the altmap is not usable for hugepage mappings. On
      hash there is no sane way to fall back to a normal sized page
      mapping, so we fail the allocation. This results in memory hotplug
      failing with ENOMEM when the new range doesn't fall into an existing
      vmemmap block.
      
      This patch handles this case by falling back to using system memory
      rather than failing if we cannot allocate from the altmap. This
      fallback should only ever be used for the first vmemmap block so it
      should not cause excess memory consumption.
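
      Conceptually the vmemmap backing allocation becomes a two-step
      attempt; a hedged sketch (altmap_alloc_block_buf() and
      vmemmap_alloc_block_buf() reflect the mm helpers of that era, but
      treat the exact signatures and surrounding code as assumptions):

        /* Prefer the altmap pool, fall back to system memory if it cannot
         * provide an aligned hugepage-sized block.
         */
        void *p = NULL;

        if (altmap)
                p = altmap_alloc_block_buf(page_size, altmap);
        if (!p) {
                if (altmap)
                        pr_debug("altmap block allocation failed, falling back to system memory\n");
                p = vmemmap_alloc_block_buf(page_size, node);
        }
        if (!p)
                return -ENOMEM;
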
      
      Fixes: 7b73d978 ("mm: pass the vmem_altmap to vmemmap_populate")
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      9ef34630
    • powerpc/papr_scm: Use ibm,unit-guid as the iset cookie · 43001c52
      Oliver O'Halloran authored
      The interleave set cookie is used to determine if a label stored in the
      metadata space should be applied to the current region. This is
      important in the case of NVDIMMs since the firmware may change the
      interleaving configuration of a DIMM which would invalidate the existing
      labels. In our case the hypervisor hides those details from us so we
      don't really care, but libnvdimm still requires the interleave set
      cookie to be non-zero.
      
      For our purposes we just need the set cookie to be unique and fixed for
      a given PAPR SCM region and using the unit-guid (really a UUID) is fine
      for this purpose.
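
      A hedged sketch of deriving a fixed per-region cookie from the
      device-tree property (of_property_read_string(), uuid_parse() and
      get_unaligned_le64() are real kernel helpers; the way the 128-bit
      UUID is folded into 64 bits here is purely illustrative):

        /* Derive a stable, non-zero 64-bit iset cookie from ibm,unit-guid */
        static int papr_scm_get_iset_cookie(struct device_node *dn, u64 *cookie)
        {
                const char *uuid_str;
                uuid_t uuid;

                if (of_property_read_string(dn, "ibm,unit-guid", &uuid_str))
                        return -ENODEV;
                if (uuid_parse(uuid_str, &uuid))
                        return -EINVAL;

                *cookie = get_unaligned_le64(&uuid.b[0]) ^
                          get_unaligned_le64(&uuid.b[8]);

                return *cookie ? 0 : -EINVAL;
        }
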
      
      Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions")
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      [mpe: Use kernel types (u64)]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      43001c52
    • powerpc/papr_scm: Fix DIMM device registration race · b0d65a8c
      Oliver O'Halloran authored
      When a new nvdimm device is registered with libnvdimm via
      nvdimm_create() it is added as a device on the nvdimm bus. The probe
      function for the DIMM driver is potentially quite slow so actually
      registering and probing the device is done in an async domain rather
      than immediately after device creation. This can result in a race where
      the region device (created 2nd) is probed first and fails to activate at
      boot.
      
      To fix this we use the same approach as the ACPI/NFIT driver which is to
      check that all the DIMM devices registered successfully. LibNVDIMM
      provides the nvdimm_bus_count_dimms() function which synchronises with
      the async domain and verifies that the dimm was successfully registered
      with the bus.
      
      If either of these does not occur then we bail.
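
      A minimal sketch of the synchronisation step described above (the
      libnvdimm helper name is taken from the commit text; its exact
      signature and return convention, and the p->bus/dev fields, are
      assumptions here):

        /* Sketch: after creating the DIMM device, synchronise with the
         * async probe domain and verify it registered with the bus; bail
         * out of probe otherwise. A 0-on-success return is assumed.
         */
        rc = nvdimm_bus_count_dimms(p->bus);
        if (rc) {
                dev_err(dev, "NVDIMM registration failed, rc = %d\n", rc);
                goto err;
        }
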
      
      Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions")
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      b0d65a8c