1. 20 Dec, 2018 6 commits
    • powerpc/perf: Add constraints for power9 l2/l3 bus events · 59029136
      Madhavan Srinivasan authored
      In previous generation processors, both bus events and direct
      events of the performance monitoring unit can be individually
      programmed and monitored in PMCs.
      
      But in Power9, L2/L3 bus events are always available as a
      "bank" of 4 events. To obtain the counts for any of the
      l2/l3 bus events in a given bank, the user will have to
      program PMC4 with corresponding l2/l3 bus event for that
      bank.
      
      This patch enforces two constraints in the case of L2/L3 bus events:
      
      1) Any L2/L3 event, when programmed, is also expected to program the
         corresponding PMC4 event from that group.
      2) The PMC4 event should always be programmed first due to a limitation
         in the group constraint logic.
      
      For example, consider these L3 bus events:
      
      PM_L3_PF_ON_CHIP_MEM (0x460A0),
      PM_L3_PF_MISS_L3 (0x160A0),
      PM_L3_CO_MEM (0x260A0),
      PM_L3_PF_ON_CHIP_CACHE (0x360A0),
      
      1) This is an INVALID group for L3 bus event monitoring,
      since it is missing the PMC4 event:
      	perf stat -e "{r160A0,r260A0,r360A0}" < >
      
      And this is a VALID group for L3 bus events:
      	perf stat -e "{r460A0,r160A0,r260A0,r360A0}" < >
      
      2) This is an INVALID group for L3 bus event monitoring,
      since it is missing the PMC4 event:
      	perf stat -e "{r260A0,r360A0}" < >
      
      And this is a VALID group for L3 bus events:
      	perf stat -e "{r460A0,r260A0,r360A0}" < >
      
      3) This is an INVALID group for L3 bus event monitoring,
      since it is missing the PMC4 event:
      	perf stat -e "{r360A0}" < >
      
      And this is a VALID group for L3 bus events:
      	perf stat -e "{r460A0,r360A0}" < >
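
The valid and invalid groups above can be checked with a small model of the two constraints. This is only a sketch, not the kernel's constraint code: it assumes the PMC number is held in bits 16-19 of the raw event code (which matches the leading digit of the example codes), and the helper names are hypothetical.

```python
# Hypothetical model of the two L2/L3 bus event constraints.
# Assumption: the PMC number sits in bits 16-19 of the raw event code.
PMC_SHIFT = 16
PMC_MASK = 0xF


def pmc_of(raw_event):
    """Return the PMC number encoded in a raw event code."""
    return (raw_event >> PMC_SHIFT) & PMC_MASK


def is_valid_bus_group(events):
    """Check a group whose members are all bus events from one bank.

    The group must contain the PMC4 event, and that event must come first.
    """
    return bool(events) and pmc_of(events[0]) == 4


if __name__ == "__main__":
    groups = [
        [0x160A0, 0x260A0, 0x360A0],
        [0x460A0, 0x160A0, 0x260A0, 0x360A0],
        [0x360A0],
        [0x460A0, 0x360A0],
    ]
    for group in groups:
        names = ",".join("r%X" % event for event in group)
        verdict = "VALID" if is_valid_bus_group(group) else "INVALID"
        print("{%s}: %s" % (names, verdict))
```

Checking only the first event covers both constraints at once: a group without a PMC4 event cannot have one in first position.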
      
      This patch implements the group constraint logic suggested by Michael Ellerman.
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/perf: Fix unit_sel/cache_sel checks · 2d46d487
      Madhavan Srinivasan authored
      The raw event code has a couple of fields, "unit" and "cache", which capture
      the "unit" to monitor for a given pmcxsel and the cache reload qualifier to
      program in MMCR1.
      
      isa207_get_constraint() refers to the "unit" field to update the MMCRC (L2/L3)
      Event bus control fields with the "cache" bits of the raw event code.
      These are power8 specific and not supported by the PowerISA v3.0 PMU, so wrap
      the checks to be power8 specific. The "cache" bit field is also used to
      update MMCR1[16:17], and this check can be power8 specific as well.
      
      Fixes: 7ffd948f ('powerpc/perf: factor out power8 pmu functions')
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/perf: Cleanup cache_sel bits comment · 8c31459d
      Madhavan Srinivasan authored
      Update the raw event code comment in power9-pmu.c with respect to
      "cache" bits, since power9 MMCRC does not support these.
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/perf: Update perf_regs structure to include SIER · 333804dc
      Madhavan Srinivasan authored
      On each sample, the Sample Instruction Event Register (SIER) content
      is saved in pt_regs. SIER does not have an entry of its own in pt_regs;
      instead, the SIER content is saved in the "dar" register of pt_regs.
      
      This patch adds another entry to the perf_regs structure to include "SIER"
      printing, which internally maps to the "dar" of pt_regs.
      
      It also checks for SIER availability on the platform and presents the
      value accordingly.
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/perf: Fix thresholding counter data for unknown type · 17cfccc9
      Madhavan Srinivasan authored
      MMCRA[34:36] and MMCRA[38:44] expose the thresholding counter value.
      The thresholding counter can be used to count latency cycles such as
      load miss to reload. But the threshold counter value is not relevant
      when the sampled instruction type is unknown or reserved. This patch
      sets the thresholding counter value to zero when the sampled instruction
      type is unknown or reserved.
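
The behaviour described above can be sketched as follows. This is a hedged illustration, not the kernel code. The field positions come from the commit text, read in IBM bit numbering where bit 0 is the most significant bit of the 64-bit register. Treating MMCRA[34:36] as an exponent and MMCRA[38:44] as a mantissa, combining them as mantissa << (2 * exponent), and using 0 and 7 as the unknown and reserved type values are all assumptions made for the example.

```python
def ibm_field(value, first_bit, last_bit, width=64):
    """Extract bits [first_bit:last_bit] in IBM numbering (bit 0 is the MSB)."""
    shift = (width - 1) - last_bit
    mask = (1 << (last_bit - first_bit + 1)) - 1
    return (value >> shift) & mask


# Assumption: these sampled instruction type values mean unknown / reserved.
UNKNOWN_OR_RESERVED_TYPES = {0, 7}


def threshold_counter(mmcra, sampled_type):
    """Return the thresholding counter value, or 0 when it is not relevant."""
    if sampled_type in UNKNOWN_OR_RESERVED_TYPES:
        return 0
    exponent = ibm_field(mmcra, 34, 36)   # assumed to be the exponent
    mantissa = ibm_field(mmcra, 38, 44)   # assumed to be the mantissa
    return mantissa << (2 * exponent)     # assumed combination rule


if __name__ == "__main__":
    # Exponent 1 at MMCRA[34:36], mantissa 3 at MMCRA[38:44].
    mmcra = (1 << 27) | (3 << 19)
    print(threshold_counter(mmcra, 1))
    print(threshold_counter(mmcra, 0))
```

With IBM numbering, a field ending at bit 36 of a 64-bit register starts 63 - 36 = 27 bits above the least significant bit, which is where the shifts in the usage example come from.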
      
      Fixes: 170a315f ('powerpc/perf: Support to export MMCRA[TEC*] field to userspace')
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/hash: Handle user access of kernel address gracefully · 374f3f59
      Aneesh Kumar K.V authored
      In commit 2865d08d ("powerpc/mm: Move the DSISR_PROTFAULT sanity
      check") we moved the protection fault access check before the vma
      lookup. That means we hit that WARN_ON when user space accesses a
      kernel address. Before that commit this was handled by find_vma() not
      finding vma for the kernel address and considering that access as bad
      area access.
      
      Avoid the confusing WARN_ON and convert that to a ratelimited printk.
      
      With the patch we now get:
      
      for load:
        a.out[5997]: User access of kernel address (c00000000000dea0) - exploit attempt? (uid: 1000)
        a.out[5997]: segfault (11) at c00000000000dea0 nip 1317c0798 lr 7fff80d6441c code 1 in a.out[1317c0000+10000]
        a.out[5997]: code: 60000000 60420000 3c4c0002 38427790 4bffff20 3c4c0002 38427784 fbe1fff8
        a.out[5997]: code: f821ffc1 7c3f0b78 60000000 e9228030 <89290000> 993f002f 60000000 383f0040
      
      for exec:
        a.out[6067]: User access of kernel address (c00000000000dea0) - exploit attempt? (uid: 1000)
        a.out[6067]: segfault (11) at c00000000000dea0 nip c00000000000dea0 lr 129d507b0 code 1
        a.out[6067]: Bad NIP, not dumping instructions.
      
      Fixes: 2865d08d ("powerpc/mm: Move the DSISR_PROTFAULT sanity check")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Tested-by: Breno Leitao <leitao@debian.org>
      [mpe: Don't split printk() string across lines]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 19 Dec, 2018 19 commits
  3. 17 Dec, 2018 4 commits
  4. 10 Dec, 2018 1 commit
  5. 09 Dec, 2018 5 commits
    • powerpc/mm: Fallback to RAM if the altmap is unusable · 9ef34630
      Oliver O'Halloran authored
      The "altmap" is used to provide a pool of memory that is reserved for
      the vmemmap backing of hot-plugged memory. This is useful when adding
      large amounts of ZONE_DEVICE memory to a system with a limited amount of
      normal memory.
      
      On ppc64 we use huge pages to map the vmemmap, which requires the backing
      storage to be contiguous and aligned to the hugepage size. The altmap
      implementation allows the altmap provider to reserve a few PFNs at
      the start of the range for its own uses, and when this occurs the
      first chunk of the altmap is not usable for hugepage mappings. On hash
      there is no sane way to fall back to a normal sized page mapping so we
      fail the allocation. This results in memory hotplug failing with
      ENOMEM when the new range doesn't fall into an existing vmemmap block.
      
      This patch handles this case by falling back to using system memory
      rather than failing if we cannot allocate from the altmap. This
      fallback should only ever be used for the first vmemmap block so it
      should not cause excess memory consumption.
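
The fallback described above follows a simple try-then-fall-back shape, sketched here as a toy model rather than the kernel's vmemmap code. The two allocator callables and the function name are hypothetical, and each allocator is assumed to return an address, or None when it cannot satisfy the request.

```python
def populate_vmemmap_block(size, altmap_alloc, system_alloc):
    """Pick backing memory for one vmemmap block.

    altmap_alloc and system_alloc are hypothetical callables that return an
    address, or None when they cannot satisfy the request. altmap_alloc may
    itself be None when no altmap was supplied.
    """
    if altmap_alloc is not None:
        addr = altmap_alloc(size)
        if addr is not None:
            return "altmap", addr
    # Fallback from the commit text: use system memory instead of failing.
    addr = system_alloc(size)
    if addr is None:
        raise MemoryError("no backing memory for vmemmap block")
    return "ram", addr


if __name__ == "__main__":
    # The altmap cannot provide an aligned chunk, so system memory is used.
    print(populate_vmemmap_block(16, lambda size: None, lambda size: 0x1000))
```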
      
      Fixes: 7b73d978 ("mm: pass the vmem_altmap to vmemmap_populate")
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/papr_scm: Use ibm,unit-guid as the iset cookie · 43001c52
      Oliver O'Halloran authored
      The interleave set cookie is used to determine if a label stored in the
      metadata space should be applied to the current region. This is
      important in the case of NVDIMMs since the firmware may change the
      interleaving configuration of a DIMM which would invalidate the existing
      labels. In our case the hypervisor hides those details from us so we
      don't really care, but libnvdimm still requires the interleave set
      cookie to be non-zero.
      
      For our purposes we just need the set cookie to be unique and fixed for
      a given PAPR SCM region and using the unit-guid (really a UUID) is fine
      for this purpose.
      
      Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions")
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      [mpe: Use kernel types (u64)]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/papr_scm: Fix DIMM device registration race · b0d65a8c
      Oliver O'Halloran authored
      When a new nvdimm device is registered with libnvdimm via
      nvdimm_create() it is added as a device on the nvdimm bus. The probe
      function for the DIMM driver is potentially quite slow so actually
      registering and probing the device is done in an async domain rather
      than immediately after device creation. This can result in a race where
      the region device (created 2nd) is probed first and fails to activate at
      boot.
      
      To fix this we use the same approach as the ACPI/NFIT driver which is to
      check that all the DIMM devices registered successfully. LibNVDIMM
      provides the nvdimm_bus_count_dimms() function which synchronises with
      the async domain and verifies that the dimm was successfully registered
      with the bus.
      
      If either of these does not occur then we bail.
      
      Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions")
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/papr_scm: Remove endian conversions · 409dd7dc
      Oliver O'Halloran authored
      The return values of a h-call are returned in the CPU registers and
      written to the provided buffer by the plpar_hcall() wrapper. As a result
      the values written to memory are always in the native endian and should
      not be byte swapped.
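
The point about native endianness can be illustrated with a small model. It is only a sketch: the helper names are hypothetical, and the buffer write merely stands in for what the commit says the plpar_hcall() wrapper does, namely storing register values to memory in the CPU's native byte order.

```python
import sys


def store_register(value):
    """Model a 64-bit register value being written to a buffer in native order."""
    return value.to_bytes(8, sys.byteorder)


def load_native(buf):
    """Correct read: interpret the buffer in the host's native byte order."""
    return int.from_bytes(buf, sys.byteorder)


def load_as_big_endian(buf):
    """Read that applies a big-endian conversion regardless of the host."""
    return int.from_bytes(buf, "big")


if __name__ == "__main__":
    value = 0x1122334455667788
    buf = store_register(value)
    print(hex(load_native(buf)))
    # On a little-endian host the converted read differs from the original.
    print(hex(load_as_big_endian(buf)))
```

The native read round-trips on any host, while the converted read only matches on a big-endian host, which is consistent with the problem going unnoticed until a second implementation was brought up.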
      
      The initial implementation of the H-Call interface was done in qemu and
      the returned values were byte swapped unnecessarily in both the
      hypervisor and in the driver so this was only noticed when bringing up
      the PowerVM implementation.
      
      Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions")
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/papr_scm: Update DT properties · 683ec0e0
      Oliver O'Halloran authored
      The ibm,unit-sizes property was originally specified as an array of two
      u32s corresponding to the memory block size, and the number of blocks
      available in that region. A fairly last-minute change to the SCM DT
      specification split that into two separate u64 properties:
      ibm,block-sizes and ibm,number-of-blocks that convey the same
      information. No firmware / hypervisor that emitted the ibm,unit-size
      property ever appeared in the wild.
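
Reading the two u64 properties could look like the sketch below. It is an illustration only, not the driver code. Device tree property values are stored big-endian, so each u64 is decoded with ">Q". The dictionary standing in for a device tree node, the helper names, and the idea that the region size is the product of the two values are assumptions made for the example; the property names are the ones given in the commit text.

```python
import struct


def read_u64_prop(props, name):
    """Decode a single big-endian u64 device tree property.

    props is a hypothetical dict mapping property names to raw bytes.
    """
    raw = props[name]
    if len(raw) != 8:
        raise ValueError("%s is not a single u64" % name)
    return struct.unpack(">Q", raw)[0]


def region_size(props):
    """Assumed region size: block size multiplied by the number of blocks."""
    block_size = read_u64_prop(props, "ibm,block-sizes")
    blocks = read_u64_prop(props, "ibm,number-of-blocks")
    return block_size * blocks


if __name__ == "__main__":
    node = {
        "ibm,block-sizes": struct.pack(">Q", 4096),
        "ibm,number-of-blocks": struct.pack(">Q", 256),
    }
    print(region_size(node))
```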
      
      Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions")
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      [mpe: Use kernel types (u32/u64)]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  6. 07 Dec, 2018 3 commits
  7. 06 Dec, 2018 1 commit
    • powerpc/boot: Fix build failures with -j 1 · e41b93a6
      Michael Ellerman authored
      In commit 5e9dcb61 ("powerpc/boot: Expose Kconfig symbols to
      wrapper") we added a dependency to serial.c on autoconf.h:
      
        $(obj)/serial.c: $(obj)/autoconf.h
      
      This works when building in-tree (ie. with KBUILD_OUTPUT unset)
      because the obj tree is the src tree.
      
      But when building with eg. O=build and -j 1 the build fails:
      
        gcc ... -I../arch/powerpc/boot -c -o arch/powerpc/boot/serial.o arch/powerpc/boot/serial.c
        gcc: error: arch/powerpc/boot/serial.c: No such file or directory
      
      Why this only happens with -j 1 is not clear. When building with
      -j greater than 1 we somehow decide to look for serial.c in the src
      tree (../), eg:
      
        gcc -I../arch/powerpc/boot -c -o arch/powerpc/boot/serial.o ../arch/powerpc/boot/serial.c
      
      Regardless we shouldn't be specifying a dependency on serial.c in the
      build tree, we want to add a dependency to the version in $(srctree)
      so fix the rule to say that.
      
      Fixes: 5e9dcb61 ("powerpc/boot: Expose Kconfig symbols to wrapper")
      Tested-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  8. 04 Dec, 2018 1 commit
    • powerpc/mm: dump block address translation on book3s/32 · 7c91efce
      Christophe Leroy authored
      This patch adds a debugfs file to dump block address translation:
      
      ~# cat /sys/kernel/debug/powerpc/block_address_translation
      ---[ Instruction Block Address Translations ]---
      0:         -
      1:         -
      2: 0xc0000000-0xcfffffff 0x00000000 Kernel EXEC coherent
      3: 0xd0000000-0xdfffffff 0x10000000 Kernel EXEC coherent
      4:         -
      5:         -
      6:         -
      7:         -
      
      ---[ Data Block Address Translations ]---
      0:         -
      1:         -
      2: 0xc0000000-0xcfffffff 0x00000000 Kernel RW coherent
      3: 0xd0000000-0xdfffffff 0x10000000 Kernel RW coherent
      4:         -
      5:         -
      6:         -
      7:         -
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>