1. 19 Dec, 2018 10 commits
  2. 17 Dec, 2018 4 commits
  3. 10 Dec, 2018 1 commit
  4. 09 Dec, 2018 5 commits
    • powerpc/mm: Fallback to RAM if the altmap is unusable · 9ef34630
      Oliver O'Halloran authored
      The "altmap" is used to provide a pool of memory that is reserved for
      the vmemmap backing of hot-plugged memory. This is useful when adding
      large amounts of ZONE_DEVICE memory to a system with a limited amount of
      normal memory.
      
      On ppc64 we use huge pages to map the vmemmap, which requires the backing
      storage to be contiguous and aligned to the hugepage size. The altmap
      implementation allows the altmap provider to reserve a few PFNs at
      the start of the range for its own uses, and when this occurs the
      first chunk of the altmap is not usable for hugepage mappings. On hash
      there is no sane way to fall back to a normal sized page mapping so we
      fail the allocation. This results in memory hotplug failing with
      ENOMEM when the new range doesn't fall into an existing vmemmap block.
      
      This patch handles this case by falling back to using system memory
      rather than failing if we cannot allocate from the altmap. This
      fallback should only ever be used for the first vmemmap block so it
      should not cause excess memory consumption.
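
The fallback described above can be illustrated with a small userspace sketch. It is not the kernel code: `struct pool`, `pool_alloc()` and `populate_block()` are hypothetical stand-ins for the altmap and the vmemmap population path, and `aligned_alloc()` stands in for an allocation from system memory.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an altmap: a reserved pool of bytes. */
struct pool {
	size_t size;	/* total bytes in the reserved pool */
	size_t next;	/* first free offset (past anything the provider reserved) */
};

/* Where the backing for one block ended up. */
struct backing {
	bool from_pool;
	size_t pool_off;	/* valid when from_pool is true */
	void *sysmem;		/* valid when from_pool is false */
};

/* Return the offset of a block-aligned chunk, or (size_t)-1 if none fits. */
static size_t pool_alloc(struct pool *p, size_t block)
{
	size_t start = (p->next + block - 1) / block * block;

	if (start + block > p->size)
		return (size_t)-1;
	p->next = start + block;
	return start;
}

/* Try the pool first; fall back to system memory instead of failing. */
static int populate_block(struct pool *p, size_t block, struct backing *out)
{
	size_t off;

	assert(block != 0 && out != NULL);
	off = p ? pool_alloc(p, block) : (size_t)-1;
	if (off != (size_t)-1) {
		out->from_pool = true;
		out->pool_off = off;
		out->sysmem = NULL;
		return 0;
	}
	out->from_pool = false;
	out->pool_off = 0;
	out->sysmem = aligned_alloc(block, block);	/* the fallback path */
	return out->sysmem ? 0 : -ENOMEM;
}
```

In this model the only error left is the one where system memory is exhausted as well, which mirrors the intent of the patch.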
      
      Fixes: 7b73d978 ("mm: pass the vmem_altmap to vmemmap_populate")
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      9ef34630
    • powerpc/papr_scm: Use ibm,unit-guid as the iset cookie · 43001c52
      Oliver O'Halloran authored
      The interleave set cookie is used to determine if a label stored in the
      metadata space should be applied to the current region. This is
      important in the case of NVDIMMs since the firmware may change the
      interleaving configuration of a DIMM which would invalidate the existing
      labels. In our case the hypervisor hides those details from us so we
      don't really care, but libnvdimm still requires the interleave set
      cookie to be non-zero.
      
      For our purposes we just need the set cookie to be unique and fixed for
      a given PAPR SCM region, and the unit-guid (really a UUID) meets both
      requirements.
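
One way to picture this is to split the 16 raw UUID bytes into two 64-bit halves. The sketch below is a userspace illustration only; `struct iset_cookie` and `cookie_from_uuid()` are hypothetical names, and how the driver actually derives the cookie from the parsed UUID is not shown in this message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pair of cookie values derived from one UUID. */
struct iset_cookie {
	uint64_t cookie1;
	uint64_t cookie2;
};

/* Split the 16 raw UUID bytes into two 64-bit values. */
static struct iset_cookie cookie_from_uuid(const unsigned char uuid[16])
{
	struct iset_cookie c;

	assert(uuid != NULL);
	memcpy(&c.cookie1, uuid, 8);		/* first half of the UUID */
	memcpy(&c.cookie2, uuid + 8, 8);	/* second half of the UUID */
	return c;
}
```

The result is fixed for a given UUID and differs between UUIDs. Only an all-zero UUID would produce a zero cookie in this sketch.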
      
      Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions")
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      [mpe: Use kernel types (u64)]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      43001c52
    • powerpc/papr_scm: Fix DIMM device registration race · b0d65a8c
      Oliver O'Halloran authored
      When a new nvdimm device is registered with libnvdimm via
      nvdimm_create() it is added as a device on the nvdimm bus. The probe
      function for the DIMM driver is potentially quite slow so actually
      registering and probing the device is done in an async domain rather
      than immediately after device creation. This can result in a race where
      the region device (created 2nd) is probed first and fails to activate at
      boot.
      
      To fix this we use the same approach as the ACPI/NFIT driver which is to
      check that all the DIMM devices registered successfully. LibNVDIMM
      provides the nvdimm_bus_count_dimms() function which synchronises with
      the async domain and verifies that the DIMM was successfully registered
      with the bus.
      
      If either of these does not occur then we bail.
      
      Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions")
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      b0d65a8c
    • powerpc/papr_scm: Remove endian conversions · 409dd7dc
      Oliver O'Halloran authored
      The return values of an h-call are returned in the CPU registers and
      written to the provided buffer by the plpar_hcall() wrapper. As a result
      the values written to memory are always in native endianness and should
      not be byte swapped.
      
      The initial implementation of the H-Call interface was done in qemu and
      the returned values were byte swapped unnecessarily in both the
      hypervisor and in the driver so this was only noticed when bringing up
      the PowerVM implementation.
      
      Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions")
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      409dd7dc
    • powerpc/papr_scm: Update DT properties · 683ec0e0
      Oliver O'Halloran authored
      The ibm,unit-sizes property was originally specified as an array of two
      u32s corresponding to the memory block size, and the number of blocks
      available in that region. A fairly last-minute change to the SCM DT
      specification split that into two separate u64 properties:
      ibm,block-sizes and ibm,number-of-blocks that convey the same
      information. No firmware / hypervisor that emitted the ibm,unit-size
      property ever appeared in the wild.
      
      Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions")
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      [mpe: Use kernel types (u32/u64)]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      683ec0e0
  5. 07 Dec, 2018 3 commits
  6. 06 Dec, 2018 1 commit
    • powerpc/boot: Fix build failures with -j 1 · e41b93a6
      Michael Ellerman authored
      In commit 5e9dcb61 ("powerpc/boot: Expose Kconfig symbols to
      wrapper") we added a dependency to serial.c on autoconf.h:
      
        $(obj)/serial.c: $(obj)/autoconf.h
      
      This works when building in-tree (ie. with KBUILD_OUTPUT unset)
      because the obj tree is the src tree.
      
      But when building with eg. O=build and -j 1 the build fails:
      
        gcc ... -I../arch/powerpc/boot -c -o arch/powerpc/boot/serial.o arch/powerpc/boot/serial.c
        gcc: error: arch/powerpc/boot/serial.c: No such file or directory
      
      Why this only happens with -j 1 is not clear. When building with
      -j greater than 1 we somehow decide to look for serial.c in the src
      tree (../), eg:
      
        gcc -I../arch/powerpc/boot -c -o arch/powerpc/boot/serial.o ../arch/powerpc/boot/serial.c
      
      Regardless, we shouldn't be specifying a dependency on serial.c in the
      build tree. We want to add a dependency to the version in $(srctree),
      so fix the rule to say that.
      
      Fixes: 5e9dcb61 ("powerpc/boot: Expose Kconfig symbols to wrapper")
      Tested-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      e41b93a6
  7. 04 Dec, 2018 16 commits
    • powerpc/mm: dump block address translation on book3s/32 · 7c91efce
      Christophe Leroy authored
      This patch adds a debugfs file to dump block address translation:
      
      ~# cat /sys/kernel/debug/powerpc/block_address_translation
      ---[ Instruction Block Address Translations ]---
      0:         -
      1:         -
      2: 0xc0000000-0xcfffffff 0x00000000 Kernel EXEC coherent
      3: 0xd0000000-0xdfffffff 0x10000000 Kernel EXEC coherent
      4:         -
      5:         -
      6:         -
      7:         -
      
      ---[ Data Block Address Translations ]---
      0:         -
      1:         -
      2: 0xc0000000-0xcfffffff 0x00000000 Kernel RW coherent
      3: 0xd0000000-0xdfffffff 0x10000000 Kernel RW coherent
      4:         -
      5:         -
      6:         -
      7:         -
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      7c91efce
    • powerpc/mm: dump segment registers on book3s/32 · 0261a508
      Christophe Leroy authored
      This patch creates a debugfs file to show the content of the
      segment registers:
      
        # cat /sys/kernel/debug/segment_registers
        ---[ User Segments ]---
        0x00000000-0x0fffffff Kern key 1 User key 1 VSID 0xade2b0
        0x10000000-0x1fffffff Kern key 1 User key 1 VSID 0xade3c1
        0x20000000-0x2fffffff Kern key 1 User key 1 VSID 0xade4d2
        0x30000000-0x3fffffff Kern key 1 User key 1 VSID 0xade5e3
        0x40000000-0x4fffffff Kern key 1 User key 1 VSID 0xade6f4
        0x50000000-0x5fffffff Kern key 1 User key 1 VSID 0xade805
        0x60000000-0x6fffffff Kern key 1 User key 1 VSID 0xade916
        0x70000000-0x7fffffff Kern key 1 User key 1 VSID 0xadea27
        0x80000000-0x8fffffff Kern key 1 User key 1 VSID 0xadeb38
        0x90000000-0x9fffffff Kern key 1 User key 1 VSID 0xadec49
        0xa0000000-0xafffffff Kern key 1 User key 1 VSID 0xaded5a
        0xb0000000-0xbfffffff Kern key 1 User key 1 VSID 0xadee6b
      
        ---[ Kernel Segments ]---
        0xc0000000-0xcfffffff Kern key 0 User key 1 VSID 0x000ccc
        0xd0000000-0xdfffffff Kern key 0 User key 1 VSID 0x000ddd
        0xe0000000-0xefffffff Kern key 0 User key 1 VSID 0x000eee
        0xf0000000-0xffffffff Kern key 0 User key 1 VSID 0x000fff
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      [mpe: Move it under /sys/kernel/debug/powerpc, make sr_init() __init]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      0261a508
    • powerpc/math-emu: Update macros from GCC · b682c869
      Joel Stanley authored
      The add_ssaaaa, sub_ddmmss, umul_ppmm and udiv_qrnnd macros originate
      from GCC's longlong.h which in turn was copied from GMP's longlong.h a
      few decades ago.
      
      This was found when compiling with clang:
      
         arch/powerpc/math-emu/fnmsub.c:46:2: error: invalid use of a cast in a
         inline asm context requiring an l-value: remove the cast or build with
         -fheinous-gnu-extensions
                 FP_ADD_D(R, T, B);
                 ^~~~~~~~~~~~~~~~~
         ...
      
         ./arch/powerpc/include/asm/sfp-machine.h:283:27: note: expanded from
         macro 'sub_ddmmss'
                        : "=r" ((USItype)(sh)),                                  \
                                ~~~~~~~~~~^~~
      
      Segher points out: this was fixed in GCC over 16 years ago
      ( https://gcc.gnu.org/r56600 ), and in GMP (where it comes from)
      presumably before that.
      
      Update the add_ssaaaa, sub_ddmmss, umul_ppmm and udiv_qrnnd macros to
      the latest GCC version in order to get rid of the invalid casts. These
      were taken as-is from GCC's longlong.h in order to make future syncs
      obvious. Other parts of sfp-machine.h were left as-is as the file
      contains more features than present in longlong.h.
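
For readers unfamiliar with these helpers: add_ssaaaa and sub_ddmmss perform two-word addition and subtraction with carry or borrow between the words. The sketch below expresses the same arithmetic as portable C functions. It is not the inline-asm macro text taken from GCC, and the function names are hypothetical. The outputs are plain pointers, so no cast is ever applied to an assignment target, which is the construct clang rejected.

```c
#include <assert.h>
#include <stdint.h>

/* (*sh:*sl) = (ah:al) + (bh:bl), with the carry out of the low word added to the high word. */
static void add_two_words(uint32_t *sh, uint32_t *sl,
			  uint32_t ah, uint32_t al,
			  uint32_t bh, uint32_t bl)
{
	uint32_t lo;

	assert(sh != 0 && sl != 0);
	lo = al + bl;			/* may wrap around */
	*sh = ah + bh + (lo < al);	/* (lo < al) is 1 exactly when it wrapped */
	*sl = lo;
}

/* (*sh:*sl) = (ah:al) - (bh:bl), with the borrow from the low word taken from the high word. */
static void sub_two_words(uint32_t *sh, uint32_t *sl,
			  uint32_t ah, uint32_t al,
			  uint32_t bh, uint32_t bl)
{
	uint32_t lo;

	assert(sh != 0 && sl != 0);
	lo = al - bl;			/* may wrap around */
	*sh = ah - bh - (al < bl);	/* (al < bl) is 1 exactly when a borrow occurred */
	*sl = lo;
}
```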
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/260
      Signed-off-by: Joel Stanley <joel@jms.id.au>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      b682c869
    • powerpc/tools/checkpatch: Ignore DT_SPLIT_BINDING_PATCH · afa202b6
      Russell Currey authored
      From what I've seen, every time this warning comes up it's bogus,
      so let's ignore it.
      Signed-off-by: Russell Currey <ruscur@russell.cc>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      afa202b6
    • powerpc/8xx: regroup TLB handler routines · b14fc502
      Christophe Leroy authored
      As this is running with MMU off, the CPU only does speculative
      fetch for code in the same page.
      
      Following the significant size reduction of TLB handler routines,
      the side handlers can be brought back close to the main part,
      ie in the same page.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      b14fc502
    • powerpc/8xx: don't use r12/SPRN_SPRG_SCRATCH2 in TLB Miss handlers · 74fabcad
      Christophe Leroy authored
      This patch reworks the TLB Miss handler so that it no longer uses the
      r12 register, which avoids having to save it into SPRN_SPRG_SCRATCH2.
      
      In the DAR Fixup code we can now use SPRN_M_TW, freeing
      SPRN_SPRG_SCRATCH2.
      
      Then SPRN_SPRG_SCRATCH2 may be used for something else in the future.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      74fabcad
    • powerpc/8xx: reintroduce 16K pages with HW assistance · 55c8fc3f
      Christophe Leroy authored
      Using this HW assistance implies some constraints on the
      page table structure:
      - Regardless of the main page size used (4k or 16k), the
      level 1 table (PGD) contains 1024 entries and each PGD entry covers
      a 4 Mbyte area which is managed by a level 2 table (PTE) that also
      contains 1024 entries, each describing a 4k page.
      - 16k pages require 4 identical entries in the L2 table
      - 512k page PTEs have to be spread every 128 bytes in the L2 table
      - 8M page PTEs are at the address pointed to by the L1 entry and each
      8M page requires 2 identical entries in the PGD.
      
      In order to use hardware assistance with 16K pages, this patch does
      the following modifications:
      - Make PGD size independent of the main page size
      - In 16k pages mode, redefine pte_t as a struct with 4 elements,
      and populate those 4 elements in __set_pte_at() and pte_update()
      - Adapt the size of the hugepage tables.
      - Define a PTE_FRAGMENT_NB so that a 16k page contains 4 page tables.
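
The second modification can be pictured as below. This is a userspace sketch under assumptions: `pte_basic_t` is taken to be a 32-bit value, and `pte16k_t` and `set_pte16k()` are hypothetical names rather than the kernel's pte_t and __set_pte_at().

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t pte_basic_t;	/* assumed width of one hardware entry */

/* In 16k mode one logical PTE spans four consecutive 4k-style entries. */
typedef struct {
	pte_basic_t pte, pte1, pte2, pte3;
} pte16k_t;

/* Populate the four identical L2 entries that back one 16k page. */
static void set_pte16k(pte16k_t *ptep, pte_basic_t val)
{
	assert(ptep != 0);
	ptep->pte = ptep->pte1 = ptep->pte2 = ptep->pte3 = val;
}
```

With this layout the hardware finds the same entry whichever 4k slice of the 16k page it looks up.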
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      55c8fc3f
    • powerpc/8xx: Enable 512k hugepage support with HW assistance · 3fb69c6a
      Christophe Leroy authored
      For using 512k pages with hardware assistance, the PTEs have to be spread
      every 128 bytes in the L2 table.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      3fb69c6a
    • powerpc/8xx: Enable 8M hugepage support with HW assistance · 22569b88
      Christophe Leroy authored
      HW assistance naturally supports 8M huge pages without
      further modifications.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      22569b88
    • powerpc/8xx: Use hardware assistance in TLB handlers · 6a8f911b
      Christophe Leroy authored
      Today, on the 8xx the TLB handlers do SW tablewalk by doing all
      the calculation in ASM, in order to match with the Linux page
      table structure.
      
      The 8xx offers hardware assistance which allows a significant size
      reduction of the TLB handlers and hence also reduces the time spent
      in the handlers.
      
      However, using this HW assistance implies some constraints on the
      page table structure:
      - Regardless of the main page size used (4k or 16k), the
      level 1 table (PGD) contains 1024 entries and each PGD entry covers
      a 4 Mbyte area which is managed by a level 2 table (PTE) that also
      contains 1024 entries, each describing a 4k page.
      - 16k pages require 4 identical entries in the L2 table
      - 512k page PTEs have to be spread every 128 bytes in the L2 table
      - 8M page PTEs are at the address pointed to by the L1 entry and each
      8M page requires 2 identical entries in the PGD.
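
The first constraint fixes how a 32-bit effective address is split: 1024 L1 entries of 4 Mbytes each cover the 4 Gbyte space, and 1024 L2 entries of 4k each cover one 4 Mbyte area. A minimal sketch of that index arithmetic, with hypothetical macro and function names:

```c
#include <assert.h>
#include <stdint.h>

#define L1_SHIFT	22	/* each L1 (PGD) entry covers 4 Mbytes */
#define L2_SHIFT	12	/* each L2 (PTE) entry covers 4 kbytes */
#define L1_ENTRIES	1024u
#define L2_ENTRIES	1024u

/* The two table sizes follow from the shifts on a 32-bit address. */
static_assert((1u << (32 - L1_SHIFT)) == L1_ENTRIES, "L1 covers 4 Gbytes");
static_assert((1u << (L1_SHIFT - L2_SHIFT)) == L2_ENTRIES, "L2 covers 4 Mbytes");

/* Index of the L1 entry for an effective address. */
static unsigned int l1_index(uint32_t ea)
{
	return ea >> L1_SHIFT;
}

/* Index of the 4k L2 entry within the table selected by l1_index(). */
static unsigned int l2_index(uint32_t ea)
{
	return (ea >> L2_SHIFT) & (L2_ENTRIES - 1);
}
```

For example, the kernel base address 0xc0000000 falls in L1 entry 768, L2 entry 0.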
      
      This patch modifies the TLB handlers to use HW assistance for 4K PAGES.
      
      Before this patch, the mean time spent in TLB miss handlers is:
      - ITLB miss: 80 ticks
      - DTLB miss: 62 ticks
      After this patch, the mean time spent in TLB miss handlers is:
      - ITLB miss: 72 ticks
      - DTLB miss: 54 ticks
      So the improvement is 10% for ITLB misses and 13% for DTLB misses.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      6a8f911b
    • powerpc/8xx: Temporarily disable 16k pages and hugepages · 5af543be
      Christophe Leroy authored
      In preparation for making use of hardware assistance in TLB handlers,
      this patch temporarily disables 16K pages and hugepages. The reason
      is that when using HW assistance in 4K pages mode, the Linux model
      fits the HW model for 4K pages and 8M pages.
      
      However, for 16K pages and 512K hugepages some additional work is needed
      to make the Linux model fit the HW model.
      The 8M pages will naturally come back when we switch to
      HW assistance, without any additional handling.
      In order to keep the following patch smaller, the current special
      handling for 8M pages is removed here as well.
      
      Therefore the 4K pages mode will be implemented first and without
      support for 512k hugepages. Then the 512k hugepages will be brought
      back. And the 16K pages will be implemented in the following step.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      5af543be
    • powerpc/8xx: Move SW perf counters in first 32kb of memory · 8cfe4f52
      Christophe Leroy authored
      In order to simplify the handling of the 8xx-specific SW perf counters
      in time-critical exceptions, this patch moves the counters to
      the beginning of memory. This is possible because .text is readable
      and the counters are never modified outside of the handlers.
      
      By doing this, we avoid having to set a second register with
      the upper part of the address of the counters.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      8cfe4f52
    • powerpc/mm: remove unnecessary test in pgtable_cache_init() · 32bff4b9
      Christophe Leroy authored
      pgtable_cache_add() gracefully handles the case where a cache of that
      size already exists by returning early with the following test:
      
      	if (PGT_CACHE(shift))
      		return; /* Already have a cache of this size */
      
      There is therefore no need to test for the existence of the cache beforehand.
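
The same idea in a self-contained userspace sketch: because the add function is idempotent, callers can invoke it unconditionally. The table, the counter and the function names here are hypothetical and only model the early return quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SHIFT 16

static char cache_storage[MAX_SHIFT];	/* placeholder objects standing in for caches */
static void *caches[MAX_SHIFT];		/* hypothetical per-shift cache table */
static int caches_created;		/* counts real creations, for illustration */

/* Create the cache for 'shift' unless one already exists. */
static void cache_add(unsigned int shift)
{
	assert(shift < MAX_SHIFT);
	if (caches[shift])
		return;	/* Already have a cache of this size */
	caches[shift] = &cache_storage[shift];
	caches_created++;
}

/* Callers no longer need an "if (!cache)" guard around cache_add(). */
static void cache_init(void)
{
	cache_add(10);
	cache_add(10);	/* harmless repeat: returns early */
	cache_add(3);
}
```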
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      32bff4b9
    • powerpc/mm: fix a warning when a cache is common to PGD and hugepages · 1e03c7e2
      Christophe Leroy authored
      While implementing TLB miss HW assistance on the 8xx, the following
      warning was encountered:
      
      [  423.732965] WARNING: CPU: 0 PID: 345 at mm/slub.c:2412 ___slab_alloc.constprop.30+0x26c/0x46c
      [  423.733033] CPU: 0 PID: 345 Comm: mmap Not tainted 4.18.0-rc8-00664-g2dfff9121c55 #671
      [  423.733075] NIP:  c0108f90 LR: c0109ad0 CTR: 00000004
      [  423.733121] REGS: c455bba0 TRAP: 0700   Not tainted  (4.18.0-rc8-00664-g2dfff9121c55)
      [  423.733147] MSR:  00021032 <ME,IR,DR,RI>  CR: 24224848  XER: 20000000
      [  423.733319]
      [  423.733319] GPR00: c0109ad0 c455bc50 c4521910 c60053c0 007080c0 c0011b34 c7fa41e0 c455be30
      [  423.733319] GPR08: 00000001 c00103a0 c7fa41e0 c49afcc4 24282842 10018840 c079b37c 00000040
      [  423.733319] GPR16: 73f00000 00210d00 00000000 00000001 c455a000 00000100 00000200 c455a000
      [  423.733319] GPR24: c60053c0 c0011b34 007080c0 c455a000 c455a000 c7fa41e0 00000000 00009032
      [  423.734190] NIP [c0108f90] ___slab_alloc.constprop.30+0x26c/0x46c
      [  423.734257] LR [c0109ad0] kmem_cache_alloc+0x210/0x23c
      [  423.734283] Call Trace:
      [  423.734326] [c455bc50] [00000100] 0x100 (unreliable)
      [  423.734430] [c455bcc0] [c0109ad0] kmem_cache_alloc+0x210/0x23c
      [  423.734543] [c455bcf0] [c0011b34] huge_pte_alloc+0xc0/0x1dc
      [  423.734633] [c455bd20] [c01044dc] hugetlb_fault+0x408/0x48c
      [  423.734720] [c455bdb0] [c0104b20] follow_hugetlb_page+0x14c/0x44c
      [  423.734826] [c455be10] [c00e8e54] __get_user_pages+0x1c4/0x3dc
      [  423.734919] [c455be80] [c00e9924] __mm_populate+0xac/0x140
      [  423.735020] [c455bec0] [c00db14c] vm_mmap_pgoff+0xb4/0xb8
      [  423.735127] [c455bf00] [c00f27c0] ksys_mmap_pgoff+0xcc/0x1fc
      [  423.735222] [c455bf40] [c000e0f8] ret_from_syscall+0x0/0x38
      [  423.735271] Instruction dump:
      [  423.735321] 7cbf482e 38fd0008 7fa6eb78 7fc4f378 4bfff5dd 7fe3fb78 4bfffe24 81370010
      [  423.735536] 71280004 41a2ff88 4840c571 4bffff80 <0fe00000> 4bfffeb8 81340010 712a0004
      [  423.735757] ---[ end trace e9b222919a470790 ]---
      
      This warning occurs when calling kmem_cache_zalloc() on a
      cache having a constructor.
      
      In this case it happens because the PGD cache and the 512k hugepte cache
      are the same size (4k). While a cache with a constructor is created for
      the PGD, the hugepage code creates a cache without a constructor and uses
      kmem_cache_zalloc(). As both expect a cache of the same size,
      the hugepages reuse the cache created for the PGD, hence the conflict.
      
      In order to avoid this conflict, this patch:
      - modifies pgtable_cache_add() so that a zeroising constructor is
      added for any cache size.
      - replaces calls to kmem_cache_zalloc() with kmem_cache_alloc().
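
A constructor takes only the object address, so a zeroising constructor needs the object size baked in. One way to get a constructor per size is to generate them with a macro, as in the userspace sketch below. The names `CTOR`, `ctor_fn` and `ctor_for()` are assumptions made for this illustration, and only three sizes are generated to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generate a constructor that zeroes an object of (sizeof(void *) << shift) bytes. */
#define CTOR(shift)						\
static void ctor_##shift(void *addr)				\
{								\
	assert(addr != NULL);					\
	memset(addr, 0, sizeof(void *) << (shift));		\
}

CTOR(0)
CTOR(1)
CTOR(2)

typedef void (*ctor_fn)(void *);

/* Pick the constructor matching a cache's shift, or NULL if none was generated. */
static ctor_fn ctor_for(unsigned int shift)
{
	switch (shift) {
	case 0: return ctor_0;
	case 1: return ctor_1;
	case 2: return ctor_2;
	default: return NULL;
	}
}
```

Once every cache has such a constructor, every user of a shared cache sees zeroed objects from a plain allocation, which is why the zeroing allocation call is no longer needed.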
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      1e03c7e2
    • powerpc/mm: replace hugetlb_cache by PGT_CACHE(PTE_T_ORDER) · 03566562
      Christophe Leroy authored
      Instead of open-coding cache handling for the special case
      of hugepage tables having a single pte_t element, this
      patch makes use of the common pgtable_cache helpers.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      03566562
    • powerpc/mm: enable the use of page table cache of order 0 · 129dd323
      Christophe Leroy authored
      Hugepages use a cache of order 0. Let's allow page tables
      of order 0 in the common part in order to avoid open coding
      in hugetlb.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      129dd323