1. 19 Apr, 2017 10 commits
  2. 13 Apr, 2017 18 commits
  3. 12 Apr, 2017 6 commits
    • Rashmica Gupta's avatar
      powerpc/mm: Fix hash table dump when memory is not contiguous · 9e4114b3
      Rashmica Gupta authored
      The current behaviour of the hash table dump assumes that memory is contiguous
      and iterates from the start of memory to (start + size of memory). When memory
      isn't physically contiguous, this doesn't work.
      
      If memory exists at 0-5 GB and 6-10 GB then the current approach will check if
      entries exist in the hash table from 0GB to 9GB. This patch changes the
      behaviour to iterate over any holes up to the end of memory.
      
      Fixes: 1515ab93 ("powerpc/mm: Dump hash table")
      Signed-off-by: default avatarRashmica Gupta <rashmica.g@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      9e4114b3
    • Oliver O'Halloran's avatar
      powerpc/mm: Add physical address to Linux page table dump · aaa22952
      Oliver O'Halloran authored
      The current page table dumper scans the Linux page tables and coalesces mappings
      with adjacent virtual addresses and similar PTE flags. This behaviour is
      somewhat broken when you consider the IOREMAP space where entirely unrelated
      mappings will appear to be virtually contiguous. This patch modifies the range
      coalescing so that only ranges that are both physically and virtually contiguous
      are combined. This patch also adds to the dump output the physical address at
      the start of each range.
      
      Fixes: 8eb07b18 ("powerpc/mm: Dump linux pagetables")
      Signed-off-by: default avatarOliver O'Halloran <oohall@gmail.com>
      [mpe: Print the physicall address with 0x like the other addresses]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      aaa22952
    • Oliver O'Halloran's avatar
      powerpc/mm: Fix missing _PAGE_NON_IDEMPOTENT in pgtable dump · 70538eaa
      Oliver O'Halloran authored
      On Book3s we have two PTE flags used to mark cache-inhibited mappings:
      _PAGE_TOLERANT and _PAGE_NON_IDEMPOTENT. Currently the kernel page table dumper
      only looks at the generic _PAGE_NO_CACHE which is defined to be _PAGE_TOLERANT.
      This patch modifies the dumper so both flags are shown in the dump.
      
      Fixes: 8eb07b18 ("powerpc/mm: Dump linux pagetables")
      Signed-off-by: default avatarOliver O'Halloran <oohall@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      70538eaa
    • Balbir Singh's avatar
      powerpc/tracing: Allow tracing of mmap syscalls · 9c355917
      Balbir Singh authored
      Currently sys_mmap() and sys_mmap2() (32-bit only), are not visible to the
      syscall tracing machinery. This means users are not able to see the execution of
      mmap() syscalls using the syscall tracer.
      
      Fix that by using SYSCALL_DEFINE6 for sys_mmap() and sys_mmap2() so that the
      meta-data associated with these syscalls is visible to the syscall tracer.
      
      A side-effect of this change is that the return type has changed from unsigned
      long to long. However this should have no effect, the only code in the kernel
      which uses the result of these syscalls is in the syscall return path, which is
      written in asm and treats the result as unsigned regardless.
      
      Example output:
        cat-3399  [001] ....   196.542410: sys_mmap(addr: 7fff922a0000, len: 20000, prot: 3, flags: 812, fd: 3, offset: 1b0000)
        cat-3399  [001] ....   196.542443: sys_mmap -> 0x7fff922a0000
        cat-3399  [001] ....   196.542668: sys_munmap(addr: 7fff922c0000, len: 6d2c)
        cat-3399  [001] ....   196.542677: sys_munmap -> 0x0
      Signed-off-by: default avatarBalbir Singh <bsingharora@gmail.com>
      [mpe: Massage change log, add detail on return type change]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      9c355917
    • Michael Ellerman's avatar
      powerpc/mm: Fix swapper_pg_dir size on 64-bit hash w/64K pages · 03dfee6d
      Michael Ellerman authored
      Recently in commit f6eedbba ("powerpc/mm/hash: Increase VA range to 128TB"),
      we increased H_PGD_INDEX_SIZE to 15 when we're building with 64K pages. This
      makes it larger than RADIX_PGD_INDEX_SIZE (13), which means the logic to
      calculate MAX_PGD_INDEX_SIZE in book3s/64/pgtable.h is wrong.
      
      The end result is that the PGD (Page Global Directory, ie top level page table)
      of the kernel (aka. swapper_pg_dir), is too small.
      
      This generally doesn't lead to a crash, as we don't use the full range in normal
      operation. However if we try to dump the kernel pagetables we can trigger a
      crash because we walk off the end of the pgd into other memory and eventually
      try to dereference something bogus:
      
        $ cat /sys/kernel/debug/kernel_pagetables
        Unable to handle kernel paging request for data at address 0xe8fece0000000000
        Faulting instruction address: 0xc000000000072314
        cpu 0xc: Vector: 380 (Data SLB Access) at [c0000000daa13890]
            pc: c000000000072314: ptdump_show+0x164/0x430
            lr: c000000000072550: ptdump_show+0x3a0/0x430
           dar: e802cf0000000000
        seq_read+0xf8/0x560
        full_proxy_read+0x84/0xc0
        __vfs_read+0x6c/0x1d0
        vfs_read+0xbc/0x1b0
        SyS_read+0x6c/0x110
        system_call+0x38/0xfc
      
      The root cause is that MAX_PGD_INDEX_SIZE isn't actually computed to be
      the max of H_PGD_INDEX_SIZE or RADIX_PGD_INDEX_SIZE. To fix that move
      the calculation into asm-offsets.c where we can do it easily using
      max().
      
      Fixes: f6eedbba ("powerpc/mm/hash: Increase VA range to 128TB")
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      03dfee6d
    • Michael Ellerman's avatar
      Merge branch 'topic/xive' (early part) into next · 3c19d5ad
      Michael Ellerman authored
      This merges the arch part of the XIVE support, leaving the final commit
      with the KVM specific pieces dangling on the branch for Paul to merge
      via the kvm-ppc tree.
      3c19d5ad
  4. 10 Apr, 2017 6 commits
    • Gautham R. Shenoy's avatar
      powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1 · 17ed4c8f
      Gautham R. Shenoy authored
      POWER9 DD1.0 hardware has a bug where the SPRs of a thread waking up
      from stop 0,1,2 with ESL=1 can endup being misplaced in the core. Thus
      the HSPRG0 of a thread waking up from can contain the paca pointer of
      its sibling.
      
      This patch implements a context recovery framework within threads of a
      core, by provisioning space in paca_struct for saving every sibling
      threads's paca pointers. Basically, we should be able to arrive at the
      right paca pointer from any of the thread's existing paca pointer.
      
      At bootup, during powernv idle-init, we save the paca address of every
      CPU in each one its siblings paca_struct in the slot corresponding to
      this CPU's index in the core.
      
      On wakeup from a stop, the thread will determine its index in the core
      from the TIR register and recover its PACA pointer by indexing into
      the correct slot in the provisioned space in the current PACA.
      
      Furthermore, ensure that the NVGPRs are restored from the stack on the
      way out by setting the NAPSTATELOST in paca.
      
      [Changelog written with inputs from svaidy@linux.vnet.ibm.com]
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reviewed-by: default avatarNicholas Piggin <npiggin@gmail.com>
      [mpe: Call it a bug]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      17ed4c8f
    • Gautham R. Shenoy's avatar
      powerpc/powernv/idle: Don't override default/deepest directly in kernel · f3b3f284
      Gautham R. Shenoy authored
      Currently during idle-init on power9, if we don't find suitable stop
      states in the device tree that can be used as the
      default_stop/deepest_stop, we set stop0 (ESL=1,EC=1) as the default
      stop state psscr to be used by power9_idle and deepest stop state
      which is used by CPU-Hotplug.
      
      However, if the platform firmware has not configured or enabled a stop
      state, the kernel should not make any assumptions and fallback to a
      default choice.
      
      If the kernel uses a stop state that is not configured by the platform
      firmware, it may lead to further failures which should be avoided.
      
      In this patch, we modify the init code to ensure that the kernel uses
      only the stop states exposed by the firmware through the device
      tree. When a suitable default stop state isn't found, we disable
      ppc_md.power_save for power9. Similarly, when a suitable
      deepest_stop_state is not found in the device tree exported by the
      firmware, fall back to the default busy-wait loop in the CPU-Hotplug
      code.
      
      [Changelog written with inputs from svaidy@linux.vnet.ibm.com]
      Reviewed-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      f3b3f284
    • Gautham R. Shenoy's avatar
      powerpc/powernv/smp: Add busy-wait loop as fall back for CPU-Hotplug · 90061231
      Gautham R. Shenoy authored
      Currently, the powernv cpu-offline function assumes that platform idle
      states such as stop on POWER9, winkle/sleep/nap on POWER8 are always
      available. On POWER8, it picks nap as the default state if other deep
      idle states like sleep/winkle are not available and enabled in the
      platform.
      
      On POWER9, nap is not available and all idle states are managed by
      STOP instruction.  The parameters to the idle state are passed through
      processor stop status control register (PSSCR).  Hence as such
      executing STOP would take parameters from current PSSCR. We do not
      want to make any assumptions in kernel on what STOP states and PSSCR
      features are configured by the platform.
      
      Ideally platform will configure a good set of stop states that can be
      used in the kernel.  We would like to start with a clean slate, if the
      platform choose to not configure any state or there is an error in
      platform firmware that lead to no stop states being configured or
      allowed to be requested.
      
      This patch adds a fallback method for CPU-Hotplug that is similar to
      snooze loop at idle where the threads are left to spin at low priority
      and hence reduce the cycles consumed.
      
      This is a safe fallback mechanism in the case when no stop state would
      be requested if the platform firmware did not configure them most
      likely due to an error condition.
      
      Requesting a stop state when the platform has not configured them or
      enabled them would lead to further error conditions which could be
      difficult to debug.
      
      [Changelog written with inputs from svaidy@linux.vnet.ibm.com]
      Reviewed-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      90061231
    • Gautham R. Shenoy's avatar
      powerpc/powernv: Move CPU-Offline idle state invocation from smp.c to idle.c · a7cd88da
      Gautham R. Shenoy authored
      Move the piece of code in powernv/smp.c::pnv_smp_cpu_kill_self() which
      transitions the CPU to the deepest available platform idle state to a
      new function named pnv_cpu_offline() in powernv/idle.c. The rationale
      behind this code movement is that the data required to determine the
      deepest available platform state resides in powernv/idle.c.
      Reviewed-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      a7cd88da
    • Anshuman Khandual's avatar
      powerpc/hugetlb: Add ABI defines for supported HugeTLB page sizes · 2c9faa76
      Anshuman Khandual authored
      Add user space exported API definitions for 512KB, 1MB, 2MB, 8MB, 16MB,
      1GB, 16GB non default huge page sizes to be used with mmap() system
      call.
      Signed-off-by: default avatarAnshuman Khandual <khandual@linux.vnet.ibm.com>
      [mpe: Reword the comment to emphasise that these are only needed to use
       the non-default huge page size, and updated the change log.]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      2c9faa76
    • Anshuman Khandual's avatar
      powerpc/mm: Remove reduntant initmem information from log · ea614555
      Anshuman Khandual authored
      Generic core VM already prints these information in the log
      buffer, hence there is no need for a second print. This just
      removes the second print from arch powerpc NUMA init path.
      
      Before the patch:
      
        $ dmesg | grep "Initmem"
      
        numa: Initmem setup node 0 [mem 0x00000000-0xffffffff]
        numa: Initmem setup node 1 [mem 0x100000000-0x1ffffffff]
        numa: Initmem setup node 2 [mem 0x200000000-0x2ffffffff]
        numa: Initmem setup node 3 [mem 0x300000000-0x3ffffffff]
        numa: Initmem setup node 4 [mem 0x400000000-0x4ffffffff]
        numa: Initmem setup node 5 [mem 0x500000000-0x5ffffffff]
        numa: Initmem setup node 6 [mem 0x600000000-0x6ffffffff]
        numa: Initmem setup node 7 [mem 0x700000000-0x7ffffffff]
        Initmem setup node 0 [mem 0x0000000000000000-0x00000000ffffffff]
        Initmem setup node 1 [mem 0x0000000100000000-0x00000001ffffffff]
        Initmem setup node 2 [mem 0x0000000200000000-0x00000002ffffffff]
        Initmem setup node 3 [mem 0x0000000300000000-0x00000003ffffffff]
        Initmem setup node 4 [mem 0x0000000400000000-0x00000004ffffffff]
        Initmem setup node 5 [mem 0x0000000500000000-0x00000005ffffffff]
        Initmem setup node 6 [mem 0x0000000600000000-0x00000006ffffffff]
        Initmem setup node 7 [mem 0x0000000700000000-0x00000007ffffffff]
      
      After the patch just the latter set is printed.
      Signed-off-by: default avatarAnshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      ea614555