1. 02 May, 2019 22 commits
  2. 01 May, 2019 10 commits
    • powerpc/tm: Avoid machine crash on rt_sigreturn() · e620d450
      Breno Leitao authored
      The kernel crashes if rt_sigreturn() is called inside a transactional
      block.
      
      The crash happens when the kernel takes an in-kernel page fault while
      accessing userspace memory, usually through copy_ckvsx_to_user(). A
      major page fault calls might_sleep(), which can cause a task
      reschedule. A task reschedule (switch_to()) reclaims and recheckpoints
      the TM state, but in the signal return path the checkpointed memory
      was already reclaimed, so the MSR saved on the exception stack has
      MSR[TS] = 0.
      
      If a task reschedule happened, then when the code returns from
      might_sleep() the task resumes with the memory recheckpointed and
      CPU MSR[TS] = Suspended.
      
      This means that might_sleep() has a side effect if it is called with
      CPU MSR[TS] = 0 while the task has regs->msr[TS] != 0.
      
      This side effect can cause a TM Bad Thing: at exception entry the
      stack saved MSR[TS] = 0, and that is the value RFID will use, but the
      processor has MSR[TS] = Suspended. This transition is invalid, so a
      TM Bad Thing exception is raised, causing the following crash:
      
        Unexpected TM Bad Thing exception at c00000000000e9ec (msr 0x8000000302a03031) tm_scratch=800000010280b033
        cpu 0xc: Vector: 700 (Program Check) at [c00000003ff1fd70]
            pc: c00000000000e9ec: fast_exception_return+0x100/0x1bc
            lr: c000000000032948: handle_rt_signal64+0xb8/0xaf0
            sp: c0000004263ebc40
           msr: 8000000302a03031
          current = 0xc000000415050300
          paca    = 0xc00000003ffc4080	 irqmask: 0x03	 irq_happened: 0x01
            pid   = 25006, comm = sigfuz
        Linux version 5.0.0-rc1-00001-g3bd6e94b (breno@debian) (gcc version 8.2.0 (Debian 8.2.0-3)) #899 SMP Mon Jan 7 11:30:07 EST 2019
        WARNING: exception is not recoverable, can't continue
        enter ? for help
        [c0000004263ebc40] c000000000032948 handle_rt_signal64+0xb8/0xaf0 (unreliable)
        [c0000004263ebd30] c000000000022780 do_notify_resume+0x2f0/0x430
        [c0000004263ebe20] c00000000000e844 ret_from_except_lite+0x70/0x74
        --- Exception: c00 (System Call) at 00007fffbaac400c
        SP (7fffeca90f40) is in userspace
      
      The solution is to run the sigreturn code with regs->msr[TS]
      disabled, which avoids the side effect above. This does not seem to
      be a problem, since regs->msr will be replaced by the ucontext value
      anyway, so it is already being flushed. With this change it is simply
      flushed earlier.
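
As a rough illustration of what "regs->msr[TS] disabled" means at the bit level, here is a minimal user-space sketch. The bit positions are assumptions modelled on MSR_TS_S_LG and MSR_TS_T_LG in the kernel's asm/reg.h, and the helpers msr_clear_ts() and msr_ts_active() are hypothetical names used only for this example, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions for the MSR[TS] field (see lead-in). */
#define MSR_TS_S    (1ULL << 33)            /* transaction suspended */
#define MSR_TS_T    (1ULL << 34)            /* transactional */
#define MSR_TS_MASK (MSR_TS_T | MSR_TS_S)

/* Hypothetical helper: return the MSR value with the TS field cleared. */
uint64_t msr_clear_ts(uint64_t msr)
{
    return msr & ~MSR_TS_MASK;
}

/* Hypothetical helper: non-zero if TS is suspended or transactional. */
int msr_ts_active(uint64_t msr)
{
    return (msr & MSR_TS_MASK) != 0;
}
```

Applied to the MSR value from the crash log above (0x8000000302a03031), msr_ts_active() reports an active TS field, and msr_clear_ts() yields a value whose TS field is zero.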
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Acked-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      e620d450
    • powerpc/mm/radix: Fix kernel crash when running subpage protect test · 2c474c03
      Aneesh Kumar K.V authored
      This patch fixes the below crash by making sure we touch the subpage
      protection related structures only if we know they are allocated on
      the platform. With radix translation we don't allocate hash context at
      all and trying to access subpage_prot_table results in:
      
        Faulting instruction address: 0xc00000000008bdb4
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
        ....
        NIP [c00000000008bdb4] sys_subpage_prot+0x74/0x590
        LR [c00000000000b688] system_call+0x5c/0x70
        Call Trace:
        [c00020002c6b7d30] [c00020002c6b7d90] 0xc00020002c6b7d90 (unreliable)
        [c00020002c6b7e20] [c00000000000b688] system_call+0x5c/0x70
        Instruction dump:
        fb61ffd8 fb81ffe0 fba1ffe8 fbc1fff0 fbe1fff8 f821ff11 e92d1178 f9210068
        39200000 e92d0968 ebe90630 e93f03e8 <eb891038> 60000000 3860fffe e9410068
      
      We also move the subpage_prot_table with mmap_sem held to avoid a race
      between two parallel subpage_prot syscalls.
      
      Fixes: 70110186 ("powerpc/mm: Reduce memory usage for mm_context_t for radix")
      Reported-by: Sachin Sant <sachinp@linux.ibm.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Tested-by: Sachin Sant <sachinp@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      2c474c03
    • powerpc/powernv/mce: Print additional information about MCE error. · 50dbabe0
      Mahesh Salgaonkar authored
      Print more information about an MCE error, indicating whether it is a
      hardware or software error.
      
      Some MCE errors are easily categorized as hardware or software
      errors, e.g. UEs are due to hardware errors, whereas an error
      triggered by invalid usage of tlbie is a pure software bug. But not
      all MCE errors can be easily categorized as either software or
      hardware. Errors like multihit errors are usually the result of a
      software bug, but in some rare cases a hardware failure can cause a
      multihit error. In the past, we have seen cases where multihit errors
      stopped occurring after a faulty chip was replaced. The same applies
      to parity errors, which are usually due to faulty hardware, but there
      is a chance that a multihit can also cause a parity error. For such
      errors it is difficult to determine the real cause. Hence this patch
      classifies MCE errors into the following four categories:
      
        1. Hardware error:
        	UE and Link timeout failure errors.
        2. Probable hardware error (some chance of software cause)
        	SLB/ERAT/TLB Parity errors.
        3. Software error
        	Invalid tlbie form.
        4. Probable software error (some chance of hardware cause)
        	SLB/ERAT/TLB Multihit errors.
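
The four categories above can be sketched as a simple lookup. This is only an illustration: the enum mce_kind and the function mce_cause() are hypothetical names invented for this example and are not the kernel's actual types or helpers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified set of error kinds taken from the list above. */
enum mce_kind {
    MCE_UE,
    MCE_LINK_TIMEOUT,
    MCE_PARITY,          /* SLB/ERAT/TLB parity */
    MCE_INVALID_TLBIE,
    MCE_MULTIHIT         /* SLB/ERAT/TLB multihit */
};

/* Map an error kind to one of the four cause categories. */
const char *mce_cause(enum mce_kind kind)
{
    switch (kind) {
    case MCE_UE:
    case MCE_LINK_TIMEOUT:
        return "Hardware error";
    case MCE_PARITY:
        return "Probable hardware error (some chance of software cause)";
    case MCE_INVALID_TLBIE:
        return "Software error";
    case MCE_MULTIHIT:
        return "Probable software error (some chance of hardware cause)";
    }
    return "Unknown";
}
```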
      
      Sample output:
      
        MCE: CPU80: machine check (Warning) Guest SLB Multihit DAR: 000001001b6e0320 [Recovered]
        MCE: CPU80: PID: 24765 Comm: qemu-system-ppc Guest NIP: [00007fffa309dc60]
        MCE: CPU80: Probable Software error (some chance of hardware cause)
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      50dbabe0
    • powerpc/powernv/mce: Print correct severity for MCE error. · cda6618d
      Mahesh Salgaonkar authored
      Currently all machine check errors are printed as severe errors, which
      isn't correct. Print soft errors as warnings instead of severe errors.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      cda6618d
    • powerpc/powernv/mce: Reduce MCE console logs to lesser lines. · d6e8a150
      Mahesh Salgaonkar authored
      Also add the CPU number when displaying the MCE log. This gives
      cleaner logs when MCEs hit multiple CPUs simultaneously.
      
      Before the changes the MCE output was:
      
        Severe Machine check interrupt [Recovered]
          NIP [d00000000ba80280]: insert_slb_entry.constprop.0+0x278/0x2c0 [mcetest_slb]
          Initiator: CPU
          Error type: SLB [Multihit]
            Effective address: d00000000ba80280
      
      After this patch series the MCE output will be:
      
        MCE: CPU80: machine check (Warning) Host SLB Multihit [Recovered]
        MCE: CPU80: NIP: [d00000000b550280] insert_slb_entry.constprop.0+0x278/0x2c0 [mcetest_slb]
        MCE: CPU80: Probable software error (some chance of hardware cause)
      
      UE in host application:
      
        MCE: CPU48: machine check (Severe) Host UE Load/Store DAR: 00007fffc6079a80 paddr: 0000000f8e260000 [Not recovered]
        MCE: CPU48: PID: 4584 Comm: find NIP: [0000000010023368]
        MCE: CPU48: Hardware error
      
      and for MCE in Guest:
      
        MCE: CPU80: machine check (Warning) Guest SLB Multihit DAR: 000001001b6e0320 [Recovered]
        MCE: CPU80: PID: 24765 Comm: qemu-system-ppc Guest NIP: [00007fffa309dc60]
        MCE: CPU80: Probable software error (some chance of hardware cause)
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      d6e8a150
    • powerpc: Add doorbell tracepoints · 5b2a1529
      Anton Blanchard authored
      When analysing sources of OS jitter, I noticed that doorbells cannot be
      traced.
      Signed-off-by: Anton Blanchard <anton@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      5b2a1529
    • ocxl: remove set but not used variables 'tid' and 'lpid' · 32eeb561
      YueHaibing authored
      Fixes gcc '-Wunused-but-set-variable' warning:
      
        drivers/misc/ocxl/link.c: In function 'xsl_fault_handler':
        drivers/misc/ocxl/link.c:187:17: warning: variable 'tid' set but not used
        drivers/misc/ocxl/link.c:187:6: warning: variable 'lpid' set but not used
      
      They are never used and can be removed.
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
      Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      32eeb561
    • powerpc/64s: Remove 'dummy_copy_buffer' · a5ae043d
      Mathieu Malaterre authored
      In commit 2bf1071a ("powerpc/64s: Remove POWER9 DD1 support") the
      use of 'dummy_copy_buffer' was removed from __switch_to(). Since it
      is not used anywhere else, remove it completely.
      
      This removes the following warning:
        arch/powerpc/kernel/process.c:1156:17: error: 'dummy_copy_buffer' defined but not used
      Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Mathieu Malaterre <malat@debian.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a5ae043d
    • powerpc/cacheinfo: Fix kobject memleak · 7e803979
      Tobin C. Harding authored
      Currently an error return from kobject_init_and_add() is not followed
      by a call to kobject_put(). This means there is a memory leak.
      
      Add a call to kobject_put() in the error path of kobject_init_and_add().
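
The error-path pattern can be sketched in user space with a small mock. Everything here (struct mock_kobj, mock_init_and_add(), mock_put(), create_entry()) is hypothetical and only imitates the reference-counting contract. It is not the kernel kobject API and not the code from this patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted kernel object. */
struct mock_kobj {
    int refcount;
    void (*release)(struct mock_kobj *);
};

int released_count;  /* counts how many objects were actually freed */

void mock_release(struct mock_kobj *k)
{
    released_count++;
    free(k);
}

/* The reference is initialised even when the "add" step fails. */
int mock_init_and_add(struct mock_kobj *k, int fail)
{
    k->refcount = 1;
    k->release = mock_release;
    return fail ? -1 : 0;
}

/* Drop one reference; the last put runs release(). */
void mock_put(struct mock_kobj *k)
{
    if (--k->refcount == 0)
        k->release(k);
}

int create_entry(int fail)
{
    struct mock_kobj *k = malloc(sizeof(*k));
    int rc;

    if (!k)
        return -1;
    rc = mock_init_and_add(k, fail);
    if (rc) {
        mock_put(k);  /* without this put, k would leak on the error path */
        return rc;
    }
    mock_put(k);      /* the sketch drops the object right away to stay leak-free */
    return 0;
}
```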
      Signed-off-by: Tobin C. Harding <tobin@kernel.org>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      7e803979
    • powerpc/vdso: Drop unnecessary cc-ldoption · 33dda8c3
      Nick Desaulniers authored
      Towards the goal of removing cc-ldoption, it seems that --hash-style=
      was added to binutils 2.17.50.0.2 in 2006. The minimal required
      version of binutils for the kernel according to
      Documentation/process/changes.rst is 2.20.
      Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      33dda8c3
  3. 30 Apr, 2019 5 commits
    • powerpc/powernv/ioda: Handle failures correctly in pnv_pci_ioda_iommu_bypass_supported() · b511cdd1
      Alexey Kardashevskiy authored
      When the return value type was changed from int to bool, a few places
      were left unchanged; this fixes them. We did not hit these failures
      because the first one does not happen at all, and the second one is
      only a little more likely: it can happen if the user switches a
      33..58-bit DMA-capable device between the VFIO and vendor drivers,
      and there are not many such devices.
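
To see why leftover int-style returns matter after a switch to bool, consider this minimal sketch. The two functions are hypothetical and are not the code from this patch. They only show that a negative errno-style value returned from a bool function converts to true, so a failure would be reported as success.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Buggy variant: the signature became bool, but the failure path still
 * returns an errno-style code. */
bool bypass_supported_buggy(bool setup_fails)
{
    if (setup_fails)
        return -EIO;   /* non-zero, so it converts to true */
    return true;
}

/* Fixed variant: the failure path returns false. */
bool bypass_supported_fixed(bool setup_fails)
{
    if (setup_fails)
        return false;
    return true;
}
```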
      
      Fixes: 2d6ad41b ("powerpc/powernv: use the generic iommu bypass code")
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      b511cdd1
    • Merge branch 'topic/ppc-kvm' into next · bdc7c970
      Michael Ellerman authored
      Merge our topic branch shared with KVM. In particular this includes the
      rewrite of the idle code into C.
      bdc7c970
    • powerpc/powernv/idle: Restore AMR/UAMOR/AMOR/IAMR after idle · e9cef018
      Michael Ellerman authored
      This is an implementation of commits 53a712ba
      ("powerpc/powernv/idle: Restore AMR/UAMOR/AMOR after idle") and
      a3f3072d ("powerpc/powernv/idle: Restore IAMR after idle") using
      the new C-based idle code.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Extract from Nick's patch]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      e9cef018
    • powerpc/64s: Reimplement book3s idle code in C · 10d91611
      Nicholas Piggin authored
      Reimplement Book3S idle code in C, moving POWER7/8/9 implementation
      specific HV idle code to the powernv platform code.
      
      Book3S assembly stubs are kept in common code and used only to save
      the stack frame and non-volatile GPRs before executing architected
      idle instructions, and restoring the stack and reloading GPRs then
      returning to C after waking from idle.
      
      The complex logic dealing with threads and subcores, locking, SPRs,
      HMIs, timebase resync, etc., is all done in C which makes it more
      maintainable.
      
      This is not a strict translation to C code; there are some
      significant differences:
      
      - Idle wakeup no longer uses the ->cpu_restore call to reinit SPRs,
        but saves and restores them itself.
      
      - The optimisation where EC=ESL=0 idle modes did not have to save GPRs
        or change MSR is restored, because it's now simple to do. ESL=1
        sleeps that do not lose GPRs can use this optimization too.
      
      - KVM secondary entry and cede is now more of a call/return style
        rather than branchy. nap_state_lost is not required because KVM
        always returns via NVGPR restoring path.
      
      - KVM secondary wakeup from offline sequence is moved entirely into
        the offline wakeup, which avoids a hwsync in the normal idle wakeup
        path.
      
      Performance measured with context switch ping-pong on different
      threads or cores is possibly improved by a small amount, 1-3%
      depending on stop state and core vs thread test, for shallow states.
      For deep states the difference is in the noise compared with other
      latencies.
      
      KVM improvements:
      
      - Idle sleepers now always return to caller rather than branch out
        to KVM first.
      
      - This allows optimisations like very fast return to caller when no
        state has been lost.
      
      - KVM no longer requires nap_state_lost because it controls NVGPR
        save/restore itself on the way in and out.
      
      - The heavy idle wakeup KVM request check can be moved out of the
        normal host idle code and into the not-performance-critical offline
        code.
      
      - KVM nap code now returns from where it is called, which makes the
        flow a bit easier to follow.
      Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Squash the KVM changes in]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      10d91611
    • powerpc/watchdog: Use hrtimers for per-CPU heartbeat · 7ae3f6e1
      Nicholas Piggin authored
      Using a jiffies timer creates a dependency on the tick_do_timer_cpu
      incrementing jiffies. If that CPU has locked up and jiffies is not
      incrementing, the watchdog heartbeat timer for all CPUs stops and
      creates false positives and confusing warnings on local CPUs, and
      also causes the SMP detector to stop, so the root cause is never
      detected.
      
      Fix this by using hrtimer based timers for the watchdog heartbeat,
      like the generic kernel hardlockup detector.
      
      Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reported-by: Ravikumar Bangoria <ravi.bangoria@in.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Reported-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      7ae3f6e1
  4. 29 Apr, 2019 1 commit
    • powerpc/pseries: Track LMB nid instead of using device tree · b2d3b5ee
      Nathan Fontenot authored
      When removing memory we need to remove the memory from the node
      it was added to instead of looking up the node it should be in
      in the device tree.
      
      During testing we have seen scenarios where the affinity for an
      LMB changes due to a partition migration or PRRN event. In these
      cases the node the LMB exists in may not match the node the device
      tree indicates it belongs in. This can lead to a system crash
      when trying to DLPAR remove the LMB after a migration or PRRN
      event. The current code looks up the node to remove the LMB from
      in the device tree; the crash occurs when we try to offline this
      node and it does not have any data, i.e. node_data[nid] == NULL.
      
      36:mon> e
      cpu 0x36: Vector: 300 (Data Access) at [c0000001828b7810]
          pc: c00000000036d08c: try_offline_node+0x2c/0x1b0
          lr: c0000000003a14ec: remove_memory+0xbc/0x110
          sp: c0000001828b7a90
         msr: 800000000280b033
         dar: 9a28
       dsisr: 40000000
        current = 0xc0000006329c4c80
        paca    = 0xc000000007a55200   softe: 0        irq_happened: 0x01
          pid   = 76926, comm = kworker/u320:3
      
      36:mon> t
      [link register   ] c0000000003a14ec remove_memory+0xbc/0x110
      [c0000001828b7a90] c00000000006a1cc arch_remove_memory+0x9c/0xd0 (unreliable)
      [c0000001828b7ad0] c0000000003a14e0 remove_memory+0xb0/0x110
      [c0000001828b7b20] c0000000000c7db4 dlpar_remove_lmb+0x94/0x160
      [c0000001828b7b60] c0000000000c8ef8 dlpar_memory+0x7e8/0xd10
      [c0000001828b7bf0] c0000000000bf828 handle_dlpar_errorlog+0xf8/0x160
      [c0000001828b7c60] c0000000000bf8cc pseries_hp_work_fn+0x3c/0xa0
      [c0000001828b7c90] c000000000128cd8 process_one_work+0x298/0x5a0
      [c0000001828b7d20] c000000000129068 worker_thread+0x88/0x620
      [c0000001828b7dc0] c00000000013223c kthread+0x1ac/0x1c0
      [c0000001828b7e30] c00000000000b45c ret_from_kernel_thread+0x5c/0x80
      
      To resolve this we need to track the node an LMB belongs to when
      it is added to the system so we can remove it from that node instead
      of the node that the device tree indicates it should belong to.
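
The idea of recording the node at add time can be sketched with a small user-space mock. All names here (struct mock_lmb, dt_lookup_nid(), lmb_add(), lmb_remove(), node_lmb_count) are hypothetical and only illustrate the bookkeeping. They are not the pseries hotplug code.

```c
#include <assert.h>

#define MAX_NODES 4

/* Hypothetical LMB record that remembers the node it was added to. */
struct mock_lmb {
    int nid;
};

int node_lmb_count[MAX_NODES];  /* LMBs currently attached to each node */
int dt_nid = 1;                 /* what the mock "device tree" reports now */

/* Stand-in for a device tree lookup; its answer may change after a
 * migration or PRRN event. */
int dt_lookup_nid(void)
{
    return dt_nid;
}

void lmb_add(struct mock_lmb *lmb)
{
    lmb->nid = dt_lookup_nid();   /* record the node at add time */
    node_lmb_count[lmb->nid]++;
}

void lmb_remove(struct mock_lmb *lmb)
{
    node_lmb_count[lmb->nid]--;   /* use the recorded node, not a fresh lookup */
    lmb->nid = -1;
}
```

In this sketch, changing dt_nid between lmb_add() and lmb_remove() does not affect which node's count is decremented, which mirrors the behaviour the commit message describes.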
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      b2d3b5ee
  5. 28 Apr, 2019 1 commit
  6. 21 Apr, 2019 1 commit