1. 03 Oct, 2018 20 commits
    • Breno Leitao's avatar
      powerpc: Redefine TIF_32BITS thread flag · 16d7c69c
      Breno Leitao authored
      Moving TIF_32BIT to use bit 20 instead of 4 in the task flag field.
      
      This change is making room for an upcoming new task macro
      (_TIF_SYSCALL_EMU) which is preferred to set a bit in the lower 16-bits
      part of the word.
      
      This upcoming flag macro will take part in a composed macro
      (_TIF_SYSCALL_DOTRACE) which will contain other flags as well, and it is
      preferred that the whole _TIF_SYSCALL_DOTRACE macro only sets the lower 16
      bits of a word, so, it could be handled using immediate operations (as load
      immediate, add immediate, ...) where the immediate operand (SI) is limited
      to 16-bits.
      
      Another possible solution would be using the LOAD_REG_IMMEDIATE() macro
      to load a full 64-bits word immediate, but it takes 5 operations instead of
      one.
      
      Having TIF_32BITS being redefined to use an upper bit is not a problem
      since there is only one place in the assembly code where TIF_32BIT is being
      used, and it could be replaced with an operation with right shift (addis),
      since it is used alone, i.e. not being part of a composed macro, which has
      different bits set, and would require LOAD_REG_IMMEDIATE().
      
      Tested on a 64 bits Big Endian machine running a 32 bits task.
      Signed-off-by: default avatarBreno Leitao <leitao@debian.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      16d7c69c
    • Christophe Leroy's avatar
      powerpc/64: add stack protector support · 06ec27ae
      Christophe Leroy authored
      On PPC64, as register r13 points to the paca_struct at all time,
      this patch adds a copy of the canary there, which is copied at
      task_switch.
      That new canary is then used by using the following GCC options:
      -mstack-protector-guard=tls
      -mstack-protector-guard-reg=r13
      -mstack-protector-guard-offset=offsetof(struct paca_struct, canary))
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      06ec27ae
    • Christophe Leroy's avatar
      powerpc/32: add stack protector support · c3ff2a51
      Christophe Leroy authored
      This functionality was tentatively added in the past
      (commit 6533b7c1 ("powerpc: Initial stack protector
      (-fstack-protector) support")) but had to be reverted
      (commit f2574030 ("powerpc: Revert the initial stack
      protector support") because of GCC implementing it differently
      whether it had been built with libc support or not.
      
      Now, GCC offers the possibility to manually set the
      stack-protector mode (global or tls) regardless of libc support.
      
      This time, the patch selects HAVE_STACKPROTECTOR only if
      -mstack-protector-guard=tls is supported by GCC.
      
      On PPC32, as register r2 points to current task_struct at
      all time, the stack_canary located inside task_struct can be
      used directly by using the following GCC options:
      -mstack-protector-guard=tls
      -mstack-protector-guard-reg=r2
      -mstack-protector-guard-offset=offsetof(struct task_struct, stack_canary))
      
      The protector is disabled for prom_init and bootx_init as
      it is too early to handle it properly.
      
       $ echo CORRUPT_STACK > /sys/kernel/debug/provoke-crash/DIRECT
      [  134.943666] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: lkdtm_CORRUPT_STACK+0x64/0x64
      [  134.943666]
      [  134.955414] CPU: 0 PID: 283 Comm: sh Not tainted 4.18.0-s3k-dev-12143-ga3272be41209 #835
      [  134.963380] Call Trace:
      [  134.965860] [c6615d60] [c001f76c] panic+0x118/0x260 (unreliable)
      [  134.971775] [c6615dc0] [c001f654] panic+0x0/0x260
      [  134.976435] [c6615dd0] [c032c368] lkdtm_CORRUPT_STACK_STRONG+0x0/0x64
      [  134.982769] [c6615e00] [ffffffff] 0xffffffff
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      c3ff2a51
    • zhong jiang's avatar
      powerpc/xive: Move a dereference below a NULL test · cd5ff945
      zhong jiang authored
      Move the dereference of xc below the NULL test.
      Signed-off-by: default avatarzhong jiang <zhongjiang@huawei.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      cd5ff945
    • Naveen N. Rao's avatar
      powerpc/pseries: Fix how we iterate over the DTL entries · 9258227e
      Naveen N. Rao authored
      When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set, we look up dtl_idx in
      the lppaca to determine the number of entries in the buffer. Since
      lppaca is in big endian, we need to do an endian conversion before using
      this in our calculation to determine the number of entries in the
      buffer. Without this, we do not iterate over the existing entries in the
      DTL buffer properly.
      
      Fixes: 7c105b63 ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.")
      Signed-off-by: default avatarNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      9258227e
    • Naveen N. Rao's avatar
      powerpc/pseries: Fix DTL buffer registration · db787af1
      Naveen N. Rao authored
      When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set, we register the DTL
      buffer for a cpu when the associated file under powerpc/dtl in debugfs
      is opened. When doing so, we need to set the size of the buffer being
      registered in the second u32 word of the buffer. This needs to be in big
      endian, but we are not doing the conversion resulting in the below error
      showing up in dmesg:
      
      	dtl_start: DTL registration for cpu 0 (hw 0) failed with -4
      
      Fix this in the obvious manner.
      
      Fixes: 7c105b63 ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.")
      Signed-off-by: default avatarNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      db787af1
    • Christophe Leroy's avatar
      powerpc/traps: merge unrecoverable_exception() and nonrecoverable_exception() · 51423a9c
      Christophe Leroy authored
      PPC32 uses nonrecoverable_exception() while PPC64 uses
      unrecoverable_exception().
      
      Both functions are doing almost the same thing.
      
      This patch removes nonrecoverable_exception()
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      51423a9c
    • Rob Herring's avatar
      powerpc: Convert to using %pOFn instead of device_node.name · b9ef7b4b
      Rob Herring authored
      In preparation to remove the node name pointer from struct device_node,
      convert printf users to use the %pOFn format specifier.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: default avatarRob Herring <robh@kernel.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      b9ef7b4b
    • Rob Herring's avatar
      macintosh: Convert to using %pOFn instead of device_node.name · 0bdba867
      Rob Herring authored
      In preparation to remove the node name pointer from struct device_node,
      convert printf users to use the %pOFn format specifier.
      Signed-off-by: default avatarRob Herring <robh@kernel.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      0bdba867
    • Rob Herring's avatar
      powerpc/pseries: Use of_irq_get helper() in request_event_sources_irqs() · c417596d
      Rob Herring authored
      Instead of calling both of_irq_parse_one() and
      irq_create_of_mapping(), call of_irq_get() instead which does
      essentially the same thing. of_irq_get() also calls irq_find_host()
      for deferred probe support, but this should be fine as
      irq_create_of_mapping() also calls that internally. This gets us
      closer to making the former 2 functions static.
      
      In the process of simplifying request_event_sources_irqs(), combine
      the the pr_err() and WARN_ON() calls to just a WARN().
      Signed-off-by: default avatarRob Herring <robh@kernel.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      c417596d
    • Rob Herring's avatar
      powerpc/cell: Use irq_of_parse_and_map() helper · 8c8933eb
      Rob Herring authored
      Instead of calling both of_irq_parse_one() and
      irq_create_of_mapping(), call of_irq_parse_and_map() instead which
      does the same thing. This gets us closer to making the former 2
      functions static.
      Signed-off-by: default avatarRob Herring <robh@kernel.org>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      8c8933eb
    • Aneesh Kumar K.V's avatar
    • Aneesh Kumar K.V's avatar
      powerpc/mm/thp: update pmd_trans_huge to check for pmd_present · 8890e033
      Aneesh Kumar K.V authored
      We need to make sure pmd_trans_huge returns false for a pmd migration entry.
      We mark the migration entry by clearing the _PAGE_PRESENT bit. We keep the
      _PAGE_PTE bit set to indicate a leaf page table entry. Hence we need to make
      sure we check for pmd_present() so that pmd_trans_huge won't return true on
      pmd migration entry.
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      8890e033
    • Aneesh Kumar K.V's avatar
      arch/powerpc/mm/hash: validate the pte entries before handling the hash fault · 75646c48
      Aneesh Kumar K.V authored
      Make sure we are operating on THP and hugetlb entries in the respective hash
      fault handling routines.
      
      No functional change in this patch. If we walked the table wrongly before, we
      will retry the access.
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      75646c48
    • Aneesh Kumar K.V's avatar
      powerpc/mm/book3s: Check for pmd_large instead of pmd_trans_huge · ae28f17b
      Aneesh Kumar K.V authored
      Update few code paths to check for pmd_large.
      
      set_pmd_at:
      We want to use this to store swap pte at pmd level. For swap ptes we don't want
      to set H_PAGE_THP_HUGE. Hence check for pmd_large in set_pmd_at. This remove
      the false WARN_ON when using this with swap pmd entry.
      
      pmd_page:
      We don't really use them on pmd migration entries. But they can also work with
      migration entries and we don't differentiate at the pte level. Hence update
      pmd_page to work with pmd migration entries too
      
      __find_linux_pte:
      lockless page table walk need to handle pmd migration entries. pmd_trans_huge
      check will return false on them. We don't set thp = 1 for such entries, but
      update hpage_shift correctly. Without this we will walk pmd migration entries
      as a pte page pointer which is wrong.
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      ae28f17b
    • Aneesh Kumar K.V's avatar
      powerpc/mm/hugetlb/book3s: add _PAGE_PRESENT to hugepd pointer. · f1981b5b
      Aneesh Kumar K.V authored
      This make hugetlb directory pointer similar to other page able entries. A hugepd
      entry is identified by lack of _PAGE_PTE bit set and directory size stored in
      HUGEPD_SHIFT_MASK. We update that to also look at _PAGE_PRESENT
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      f1981b5b
    • Aneesh Kumar K.V's avatar
      powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit · da7ad366
      Aneesh Kumar K.V authored
      With this patch we use 0x8000000000000000UL (_PAGE_PRESENT) to indicate a valid
      pgd/pud/pmd entry. We also switch the p**_present() to look at this bit.
      
      With pmd_present, we have a special case. We need to make sure we consider a
      pmd marked invalid during THP split as present. Right now we clear the
      _PAGE_PRESENT bit during a pmdp_invalidate. Inorder to consider this special
      case we add a new pte bit _PAGE_INVALID (mapped to _RPAGE_SW0). This bit is
      only used with _PAGE_PRESENT cleared. Hence we are not really losing a pte bit
      for this special case. pmd_present is also updated to look at _PAGE_INVALID.
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      da7ad366
    • Vaibhav Jain's avatar
      powerpc/powernv: Make possible for user to force a full ipl cec reboot · 8139046a
      Vaibhav Jain authored
      Ever since fast reboot is enabled by default in opal,
      opal_cec_reboot() will use fast-reset instead of full IPL to perform
      system reboot. This leaves the user with no direct way to force a full
      IPL reboot except changing an nvram setting that persistently disables
      fast-reset for all subsequent reboots.
      
      This patch provides a more direct way for the user to force a one-shot
      full IPL reboot by passing the command line argument 'full' to the
      reboot command. So the user will be able to tweak the reboot behavior
      via:
      
        $ sudo reboot full	# Force a full ipl reboot skipping fast-reset
      
        or
        $ sudo reboot  	# default reboot path (usually fast-reset)
      
      The reboot command passes the un-parsed command argument to the kernel
      via the 'Reboot' syscall which is then passed on to the arch function
      pnv_restart(). The patch updates pnv_restart() to handle this cmd-arg
      and issues opal_cec_reboot2 with OPAL_REBOOT_FULL_IPL to force a full
      IPL reset.
      Signed-off-by: default avatarVaibhav Jain <vaibhav@linux.ibm.com>
      Acked-by: default avatarAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      8139046a
    • Michael Ellerman's avatar
      powerpc/perf: Add missing break in power7_marked_instr_event() · db6711b7
      Michael Ellerman authored
      In power7_marked_instr_event() there is a switch case that is missing
      a break or an explicit fallthrough, it's not immediately clear which
      it should be.
      
      The function determines based on the PMU event code, whether the event
      is a "marked" event (which then requires us to configure the PMU in a
      certain way). On Power7 there is no specific bit(s) in the event to
      tell us that, we just have to know.
      
      Rather than having a full list of every event and whether they are
      marked, we pull apart the event code and for events with certain
      values of certain fields we can say that those are all marked events.
      
      We take the psel (bits 0-7) of the event, and look at bits 4-7. For a
      value of 6 we say that if the entire psel == 0x64 then if the pmc == 3
      the event is marked, else not, and otherwise we continue.
      
      It is then that we fallthrough to the 8 case, where we return true if
      the unit == 0xd.
      
      The question is should the 6 case also fallthrough and check for
      unit == 0xd, or should it return.
      
      Looking at the full list of events we see that there are zero events
      where (psel >> 4) == 0x6 and unit == 0xd.
      
      So the answer is it doesn't really matter, there are no valid event
      codes that will return a different result whether we fallthrough or
      break.
      
      But equally, testing the 6 case events against unit == 0xd is slightly
      bogus, as there are no such events. So to make the code clearer, and
      avoid any future confusion, have the 6 case break rather than falling
      through.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: default avatarMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      db6711b7
    • Michael Ellerman's avatar
      Revert "convert SLB miss handlers to C" and subsequent commits · 54be0b9c
      Michael Ellerman authored
      This reverts commits:
        5e46e29e ("powerpc/64s/hash: convert SLB miss handlers to C")
        8fed04d0 ("powerpc/64s/hash: remove user SLB data from the paca")
        655deecf ("powerpc/64s/hash: SLB allocation status bitmaps")
        2e162674 ("powerpc/64s/hash: provide arch_setup_exec hooks for hash slice setup")
        89ca4e12 ("powerpc/64s/hash: Add a SLB preload cache")
      
      This series had a few bugs, and the fixes are not all trivial. So
      revert most of it for now.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      54be0b9c
  2. 19 Sep, 2018 20 commits
    • Hari Bathini's avatar
      powerpc/fadump: re-register firmware-assisted dump if already registered · 0823c68b
      Hari Bathini authored
      Firmware-Assisted Dump (FADump) needs to be registered again after any
      memory hot add/remove operation to update the crash memory ranges. But
      currently, the kernel returns '-EEXIST' if we try to register without
      uregistering it first. This could expose the system to racing issues
      while unregistering and registering FADump from userspace during udev
      events. Spare the userspace of this and let it be taken care of in the
      kernel space for a simpler interface.
      
      Since this change, running 'echo 1 > /sys/kernel/fadump_registered'
      would result in re-regisering (unregistering and registering) FADump,
      if it was already registered.
      Signed-off-by: default avatarHari Bathini <hbathini@linux.ibm.com>
      Acked-by: default avatarMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      0823c68b
    • Suraj Jitindar Singh's avatar
      powerpc/pseries: Remove VLA from lparcfg_write() · 74422e2b
      Suraj Jitindar Singh authored
      In lparcfg_write we hard code kbuf_sz and then use this as the variable
      length of kbuf creating a variable length array. Since we're hard coding
      the length anyway just define the array using this as the length and
      remove the need for kbuf_sz, thus removing the variable length array.
      Signed-off-by: default avatarSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Reviewed-by: default avatarJoel Stanley <joel@jms.id.au>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      74422e2b
    • Suraj Jitindar Singh's avatar
      powerpc/prom: Remove VLA in prom_check_platform_support() · ab912399
      Suraj Jitindar Singh authored
      In prom_check_platform_support() we retrieve and parse the
      "ibm,arch-vec-5-platform-support" property of the chosen node.
      Currently we use a variable length array however to avoid this use an
      array of constant length 8.
      
      This property is used to indicate the supported options of vector 5
      bytes 23-26 of the ibm,architecture.vec node. Each of these options
      is a pair of bytes, thus for 4 options we have a max length of 8 bytes.
      Signed-off-by: default avatarSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Reviewed-by: default avatarJoel Stanley <joel@jms.id.au>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      ab912399
    • Anton Blanchard's avatar
      powerpc: Fix duplicate const clang warning in user access code · e00d93ac
      Anton Blanchard authored
      This re-applies commit b91c1e3e ("powerpc: Fix duplicate const
      clang warning in user access code") (Jun 2015) which was undone in
      commits:
        f2ca8090 ("powerpc/sparse: Constify the address pointer in __get_user_nosleep()") (Feb 2017)
        d466f6c5 ("powerpc/sparse: Constify the address pointer in __get_user_nocheck()") (Feb 2017)
        f84ed59a ("powerpc/sparse: Constify the address pointer in __get_user_check()") (Feb 2017)
      
      We see a large number of duplicate const errors in the user access
      code when building with llvm/clang:
      
        include/linux/pagemap.h:576:8: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier]
              ret = __get_user(c, uaddr);
      
      The problem is we are doing const __typeof__(*(ptr)), which will hit
      the warning if ptr is marked const.
      
      Removing const does not seem to have any effect on GCC code
      generation.
      Signed-off-by: default avatarAnton Blanchard <anton@samba.org>
      Signed-off-by: default avatarJoel Stanley <joel@jms.id.au>
      Reviewed-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      e00d93ac
    • Joel Stanley's avatar
      powerpc/boot: Ensure _zimage_start is a weak symbol · ee9d21b3
      Joel Stanley authored
      When building with clang crt0's _zimage_start is not marked weak, which
      breaks the build when linking the kernel image:
      
       $ objdump -t arch/powerpc/boot/crt0.o |grep _zimage_start$
       0000000000000058 g       .text  0000000000000000 _zimage_start
      
       ld: arch/powerpc/boot/wrapper.a(crt0.o): in function '_zimage_start':
       (.text+0x58): multiple definition of '_zimage_start';
       arch/powerpc/boot/pseries-head.o:(.text+0x0): first defined here
      
      Clang requires the .weak directive to appear after the symbol is
      declared. The binutils manual says:
      
       This directive sets the weak attribute on the comma separated list of
       symbol names. If the symbols do not already exist, they will be
       created.
      
      So it appears this is different with clang. The only reference I could
      see for this was an OpenBSD mailing list post[1].
      
      Changing it to be after the declaration fixes building with Clang, and
      still works with GCC.
      
       $ objdump -t arch/powerpc/boot/crt0.o |grep _zimage_start$
       0000000000000058  w      .text	0000000000000000 _zimage_start
      
      Reported to clang as https://bugs.llvm.org/show_bug.cgi?id=38921
      
      [1] https://groups.google.com/forum/#!topic/fa.openbsd.tech/PAgKKen2YCYSigned-off-by: default avatarJoel Stanley <joel@jms.id.au>
      Reviewed-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      ee9d21b3
    • Joel Stanley's avatar
      powerpc/configs: Update skiroot defconfig · cbc39809
      Joel Stanley authored
      Disable new features from recent releases, and clean out some other
      unused options:
      
        - Enable EXPERT, so we can disable some things
        - Disable non-powerpc BPF decoders
        - Disable TASKSTATS
        - Disable unused syscalls
        - Set more things to be modules
        - Turn off unused network vendors
        - PPC_OF_BOOT_TRAMPOLINE and FB_OF are unused on powernv
        - Drop unused Radeon and Matrox GPU drivers
        - IPV6 support landed in petitboot
        - Bringup related command line powersave=off dropped, switch to quiet
      
      Set CONFIG_I2C_CHARDEV=y as the module is not loaded automatically, and
      without this i2cget etc. will fail in the skiroot environment.
      
      This defconfig gets us build coverage of KERNEL_XZ, which was broken in
      the 4.19 merge window for powerpc.
      Signed-off-by: default avatarJoel Stanley <joel@jms.id.au>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      cbc39809
    • Nathan Fontenot's avatar
      powerpc/pseries: Disable CPU hotplug across migrations · 85a88cab
      Nathan Fontenot authored
      When performing partition migrations all present CPUs must be online
      as all present CPUs must make the H_JOIN call as part of the migration
      process. Once all present CPUs make the H_JOIN call, one CPU is returned
      to make the rtas call to perform the migration to the destination system.
      
      During testing of migration and changing the SMT state we have found
      instances where CPUs are offlined, as part of the SMT state change,
      before they make the H_JOIN call. This results in a hung system where
      every CPU is either in H_JOIN or offline.
      
      To prevent this this patch disables CPU hotplug during the migration
      process.
      Signed-off-by: default avatarNathan Fontenot <nfont@linux.vnet.ibm.com>
      Reviewed-by: default avatarTyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      85a88cab
    • Nathan Fontenot's avatar
      powerpc/pseries: Remove unneeded uses of dlpar work queue · fd12527a
      Nathan Fontenot authored
      There are three instances in which dlpar hotplug events are invoked;
      handling a hotplug interrupt (in a kvm guest), handling a dlpar
      request through sysfs, and updating LMB affinity when handling a
      PRRN event. Only in the case of handling a hotplug interrupt do we
      have to put the work on a workqueue, the other cases can handle the
      dlpar request directly.
      
      This patch exports the handle_dlpar_errorlog() function so that
      dlpar hotplug events can be handled directly and updates the two
      instances mentioned above to use the direct invocation.
      Signed-off-by: default avatarNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      fd12527a
    • Nathan Fontenot's avatar
      powerpc/pseries: Remove prrn_work workqueue · cd24e457
      Nathan Fontenot authored
      When a PRRN event is received we are already running in a worker
      thread. Instead of spawning off another worker thread on the prrn_work
      workqueue to handle the PRRN event we can just call the PRRN handler
      routine directly.
      
      With this update we can also pass the scope variable for the PRRN
      event directly to the handler instead of it being a global variable.
      
      This patch fixes the following oops mnessage we are seeing in PRRN testing:
      
        Oops: Bad kernel stack pointer, sig: 6 [#1]
        SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache binfmt_misc reiserfs vfat fat rpadlpar_io(X) rpaphp(X) tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag af_packet xfs libcrc32c dm_service_time ibmveth(X) ses enclosure scsi_transport_sas rtc_generic btrfs xor raid6_pq sd_mod ibmvscsi(X) scsi_transport_srp ipr(X) libata sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
        Supported: Yes, External                                                     54
        CPU: 7 PID: 18967 Comm: kworker/u96:0 Tainted: G                 X 4.4.126-94.22-default #1
        Workqueue: pseries hotplug workque pseries_hp_work_fn
        task: c000000775367790 ti: c00000001ebd4000 task.ti: c00000070d140000
        NIP: 0000000000000000 LR: 000000001fb3d050 CTR: 0000000000000000
        REGS: c00000001ebd7d40 TRAP: 0700   Tainted: G                 X  (4.4.126-94.22-default)
        MSR: 8000000102081000 <41,VEC,ME5  CR: 28000002  XER: 20040018   4
        CFAR: 000000001fb3d084 40 419   1                                3
        GPR00: 000000000000000040000000000010007 000000001ffff400 000000041fffe200
        GPR04: 000000000000008050000000000000000 000000001fb15fa8 0000000500000500
        GPR08: 000000000001f40040000000000000001 0000000000000000 000005:5200040002
        GPR12: 00000000000000005c000000007a05400 c0000000000e89f8 000000001ed9f668
        GPR16: 000000001fbeff944000000001fbeff94 000000001fb545e4 0000006000000060
        GPR20: ffffffffffffffff4ffffffffffffffff 0000000000000000 0000000000000000
        GPR24: 00000000000000005400000001fb3c000 0000000000000000 000000001fb1b040
        GPR28: 000000001fb240004000000001fb440d8 0000000000000008 0000000000000000
        NIP [0000000000000000] 5         (null)
        LR [000000001fb3d050] 031fb3d050
        Call Trace:            4
        Instruction dump:      4                                       5:47 12    2
        XXXXXXXX XXXXXXXX XXXXX4XX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
        XXXXXXXX XXXXXXXX XXXXX5XX XXXXXXXX 60000000 60000000 60000000 60000000
        ---[ end trace aa5627b04a7d9d6b ]---                                       3NMI watchdog: BUG: soft lockup - CPU#27 stuck for 23s! [kworker/27:0:13903]
        Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache binfmt_misc reiserfs vfat fat rpadlpar_io(X) rpaphp(X) tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag af_packet xfs libcrc32c dm_service_time ibmveth(X) ses enclosure scsi_transport_sas rtc_generic btrfs xor raid6_pq sd_mod ibmvscsi(X) scsi_transport_srp ipr(X) libata sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
        Supported: Yes, External
        CPU: 27 PID: 13903 Comm: kworker/27:0 Tainted: G      D          X 4.4.126-94.22-default #1
        Workqueue: events prrn_work_fn
        task: c000000747cfa390 ti: c00000074712c000 task.ti: c00000074712c000
        NIP: c0000000008002a8 LR: c000000000090770 CTR: 000000000032e088
        REGS: c00000074712f7b0 TRAP: 0901   Tainted: G      D          X  (4.4.126-94.22-default)
        MSR: 8000000100009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 22482044  XER: 20040000
        CFAR: c0000000008002c4 SOFTE: 1
        GPR00: c000000000090770 c00000074712fa30 c000000000f09800 c000000000fa1928 6:02
        GPR04: c000000775f5e000 fffffffffffffffe 0000000000000001 c000000000f42db8
        GPR08: 0000000000000001 0000000080000007 0000000000000000 0000000000000000
        GPR12: 8006210083180000 c000000007a14400
        NIP [c0000000008002a8] _raw_spin_lock+0x68/0xd0
        LR [c000000000090770] mobility_rtas_call+0x50/0x100
        Call Trace:            59                                        5
        [c00000074712fa60] [c000000000090770] mobility_rtas_call+0x50/0x100
        [c00000074712faf0] [c000000000090b08] pseries_devicetree_update+0xf8/0x530
        [c00000074712fc20] [c000000000031ba4] prrn_work_fn+0x34/0x50
        [c00000074712fc40] [c0000000000e0390] process_one_work+0x1a0/0x4e0
        [c00000074712fcd0] [c0000000000e0870] worker_thread+0x1a0/0x6105:57       2
        [c00000074712fd80] [c0000000000e8b18] kthread+0x128/0x150
        [c00000074712fe30] [c0000000000096f8] ret_from_kernel_thread+0x5c/0x64
        Instruction dump:
        2c090000 40c20010 7d40192d 40c2fff0 7c2004ac 2fa90000 40de0018 5:540030   3
        e8010010 ebe1fff8 7c0803a6 4e800020 <7c210b78> e92d0000 89290009 792affe3
      Signed-off-by: default avatarJohn Allen <jallen@linux.ibm.com>
      Signed-off-by: default avatarHaren Myneni <haren@us.ibm.com>
      Signed-off-by: default avatarNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      cd24e457
    • Nathan Fontenot's avatar
      powerpc/pseries/memory-hotplug: Only update DT once per memory DLPAR request · 063b8b12
      Nathan Fontenot authored
      The updates to powerpc numa and memory hotplug code now use the
      in-kernel LMB array instead of the device tree. This change allows the
      pseries memory DLPAR code to only update the device tree once after
      successfully handling a DLPAR request.
      
      Prior to the in-kernel LMB array, the numa code looked up the affinity
      for memory being added in the device tree, the code now looks this up
      in the LMB array. This change means the memory hotplug code can just
      update the affinity for an LMB in the LMB array instead of updating
      the device tree.
      
      This also provides a savings in kernel memory. When updating the
      device tree old properties are never free'ed since there is no
      usecount on properties. This behavior leads to a new copy of the
      property being allocated every time a LMB is added or removed (i.e. a
      request to add 100 LMBs creates 100 new copies of the property). With
      this update only a single new property is created when a DLPAR request
      completes successfully.
      Signed-off-by: default avatarNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      063b8b12
    • Nicholas Piggin's avatar
    • Nicholas Piggin's avatar
    • Nicholas Piggin's avatar
      powerpc: remove old GCC version checks · f2910f0e
      Nicholas Piggin authored
      GCC 4.6 is the minimum supported now.
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Reviewed-by: default avatarJoel Stanley <joel@jms.id.au>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      f2910f0e
    • Nicholas Piggin's avatar
      powerpc/64s/hash: Add a SLB preload cache · 89ca4e12
      Nicholas Piggin authored
      When switching processes, currently all user SLBEs are cleared, and a
      few (exec_base, pc, and stack) are preloaded. In trivial testing with
      small apps, this tends to miss the heap and low 256MB segments, and it
      will also miss commonly accessed segments on large memory workloads.
      
      Add a simple round-robin preload cache that just inserts the last SLB
      miss into the head of the cache and preloads those at context switch
      time. Every 256 context switches, the oldest entry is removed from the
      cache to shrink the cache and require fewer slbmte if they are unused.
      
      Much more could go into this, including into the SLB entry reclaim
      side to track some LRU information etc, which would require a study of
      large memory workloads. But this is a simple thing we can do now that
      is an obvious win for common workloads.
      
      With the full series, process switching speed on the context_switch
      benchmark on POWER9/hash (with kernel speculation security masures
      disabled) increases from 140K/s to 178K/s (27%).
      
      POWER8 does not change much (within 1%), it's unclear why it does not
      see a big gain like POWER9.
      
      Booting to busybox init with 256MB segments has SLB misses go down
      from 945 to 69, and with 1T segments 900 to 21. These could almost all
      be eliminated by preloading a bit more carefully with ELF binary
      loading.
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      89ca4e12
    • Nicholas Piggin's avatar
      powerpc/64s/hash: provide arch_setup_exec hooks for hash slice setup · 2e162674
      Nicholas Piggin authored
      This will be used by the SLB code in the next patch, but for now this
      sets the slb_addr_limit to the correct size for 32-bit tasks.
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      2e162674
    • Nicholas Piggin's avatar
    • Nicholas Piggin's avatar
      powerpc/64s/hash: SLB allocation status bitmaps · 655deecf
      Nicholas Piggin authored
      Add 32-entry bitmaps to track the allocation status of the first 32
      SLB entries, and whether they are user or kernel entries. These are
      used to allocate free SLB entries first, before resorting to the round
      robin allocator.
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      655deecf
    • Nicholas Piggin's avatar
      powerpc/64s/hash: remove user SLB data from the paca · 8fed04d0
      Nicholas Piggin authored
      User SLB mappig data is copied into the PACA from the mm->context so
      it can be accessed by the SLB miss handlers.
      
      After the C conversion, SLB miss handlers now run with relocation on,
      and user SLB misses are able to take recursive kernel SLB misses, so
      the user SLB mapping data can be removed from the paca and accessed
      directly.
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      8fed04d0
    • Nicholas Piggin's avatar
      powerpc/64s/hash: convert SLB miss handlers to C · 5e46e29e
      Nicholas Piggin authored
      This patch moves SLB miss handlers completely to C, using the standard
      exception handler macros to set up the stack and branch to C.
      
      This can be done because the segment containing the kernel stack is
      always bolted, so accessing it with relocation on will not cause an
      SLB exception.
      
      Arbitrary kernel memory may not be accessed when handling kernel space
      SLB misses, so care should be taken there. However user SLB misses can
      access any kernel memory, which can be used to move some fields out of
      the paca (in later patches).
      
      User SLB misses could quite easily reconcile IRQs and set up a first
      class kernel environment and exit via ret_from_except, however that
      doesn't seem to be necessary at the moment, so we only do that if a
      bad fault is encountered.
      
      [ Credit to Aneesh for bug fixes, error checks, and improvements to bad
        address handling, etc ]
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      
      Since RFC:
      - Added MSR[RI] handling
      - Fixed up a register loss bug exposed by irq tracing (Aneesh)
      - Reject misses outside the defined kernel regions (Aneesh)
      - Added several more sanity checks and error handling (Aneesh), we may
        look at consolidating these tests and tightenig up the code but for
        a first pass we decided it's better to check carefully.
      
      Since v1:
      - Fixed SLB cache corruption (Aneesh)
      - Fixed untidy SLBE allocation "leak" in get_vsid error case
      - Now survives some stress testing on real hardware
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      5e46e29e
    • Nicholas Piggin's avatar
      powerpc/64s/hash: Use POWER9 SLBIA IH=3 variant in switch_slb · 82d8f4c2
      Nicholas Piggin authored
      POWER9 introduces SLBIA IH=3, which invalidates all SLB entries and
      associated lookaside information that have a class value of 1, which
      Linux assigns to user addresses. This matches what switch_slb wants,
      and allows a simple fast implementation that avoids the slb_cache
      complexity.
      
      As a side-effect, the POWER5 < DD2.1 SLB invalidation workaround is
      also avoided on POWER9.
      
      Process context switching rate is improved about 2.2% for a small
      process that hits the slb cache which is the best case for the current
      code.
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      82d8f4c2