1. 20 Jun, 2019 3 commits
    • KVM: PPC: Book3S HV: Clear pending decrementer exceptions on nested guest entry · 3c25ab35
      Suraj Jitindar Singh authored
      If we enter an L1 guest with a pending decrementer exception then this
      is cleared on guest exit if the guest has written a positive value
      into the decrementer (indicating that it handled the decrementer
      exception) since there is no other way to detect that the guest has
      handled the pending exception and that it should be dequeued. In the
      event that the L1 guest tries to run a nested (L2) guest immediately
      after this and the L2 guest decrementer is negative (which is loaded
      by L1 before making the H_ENTER_NESTED hcall), then the pending
      decrementer exception isn't cleared and the L2 entry is blocked since
      L1 has a pending exception, even though L1 may have already handled
      the exception and written a positive value for its decrementer. This
      results in a loop of L1 trying to enter the L2 guest and L0 blocking
      the entry since L1 has an interrupt pending with the outcome being
      that L2 never gets to run and hangs.
      
      Fix this by clearing any pending decrementer exceptions when L1 makes
      the H_ENTER_NESTED hcall since it won't do this if its decrementer
      has gone negative, and anyway its decrementer has been communicated
      to L0 in the hdec_expires field and L0 will return control to L1 when
      this goes negative by delivering an H_DECREMENTER exception.
      
      Fixes: 95a6432c ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: PPC: Book3S HV: Signed extend decrementer value if not using large decrementer · 86953770
      Suraj Jitindar Singh authored
      On POWER9 the decrementer can operate in large decrementer mode where
      the decrementer is 56 bits and sign-extended to 64 bits. When not
      operating in this mode the decrementer behaves as a 32 bit decrementer
      which is NOT sign-extended (as on POWER8).
      
      Currently when reading a guest decrementer value we don't take into
      account whether the large decrementer is enabled or not, and this
      means the value will be incorrect when the guest is not using the
      large decrementer. Fix this by sign extending the value read when the
      guest isn't using the large decrementer.
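
      As a rough illustration of the sign extension described above, the
      sketch below interprets a raw 64-bit read. The helper name dec_value()
      and the large_dec flag are hypothetical and are not taken from the
      kernel source; the sketch also assumes the usual two's-complement
      narrowing conversion.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel's code: interpret a raw decrementer
 * read. In large-decrementer mode the value is assumed to arrive already
 * sign-extended to 64 bits; otherwise only the low 32 bits are meaningful
 * and software has to sign-extend them.
 */
static int64_t dec_value(uint64_t raw, int large_dec)
{
	if (large_dec)
		return (int64_t)raw;

	/* Narrow to 32 bits first, then widen as a signed quantity. */
	return (int64_t)(int32_t)(uint32_t)raw;
}
```

      With this sketch, a raw value of 0xFFFFFFFF is treated as -1 when the
      large decrementer is off, rather than as a large positive count.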
      
      Fixes: 95a6432c ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pci/of: Fix OF flags parsing for 64bit BARs · df5be5be
      Alexey Kardashevskiy authored
      When the firmware does PCI BAR resource allocation, it passes the assigned
      addresses and flags (prefetch/64bit/...) via the "reg" property of
      a PCI device's device tree node so the kernel does not need to do
      resource allocation.
      
      The flags are stored in resource::flags - the lower byte stores
      PCI_BASE_ADDRESS_SPACE/etc bits and the other bytes are IORESOURCE_IO/etc.
      Some flags from PCI_BASE_ADDRESS_xxx and IORESOURCE_xxx are duplicated,
      such as PCI_BASE_ADDRESS_MEM_PREFETCH/PCI_BASE_ADDRESS_MEM_TYPE_64/etc.
      When parsing the "reg" property, we copy the prefetch flag but we skip
      PCI_BASE_ADDRESS_MEM_TYPE_64, which leaves the flags out of sync.
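
      The flag mirroring described above can be sketched as follows. This is
      a standalone illustration, not the kernel's parsing code: the helper
      mem_resource_flags() is hypothetical, and the constants are defined
      locally with the values I understand the Linux headers to use.

```c
#include <stdint.h>

/* Local copies for the sketch; treat the values as illustrative. */
#define PCI_BASE_ADDRESS_MEM_TYPE_64   0x04u
#define PCI_BASE_ADDRESS_MEM_PREFETCH  0x08u
#define IORESOURCE_MEM                 0x00000200u
#define IORESOURCE_PREFETCH            0x00002000u
#define IORESOURCE_MEM_64              0x00100000u

/* Hypothetical helper: build resource flags from the BAR-style low byte. */
static uint32_t mem_resource_flags(uint32_t bar_bits)
{
	uint32_t flags = IORESOURCE_MEM | (bar_bits & 0xffu);

	if (bar_bits & PCI_BASE_ADDRESS_MEM_PREFETCH)
		flags |= IORESOURCE_PREFETCH;
	if (bar_bits & PCI_BASE_ADDRESS_MEM_TYPE_64)
		flags |= IORESOURCE_MEM_64;	/* the duplicate that was being skipped */

	return flags;
}
```

      Keeping both copies of each duplicated bit in sync is the point: code
      that only inspects the IORESOURCE_xxx half then sees a 64bit BAR as
      64bit.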
      
      The missing IORESOURCE_MEM_64 flag comes into play under 2 conditions:
      1. we remove PCI_PROBE_ONLY for pseries (by hacking pSeries_setup_arch()
      or by passing "/chosen/linux,pci-probe-only");
      2. we request resource alignment (by passing pci=resource_alignment=
      via the kernel cmd line to request PAGE_SIZE alignment or defining
      ppc_md.pcibios_default_alignment which returns anything but 0). Note that
      the alignment requests are ignored if PCI_PROBE_ONLY is enabled.
      
      With 1) and 2), the generic PCI code in the kernel unconditionally
      decides to:
      - reassign the BARs in pci_specified_resource_alignment() (works fine)
      - write new BARs to the device - this fails for 64bit BARs as the generic
      code looks at IORESOURCE_MEM_64 (not set) and writes only lower 32bits
      of the BAR and leaves the upper 32bit unmodified which breaks BAR mapping
      in the hypervisor.
      
      This fixes the issue by copying the flag. This is useful if we want to
      enforce certain BAR alignment per platform as handling subpage sized BARs
      is proven to cause problems with hotplug (SLOF already aligns BARs to 64k).
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
      Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
      Reviewed-by: Shawn Anastasio <shawn@anastas.io>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 19 Jun, 2019 12 commits
    • powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP · d909f910
      Nicholas Piggin authored
      This sets the HAVE_ARCH_HUGE_VMAP option, and defines the required
      page table functions.
      
      This enables huge (2MB and 1GB) ioremap mappings. I don't have a
      benchmark for this change, but huge vmap will be used by a later core
      kernel change to enable huge vmalloc memory mappings. This improves
      cached `git diff` performance by about 5% on a 2-node POWER9 with 32MB
      size dentry cache hash.
      
        Profiling git diff dTLB misses with a vanilla kernel:
      
        81.75%  git      [kernel.vmlinux]    [k] __d_lookup_rcu
         7.21%  git      [kernel.vmlinux]    [k] strncpy_from_user
         1.77%  git      [kernel.vmlinux]    [k] find_get_entry
         1.59%  git      [kernel.vmlinux]    [k] kmem_cache_free
      
                  40,168      dTLB-miss
             0.100342754 seconds time elapsed
      
        With powerpc huge vmalloc:
      
                   2,987      dTLB-miss
             0.095933138 seconds time elapsed
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: ioremap use ioremap_page_range · d38153f9
      Nicholas Piggin authored
      Radix can use ioremap_page_range for ioremap, after slab is available.
      This makes it possible to enable huge ioremap mapping support.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: __ioremap_at clean up in the error case · a72808a7
      Nicholas Piggin authored
      __ioremap_at error handling is wonky: it requires the caller to clean up
      after it. Implement a helper that does the map and error cleanup and
      remove the requirement from the caller.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/perf: Use cpumask_last() to determine the designated cpu for nest/core units. · 9c9f8fb7
      Anju T Sudhakar authored
      Nest and core IMC (In-Memory Collection counters) assign a particular
      cpu as the designated target for counter data collection. During
      system boot, the first online cpu in a chip gets assigned as the
      designated cpu for that chip (for nest-imc) and the first online cpu in
      a core gets assigned as the designated cpu for that core (for
      core-imc).

      If the designated cpu goes offline, the next online cpu from the same
      chip (for nest-imc)/core (for core-imc) is assigned as the next target,
      and the event context is migrated to the target cpu. Currently, the
      cpumask_any_but() function is used to find the target cpu. Though this
      function is expected to return a `random` cpu, it always returns the
      next online cpu.

      If all cpus in a chip/core are offlined sequentially,
      starting from the first cpu, the event migration has to happen for
      every cpu that goes offline. Since the migration process involves a
      grace period, the total time taken to offline all the cpus will be
      significantly high.
      
      Example:
        In a system which has 2 sockets, with
        NUMA node0 CPU(s):     0-87
        NUMA node8 CPU(s):     88-175
      
        Time taken to offline cpu 88-175:
        real    2m56.099s
        user    0m0.191s
        sys     0m0.000s
      
      Use cpumask_last() to choose the target cpu when the designated cpu
      goes offline, so the migration will happen only when the last_cpu in
      the mask goes offline. This way the time taken to offline all cpus in
      a chip/core can be reduced.
      
      With the patch:
      
        Time taken  to offline cpu 88-175:
        real    0m12.207s
        user    0m0.171s
        sys     0m0.000s
      
      Offlining all cpus in reverse order is also taken care of, because
      cpumask_any_but() is used to find the designated cpu if the last cpu
      in the mask goes offline. Since cpumask_any_but() always returns the
      first cpu in the mask, that becomes the designated cpu and migration
      will happen only when the first_cpu in the mask goes offline.
      
      Example: With the patch,
      
        Time taken to offline cpu from 175-88:
        real    0m9.330s
        user    0m0.110s
        sys     0m0.000s
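
      The effect of the target-selection policy on the number of migrations
      can be illustrated with a small userspace simulation. Everything in it
      (NCPUS, the policy names, count_migrations()) is made up for the
      sketch; it models only the selection rule described above, not the
      IMC code or the grace period.

```c
#include <stdbool.h>

#define NCPUS 8

enum policy { PICK_FIRST_OTHER, PICK_LAST };

/* Lowest-numbered online cpu other than `dying`, or -1 if none is left. */
static int first_other(const bool online[NCPUS], int dying)
{
	for (int cpu = 0; cpu < NCPUS; cpu++)
		if (online[cpu] && cpu != dying)
			return cpu;
	return -1;
}

/* Highest-numbered online cpu, or -1 if none is online. */
static int last_online(const bool online[NCPUS])
{
	for (int cpu = NCPUS - 1; cpu >= 0; cpu--)
		if (online[cpu])
			return cpu;
	return -1;
}

/* Choose the new designated cpu when `dying` is about to go offline. */
static int pick_target(const bool online[NCPUS], int dying, enum policy p)
{
	if (p == PICK_LAST) {
		int last = last_online(online);

		if (last != dying)
			return last;
		/* The last cpu itself is dying: fall back to the first other cpu. */
	}
	return first_other(online, dying);
}

/* Offline every cpu in ascending or descending order and count migrations. */
static int count_migrations(bool ascending, enum policy p)
{
	bool online[NCPUS];
	int designated = 0;	/* first online cpu, as at boot */
	int migrations = 0;

	for (int i = 0; i < NCPUS; i++)
		online[i] = true;

	for (int i = 0; i < NCPUS; i++) {
		int cpu = ascending ? i : NCPUS - 1 - i;

		if (cpu == designated) {
			int target = pick_target(online, cpu, p);

			if (target >= 0) {
				migrations++;
				designated = target;
			}
		}
		online[cpu] = false;
	}
	return migrations;
}
```

      For 8 cpus offlined in ascending order, the sketch counts 7 migrations
      with the first-other policy and 1 with the last-cpu policy; in
      descending order the last-cpu policy needs none.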
      Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s: Fix misleading SPR and timebase information · 87997471
      Shaokun Zhang authored
      pr_info shows SPR and timebase as a decimal value with a '0x'
      prefix, which is somewhat misleading.
      
      Fix it to print hexadecimal, as was intended.
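
      A minimal userspace sketch of the mismatch, using snprintf() in place
      of pr_info(). The helper format_spr() is hypothetical; the actual
      format strings in the idle code are not reproduced here.

```c
#include <stdio.h>

/* Hypothetical helper: format a value so the digits match the "0x" prefix. */
static int format_spr(char *buf, size_t len, unsigned long long val)
{
	/* A "0x%llu" format here would print decimal digits after "0x". */
	return snprintf(buf, len, "0x%llx", val);
}
```

      For the value 255 this writes "0xff", whereas the decimal conversion
      would have produced the misleading "0x255".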
      
      Fixes: 10d91611 ("powerpc/64s: Reimplement book3s idle code in C")
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pseries: avoid blocking in irq when queuing hotplug events · 348ea30f
      Nathan Lynch authored
      A couple of bugs in queue_hotplug_event():
      
      1. Unchecked kmalloc result which could lead to an oops.
      2. Use of GFP_KERNEL allocations in interrupt context (this code's
         only caller is ras_hotplug_interrupt()).
      
      Use kmemdup to avoid open-coding the allocation+copy and check for
      failure; use GFP_ATOMIC for both allocations.
      
      Ultimately it probably would be better to avoid or reduce allocations
      in this path if possible.
      Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/watchpoint: Restore NV GPRs while returning from exception · f474c28f
      Ravi Bangoria authored
      powerpc hardware triggers a watchpoint before executing the instruction.
      To get trigger-after-execute behavior, the kernel emulates the
      instruction. If the instruction is 'load something into non-volatile
      register', the exception handler should restore the emulated register
      state while returning, otherwise there will be register state
      corruption. E.g., adding a watchpoint on a list can corrupt the list:
      
        # cat /proc/kallsyms | grep kthread_create_list
        c00000000121c8b8 d kthread_create_list
      
      Add watchpoint on kthread_create_list->prev:
      
        # perf record -e mem:0xc00000000121c8c0
      
      Run some workload such that new kthread gets invoked. eg, I just
      logged out from console:
      
        list_add corruption. next->prev should be prev (c000000001214e00), \
      	but was c00000000121c8b8. (next=c00000000121c8b8).
        WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0
        CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69
        ...
        NIP __list_add_valid+0xb4/0xc0
        LR __list_add_valid+0xb0/0xc0
        Call Trace:
        __list_add_valid+0xb0/0xc0 (unreliable)
        __kthread_create_on_node+0xe0/0x260
        kthread_create_on_node+0x34/0x50
        create_worker+0xe8/0x260
        worker_thread+0x444/0x560
        kthread+0x160/0x1a0
        ret_from_kernel_thread+0x5c/0x70
      
      List corruption happened because it uses 'load into non-volatile
      register' instruction:
      
      Snippet from __kthread_create_on_node:
      
        c000000000136be8:     addis   r29,r2,-19
        c000000000136bec:     ld      r29,31424(r29)
              if (!__list_add_valid(new, prev, next))
        c000000000136bf0:     mr      r3,r30
        c000000000136bf4:     mr      r5,r28
        c000000000136bf8:     mr      r4,r29
        c000000000136bfc:     bl      c00000000059a2f8 <__list_add_valid+0x8>
      
      Register state from WARN_ON():
      
        GPR00: c00000000059a3a0 c000007ff23afb50 c000000001344e00 0000000000000075
        GPR04: 0000000000000000 0000000000000000 0000001852af8bc1 0000000000000000
        GPR08: 0000000000000001 0000000000000007 0000000000000006 00000000000004aa
        GPR12: 0000000000000000 c000007ffffeb080 c000000000137038 c000005ff62aaa00
        GPR16: 0000000000000000 0000000000000000 c000007fffbe7600 c000007fffbe7370
        GPR20: c000007fffbe7320 c000007fffbe7300 c000000001373a00 0000000000000000
        GPR24: fffffffffffffef7 c00000000012e320 c000007ff23afcb0 c000000000cb8628
        GPR28: c00000000121c8b8 c000000001214e00 c000007fef5b17e8 c000007fef5b17c0
      
      Watchpoint hit at 0xc000000000136bec.
      
        addis   r29,r2,-19
         => r29 = 0xc000000001344e00 + (-19 << 16)
         => r29 = 0xc000000001214e00
      
        ld      r29,31424(r29)
         => r29 = *(0xc000000001214e00 + 31424)
         => r29 = *(0xc00000000121c8c0)
      
      0xc00000000121c8c0 is where we placed a watchpoint and thus this
      instruction was emulated by emulate_step. But because handle_dabr_fault
      did not restore the emulated register state, r29 still contains a stale
      value in the register state above.
      
      Fixes: 5aae8a53 ("powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors")
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: stable@vger.kernel.org # 2.6.36+
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • cxl: no need to check return value of debugfs_create functions · 1b7de1df
      Greg Kroah-Hartman authored
      When calling debugfs functions, there is no need to ever check the
      return value.  The function can work or not, but the code logic should
      never do something different based on this.
      
      Because there's no need to check, also make the return value of the
      local debugfs_create_io_x64() call void, as no one ever did anything
      with the return value (as they did not need to.)
      
      And make the cxl_debugfs_* calls return void as no one was even checking
      their return value at all.
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
      Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/ps3: Use [] to denote a flexible array member · 0b1be03f
      Geert Uytterhoeven authored
      Flexible array members should be denoted using [] instead of [0], else
      gcc will not warn when they are no longer at the end of the structure.
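
      A short sketch of the [] form, using a made-up structure rather than
      one from the ps3 code. struct packet and packet_new() are hypothetical
      names chosen for the illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure for illustration only. */
struct packet {
	unsigned int len;
	unsigned char data[];	/* flexible array member: has to stay last */
};

/* Allocate the header plus `len` payload bytes in one block. */
static struct packet *packet_new(const void *src, unsigned int len)
{
	struct packet *p = malloc(sizeof(*p) + len);

	if (!p)
		return NULL;

	p->len = len;
	memcpy(p->data, src, len);
	return p;
}
```

      If a later change adds a field after data[], the [] spelling gives the
      compiler the chance to complain, which the [0] spelling does not.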
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/32s: fix condition that is always true · 46c2478a
      Andreas Schwab authored
      Move a misplaced paren that makes the condition always true.
      
      Fixes: 63b2bc61 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX")
      Cc: stable@vger.kernel.org # v5.1+
      Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
      Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/32s: fix suspend/resume when IBATs 4-7 are used · 6ecb78ef
      Christophe Leroy authored
      Previously, only IBAT1 and IBAT2 were used to map kernel linear mem.
      Since commit 63b2bc61 ("powerpc/mm/32s: Use BATs for
      STRICT_KERNEL_RWX"), we may have all 8 BATs used for mapping
      kernel text. But the suspend/restore functions only save/restore
      BATs 0 to 3, and clear BATs 4 to 7.
      
      Make suspend and restore functions respectively save and reload
      the 8 BATs on CPUs having MMU_FTR_USE_HIGH_BATS feature.
      Reported-by: Andreas Schwab <schwab@linux-m68k.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • selftests/powerpc: Fix earlyclobber in tm-vmxcopy · 8d0f1e05
      Gustavo Romero authored
      In some cases, the compiler can allocate the same register for operand
      'res' and 'vecoutptr', resulting in a segfault at
      'stxvd2x 40,0,%[vecoutptr]' because the base register will contain 1,
      yielding a false-positive.
      
      This is because output 'res' must be marked as an earlyclobber operand so
      it may not overlap an input operand ('vecoutptr').
      Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
      Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  3. 15 Jun, 2019 4 commits
  4. 14 Jun, 2019 3 commits
    • powerpc/pseries: Fix oops in hotplug memory notifier · 0aa82c48
      Nathan Lynch authored
      During post-migration device tree updates, we can oops in
      pseries_update_drconf_memory() if the source device tree has an
      ibm,dynamic-memory-v2 property and the destination has an
      ibm,dynamic-memory (v1) property. The notifier processes an "update"
      for the ibm,dynamic-memory property but it's really an add in this
      scenario. So make sure the old property object is there before
      dereferencing it.
      
      Fixes: 2b31e3ae ("powerpc/drmem: Add support for ibm, dynamic-memory-v2 property")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pseries/hvconsole: Fix stack overread via udbg · 934bda59
      Daniel Axtens authored
      While developing KASAN for 64-bit book3s, I hit the following stack
      over-read.
      
      It occurs because the hypercall to put characters onto the terminal
      takes 2 longs (128 bits/16 bytes) of characters at a time, and so
      hvc_put_chars() would unconditionally copy 16 bytes from the argument
      buffer, regardless of supplied length. However, udbg_hvc_putc() can
      call hvc_put_chars() with a single-byte buffer, leading to the error.
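
      The buffer-size contract can be sketched in userspace as follows. The
      functions put_chars_sketch() and putc_sketch() are hypothetical
      stand-ins, not the hvconsole or udbg code; they only show a consumer
      that always reads 16 bytes and a caller that supplies a full-size
      buffer for a single character.

```c
#include <stdint.h>
#include <string.h>

#define CHUNK 16	/* two 8-byte longs, per the description above */

/* Stand-in for the wrapper: always reads CHUNK bytes from buf. */
static void put_chars_sketch(const char *buf, int count, uint64_t out[2])
{
	(void)count;	/* only says how many of the bytes are valid */
	memcpy(out, buf, CHUNK);
}

/* Caller side: pass a CHUNK-sized buffer even for one character. */
static void putc_sketch(char c, uint64_t out[2])
{
	char bounce[CHUNK] = { 0 };

	bounce[0] = c;
	put_chars_sketch(bounce, 1, out);
}
```

      Passing the address of a lone char instead of bounce would make the
      16-byte copy read past the object, which is the kind of over-read the
      report below shows.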
      
        ==================================================================
        BUG: KASAN: stack-out-of-bounds in hvc_put_chars+0xdc/0x110
        Read of size 8 at addr c0000000023e7a90 by task swapper/0
      
        CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.0-rc2-next-20190528-02824-g048a6ab4835b #113
        Call Trace:
          dump_stack+0x104/0x154 (unreliable)
          print_address_description+0xa0/0x30c
          __kasan_report+0x20c/0x224
          kasan_report+0x18/0x30
          __asan_report_load8_noabort+0x24/0x40
          hvc_put_chars+0xdc/0x110
          hvterm_raw_put_chars+0x9c/0x110
          udbg_hvc_putc+0x154/0x200
          udbg_write+0xf0/0x240
          console_unlock+0x868/0xd30
          register_console+0x970/0xe90
          register_early_udbg_console+0xf8/0x114
          setup_arch+0x108/0x790
          start_kernel+0x104/0x784
          start_here_common+0x1c/0x534
      
        Memory state around the buggy address:
         c0000000023e7980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
         c0000000023e7a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
        >c0000000023e7a80: f1 f1 01 f2 f2 f2 00 00 00 00 00 00 00 00 00 00
                                 ^
         c0000000023e7b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
         c0000000023e7b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        ==================================================================
      
      Document that a 16-byte buffer is required, and provide it in udbg.
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • ocxl: do not use C++ style comments in uapi header · 2305ff22
      Masahiro Yamada authored
      The Linux kernel tolerates C++ style comments these days. Actually, the
      SPDX License tags for .c files start with //.
      
      On the other hand, uapi headers are written in more strict C, where
      the C++ comment style is forbidden.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
      Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  5. 02 Jun, 2019 6 commits
    • powerpc/pseries: Fix xive=off command line · a3bf9fbd
      Greg Kurz authored
      On POWER9, if the hypervisor supports XIVE exploitation mode, the
      guest OS unconditionally requests the XIVE interrupt mode
      even if XIVE was deactivated with the kernel command line xive=off.
      Later on, when the spapr XIVE init code handles xive=off, it disables
      XIVE and tries to fall back on the legacy mode XICS.

      This discrepancy causes a kernel panic because the hypervisor is
      configured to provide the XIVE interrupt mode to the guest:
      
        kernel BUG at arch/powerpc/sysdev/xics/xics-common.c:135!
        ...
        NIP xics_smp_probe+0x38/0x98
        LR  xics_smp_probe+0x2c/0x98
        Call Trace:
          xics_smp_probe+0x2c/0x98 (unreliable)
          pSeries_smp_probe+0x40/0xa0
          smp_prepare_cpus+0x62c/0x6ec
          kernel_init_freeable+0x148/0x448
          kernel_init+0x2c/0x148
          ret_from_kernel_thread+0x5c/0x68
      
      Look for xive=off during prom_init and don't ask for XIVE in this
      case. One exception though: if the host only supports XIVE, we still
      want to boot so we ignore xive=off.
      
      Similarly, have the spapr XIVE init code look at the interrupt
      mode negotiated during CAS, and ignore xive=off if the hypervisor only
      supports XIVE.
      
      Fixes: eac1e731 ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.20
      Reported-by: Pavithra R. Prakash <pavrampu@in.ibm.com>
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv/npu: Fix reference leak · 02c5f539
      Greg Kurz authored
      Since 902bdc57, get_pci_dev() calls pci_get_domain_bus_and_slot(). This
      has the effect of incrementing the reference count of the PCI device, as
      explained in drivers/pci/search.c:
      
       * Given a PCI domain, bus, and slot/function number, the desired PCI
       * device is located in the list of PCI devices. If the device is
       * found, its reference count is increased and this function returns a
       * pointer to its data structure.  The caller must decrement the
       * reference count by calling pci_dev_put().  If no device is found,
       * %NULL is returned.
      
      Nothing was done to call pci_dev_put() and the reference count of GPU and
      NPU PCI devices rockets up.
      
      A natural way to fix this would be to teach the callers about the change,
      so that they call pci_dev_put() when done with the pointer. This turns
      out to be quite intrusive, as it affects many paths in npu-dma.c,
      pci-ioda.c and vfio_pci_nvlink2.c. Also, the issue appeared in 4.16 and
      some affected code got moved around since then: it would be problematic
      to backport the fix to stable releases.
      
      All that code never cared for reference counting anyway. Call pci_dev_put()
      from get_pci_dev() to revert to the previous behavior.
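
      The "take the reference, then drop it before returning" pattern can be
      sketched with a toy reference count. struct dev_sketch, dev_get(),
      dev_put() and lookup_borrowed() are hypothetical stand-ins and not the
      PCI core API.

```c
/* Hypothetical stand-in for a reference-counted device. */
struct dev_sketch {
	int refcount;
};

/* Models a search function that returns the device with a reference held. */
static struct dev_sketch *dev_get(struct dev_sketch *d)
{
	if (d)
		d->refcount++;
	return d;
}

static void dev_put(struct dev_sketch *d)
{
	if (d)
		d->refcount--;
}

/*
 * Lookup that hands back a borrowed pointer: the reference taken by the
 * search is dropped again, so callers that never call dev_put() no longer
 * leak a count on every lookup.
 */
static struct dev_sketch *lookup_borrowed(struct dev_sketch *d)
{
	struct dev_sketch *found = dev_get(d);

	dev_put(found);
	return found;
}
```

      A borrowed pointer like this is only safe while something else keeps
      the device alive, which matches the trade-off the message above
      accepts.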
      
      Fixes: 902bdc57 ("powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn")
      Cc: stable@vger.kernel.org # v4.16
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Remove variable ‘path’ since not used · c806a6fd
      Mathieu Malaterre authored
      In commit eab00a20 ("powerpc: Move `path` variable inside
      DEBUG_PROM") DEBUG_PROM sentinels were added to silence a warning
      (treated as error with W=1):
      
        arch/powerpc/kernel/prom_init.c:1388:8: error: variable ‘path’ set but not used [-Werror=unused-but-set-variable]
      
      Rework the original patch and simplify the code, by removing the
      variable ‘path’ completely. Fix line over 90 characters.
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Mathieu Malaterre <malat@debian.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Show checkstop reason for NPU2 HMIs · 89d87bcb
      Frederic Barrat authored
      If the kernel is notified of an HMI caused by the NPU2, it's currently
      not being recognized and it logs the default message:
      
          Unknown Malfunction Alert of type 3
      
      The NPU on Power 9 has 3 Fault Isolation Registers, so that's a lot of
      possible causes, but we should at least log that it's an NPU problem
      and report which FIR and which bit were raised if opal gave us the
      information.
      Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Update firmware archaeology around OPAL_HANDLE_HMI · 1549c42d
      Stewart Smith authored
      The first machines to ship with OPAL firmware all got firmware updates
      that have the new call, but just in case someone is foolish enough to
      believe the first 4 months of firmware is the best, we keep this code
      around.
      
      Comment is updated to not refer to late 2014 as recent or the future.
      Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pseries/dlpar: Fix a missing check in dlpar_parse_cc_property() · efa9ace6
      Gen Zhang authored
      In dlpar_parse_cc_property(), 'prop->name' is allocated by kstrdup().
      kstrdup() may return NULL, so the result should be checked and the
      error handled. 'prop' should also be freed if 'prop->name' is NULL.
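
      A userspace analogue of the check, with malloc()-based helpers standing
      in for the kernel allocators. struct prop_sketch, dup_string() and
      prop_new() are hypothetical names; the real function and its cleanup
      path are not reproduced here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace analogue of a property object. */
struct prop_sketch {
	char *name;
};

/* Small strdup-like helper; returns NULL if the allocation fails. */
static char *dup_string(const char *s)
{
	size_t n = strlen(s) + 1;
	char *copy = malloc(n);

	if (copy)
		memcpy(copy, s, n);
	return copy;
}

static struct prop_sketch *prop_new(const char *name)
{
	struct prop_sketch *prop = calloc(1, sizeof(*prop));

	if (!prop)
		return NULL;

	prop->name = dup_string(name);
	if (!prop->name) {
		free(prop);	/* do not leak the half-built object */
		return NULL;
	}
	return prop;
}
```

      The caller then sees either a fully initialised object or NULL, never
      an object whose name pointer is NULL.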
      Signed-off-by: Gen Zhang <blackgod016574@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  6. 28 May, 2019 4 commits
  7. 26 May, 2019 6 commits
    • Linux 5.2-rc2 · cd6c84d8
      Linus Torvalds authored
    • Merge tag 'trace-v5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · c5b44095
      Linus Torvalds authored
      Pull tracing warning fix from Steven Rostedt:
       "Make the GCC 9 warning for sub struct memset go away.
      
        GCC 9 now warns about calling memset() on partial structures when it
        goes across multiple fields. This adds a helper for the place in
        tracing that does this type of clearing of a structure"
      
      * tag 'trace-v5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Silence GCC 9 array bounds warning
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 862f0a32
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "The usual smattering of fixes and tunings that came in too late for
        the merge window, but should not wait four months before they appear
        in a release.
      
        I also travelled a bit more than usual in the first part of May, which
        didn't help with picking up patches and reports promptly"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (33 commits)
        KVM: x86: fix return value for reserved EFER
        tools/kvm_stat: fix fields filter for child events
        KVM: selftests: Wrap vcpu_nested_state_get/set functions with x86 guard
        kvm: selftests: aarch64: compile with warnings on
        kvm: selftests: aarch64: fix default vm mode
        kvm: selftests: aarch64: dirty_log_test: fix unaligned memslot size
        KVM: s390: fix memory slot handling for KVM_SET_USER_MEMORY_REGION
        KVM: x86/pmu: do not mask the value that is written to fixed PMUs
        KVM: x86/pmu: mask the result of rdpmc according to the width of the counters
        x86/kvm/pmu: Set AMD's virt PMU version to 1
        KVM: x86: do not spam dmesg with VMCS/VMCB dumps
        kvm: Check irqchip mode before assign irqfd
        kvm: svm/avic: fix off-by-one in checking host APIC ID
        KVM: selftests: do not blindly clobber registers in guest asm
        KVM: selftests: Remove duplicated TEST_ASSERT in hyperv_cpuid.c
        KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace
        KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow
        kvm: vmx: Fix -Wmissing-prototypes warnings
        KVM: nVMX: Fix using __this_cpu_read() in preemptible context
        kvm: fix compilation on s390
        ...
    • Merge tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random · 128f2bfa
      Linus Torvalds authored
      Pull /dev/random fix from Ted Ts'o:
       "Fix a soft lockup regression when reading from /dev/random in early
        boot"
      
      * tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
        random: fix soft lockup when trying to read from an uninitialized blocking pool
    • random: fix soft lockup when trying to read from an uninitialized blocking pool · 58be0106
      Theodore Ts'o authored
      Fixes: eb9d1bf0: "random: only read from /dev/random after its pool has received 128 bits"
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • tracing: Silence GCC 9 array bounds warning · 0c97bf86
      Miguel Ojeda authored
      Starting with GCC 9, -Warray-bounds detects cases when memset is called
      starting on a member of a struct but the size to be cleared ends up
      writing over further members.
      
      Such a call happens in the trace code to clear, at once, all members
      after and including `seq` on struct trace_iterator:
      
          In function 'memset',
              inlined from 'ftrace_dump' at kernel/trace/trace.c:8914:3:
          ./include/linux/string.h:344:9: warning: '__builtin_memset' offset
          [8505, 8560] from the object at 'iter' is out of the bounds of
          referenced subobject 'seq' with type 'struct trace_seq' at offset
          4368 [-Warray-bounds]
            344 |  return __builtin_memset(p, c, size);
                |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      In order to avoid GCC complaining about it, we compute the address
      ourselves by adding the offsetof distance instead of referring
      directly to the member.
      
      Since there are two places doing this clear (trace.c and trace_kdb.c),
      take the chance to move the workaround into a single place in
      the internal header.
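
      The offsetof-based clearing described above can be sketched with a
      made-up structure. struct iter_sketch and iter_reset() are
      hypothetical; they stand in for struct trace_iterator and the helper
      added to the internal header, whose actual name and body are not
      shown here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical structure standing in for the real iterator. */
struct iter_sketch {
	int  keep_a;
	long keep_b;
	char seq[16];	/* first member to be cleared */
	int  pos;
	long idx;
};

/*
 * Clear `seq` and every member after it. The start address is computed
 * from the beginning of the whole object, so the write is not expressed
 * as an access through the `seq` member alone.
 */
static void iter_reset(struct iter_sketch *it)
{
	const size_t off = offsetof(struct iter_sketch, seq);

	memset((char *)it + off, 0, sizeof(*it) - off);
}
```

      Members declared before seq keep their values; everything from seq to
      the end of the structure is zeroed in one call.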
      
      Link: http://lkml.kernel.org/r/20190523124535.GA12931@gmail.com
      Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      [ Removed unnecessary parenthesis around "iter" ]
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  8. 25 May, 2019 2 commits