1. 21 Jun, 2016 24 commits
    • PCI/hotplug: PowerPC PowerNV PCI hotplug driver · 66725152
      Gavin Shan authored
      This adds a standalone driver to support PCI hotplug on the PowerPC
      PowerNV platform, which runs on top of skiboot firmware. The firmware
      identifies hotpluggable slots and marks their device tree nodes with
      the "ibm,slot-pluggable" and "ibm,reset-by-firmware" properties. The
      driver scans the device tree nodes to create and register PCI hotplug
      slots accordingly.
      
      The PCI slots are organized as a tree: a PCI slot might have a parent
      PCI slot, and a parent PCI slot can contain multiple child PCI slots.
      At plugging time, the parent PCI slot is populated before its
      children. The child PCI slots are removed before their parent PCI
      slot can be removed from the system.
      
      If the skiboot firmware doesn't support slot status retrieval, the PCI
      slot device node shouldn't have the "ibm,reset-by-firmware" property.
      In that case, no valid PCI slots will be detected from the device
      tree. The skiboot firmware doesn't yet export the capability to access
      attention LEDs; that remains to be done.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Functions to get/set PCI slot state · ea0d856c
      Gavin Shan authored
      This exports 4 functions, which are based on the corresponding OPAL
      APIs, to get/set PCI slot status. These functions are going to
      be used by the PowerNV PCI hotplug driver:
      
         pnv_pci_get_device_tree()    opal_get_device_tree()
         pnv_pci_get_presence_state() opal_pci_get_presence_state()
         pnv_pci_get_power_state()    opal_pci_get_power_state()
         pnv_pci_set_power_state()    opal_pci_set_power_state()
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Introduce pnv_pci_get_slot_id() · 7e19bf32
      Gavin Shan authored
      This introduces pnv_pci_get_slot_id() to get the hotpluggable PCI
      slot ID from the corresponding device node. It will be used by
      the hotplug driver.
      Requested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Use PCI slot reset infrastructure · 9c0e1ecb
      Gavin Shan authored
      The (OPAL) firmware might provide the PCI slot reset capability,
      which is identified by the property "ibm,reset-by-firmware" on the
      device node associated with the PCI slot.
      
      This routes the reset request to firmware if "ibm,reset-by-firmware"
      exists in the PCI slot device node. Otherwise, the reset is done
      inside the kernel as before.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Support PCI slot ID · ebe22531
      Gavin Shan authored
      The reset and poll functionality from (OPAL) firmware supports
      both PHBs and PCI slots, which are identified by ID. This adds
      support for PCI slot IDs by:
      
         * Renaming the argument for opal_pci_reset() and opal_pci_poll()
           accordingly.
         * Renaming pnv_eeh_phb_poll() to pnv_eeh_poll() and adjusting its
           argument name.
         * Adding one macro to produce the PCI slot ID.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
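The idea of one ID argument that can name either a PHB or a PCI slot can be sketched roughly as below. This is only an illustration: the macro names, helper names, and the bit layout (a flag bit plus packed bus/device/function and PHB fields) are assumptions made for the example, not the definitions added by the commit.

```c
#include <stdint.h>

/*
 * Hypothetical layout (an assumption for this sketch):
 *   bit 63      set when the ID names a PCI slot rather than a PHB
 *   bits 16..31 bus/device/function number of the slot
 *   bits 0..15  PHB ID
 */
#define EXAMPLE_SLOT_ID_PREFIX (UINT64_C(1) << 63)
#define EXAMPLE_SLOT_ID(phb_id, bdfn) \
	(EXAMPLE_SLOT_ID_PREFIX | ((uint64_t)(bdfn) << 16) | (uint64_t)(phb_id))

/* A plain PHB ID keeps the prefix bit clear, so both kinds share one argument. */
static inline int example_id_is_slot(uint64_t id)
{
	return (id & EXAMPLE_SLOT_ID_PREFIX) != 0;
}

/* Extract the PHB ID field from either kind of ID. */
static inline uint16_t example_id_to_phb(uint64_t id)
{
	return (uint16_t)(id & 0xffff);
}

/* Extract the bus/device/function field of a slot ID. */
static inline uint16_t example_id_to_bdfn(uint64_t id)
{
	return (uint16_t)((id >> 16) & 0xffff);
}
```

With a scheme like this, a reset or poll routine can accept a single 64-bit ID and decide from the flag bit whether it targets the whole PHB or one slot behind it.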
    • powerpc/pci: Delay populating pdn · 8cc7581c
      Gavin Shan authored
      The pdn (struct pci_dn) instances are allocated from memblock or
      bootmem when creating PCI controllers (hoses) in setup_arch(). PCI
      hotplug, which will be supported by subsequent patches, releases
      PCI device nodes and their corresponding pdn on an unplugging event.
      The memory chunks for pdn instances allocated from memblock or
      bootmem are hard to reuse after being released.
      
      This delays creating pdn by pci_devs_phb_init() from setup_arch()
      to core_initcall() so that they are allocated from slab. The memory
      consumed by pdn can then be released to the system without problem
      at PCI unplugging time. As a consequence, pci_dn is unavailable in
      setup_arch() and the fixup on pdn (like AGP's) can't be carried
      out at that time. We have to do that in pcibios_root_bridge_prepare()
      on maple/pasemi/powermac platforms, where the pdn is available.
      pcibios_root_bridge_prepare() is called from subsys_initcall(), which
      is executed after core_initcall(), so the code flow does not change.
      
      Meanwhile, the EEH device is created when pdn is populated,
      meaning pdn and EEH device have the same life cycle. In turn, we
      needn't call eeh_dev_init() to create the EEH device explicitly.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pci: Update bridge windows on PCI plug · 7415c14c
      Gavin Shan authored
      On the PCI plugging event, PCI slot's subordinate devices are
      scanned and their (IO and MMIO) resources are assigned. Platform
      dependent resources (PE#, IO/MMIO/DMA windows) are allocated or
      created on updating windows of the slot's upstream bridge.
      
      This updates the windows of the hot plugged slot's upstream bridge
      in pcibios_finish_adding_to_bus() so that the platform resources
      (PE#, IO/MMIO/DMA segments) are allocated or created accordingly.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Dynamically release PE · c5f7700b
      Gavin Shan authored
      This supports releasing PEs dynamically. A reference count is
      introduced to the PE, representing the number of PCI devices
      associated with it. The reference count is increased when a PCI
      device joins the PE and decreased when a PCI device leaves the PE in
      pnv_pci_release_device(). When the count becomes zero, the PE
      and its consumed resources are released. Note that the count
      isn't accessed concurrently, so a counter of "int" type is enough
      here.
      
      In order to release the resources consumed by the PE, a couple of
      helper functions are introduced as below:
      
         * pnv_pci_ioda1_unset_window() - Unset IODA1 DMA32 window
         * pnv_pci_ioda1_release_dma_pe() - Release IODA1 DMA32 segments
         * pnv_pci_ioda2_release_dma_pe() - Release IODA2 DMA resource
         * pnv_ioda_release_pe_seg() - Unmap IO/M32/M64 segments
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
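The counting scheme described in this commit can be illustrated with a small sketch. All names below are hypothetical, and the sketch assumes that attach and detach calls are serialized by the caller, which is what makes a plain int counter adequate; a counter touched from several contexts at once would need an atomic type or a lock instead.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a PE; not the kernel's structure. */
struct example_pe {
	int device_count;   /* number of PCI devices bound to this PE */
	bool released;      /* set once the PE's resources are torn down */
};

/* Placeholder for unmapping segments and releasing DMA resources. */
static void example_pe_release(struct example_pe *pe)
{
	pe->released = true;
}

/* Called when a PCI device joins the PE. */
static void example_pe_attach(struct example_pe *pe)
{
	pe->device_count++;
}

/* Called when a PCI device leaves the PE; the last one out releases it. */
static void example_pe_detach(struct example_pe *pe)
{
	if (pe->device_count > 0 && --pe->device_count == 0)
		example_pe_release(pe);
}
```

The guard on `device_count > 0` keeps an unbalanced detach from driving the counter negative and releasing the PE twice.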
    • powerpc/powernv: Make pnv_ioda_deconfigure_pe() visible · 93e01a50
      Gavin Shan authored
      pnv_ioda_deconfigure_pe() is visible only when CONFIG_PCI_IOV is
      enabled. The function will be used to tear down PE's associated
      mapping in PCI hotplug path that doesn't depend on CONFIG_PCI_IOV.
      
      This makes pnv_ioda_deconfigure_pe() visible and not depend on
      CONFIG_PCI_IOV.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Extend PCI bridge resources · 40e2a47e
      Gavin Shan authored
      The PCI slots are associated with the root port or downstream ports
      of the PCIe switch connected to the root port. When an adapter is hot
      added to the PCI slot, it usually requests more IO or memory
      resources from the directly connected parent bridge (port) and
      updates the bridge's windows accordingly. The resource windows
      of upstream bridges can't be updated automatically. This possibly
      leads to unbalanced resources across the bridges: the window of the
      downstream bridge overruns that of the upstream bridge, and the
      IO or MMIO path won't work.
      
      This resolves the above issue by extending the bridge windows of the
      root port and the upstream port of the PCIe switch connected to
      the root port to the PHB's windows.
      
      The windows of the root port and the bridge behind it are extended to
      the PHB's windows to accommodate PCI hotplug happening in the
      future. The PHB's 64KB 32-bit MSI region is included in the bridge's
      M32 windows (in hardware) though it's excluded in the corresponding
      resource, as the bridge's M32 windows have 1MB as their minimal
      alignment. We observed EEH errors during system boot when the MSI
      region is included in the bridge's M32 window.
      
      This excludes the top 1MB region (including the 64KB 32-bit MSI
      region) from the bridge's M32 windows when extending them.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Setup PE for root bus · 63803c39
      Gavin Shan authored
      There is no parent bridge for root bus, meaning pcibios_setup_bridge()
      isn't invoked for root bus. The PE for root bus is the ancestor of
      other PEs in PELTV. It means we need PE for root bus populated before
      all others.
      
      This populates the PE for root bus in pcibios_setup_bridge() path
      if it's not populated yet. The PE number next to the reserved one
      is used as the PE# to avoid holes in continuous M64 space.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Create PEs in pcibios_setup_bridge() · ccd1c191
      Gavin Shan authored
      Currently, the PEs and their associated resources are assigned in
      ppc_md.pcibios_fixup() except those used by SRIOV VFs. The function
      is called once after PCI probing and resource assignment are
      completed, so it's obviously not hotplug friendly.
      
      This creates PEs dynamically in pcibios_setup_bridge(), which is
      called both during system bootup and on PCI hotplug to update the
      PCI bridge's windows after resource assignment/reassignment is done.
      In the partial hotplug case, where not all PCI devices belonging to
      one particular PE are unplugged and plugged again, we just need to
      unbind/bind the hot added PCI devices with the corresponding PE
      without creating a new one. The change is applied to IODA1 and IODA2
      PHBs only. The behaviour on NPU PHBs isn't changed. There are no PCI
      bridges on NPU PHBs, meaning pcibios_setup_bridge() won't be invoked
      there. We have to use the old path (pnv_pci_ioda_fixup()) to set up
      PEs on NPU PHBs.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Allocate PE# in reverse order · 9fcd6f4a
      Gavin Shan authored
      The PE number for one particular PE can be allocated dynamically or
      reserved according to the consumed M64 (64-bit prefetchable)
      segments of the PE. The M64 segment can't be remapped to an arbitrary
      PE, meaning the PE number is determined by the index
      of the consumed M64 segment. As the figure below shows, the M64
      resource grows from the low to the high end, meaning the PE (number)
      reserved according to the M64 segment grows from low to high as well,
      and so does the dynamically allocated PE number. This leads to a
      conflict: a PE number (M64 segment) taken by dynamic allocation
      is required by a hot added PCI adapter at a later point. The PCI
      hotplug fails because the PE number can't be reserved
      based on the index of the consumed M64 segment.
      
        +---+---+---+---+---+--------------------------------+-----+
        | 0 | 1 | 2 | 3 | 4 |      .......                   | 255 |
        +---+---+---+---+---+--------------------------------+-----+
      
        PE number for dynamic allocation          ----------------->
        PE number reserved for M64 segment        ----------------->
      
      To resolve the above conflict, this forces the PE number to be
      allocated dynamically in reverse order. With this patch applied,
      the PE numbers are reserved in ascending order, but allocated
      dynamically in descending order.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
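The effect of allocating from the opposite end can be seen in a small sketch. The names and the fixed table of 256 entries are assumptions for illustration only; the real allocator is not shown on this page.

```c
#include <stdbool.h>

#define EXAMPLE_NUM_PES    256
#define EXAMPLE_INVALID_PE (-1)

/* One flag per PE number; true means the number is taken. */
static bool example_pe_used[EXAMPLE_NUM_PES];

/* Reserve the PE number dictated by an M64 segment index (grows from 0 up). */
static int example_pe_reserve(int segment_index)
{
	if (segment_index < 0 || segment_index >= EXAMPLE_NUM_PES ||
	    example_pe_used[segment_index])
		return EXAMPLE_INVALID_PE;
	example_pe_used[segment_index] = true;
	return segment_index;
}

/* Dynamically allocate any free PE number, searching from the top down. */
static int example_pe_alloc(void)
{
	for (int pe = EXAMPLE_NUM_PES - 1; pe >= 0; pe--) {
		if (!example_pe_used[pe]) {
			example_pe_used[pe] = true;
			return pe;
		}
	}
	return EXAMPLE_INVALID_PE;
}
```

Because dynamic allocations start at the high end, the low PE numbers stay free for later reservations tied to low M64 segment indexes. The two ranges only collide once the table is nearly full.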
    • powerpc/powernv: Increase PE# capacity · c127562a
      Gavin Shan authored
      Each PHB maintains an array helping to translate a 2-byte Request
      ID (RID) to a PE#, with the assumption that the PE# takes one byte,
      meaning that we can't have more than 256 PEs. However,
      pci_dn->pe_number already has 4 bytes for the PE#.
      
      This extends the PE# capacity for every PHB. After that, the PE number
      is represented by a 4-byte value. Then we can reuse IODA_INVALID_PE to
      check whether the PE# in phb->pe_rmap[] is valid or not.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
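A rough sketch of a RID-to-PE# table with 4-byte entries follows. The names and the sentinel value are assumptions for the example and are not taken from the kernel source. The point it illustrates is that a wider entry leaves room for an "invalid" marker that cannot be mistaken for a real PE number.

```c
#include <stdint.h>

#define EXAMPLE_INVALID_PE 0xffffffffu   /* hypothetical "no PE" sentinel */
#define EXAMPLE_RID_COUNT  0x10000       /* a 2-byte RID has 65536 values */

/* One 4-byte PE number per possible Request ID. */
static uint32_t example_pe_rmap[EXAMPLE_RID_COUNT];

/* Mark every RID as unmapped before any PE is configured. */
static void example_rmap_init(void)
{
	for (uint32_t rid = 0; rid < EXAMPLE_RID_COUNT; rid++)
		example_pe_rmap[rid] = EXAMPLE_INVALID_PE;
}

/* Record which PE a given RID belongs to. */
static void example_rmap_set(uint16_t rid, uint32_t pe)
{
	example_pe_rmap[rid] = pe;
}

/* A RID is mapped when its entry differs from the sentinel. */
static int example_rmap_is_valid(uint16_t rid)
{
	return example_pe_rmap[rid] != EXAMPLE_INVALID_PE;
}
```

Note that PE numbers above 255, such as 300, fit in an entry here, which a one-byte entry could not represent.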
    • powerpc/powernv: Move pnv_pci_ioda_setup_opal_tce_kill() around · 577c8c88
      Gavin Shan authored
      pnv_pci_ioda_setup_opal_tce_kill() is called by pnv_ioda_setup_dma()
      to remap the TCE kill register. What's done in pnv_ioda_setup_dma()
      will be covered in pcibios_setup_bridge(), which is invoked on each
      PCI bridge. It means we would possibly remap the TCE kill register
      multiple times, which is unnecessary.
      
      This moves pnv_pci_ioda_setup_opal_tce_kill() to where the PHB is
      initialized (pnv_pci_init_ioda_phb()) to avoid above issue.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Remove PCI_RESET_DELAY_US · e368e4ca
      Gavin Shan authored
      The macro defined in arch/powerpc/platforms/powernv/pci.c isn't
      used by anyone. Just remove it.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pci: Override pcibios_setup_bridge() · c5fcb29a
      Gavin Shan authored
      This overrides pcibios_setup_bridge() that is called to update PCI
      bridge windows when PCI resource assignment is completed, to assign
      PE and setup various (resource) mapping for the PE in subsequent
      patches.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • PCI: Add pcibios_setup_bridge() · d366d28c
      Gavin Shan authored
      Currently, the PowerPC PowerNV platform utilizes ppc_md.pcibios_fixup(),
      which is called once after PCI probing and resource assignment
      are completed, to allocate platform required resources for PCI devices:
      PE#, IO and MMIO mapping, DMA address translation (TCE) table etc.
      Obviously, it's not hotplug friendly.
      
      This adds weak function pcibios_setup_bridge(), which is called by
      pci_setup_bridge(). PowerPC PowerNV platform will reuse the function
      to assign above platform required resources to newly plugged PCI devices
      during PCI hotplug in subsequent patches.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
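The weak-function pattern this commit relies on can be sketched in user-space C as follows. The structure, the function names, and the order in which the hook and the window programming run are assumptions for illustration. The sketch uses the GCC/Clang `weak` attribute, which is a compiler extension rather than standard C.

```c
/* Hypothetical stand-in for a PCI bus; not the kernel's struct pci_bus. */
struct example_bus {
	int number;
	int platform_setup_done;
};

/*
 * Weak default: does nothing. A platform file that defines a non-weak
 * function with the same name replaces this one at link time.
 */
__attribute__((weak)) void example_pcibios_setup_bridge(struct example_bus *bus)
{
	(void)bus;
}

/* Counts how many times the generic window programming step ran. */
static int example_windows_programmed;

/* Generic path: give the platform a chance to act, then program windows. */
static void example_pci_setup_bridge(struct example_bus *bus)
{
	example_pcibios_setup_bridge(bus);
	example_windows_programmed++;
}
```

The generic code never needs to know whether a platform override exists. Architectures that do not care simply get the empty default.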
    • powerpc: export cpu_to_core_id() · f8ab4810
      Mauricio Faria de Oliveira authored
      Export cpu_to_core_id(). This will be used by the lpfc driver.
      
      This enables topology_core_id() from <linux/topology.h> (defined
      to cpu_to_core_id() in arch/powerpc/include/asm/topology.h) to be
      used by (non-builtin) modules.
      
      That is arch-neutral and already used by, e.g., drivers/base/topology.c,
      but that is builtin (obj-y in Makefile) and thus didn't need the export.
      
      Since the module uses topology_core_id() and this is defined to
      cpu_to_core_id(), it needs the export, otherwise:
      
          ERROR: "cpu_to_core_id" [drivers/scsi/lpfc/lpfc.ko] undefined!
      
      Tested on next-20160601.
      Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • selftests/powerpc: Load Monitor Register Tests · 16c19a2e
      Jack Miller authored
      Adds two tests. One is a simple test to ensure that the new registers
      LMRR and LMSER are properly maintained. The other actually uses the
      existing EBB test infrastructure to test that LMRR and LMSER behave as
      documented.
      Signed-off-by: Jack Miller <jack@codezen.org>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Load Monitor Register Support · bd3ea317
      Jack Miller authored
      This enables new registers, LMRR and LMSER, that can trigger an EBB in
      userspace code when a monitored load (via the new ldmx instruction)
      loads memory from a monitored space. This facility is controlled by a
      new FSCR bit, LM.
      
      This patch disables the FSCR LM control bit on task init and enables
      that bit when a load monitor facility unavailable exception is taken
      for using it. On context switch, this bit is then used to determine
      whether the two relevant registers are saved and restored. This is
      done lazily for performance reasons.
      Signed-off-by: Jack Miller <jack@codezen.org>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Improve FSCR init and context switching · b57bd2de
      Michael Neuling authored
      This fixes a few issues with FSCR init and switching.
      
      In commit 152d523e ("powerpc: Create context switch helpers
      save_sprs() and restore_sprs()") we moved the setting of the FSCR
      register from inside a CPU_FTR_ARCH_207S section to inside just a
      CPU_FTR_ARCH_DSCR section. Hence we are setting FSCR on POWER6/7 where
      the FSCR doesn't exist. This is harmless but we shouldn't do it.
      
      Also, we can simplify the FSCR context switch. We don't need to go
      through the calculation involving dscr_inherit. We can just restore
      what we saved last time.
      
      We also set an initial value in INIT_THREAD, so that pid 1 which is
      cloned from that gets a sane value.
      
      Based on patch by Jack Miller.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Fix misleading comment in early_setup_secondary() · 103b7827
      Madhavan Srinivasan authored
      The current comment in early_setup_secondary() for the
      paca->soft_enabled update is misleading. The comment should say to
      mark interrupts "disabled" instead of "enabled". Fix the typo.
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/kprobes: Remove kretprobe_trampoline_holder. · 61ed9cfb
      Thiago Jung Bauermann authored
      Fixes the following testsuite failure:
      
        $ sudo ./perf test -v kallsyms
         1: vmlinux symtab matches kallsyms                          :
        --- start ---
        test child forked, pid 12489
        Using /proc/kcore for kernel object code
        Looking at the vmlinux_path (8 entries long)
        Using /boot/vmlinux for symbols
        0xc00000000003d300: diff name v: .kretprobe_trampoline_holder k: kretprobe_trampoline
        Maps only in vmlinux:
         c00000000086ca38-c000000000879b6c 87ca38 [kernel].text.unlikely
         c000000000879b6c-c000000000bf0000 889b6c [kernel].meminit.text
         c000000000bf0000-c000000000c53264 c00000 [kernel].init.text
         c000000000c53264-d000000004250000 c63264 [kernel].exit.text
         d000000004250000-d000000004450000 0 [libcrc32c]
         d000000004450000-d000000004620000 0 [xfs]
         d000000004620000-d000000004680000 0 [autofs4]
         d000000004680000-d0000000046e0000 0 [x_tables]
         d0000000046e0000-d000000004780000 0 [ip_tables]
         d000000004780000-d0000000047e0000 0 [rng_core]
         d0000000047e0000-ffffffffffffffff 0 [pseries_rng]
        Maps in vmlinux with a different name in kallsyms:
        Maps only in kallsyms:
         d000000000000000-f000000000000000 1000000000010000 [kernel.kallsyms]
         f000000000000000-ffffffffffffffff 3000000000010000 [kernel.kallsyms]
        test child finished with -1
        ---- end ----
        vmlinux symtab matches kallsyms: FAILED!
      
      The problem is that the kretprobe_trampoline symbol looks like this:
      
        $ eu-readelf -s /boot/vmlinux G kretprobe_trampoline
         2431: c000000001302368     24 NOTYPE  LOCAL  DEFAULT       37 kretprobe_trampoline_holder
         2432: c00000000003d300      8 FUNC    LOCAL  DEFAULT        1 .kretprobe_trampoline_holder
        97543: c00000000003d300      0 NOTYPE  GLOBAL DEFAULT        1 kretprobe_trampoline
      
      Its type is NOTYPE, and its size is 0, and this is a problem because
      symbol-elf.c:dso__load_sym skips function symbols that are not STT_FUNC
      or STT_GNU_IFUNC (this is determined by elf_sym__is_function). Even
      if the type is changed to STT_FUNC, when dso__load_sym calls
      symbols__fixup_duplicate, the kretprobe_trampoline symbol is dropped in
      favour of .kretprobe_trampoline_holder because the latter has non-zero
      size (as determined by choose_best_symbol).
      
      With this patch, all vmlinux symbols match /proc/kallsyms and the
      testcase passes.
      
      Commit c1c355ce ("x86/kprobes: Get rid of
      kretprobe_trampoline_holder()") gets rid of kretprobe_trampoline_holder
      altogether on x86. This commit does the same on powerpc. This change
      introduces no regressions on the perf and ftracetest testsuite results.
      Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 17 Jun, 2016 1 commit
  3. 16 Jun, 2016 12 commits
    • cxl: Make vPHB device node match adapter's · a4307390
      Frederic Barrat authored
      On bare-metal, when a device is attached to the cxl card, lsvpd shows
      a location code such as (with cxlflash):
           # lsvpd -l sg22
           ...
           *YL U78CB.001.WZS0073-P1-C33-B0-T0-L0
      which makes it hard to easily identify the cxl adapter owning the
      flash device, since in this example C33 refers to a P8 processor.
      
      lsvpd looks in the parent devices until it finds a location code, so the
      device node for the vPHB ends up being used.
      
      By reusing the device node of the adapter for the vPHB, lsvpd shows:
           # lsvpd -l sg16
           ...
           *YL U78C9.001.WZS09XA-P1-C7-B1-T0-L3
      where C7 is the PCI slot of the cxl adapter.
      
      On powerVM, the vPHB was already using the adapter device node, so
      there's no change there.
      
      Tested by cxlflash on bare-metal and powerVM.
      Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Acked-by: Ian Munsie <imunsie@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • cxl: Add support for CAPP DMA mode · b385c9e9
      Ian Munsie authored
      This adds support for using CAPP DMA mode, which is required for XSL
      based cards such as the Mellanox CX4 to function.
      
      This is currently an RFC as it depends on the corresponding support to
      be merged into skiboot first, which was submitted here:
      http://patchwork.ozlabs.org/patch/625582/
      
      In the event that the skiboot on the system does not have the above
      support, it will indicate as such in the kernel log and abort the init
      process.
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • cxl: Abstract the differences between the PSL and XSL · 6d382616
      Frederic Barrat authored
      The XSL (Translation Service Layer) is a stripped down version of the
      PSL (Power Service Layer) used in some cards such as the Mellanox CX4.
      
      Like the PSL, it implements the CAIA architecture, but has a number of
      differences, mostly in its implementation dependent registers. This
      adds an ops structure to abstract these differences to bring initial
      support for XSL CAPI devices.
      
      The XSL does not implement the optional architected SERR register,
      however while it treats it as a reserved register and should work with
      no special treatment, attempting to access it will cause the XSL_FEC
      (First Error Capture) register to be filled out, preventing it from
      capturing any subsequent errors. Therefore, this patch also prevents the
      kernel from trying to set up the SERR register so that the FEC register
      may still be useful, and to save one interrupt.
      
      The XSL also uses a special DMA cxl mode, which uses a slightly
      different init sequence for the CAPP and PHB. The kernel support for
      this will be in a future patch once the corresponding support has been
      merged into skiboot.
      Co-authored-by: Ian Munsie <imunsie@au1.ibm.com>
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
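An ops structure of the kind described above might look roughly like the sketch below. The structure layout, the member names, and the choice to model the missing SERR setup as a NULL callback are all assumptions for illustration and do not reproduce the cxl driver's actual definitions.

```c
#include <stdbool.h>
#include <stddef.h>

struct example_adapter;

/* Per-service-layer hooks; a NULL member means "nothing to do here". */
struct example_service_layer_ops {
	const char *name;
	int (*setup_serr)(struct example_adapter *adapter);
};

struct example_adapter {
	const struct example_service_layer_ops *ops;
	bool serr_configured;
};

/* PSL-style hook: pretend to program the SERR register. */
static int example_psl_setup_serr(struct example_adapter *adapter)
{
	adapter->serr_configured = true;
	return 0;
}

static const struct example_service_layer_ops example_psl_ops = {
	.name = "PSL",
	.setup_serr = example_psl_setup_serr,
};

/* XSL-style ops: no SERR hook, so the register is never touched. */
static const struct example_service_layer_ops example_xsl_ops = {
	.name = "XSL",
	.setup_serr = NULL,
};

/* Common init path: call only the hooks the service layer provides. */
static int example_adapter_init(struct example_adapter *adapter)
{
	if (adapter->ops->setup_serr)
		return adapter->ops->setup_serr(adapter);
	return 0;
}
```

Keeping the difference in a table of function pointers lets the common code stay free of per-card conditionals. Supporting another service layer then means adding one more ops instance.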
    • cxl: Update process element after allocating interrupts · 292841b0
      Ian Munsie authored
      In the kernel API, it is possible to attempt to allocate AFU interrupts
      after already starting a context. Since the process element structure
      used by the hardware is only filled out at the time the context is
      started, it will not be updated with the interrupt numbers that have
      just been allocated and therefore AFU interrupts will not work unless
      they were allocated prior to starting the context.
      
      This can present some difficulties as each CAPI enabled PCI device in
      the kernel API has a default context, which may need to be started very
      early to enable translations, potentially before interrupts can easily
      be set up.
      
      This patch makes the API more flexible to allow interrupts to be
      allocated after a context has already been started and takes care of
      updating the PE structure used by the hardware and notifying it to
      discard any cached copy it may have.
      
      The update is currently performed via a terminate/remove/add sequence.
      This is necessary on some hardware such as the XSL that does not
      properly support the update LLCMD.
      
      Note that this is only supported on powernv at present - attempting to
      perform this ordering on PowerVM will raise a warning.
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • cxl: static-ify variables to fix sparse warnings · 64417a39
      Andrew Donnellan authored
      Make a couple more variables static. Found by sparse.
      Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Reviewed-by: fbarrat@linux.vnet.ibm.com
      Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Acked-by: Ian Munsie <imunsie@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/align: Use #ifdef __BIG_ENDIAN__ #else for REG_BYTE · a9650e9b
      Daniel Axtens authored
      Sparse complains that it doesn't know what REG_BYTE is:
      
        arch/powerpc/kernel/align.c:313:29: error: undefined identifier 'REG_BYTE'
      
      REG_BYTE is defined differently based on whether we're compiling for
      LE, BE32 or BE64. Sparse apparently doesn't provide __BIG_ENDIAN__ or
      __LITTLE_ENDIAN__, which means we get no definition.
      
      Rather than check for __BIG_ENDIAN__ and then separately for
      __LITTLE_ENDIAN__, just switch the #ifdef to check for __BIG_ENDIAN__
      and then #else we define the little endian version. Technically that's
      dicey because PDP_ENDIAN is also a possibility, but we already do it in
      a lot of places so one more hardly matters.
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
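The preprocessor shape described in this commit can be shown with a deliberately simpler macro than REG_BYTE. The macro and function below are hypothetical, and the sketch assumes the compiler defines `__BIG_ENDIAN__` on big-endian targets, as GCC does for powerpc. A tool that defines neither endianness macro falls through to the little-endian branch instead of ending up with no definition at all.

```c
#include <stdint.h>

/* Index of the least significant byte of a 32-bit word in memory. */
#ifdef __BIG_ENDIAN__
#define EXAMPLE_LOW_BYTE_INDEX 3
#else	/* anything that does not announce big endian is treated as little endian */
#define EXAMPLE_LOW_BYTE_INDEX 0
#endif

/* Read the least significant byte of *word through a byte pointer. */
static uint8_t example_low_byte(const uint32_t *word)
{
	return ((const uint8_t *)word)[EXAMPLE_LOW_BYTE_INDEX];
}
```

The trade-off is the one the commit message admits: any third byte order would silently be treated as little endian.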
    • powerpc/sparse: Include headers containing prototypes · 665e87ff
      Daniel Axtens authored
      Sometimes headers that provide prototypes for functions are
      accidentally omitted from the files that define the functions.
      
      Fix a couple of places where that occurs.
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Introduce asm-prototypes.h · 42f5b4ca
      Daniel Axtens authored
      Sparse picked up a number of functions that are implemented in C and
      then only referred to in asm code.
      
      This introduces asm-prototypes.h, which provides a place for
      prototypes of these functions.
      
      This silences some sparse warnings.
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      [mpe: Add include guards, clean up copyright & GPL text]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      42f5b4ca
    • Daniel Axtens's avatar
      powerpc/sparse: make some things static · 34852ed5
      Daniel Axtens authored
      This is just a smattering of things picked up by sparse that should
      be made static.
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      34852ed5
    • Suraj Jitindar Singh's avatar
      powerpc: Add array bounds checking to crash_shutdown_handlers · 1d145165
      Suraj Jitindar Singh authored
      The array crash_shutdown_handles is an array of size CRASH_HANDLER_MAX+1
      containing up to CRASH_HANDLER_MAX shutdown_handlers. It is assumed to
      be NULL terminated, which it is under normal circumstances. Array
      accesses in the functions crash_shutdown_unregister() and
      default_machine_crash_shutdown() rely on this NULL termination property
      when traversing this list and don't protect against out of bounds accesses.
      If the NULL terminator were somehow overwritten these functions could
      potentially access out of the bounds of the array.
      
      Shrink the array to size CRASH_HANDLER_MAX and implement explicit array
      bounds checking when accessing the elements of the
      crash_shutdown_handles[] array in crash_shutdown_unregister() and
      default_machine_crash_shutdown().
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      1d145165
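The explicit bounds checking described in the crash_shutdown_handlers commit above might look roughly like the following user-space sketch. It reuses the names from the commit message, but the value of CRASH_HANDLER_MAX, the return codes, the helper run_crash_handlers(), and the example handlers are assumptions made for illustration. The kernel's locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define CRASH_HANDLER_MAX 3            /* value assumed for this sketch */

typedef void (*crash_shutdown_t)(void);

/* Exactly CRASH_HANDLER_MAX slots: no extra slot reserved for a NULL terminator. */
static crash_shutdown_t crash_shutdown_handles[CRASH_HANDLER_MAX];

/* Hypothetical example handlers. */
static void handler_a(void) { }
static void handler_b(void) { }

/* Store the handler in the first free slot; returns 0 on success, 1 if full. */
static int crash_shutdown_register(crash_shutdown_t handler)
{
    assert(handler != NULL);
    for (size_t i = 0; i < CRASH_HANDLER_MAX; i++) {
        if (!crash_shutdown_handles[i]) {
            crash_shutdown_handles[i] = handler;
            return 0;
        }
    }
    return 1;
}

/* Remove a handler; every index is checked against CRASH_HANDLER_MAX. */
static int crash_shutdown_unregister(crash_shutdown_t handler)
{
    size_t i;

    for (i = 0; i < CRASH_HANDLER_MAX; i++)
        if (crash_shutdown_handles[i] == handler)
            break;
    if (i == CRASH_HANDLER_MAX)
        return 1;                      /* not found */

    /* Shift the remaining entries down and clear the last slot. */
    for (; i < CRASH_HANDLER_MAX - 1; i++)
        crash_shutdown_handles[i] = crash_shutdown_handles[i + 1];
    crash_shutdown_handles[i] = NULL;
    return 0;
}

/* Call handlers in order; stops at the first empty slot or at the array end. */
static int run_crash_handlers(void)
{
    int called = 0;

    for (size_t i = 0; i < CRASH_HANDLER_MAX && crash_shutdown_handles[i]; i++) {
        crash_shutdown_handles[i]();
        called++;
    }
    return called;
}
```

Because every loop is bounded by CRASH_HANDLER_MAX, a completely full array is handled without relying on a trailing NULL entry.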
    • Oliver O'Halloran's avatar
      powerpc/mm: Ensure "special" zones are empty · 3079abe5
      Oliver O'Halloran authored
      The mm zone mechanism was traditionally used by arch specific code to
      partition memory into allocation zones. However there are several zones
      that are managed by the mm subsystem rather than the architecture. Most
      architectures set the max PFN of these special zones to zero, however on
      powerpc we set them to ~0ul. This, in conjunction with a bug in
      free_area_init_nodes(), results in all of system memory being placed in
      ZONE_DEVICE when that zone is enabled. Device memory cannot be used for regular kernel
      memory allocations so this will cause a kernel panic at boot. Given the
      planned addition of more mm managed zones (ZONE_CMA) we should aim to be
      consistent with every other architecture and set the max PFN for these
      zones to zero.
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      3079abe5
    • Rashmica Gupta's avatar
      powerpc/asm: Remove unused symbols in asm-offsets.c · aac6a91f
      Rashmica Gupta authored
      THREAD_DSCR:
        Added in efcac658 "powerpc: Per process DSCR + some fixes (try#4)"
        Last usage removed in 152d523e "powerpc: Create context switch helpers save_sprs() and restore_sprs()"
      
      THREAD_DSCR_INHERIT:
        Added in 71433285 "powerpc: Restore correct DSCR in context switch"
        Last usage removed in 152d523e "powerpc: Create context switch helpers save_sprs() and restore_sprs()"
      
      THREAD_TAR:
        Added in 2468dcf6 "powerpc: Add support for context switching the TAR register"
        Last usage removed in 152d523e "powerpc: Create context switch helpers save_sprs() and restore_sprs()"
      
      THREAD_BESCR, THREAD_EBBHR and THREAD_EBBRR:
        Added in 9353374b "powerpc: Context switch the new EBB SPRs"
        Last usage removed in 152d523e "powerpc: Create context switch helpers save_sprs() and restore_sprs()"
      
      THREAD_SIAR, THREAD_SDAR, THREAD_SIER, THREAD_MMCR0, and THREAD_MMCR2:
        Added in 59affcd3 "powerpc: Context switch more PMU related SPRs"
        Last usage removed in b11ae951 "powerpc: Partial revert of "Context switch more PMU related SPRs""
      
      PACA_LOCK_TOKEN:
        Added in 9e368f29 "KVM: PPC: book3s_hv: Add support for PPC970-family processors"
        Last usage removed in c17b98cf "KVM: PPC: Book3S HV: Remove code for PPC970 processors"
      
      HCALL_STAT_SIZE, HCALL_STAT_CALLS, HCALL_STAT_TB and HCALL_STAT_PURR:
        Added in 57852a85 "[POWERPC] powerpc: Instrument Hypervisor Calls"
        Last usage removed in c8cd093a "powerpc: tracing: Add hypervisor call tracepoints"
      
      VCPU_EPLC:
        Added in d30f6e48 "KVM: PPC: booke: category E.HV (GS-mode) support"
        Never used.
      
      CPU_DOWN_FLUSH:
        Added in e7affb1d "powerpc/cache: add cache flush operation for various e500"
        Never used.
      
      CFG_STAMP_XSEC:
        Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
        Last usage removed in 0e469db8 "powerpc: Rework VDSO gettimeofday to prevent time going backwards"
      
      KVM_LPCR:
        Added in aa04b4cc "KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests"
        Last usage removed in a0144e2a "KVM: PPC: Book3S HV: Store LPCR value for each virtual core"
      
      GPR15, GPR16, GPR17, GPR18, GPR19, GPR20, GPR21, GPR22, GPR23, GPR24,
      GPR25, GPR26, GPR27, GPR28, GPR29, GPR30 and GPR31:
        Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
        Never used.
      
      VCPU_SHADOW_FSCR:
        Added in 616dff86 "KVM: PPC: Book3S PR: Handle Facility interrupt and FSCR"
        Never used.
      
      VCPU_SHADOW_SRR1:
        Added in a2d56020 "KVM: PPC: Book3S PR: Keep volatile reg values in vcpu rather than shadow_vcpu"
        Never used.
      
      KVM_SPLIT_SIZE:
        Added in b4deba5c "KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8"
        Never used.
      
      VCPU_VCPUID:
        Added in de56a948 "KVM: PPC: Add support for Book3S processors in hypervisor mode"
        Last usage removed in 1b400ba0 "KVM: PPC: Book3S HV: Improve handling of local vs. global TLB invalidations"
      
      _MQ:
        Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
        Never used.
      
      AUDITCONTEXT:
        Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
        Last usage removed in 401d1f02 "[PATCH] syscall entry/exit revamp"
      
      CLONE_VM:
        Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
        Currently unused.
      
      CLONE_UNTRACED:
        Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
        Currently unused.
      Signed-off-by: Rashmica Gupta <rashmicy@gmail.com>
      [mpe: Munge change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      aac6a91f
  4. 14 Jun, 2016 3 commits
    • Bharata B Rao's avatar
      powerpc/numa: Fix multiple bugs in memory_hotplug_max() · 45b64ee6
      Bharata B Rao authored
      memory_hotplug_max() uses hot_add_drconf_memory_max() to get the maximum
      addressable memory by referring to the ibm,dynamic-memory property. There
      are three problems with the current approach:
      
      1 hot_add_drconf_memory_max() assumes that ibm,dynamic-memory includes
        all the LMBs of the guest, but that is not true for PowerKVM which
        populates only DR LMBs (LMBs that can be hotplugged/removed) in that
        property.
      2 hot_add_drconf_memory_max() multiplies lmb-size with lmb-count to arrive
        at the max possible address. Since ibm,dynamic-memory doesn't include
        RMA LMBs, the address thus obtained will be less than the actual max
        address. For example, if max possible memory size is 32G, with lmb-size
        of 256MB there can be 127 LMBs in ibm,dynamic-memory (1 LMB for RMA
        which won't be present here). hot_add_drconf_memory_max() would then
        return the max addressable memory as 127 * 256MB = 31.75GB, whereas the
        max address should have been 32G, which is what ibm,lrdr-capacity shows.
      3 In PowerKVM, there can be a gap between the end of boot time RAM and
        beginning of hotplug RAM area. So just multiplying lmb-count with
        lmb-size will not provide the correct max possible address for PowerKVM.
      
      This patch fixes 1 by using ibm,lrdr-capacity property to return the max
      addressable memory whenever the property is present. Then it fixes 2 & 3
      by fetching the address of the last LMB in ibm,dynamic-memory property.
      
      Fixes: cd34206e ("powerpc: Add memory_hotplug_max()")
      Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      45b64ee6
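The arithmetic in problem 2 of the memory_hotplug_max() commit above can be checked with a small sketch. The helper names max_by_count() and max_by_last_lmb() are hypothetical. The layout is also an assumption: one 256MB RMA LMB at address 0, followed by 127 contiguous DR LMBs.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)
#define GiB (1024ULL * MiB)

/* Old approach: lmb-count * lmb-size. LMBs absent from the property are not counted. */
static uint64_t max_by_count(uint64_t lmb_count, uint64_t lmb_size)
{
    assert(lmb_size == 0 || lmb_count <= UINT64_MAX / lmb_size);
    return lmb_count * lmb_size;
}

/* Alternative described in the message: the end address of the last LMB. */
static uint64_t max_by_last_lmb(uint64_t last_lmb_base, uint64_t lmb_size)
{
    assert(last_lmb_base <= UINT64_MAX - lmb_size);
    return last_lmb_base + lmb_size;
}
```

Under the assumed layout the last DR LMB starts at 127 * 256MB, so max_by_last_lmb(127 * 256 * MiB, 256 * MiB) yields 32 * GiB, while max_by_count(127, 256 * MiB) yields 32512 * MiB, that is 31.75GB. Using the last LMB's address also covers a gap between boot-time RAM and the hotplug area, since the base address already includes the gap.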
    • Boqun Feng's avatar
      powerpc/spinlock: Fix spin_unlock_wait() · 6262db7c
      Boqun Feng authored
      There is an ordering issue with spin_unlock_wait() on powerpc, because
      the spin_lock primitive is an ACQUIRE and an ACQUIRE is only ordering
      the load part of the operation with memory operations following it.
      Therefore the following event sequence can happen:
      
      CPU 1                   CPU 2                   CPU 3
      
      ==================      ====================    ==============
                                                      spin_unlock(&lock);
                              spin_lock(&lock):
                                r1 = *lock; // r1 == 0;
      o = object;             o = READ_ONCE(object); // reordered here
      object = NULL;
      smp_mb();
      spin_unlock_wait(&lock);
                                *lock = 1;
      smp_mb();
      o->dead = true;         < o = READ_ONCE(object); > // reordered upwards
                              if (o) // true
                                      BUG_ON(o->dead); // true!!
      
      To fix this, we add a "nop" ll/sc loop in arch_spin_unlock_wait() on
      ppc. The "nop" ll/sc loop reads the lock value and writes it back
      atomically, which synchronizes the view of the lock on CPU1 with that
      on CPU2. Therefore in the scenario above, either CPU2 will fail to get
      the lock at first or CPU1 will see the lock acquired by CPU2; both
      cases eliminate this bug. This is similar to what Will Deacon did for
      ARM64 in:
      
        d86b8da0 ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers")
      
      Furthermore, if the "nop" ll/sc figures out the lock is locked, we
      actually don't need to do the "nop" ll/sc trick again, we can just do a
      normal load+check loop for the lock to be released, because in that
      case, spin_unlock_wait() is called when someone is holding the lock, and
      the store part of the "nop" ll/sc happens before the lock release of the
      current lock holder:
      
      	"nop" ll/sc -> spin_unlock()
      
      and the lock release happens before the next lock acquisition:
      
      	spin_unlock() -> spin_lock() <next holder>
      
      which means the "nop" ll/sc happens before the next lock acquisition:
      
      	"nop" ll/sc -> spin_unlock() -> spin_lock() <next holder>
      
      With a smp_mb() preceding spin_unlock_wait(), the store of object is
      guaranteed to be observed by the next lock holder:
      
      	STORE -> smp_mb() -> "nop" ll/sc
      	-> spin_unlock() -> spin_lock() <next holder>
      
      This patch therefore fixes the issue and also cleans the
      arch_spin_unlock_wait() a little bit by removing superfluous memory
      barriers in loops and consolidating the implementations for PPC32 and
      PPC64 into one.
      Suggested-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
      Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      [mpe: Inline the "nop" ll/sc loop and set EH=0, munge change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      6262db7c
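The control flow described in the spin_unlock_wait() commit above can be approximated with C11 atomics. This is only a sketch under stated assumptions, not the powerpc implementation: the type sketch_spinlock_t and the function sketch_spin_unlock_wait() are hypothetical, a fetch-add of zero stands in for the "nop" ll/sc loop, and sequentially consistent fences stand in for smp_mb(). A compiler may lower the no-op read-modify-write differently from a real lwarx/stwcx. sequence, so the code illustrates the structure rather than the exact ordering guarantees.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock word: 0 means unlocked, non-zero means held. */
typedef struct {
    atomic_uint slock;
} sketch_spinlock_t;

static void sketch_spin_unlock_wait(sketch_spinlock_t *lock)
{
    unsigned int seen;

    assert(lock);

    /* Stand-in for the leading smp_mb(). */
    atomic_thread_fence(memory_order_seq_cst);

    /*
     * "nop" read-modify-write: adding 0 leaves the value unchanged but
     * takes part in the lock word's modification order, so the value
     * observed is ordered against concurrent lock operations.
     */
    seen = atomic_fetch_add_explicit(&lock->slock, 0, memory_order_relaxed);

    /* If the lock was held, a plain load+check loop is enough from here on. */
    while (seen != 0)
        seen = atomic_load_explicit(&lock->slock, memory_order_relaxed);

    /* Stand-in for the trailing smp_mb(). */
    atomic_thread_fence(memory_order_seq_cst);
}
```

When the lock is free the function returns after the single read-modify-write. When it is held, only the first access is a read-modify-write and the rest of the wait is ordinary loads, matching the reasoning in the message.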