1. 12 May, 2016 8 commits
    • powerpc/powernv: Fix insufficient memory allocation · 92a86756
      Alexey Kardashevskiy authored
      The pnv_pci_init_ioda_phb() helper allocates a blob to store auxiliary
      data such as the PE and M32/M64 segment allocation maps; this single
      blob has a few partitions, and the size of each is derived from the
      number of PEs, phb->ioda.total_pe_num.
      
      It was assumed that the minimum number of PEs is 8; however, it is 4
      for NPU, so the pe_alloc part was missing from the allocated blob. This
      went unnoticed until recently because we were not tracking used M64
      segments and NPUs do not use M32 segments, so phb->ioda.m32_segmap
      (which was pointing to the same address as phb->ioda.pe_alloc) was
      never written to, leaving the pe_alloc memory intact.
      
      After commit 401203ac2d "powerpc/powernv: Track M64 segment consumption"
      the pe_alloc gets corrupted and PE allocation cannot work. This fixes
      the issue by enforcing a minimum PE number of 8.
      
      Fixes: 401203ac2d15 ("powerpc/powernv: Track M64 segment consumption")
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
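
The sizing problem described in the commit above can be illustrated with a small userspace sketch. It assumes the PE allocation bitmap is sized as total_pe_num / 8 bytes, rounded up to a multiple of sizeof(unsigned long); the helper names are hypothetical and this is not the kernel code itself.

```c
#include <stddef.h>

/* Round x up to a multiple of a (a must be a power of two). */
static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper: bytes reserved for the PE allocation bitmap
 * when no minimum is enforced. With 4 PEs, 4 / 8 == 0, so the bitmap
 * gets no space and the next map in the blob starts at the same offset.
 */
static size_t pe_alloc_bytes_unclamped(unsigned int total_pe_num)
{
	return align_up(total_pe_num / 8, sizeof(unsigned long));
}

/*
 * Hypothetical helper: same calculation, but treating the PE count
 * as at least 8 so the bitmap always gets at least one word.
 */
static size_t pe_alloc_bytes_clamped(unsigned int total_pe_num)
{
	unsigned int n = total_pe_num < 8 ? 8 : total_pe_num;

	return align_up(n / 8, sizeof(unsigned long));
}
```

Under these assumptions, a PHB with 4 PEs gets a zero-byte bitmap without the clamp and one unsigned long with it, while larger PE counts are unaffected.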
    • powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism · 8445a87f
      Guilherme G. Piccoli authored
      Commit 39baadbf ("powerpc/eeh: Remove eeh information from pci_dn")
      changed the pci_dn struct by removing its EEH-related members.
      As part of this clean-up, the DDW mechanism was modified to read the
      device configuration address from the eeh_dev struct.
      
      As a consequence, if we now disable the EEH mechanism on the kernel
      command line, for example, the DDW mechanism will fail, generating a
      kernel oops by dereferencing a NULL pointer (which turns out to be the
      eeh_dev pointer).
      
      This patch just changes the configuration address calculation on DDW
      functions to a manual calculation based on pci_dn members instead of
      using eeh_dev-based address.
      
      No functional changes were made. This was tested on pSeries, in both
      PHyp and QEMU guests.
      
      Fixes: 39baadbf ("powerpc/eeh: Remove eeh information from pci_dn")
      Cc: stable@vger.kernel.org # v3.4+
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
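
As a rough illustration of the manual calculation mentioned in the commit above, the sketch below builds a configuration address from a bus number and a devfn. The struct is a hypothetical stand-in for the relevant pci_dn members, and the bit layout (bus number in bits 16-23, devfn in bits 8-15) is an assumption rather than something stated in the commit message.

```c
#include <stdint.h>

/* Hypothetical stand-in for the pci_dn members used here. */
struct pci_dn_sketch {
	uint8_t busno;	/* PCI bus number */
	uint8_t devfn;	/* encoded device and function number */
};

/*
 * Assumed layout: bus number in bits 16-23, devfn in bits 8-15,
 * low 8 bits left as zero. No eeh_dev pointer is involved, so the
 * calculation does not depend on EEH being enabled.
 */
static uint32_t ddw_cfg_addr(const struct pci_dn_sketch *pdn)
{
	return ((uint32_t)pdn->busno << 16) | ((uint32_t)pdn->devfn << 8);
}
```

For a device on bus 0x01 with devfn 0x08, this layout gives 0x010800.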
    • Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell" · c2078d9e
      Guilherme G. Piccoli authored
      This reverts commit 89a51df5.
      
      The function eeh_add_device_early() is used to perform EEH
      initialization for devices added to the system later, as in
      hotplug/DLPAR scenarios. Commit 89a51df5 ("powerpc/eeh:
      Fix crash in eeh_add_device_early() on Cell") introduced a new check
      in this function: Cell has no EEH capabilities, which led to a kernel
      oops if hotplug was performed, so a check for eeh_enabled() was added
      to avoid the issue.
      
      However, on platforms where EEH is present, like pSeries or PowerNV,
      we might reach a case in which no PCI devices are present at boot time
      and so EEH is not initialized. Then, if a device is added via DLPAR for
      example, eeh_add_device_early() fails because eeh_enabled() is false,
      and EEH ends up not being enabled at all.
      
      This reverts the aforementioned patch because commit d91dafc0
      ("powerpc/eeh: Delay probing EEH device during hotplug") introduced a
      new check, so the original Cell issue no longer occurs.
      
      Cc: stable@vger.kernel.org # v4.1+
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/eeh: Drop unnecessary label in eeh_pe_change_owner() · d6d63d72
      Gavin Shan authored
      The label "reset" in eeh_pe_change_owner() is used only once, so
      there is no need to keep it. Drop it. No logical changes introduced.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/eeh: Ignore handlers in eeh_pe_reset_and_recover() · 2efc771f
      Gavin Shan authored
      The function eeh_pe_reset_and_recover() is used to recover from EEH
      errors when passthrough devices are transferred to the guest and
      back, meaning the device's driver is vfio-pci or none. In
      both cases, the handlers triggered by eeh_report_reset() and
      eeh_report_resume() shouldn't be called.
      
      This ignores the error handlers from eeh_report_reset() and
      eeh_report_resume().
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/eeh: Restore initial state in eeh_pe_reset_and_recover() · 5a0cdbfd
      Gavin Shan authored
      The function eeh_pe_reset_and_recover() is used to recover from EEH
      errors when passthrough devices are transferred to the guest and
      back. The content of the device's config space is lost on the PE
      reset issued in the middle of the recovery, so the function saves
      it before the reset and restores it afterwards. However, config
      access to some adapters, like the Broadcom BCM5719, at this point
      causes a fenced PHB. The config space is always blocked, so we save
      0xFF's that are restored at a later point. The memory BARs are then
      totally corrupted, causing another EEH error upon access to one of
      the memory BARs.
      
      This restores the config space on adapters like the BCM5719
      from the content saved to the EEH device when it was populated,
      which resolves the above issue.
      
      Fixes: 5cfb20b9 ("powerpc/eeh: Emulate EEH recovery for VFIO devices")
      Cc: stable@vger.kernel.org # v3.18+
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/eeh: Don't report error in eeh_pe_reset_and_recover() · affeb0f2
      Gavin Shan authored
      The function eeh_pe_reset_and_recover() is used to recover from EEH
      errors when passthrough devices are transferred to the guest and
      back, meaning the device's driver is vfio-pci or none.
      When the driver is vfio-pci, which provides only the error_detected()
      error handler, the handler simply stops the guest, which is not the
      expected behaviour. On the other hand, no error handlers will
      be called if we don't have a bound driver.
      
      This skips the error handler in eeh_pe_reset_and_recover()
      that reports the error to the device driver, to avoid the unexpected
      behaviour.
      
      Fixes: 5cfb20b9 ("powerpc/eeh: Emulate EEH recovery for VFIO devices")
      Cc: stable@vger.kernel.org # v3.18+
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • Revert "powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()" · 848912e5
      Michael Ellerman authored
      This reverts commit c8ceacc2.
      
      Gavin says: I missed the fact that it affects the PCI passthrough path,
      as reported by Alexey: When passing through a GPU (0003:01:00.0) which
      sits behind the root port, the reset request is routed to skiboot in the
      original code. In skiboot, the link bouncing events are masked during
      the reset, so we don't see an EEH (freeze all) error even if link
      bouncing happens. With the changes included, the reset is done by the
      kernel and the link bouncing events aren't masked, because masking
      requires altering the content of PHB3 (or P7IOC) specific hardware
      registers, which are invisible to the kernel (skiboot hides the
      hardware specifics). It means the link bouncing is seen by the root
      port and it causes an EEH (freeze all) error. PCI passthrough of the
      GPU device cannot work.
      Requested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Requested-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 11 May, 2016 32 commits