1. 05 Sep, 2019 2 commits
    • powerpc/eeh: Fix race when freeing PDNs · 5ef753ae
      Oliver O'Halloran authored
      When hot-adding devices we rely on the hotplug driver to create pci_dn's
      for the devices under the hotplug slot. Conversely, when hot-removing, the
      driver will remove the pci_dn's that it created. This is a problem because
      the pci_dev is still live until its refcount drops to zero. This can
      happen if the driver is slow to tear down its internal state. Ideally, the
      driver would not attempt to perform any config accesses to the device once
      it's been marked as removed, but sometimes it happens. As a result, we
      might attempt to access the pci_dn for a device that has been torn down and
      the kernel may crash as a result.
      
      To fix this, don't free the pci_dn unless the corresponding pci_dev has
      been released.  If the pci_dev is still live, then we mark the pci_dn with
      a flag that indicates the pci_dev's release function should free it.
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190903101605.2890-3-oohall@gmail.com
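The deferred-free scheme described in this commit message can be sketched in userspace C. This is a toy model under stated assumptions, not the kernel patch: `toy_dev`, `toy_pdn`, `TOY_PDN_DEAD` and every function below are hypothetical stand-ins for `pci_dev`, `pci_dn`, the flag, and the release path.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PDN_DEAD 0x1u       /* hypothetical flag: "release path frees me" */

struct toy_dev;

struct toy_pdn {                /* stand-in for pci_dn */
    unsigned int flags;
    struct toy_dev *dev;        /* non-NULL while the device is still live */
};

struct toy_dev {                /* stand-in for pci_dev */
    int refcount;
    struct toy_pdn *pdn;
};

static int toy_pdn_frees;       /* counts frees so the behaviour is observable */

static void toy_pdn_free(struct toy_pdn *pdn)
{
    toy_pdn_frees++;
    free(pdn);
}

/* Create a device with one reference and its attached pdn. */
static struct toy_dev *toy_dev_new(void)
{
    struct toy_dev *dev = calloc(1, sizeof(*dev));
    struct toy_pdn *pdn = calloc(1, sizeof(*pdn));

    if (!dev || !pdn) {
        free(dev);
        free(pdn);
        return NULL;
    }
    dev->refcount = 1;
    dev->pdn = pdn;
    pdn->dev = dev;
    return dev;
}

/* Hot-remove path: free the pdn only if no live device still points at it. */
static void toy_remove_pdn(struct toy_pdn *pdn)
{
    if (pdn->dev) {
        pdn->flags |= TOY_PDN_DEAD;     /* defer the free to the release path */
        return;
    }
    toy_pdn_free(pdn);
}

/* Release path: runs when the last reference to the device is dropped. */
static void toy_dev_release(struct toy_dev *dev)
{
    struct toy_pdn *pdn = dev->pdn;

    if (pdn) {
        pdn->dev = NULL;
        if (pdn->flags & TOY_PDN_DEAD)
            toy_pdn_free(pdn);
    }
    free(dev);
}

static void toy_dev_put(struct toy_dev *dev)
{
    assert(dev->refcount > 0);
    if (--dev->refcount == 0)
        toy_dev_release(dev);
}
```

In this model the pdn is freed exactly once whichever side finishes first: if the hot-remove path runs while the device is live, the free is deferred to `toy_dev_release()`; if the device was already released, `toy_remove_pdn()` frees it directly.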
    • powerpc/eeh: Clean up EEH PEs after recovery finishes · 799abe28
      Oliver O'Halloran authored
      When the last device in an eeh_pe is removed the eeh_pe structure itself
      (and any empty parents) are freed since they are no longer needed. This
      results in a crash when a hotplug driver is involved since the following
      may occur:
      
      1. Device is surprise removed.
      2. Driver performs an MMIO, which fails and queues an eeh_event.
      3. Hotplug driver receives a hotplug interrupt and removes any
         pci_devs that were under the slot.
      4. pci_dev is torn down and the eeh_pe is freed.
      5. The EEH event handler thread processes the eeh_event and crashes
         since the eeh_pe pointer in the eeh_event structure is no
         longer valid.
      
      Crashing is generally considered poor form. Instead of doing that use
      the fact PEs are marked as EEH_PE_INVALID to keep them around until the
      end of the recovery cycle, at which point we can safely prune any empty
      PEs.
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190903101605.2890-2-oohall@gmail.com
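The idea of keeping an empty PE alive until recovery ends, then pruning it, can be illustrated with a small userspace model. It is a sketch under assumptions, not the kernel code: `toy_pe`, `TOY_PE_INVALID`, the fixed-size table and all function names are hypothetical, and the parent/child PE hierarchy is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PE_INVALID 0x1u     /* hypothetical stand-in for EEH_PE_INVALID */
#define TOY_MAX_PES    8

struct toy_pe {                 /* stand-in for eeh_pe */
    bool in_use;                /* false means "freed" in this model */
    unsigned int flags;
    int ndevs;                  /* devices still attached to this PE */
};

static struct toy_pe toy_pes[TOY_MAX_PES];

/* Grab a free slot and attach ndevs devices to it. */
static struct toy_pe *toy_pe_alloc(int ndevs)
{
    for (size_t i = 0; i < TOY_MAX_PES; i++) {
        if (!toy_pes[i].in_use) {
            toy_pes[i].in_use = true;
            toy_pes[i].flags = 0;
            toy_pes[i].ndevs = ndevs;
            return &toy_pes[i];
        }
    }
    return NULL;
}

/* Mark a PE as involved in a recovery that has not finished yet. */
static void toy_pe_mark_invalid(struct toy_pe *pe)
{
    pe->flags |= TOY_PE_INVALID;
}

/* Device removal: an empty PE is freed at once unless it is marked invalid. */
static void toy_pe_remove_dev(struct toy_pe *pe)
{
    assert(pe->in_use && pe->ndevs > 0);
    pe->ndevs--;
    if (pe->ndevs == 0 && !(pe->flags & TOY_PE_INVALID))
        pe->in_use = false;
}

/* End of the recovery cycle: clear the mark and prune PEs left empty. */
static int toy_pe_cleanup(void)
{
    int pruned = 0;

    for (size_t i = 0; i < TOY_MAX_PES; i++) {
        struct toy_pe *pe = &toy_pes[i];

        if (!pe->in_use)
            continue;
        pe->flags &= ~TOY_PE_INVALID;
        if (pe->ndevs == 0) {
            pe->in_use = false;
            pruned++;
        }
    }
    return pruned;
}
```

With this ordering, an event handler that still holds a pointer to a marked PE (step 5 in the list above) sees a valid object; the PE only disappears once `toy_pe_cleanup()` runs at the end of recovery.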
  2. 30 Aug, 2019 37 commits
  3. 29 Aug, 2019 1 commit