  1. 11 Jun, 2014 1 commit
    • powerpc/powernv: Fix killed EEH event · 5c7a35e3
      Gavin Shan authored
      On the PowerNV platform, EEH errors are reported either by IO accessors
      or by an interrupt-driven poller. Once a PE is isolated, no further EEH
      events are produced for it. With the current implementation, an EEH
      event can be lost in this way:
      
      The interrupt handler queues one "special" event, which drives the poller.
      Before the EEH thread picks up the special event, the IO accessors kick in:
      the frozen PE is marked as "isolated" and an EEH event is queued to the list.
      The EEH thread then runs because of the special event and purges all
      existing EEH events. However, no other EEH event is ever produced for the
      frozen PE. In the end, the PE stays marked as "isolated" and there is no
      EEH event left to recover it.
      
      The patch fixes the issue by keeping EEH events for PEs that have been
      marked as "isolated", with the help of an additional "force" argument to
      eeh_remove_event().
      Reported-by: Rolf Brudeseth <rolfb@us.ibm.com>
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
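
      The purge rule described in this commit can be sketched in user-space C.
      This is a simplified model, not the kernel code: the types `struct pe`
      and `struct event` and the functions `queue_event()`, `purge_events()`
      and `count_events()` are hypothetical stand-ins, and the only behaviour
      taken from the commit message is that events for "isolated" PEs survive
      a purge unless "force" is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel's PE and event types. */
struct pe {
    bool isolated;          /* set once the frozen PE has been isolated */
};

struct event {
    struct pe *pe;          /* NULL models the "special" poller event */
    struct event *next;
};

/* Push a new event for `pe` onto the list and return the new head. */
static struct event *queue_event(struct event *head, struct pe *pe)
{
    struct event *ev = malloc(sizeof(*ev));

    assert(ev != NULL);     /* the sketch simply aborts if allocation fails */
    ev->pe = pe;
    ev->next = head;
    return ev;
}

/*
 * Purge queued events and return the new list head. Without `force`,
 * events whose PE is already isolated are kept, because they are the
 * only remaining trigger for recovering that PE.
 */
static struct event *purge_events(struct event *head, bool force)
{
    struct event **link = &head;

    while (*link) {
        struct event *ev = *link;

        if (!force && ev->pe && ev->pe->isolated) {
            link = &ev->next;   /* keep this event */
            continue;
        }
        *link = ev->next;       /* unlink and free everything else */
        free(ev);
    }
    return head;
}

/* Count the events currently queued. */
static size_t count_events(const struct event *head)
{
    size_t n = 0;

    for (; head; head = head->next)
        n++;
    return n;
}
```

      In this model, a purge with `force` set to false leaves the event of an
      isolated PE on the list, while a purge with `force` set to true empties
      the list.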
  2. 20 Jun, 2013 2 commits
  3. 09 Sep, 2012 2 commits
  4. 09 Mar, 2012 2 commits
    • powerpc/eeh: Replace pci_dn with eeh_dev for EEH aux components · 40a7cd92
      Gavin Shan authored
      The original EEH implementation depends heavily on struct pci_dn, and
      EEH-related information has to be stored in pci_dn. We could instead
      split struct pci_dn so that the EEH-sensitive information forms an
      individual struct, which makes EEH more independent.
      
      The patch replaces pci_dn with eeh_dev for EEH aux components such as
      event and driver. The eeh_event struct has also been adjusted slightly:
      since eeh_dev already links the associated FDT (Flat Device Tree) node
      and PCI device, eeh_event does not need to track the FDT node and PCI
      device itself. It can simply track the eeh_dev.
      
      The patch also renames function pcid_name() to eeh_pcid_name(), a
      rename that was missed in the previous patch where the EEH aux
      components were cleaned up.
      Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
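
      The relationship between eeh_event and eeh_dev described in this commit
      can be pictured with the following sketch. The struct layouts are
      illustrative assumptions (the real kernel structs carry many more
      fields), and the accessors `event_fdt_node()` and `event_pci_dev()` are
      hypothetical helpers added only to show that the FDT node and PCI device
      stay reachable through the eeh_dev.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumed layouts). */
struct device_node { const char *name; };
struct pci_dev     { const char *name; };

/* EEH-specific per-device state, split out of pci_dn. */
struct eeh_dev {
    struct device_node *dn;   /* associated FDT node */
    struct pci_dev *pdev;     /* associated PCI device */
};

/* After the change, an event only needs to reference the eeh_dev. */
struct eeh_event {
    struct eeh_dev *edev;
};

/* Hypothetical accessor: FDT node of the device that raised the event. */
static struct device_node *event_fdt_node(const struct eeh_event *ev)
{
    assert(ev != NULL && ev->edev != NULL);
    return ev->edev->dn;
}

/* Hypothetical accessor: PCI device that raised the event. */
static struct pci_dev *event_pci_dev(const struct eeh_event *ev)
{
    assert(ev != NULL && ev->edev != NULL);
    return ev->edev->pdev;
}
```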
    • powerpc/pseries: Cleanup comments in EEH aux components · 29f8bf1b
      Gavin Shan authored
      There are several EEH aux components, and the patch cleans them up:
      
              * Duplicated comments have been removed from the header file.
              * Comments have been reorganized so that they read more cleanly.
              * The leading comments of functions have been adjusted slightly
                so that the output of "make pdfdocs" is more uniform.
              * Function calls written as "xxx ()" have been replaced by
                "xxx()".
      Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  5. 04 Aug, 2008 1 commit
  6. 22 Mar, 2007 1 commit
  7. 22 Apr, 2006 1 commit
    • [PATCH] powerpc/pseries: clear PCI failure counter if no new failures · ac325acd
      Linas Vepstas authored
      The current PCI error recovery system keeps track of the number of PCI card
      resets, and refuses to bring a card back up if this number is too large.
      The goal of doing this was to avoid an infinite loop of resets if a card is
      obviously dead.  However, if the failures are rare, but the machine has a
      high uptime, this mechanism might still be triggered; this is too harsh.
      
      This patch avoids this problem by decrementing the fail count after an
      hour.  Thus, as long as a PCI card fails fewer than 6 times an hour, it
      will continue to be reset indefinitely.  If its failure rate is greater
      than that, it will be taken off-line permanently.
      
      This patch is larger than it might otherwise be because it changes
      indentation by removing a pointless while-loop.  The while loop is not
      needed, as the handler is invoked once for each event (by schedule_work());
      the loop is leftover cruft from an earlier implementation.
      Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
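
      The decaying failure counter can be modelled as below. This is a sketch
      under stated assumptions, not the kernel implementation: time is passed
      in explicitly as seconds, the limit of 5 counted failures is inferred
      from the "6 times an hour" threshold in the commit message, and the
      names `struct card`, `age_failures()` and `record_failure()` are
      hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ALLOWED_FREEZES 5     /* assumed limit: a 6th counted failure is fatal */
#define DECAY_SECONDS 3600L       /* one hour, per the commit message */

struct card {
    int freeze_count;                       /* failures not yet aged out */
    long expiry[MAX_ALLOWED_FREEZES + 1];   /* when each counted failure ages out */
    bool offline;                           /* true once the card is given up on */
};

/* Drop every counted failure whose hour has passed by time `now`. */
static void age_failures(struct card *c, long now)
{
    int kept = 0;

    for (int i = 0; i < c->freeze_count; i++)
        if (c->expiry[i] > now)
            c->expiry[kept++] = c->expiry[i];
    c->freeze_count = kept;
}

/* Record a failure at time `now`; returns true if the card may be reset. */
static bool record_failure(struct card *c, long now)
{
    assert(now >= 0);
    if (c->offline)
        return false;

    age_failures(c, now);
    c->expiry[c->freeze_count++] = now + DECAY_SECONDS;

    if (c->freeze_count > MAX_ALLOWED_FREEZES) {
        c->offline = true;      /* too many failures within the hour */
        return false;
    }
    return true;
}
```

      In this model, a card that fails occasionally never accumulates enough
      counted failures to be taken off-line, whereas six failures inside one
      hour mark it off-line permanently.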
  8. 10 Jan, 2006 1 commit
  9. 09 Jan, 2006 1 commit
  10. 10 Nov, 2005 1 commit
    • [PATCH] ppc64: PCI error event dispatcher · 172ca926
      Linas Vepstas authored
      12-eeh-event-dispatcher.patch
      
      ppc64: EEH Recovery dispatcher thread
      
      This patch adds a mechanism to create recovery threads when an
      EEH event is received.  Since an EEH freeze state may be detected
      within an interrupt context, we need to get out of the interrupt
      context before starting recovery. This dispatcher does this in
      two steps: first, it uses a workqueue to get out, and then
      launches a kernel thread, so that the recovery routine can
      sleep for extended periods without upsetting the keventd.
      
      A kernel thread is created with each EEH event, rather than
      having one long-running daemon started at boot time.  This is
      because it is anticipated that EEH events will be very rare
      (very very rare, ideally) and so it's pointless to clutter the
      process tables with a daemon that will almost never run.
      Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
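
      The two-step hand-off described in this commit can be illustrated with a
      single-threaded simulation. No real interrupts, workqueues or threads
      are involved: the `enum context` tag only records which context a step
      is meant to run in, and `struct dispatcher`, `report_event()`,
      `recover()` and `run_workqueue()` are hypothetical names chosen for the
      sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which execution context a step is modelled as running in. */
enum context { CTX_INTERRUPT, CTX_WORKQUEUE, CTX_THREAD };

#define MAX_PENDING 8

struct dispatcher {
    int pending[MAX_PENDING];   /* event ids waiting for the workqueue */
    size_t npending;
    int threads_started;        /* one short-lived thread per event */
    int recovered;              /* events whose recovery has run */
};

/* Interrupt context: only queue the event, never sleep or recover here. */
static bool report_event(struct dispatcher *d, int event_id)
{
    if (d->npending == MAX_PENDING)
        return false;
    d->pending[d->npending++] = event_id;
    return true;
}

/* Recovery may sleep for a long time, so it is only legal in a thread. */
static void recover(struct dispatcher *d, enum context ctx, int event_id)
{
    (void)event_id;
    assert(ctx == CTX_THREAD);
    d->recovered++;
}

/*
 * Step 1: the workqueue item runs outside interrupt context.
 * Step 2: it starts one thread per event and lets recovery run there,
 * so a sleeping recovery routine never blocks the workqueue itself.
 */
static void run_workqueue(struct dispatcher *d)
{
    for (size_t i = 0; i < d->npending; i++) {
        d->threads_started++;                    /* stands in for starting a kernel thread */
        recover(d, CTX_THREAD, d->pending[i]);   /* modelled as running in that thread */
    }
    d->npending = 0;
}
```

      Because a thread is started per event, nothing is left running when no
      events are pending, which matches the commit's reasoning for not keeping
      a long-running daemon around.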