An error occurred fetching the project authors.
- 11 Jun, 2014 1 commit
-
-
Gavin Shan authored
On PowerNV platform, EEH errors are reported by IO accessors or poller driven by interrupt. After the PE is isolated, we won't produce EEH event for the PE. The current implementation has possibility of EEH event lost in this way: The interrupt handler queues one "special" event, which drives the poller. EEH thread doesn't pick the special event yet. IO accessors kicks in, the frozen PE is marked as "isolated" and EEH event is queued to the list. EEH thread runs because of special event and purge all existing EEH events. However, we never produce an other EEH event for the frozen PE. Eventually, the PE is marked as "isolated" and we don't have EEH event to recover it. The patch fixes the issue to keep EEH events for PEs that have been marked as "isolated" with the help of additional "force" help to eeh_remove_event(). Reported-by:
Rolf Brudeseth <rolfb@us.ibm.com> Signed-off-by:
Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 20 Jun, 2013 2 commits
-
-
Gavin Shan authored
On PowerNV platform, we might run into the situation where subsequent events are duplicated events of former one, which is being processed. For the case, we need the function implemented by the patch to purge EEH events accordingly. Signed-off-by:
Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
We possiblly have multiple kthreads running for multiple EEH errors (events) and use one spinlock to make the process of handling those EEH events serialized. That's unnecessary and the patch creates only one kthread, which is started during EEH core initialization time in eeh_init(). A new semaphore introduced to count the number of existing EEH events in the queue and the kthread waiting on the semaphore. Signed-off-by:
Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 09 Sep, 2012 2 commits
-
-
Gavin Shan authored
The patch reworks the current implementation so that the eeh errors will be handled basing on PE instead of eeh device. Signed-off-by:
Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
The original implementation builds EEH event based on EEH device. We already had dedicated struct to depict PE. It's reasonable to build EEH event based on PE. Signed-off-by:
Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 09 Mar, 2012 2 commits
-
-
Gavin Shan authored
The original EEH implementation is heavily depending on struct pci_dn. We have to put EEH related information to pci_dn. Actually, we could split struct pci_dn so that the EEH sensitive information to form an individual struct, then EEH looks more independent. The patch replaces pci_dn with eeh_dev for EEH aux components like event and driver. Also, the eeh_event struct has been adjusted for a little bit since eeh_dev has linked the associated FDT (Flat Device Tree) node and PCI device. It's not necessary for eeh_event struct to trace FDT node and PCI device. We can just simply to trace eeh_dev in eeh_event. The patch also renames function pcid_name() to eeh_pcid_name(), which should be missed in the previous patch where the EEH aux components have been cleaned up. Signed-off-by:
Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
There're several EEH aux components and the patch does some cleanup for them so that they look more clean. * Duplicated comments have been removed from the header file. * Comments have been reorganized so that it looks more clean. * The leading comments of functions are adjusted for a little bit so that the result of "make pdfdocs" would be more unified. * Function calls "xxx ()" has been replaced by "xxx()". Signed-off-by:
Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 04 Aug, 2008 1 commit
-
-
Stephen Rothwell authored
from include/asm-powerpc. This is the result of a mkdir arch/powerpc/include/asm git mv include/asm-powerpc/* arch/powerpc/include/asm Followed by a few documentation/comment fixups and a couple of places where <asm-powepc/...> was being used explicitly. Of the latter only one was outside the arch code and it is a driver only built for powerpc. Signed-off-by:
Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
- 22 Mar, 2007 1 commit
-
-
Linas Vepstas authored
The EEH event notification system passes around data that is not needed or at least, not used properly. Stop passing this data; get it in a more reliable fashion. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
- 22 Apr, 2006 1 commit
-
-
Linas Vepstas authored
The current PCI error recovery system keeps track of the number of PCI card resets, and refuses to bring a card back up if this number is too large. The goal of doing this was to avoid an infinite loop of resets if a card is obviously dead. However, if the failures are rare, but the machine has a high uptime, this mechanism might still be triggered; this is too harsh. This patch will avoids this problem by decrementing the fail count after an hour. Thus, as long as a pci card BSOD's less than 6 times an hour, it will continue to be reset indefinitely. If it's failure rate is greater than that, it will be taken off-line permanently. This patch is larger than it might otherwise be because it changes indentation by removing a pointless while-loop. The while loop is not needed, as the handler is invoked once fo each event (by schedule_work()); the loop is leftover cruft from an earlier implementation. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
- 10 Jan, 2006 1 commit
-
-
Linas Vepstas authored
Various PCI bus errors can be signaled by newer PCI controllers. The core error recovery routines are architecture dependent. This patch adds a recovery infrastructure for the PPC64 pSeries systems. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org> (cherry picked from e8ca11b460c4c9c7fa6b529be221529ebd770e38 commit)
-
- 09 Jan, 2006 1 commit
-
-
Arnd Bergmann authored
include/asm-ppc/ had #ifdef __KERNEL__ in all header files that are not meant for use by user space, include/asm-powerpc does not have this yet. This patch gets us a lot closer there. There are a few cases where I was not sure, so I left them out. I have verified that no CONFIG_* symbols are used outside of __KERNEL__ any more and that there are no obvious compile errors when including any of the headers in user space libraries. Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-
- 10 Nov, 2005 1 commit
-
-
Linas Vepstas authored
12-eeh-event-dispatcher.patch ppc64: EEH Recovery dispatcher thread This patch adds a mechanism to create recovery threads when an EEH event is received. Since an EEH freeze state may be detected within an interrupt context, we need to get out of the interrupt context before starting recovery. This dispatcher does this in two steps: first, it uses a workqueue to get out, and then lanuches a kernel thread, so that the recovery routine can sleep for exteded periods without upseting the keventd. A kernel thread is created with each EEH event, rather than having one long-running daemon started at boot time. This is because it is anticipated that EEH events will be very rare (very very rare, ideally) and so its pointless to cluter the process tables with a daemon that will almost never run. Signed-off-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Paul Mackerras <paulus@samba.org>
-