- 12 Mar, 2012 1 commit
-
-
Benjamin Herrenschmidt authored
-
- 09 Mar, 2012 22 commits
-
-
Benjamin Herrenschmidt authored
The current implementation of lazy interrupts handling has some issues that this tries to address. We don't do the various workarounds we need to do when re-enabling interrupts in some cases such as when returning from an interrupt and thus we may still lose or get delayed decrementer or doorbell interrupts. The current scheme also makes it much harder to handle the external "edge" interrupts provided by some BookE processors when using the EPR facility (External Proxy) and the Freescale Hypervisor. Additionally, we tend to keep interrupts hard disabled in a number of cases, such as decrementer interrupts, external interrupts, or when a masked decrementer interrupt is pending. This is sub-optimal. This is an attempt at fixing it all in one go by reworking the way we do the lazy interrupt disabling from the ground up. The base idea is to replace the "hard_enabled" field with a "irq_happened" field in which we store a bit mask of what interrupt occurred while soft-disabled. When re-enabling, either via arch_local_irq_restore() or when returning from an interrupt, we can now decide what to do by testing bits in that field. We then implement replaying of the missed interrupts either by re-using the existing exception frame (in exception exit case) or via the creation of a new one from an assembly trampoline (in the arch_local_irq_enable case). This removes the need to play with the decrementer to try to create fake interrupts, among others. In addition, this adds a few refinements: - We no longer hard disable decrementer interrupts that occur while soft-disabled. We now simply bump the decrementer back to max (on BookS) or leave it stopped (on BookE) and continue with hard interrupts enabled, which means that we'll potentially get better sample quality from performance monitor interrupts. - Timer, decrementer and doorbell interrupts now hard-enable shortly after removing the source of the interrupt, which means they no longer run entirely hard disabled. Again, this will improve perf sample quality. - On Book3E 64-bit, we now make the performance monitor interrupt act as an NMI like Book3S (the necessary C code for that to work appear to already be present in the FSL perf code, notably calling nmi_enter instead of irq_enter). (This also fixes a bug where BookE perfmon interrupts could clobber r14 ... oops) - We could make "masked" decrementer interrupts act as NMIs when doing timer-based perf sampling to improve the sample quality. Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- v2: - Add hard-enable to decrementer, timer and doorbells - Fix CR clobber in masked irq handling on BookE - Make embedded perf interrupt act as an NMI - Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want to retrigger an interrupt without preventing hard-enable v3: - Fix or vs. ori bug on Book3E - Fix enabling of interrupts for some exceptions on Book3E v4: - Fix resend of doorbells on return from interrupt on Book3E v5: - Rebased on top of my latest series, which involves some significant rework of some aspects of the patch. v6: - 32-bit compile fix - more compile fixes with various .config combos - factor out the asm code to soft-disable interrupts - remove the C wrapper around preempt_schedule_irq v7: - Fix a bug with hard irq state tracking on native power7
-
Gavin Shan authored
With the original EEH implementation, the access to config space of the corresponding PCI device is done by RTAS sensitive function. That depends on pci_dn heavily. That would limit EEH extension to other platforms like powernv because other platforms might have different ways to access PCI config space. The patch splits those functions used to access PCI config space and implement them in platform related EEH component. It would be helpful to support EEH on multiple platforms simutaneously in future. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
With the original EEH implementation, the EEH global statistics are maintained by individual global variables. That makes the code a little hard to maintain. The patch introduces extra struct eeh_stats for the EEH global statistics so that it can be maintained in collective fashion. It's the rework on the corresponding v5 patch. According to the comments from David Laight, the EEH global statistics have been changed for a litte bit so that they have fixed-type of "u64". Also, the format used to print them has been changed to "%llu" based on David's suggestion. Also, the output format of EEH global statistics should be kept as intacted according to Michael's suggestion that there might be tools parsing them. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
The pci_dn has been replaced with eeh_dev. In order to comply with the rule, the EEH platform implementation on pSeries should also be adjusted for a little bit so that it will depend on eeh_dev instead of pci_dn. The patch replaces pci_dn with eeh_dev. The corresponding information will be retrieved from eeh_dev instead of pci_dn. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
The original EEH implementation is heavily depending on struct pci_dn. We have to put EEH related information to pci_dn. Actually, we could split struct pci_dn so that the EEH sensitive information to form an individual struct, then EEH looks more independent. The patch replaces pci_dn with eeh_dev for EEH aux components like event and driver. Also, the eeh_event struct has been adjusted for a little bit since eeh_dev has linked the associated FDT (Flat Device Tree) node and PCI device. It's not necessary for eeh_event struct to trace FDT node and PCI device. We can just simply to trace eeh_dev in eeh_event. The patch also renames function pcid_name() to eeh_pcid_name(), which should be missed in the previous patch where the EEH aux components have been cleaned up. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
The original EEH implementation is heavily depending on struct pci_dn. We have to put EEH related information to pci_dn. Actually, we could split struct pci_dn so that the EEH sensitive information to form an individual struct, then EEH looks more independent. The patch replaces pci_dn with eeh_dev for EEH core. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
With original EEH implementation, struct pci_dn is used while building PCI I/O address cache, which helps on searching the corresponding PCI device according to the given physical I/O address. Besides, pci_dn is associated with the corresponding PCI device while building its I/O cache. The patch replaces struct pci_dn with struct eeh_dev so that EEH address cache won't depend on struct pci_dn. That will help EEH to become an independent module in future. Besides, the binding of eeh_dev and PCI device is done while building PCI device I/O cache. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
With original EEH implementation, all EEH related statistics have been put into struct pci_dn. We've introduced struct eeh_dev to replace struct pci_dn in EEH core components, including EEH sysfs component. The patch shows EEH statistics from struct eeh_dev instead of struct pci_dn in EEH sysfs component. Besides, it also fixed the EEH device retrieval from PCI device, which was introduced by the previous patch in the series of patch. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
Original EEH implementation depends on struct pci_dn heavily. However, EEH shouldn't depend on that actually because EEH needn't share much information with other PCI components. That's to say, EEH should have worked independently. The patch introduces struct eeh_dev so that EEH core components needn't be working based on struct pci_dn in future. Also, struct pci_dn, struct eeh_dev instances are created in dynamic fasion and the binding with EEH device, OF node, PCI device is implemented as well. The EEH devices are created after PHBs are detected and initialized, but PCI emunation hasn't started yet. Apart from that, PHB might be created dynamically through DLPAR component and the EEH devices should be creatd as well. Another case might be OF node is created dynamically by DR (Dynamic Reconfiguration), which has been defined by PAPR. For those OF nodes created by DR, EEH devices should be also created accordingly. The binding between EEH device and OF node is done while the EEH device is initially created. The binding between EEH device and PCI device should be done after PCI emunation is done. Besides, PCI hotplug also needs the binding so that the EEH devices could be traced from the newly coming PCI buses or PCI devices. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
The patch does some cleanup on the function names of EEH aux components. Currently, only couple of function names from eeh_cache have been adjusted so that: * The function name has prefix "eeh_addr_cache". * Move around pci_addr_cache_build() in the header file to reflect function call sequence. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
There're several EEH aux components and the patch does some cleanup for them so that they look more clean. * Duplicated comments have been removed from the header file. * Comments have been reorganized so that it looks more clean. * The leading comments of functions are adjusted for a little bit so that the result of "make pdfdocs" would be more unified. * Function calls "xxx ()" has been replaced by "xxx()". Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
In order to enable particular PCI device, which has been included in the parent PE. The involved PCI bridges should be enabled explicitly if there has. On pSeries platform, there're dedicated RTAS calls to fulfil the purpose. The patch implements the function of configuring PCI bridges through the dedicated RTAS calls. Besides, the function has been abstracted by struct eeh_ops::configure_bridge so that the EEH core components could support multiple platforms in future. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
On RTAS compliant pSeries platform, one dedicated RTAS call has been introduced to retrieve EEH temporary or permanent error log. The patch implements the function of retriving EEH error log through RTAS call. Besides, it has been abstracted by struct eeh_ops::get_log so that EEH core components could support multiple platforms in future. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
On RTAS compliant pSeries platform, there is a dedicated RTAS call (ibm,set-slot-reset) to reset the specified PE. Furthermore, two types of resets are supported: hot and fundamental. the type of reset is to be used actually depends on the included PCI device's requirements. The patch implements resetting PE on pSeries platform through RTAS call. Besides, it has been abstracted through struct eeh_ops::reset so that EEH core components could support multiple platforms in future. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
On pSeries platform, the PE state might be temporarily unavailable. In that case, the firmware will return the corresponding wait time. That means the kernel has to wait for appropriate time in order to get the PE state. The patch does the implementation for that. Besides, the function has been abstracted through struct eeh_ops::wait_state so that EEH core components could support multiple platforms in future. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
On pSeries platform, there're 2 dedicated RTAS calls introduced to retrieve the corresponding PE's state: ibm,read-slot-reset-state and ibm,read-slot-reset-state2. The patch implements the retrieval of PE's state according to the given PE address. Besides, the implementation has been abstracted by struct eeh_ops::get_state so that EEH core components could support multiple platforms in future. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
There're 2 types of addresses used for EEH operations. The first one would be BDF (Bus/Device/Function) address which is retrieved from the reg property of the corresponding FDT node. Another one is PE address that should be enquired from firmware through RTAS call on pSeries platform. When issuing EEH operation, the PE address has precedence over BDF address. The patch implements retrieving PE address according to the given BDF address on pSeries platform. Also, the struct eeh_early_enable_info has been removed since the information can be figured out from dn->pdn->phb->buid directly and that simplifies the code. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
There're 4 EEH operations that are covered by the dedicated RTAS call <ibm,set-eeh-option>: enable or disable EEH, enable MMIO and enable DMA. At early stage of system boot, the EEH would be tried to enable on PCI device related device node. MMIO and DMA for particular PE should be enabled when doing recovery on EEH errors so that the PE could function properly again. The patch implements it and abstract that through struct eeh_ops::set_eeh. It would be help for EEH to support multiple platforms in future. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
The platform specific EEH operations have been abstracted by struct eeh_ops. The individual platroms, including pSeries, needs doing necessary initialization before the platform dependent EEH operations work properly. The patch is addressing that and do necessary platform initialization for pSeries platform. More specificly, it will figure out the tokens of EEH related RTAS calls. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
EEH has been implemented on RTAS-compliant pSeries platform. That's to say, the EEH operations will be implemented through RTAS calls eventually. The situation limited feasible extension on EEH. In order to support EEH on multiple platforms like pseries and powernv simutaneously. We have to split the platform dependent EEH options up out of current implementation. The patch addresses supporting EEH on multiple platforms. The pseries platform dependent EEH operations will be abstracted by struct eeh_ops. EEH core components will be built based on the registered EEH operations. With the mechanism, what the individual platform needs to do is implement platform dependent EEH operations. For now, the pseries platform is covered under the mechanism. That means we have to think about other platforms to support EEH, like powernv. Besides, we only have framework for the mechanism and we have to implement it for pseries platform later. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
The EEH has been implemented on pSeries platform. The original code looks a little bit nasty. The patch does cleanup on the current EEH implementation so that it looks more clean. * Try adding prefix "eeh" for functions. * Some function names have been adjusted so that they looks shorter and meaningful. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
The EEH has been implemented on pSeries platform. The original code looks a little bit nasty. The patch does cleanup on the current EEH implementation so that it looks more clean. * Duplicated comments have been removed from the corresponding header files. * Comments have been reorganized so that it looks more clean. * The leading comments of functions are adjusted for a little bit so that the result of "make pdfdocs" would be more unified. * Function definitions and calls have unified format as "xxx()". That means the format "xxx ()" has been replaced by "xxx()". * There're multiple functions implemented for resetting PE. The position of those functions have been move around so that they are adjacent to each other to reflect their relationship. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 08 Mar, 2012 17 commits
-
-
Benjamin Herrenschmidt authored
On 64-bit, the mfmsr instruction can be quite slow, slower than loading a field from the cache-hot PACA, which happens to already contain the value we want in most cases. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Benjamin Herrenschmidt authored
We were using CR0.EQ after EXCEPTION_COMMON, hoping it still contained whether we came from userspace or kernel space. However, under some circumstances, EXCEPTION_COMMON will call C code and clobber non-volatile registers, so we really need to re-load the previous MSR from the stackframe and re-test. While there, invert the condition to make the fast path more obvious and remove the BUG_OPCODE which was a debugging leftover and call .ret_from_except as we should. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Benjamin Herrenschmidt authored
When running under a hypervisor that supports stolen time accounting, we may call C code from the macro EXCEPTION_PROLOG_COMMON in the exception entry path, which clobbers CR0. However, the FPU and vector traps rely on CR0 indicating whether we are coming from userspace or kernel to decide what to do. So we need to restore that value after the C call Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Benjamin Herrenschmidt authored
Also use local_paca instead of get_paca() to avoid getting into the smp_processor_id() debugging code from the debugger Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Benjamin Herrenschmidt authored
Other architectures such as x86 and ARM have been growing new support for features like retrying page faults after dropping the mm semaphore to break contention, or being able to return from a stuck page fault when a SIGKILL is pending. This refactors our implementation of do_page_fault() to move the error handling out of line in a way similar to x86 and adds support for those two features. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Benjamin Herrenschmidt authored
If we get a floating point, altivec or vsx unavaible interrupt in kernel, we trigger a kernel error. There is no point preserving the interrupt state, in fact, that can even make debugging harder as the processor state might change (we may even preempt) between taking the exception and landing in a debugger. So just make those 3 disable interrupts unconditionally. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- v2: On BookE only disable when hitting the kernel unavailable path, otherwise it will fail to restore softe as fast_exception_return doesn't do it.
-
Benjamin Herrenschmidt authored
We currently turn interrupts back to their previous state before calling do_page_fault(). This can be annoying when debugging as a bad fault will potentially have lost some processor state before getting into the debugger. We also end up calling some generic code with interrupts enabled such as notify_page_fault() with interrupts enabled, which could be unexpected. This changes our code to behave more like other architectures, and make the assembly entry code call into do_page_faults() with interrupts disabled. They are conditionally re-enabled from within do_page_fault() in the same spot x86 does it. While there, add the might_sleep() test in the case of a successful trylock of the mmap semaphore, again like x86. Also fix a bug in the existing assembly where r12 (_MSR) could get clobbered by C calls (the DTL accounting in the exception common macro and DISABLE_INTS) in some cases. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- v2. Add the r12 clobber fix
-
Benjamin Herrenschmidt authored
Some exceptions would unconditionally disable interrupts on entry, which is fine, but calling lockdep every time not only adds more overhead than strictly needed, but also means we get quite a few "redudant" disable logged, which makes it hard to spot the really bad ones. So instead, split the macro used by the exception code into a normal one and a separate one used when CONFIG_TRACE_IRQFLAGS is enabled, and make the later skip th tracing if interrupts were already disabled. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Benjamin Herrenschmidt authored
We unconditionally hard enable interrupts. This is unnecessary as syscalls are expected to always be called with interrupts enabled. While at it, we add a WARN_ON if that is not the case and CONFIG_TRACE_IRQFLAGS is enabled (we don't want to add overhead to the fast path when this is not set though). Thus let's remove the enabling (and associated irq tracing) from the syscall entry path. Also on Book3S, replace a few mfmsr instructions with loads of PACAMSR from the PACA, which should be faster & schedule better. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Benjamin Herrenschmidt authored
This moves the inlines into system.h and changes the runlatch code to use the thread local flags (non-atomic) rather than the TIF flags (atomic) to keep track of the latch state. The code to turn it back on in an asynchronous interrupt is now simplified and partially inlined. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Benjamin Herrenschmidt authored
The perfmon interrupt is the sole user of a special variant of the interrupt prolog which differs from the one used by external and timer interrupts in that it saves the non-volatile GPRs and doesn't turn the runlatch on. The former is unnecessary and the later is arguably incorrect, so let's clean that up by using the same prolog. While at it we rename that prolog to use the _ASYNC prefix. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Benjamin Herrenschmidt authored
This removes the various bits of assembly in the kernel entry, exception handling and SLB management code that were specific to running under the legacy iSeries hypervisor which is no longer supported. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Stephen Rothwell authored
This cleans up vio.c after the removal of the legacy iSeries platform. It also removes some no longer referenced include files. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Stephen Rothwell authored
The PowerPC legacy iSeries plateform is being removed along with the "one looney iseries driver", so this code can now be removed as well. cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Stephen Rothwell authored
The PowerPC legacy iSeries platform is being removed so this is no longer selectable. Cc: Alan Cox <alan@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-serial@vger.kernel.org Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Stephen Rothwell authored
The PowerPC legacy iSeries platform is being removed, so this code is no longer needed. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Stephen Rothwell authored
The PowerPC legacy iSeries platform is being removed and this code is no longer selectable. There is more clean up that can be done, but this just gets the old code out of the way. Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: linux-scsi@vger.kernel.org Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-