    powerpc/eeh: Set channel state after notifying the drivers · 9efcdaac
    Ganesh Goudar authored
    When a PCI error is encountered for the 6th time in an hour, we
    set the channel state to perm_failure and notify the
    driver about the permanent failure.
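    The "6th time in an hour" rule can be pictured as a per-device error
    counter that restarts once an hour has passed. The sketch below is a
    simplified user-space model, not the kernel's code. `struct freeze_tracker`,
    `record_freeze()`, `FREEZE_WINDOW_SECS` and `MAX_ALLOWED_FREEZES` are
    hypothetical names, with values chosen only to match the behaviour
    described above.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical values mirroring the commit text: the 6th error
 * within one hour is treated as a permanent failure. */
#define FREEZE_WINDOW_SECS  3600
#define MAX_ALLOWED_FREEZES 5

struct freeze_tracker {
	time_t window_start; /* when the current one-hour window began */
	int    freeze_count; /* errors seen in the current window */
};

/* Record one PCI error at time `now`. Returns true when the device
 * should be declared permanently failed. */
static bool record_freeze(struct freeze_tracker *t, time_t now)
{
	/* Open a fresh window on the first error, or once more than an
	 * hour has elapsed since the current window began. */
	if (t->freeze_count == 0 || now - t->window_start > FREEZE_WINDOW_SECS) {
		t->window_start = now;
		t->freeze_count = 0;
	}
	t->freeze_count++;
	return t->freeze_count > MAX_ALLOWED_FREEZES;
}
```

    In this model, five errors inside the window leave the device
    recoverable, the sixth makes `record_freeze()` return true, and an
    error arriving more than an hour after the window opened starts a new
    count.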
    
    However, after upstream commit 38ddc011 ("powerpc/eeh:
    Make permanently failed devices non-actionable"), the EEH handler
    stops calling any driver routine once the device is marked as
    permanently failed. Because the state is set before the drivers are
    notified, the driver never learns about the failure, which can have
    fatal consequences, such as a kernel hang with certain PCI devices.
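    A rough model of why the driver is never called: once a device's state
    is permanent failure, an actionability check makes the reporting loop
    skip it. Everything here (`enum channel_state`, `struct sim_dev`,
    `dev_actionable()`, `report_all()`) is a hypothetical simplification,
    not the kernel's API, and a real handler may check more conditions than
    this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, loosely modelled on the PCI channel states. */
enum channel_state { CHANNEL_NORMAL, CHANNEL_FROZEN, CHANNEL_PERM_FAILURE };

struct sim_dev {
	enum channel_state state;
	int notified; /* how many times the driver callback ran */
};

/* Devices already in permanent failure are treated as non-actionable. */
static bool dev_actionable(const struct sim_dev *d)
{
	return d->state != CHANNEL_PERM_FAILURE;
}

/* Run the (simulated) driver error callback on every actionable device. */
static void report_all(struct sim_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!dev_actionable(&devs[i]))
			continue; /* roughly where "EEH: not actionable" is logged */
		devs[i].notified++;
	}
}
```

    A device that is already marked as permanently failed when
    `report_all()` runs is skipped, so its driver callback never executes.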
    
    The following log is observed with the lpfc driver, with and
    without this change. Without this change, the kernel hangs if a PCI
    error is encountered 6 times for a device within an hour.
    
    Without the change
    
     EEH: Beginning: 'error_detected(permanent failure)'
     PCI 0132:60:00.0#600000: EEH: not actionable (1,1,1)
     PCI 0132:60:00.1#600000: EEH: not actionable (1,1,1)
     EEH: Finished:'error_detected(permanent failure)'
    
    With the change
    
     EEH: Beginning: 'error_detected(permanent failure)'
     EEH: Invoking lpfc->error_detected(permanent failure)
     EEH: lpfc driver reports: 'disconnect'
     EEH: Invoking lpfc->error_detected(permanent failure)
     EEH: lpfc driver reports: 'disconnect'
     EEH: Finished:'error_detected(permanent failure)'
    
    To fix the issue, set the channel state to permanent failure after
    notifying the drivers.
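    The fix is purely an ordering change. This hypothetical user-space
    model (not the kernel's code; all names are illustrative) runs the same
    two steps in both orders. With the old order, every device is already
    marked as permanently failed when the report loop runs, so no driver
    hears about it, which matches the "not actionable" log. With the fixed
    order, each driver is notified first.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, loosely modelled on the PCI channel states. */
enum channel_state { CHANNEL_NORMAL, CHANNEL_FROZEN, CHANNEL_PERM_FAILURE };

struct sim_dev {
	enum channel_state state;
	bool told_perm_failure; /* driver saw the permanent-failure notice */
};

static void set_state_all(struct sim_dev *d, size_t n, enum channel_state s)
{
	for (size_t i = 0; i < n; i++)
		d[i].state = s;
}

/* Notify drivers, skipping devices already marked permanently failed. */
static void report_failure_all(struct sim_dev *d, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (d[i].state != CHANNEL_PERM_FAILURE)
			d[i].told_perm_failure = true;
}

/* Old ordering: state set first, so every device is skipped. */
static void handle_perm_failure_old(struct sim_dev *d, size_t n)
{
	set_state_all(d, n, CHANNEL_PERM_FAILURE);
	report_failure_all(d, n);
}

/* Fixed ordering: notify the drivers first, then set the state. */
static void handle_perm_failure_fixed(struct sim_dev *d, size_t n)
{
	report_failure_all(d, n);
	set_state_all(d, n, CHANNEL_PERM_FAILURE);
}
```

    In this model, the fixed ordering still leaves every device in
    permanent failure afterwards, so later passes continue to treat them as
    non-actionable; only the one-time driver notification is moved ahead of
    the state change.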
    
    Fixes: 38ddc011 ("powerpc/eeh: Make permanently failed devices non-actionable")
    Suggested-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
    Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20230209105649.127707-1-ganeshgr@linux.ibm.com
eeh_driver.c