    cxl: Check periodically the coherent platform function's state · 266eab8f
    Christophe Lombard authored
    In the PowerVM environment, the PHYP CoherentAccel component manages
    the state of the Coherent Accelerator Processor Interface (CAPI)
    adapter, virtualizes CAPI resources, handles CAPP, PSL and PSL Slice
    errors and interrupts, and provides a new set of hcalls that let the
    OS APIs use the Accelerator Function Unit (AFU).
    
    During the course of operation, a coherent platform function can
    encounter errors. Some possible reasons for errors are:
    • Hardware recoverable and unrecoverable errors
    • Transient and over-threshold correctable errors
    
    PHYP implements its own state model for the coherent platform function.
    The state of the AFU is available through an hcall.
    
    The current implementation of the cxl driver, for the PowerVM
    environment, checks the state of the AFU only when an action (open a
    device, ioctl command, memory map, attach/detach a process) is
    requested by an external driver or library (cxlflash, libcxl). If an
    error is detected, the cxl driver handles it according to the Power
    Architecture Platform Requirements document.
    
    But in case of low-level trouble (or error injection), the PHYP
    component may reset the card and change the AFU state. The PHYP
    interface doesn't provide any way to be notified when that happens,
    which implies that the cxl driver:
    • cannot immediately handle the state change of the AFU.
    • cannot notify other drivers (cxlflash, ...)
    
    The purpose of this patch is to wake up the CPU periodically to check
    the current state of each AFU and to see if we need to enter an error
    recovery path.
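
    The periodic check described above can be sketched in plain userspace
    C. This is only an illustration of the polling idea, not the code in
    this patch: the state names, struct afu layout, and the functions
    query_afu_state(), afu_recover(), afu_poll_once() and afu_poll_all()
    are all hypothetical, and the hw_state field merely stands in for the
    value the hcall would report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical AFU states; the real values are defined by the PHYP interface. */
enum afu_state {
    AFU_STATE_NORMAL,
    AFU_STATE_TEMP_UNAVAILABLE,
    AFU_STATE_PERM_UNAVAILABLE,
};

/* Hypothetical per-AFU bookkeeping. */
struct afu {
    enum afu_state hw_state;   /* simulated: what the hypervisor would report */
    enum afu_state last_state; /* state observed at the previous poll */
    int recoveries;            /* number of times recovery was entered */
};

/* Stand-in for the hcall that returns the current AFU state. */
static enum afu_state query_afu_state(const struct afu *afu)
{
    return afu->hw_state;
}

/* Stand-in for the error recovery path. */
static void afu_recover(struct afu *afu)
{
    afu->recoveries++;
}

/* Poll one AFU; return 1 if the recovery path was entered, 0 otherwise. */
static int afu_poll_once(struct afu *afu)
{
    assert(afu != NULL);

    enum afu_state now = query_afu_state(afu);
    int changed = (now != afu->last_state);

    afu->last_state = now;
    if (changed && now != AFU_STATE_NORMAL) {
        afu_recover(afu);
        return 1;
    }
    return 0;
}

/* One periodic tick: poll every AFU and count how many entered recovery. */
static int afu_poll_all(struct afu *afus, size_t n)
{
    int entered = 0;

    for (size_t i = 0; i < n; i++)
        entered += afu_poll_once(&afus[i]);
    return entered;
}
```

    In a driver, afu_poll_all() would be called from some periodic
    mechanism such as a timer or delayed work item. Recovery is entered
    only on a change to a non-normal state, so an AFU that stays in error
    is not recovered again on every tick.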
    Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
    Acked-by: Ian Munsie <imunsie@au1.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
guest.c 27.2 KB