    drm/amdkfd: fix mes set shader debugger process management · bd33bb14
    Jonathan Kim authored
    MES provides the driver with a call to explicitly flush stale process
    memory within the MES to avoid a race condition that results in a fatal
    memory violation.
    
    When SET_SHADER_DEBUGGER is called, the driver passes MES a process
    context address (a GPU memory address) that MES uses to keep track of
    future per-process calls.
    
    Normally, MES will purge its process context list when the last queue
    has been removed.  The driver, however, can call SET_SHADER_DEBUGGER
    regardless of whether a queue has been added or not.
    
    If SET_SHADER_DEBUGGER has been called with no queues as the last call
    prior to process termination, the passed process context address will
    still reside within MES.
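    The retention rule above can be modeled as a small table. This is a
    minimal user-space sketch, assuming MES tracks one entry per process
    context address and purges an entry only when that entry's last queue
    is removed. Every name here (mes_model, mes_set_shader_debugger,
    mes_add_queue, mes_remove_queue) is hypothetical and is not taken from
    the amdgpu driver or the MES firmware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of the MES process context list. */
#define MAX_CTX 8

struct mes_ctx_entry {
	bool in_use;
	uint64_t process_context_addr; /* GPU address passed by the driver */
	unsigned int queue_count;      /* queues currently added for this process */
};

struct mes_model {
	struct mes_ctx_entry ctx[MAX_CTX];
};

/* Return the entry tracking addr, or NULL if MES holds none. */
static struct mes_ctx_entry *mes_find(struct mes_model *mes, uint64_t addr)
{
	for (size_t i = 0; i < MAX_CTX; i++)
		if (mes->ctx[i].in_use && mes->ctx[i].process_context_addr == addr)
			return &mes->ctx[i];
	return NULL;
}

/* Find or create the entry for addr; NULL if the table is full. */
static struct mes_ctx_entry *mes_get(struct mes_model *mes, uint64_t addr)
{
	struct mes_ctx_entry *e = mes_find(mes, addr);

	if (e)
		return e;
	for (size_t i = 0; i < MAX_CTX; i++) {
		if (!mes->ctx[i].in_use) {
			mes->ctx[i] = (struct mes_ctx_entry){ true, addr, 0 };
			return &mes->ctx[i];
		}
	}
	return NULL;
}

/* SET_SHADER_DEBUGGER registers the context even with zero queues. */
static void mes_set_shader_debugger(struct mes_model *mes, uint64_t addr)
{
	(void)mes_get(mes, addr);
}

static void mes_add_queue(struct mes_model *mes, uint64_t addr)
{
	struct mes_ctx_entry *e = mes_get(mes, addr);

	if (!e)
		return; /* table full: ignored in this model */
	e->queue_count++;
}

/* The only implicit purge: removing the last queue drops the entry. */
static void mes_remove_queue(struct mes_model *mes, uint64_t addr)
{
	struct mes_ctx_entry *e = mes_find(mes, addr);

	if (!e || e->queue_count == 0)
		return;
	if (--e->queue_count == 0)
		e->in_use = false;
}
```

    Because mes_set_shader_debugger never increments queue_count, an entry
    created by a debugger call made while no queues exist is never reached
    by the queue-removal purge and survives process termination.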
    
    When a new process later calls SET_SHADER_DEBUGGER, the driver may end
    up passing MES an identical process context address value (based on
    the per-process GPU memory address), even though that address now
    points to a new buffer object allocated during KFD process creation.
    Since the MES is unaware of this, accesses through the passed address
    reach the stale object within MES and trigger a fatal memory violation.
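    A sketch of this failure mode, assuming a single-slot table keyed only
    by address. A hypothetical per-allocation generation number stands in
    for "which buffer object is behind this address", and the bool return
    value stands in for the fatal memory violation. None of these names
    exist in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-slot MES entry, keyed only by GPU address. */
struct mes_entry {
	bool in_use;
	uint64_t addr;
	uint32_t cached_bo_gen; /* BO that was live when MES first saw addr */
};

/* Driver side: every KFD process creation allocates a fresh BO. */
struct kfd_process_model {
	uint64_t ctx_addr; /* may repeat across processes */
	uint32_t bo_gen;   /* unique per allocation */
};

/*
 * Returns true if MES would reuse an entry whose cached BO is no longer
 * the one behind the address (the modeled fatal memory violation).
 */
static bool mes_set_shader_debugger(struct mes_entry *e,
				    const struct kfd_process_model *p)
{
	if (e->in_use && e->addr == p->ctx_addr)
		/* Entry reused as-is: MES cannot tell that the BO changed. */
		return e->cached_bo_gen != p->bo_gen;

	*e = (struct mes_entry){ true, p->ctx_addr, p->bo_gen };
	return false;
}
```

    The first process registers cleanly; a second process that terminates
    nothing in between and receives the same address hits the stale entry.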
    
    The solution is for KFD to explicitly flush the process context address
    from MES on process termination.
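    The same hypothetical single-slot model with the fix applied: an
    explicit flush on process termination drops the entry, so the next
    process that reuses the address registers fresh state. Issuing the
    flush before the context buffer is released is an assumption of this
    sketch, as are all names (mes_flush_process_ctx, kfd_process_terminate).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-slot MES entry, keyed only by GPU address. */
struct mes_entry {
	bool in_use;
	uint64_t addr;
	uint32_t cached_bo_gen;
};

struct kfd_process_model {
	uint64_t ctx_addr;
	uint32_t bo_gen;
};

/* Returns true if MES reuses an entry whose cached BO is stale. */
static bool mes_set_shader_debugger(struct mes_entry *e,
				    const struct kfd_process_model *p)
{
	if (e->in_use && e->addr == p->ctx_addr)
		return e->cached_bo_gen != p->bo_gen;
	*e = (struct mes_entry){ true, p->ctx_addr, p->bo_gen };
	return false;
}

/* Hypothetical explicit flush: MES forgets the context for addr. */
static void mes_flush_process_ctx(struct mes_entry *e, uint64_t addr)
{
	if (e->in_use && e->addr == addr)
		e->in_use = false;
}

/* Hypothetical teardown: flush first, then release the context BO. */
static void kfd_process_terminate(struct mes_entry *e,
				  const struct kfd_process_model *p)
{
	mes_flush_process_ctx(e, p->ctx_addr);
	/* Freeing the process context buffer would follow here. */
}
```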
    
    Note that the flush call and the MES debugger calls use the same MES
    interface but are kept as separate KFD calls so that they do not
    conflict with each other.
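    One way to picture "same MES interface, separate KFD calls" is two
    wrappers that fill the same request with disjoint fields and submit it
    through one path. This is a hypothetical sketch: the request struct,
    its fields (including process_ctx_flush), the submit stub, and both
    wrapper names are illustrative, and the real amdgpu and MES definitions
    may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical SET_SHADER_DEBUGGER request; fields are illustrative. */
struct shader_debugger_input {
	uint64_t process_context_addr;
	uint32_t debug_cntl;    /* stand-in for per-VMID debug settings */
	bool trap_en;
	bool process_ctx_flush; /* true: only drop the cached context */
};

/* Stand-in for the single MES submission path both wrappers share. */
static struct shader_debugger_input last_submitted;

static int mes_submit_shader_debugger_op(const struct shader_debugger_input *in)
{
	last_submitted = *in; /* a real driver would queue this to MES */
	return 0;
}

/* Debugger path: programs debug state and never requests a flush. */
static int kfd_set_shader_debugger(uint64_t ctx_addr, uint32_t debug_cntl,
				   bool trap_en)
{
	struct shader_debugger_input in;

	memset(&in, 0, sizeof(in));
	in.process_context_addr = ctx_addr;
	in.debug_cntl = debug_cntl;
	in.trap_en = trap_en;
	return mes_submit_shader_debugger_op(&in);
}

/* Teardown path: same MES op, carrying only the address and flush bit. */
static int kfd_flush_shader_debugger(uint64_t ctx_addr)
{
	struct shader_debugger_input in;

	memset(&in, 0, sizeof(in));
	in.process_context_addr = ctx_addr;
	in.process_ctx_flush = true;
	return mes_submit_shader_debugger_op(&in);
}
```

    Keeping the flush in its own wrapper means a debugger update never sets
    the flush bit by accident and a flush never carries debug settings with
    it; this is one plausible reading of "avoid conflicting", not a
    statement of the driver's actual rationale.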

    Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
    Tested-by: Alice Wong <shiwei.wong@amd.com>
    Reviewed-by: Eric Huang <jinhuieric.huang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>