• Lancelot SIX's avatar
    drm/amdkfd: Flush the process wq before creating a kfd_process · f5b90533
    Lancelot SIX authored
    There is a race condition when re-creating a kfd_process for a process.
    This has been observed when a process under the debugger executes
    exec(3).  In this scenario:
    - The process executes exec.
     - This will eventually release the process's mm, which will cause the
       kfd_process object associated with the process to be freed
       (kfd_process_free_notifier decrements the reference count to the
       kfd_process to 0).  This causes kfd_process_ref_release to enqueue
       kfd_process_wq_release to the kfd_process_wq.
    - The debugger receives the PTRACE_EVENT_EXEC notification, and tries to
      re-enable AMDGPU traps (KFD_IOC_DBG_TRAP_ENABLE).
     - When handling this request, KFD tries to re-create a kfd_process.
       This eventually calls kfd_create_process and kobject_init_and_add.
    
    At this point the call to kobject_init_and_add can fail because the
    old kfd_process.kobj has not been freed yet by kfd_process_wq_release.
    
    This patch proposes to avoid this race by making sure to drain
    kfd_process_wq before creating a new kfd_process object.  This way, we
    know that any cleanup task is done executing when we reach
    kobject_init_and_add.
    Signed-off-by: default avatarLancelot SIX <lancelot.six@amd.com>
    Reviewed-by: default avatarFelix Kuehling <felix.kuehling@amd.com>
    Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
    f5b90533
kfd_process.c 57.3 KB