    drm/amd/sched: fix deadlock caused by unsignaled fences of deleted jobs · 79867462
    Nicolai Hähnle authored
    Highly concurrent Piglit runs can trigger a race condition where a pending
    SDMA job on a buffer object is never executed because the corresponding
    process is killed (perhaps due to a crash). Since the job's fences were
    never signaled, the buffer object was effectively leaked. Worse, the
    buffer was stuck wherever it happened to be at the time, possibly in VRAM.
    
    The symptom was user space processes stuck in interruptible waits with
    kernel stacks like:
    
        [<ffffffffbc5e6722>] dma_fence_default_wait+0x112/0x250
        [<ffffffffbc5e6399>] dma_fence_wait_timeout+0x39/0xf0
        [<ffffffffbc5e82d2>] reservation_object_wait_timeout_rcu+0x1c2/0x300
        [<ffffffffc03ce56f>] ttm_bo_cleanup_refs_and_unlock+0xff/0x1a0 [ttm]
        [<ffffffffc03cf1ea>] ttm_mem_evict_first+0xba/0x1a0 [ttm]
        [<ffffffffc03cf611>] ttm_bo_mem_space+0x341/0x4c0 [ttm]
        [<ffffffffc03cfc54>] ttm_bo_validate+0xd4/0x150 [ttm]
        [<ffffffffc03cffbd>] ttm_bo_init_reserved+0x2ed/0x420 [ttm]
        [<ffffffffc042f523>] amdgpu_bo_create_restricted+0x1f3/0x470 [amdgpu]
        [<ffffffffc042f9fa>] amdgpu_bo_create+0xda/0x220 [amdgpu]
        [<ffffffffc04349ea>] amdgpu_gem_object_create+0xaa/0x140 [amdgpu]
        [<ffffffffc0434f97>] amdgpu_gem_create_ioctl+0x97/0x120 [amdgpu]
        [<ffffffffc037ddba>] drm_ioctl+0x1fa/0x480 [drm]
        [<ffffffffc041904f>] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
        [<ffffffffbc23db33>] do_vfs_ioctl+0xa3/0x5f0
        [<ffffffffbc23e0f9>] SyS_ioctl+0x79/0x90
        [<ffffffffbc864ffb>] entry_SYSCALL_64_fastpath+0x1e/0xad
        [<ffffffffffffffff>] 0xffffffffffffffff
    
    Note: The correctness of this change depends on the earlier commit
    "drm/amd/sched: move adding finish callback to amd_sched_job_begin".
    
    v2: set an error on the finished fence
    Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
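
The idea described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the code from the patch: `model_fence`, `model_job`, `model_entity` and the `model_*` functions are hypothetical stand-ins for the kernel's fence and scheduler types, and `-ESRCH` is assumed here as the error recorded on the finished fence. The point it shows is that tearing down an entity drains its queue and signals every pending job's fences, so that nothing waiting on them can block forever.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dma_fence: only the state relevant here. */
struct model_fence {
    bool signaled;
    int error; /* 0 on success, negative errno on failure */
};

/* Hypothetical stand-in for a scheduler job and its two fences. */
struct model_job {
    struct model_fence scheduled;
    struct model_fence finished;
    struct model_job *next;
};

/* Hypothetical per-process queue of jobs not yet handed to the hardware. */
struct model_entity {
    struct model_job *head;
};

/* Record the error first, then mark the fence signaled (at most once). */
static void model_fence_signal(struct model_fence *f, int error)
{
    assert(error <= 0);
    if (f->signaled)
        return;
    f->error = error;
    f->signaled = true;
}

/* Append a job to the tail of the entity's queue. */
static void model_entity_push(struct model_entity *e, struct model_job *job)
{
    struct model_job **p = &e->head;

    while (*p)
        p = &(*p)->next;
    job->next = NULL;
    *p = job;
}

/*
 * Tear down an entity whose owner has gone away. Jobs still queued will
 * never run, so their fences are signaled here instead of being left
 * pending. The finished fence carries an error (assumed: -ESRCH) so that
 * waiters can tell the job was cancelled rather than executed.
 * Returns the number of jobs that were cancelled.
 */
static size_t model_entity_fini(struct model_entity *e)
{
    size_t cancelled = 0;
    struct model_job *job;

    while ((job = e->head) != NULL) {
        e->head = job->next;
        job->next = NULL;
        model_fence_signal(&job->scheduled, 0);
        model_fence_signal(&job->finished, -ESRCH);
        cancelled++;
    }
    return cancelled;
}
```

In this model, a waiter that polls `finished.signaled` on a buffer's last job always makes progress after `model_entity_fini()` returns, which corresponds to the eviction path in the stack trace above no longer waiting indefinitely.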
gpu_scheduler.c 18 KB