• Vladimir Lypak's avatar
    drm/msm/a5xx: fix races in preemption evaluation stage · ce050f30
    Vladimir Lypak authored
    On A5XX GPUs when preemption is used it's invietable to enter a soft
    lock-up state in which GPU is stuck at empty ring-buffer doing nothing.
    This appears as full UI lockup and not detected as GPU hang (because
    it's not). This happens due to not triggering preemption when it was
    needed. Sometimes this state can be recovered by some new submit but
    generally it won't happen because applications are waiting for old
    submits to retire.
    
    One of the reasons why this happens is a race between a5xx_submit and
    a5xx_preempt_trigger called from IRQ during submit retire. Former thread
    updates ring->cur of previously empty and not current ring right after
    latter checks it for emptiness. Then both threads can just exit because
    for first one preempt_state wasn't NONE yet and for second one all rings
    appeared to be empty.
    
    To prevent such situations from happening we need to establish guarantee
    for preempt_trigger to make decision after each submit or retire. To
    implement this we serialize preemption initiation using spinlock. If
    switch is already in progress we need to re-trigger preemption when it
    finishes.
    
    Fixes: b1fc2839 ("drm/msm: Implement preemption for A5XX targets")
    Signed-off-by: default avatarVladimir Lypak <vladimir.lypak@gmail.com>
    Patchwork: https://patchwork.freedesktop.org/patch/612045/Signed-off-by: default avatarRob Clark <robdclark@chromium.org>
    ce050f30
a5xx_preempt.c 8.92 KB