    amd/scheduler: implement job skip feature (v3) · 48f05f29
    Monk Liu authored
    Jobs are skipped in two cases:
    1) When the entity behind a job is marked guilty, the job
    popped from this entity's queue is dropped in the sched_main loop.
    
    2) In job_recovery(), skip scheduling a job if its karma is
    above the limit, and likewise skip the other jobs sharing the
    same fence context. This approach is used because job_recovery()
    cannot access job->entity, since the entity may already be dead.
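
A minimal sketch of case 2, assuming a flat array of jobs in submission order. The struct `sketch_rjob`, its fields, and `sketch_recovery_mark()` are hypothetical names used only for illustration and are not the scheduler's real types. It shows how a guilty job can be identified by karma alone and how later jobs are matched by fence context, without touching the entity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a scheduler job. */
struct sketch_rjob {
    uint64_t fence_context; /* context shared by jobs from one entity */
    int karma;              /* how often this job was blamed for a hang */
    bool skip;              /* set when the job must not be rescheduled */
};

/*
 * Walk the jobs in order. The first job whose karma exceeds the limit
 * defines the guilty fence context; that job and every later job with
 * the same context are marked to be skipped.
 */
static void sketch_recovery_mark(struct sketch_rjob *jobs, size_t n,
                                 int karma_limit)
{
    bool found_guilty = false;
    uint64_t guilty_context = 0;

    for (size_t i = 0; i < n; i++) {
        if (!found_guilty && jobs[i].karma > karma_limit) {
            found_guilty = true;
            guilty_context = jobs[i].fence_context;
        }
        if (found_guilty && jobs[i].fence_context == guilty_context)
            jobs[i].skip = true;
    }
}
```

In this sketch, jobs queued before the guilty one are left alone even if they share its context; whether the real code behaves the same way is an assumption.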
    
    v2:
    some logic fixes
    
    v3:
    When an entity is detected as guilty, don't drop the job at the
    popping stage; instead set its fence error to -ECANCELED.
    
    In run_job(), skip scheduling if either 1) fence->error < 0,
    or 2) a VRAM loss occurred for this job.
    This way we can unify the job skipping logic.
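
The v3 flow can be sketched roughly as follows. All names here (`sketch_fence`, `sketch_job`, `sketch_mark_if_guilty()`, `sketch_should_skip()`) are hypothetical, and detecting VRAM loss by comparing a counter sampled at submission against the current value is an assumption made for the example, not something the commit message states.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the real scheduler structures. */
struct sketch_fence {
    int error;                      /* 0 while healthy, negative errno otherwise */
};

struct sketch_job {
    struct sketch_fence finished;
    unsigned int vram_lost_counter; /* assumed: value sampled at submission */
};

/* Popping stage: the job is kept, but flagged if its entity is guilty. */
static void sketch_mark_if_guilty(struct sketch_job *job, bool entity_guilty)
{
    if (entity_guilty)
        job->finished.error = -ECANCELED;
}

/* run_job() stage: the single place that decides whether to skip. */
static bool sketch_should_skip(const struct sketch_job *job,
                               unsigned int current_vram_lost_counter)
{
    if (job->finished.error < 0)
        return true;                /* cancelled earlier, e.g. guilty entity */
    if (job->vram_lost_counter != current_vram_lost_counter)
        return true;                /* VRAM was lost after submission */
    return false;
}
```

Because the guilty-entity path only records an error on the fence, both skip reasons end up being checked in one function, which is the unification the v3 notes describe.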
    
    With this feature we can introduce a new GPU recovery feature.
    Signed-off-by: Monk Liu <Monk.Liu@amd.com>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_job.c 5.96 KB