    habanalabs: soft-reset device if context-switch fails · af5f7eea
    Oded Gabbay authored
    This patch fixes a bug in the driver: if the TPC or MME remains in a
    non-IDLE state even after all command submissions are done (due to a user
    bug or a malicious user), future command submissions fail in the
    context-switch stage and the driver remains in a "stuck" mode.
    
    The fix is to do a soft-reset of the device in case the context-switch
    fails, because the device should be IDLE during context-switch. If it is
    not IDLE, then something is wrong and we should reset the compute engines.
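
The recovery path described above can be sketched in user-space C. This is a minimal mock under stated assumptions, not the driver's actual code: `struct mock_device`, `mock_context_switch()`, `mock_soft_reset()` and `mock_submit()` are hypothetical names invented for illustration, and the "reset" simply marks both engines idle again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for device state; not the real driver structure. */
struct mock_device {
	bool tpc_idle;            /* TPC engine is idle */
	bool mme_idle;            /* MME engine is idle */
	unsigned int soft_resets; /* number of soft-resets performed */
};

/* The context switch requires both engines to be idle; fail otherwise. */
static int mock_context_switch(const struct mock_device *dev)
{
	if (!dev->tpc_idle || !dev->mme_idle)
		return -1;
	return 0;
}

/* Assumed effect of a soft-reset: the compute engines return to idle. */
static void mock_soft_reset(struct mock_device *dev)
{
	dev->tpc_idle = true;
	dev->mme_idle = true;
	dev->soft_resets++;
}

/*
 * Submit a command. If the context switch fails, soft-reset the device so
 * that later submissions are not stuck, and report the failure to the caller.
 */
static int mock_submit(struct mock_device *dev)
{
	int rc;

	assert(dev != NULL);

	rc = mock_context_switch(dev);
	if (rc) {
		fprintf(stderr, "context switch failed (%d), soft-resetting\n", rc);
		mock_soft_reset(dev);
		return rc;
	}
	return 0;
}
```

In this sketch the failing submission is still rejected, but the reset leaves the mock device idle, so the next call to `mock_submit()` can succeed instead of failing indefinitely.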
    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
command_submission.c 17.8 KB