    habanalabs: Fix reset upon device release bug · a78b07dc
    farah kassabri authored
    If the user application is interrupted while some command submissions
    (CSs) are still in flight, or while the driver is in the middle of
    handling their completion, the last reference to the kernel private
    data of the user process is not dropped in the fd close flow but in
    the CS completion workqueue context.
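    The refcount argument above can be made concrete with a minimal
    userspace sketch. It assumes a C11 atomic counter standing in for the
    kernel's kref, and the names (hpriv_model, hpriv_put, put_site) are
    hypothetical, not the driver's API. Whichever path drops the last
    reference runs the release, so closing the fd while a CS still holds
    a reference moves the release into the completion path.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model: records which code path drops the last reference. */
enum put_site { PUT_NONE, PUT_FD_CLOSE, PUT_CS_COMPLETION };

struct hpriv_model {
	atomic_int refcount;          /* one ref for the open fd + one per in-flight CS */
	enum put_site released_from;  /* where the release ran, if it ran */
};

static void hpriv_init(struct hpriv_model *p)
{
	atomic_init(&p->refcount, 1); /* reference owned by the open fd */
	p->released_from = PUT_NONE;
}

static void hpriv_get(struct hpriv_model *p)
{
	atomic_fetch_add(&p->refcount, 1);
}

/* Drops one reference; the caller that drops the last one runs the release. */
static void hpriv_put(struct hpriv_model *p, enum put_site site)
{
	if (atomic_fetch_sub(&p->refcount, 1) == 1)
		p->released_from = site;
}

/* The process is interrupted and its fd closed while a CS is in flight. */
static enum put_site interrupted_while_cs_in_flight(void)
{
	struct hpriv_model p;

	hpriv_init(&p);
	hpriv_get(&p);                    /* CS submitted: holds a reference */
	hpriv_put(&p, PUT_FD_CLOSE);      /* fd closed: not the last reference */
	hpriv_put(&p, PUT_CS_COMPLETION); /* completion work drops the last one */
	return p.released_from;
}

/* The CS completes before the fd is closed: the usual case. */
static enum put_site closed_after_cs_completed(void)
{
	struct hpriv_model p;

	hpriv_init(&p);
	hpriv_get(&p);
	hpriv_put(&p, PUT_CS_COMPLETION);
	hpriv_put(&p, PUT_FD_CLOSE);
	return p.released_from;
}
```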
    
    This means that the reset-upon-device-release flow is called from that
    context. During the reset flow, the driver flushes all the CS
    workqueues to ensure that any scheduled work has run to completion.
    Because we are running from the completion context itself, the flush
    waits for the very work item that is executing it, and we deadlock.
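    The self-wait described above can be reproduced in userspace with a
    hypothetical one-worker queue built on POSIX threads (wq_model,
    wq_flush_timed and completion_work are illustrative names, not kernel
    functions). A flush waits until no item is running; when the running
    item issues the flush, that condition cannot become true before the
    item returns. The sketch uses a timed wait so it reports ETIMEDOUT
    instead of hanging; an untimed flush, like flush_workqueue(), would
    block forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical one-worker queue: "busy" is true while an item executes. */
struct wq_model {
	pthread_mutex_t lock;
	pthread_cond_t idle;  /* broadcast when the running item finishes */
	bool busy;
	int flush_result;     /* what the flush issued inside the item returned */
};

/* Wait until no item is running, for at most timeout_ms milliseconds.
 * Returns 0 once the queue is idle, or ETIMEDOUT. */
static int wq_flush_timed(struct wq_model *wq, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&wq->lock);
	while (wq->busy && rc != ETIMEDOUT)
		rc = pthread_cond_timedwait(&wq->idle, &wq->lock, &deadline);
	rc = wq->busy ? ETIMEDOUT : 0;
	pthread_mutex_unlock(&wq->lock);
	return rc;
}

/* The CS completion item: it drops the last reference, which triggers the
 * reset flow, which flushes the queue this item is running on. */
static void *completion_work(void *arg)
{
	struct wq_model *wq = arg;

	pthread_mutex_lock(&wq->lock);
	wq->busy = true;
	pthread_mutex_unlock(&wq->lock);

	/* busy cannot become false before this function returns, so the flush
	 * can only time out; an untimed flush would never return. */
	wq->flush_result = wq_flush_timed(wq, 100);

	pthread_mutex_lock(&wq->lock);
	wq->busy = false;
	pthread_cond_broadcast(&wq->idle);
	pthread_mutex_unlock(&wq->lock);
	return NULL;
}

/* Runs one completion item on a worker thread and returns what its
 * in-item flush reported. */
static int run_flush_from_worker(void)
{
	struct wq_model wq = { .busy = false, .flush_result = 0 };
	pthread_t worker;

	pthread_mutex_init(&wq.lock, NULL);
	pthread_cond_init(&wq.idle, NULL);
	pthread_create(&worker, NULL, completion_work, &wq);
	pthread_join(worker, NULL);
	pthread_cond_destroy(&wq.idle);
	pthread_mutex_destroy(&wq.lock);
	return wq.flush_result;
}
```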
    
    Therefore, skip flushing the workqueues in those cases. This is safe
    because the user cannot release the device unless the workqueues are
    already empty.
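    The fix can be sketched by passing a flag down the release path. The
    flag name skip_wq_flush and the helpers below are hypothetical, and
    the model is single-threaded: it only counts whether a flush would
    have been issued. The flag is set when the last reference is dropped
    from the CS completion work, so that path never waits on itself,
    while the ordinary fd close path still flushes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of the fixed release path. */
enum put_site { PUT_FD_CLOSE, PUT_CS_COMPLETION };

struct priv_model {
	int refcount;    /* one ref for the fd + one per in-flight CS */
	int flushes;     /* number of workqueue flushes issued */
	bool reset_done; /* set once reset-upon-release has run */
};

static void reset_upon_release(struct priv_model *p, bool skip_wq_flush)
{
	/* From the completion context the flush would wait on the very work
	 * item running this code, so it is skipped there. */
	if (!skip_wq_flush)
		p->flushes++; /* stands in for flushing all CS workqueues */
	p->reset_done = true;
}

static void priv_put(struct priv_model *p, enum put_site site)
{
	/* Only the caller knows whether it runs inside the CS workqueue. */
	if (--p->refcount == 0)
		reset_upon_release(p, site == PUT_CS_COMPLETION);
}

/* fd closed while a CS is in flight: the completion work drops the last ref. */
static struct priv_model release_with_cs_in_flight(void)
{
	struct priv_model p = { .refcount = 2, .flushes = 0, .reset_done = false };

	priv_put(&p, PUT_FD_CLOSE);
	priv_put(&p, PUT_CS_COMPLETION);
	return p;
}

/* CS completed before close: the fd close path drops the last ref. */
static struct priv_model release_after_cs_completed(void)
{
	struct priv_model p = { .refcount = 2, .flushes = 0, .reset_done = false };

	priv_put(&p, PUT_CS_COMPLETION);
	priv_put(&p, PUT_FD_CLOSE);
	return p;
}
```

    Deciding the flag at the put site keeps the reset code unaware of who
    called it; only the caller knows whether it is running inside the
    workqueue that would otherwise be flushed.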

    Signed-off-by: farah kassabri <fkassabri@habana.ai>
    Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
    Signed-off-by: Oded Gabbay <ogabbay@kernel.org>