    habanalabs: define soft-reset as inference op · a00f1f57
    Oded Gabbay authored
    Soft-reset is the procedure where we reset only the compute/DMA engines
    of the device, without requiring the current user-space process to
    release the device.
    
    This type of reset is triggered either by a TDR event (a workload got
    stuck) or by a root request through sysfs.
    
    This is relevant only for inference ASICs. There is no real-world
    use case for it in training, because training runs on multiple
    devices.
    
    In addition, we also do (in certain ASICs) a reset upon device release.
    That reset uses the same code as the soft-reset.
    
    Therefore, to differentiate between the two resets, rename the
    soft-reset support to "inference soft-reset", which makes the code
    more self-explanatory.
    
    Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
sysfs.c 10.4 KB