habanalabs/gaudi2: reset device upon critical ECC event
Correctable ECC events are not fatal, but as they accumulate, the f/w can decide that a hard-rest is required. This indication is propagated to the host using the existing ECC event interface. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Showing
Please register or sign in to comment