    habanalabs: handle race in driver fini · c3780338
    Dani Liberman authored
    Scenario:
    
    1. During hard reset, the driver executes device_kill_open_processes.
    2. The driver's file descriptor is not closed yet (the user process is
       still alive), so the driver starts looping over all open file
       descriptors.
    3. Just before the driver gets the task struct of the user process by
       its pid, SIGKILL is sent to the user process. As a result,
       get_pid_task fails, the driver prints a warning, and
       device_kill_open_processes returns an error.
    4. The returned error causes driver fini to disable the device object
       of the process, which causes a kernel crash.
    
    The fix is to not treat this case as an error and to continue the fini
    flow as normal, since the process killed by the external SIGKILL will
    release its resources just as it would if the driver had sent it the
    SIGKILL.
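
    The control flow described above can be sketched in user-space C. This
    is only an illustration under stated assumptions, not the driver code:
    struct task, struct open_fd, mock_get_pid_task and kill_open_processes
    are hypothetical stand-ins, and setting a flag stands in for sending
    SIGKILL.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a kernel task struct. */
struct task {
    int pid;
    int killed;
};

/* Hypothetical stand-in for one open file descriptor of the driver. */
struct open_fd {
    int pid;
    struct task *task; /* NULL once the owning process is already gone */
};

/* Mock of get_pid_task(): returns NULL if the task can no longer be found. */
static struct task *mock_get_pid_task(const struct open_fd *fd)
{
    return fd->task;
}

/*
 * Loop over all open file descriptors and kill their owners.
 * A missing task is NOT an error: the process was already killed by
 * someone else and will release its resources on its own.
 */
static int kill_open_processes(struct open_fd *fds, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct task *t = mock_get_pid_task(&fds[i]);

        if (t) {
            t->killed = 1; /* stands in for sending SIGKILL */
        } else {
            /* Log and continue instead of returning an error. */
            fprintf(stderr, "pid %d: task not found, assuming it is exiting\n",
                    fds[i].pid);
        }
    }
    return 0;
}
```

    With this shape, a caller such as the fini flow sees success even when
    one of the processes disappeared between the fd lookup and the task
    lookup, so it proceeds with normal teardown.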
    
    Signed-off-by: Dani Liberman <dliberman@habana.ai>
    Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
    Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
device.c 51.1 KB