    habanalabs: all FD must be closed before removing device · f1d84fe4
    Oded Gabbay authored
    [ Upstream commit caa3c8e5 ]
    
    This patch fixes a bug in the implementation of the function that removes
    the device.
    
    The bug can happen when the device is removed but not the driver itself
    (e.g. remove by the OS due to PCI freeze in Power architecture).
    
    In that case, there may be open users that are calling IOCTLs while the
    device is removed. This is a possible race condition that the driver must
    handle. Otherwise, a kernel panic may occur.
    
    This race is prevented in the hard-reset flow, because the driver makes
    sure the users are closed before continuing with the hard-reset. This
    race cannot occur when the driver itself is removed because the OS makes
    sure all the file descriptors are closed.
    
    The fix is to make sure the open users close their file descriptors. If
    they don't (after a certain amount of time), the driver sends them a
    SIGKILL, because the removal of the device can't be stopped.
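
    The wait-then-kill policy described above can be sketched as a small
    user-space simulation. This is not the driver's code: struct sim_user,
    count_open(), sim_tick() and sim_flush_users() are hypothetical names
    made up for illustration, the polling budget is an arbitrary parameter,
    and "killing" a user only sets a flag where the real driver would send
    SIGKILL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one process holding an open FD on the device. */
struct sim_user {
    bool open;         /* still holds the file descriptor */
    int  closes_after; /* polls until it closes on its own; <= 0 means never */
    bool killed;       /* set where the real driver would send SIGKILL */
};

/* Count the users that still hold the device open. */
static size_t count_open(const struct sim_user *u, size_t n)
{
    size_t open = 0;

    for (size_t i = 0; i < n; i++)
        if (u[i].open)
            open++;
    return open;
}

/* Simulate one polling interval: cooperative users count down and close. */
static void sim_tick(struct sim_user *u, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (u[i].open && u[i].closes_after > 0 && --u[i].closes_after == 0)
            u[i].open = false;
}

/*
 * Wait up to max_polls intervals for all users to close voluntarily, then
 * force out whoever is left, because device removal cannot be postponed.
 * Returns the number of users that had to be killed.
 */
static size_t sim_flush_users(struct sim_user *u, size_t n, int max_polls)
{
    size_t killed = 0;

    for (int poll = 0; poll < max_polls && count_open(u, n) > 0; poll++)
        sim_tick(u, n);

    for (size_t i = 0; i < n; i++) {
        if (u[i].open) {
            u[i].killed = true; /* placeholder for the SIGKILL step */
            u[i].open = false;
            killed++;
        }
    }
    return killed;
}
```

    With three users that need 1 poll, 3 polls, and never close, and a
    budget of 2 polls, the first user closes on its own and the other two
    are killed. In every case no user is left open when the function
    returns, which is the property the removal path depends on.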
    
    The patch reuses the same code that is called from the hard-reset flow.

    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
device.c 28.5 KB