    habanalabs: fix race when waiting on encaps signal · b32cd104
    Dani Liberman authored
    Scenario:
    1. A CS that is part of an encaps signal has completed and is now
    executing kref_put on its encaps signal handle. The handle's
    refcount drops to 0, and the encaps signal handle release function,
    hl_encaps_handle_do_release, is called.
    
    2. At this point the user starts waiting on the signal, finds the
    encaps signal handle in the handles list and increments the handle
    refcount to 1.
    
    3. Immediately afterwards, hl_encaps_handle_do_release removes the
    handle from the list and frees its memory.
    
    4. The wait function uses the handle even though it has been freed.
    
    This scenario caused the poison pattern of the slab memory previously
    allocated for the handle to be overwritten, which triggered a kernel
    bug the next time the OS allocated from this slab.
    
    Fixed by taking a reference on the handle only if its refcount is
    not zero.
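    A minimal userspace sketch of the "take a reference only if the
    count is non-zero" rule, using C11 atomics in place of the kernel's
    struct kref. The names struct ref, struct handle, ref_init,
    ref_get_unless_zero, ref_put and handle_release are hypothetical and
    used for illustration only; the kernel primitive with these
    semantics is kref_get_unless_zero(). As with that primitive, the
    check is only safe if the waiter and the release path serialize on
    the lock protecting the handle list, so the handle's memory is still
    valid while its count is examined.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kref: an atomic reference count. */
struct ref {
    atomic_int count;
};

/* Hypothetical stand-in for the encaps signal handle. */
struct handle {
    struct ref ref;   /* must stay the first member (see handle_release) */
    int released;     /* set by release, for illustration only */
};

static void ref_init(struct ref *r)
{
    atomic_init(&r->count, 1);
}

/*
 * Increment the count only if it is currently non-zero. Returns true if
 * a reference was taken, false if the object is already being released.
 */
static bool ref_get_unless_zero(struct ref *r)
{
    int old = atomic_load(&r->count);

    while (old != 0) {
        /* On failure, old is reloaded with the current count. */
        if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; call release() when the count reaches zero. */
static bool ref_put(struct ref *r, void (*release)(struct ref *))
{
    int old = atomic_fetch_sub(&r->count, 1);

    assert(old > 0); /* an underflow would indicate a refcount bug */
    if (old == 1) {
        release(r);
        return true;
    }
    return false;
}

static void handle_release(struct ref *r)
{
    /* Valid cast: ref is the first member of struct handle. */
    struct handle *h = (struct handle *)r;

    /* A real release would unlink the handle from the list and free it. */
    h->released = 1;
}
```

    A waiter that finds the handle in the list after its count has
    already reached 0 gets false back and must treat the signal as gone
    rather than use the handle, which closes the window described in
    steps 2 to 4.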
    Signed-off-by: Dani Liberman <dliberman@habana.ai>
    Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
    Signed-off-by: Oded Gabbay <ogabbay@kernel.org>