    IB/mlx4: Don't allow userspace open while recovering from catastrophic error · 3b4a8cd5
    Jack Morgenstein authored
    Userspace apps are supposed to release all ib device resources if they
    receive a fatal async event (IBV_EVENT_DEVICE_FATAL).  However, the
    app has no way of knowing when the device has come back up, except to
    repeatedly attempt ibv_open_device() until it succeeds.
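
    The retry pattern described above can be sketched as a bounded polling
    loop. This is only an illustration: open_with_retry(), the open_fn
    callback type, and the fake_open() stub are hypothetical names that are
    not part of libibverbs. In a real application the callback would wrap
    the ibv_get_device_list() / ibv_open_device() sequence and return NULL
    on failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: returns a non-NULL handle once the device opens.
 * A real application would wrap ibv_open_device() here (assumption). */
typedef void *(*open_fn)(void *arg);

/* Try to open the device up to max_attempts times, sleeping delay_ms
 * milliseconds between failed attempts. Returns NULL if every try fails. */
void *open_with_retry(open_fn try_open, void *arg,
                      unsigned max_attempts, unsigned delay_ms)
{
    assert(try_open != NULL);

    for (unsigned i = 0; i < max_attempts; i++) {
        void *handle = try_open(arg);
        if (handle != NULL)
            return handle;

        /* Sleep only if another attempt follows. */
        if (delay_ms != 0 && i + 1 < max_attempts) {
            struct timespec ts = {
                .tv_sec = delay_ms / 1000,
                .tv_nsec = (long)(delay_ms % 1000) * 1000000L,
            };
            nanosleep(&ts, NULL);
        }
    }
    return NULL;
}

/* Stand-in for a device that is still recovering: fails until the
 * counter pointed to by arg reaches zero, then "opens" successfully. */
static int dummy_device;

void *fake_open(void *arg)
{
    int *failures_left = arg;

    if (*failures_left > 0) {
        (*failures_left)--;
        return NULL;
    }
    return &dummy_device;
}
```

    Bounding the number of attempts is a design choice of this sketch; the
    commit message itself only says that the app keeps retrying until the
    open succeeds.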
    
    However, nothing currently prevents the open from succeeding while
    the device is still being removed following the fatal event.  In that
    case the removal stalls partway through, waiting for the new app to
    release its resources.  The new app never does so, because its open
    succeeded after the fatal event was generated, so it never receives
    that event.
    
    This patch adds an "active" flag to the device.  In the fatal event
    flow, the flag is set to false before the fatal event is generated,
    so any subsequent ibv_open_device() call on the device fails until
    the device comes back up, which prevents the deadlock described
    above.
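
    The ordering that matters here (clear the flag first, raise the event
    second, refuse opens while the flag is clear) can be modeled in plain
    C.  This is a simplified userspace sketch, not the driver code: struct
    sketch_dev, the sketch_*() functions, and the choice of -EAGAIN as the
    error code are all assumptions made for illustration, and a real
    driver would also need proper synchronization around the flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device with an "active" flag. */
struct sketch_dev {
    bool active;            /* false while recovering from a fatal error */
    int  open_count;        /* number of userspace contexts still open   */
    bool fatal_event_sent;  /* stands in for dispatching the fatal event */
};

/* Open path: refuse new users while the device is not active. */
int sketch_open(struct sketch_dev *dev)
{
    assert(dev != NULL);
    if (!dev->active)
        return -EAGAIN;     /* error code is an assumption of this sketch */
    dev->open_count++;
    return 0;
}

/* Close path: an existing user releases its resources. */
void sketch_close(struct sketch_dev *dev)
{
    assert(dev != NULL);
    if (dev->open_count > 0)
        dev->open_count--;
}

/* Fatal event flow: the flag is cleared BEFORE the event is raised, so
 * no open can succeed after existing users have been told to leave. */
void sketch_fatal_error(struct sketch_dev *dev)
{
    assert(dev != NULL);
    dev->active = false;
    dev->fatal_event_sent = true;
}

/* Removal can complete only once every user has released the device. */
bool sketch_can_finish_removal(const struct sketch_dev *dev)
{
    assert(dev != NULL);
    return dev->open_count == 0;
}

/* Device has come back up: opens are allowed again. */
void sketch_recovered(struct sketch_dev *dev)
{
    assert(dev != NULL);
    dev->fatal_event_sent = false;
    dev->active = true;
}
```

    With this ordering, the only users that can delay removal are those
    that opened the device before the fatal event, and those are exactly
    the users that receive the event and are expected to close.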

    Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
    Signed-off-by: Roland Dreier <rolandd@cisco.com>
main.c 22.2 KB