• Parav Pandit's avatar
    net/mlx5: E-switch, Protect eswitch mode changes · 8e0aa4bc
    Parav Pandit authored
    Currently eswitch mode change is occurring from 2 different execution
    contexts as below.
    1. sriov sysfs enable/disable
    2. devlink eswitch set commands
    
    Both of them need to access eswitch related data structures in
    synchronized manner.
    Without any synchronization below race condition exist.
    
    SR-IOV enable/disable with devlink eswitch mode change:
    
              cpu-0                         cpu-1
              -----                         -----
    mlx5_device_disable_sriov()        mlx5_devlink_eswitch_mode_set()
      mlx5_eswitch_disable()             esw_offloads_stop()
        esw_offloads_disable()             mlx5_eswitch_disable()
                                             esw_offloads_disable()
    
    Hence, they are synchronized using a new mode_lock.
    eswitch's state_lock is not used as it can lead to a deadlock scenario
    below and state_lock is only for vport and fdb exclusive access.
    
    ip link set vf <param>
      netlink rcv_msg() - Lock A
        rtnl_lock
        vfinfo()
          esw->state_lock() - Lock B
    devlink eswitch_set
       devlink_mutex
         esw->state_lock() - Lock B
           attach_netdev()
             register_netdev()
               rtnl_lock - Lock A
    
    Alternatives considered:
    1. Acquiring rtnl lock before taking esw->state_lock to follow similar
    locking sequence as ip link flow during eswitch mode set.
    rtnl lock is not good idea for two reasons.
    (a) Holding rtnl lock for several hundred device commands is not good
        idea.
    (b) It leads to below and more similar deadlocks.
    
    devlink eswitch_set
       devlink_mutex
         rtnl_lock - Lock A
           esw->state_lock() - Lock B
             eswitch_disable()
               reload()
                 ib_register_device()
                   ib_cache_setup_one()
                     rtnl_lock()
    
    2. Exporting devlink lock may lead to undesired use of it in vendor
    driver(s) in future.
    
    3. Unloading representors outside of the mode_lock requires
    serialization with other process trying to enable the eswitch.
    
    4. Differing the representors life cycle to a different workqueue
    requires synchronization with func_change_handler workqueue.
    Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
    Reviewed-by: default avatarBodong Wang <bodong@mellanox.com>
    Reviewed-by: default avatarMark Bloch <markb@mellanox.com>
    Signed-off-by: default avatarParav Pandit <parav@mellanox.com>
    Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
    8e0aa4bc
eswitch_offloads.c 77.7 KB