    net/mlx4_core: mlx4_init_slave() shouldn't access comm channel before PF is ready · 97989356
    Amir Vadai authored
    Currently, the PF call to pci_enable_sriov from the PF probe function
    stalls for 10 seconds times the number of VFs probed on the host. This
    happens because the only way for such VFs to determine whether the PF
    initialization has finished is to attempt a reset on the comm channel
    and wait for it to time out (after 10s).
    
    The PF probe function is called from a kernel workqueue, and therefore
    during that time the rcu lock is held and the kernel's workqueue is
    stalled. This blocks other processes that try to use the workqueue
    or the rcu lock. For example, interface renaming, which calls
    synchronize_rcu, is blocked and timed out by systemd.
    
    Change mlx4_init_slave() to allow a VF probed on the host to immediately
    detect that the PF is not ready, and return EPROBE_DEFER instantly.
    
    Only when the PF finishes the initialization, allow such VFs to
    access the comm channel.
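
    The gating described above can be sketched roughly as follows. This is
    a userspace illustration only, not the code in main.c: C11 atomics stand
    in for the kernel's atomic helpers, and every name ending in _sketch is
    hypothetical. EPROBE_DEFER is defined locally because userspace headers
    do not provide it.

```c
#include <assert.h>
#include <stdatomic.h>

/* EPROBE_DEFER is kernel-internal; 517 is its value in the kernel's
 * include/linux/errno.h. Defined here so the sketch builds in userspace. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical counter: number of PFs on this host still initializing. */
static atomic_int pf_loading_sketch;

/* PF side: call before enabling SR-IOV, which is what triggers VF probes. */
static void pf_begin_init_sketch(void)
{
    atomic_fetch_add(&pf_loading_sketch, 1);
}

/* PF side: call once initialization is done and the comm channel is usable. */
static void pf_finish_init_sketch(void)
{
    int prev = atomic_fetch_sub(&pf_loading_sketch, 1);

    /* A finish without a matching begin would be a bug in the caller. */
    assert(prev > 0);
    (void)prev;
}

/* VF side: stand-in for the first step of a slave init routine.
 * Sets *comm_channel_touched only when the comm channel would be used. */
static int init_slave_sketch(int *comm_channel_touched)
{
    /* Fail fast instead of waiting for a comm-channel reset to time out. */
    if (atomic_load(&pf_loading_sketch) > 0)
        return -EPROBE_DEFER;

    *comm_channel_touched = 1;
    return 0;
}
```

    With this shape, a VF probed while the PF is still loading returns
    -EPROBE_DEFER without touching the comm channel, and the driver core
    can retry the probe later.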
    
    This issue and fix are relevant only for VFs probed on the hypervisor.
    There is no way to pass this information to a VM until the comm channel
    is ready, so in a VM, if the PF is not ready, the first command will time
    out after 10 seconds and return EPROBE_DEFER.

    Signed-off-by: Amir Vadai <amirv@mellanox.com>
    Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>