    team: fix hang in team_mode_get() · 1c30fbc7
    Taehee Yoo authored
    When the team mode is changed or set, team_mode_get() is called to check
    whether the mode module is loaded. If it is not, team_mode_get() calls
    request_module().
    request_module() spawns a child "modprobe" process and waits for that
    child process to finish.
    At this point, the following locks are held:
    down_read(&cb_lock); by genl_rcv()
        genl_lock(); by genl_rcv_msg()
            rtnl_lock(); by team_nl_cmd_options_set()
                mutex_lock(&team->lock); by team_nl_team_get()
    
    Concurrently, the team module could be removed by rmmod or "modprobe -r".
    The __exit function of the team module is team_module_exit(), which calls
    team_nl_fini(), and that tries to acquire the following locks:
    down_write(&cb_lock);
        genl_lock();
    Because genl_lock and cb_lock are already held on the request_module()
    path, the removal cannot finish before request_module() returns.
    
    The problem scenario:
    CPU0                                     CPU1
    team_mode_get
        request_module()
                                             modprobe -r team_mode_roundrobin
                                                         team <--(B)
            modprobe team <--(A)
                team_mode_roundrobin
    
    Because of request_module(), the "modprobe team_mode_roundrobin" command
    is executed. The modprobe process then decides that the team module
    must be inserted before team_mode_roundrobin, because the team module
    is being removed.
    
    The module infrastructure does not allow insert and remove operations
    on the same module to run concurrently.
    So (A) waits for (B), but (B) also waits for (A) because of the locks,
    and the hang occurs at this point.
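
The circular wait described above can be sketched as a tiny wait-for graph in user-space C. This is only an illustrative model, not kernel code: the task names, `has_wait_cycle()` and `scenario_deadlocks()` are hypothetical helpers introduced for this sketch, and the `removal_in_progress` flag simply stands for "modprobe -r team has already started when request_module() runs".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical task identifiers, used only in this sketch. */
enum {
    TASK_NONE = -1,
    TASK_REQUEST_MODULE,   /* CPU0: team_mode_get() -> request_module(), holds cb_lock/genl_lock */
    TASK_MODPROBE_INSERT,  /* (A): "modprobe team" spawned via request_module() */
    TASK_MODPROBE_REMOVE,  /* (B): "modprobe -r" running team_module_exit() */
    TASK_COUNT
};

/* waits_for[t] is the task that t is blocked on, or TASK_NONE.
 * Returns true if following the chain from 'start' leads back to 'start'. */
static bool has_wait_cycle(const int waits_for[TASK_COUNT], int start)
{
    int cur = waits_for[start];

    for (int steps = 0; steps < TASK_COUNT && cur != TASK_NONE; steps++) {
        if (cur == start)
            return true;
        cur = waits_for[cur];
    }
    return false;
}

/* Builds the wait-for edges of the scenario and checks for a cycle. */
static bool scenario_deadlocks(bool removal_in_progress)
{
    int waits_for[TASK_COUNT];

    /* request_module() always waits for its child modprobe to exit. */
    waits_for[TASK_REQUEST_MODULE] = TASK_MODPROBE_INSERT;

    if (removal_in_progress) {
        /* (A) waits for (B): same module, insert vs. remove. */
        waits_for[TASK_MODPROBE_INSERT] = TASK_MODPROBE_REMOVE;
        /* (B) waits for the locks held on the request_module() path. */
        waits_for[TASK_MODPROBE_REMOVE] = TASK_REQUEST_MODULE;
    } else {
        /* No removal running: the insert proceeds, nobody is blocked on it. */
        waits_for[TASK_MODPROBE_INSERT] = TASK_NONE;
        waits_for[TASK_MODPROBE_REMOVE] = TASK_NONE;
    }
    return has_wait_cycle(waits_for, TASK_REQUEST_MODULE);
}
```

In this model a cycle exists only when the removal is already in progress, which is the condition any fix has to rule out.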
    
    Test commands:
        while :
        do
            teamd -d &
            killall teamd &
            modprobe -rv team_mode_roundrobin &
        done
    
    The approach of this patch is to hold a reference on the team module
    while request_module() is being called, if team is built as a module.
    While the reference count of the team module is not zero, the module
    cannot be removed, so the above scenario cannot occur.
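
The effect of pinning the module can be illustrated with a minimal user-space model. The kernel offers try_module_get() and module_put() for taking and dropping a module reference; the sketch below imitates only their basic semantics. `struct fake_module`, `fake_try_get()`, `fake_put()`, `fake_remove()` and `guarded_mode_get()` are hypothetical names for this illustration, and whether the patch is structured exactly this way is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a loadable module; not the kernel's struct module. */
struct fake_module {
    int refcnt;   /* outstanding references */
    bool going;   /* set once removal has started */
};

/* Modeled on try_module_get(): fails once removal has started. */
static bool fake_try_get(struct fake_module *m)
{
    if (m->going)
        return false;
    m->refcnt++;
    return true;
}

/* Modeled on module_put(): drops one reference. */
static void fake_put(struct fake_module *m)
{
    assert(m->refcnt > 0);
    m->refcnt--;
}

/* Models module removal: refused while references are held. */
static bool fake_remove(struct fake_module *m)
{
    if (m->refcnt > 0)
        return false;
    m->going = true;
    return true;
}

/* Shape of a guarded lookup: pin the module, do the slow work, unpin.
 * *removal_succeeded reports whether a removal attempted in the middle
 * (simulating a racing "modprobe -r team") was allowed to proceed. */
static bool guarded_mode_get(struct fake_module *team, bool *removal_succeeded)
{
    if (!fake_try_get(team))
        return false;                  /* module already going away */

    /* request_module() would run here in the real code. */
    *removal_succeeded = fake_remove(team);

    fake_put(team);
    return true;
}
```

With the reference held, the simulated removal is refused for the duration of the lookup and can only succeed after the reference has been dropped.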
    
    Fixes: 3d249d4c ("net: introduce ethernet teaming device")
    Signed-off-by: Taehee Yoo <ap420073@gmail.com>
    Reviewed-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>