    bonding: fix slave stuck in BOND_LINK_FAIL state · 055db695
    Jay Vosburgh authored
    The bonding miimon logic has a flaw, in that a failure of the
    rtnl_trylock can cause a slave to become permanently stuck in
    BOND_LINK_FAIL state.
    
    	The sequence of events to cause this is as follows:
    
    	1) bond_miimon_inspect finds that a slave's link is down, and so
    calls bond_propose_link_state, setting slave->new_link_state to
    BOND_LINK_FAIL, then sets slave->new_link to BOND_LINK_DOWN and returns
    non-zero.
    
    	2) In bond_mii_monitor, the rtnl_trylock fails, and the timer is
    rescheduled.  No change is committed.
    
    	3) bond_miimon_inspect is called again, but this time the slave
    from step 1 has recovered.  slave->new_link is reset to NOCHANGE, and, as
    slave->link was never changed, the switch enters the BOND_LINK_UP case,
    and does nothing.  The pending BOND_LINK_FAIL state from step 1 remains
    pending, as new_link_state is not reset.
    
    	4) The state from step 3 persists until another slave changes link
    state and causes bond_miimon_inspect to return non-zero.  At this point,
    the BOND_LINK_FAIL state change on the slave from steps 1-3 is committed,
    and the slave remains stuck in BOND_LINK_FAIL state even though its
    link is actually up.
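
    The four steps above can be replayed in a small userspace model.
    This is only a sketch under stated assumptions: the struct, enum and
    function names (model_slave, model_inspect, model_commit,
    replay_sequence) are hypothetical and greatly simplified, and delays,
    locking and the real switch statement are omitted.  It is not the
    code in bond_main.c.

```c
#include <stdbool.h>

/* Userspace model only; not the kernel's struct slave or its real logic. */
enum link_state { LINK_UP, LINK_FAIL, LINK_DOWN, LINK_NOCHANGE };

struct model_slave {
    enum link_state link;           /* committed state */
    enum link_state new_link;       /* proposed final state */
    enum link_state new_link_state; /* proposed intermediate state */
    bool carrier;                   /* link status as currently reported */
};

/* Inspect pass: returns the number of slaves that want a commit. */
static int model_inspect(struct model_slave *s, int n)
{
    int commit = 0;

    for (int i = 0; i < n; i++) {
        s[i].new_link = LINK_NOCHANGE; /* reset on every pass */
        /* new_link_state is NOT reset here: the flaw being modeled */
        if (s[i].link == LINK_UP && !s[i].carrier) {
            s[i].new_link_state = LINK_FAIL;
            s[i].new_link = LINK_DOWN;
            commit++;
        }
    }
    return commit;
}

/* Commit pass: applies whatever is pending on every slave. */
static void model_commit(struct model_slave *s, int n)
{
    for (int i = 0; i < n; i++) {
        if (s[i].new_link_state != LINK_NOCHANGE)
            s[i].link = s[i].new_link_state;
        if (s[i].new_link == LINK_DOWN)
            s[i].link = LINK_DOWN;
    }
}

/* Replays steps 1-4 and returns the committed state of slave 0. */
static enum link_state replay_sequence(void)
{
    struct model_slave s[2] = {
        { LINK_UP, LINK_NOCHANGE, LINK_NOCHANGE, false }, /* link lost */
        { LINK_UP, LINK_NOCHANGE, LINK_NOCHANGE, true  },
    };

    model_inspect(s, 2);      /* step 1: slave 0 proposes FAIL/DOWN */
                              /* step 2: trylock fails, no commit   */
    s[0].carrier = true;      /* slave 0 recovers                   */
    model_inspect(s, 2);      /* step 3: stale FAIL proposal stays  */
    s[1].carrier = false;     /* another slave changes state        */
    if (model_inspect(s, 2))  /* step 4: commit is now requested    */
        model_commit(s, 2);
    return s[0].link;
}
```

    In this model, replay_sequence() leaves slave 0 committed to the
    FAIL state while its carrier flag reports the link as up, which is
    the situation described in step 4.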
    
    	The remedy for this is to initialize new_link_state on each entry
    to bond_miimon_inspect, as is already done with new_link.
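
    A minimal sketch of that remedy, using a hypothetical two-field
    struct rather than the kernel's struct slave.  The names
    (sketch_slave, sketch_inspect_entry) and the choice of a NOCHANGE
    value as the reset target are assumptions made for illustration;
    the actual patch may initialize the field differently.

```c
/* Hypothetical stand-ins for the bonding link states. */
enum sketch_link { SK_UP, SK_FAIL, SK_DOWN, SK_NOCHANGE };

struct sketch_slave {
    enum sketch_link new_link;       /* proposed final state */
    enum sketch_link new_link_state; /* proposed intermediate state */
};

/*
 * Run at the start of every inspect pass: drop both proposals left
 * over from a previous pass whose commit never happened.
 */
static void sketch_inspect_entry(struct sketch_slave *s)
{
    s->new_link = SK_NOCHANGE;
    s->new_link_state = SK_NOCHANGE; /* assumed reset value */
}
```

    With both fields cleared on entry, a proposal that was made in an
    earlier pass but never committed cannot be picked up by a later,
    unrelated commit.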
    
    Fixes: fb9eb899 ("bonding: handle link transition from FAIL to UP correctly")
    Reported-by: Alex Sidorenko <alexandre.sidorenko@hpe.com>
    Reviewed-by: Jarod Wilson <jarod@redhat.com>
    Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
    Acked-by: Mahesh Bandewar <maheshb@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
bond_main.c 134 KB