    tipc: fix failover problem · c140eb16
    LUU Duc Canh authored
    We see the following scenario:
    1) Link endpoint B on node 1 discovers that its peer endpoint is gone.
       Since there is a second working link, failover procedure is started.
    2) Link endpoint A on node 1 sends a FAILOVER message to peer endpoint
       A on node 2. The node item 1->2 goes to state FAILINGOVER.
    3) Link endpoint A/2 receives the FAILOVER message, and is supposed to
       take down its parallel link endpoint B/2, while producing a FAILOVER
       message to send back to A/1.
    4) However, B/2 has already been deleted, so no FAILOVER message can
       be created.
    5) Node 1->2 remains in state FAILINGOVER forever, refusing to receive
       any messages that can bring B/1 up again. We are left with a non-
       redundant link between node 1 and 2.
    
    We fix this by letting endpoint A/2 build a dummy FAILOVER message
    to send back to A/1, so that the situation can be resolved.
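
The scenario and the fix can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the code in link.c: the types `struct node`, `struct link_ep` and `struct msg`, and the helpers `peer_rcv_failover()` and `initiator_rcv_failover()`, are hypothetical names invented for illustration. The model also assumes that a FAILOVER reply announcing zero tunnelled packets is enough to end the failover on the initiating node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; these are NOT the kernel's data types. */
enum node_state { NODE_UP, NODE_FAILINGOVER };
enum msg_type { MSG_NONE, MSG_FAILOVER };

struct msg {
    enum msg_type type;
    unsigned int msgcnt;        /* tunnelled packets announced by the sender */
};

struct link_ep {
    bool up;
    unsigned int unacked;       /* packets still waiting to be tunnelled */
};

struct node {
    enum node_state state;
    struct link_ep *a;          /* surviving link endpoint */
    struct link_ep *b;          /* failed endpoint, NULL once deleted */
};

/* Peer side (A/2): handle a FAILOVER message arriving on the surviving link. */
static struct msg peer_rcv_failover(struct node *n)
{
    struct msg reply = { MSG_FAILOVER, 0 };

    assert(n->a != NULL);
    if (n->b) {
        /* Normal case: take down B/2 and announce its pending packets. */
        reply.msgcnt = n->b->unacked;
        n->b->up = false;
    }
    /*
     * If B/2 is already gone, the reply is still sent, as a dummy
     * FAILOVER with nothing to tunnel. Before the fix, no reply at all
     * was produced in this case.
     */
    return reply;
}

/* Initiator side (A/1): an empty FAILOVER reply completes the failover. */
static void initiator_rcv_failover(struct node *n, struct msg m)
{
    if (n->state == NODE_FAILINGOVER &&
        m.type == MSG_FAILOVER && m.msgcnt == 0)
        n->state = NODE_UP;
}
```

With `n->b == NULL` on node 2, the reply still reaches node 1, which leaves FAILINGOVER and can again accept the messages needed to bring B/1 back up.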
    Signed-off-by: LUU Duc Canh <canh.d.luu@dektech.com.au>
    Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
link.c 59.6 KB