    ocfs2: don't fire quorum before connection established · 5046f18d
    Junxiao Bi authored
    Firing quorum before the connection is established can cause a node to
    reboot unexpectedly.
    
    Assume there are 3 nodes in the cluster: Node 1, 2, and 3.  Node 2 and 3
    have the wrong IP address for Node 1 in cluster.conf, and global heartbeat
    is enabled in the cluster.  After the heartbeats are started on these three
    nodes, Node 1 will reboot due to quorum fencing.  The same happens if Node
    1's networking is not ready when the global heartbeat starts.
    
    The reboot is unfriendly because the customer is not yet fully ready for
    ocfs2 to work.  Fix it by not allowing quorum to fire before the connection
    is established.  In this case, ocfs2 will wait until the wrong
    configuration is fixed or networking is up before continuing.  Also update
    the log to tell the user where to check when a connection has not been
    established for a long time.
    
    Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
    Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Mark Fasheh <mfasheh@suse.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
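
The rule described in the commit message can be illustrated with a small user-space sketch. This is not the code from tcp.c: the struct, the function names, and the log wording are all hypothetical, chosen only to show the idea that a peer which has never been connected should not trigger quorum, while a peer that was connected and then dropped should.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-peer state; NOT the real o2net structures in tcp.c. */
struct peer_state {
    bool ever_connected;    /* set once the first connection succeeds */
    bool connected;         /* current connection status */
    unsigned idle_secs;     /* seconds spent waiting without a connection */
};

/* Returns true when the loss of this peer should be reported to quorum. */
static bool should_fire_quorum(const struct peer_state *p)
{
    /* Only a peer that was connected and then dropped counts as a loss. */
    return p->ever_connected && !p->connected;
}

/*
 * Called when a connect attempt times out (hypothetical hook).
 * Returns true if quorum would be fired, false if we keep waiting.
 */
static bool on_connect_timeout(struct peer_state *p, unsigned node,
                               unsigned timeout_secs)
{
    p->idle_secs += timeout_secs;

    if (should_fire_quorum(p))
        return true;

    /* Illustrative wording only: point the user at what to check. */
    printf("no connection established with node %u after %u seconds, "
           "check network and cluster configuration\n",
           node, p->idle_secs);
    return false;
}
```

With a zero-initialized `struct peer_state`, repeated calls to `on_connect_timeout()` only log and keep waiting, which matches the "wait until the configuration is fixed or networking is up" behavior. Once `ever_connected` is set and the link later drops, the same call reports that quorum should fire.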
tcp.c 59.7 KB