    ocfs2/cluster: close a race that fence can't be triggered · fc2af28b
    Yang Zhang authored
    When some nodes in a cluster hit a TCP connection fault, ocfs2 picks a
    quorum that continues to work, and the remaining nodes are fenced by
    resetting the host.
    
    To decide which node should be fenced, ocfs2 relies on
    o2quo_state::qs_holds.  When that counter drops to zero, ocfs2 tries to
    decide whether the local node should be fenced.  However, in the
    specific scenario where the local node is not disconnected from the
    other nodes at the same time, this approach can fail to reduce
    ::qs_holds to zero.
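The hold counting described above can be pictured with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: `struct quorum_model`, `model_set_hold()`, `model_clear_hold()`, and the `decisions` counter are hypothetical names invented for illustration, and the real implementation works on `o2quo_state` under a lock.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_NODES 32	/* hypothetical limit for this model only */

/* Hypothetical stand-in for the hold bookkeeping; not the kernel struct. */
struct quorum_model {
	unsigned int hold_bm;	/* one bit per node currently holding off the decision */
	int holds;		/* number of bits set in hold_bm */
	int decisions;		/* how often the "fence or not" decision was reached */
};

/* Take a hold for a node; a node can hold at most once at a time. */
static void model_set_hold(struct quorum_model *qm, unsigned int node)
{
	assert(node < MODEL_MAX_NODES);
	if (!(qm->hold_bm & (1u << node))) {
		qm->hold_bm |= 1u << node;
		qm->holds++;
	}
}

/* Drop a node's hold; the decision is reached only when the last hold goes. */
static void model_clear_hold(struct quorum_model *qm, unsigned int node)
{
	assert(node < MODEL_MAX_NODES);
	if (qm->hold_bm & (1u << node)) {
		qm->hold_bm &= ~(1u << node);
		if (--qm->holds == 0)
			qm->decisions++;	/* stands in for "decide whether to fence" */
	}
}
```

In this model, as long as at least one node keeps a hold, `decisions` stays at zero, which is the property the rest of this message depends on.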
    
    This happens because the o2net 90s idle timers for the different nodes
    fire one after another rather than together.
    
      node 2			node 3
      90s idle timer elapses
      clear ::qs_conn_bm
      set hold
    				40s is passed
    				90s idle timer elapses
    				clear ::qs_conn_bm
    				set hold
      still up timer elapses
      clear hold (NOT to zero)
      90s idle timer elapses AGAIN
    				still up timer elapses.
    				clear hold
    				still up timer elapses
    
    To solve this issue, a node that has already been evicted from
    ::qs_conn_bm must not set a hold again each time the idle timer fires.
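To make the interleaving and the guard concrete, here is a minimal user-space replay of the timeline. It is a sketch under assumptions and not the actual patch: `struct race_model`, `race_idle_timer()`, `race_still_up_timer()`, `race_replay()`, and the `guarded` flag are hypothetical names, and the model treats "evicted from ::qs_conn_bm" as "connection bit already clear".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; field names only loosely mirror the message. */
struct race_model {
	unsigned int conn_bm;	/* nodes still considered connected */
	unsigned int hold_bm;	/* nodes currently holding off the decision */
	int holds;		/* number of bits set in hold_bm */
	bool reached_zero;	/* did holds ever drop to zero? */
};

/* Idle timer for a node fires: evict it and (maybe) take a hold. */
static void race_idle_timer(struct race_model *rm, unsigned int node, bool guarded)
{
	unsigned int bit = 1u << node;
	bool was_connected = (rm->conn_bm & bit) != 0;

	rm->conn_bm &= ~bit;
	if (guarded && !was_connected)
		return;		/* already evicted: do not take another hold */
	if (!(rm->hold_bm & bit)) {
		rm->hold_bm |= bit;
		rm->holds++;
	}
}

/* Still-up timer for a node fires: drop that node's hold. */
static void race_still_up_timer(struct race_model *rm, unsigned int node)
{
	unsigned int bit = 1u << node;

	if (rm->hold_bm & bit) {
		rm->hold_bm &= ~bit;
		if (--rm->holds == 0)
			rm->reached_zero = true;
	}
}

/* Replay the staggered timers for nodes 2 and 3; report if holds hit zero. */
static bool race_replay(bool guarded, int rounds)
{
	struct race_model rm = { .conn_bm = (1u << 2) | (1u << 3) };

	race_idle_timer(&rm, 2, guarded);
	race_idle_timer(&rm, 3, guarded);
	for (int i = 0; i < rounds; i++) {
		race_still_up_timer(&rm, 2);
		race_idle_timer(&rm, 2, guarded);	/* idle timer fires AGAIN */
		race_still_up_timer(&rm, 3);
		race_idle_timer(&rm, 3, guarded);
	}
	return rm.reached_zero;
}
```

In this model, `race_replay(false, n)` never sees the counter reach zero because each node re-takes its hold before the other one releases, while `race_replay(true, n)` reaches zero as soon as both still-up timers have fired once.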
    
    Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373F1F3F93B@H3CMLB12-EX.srv.huawei-3com.com
    Signed-off-by: Yang Zhang <zhang.yangB@h3c.com>
    Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
    Cc: Mark Fasheh <mfasheh@versity.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Junxiao Bi <junxiao.bi@oracle.com>
    Cc: Joseph Qi <jiangqi903@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
quorum.c 10.3 KB