staging: lustre: lnet: Stop Infinite CON RACE Condition

In the current code, when a CON RACE occurs, the passive side lets the
node with the higher NID value win the race. We have a field case where
a node can have a "stuck" connection which never goes away and which
triggers a never-ending loop of reconnections.

This patch introduces a counter for how many times a connection in the
connecting state has been the cause of a CON RACE rejection. After 20
times (constant MAX_CONN_RACES_BEFORE_ABORT), we assume the connection
is stuck and let the other side (with the lower NID) win.

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7646
Reviewed-on: http://review.whamcloud.com/19430
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
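
The counting rule described in the commit message can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the patch itself: `struct peer_state`, its fields, and `reject_for_conn_race()` are hypothetical names chosen for this example. Only `MAX_CONN_RACES_BEFORE_ABORT` and its value of 20 come from the commit message. Whether and when the real code resets the counter is not stated there, so the sketch does not model a reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* From the commit message: rejections tolerated before giving up. */
#define MAX_CONN_RACES_BEFORE_ABORT 20

/* Hypothetical per-peer state; the field names are assumptions. */
struct peer_state {
	uint64_t nid;        /* NID of the remote node */
	int      connecting; /* non-zero while we are actively connecting to it */
	unsigned races;      /* CON RACE rejections caused by that connection */
};

/*
 * Passive-side decision for an incoming connection request.
 *
 * Returns true if the request should be rejected as a CON RACE, i.e. we
 * are connecting to the same peer, we hold the higher NID, and we have
 * not yet rejected MAX_CONN_RACES_BEFORE_ABORT times. Once the limit is
 * reached, our own connection is assumed to be stuck and the request
 * from the lower-NID peer is allowed through.
 */
static bool reject_for_conn_race(struct peer_state *peer, uint64_t local_nid)
{
	assert(peer != NULL);

	if (peer->connecting != 0 &&
	    peer->nid < local_nid &&
	    peer->races < MAX_CONN_RACES_BEFORE_ABORT) {
		peer->races++;
		return true;
	}

	return false;
}
```

With a peer whose NID is lower than the local NID and with `connecting` set, repeated calls return `true` for the first 20 requests and `false` from then on, which is what breaks the reconnection loop.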