Another attempt at fixing the rare random failures of rpl_corruption

The previous patch partially fixed things by waiting for the old dump thread on the master to exit before injecting the DBUG error. This prevents the error injection from going to the wrong thread. However, the old dump thread may never exit, which causes the wait to time out. This happens if the dump thread manages to write all events down the socket before the slave closes it. The master dump thread only checks whether the slave is gone when writing a new event, so if no new events are generated, old dump threads can hang around forever on the master after the slave disconnects. Fix by explicitly killing the old dump thread if it is still around.

Commit 54fcd3b8 authored Jun 14, 2011 by unknown

Showing with 11 additions and 0 deletions

mysql-test/suite/rpl/t/rpl_corruption.test +11 -0

--- a/mysql-test/suite/rpl/t/rpl_corruption.test
+++ b/mysql-test/suite/rpl/t/rpl_corruption.test
@@ -85,6 +85,17 @@ SET GLOBAL debug="-d,corrupt_read_log_event_char";
 # that the slave has disconnected, we will inject the corrupt event on
 # the wrong connection, and the test will fail
 # (+d,corrupt_read_log_event2 corrupts only one event).
+# So kill any lingering dump thread (we need to kill; otherwise dump thread
+# could manage to send all events down the socket before seeing it close, and
+# hang forever waiting for new binlog events to be created).
+let $id= `select id from information_schema.processlist where command = "Binlog Dump"`;
+if ($id)
+{
+  --disable_query_log
+  --error 0,1094
+  eval kill $id;
+  --enable_query_log
+}
 let $wait_condition=
  SELECT COUNT(*)=0 FROM INFORMATION_SCHEMA.PROCESSLIST WHERE command = 'Binlog Dump';
 --source include/wait_condition.inc