MDEV-8302: Duplicate key with parallel replication
This bug is essentially another variant of MDEV-7458. If a transaction conflict caused a deadlock kill of T2 in record_gtid() during commit, the code would do a rollback _before_ running rgi->unmark_start_commit(). This creates a race where following transactions could start too early (before T2 has completed its transaction retry). This in turn could lead to replication failure, if there was a conflict that caused eg. duplicate key error or similar. The fix is to remove these rollbacks (in Query_log_event::do_apply_event() and Xid_log_event::do_apply_event(). They seem out-of-place; code in log_event.cc generally does not roll back on error, this is handled higher up. In addition, because of the extreme difficulty of reproducing bugs like MDEV-7458 and MDEV-8302, this patch adds some extra precations to try to detect (in debug builds) or prevent (in release builds) similar bugs. ha_rollback_trans() will now call unmark_start_commit() if needed (and assert in debug build when a caller does rollback without unmark first). We also add an extra check for thd->killed() so that we avoid doing mark_start_commit() if we already have a pending deadlock kill. And we add a missing unmark_start_commit() call in the error case, found by the above assertion.
Showing
Please register or sign in to comment