1. 14 Jan, 2015 1 commit
    • MDEV-7467: sporadic failure in rpl.rpl_gtid_crash · 02099a33
      Kristian Nielsen authored
      The test case injects a DBUG that will crash the server during replication,
      then does a START SLAVE. We need to use --error 0,2006,2013 on the START
      SLAVE, so that we will not fail the test if the server has time to crash
      before the START SLAVE returns to the client.
      
      Fixes a failure seen in Buildbot.
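
The effect of `--error 0,2006,2013` can be pictured as an allow-list of client error codes for the statement that follows it. The Python below is only an illustrative sketch, not mysqltest code: `statement_outcome_ok` is a hypothetical helper, and the comments assume the usual meaning of client errors 2006 ("server has gone away") and 2013 ("lost connection during query").

```python
# Hypothetical helper, not part of mysqltest: models what
# "--error 0,2006,2013" means for the statement that follows it.
TOLERATED = {0, 2006, 2013}  # success, server gone away, lost connection


def statement_outcome_ok(errno, tolerated=TOLERATED):
    """Return True if this error code should not fail the test."""
    return errno in tolerated
```

With this allow-list, START SLAVE may either succeed or lose its connection to the crashing server; any other error still fails the test.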
  2. 13 Jan, 2015 1 commit
  3. 07 Jan, 2015 1 commit
    • MDEV-7326: Server deadlock in connection with parallel replication · f27817c1
      Kristian Nielsen authored
      The bug occurs when a transaction does a retry after all transactions have
      done mark_start_commit() in a batch of group commit from the master. In this
      case, the retrying transaction can unmark_start_commit() after the following
      batch has already started running and de-allocated the GCO. Then after retry,
      the transaction will re-do mark_start_commit() on a de-allocated GCO, and also
      wakeup of later GCOs can be lost.
      
      This was seen "in the wild" by a user, even though it is not known exactly
      what circumstances can lead to retry of one transaction after all transactions
      in a group have reached the commit phase.
      
      The lifetime around GCO was somewhat clunky anyway. With this patch, a GCO
      lives until rpl_parallel_entry::last_committed_sub_id has reached the last
      transaction in the GCO. This guarantees that the GCO will still be alive when
      a transaction does mark_start_commit(). Also, we now loop over the list of
      active GCOs for wakeup, to ensure we do not lose a wakeup even in the
      problematic case.
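
The lifetime rule described above can be sketched as a toy model. This is not the server's C++ code: `Entry`, `GCO`, `last_sub_id`, and `active_gcos` are hypothetical names, and only the rule "a GCO lives until last_committed_sub_id reaches its last transaction" is modeled.

```python
from dataclasses import dataclass


@dataclass
class GCO:
    last_sub_id: int  # sub_id of the last transaction in this group (assumed field)


class Entry:
    """Toy stand-in for rpl_parallel_entry; only the lifetime rule is modeled."""

    def __init__(self):
        self.last_committed_sub_id = 0
        self.active_gcos = []  # oldest first

    def add_gco(self, last_sub_id):
        gco = GCO(last_sub_id)
        self.active_gcos.append(gco)
        return gco

    def transaction_committed(self, sub_id):
        self.last_committed_sub_id = max(self.last_committed_sub_id, sub_id)
        # Release only those GCOs whose last transaction has committed.
        self.active_gcos = [
            g for g in self.active_gcos
            if g.last_sub_id > self.last_committed_sub_id
        ]
```

Under this rule, a transaction that retries before committing always finds its GCO still in the active list, because its own sub_id has not yet been reached.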
  4. 06 Jan, 2015 2 commits
    • MDEV-7403: should not pass recv_writer_thread_handle to CloseHandle() · 4a325159
      Jan Lindström authored
      Analysis: On Windows, os_thread_create() did not return the
      actual thread handle. It returned lpThreadId instead, and the
      thread handle was closed right after the thread was created.
      Later, CloseHandle() was called on recv_writer_thread_handle
      and psort_info->thread_hdl.
      
      Fix: Return the thread handle from os_thread_create()
      on Windows as well, and store these thread handles
      in srv0start.cc so that they can be closed later.
    • MDEV-7353: rpl_mdev6386 fails sporadically in buildbot · 6e0a00ed
      Kristian Nielsen authored
      Use include/sync_with_master_gtid.inc instead of --sync_with_master to avoid a
      race in the test case.
      
      In parallel replication, the old-style slave position (which is used by
      --sync_with_master) is updated out-of-order between parallel threads. This
      makes it possible for the position to be updated past DROP TEMPORARY TABLE t2
      just before the commit of INSERT INTO t1 SELECT * FROM t2 becomes visible.
      
      In this case, there is a small window where a SELECT just after
      --sync_with_master may not see the changes from the INSERT.
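
The window can be replayed deterministically with a small sketch. The event names are illustrative, and the last timeline entry assumes that the GTID position passes the DROP only after the earlier INSERT has committed; that ordering is an assumption of this model, not something stated in the commit message.

```python
# Illustrative replay of the problematic interleaving on the slave.
TIMELINE = [
    "old_position_passes_drop",   # old-style position updated out of order
    "insert_commit_visible",      # INSERT INTO t1 ... becomes visible
    "gtid_position_passes_drop",  # assumed to follow the INSERT's commit
]


def insert_visible_when_wait_returns(timeline, wait_on):
    """Replay events; report whether the INSERT is visible when the wait ends."""
    insert_visible = False
    for event in timeline:
        if event == "insert_commit_visible":
            insert_visible = True
        if event == wait_on:
            return insert_visible
    raise ValueError("wait condition never satisfied")
```

Waiting on the old-style position returns before the INSERT is visible, which is the race; waiting on the GTID-based event does not.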
  5. 30 Dec, 2014 1 commit
  6. 28 Dec, 2014 1 commit
  7. 19 Dec, 2014 1 commit
  8. 18 Dec, 2014 1 commit
    • MDEV-7342: Test failure in perfschema.setup_instruments_defaults · 826d7c68
      Kristian Nielsen authored
      Fix a possible race in the test case when restarting the server.
      
      Make sure we have disconnected before waiting for the reconnect
      that signals that the server is back up. Otherwise, we may in
      rare cases continue the test while the old server is shutting
      down, eventually leading to "connection lost" failure.
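
The ordering issue can be illustrated with a sequence of hypothetical connection polls. `POLLS`, `first_index`, and `resume_point` are made up for this sketch and do not correspond to mysqltest include files.

```python
def first_index(states, wanted, start=0):
    """Index of the first poll at or after `start` whose state equals `wanted`."""
    for i in range(start, len(states)):
        if states[i] == wanted:
            return i
    raise ValueError("state never observed")


# Hypothetical poll results: True means a connection attempt succeeds.
# The old server is still answering during the first two polls.
POLLS = [True, True, False, False, True]


def resume_point(polls, wait_for_disconnect_first):
    """Poll index at which the test would continue running."""
    start = first_index(polls, False) if wait_for_disconnect_first else 0
    return first_index(polls, True, start)
```

Without the disconnect wait, the test resumes at the first poll, while the old server is still shutting down; with it, the test resumes only once the new server answers.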
  9. 12 Dec, 2014 3 commits
  10. 10 Dec, 2014 1 commit
  11. 07 Dec, 2014 1 commit
  12. 05 Dec, 2014 2 commits
  13. 04 Dec, 2014 1 commit
  14. 03 Dec, 2014 6 commits
  15. 02 Dec, 2014 4 commits
  16. 03 Dec, 2014 1 commit
    • MDEV-4393: show_explain.test times out randomly · d79cce86
      Kristian Nielsen authored
      The problem was a race between the debug code in the server and the SHOW
      EXPLAIN FOR in the test case.
      
      The test case would wait for a query to reach the first point of interest
      (inside dbug_serve_apcs()), then send it a SHOW EXPLAIN FOR, then wait for the
      query to reach the next point of interest. However, the second wait was
      insufficient. It was possible for the second wait to complete immediately,
      causing both the first and the second SHOW EXPLAIN FOR to hit the same
      invocation of dbug_serve_apcs(). Then a later invocation would miss its
      intended SHOW EXPLAIN FOR and hang, and the test case would eventually time
      out.
      
      The fix is to make sure that the second wait cannot trigger during the first
      invocation of dbug_serve_apcs(). We do this by clearing the thd_proc_info
      (that the wait is looking for) before processing the SHOW EXPLAIN FOR; this
      way the second wait cannot start until the thd_proc_info from the first
      invocation has been cleared.
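
A minimal model of the race, assuming the wait simply compares the thread's proc_info against a marker string. `MARKER` and the function name are hypothetical, and the real logic lives in the server's debug code and the test file.

```python
MARKER = "show_explain_probe"  # hypothetical proc_info value the wait looks for


def second_wait_matches_first_invocation(clear_before_processing):
    """Return True if the second wait would complete during invocation 1."""
    proc_info = MARKER  # invocation 1 reaches the point of interest
    # The first wait sees MARKER, and the test sends SHOW EXPLAIN FOR.
    if clear_before_processing:
        proc_info = None  # the fix: clear before processing the request
    # The request is processed and the test immediately starts its second wait.
    return proc_info == MARKER
```

Without the clearing step the second wait is satisfied by stale state from the first invocation; with it, the wait can only match once a later invocation sets the marker again.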
  17. 02 Dec, 2014 3 commits
  18. 01 Dec, 2014 5 commits
  19. 22 Nov, 2014 2 commits
  20. 01 Dec, 2014 1 commit
    • MDEV-7237: Parallel replication: incorrect relaylog position after stop/start the slave · 52b25934
      Kristian Nielsen authored
      The replication relay log position was sometimes updated incorrectly at the
      end of a transaction in parallel replication. This happened because the relay
      log file name was taken from the current Relay_log_info (SQL driver thread),
      not the correct value for the transaction in question.
      
      The result was that if a transaction was applied while the SQL driver thread
      was at least one relay log file ahead, _and_ the SQL thread was subsequently
      stopped before applying any events from the most recent relay log file, then
      the relay log position would be incorrect (wrong relay log file name). Thus,
      when the slave was started again, usually a relay log read error would result,
      or in rare cases, if the position happened to be readable, the slave might
      even skip arbitrary amounts of events.
      
      In GTID mode, the relay log position is reset when both slave threads are
      restarted, so this bug would only be seen in non-GTID mode, or in GTID mode
      when only the SQL thread, not the IO thread, was stopped.
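
The mix-up can be shown with a small sketch. The function, its parameters, and the file names below are hypothetical; the point is only that the file name must come from the transaction being committed rather than from the SQL driver thread's current state.

```python
def recorded_position(driver_file, txn_file, txn_end_pos, use_txn_file):
    """Relay log position saved at the end of a transaction (toy model)."""
    file_name = txn_file if use_txn_file else driver_file
    return (file_name, txn_end_pos)


# The driver thread has already moved on to the next relay log file.
buggy = recorded_position("relay-bin.000003", "relay-bin.000002", 1200,
                          use_txn_file=False)
fixed = recorded_position("relay-bin.000003", "relay-bin.000002", 1200,
                          use_txn_file=True)
```

In the buggy case the offset from one file is paired with the name of another, which matches the described outcome: usually a read error on restart, or skipped events if the offset happens to be readable.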
  21. 28 Nov, 2014 1 commit