1. 13 Nov, 2014 9 commits
    • MDEV-6718: Server crashed in Gtid_log_event::Gtid_log_event with parallel replication · 8a3e2f29
      Kristian Nielsen authored
      The bug occurred in parallel replication when retrying transactions that
      failed due to deadlock. In this case, the relay log file is re-opened and the
      events are read out again. This reading requires a format description event of
      the appropriate version. But the code was using a description event stored in
      rli, which is not thread-safe. This could lead to various rare races if the
      format description event was replaced by the SQL driver thread at the exact
      moment where a worker thread was trying to use it.
      
      The fix is to instead make the retry code create and maintain its own format
      description event. When the relay log file is opened, we first read the format
      description event from the start of the file, before seeking to the current
      position. This now uses the same code as when the SQL driver thread starts
      from a given relay log position. This also makes sure that the correct format
      description event version will be used in cases where the version of the
      binlog could change during replication.
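      A minimal C++ sketch of the idea, assuming a simplified relay log whose
      format description is stored as a header in front of the events.
      RelayLogFile, FormatDescription and RetryReader are hypothetical names, not
      the server's classes. The retry reader takes its own copy of the format
      description from the start of the file and only then positions itself at
      the transaction, so no other thread can replace the description it uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical format description: reduced to a binlog version number.
struct FormatDescription {
  int binlog_version;
};

// Hypothetical relay log file: a format description header followed by
// ordinary events (positions index into `events`).
struct RelayLogFile {
  FormatDescription header;
  std::vector<std::string> events;
};

// Reader used by a single worker thread while retrying a transaction.
// It owns its format description instead of sharing one with other threads.
class RetryReader {
 public:
  // "Open" the file: first read the format description from the start of
  // the file, then seek to the position where the transaction begins.
  RetryReader(const RelayLogFile &file, std::size_t start_pos)
      : file_(file), fd_(file.header), pos_(start_pos) {}

  // Read the next event, decoding it with this reader's own description.
  // Returns false at end of file.
  bool next(std::string &out) {
    if (pos_ >= file_.events.size()) return false;
    out = "v" + std::to_string(fd_.binlog_version) + ":" + file_.events[pos_++];
    return true;
  }

  const FormatDescription &format() const { return fd_; }

 private:
  const RelayLogFile &file_;
  FormatDescription fd_;  // private copy: never replaced by another thread
  std::size_t pos_;
};
```

      Because the copy is taken from the file being re-read, the worker also
      decodes with the right version if the binlog version differs between
      relay log files.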
    • MDEV-7102: Incorrect PSI_stage_info message in SHOW PROCESSLIST during parallel replication · a98a034c
      Kristian Nielsen authored
      In parallel replication, a thread can wait for a prior transaction in two
      different ways: it can wait for the prior transaction to start its commit,
      or for it to complete its commit.
      
      It turns out that the same PSI_stage_info message was erroneously used in
      both cases (probably a merge error), causing SHOW PROCESSLIST to be
      misleading.
      
      Fix by using the correct, distinct message in each case.
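      A sketch of the corrected mapping, assuming a simplified stand-in for
      PSI_stage_info. The enum, the struct and both message texts are
      hypothetical; the server's actual stage names and texts may differ. The
      essential property is that the two waits map to two distinct stages.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a PSI_stage_info-like stage descriptor.
struct StageInfo {
  const char *name;
};

// The two different waits a worker can do on a prior transaction.
enum class PriorWait { START_COMMIT, COMPLETE_COMMIT };

// Hypothetical message texts, one per kind of wait.
static const StageInfo stage_waiting_for_prior_start_commit = {
    "Waiting for prior transaction to start commit"};
static const StageInfo stage_waiting_for_prior_commit = {
    "Waiting for prior transaction to commit"};

// Select the stage shown in SHOW PROCESSLIST for a given wait.
// The bug amounted to returning the same descriptor in both cases.
const StageInfo &stage_for(PriorWait w) {
  switch (w) {
    case PriorWait::START_COMMIT:
      return stage_waiting_for_prior_start_commit;
    case PriorWait::COMPLETE_COMMIT:
      return stage_waiting_for_prior_commit;
  }
  return stage_waiting_for_prior_commit;  // unreachable; silences warnings
}
```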
    • ecc33da2
      Kristian Nielsen authored
    • MDEV-6775: Wrong binlog order in parallel replication · 684715a2
      Kristian Nielsen authored
      In parallel replication, the wait_for_commit facility is used to ensure that
      events are written into the binlog in the correct order. This is handled in an
      optimised way in the binlogging group commit code.
      
      However, some statements, for example GRANT, are written directly into the
      binlog, outside of the group commit code. There was a bug where this direct
      write did not correctly wait for the prior transactions to be written first,
      which allowed e.g. GRANT to be written ahead of earlier transactions.
      
      This patch adds the missing wait_for_prior_commit() before writing directly to
      the binlog.
      
      However, the problem is still there, although the race is now much less
      likely to occur. The optimised group commit code wakes up following
      transactions early, before the binlog write is actually done. A woken-up
      following transaction is then allowed to run ahead and queue up for the
      group commit, which ensures that the binlog writes happen in the correct
      order in the end. However, the code for directly written events currently
      bypasses this mechanism, so those events get woken up and written too early.
      
      This will be fixed properly in a later patch.
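      A minimal sketch of the ordering rule that the added wait_for_prior_commit()
      call enforces, assuming that each transaction carries a sequence number and
      may write to the binlog only after all lower-numbered transactions have
      written. OrderedBinlog and write_direct() are hypothetical names; this is
      not the server's wait_for_commit implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical binlog that must receive transactions in sequence-number
// order, even when they finish executing out of order.
class OrderedBinlog {
 public:
  // Direct write path (e.g. a GRANT-like statement).
  void write_direct(unsigned long seq, const std::string &event) {
    std::unique_lock<std::mutex> lock(mu_);
    // The "wait for prior commit" step: block until every transaction with a
    // smaller sequence number has been written.
    cv_.wait(lock, [&] { return next_seq_ == seq; });
    log_.push_back(event);
    ++next_seq_;       // the following transaction may now write
    cv_.notify_all();  // wake any waiters so they can re-check their turn
  }

  std::vector<std::string> contents() {
    std::lock_guard<std::mutex> lock(mu_);
    return log_;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  unsigned long next_seq_ = 0;  // sequence number allowed to write next
  std::vector<std::string> log_;
};
```

      Here the wait and the write happen under one lock, so a writer cannot be
      overtaken. The sketch does not model the early wakeup done by the
      optimised group commit code, which is the remaining race described above.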
    • Revert incorrect/redundant fix for old BUG#34656 · 55791c1a
      Kristian Nielsen authored
      The real bug was that open_tables() returned an error in case of
      thd->killed() without properly calling thd->send_kill_message()
      to set the correct error. This was fixed some time ago.
      
      So remove the now-redundant extra checks for thd->is_error(). This may
      allow debug builds to catch more cases of incorrect error handling.
    • MDEV-7101: SAFE_MUTEX lock order warning when reusing wait_for_commit mutex · fbc8768c
      Kristian Nielsen authored
      In SAFE_MUTEX builds, reset the wait_for_commit mutex (destroy and
      re-initialise), so that SAFE_MUTEX lock order check does not become
      confused when the mutex is re-used for a different purpose.
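      A sketch of the reset using POSIX mutexes, assuming a hypothetical
      WaitForCommit class and a SAFE_MUTEX macro that is defined only in debug
      builds (for example with -DSAFE_MUTEX). Destroying and re-initialising the
      mutex lets a checker that tracks lock order per mutex start afresh when
      the object is reused; in other builds the reset is skipped.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical object whose mutex is reused for different purposes.
class WaitForCommit {
 public:
  WaitForCommit() { pthread_mutex_init(&mutex_, nullptr); }
  ~WaitForCommit() { pthread_mutex_destroy(&mutex_); }
  WaitForCommit(const WaitForCommit &) = delete;
  WaitForCommit &operator=(const WaitForCommit &) = delete;

  // Called before the object is reused; the mutex must not be locked here.
  // Returns true if the mutex was actually reset.
  bool reinit() {
#ifdef SAFE_MUTEX
    // Debug build: drop any lock-order history tied to the previous use by
    // destroying the mutex and initialising it afresh.
    pthread_mutex_destroy(&mutex_);
    pthread_mutex_init(&mutex_, nullptr);
    return true;
#else
    return false;  // other builds: the mutex is simply reused as-is
#endif
  }

  pthread_mutex_t *mutex() { return &mutex_; }

 private:
  pthread_mutex_t mutex_;
};
```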
    • MDEV-7035: Remove innodb_io_capacity setting depending on setting of innodb_io_capacity_max · 0f322994
      Jan Lindström authored
      
      (a) Changed the behaviour so that if innodb_io_capacity is set to a value
      greater than innodb_io_capacity_max, the value is accepted AND
      innodb_io_capacity_max is set to innodb_io_capacity * 2.
      
      (b) If innodb_io_capacity_max is reduced below innodb_io_capacity, then
      innodb_io_capacity should be reduced to the same value as
      innodb_io_capacity_max.
      
      In both cases, a warning is given to the user.
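      The two rules can be sketched as plain functions. The IoCapacity struct,
      the setter names and the warning texts are hypothetical, not InnoDB's
      actual system-variable update hooks; only the arithmetic follows the
      description above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical holder for the two variables plus collected warnings.
struct IoCapacity {
  unsigned long capacity;      // innodb_io_capacity
  unsigned long capacity_max;  // innodb_io_capacity_max
  std::vector<std::string> warnings;
};

// (a) A value above innodb_io_capacity_max is accepted, and
//     innodb_io_capacity_max becomes innodb_io_capacity * 2.
void set_io_capacity(IoCapacity &s, unsigned long value) {
  if (value > s.capacity_max) {
    s.capacity_max = value * 2;
    s.warnings.push_back("innodb_io_capacity_max raised to " +
                         std::to_string(s.capacity_max));
  }
  s.capacity = value;
}

// (b) Lowering innodb_io_capacity_max below innodb_io_capacity also lowers
//     innodb_io_capacity to the same value.
void set_io_capacity_max(IoCapacity &s, unsigned long value) {
  if (value < s.capacity) {
    s.capacity = value;
    s.warnings.push_back("innodb_io_capacity lowered to " +
                         std::to_string(s.capacity));
  }
  s.capacity_max = value;
}
```

      For example, starting from 200/2000, setting innodb_io_capacity to 3000
      yields 3000/6000 with one warning; then setting innodb_io_capacity_max to
      1000 yields 1000/1000 with a second warning.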
    • MDEV-7100: InnoDB error monitor might unnecessarily wait for log_sys mutex · bff2d46b
      Jan Lindström authored
      Analysis: The InnoDB error monitor is responsible for calling
      sync_arr_wake_threads_if_sema_free() every second to wake up possibly
      hanging threads if their wakeup was missed in mutex_signal_object. This is
      not possible if the error monitor is itself waiting on a mutex or
      semaphore, so all unnecessary mutex/semaphore waits in the error monitor
      should be avoided. Currently the error monitor calls
      buf_flush_stat_update(), which calls log_get_lsn(), which tries to acquire
      the log_sys mutex. A better solution for the error monitor is for
      buf_flush_stat_update() to try to get the lsn with mutex_enter_nowait(),
      and to skip updating the stats if it does not get the mutex.
      
      Fix: Use the log_get_lsn_nowait() function in buf_flush_stat_update(). If
      the returned lsn is 0, the flush stats are not updated.
      log_get_lsn_nowait() uses mutex_enter_nowait(): if it gets the mutex, it
      returns the correct lsn; if not, it returns 0.
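      A sketch of the non-blocking read, using std::mutex::try_lock() as a
      stand-in for mutex_enter_nowait(). LogSys, FlushStats and the simplified
      bodies of log_get_lsn_nowait() and buf_flush_stat_update() are
      hypothetical models, not InnoDB's code. As in the description, an lsn of
      0 means that the mutex was busy and the update is skipped.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

typedef std::uint64_t lsn_t;

// Hypothetical model of log_sys: an LSN protected by a mutex.
struct LogSys {
  std::mutex mutex;
  lsn_t lsn = 0;
};

// Return the current LSN if the mutex can be taken without waiting, else 0.
// Must not be called by a thread that already holds log.mutex.
lsn_t log_get_lsn_nowait(LogSys &log) {
  if (!log.mutex.try_lock()) return 0;  // busy: never block the caller
  lsn_t lsn = log.lsn;
  log.mutex.unlock();
  return lsn;
}

// Hypothetical flush statistics, updated periodically by the error monitor.
struct FlushStats {
  lsn_t last_lsn = 0;
  unsigned updates = 0;
};

void buf_flush_stat_update(LogSys &log, FlushStats &stats) {
  lsn_t lsn = log_get_lsn_nowait(log);
  if (lsn == 0) return;  // could not get the mutex: skip this round
  stats.last_lsn = lsn;
  ++stats.updates;
}
```

      Skipping one update only makes the statistics slightly stale, whereas
      blocking would keep the error monitor from waking up hanging threads.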
    • MDEV-7083: sys_vars.innodb_sched_priority* tests fail in buildbot on work-amd64-valgrind · 84f3f3fa
      Jan Lindström authored
      
      Fixed the issue by first finding out the priority currently used by both
      threads, and using it to check whether the priority was really changed.
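      A sketch of the check using the POSIX getpriority() and setpriority()
      calls on the nice value of the calling process (on Linux, the calling
      thread). try_change_nice() and PriorityChange are hypothetical, and the
      server's tests concern InnoDB threads rather than this process. The value
      in effect is read before the request, so the caller can see whether the
      request really changed the priority instead of assuming that it did.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/resource.h>

// Outcome of one priority change attempt.
struct PriorityChange {
  int before;    // nice value read before the request
  int after;     // nice value read after the request
  bool changed;  // whether the request actually took effect
};

// Read the current nice value. getpriority() may legally return -1, so errno
// is cleared first and checked to tell a real error from a valid -1.
static bool read_nice(int &out) {
  errno = 0;
  int v = getpriority(PRIO_PROCESS, 0);
  if (v == -1 && errno != 0) return false;
  out = v;
  return true;
}

// Hypothetical helper: request a nice value, then compare the value actually
// in effect with the one read beforehand.
bool try_change_nice(int requested, PriorityChange &res) {
  if (!read_nice(res.before)) return false;
  setpriority(PRIO_PROCESS, 0, requested);  // may fail without privileges
  if (!read_nice(res.after)) return false;
  res.changed = res.after != res.before;
  return true;
}
```

      Raising the nice value (lowering the priority) is normally permitted,
      while lowering it usually requires privileges, which is exactly the case
      where comparing with the previously read value matters.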
  2. 12 Nov, 2014 5 commits
  3. 11 Nov, 2014 1 commit
  4. 10 Nov, 2014 1 commit
  5. 03 Nov, 2014 2 commits
  6. 02 Nov, 2014 2 commits
  7. 01 Nov, 2014 1 commit
  8. 31 Oct, 2014 2 commits
    • Cleanup. · ee309b10
      unknown authored
    • Fix sporadic test failure in main.processlist · bad5fdec
      Kristian Nielsen authored
      The test runs a query in one thread, then in another queries the processlist
      and expects to find the first thread in the COM_SLEEP state. The problem is
      that the thread signals completion to the client before changing to COM_SLEEP
      state, so there is a window where the other thread can see the wrong state.
      
      A previous attempt to fix this was ineffective. It set a DEBUG_SYNC point to
      do the waiting, but unfortunately that DEBUG_SYNC point was already
      triggered at the end of the SET DEBUG_SYNC=xxx statement itself, so the
      wait had no effect.
      
      Fix it properly now (hopefully) by ensuring that we wait for the DEBUG_SYNC
      point to trigger at the end of the SELECT SLEEP(), not just at the end of
      SET DEBUG_SYNC=xxx.
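      The race and the fix can be modelled with two threads. SyncPoint stands in
      for a DEBUG_SYNC signal and Connection for the processlist entry; both, as
      well as run_query() and observe_state_after_query(), are hypothetical. The
      worker signals its reply before switching to Sleep, so only the signal
      raised after the state change is a safe point for the observer.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical one-shot signal, similar in spirit to a DEBUG_SYNC point.
class SyncPoint {
 public:
  void signal() {
    std::lock_guard<std::mutex> lock(mu_);
    fired_ = true;
    cv_.notify_all();
  }
  void wait() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return fired_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool fired_ = false;
};

// Hypothetical connection whose state other threads can read.
struct Connection {
  std::mutex mu;
  std::string state = "Query";

  void set_state(const std::string &s) {
    std::lock_guard<std::mutex> lock(mu);
    state = s;
  }
  std::string get_state() {
    std::lock_guard<std::mutex> lock(mu);
    return state;
  }
};

// Worker: reply to the client first, then switch to Sleep, then fire the
// sync point that an observer should wait for.
void run_query(Connection &c, SyncPoint &replied, SyncPoint &after_sleep) {
  replied.signal();      // the client sees the result here...
  c.set_state("Sleep");  // ...but the state only changes afterwards
  after_sleep.signal();  // safe point to inspect the state
}

// Observer: wait for the sync point reached after the state change, then
// read the state as SHOW PROCESSLIST would. Waiting on `replied` instead
// would reintroduce the race.
std::string observe_state_after_query() {
  Connection c;
  SyncPoint replied, after_sleep;
  std::thread worker([&] { run_query(c, replied, after_sleep); });
  after_sleep.wait();
  std::string seen = c.get_state();
  worker.join();
  return seen;
}
```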
  9. 30 Oct, 2014 2 commits
  10. 29 Oct, 2014 4 commits
  11. 28 Oct, 2014 7 commits
  12. 27 Oct, 2014 2 commits
  13. 26 Oct, 2014 2 commits