1. 12 May, 2017 1 commit
  2. 31 Mar, 2017 2 commits
  3. 23 Mar, 2017 1 commit
  4. 17 Mar, 2017 1 commit
  5. 14 Mar, 2017 1 commit
  6. 02 Mar, 2017 1 commit
  7. 21 Feb, 2017 1 commit
    • Julien Muchembled's avatar
      Implement deadlock avoidance · 092992db
      Julien Muchembled authored
      This is a first version with several optimizations possible:
      - improve EventQueue (or implement a specific queue) to minimize deadlocks
      - turn the RebaseObject packet into a notification
      
      Sorting oids could also be useful to reduce the probability of deadlocks,
      but that would never be enough to avoid them completely, even if there's a
      single storage. For example:
      
      1. C1 does a first store (x or y)
      2. C2 stores x and y; one is delayed
      3. C1 stores the other -> deadlock
         When solving the deadlock, the data of the first store may only
         exist on the storage.
      
      2 functional tests are removed because they're redundant,
      either with ZODB tests or with the new threaded tests.
      092992db
  8. 18 Jan, 2017 1 commit
  9. 26 Dec, 2016 3 commits
  10. 01 Dec, 2016 1 commit
  11. 27 Nov, 2016 1 commit
  12. 17 Oct, 2016 1 commit
    • Kirill Smelkov's avatar
      mysql: force _getNextTID() to use appropriate/whole index · eaa00a88
      Kirill Smelkov authored
      Similarly to 13911ca3 on the same instance after MariaDB was upgraded to
      10.1.17 the following query, even after `OPTIMIZE TABLE obj`, started to execute
      very slowly:
      
          MariaDB [(none)]> SELECT tid FROM neo1.obj WHERE `partition`=5 AND oid=79613 AND tid>268707071353462798 ORDER BY tid LIMIT 1;
          +--------------------+
          | tid                |
          +--------------------+
          | 268707072758797063 |
          +--------------------+
          1 row in set (4.82 sec)
      
      Both explain and analyze says the query will/is using `partition` key but only partially (note key_len is only 10, not 18):
      
          MariaDB [(none)]> SHOW INDEX FROM neo1.obj;
          +-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
          | Table | Non_unique | Key_name  | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
          +-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
          | obj   |          0 | PRIMARY   |            1 | partition   | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
          | obj   |          0 | PRIMARY   |            2 | tid         | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
          | obj   |          0 | PRIMARY   |            3 | oid         | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
          | obj   |          0 | partition |            1 | partition   | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
          | obj   |          0 | partition |            2 | oid         | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
          | obj   |          0 | partition |            3 | tid         | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
          | obj   |          1 | data_id   |            1 | data_id     | A         |    28755928 |     NULL | NULL   | YES  | BTREE      |         |               |
          +-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
          7 rows in set (0.00 sec)
      
          MariaDB [(none)]> explain SELECT tid FROM neo1.obj WHERE `partition`=5 AND oid=79613 AND tid>268707071353462798 ORDER BY tid LIMIT 1;
          +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+--------------------------+
          | id   | select_type | table | type | possible_keys     | key       | key_len | ref         | rows | Extra                    |
          +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+--------------------------+
          |    1 | SIMPLE      | obj   | ref  | PRIMARY,partition | partition | 10      | const,const |    2 | Using where; Using index |
          +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+--------------------------+
          1 row in set (0.00 sec)
      
          MariaDB [(none)]> analyze SELECT tid FROM neo1.obj WHERE `partition`=5 AND oid=79613 AND tid>268707071353462798 ORDER BY tid LIMIT 1;
          +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+------------+----------+------------+--------------------------+
          | id   | select_type | table | type | possible_keys     | key       | key_len | ref         | rows | r_rows     | filtered | r_filtered | Extra                    |
          +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+------------+----------+------------+--------------------------+
          |    1 | SIMPLE      | obj   | ref  | PRIMARY,partition | partition | 10      | const,const |    2 | 9741121.00 |   100.00 |       0.00 | Using where; Using index |
          +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+------------+----------+------------+--------------------------+
          1 row in set (4.93 sec)
      
      By explicitly forcing (partition, oid, tid) index usage which is precisely designed to serve this and similar queries can avoid the query from being slow:
      
          MariaDB [(none)]> analyze SELECT tid FROM neo1.obj FORCE INDEX(`partition`) WHERE `partition`=5 AND oid=79613 AND tid>268707071353462798 ORDER BY tid LIMIT 1;
          +------+-------------+-------+-------+---------------+-----------+---------+------+------+--------+----------+------------+--------------------------+
          | id   | select_type | table | type  | possible_keys | key       | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra                    |
          +------+-------------+-------+-------+---------------+-----------+---------+------+------+--------+----------+------------+--------------------------+
          |    1 | SIMPLE      | obj   | range | partition     | partition | 18      | NULL |    2 |   1.00 |   100.00 |     100.00 | Using where; Using index |
          +------+-------------+-------+-------+---------------+-----------+---------+------+------+--------+----------+------------+--------------------------+
          1 row in set (0.00 sec)
      
      /cc @jm, @vpelltier, @Tyagov
      
      /reviewed-on !1
      eaa00a88
  13. 29 Aug, 2016 1 commit
    • Julien Muchembled's avatar
      mysql: fix use of wrong SQL index when checking for dropped partitions · 13911ca3
      Julien Muchembled authored
      After partitions were dropped with TokuDB, we had a case where MariaDB 10.1.14
      stopped using the most appropriate index.
      
      MariaDB [neo0]> explain SELECT DISTINCT data_id FROM obj WHERE `partition`=5;
      +------+-------------+-------+-------+-------------------+---------+---------+------+------+---------------------------------------+
      | id   | select_type | table | type  | possible_keys     | key     | key_len | ref  | rows | Extra                                 |
      +------+-------------+-------+-------+-------------------+---------+---------+------+------+---------------------------------------+
      |    1 | SIMPLE      | obj   | range | PRIMARY,partition | data_id | 11      | NULL |   10 | Using where; Using index for group-by |
      +------+-------------+-------+-------+-------------------+---------+---------+------+------+---------------------------------------+
      MariaDB [neo0]> SELECT SQL_NO_CACHE DISTINCT data_id FROM obj WHERE `partition`=5;
      Empty set (1 min 51.47 sec)
      
      Expected:
      
      MariaDB [neo1]> explain SELECT DISTINCT data_id FROM obj WHERE `partition`=4;
      +------+-------------+-------+------+-------------------+---------+---------+-------+------+------------------------------+
      | id   | select_type | table | type | possible_keys     | key     | key_len | ref   | rows | Extra                        |
      +------+-------------+-------+------+-------------------+---------+---------+-------+------+------------------------------+
      |    1 | SIMPLE      | obj   | ref  | PRIMARY,partition | PRIMARY | 2       | const |    1 | Using where; Using temporary |
      +------+-------------+-------+------+-------------------+---------+---------+-------+------+------------------------------+
      1 row in set (0.00 sec)
      MariaDB [neo1]> SELECT SQL_NO_CACHE DISTINCT data_id FROM obj WHERE `partition`=4;
      Empty set (0.00 sec)
      
      Restarting the server or 'OPTIMIZE TABLE obj; ' does not help.
      
      Such issue could prevent the cluster to start due to timeouts, by always going
      back to RECOVERING state.
      13911ca3
  14. 11 Aug, 2016 1 commit
    • Julien Muchembled's avatar
      mysql: do not use unsafe TRUNCATE statement · c3c2ffe2
      Julien Muchembled authored
      TRUNCATE was chosen for performance reasons, but it's usually done on small
      tables, and not for performance-critical operations. TRUNCATE commits
      implicitely, so for pt/ttrans in particular, it's certainly slower due to extra
      fsyncs to disk.
      
      On the other side, committing too early can corrupt the database if the storage
      node is stopped just after. For example, a failure in changePartitionTable()
      can cause 'pt' to remain empty.
      c3c2ffe2
  15. 04 Mar, 2016 2 commits
  16. 25 Jan, 2016 1 commit
  17. 13 Dec, 2015 1 commit
  18. 30 Nov, 2015 2 commits
    • Julien Muchembled's avatar
      Minimize the amount of work during tpc_finish · 7eb7cf1b
      Julien Muchembled authored
      NEO did not ensure that all data and metadata are written on disk before
      tpc_finish, and it was for example vulnerable to ENOSPC errors.
      In other words, some work had to be moved to tpc_vote:
      
      - In tpc_vote, all involved storage nodes are now asked to write all metadata
        to ttrans/tobj and _commit_. Because the final tid is not known yet, the tid
        column of ttrans and tobj now contains NULL and the ttid respectively.
      
      - In tpc_finish, AskLockInformation is still required for read locking,
        ttrans.tid is updated with the final value and this change is _committed_.
      
      - The verification phase is greatly simplified, more reliable and faster. For
        all voted transactions, we can know if a tpc_finish was started by getting
        the final tid from the ttid, either from ttrans or from trans. And we know
        that such transactions can't be partial so we don't need to check oids.
      
      So in addition to minimizing the risk of failures during tpc_finish, we also
      fix a bug causing the verification phase to discard transactions with objects
      for which readCurrent was called.
      
      On performance side:
      
      - Although tpc_vote now asks all involved storages, instead of only those
        storing the transaction metadata, the client has been improved to do this
        in parallel. The additional commits are also all done in parallel.
      
      - A possible improvement to compensate the additional commits is to delay the
        commit done by the unlock.
      
      - By minimizing the time to lock transactions, objects are read-locked for a
        much shorter period. This is even more important that locked transactions
        must be unlocked in the same order.
      
      Transactions with too many modified objects will now timeout inside tpc_vote
      instead of tpc_finish. Of course, such transactions may still cause other
      transaction to timeout in tpc_finish.
      7eb7cf1b
    • Julien Muchembled's avatar
      fixup! storage: fix pruning of data when deleting partial transactions during verification · cff279af
      Julien Muchembled authored
      This fixes a regression in commit 83fe64bf
      when ttrans has several rows to the same data_id.
      cff279af
  19. 25 Nov, 2015 1 commit
  20. 03 Nov, 2015 1 commit
  21. 26 Oct, 2015 1 commit
  22. 19 Oct, 2015 2 commits
  23. 02 Oct, 2015 1 commit
  24. 24 Jun, 2015 2 commits
  25. 15 Jun, 2015 1 commit
  26. 09 Jun, 2015 2 commits
  27. 21 May, 2015 1 commit
  28. 05 May, 2015 1 commit
  29. 07 Nov, 2014 1 commit
  30. 30 Jul, 2014 1 commit
  31. 22 Jul, 2014 1 commit
  32. 08 Jul, 2014 1 commit