Commits · 349f1e7de49a624788d076afb627f04be7bcb549 · Kirill Smelkov / neo

28 Oct, 2016 2 commits
- Merge remote-tracking branch 'origin/master' into t · 349f1e7d
  Kirill Smelkov authored Oct 28, 2016
```
* origin/master:
  neoctl: make 'print ids' command display time of TIDs
  mysql: force _getNextTID() to use appropriate/whole index
```
  349f1e7d
- . · 16d84f10
  Kirill Smelkov authored Oct 28, 2016
  
  16d84f10
27 Oct, 2016 1 commit

neoctl: make 'print ids' command display time of TIDs · d9dd39f0

Iliya Manolov authored Oct 12, 2016

Currently, the command "neoctl [arguments] print ids" has the following output:

    last_oid = 0x...
    last_tid = 0x...
    last_ptid = ...

or

    backup_tid = 0x...
    last_tid = 0x...
    last_ptid = ...

depending on whether the cluster is in normal or backup mode.

This is hard to read, since the admin is often interested in the time that corresponds to each TID. Now the output is:

    last_oid = 0x...
    last_tid = 0x... (yyyy-mm-dd hh:mm:ss.ssssss)
    last_ptid = ...

or

    backup_tid = 0x... (yyyy-mm-dd hh:mm:ss.ssssss)
    last_tid = 0x... (yyyy-mm-dd hh:mm:ss.ssssss)
    last_ptid = ...
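
As a rough illustration of where the timestamp comes from, the sketch below decodes a TID by hand. It assumes the usual ZODB TimeStamp layout (high 32 bits count minutes since 1900 using 31-day months, low 32 bits hold the fraction of a minute). The helper names `tid_to_datetime` and `format_tid` are made up for this example and are not the functions neoctl uses.

```python
from datetime import datetime, timedelta


def tid_to_datetime(tid: int) -> datetime:
    """Decode a 64-bit TID into a naive datetime (assumed ZODB TimeStamp layout)."""
    high, low = divmod(tid, 1 << 32)
    # High 32 bits: minutes, counted with 24-hour days, 31-day months, 12-month years.
    high, minute = divmod(high, 60)
    high, hour = divmod(high, 24)
    high, day = divmod(high, 31)
    year, month = divmod(high, 12)
    # Low 32 bits: position inside the minute, scaled so that 2**32 means 60 seconds.
    seconds = low * 60 / 2**32
    return datetime(1900 + year, month + 1, day + 1, hour, minute) + timedelta(seconds=seconds)


def format_tid(name: str, tid: int) -> str:
    """Render one line in the style of the new 'print ids' output."""
    return "%s = 0x%016x (%s)" % (name, tid, tid_to_datetime(tid))


# TID taken from the MariaDB session quoted in commit eaa00a88.
print(format_tid("last_tid", 268707072758797063))
```

The decoded date of that TID falls on 2016-10-16, which matches the day the MariaDB commit was authored.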

/reviewed-on nexedi/neoppod!2

d9dd39f0

25 Oct, 2016 1 commit
- . · fb8eaf3b
  Kirill Smelkov authored Oct 25, 2016
  
  fb8eaf3b
24 Oct, 2016 3 commits
- . · d58435ed
  Kirill Smelkov authored Oct 24, 2016
  
  d58435ed
- . · 9480f2e3
  Kirill Smelkov authored Oct 24, 2016
  
  9480f2e3
- . · 5e0639c1
  Kirill Smelkov authored Oct 24, 2016
  
  5e0639c1
21 Oct, 2016 1 commit
- . · e4059423
  Kirill Smelkov authored Oct 21, 2016
  
  e4059423
19 Oct, 2016 2 commits
- XY Allow read-only access in BACKINGUP state (for now master-only & draft) · 095f166c
  Kirill Smelkov authored Oct 19, 2016
  
  095f166c
- . · 67041723
  Kirill Smelkov authored Oct 19, 2016
  
  67041723
18 Oct, 2016 5 commits
- . · d94f146b
  Kirill Smelkov authored Oct 18, 2016
  
  d94f146b
- . · 5300d582
  Kirill Smelkov authored Oct 18, 2016
  
  5300d582
- X writer to N1 · e2a713b7
  Kirill Smelkov authored Oct 18, 2016
  
  e2a713b7
- . · c89ed4a6
  Kirill Smelkov authored Oct 18, 2016
  
  c89ed4a6
- . · 902c76b3
  Kirill Smelkov authored Oct 18, 2016
  
  902c76b3
17 Oct, 2016 1 commit

mysql: force _getNextTID() to use appropriate/whole index · eaa00a88

Kirill Smelkov authored Oct 16, 2016

Similarly to 13911ca3, on the same instance, after MariaDB was upgraded to
10.1.17, the following query started to execute very slowly, even after
`OPTIMIZE TABLE obj`:

    MariaDB [(none)]> SELECT tid FROM neo1.obj WHERE `partition`=5 AND oid=79613 AND tid>268707071353462798 ORDER BY tid LIMIT 1;
    +--------------------+
    | tid                |
    +--------------------+
    | 268707072758797063 |
    +--------------------+
    1 row in set (4.82 sec)

Both EXPLAIN and ANALYZE show that the query uses the `partition` key, but only partially (note that key_len is only 10, not 18):

    MariaDB [(none)]> SHOW INDEX FROM neo1.obj;
    +-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
    | Table | Non_unique | Key_name  | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
    +-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
    | obj   |          0 | PRIMARY   |            1 | partition   | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
    | obj   |          0 | PRIMARY   |            2 | tid         | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
    | obj   |          0 | PRIMARY   |            3 | oid         | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
    | obj   |          0 | partition |            1 | partition   | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
    | obj   |          0 | partition |            2 | oid         | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
    | obj   |          0 | partition |            3 | tid         | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
    | obj   |          1 | data_id   |            1 | data_id     | A         |    28755928 |     NULL | NULL   | YES  | BTREE      |         |               |
    +-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
    7 rows in set (0.00 sec)

    MariaDB [(none)]> explain SELECT tid FROM neo1.obj WHERE `partition`=5 AND oid=79613 AND tid>268707071353462798 ORDER BY tid LIMIT 1;
    +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+--------------------------+
    | id   | select_type | table | type | possible_keys     | key       | key_len | ref         | rows | Extra                    |
    +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+--------------------------+
    |    1 | SIMPLE      | obj   | ref  | PRIMARY,partition | partition | 10      | const,const |    2 | Using where; Using index |
    +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+--------------------------+
    1 row in set (0.00 sec)

    MariaDB [(none)]> analyze SELECT tid FROM neo1.obj WHERE `partition`=5 AND oid=79613 AND tid>268707071353462798 ORDER BY tid LIMIT 1;
    +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+------------+----------+------------+--------------------------+
    | id   | select_type | table | type | possible_keys     | key       | key_len | ref         | rows | r_rows     | filtered | r_filtered | Extra                    |
    +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+------------+----------+------------+--------------------------+
    |    1 | SIMPLE      | obj   | ref  | PRIMARY,partition | partition | 10      | const,const |    2 | 9741121.00 |   100.00 |       0.00 | Using where; Using index |
    +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+------------+----------+------------+--------------------------+
    1 row in set (4.93 sec)
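
The two key_len values can be reproduced with simple arithmetic. The sketch below assumes that `partition` is a 2-byte integer and that `oid` and `tid` are 8-byte integers, all NOT NULL. These widths are inferred from the key_len figures in the EXPLAIN output, not read from the schema.

```python
# Assumed byte widths of the columns of the `partition` index (all NOT NULL).
COLUMN_BYTES = {"partition": 2, "oid": 8, "tid": 8}
INDEX_COLUMNS = ("partition", "oid", "tid")


def key_len(used_columns: int) -> int:
    """Bytes of the index key used when only the first N columns take part in the lookup."""
    return sum(COLUMN_BYTES[name] for name in INDEX_COLUMNS[:used_columns])


for n in range(1, len(INDEX_COLUMNS) + 1):
    print(INDEX_COLUMNS[:n], key_len(n))
```

Under these assumptions, a key_len of 10 means only (partition, oid) is used for the seek, and 18 means the `tid` column takes part as well.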

Explicitly forcing use of the (partition, oid, tid) index, which is designed precisely to serve this and similar queries, keeps the query fast:

    MariaDB [(none)]> analyze SELECT tid FROM neo1.obj FORCE INDEX(`partition`) WHERE `partition`=5 AND oid=79613 AND tid>268707071353462798 ORDER BY tid LIMIT 1;
    +------+-------------+-------+-------+---------------+-----------+---------+------+------+--------+----------+------------+--------------------------+
    | id   | select_type | table | type  | possible_keys | key       | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra                    |
    +------+-------------+-------+-------+---------------+-----------+---------+------+------+--------+----------+------------+--------------------------+
    |    1 | SIMPLE      | obj   | range | partition     | partition | 18      | NULL |    2 |   1.00 |   100.00 |     100.00 | Using where; Using index |
    +------+-------------+-------+-------+---------------+-----------+---------+------+------+--------+----------+------------+--------------------------+
    1 row in set (0.00 sec)
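
The difference between the two plans can be imitated on a sorted list of (partition, oid, tid) tuples. This is only a toy model of a B-tree index, not MariaDB's implementation, and `prefix_scan` and `full_seek` are hypothetical names. The first function seeks on the two-column prefix and filters `tid` row by row, the second seeks on all three columns.

```python
from bisect import bisect_left, bisect_right


def prefix_scan(index, partition, oid, tid):
    """Imitate key_len=10: seek on (partition, oid), then test tid on every entry."""
    pos = bisect_left(index, (partition, oid))
    examined = 0
    while pos < len(index) and index[pos][:2] == (partition, oid):
        examined += 1
        if index[pos][2] > tid:
            return index[pos][2], examined
        pos += 1
    return None, examined


def full_seek(index, partition, oid, tid):
    """Imitate key_len=18: seek straight to the first entry after (partition, oid, tid)."""
    pos = bisect_right(index, (partition, oid, tid))
    if pos < len(index) and index[pos][:2] == (partition, oid):
        return index[pos][2], 1
    return None, 0


# One object with many revisions, plus an unrelated neighbour in the same partition.
index = sorted([(5, 7, t) for t in range(100000)] + [(5, 8, 0)])
print(prefix_scan(index, 5, 7, 99998))
print(full_seek(index, 5, 7, 99998))
```

Both calls return the same tid, but the prefix scan examines every earlier revision of the object first. In this model, that corresponds to the large r_rows value in the slow plan.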

/cc @jm, @vpelltier, @Tyagov

/reviewed-on nexedi/neoppod!1

eaa00a88

12 Oct, 2016 1 commit
- . · 5dda5ec4
  Kirill Smelkov authored Oct 12, 2016
  
  5dda5ec4
11 Oct, 2016 1 commit
- . · fdb68d11
  Kirill Smelkov authored Oct 11, 2016
  
  fdb68d11
10 Oct, 2016 1 commit
- . · cbd19c0c
  Kirill Smelkov authored Oct 10, 2016
  
  cbd19c0c
29 Sep, 2016 1 commit
- . · 6095e132
  Kirill Smelkov authored Sep 29, 2016
  
  6095e132
27 Sep, 2016 1 commit
- . · 5fdeeb72
  Kirill Smelkov authored Sep 27, 2016
  
  5fdeeb72
23 Sep, 2016 1 commit
- X . · be8030b7
  Kirill Smelkov authored Sep 24, 2016
  
  be8030b7
22 Sep, 2016 2 commits
- X . · f6edf6ac
  Kirill Smelkov authored Sep 23, 2016
  
  f6edf6ac
- Merge remote-tracking branch 'origin/master' into t · a1018ff0
  Kirill Smelkov authored Sep 22, 2016
```
* origin/master:
  Add support for latest versions of ZODB (4.4.3 & 5.0.1)
```
  a1018ff0
21 Sep, 2016 1 commit
- X . · 1d5f8f4c
  Kirill Smelkov authored Sep 22, 2016
  
  1d5f8f4c
20 Sep, 2016 1 commit
- X My notes · c7705b64
  Kirill Smelkov authored Sep 20, 2016
  
  c7705b64
19 Sep, 2016 1 commit
- X test client + stats for mariadb while NEO running with load or with load+backup · d709e1ad
  Kirill Smelkov authored Sep 19, 2016
  
  d709e1ad
12 Sep, 2016 1 commit

Add support for latest versions of ZODB (4.4.3 & 5.0.1) · c39d5c67

Julien Muchembled authored Jun 15, 2016

Many patches have been merged upstream :)

A notable change is that lastTransaction() does not ping the master anymore
(but it still causes a connection to the master if the client is disconnected).

c39d5c67

29 Aug, 2016 2 commits

mysql: fix use of wrong SQL index when checking for dropped partitions · 13911ca3

Julien Muchembled authored Aug 29, 2016

After partitions were dropped with TokuDB, we had a case where MariaDB 10.1.14
stopped using the most appropriate index.

    MariaDB [neo0]> explain SELECT DISTINCT data_id FROM obj WHERE `partition`=5;
    +------+-------------+-------+-------+-------------------+---------+---------+------+------+---------------------------------------+
    | id   | select_type | table | type  | possible_keys     | key     | key_len | ref  | rows | Extra                                 |
    +------+-------------+-------+-------+-------------------+---------+---------+------+------+---------------------------------------+
    |    1 | SIMPLE      | obj   | range | PRIMARY,partition | data_id | 11      | NULL |   10 | Using where; Using index for group-by |
    +------+-------------+-------+-------+-------------------+---------+---------+------+------+---------------------------------------+
    MariaDB [neo0]> SELECT SQL_NO_CACHE DISTINCT data_id FROM obj WHERE `partition`=5;
    Empty set (1 min 51.47 sec)

Expected:

    MariaDB [neo1]> explain SELECT DISTINCT data_id FROM obj WHERE `partition`=4;
    +------+-------------+-------+------+-------------------+---------+---------+-------+------+------------------------------+
    | id   | select_type | table | type | possible_keys     | key     | key_len | ref   | rows | Extra                        |
    +------+-------------+-------+------+-------------------+---------+---------+-------+------+------------------------------+
    |    1 | SIMPLE      | obj   | ref  | PRIMARY,partition | PRIMARY | 2       | const |    1 | Using where; Using temporary |
    +------+-------------+-------+------+-------------------+---------+---------+-------+------+------------------------------+
    1 row in set (0.00 sec)
    MariaDB [neo1]> SELECT SQL_NO_CACHE DISTINCT data_id FROM obj WHERE `partition`=4;
    Empty set (0.00 sec)

Restarting the server or running 'OPTIMIZE TABLE obj;' does not help.

Such an issue could prevent the cluster from starting due to timeouts, by always
going back to the RECOVERING state.

13911ca3

Update TODO · 00ffb1ef
Julien Muchembled authored Aug 29, 2016

00ffb1ef

11 Aug, 2016 2 commits

Add test to check that a moved cell doesn't cause POSKeyError · df990a05
Julien Muchembled authored Aug 11, 2016
```
Freeing disk space when a cell is dropped will have to be implemented with care,
not only for performance reasons.
```
df990a05

mysql: do not use unsafe TRUNCATE statement · c3c2ffe2

Julien Muchembled authored Aug 11, 2016

TRUNCATE was chosen for performance reasons, but it's usually done on small
tables, and not for performance-critical operations. TRUNCATE commits
implicitly, so for pt/ttrans in particular, it's certainly slower due to extra
fsyncs to disk.

On the other hand, committing too early can corrupt the database if the storage
node is stopped just after. For example, a failure in changePartitionTable()
can cause 'pt' to remain empty.
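
The safety argument can be shown with a small stand-in. The sketch below uses Python's `sqlite3` rather than MySQL, and the function name and the `pt` column layout are illustrative only. It shows that a DELETE stays inside the surrounding transaction, so a failure before the new rows are written rolls everything back instead of leaving the table empty.

```python
import sqlite3


def replace_partition_table(conn, rows, fail=False):
    """Replace the content of 'pt' atomically (hypothetical helper, not NEO code)."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("DELETE FROM pt")  # transactional, unlike TRUNCATE in MySQL
        if fail:
            raise RuntimeError("simulated failure before the new rows are written")
        conn.executemany("INSERT INTO pt VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pt (rid INTEGER, nid INTEGER, state INTEGER)")
replace_partition_table(conn, [(0, 1, 0), (1, 2, 0)])

try:
    replace_partition_table(conn, [(0, 3, 0)], fail=True)
except RuntimeError:
    pass

# The failed update was rolled back, so the two original rows are still there.
remaining = conn.execute("SELECT COUNT(*) FROM pt").fetchone()[0]
print(remaining)
```

With a statement that commits implicitly, the same failure would leave `pt` empty on disk.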

c3c2ffe2

01 Aug, 2016 2 commits
- storage: speed up transaction registration · e25fa5d9
  Julien Muchembled authored Aug 01, 2016
  
  e25fa5d9
- storage: remove uuid index in TransactionManager · c3d3dabd
  Julien Muchembled authored Aug 01, 2016
```
It slowed down everything but abortFor(), which is not performance critical.
```
  c3d3dabd
31 Jul, 2016 1 commit

storage: review TransactionManager.abortFor · 2d388048

Julien Muchembled authored Jul 31, 2016

This reverts commit 7aecdada partially.
There seems to be no bug here, because:
- abortFor() is only called upon a notification from the master that a client
  is disconnected,
- and from the same TCP connection, we only receive a LockInformation packet
  if there's still such a transaction on the master side.

The code removed in abortFor() was redundant with abort().

2d388048

27 Jul, 2016 5 commits

Reenable checkTransactionalUndoIterator · cb144fdb
Julien Muchembled authored Jul 27, 2016

cb144fdb

client: better exception handling in tpc_abort · 38583af9
Julien Muchembled authored Jul 27, 2016

38583af9

client: do not limit the number of open connections to storage nodes · 77132157

Julien Muchembled authored Jul 27, 2016

There was a bug where connections were not maintained during a TPC,
which caused transactions to be aborted when the limit was reached.

Given that oids are spread evenly over all partitions, and that clients always
write to all cells of each involved partition, clients would spend their time
reconnecting to storage nodes as soon as the limit is reached. So such a feature
really looks counter-productive.
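
A toy model makes the argument concrete. It assumes that an oid maps to a partition by a simple modulo and that each partition is replicated on two storage nodes. The layout and the function name are invented for this sketch and do not come from the NEO code base.

```python
def nodes_for_transaction(oids, num_partitions, cells):
    """Return the storage nodes a client must write to for the given oids.

    cells maps a partition number to the set of nodes holding a cell of it.
    """
    nodes = set()
    for oid in oids:
        partition = oid % num_partitions  # assumed oid -> partition mapping
        nodes |= cells[partition]         # writes go to every cell of the partition
    return nodes


# Illustrative layout: 12 partitions spread over 4 nodes, 2 cells per partition.
NUM_PARTITIONS, NUM_NODES = 12, 4
cells = {p: {p % NUM_NODES, (p + 1) % NUM_NODES} for p in range(NUM_PARTITIONS)}

# A transaction modifying only three objects already involves every node.
print(sorted(nodes_for_transaction([10, 11, 13], NUM_PARTITIONS, cells)))
```

In this layout, any connection limit below the number of storage nodes would be hit by even a small transaction.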

77132157

client: small optimization when iterating over storage connections · cfe1b5ca
Julien Muchembled authored Jul 27, 2016

cfe1b5ca

client: fix conflict of node id by never reading from storage without being connected to the master · 11d83ad9

Julien Muchembled authored Jul 26, 2016

Client nodes ignored the state of the connection to the master node when reading
data from storage, as long as their partition tables were recent enough. This
way, they were able to finish read-only transactions even if they couldn't reach
the master, which could be useful for high availability. The downside is that
the master node ignored that their node ids were still used, which causes "uuid"
conflicts when reallocating them.

Rejected solutions:
- An unused NEO Storage should not insist on staying connected to the master node.
- Reverting to big random node identifiers is a lot of work and it would make
  debugging annoying (see commit 23fad3af).
- Always increasing node ids could have been a simple solution if we accepted
  that the cluster dies once all 2^24 possible ids have been allocated.

Given that reading from storage without being connected to the master can only
be useful to finish the current transaction (because we always ping the master
at the beginning of every transaction), keeping such a feature is not worth the
effort.

This commit fixes id conflicts in a very simple way, by clearing the partition
table upon primary node failure, which forces reconnection to the master before
querying any storage node. In such a case, we raise a special exception that will
cause the transaction to be restarted, so that the user does not get errors for
temporary connection failures.
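
The control flow described above can be sketched as follows. Every name here (`MasterConnectionLost`, `Client`, `run_transaction`) is hypothetical and the classes are heavily simplified. The real client involves network handlers and ZODB's transaction retry machinery, none of which is modelled.

```python
class MasterConnectionLost(Exception):
    """Hypothetical exception asking the caller to restart the transaction."""


class Client:
    def __init__(self, partition_table):
        self._master_copy = dict(partition_table)  # what the master would send again
        self.partition_table = dict(partition_table)

    def on_primary_failure(self):
        # Forget the partition table: no storage node may be queried without it.
        self.partition_table = None

    def reconnect_to_master(self):
        # Reconnecting makes the master aware of this node id and resends the table.
        self.partition_table = dict(self._master_copy)

    def load(self, oid):
        if self.partition_table is None:
            raise MasterConnectionLost("reconnect to the master before reading")
        return self.partition_table[oid % len(self.partition_table)]


def run_transaction(client, oid, retries=3):
    """Restart the work when the connection to the master was lost."""
    for _ in range(retries):
        try:
            return client.load(oid)
        except MasterConnectionLost:
            client.reconnect_to_master()
    raise MasterConnectionLost("master still unreachable")


client = Client({0: "S1", 1: "S2"})
client.on_primary_failure()
print(run_transaction(client, 3))
```

The caller sees a retried transaction rather than an error, at the cost of giving up reads while disconnected from the master.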

11d83ad9