Commits · 4d3f37235fc6dee18752fc4dc75a44da5d0e85ae · Stefane Fermigier / neo

21 Dec, 2016 3 commits

storage: start replicating the partition which is furthest behind · 4d3f3723

Julien Muchembled authored Dec 21, 2016

This fixes the following case when the backup is far behind the upstream DB,
and there are transactions being committed at the same time:

1. replicate partition 0
2. replicate partition 0
3. replicate partition 1
4. replicate partition 0
5. replicate partition 1
6. replicate partition 2
7. replicate partition 0
...
and so on in a quadratic way.

When the upstream activity was too high, the backup could even be stuck looping
on the first partitions.
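
The selection policy named in the commit title can be pictured with the sketch below. It is only an illustration and not NEO's actual code: the function name `pick_partition` and the mapping from partition offsets to last replicated TIDs are hypothetical.

```python
def pick_partition(replicated_tid_by_offset):
    """Return the offset of the partition that is furthest behind.

    `replicated_tid_by_offset` is a hypothetical mapping
    {partition offset: last replicated TID}. The partition with the
    smallest TID is the one that lags the most.
    """
    if not replicated_tid_by_offset:
        return None
    # Compare on (tid, offset) so that ties are broken deterministically.
    return min(replicated_tid_by_offset,
               key=lambda offset: (replicated_tid_by_offset[offset], offset))
```

Always choosing the most outdated partition avoids restarting from partition 0 each time new transactions arrive, which is what produced the quadratic sequence listed above.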

4d3f3723

master: fix possibly wrong knowledge of cells' backup_tid when resuming backup · 17af3b47

Julien Muchembled authored Dec 20, 2016

The issue happens when there were commits while the backup cluster was down.
In this case, the master thinks that these commits are already replicated,
reporting a wrong backup_tid to neoctl. The issue resolved by itself once:
- there are new commits triggering replication for all partitions;
- all storage nodes have really replicated.

This also resulted in an inconsistent database when leaving backup mode during
this period.

17af3b47

Minor comment/doc changes · c95c6c39
Julien Muchembled authored Dec 20, 2016

c95c6c39

20 Dec, 2016 1 commit
- Release version 1.7.0 · 37f58489
  Julien Muchembled authored Dec 19, 2016
  
  37f58489
06 Dec, 2016 2 commits

master,client: ignore notifications before complete initialization · 36b2d141

Julien Muchembled authored Dec 06, 2016

A backup master crashed with the following traceback after a reconnection:

    Traceback (most recent call last):
      File "neo/master/app.py", line 127, in run
        self._run()
      File "neo/master/app.py", line 147, in _run
        self.playPrimaryRole()
      File "neo/master/app.py", line 348, in playPrimaryRole
        self.backup_app.provideService())
      File "neo/master/backup_app.py", line 123, in provideService
        poll(1)
      File "neo/lib/event.py", line 126, in poll
        to_process.process()
      File "neo/lib/connection.py", line 500, in process
        self._handlers.handle(self, self._queue.pop(0))
      File "neo/lib/connection.py", line 110, in handle
        self._handle(connection, packet)
      File "neo/lib/connection.py", line 125, in _handle
        handler.packetReceived(connection, packet)
      File "neo/lib/handler.py", line 117, in packetReceived
        self.dispatch(*args)
      File "neo/lib/handler.py", line 66, in dispatch
        method(conn, *args, **kw)
      File "neo/master/handlers/backup.py", line 52, in invalidateObjects
        app.invalidatePartitions(tid, partition_set)
      File "neo/master/backup_app.py", line 257, in invalidatePartitions
        self.triggerBackup(node)
      File "neo/master/backup_app.py", line 281, in triggerBackup
        assert cell_list, offset
    AssertionError: 0
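
The idea in the commit title (ignore notifications until initialization is complete) can be sketched as follows. This is an assumption-laden illustration, not the actual fix: the class `NotificationGate` and its attributes are hypothetical names.

```python
class NotificationGate(object):
    """Hypothetical helper that drops notifications received too early."""

    def __init__(self):
        self.initialized = False
        self.processed = []

    def notify(self, packet):
        """Process `packet` only once initialization is complete.

        Return True if the packet was processed, False if it was ignored.
        """
        if not self.initialized:
            # State such as the partition table is not known yet, so acting
            # on the notification could hit assertions like the one above.
            return False
        self.processed.append(packet)
        return True
```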

36b2d141

Update comment that was still showing UUIDs instead of node ids · 02292584
Julien Muchembled authored Dec 06, 2016

02292584

01 Dec, 2016 5 commits

Remove dead code found by coverage · 23b9544d
Julien Muchembled authored Dec 01, 2016

23b9544d

Remove some useless unit tests · 1e4a4178

Julien Muchembled authored Nov 28, 2016

Many "unit" tests (!= "threaded" tests) don't do more than checking
implementation details, and increase coverage artificially. As with testEvent
in commit 71e30fb9, most of these tests will
either be removed or rewritten as threaded tests.

The fact that the remaining unit tests actually cover code that other tests
don't gives motivation to maintain them. There will also be less code to update
when switching to https://pypi.python.org/pypi/mock

I proceeded as follows:

1. Measure coverage for all tests except unit tests. While checking my work,
   I found that coverage stats for threaded/functional/zodb tests are quite
   unstable, so I restarted from the beginning by doing this measure several
   times and only keeping the intersection of coverage data.

2. Measure coverage individually for each 'unit' test, and subtract the
   data from 1 from each result.

3. The candidates for deletion are those without any code covered.
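
The three steps above amount to simple set arithmetic, sketched below. This is not the script that was actually used: the function name and the representation of coverage data as sets of `(filename, line number)` pairs are assumptions made for the example.

```python
def deletion_candidates(baseline_runs, unit_coverage):
    """Return the unit tests that cover nothing beyond the baseline.

    baseline_runs: non-empty list of sets of (filename, lineno) pairs, one
        set per run of the non-unit tests (step 1, repeated several times).
    unit_coverage: dict {test name: set of (filename, lineno) pairs}
        measured individually for each unit test (step 2).
    """
    # Step 1: only keep lines covered in every run, since the stats are
    # unstable from one run to another.
    baseline = set.intersection(*[set(run) for run in baseline_runs])
    # Steps 2 and 3: subtract the baseline from each unit test's coverage;
    # tests with nothing left are candidates for deletion.
    return sorted(test for test, covered in unit_coverage.items()
                  if not set(covered) - baseline)
```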

Tests I didn't delete:

- neo.tests.master.testElectionHandler: I always do minimal changes about
  election, as long as there's no serious review.

- neo.tests.master.testMasterPT.MasterPartitionTableTests.test_13_outdate

- 4 tests in neo.tests.testPT:
  test_01_Cell, test_04_removeCell, test_06_clear, test_08_filled

- neo.tests.storage.testStorage{MySQL,SQLite}

- neo.tests.testUtil.UtilTests.testReadBufferRead

In a way, this commit is actually quite conservative. There are still many
useless tests that only check error paths and, for simple tested methods, this
just duplicates the tested code.

1e4a4178

Enable coverage for neo.tests, which is useful to find dead code · ed693968
Julien Muchembled authored Dec 01, 2016

ed693968
Remove unused imports, found by pylint · 3b5a6edb
Julien Muchembled authored Dec 01, 2016

3b5a6edb
TODO: tweak should be safer · e0a2a217
Julien Muchembled authored Dec 01, 2016

e0a2a217

30 Nov, 2016 3 commits

Various neoctl/neolog formatting improvements/fixes · 264f6f57

Julien Muchembled authored Nov 30, 2016

- format IPv6 inside [] when followed by :<port>
- unify rendering of node lists
- neoctl: do not crash on empty DB (no PT id)
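
The first item can be illustrated with a small sketch. `format_address` is a hypothetical name, not the function used in neoctl/neolog, and detecting IPv6 by the presence of ':' is a simplifying assumption.

```python
def format_address(host, port=None):
    """Render host[:port], putting IPv6 hosts inside [] when a port follows."""
    if port is None:
        return host
    if ':' in host:
        # Without brackets, the port could not be told apart from the
        # last group of the IPv6 address.
        host = '[%s]' % host
    return '%s:%s' % (host, port)
```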

264f6f57

debug: extend 'pdb' example to optionally break on an arbitrary list of callables · 933953e2
Julien Muchembled authored Nov 30, 2016

933953e2

client: fix simultaneous (re)connections to the master · ec031cdf

Julien Muchembled authored Nov 30, 2016

This fixes a regression in commit c39d5c67,
which could lead to failures like:

2016-11-29 09:56:58,756 ERROR ZODB.Connection Couldn't load state for 0x4843
Traceback (most recent call last):
  File "ZODB/Connection.py", line 860, in setstate
    self._setstate(obj)
  File "ZODB/Connection.py", line 901, in _setstate
    p, serial = self._storage.load(obj._p_oid, '')
  File "neo/client/Storage.py", line 82, in load
    return self.app.load(oid)[:2]
  File "neo/client/app.py", line 352, in load
    data, tid, next_tid, _ = self._loadFromStorage(oid, tid, before_tid)
  File "neo/client/app.py", line 372, in _loadFromStorage
    for node, conn in self.cp.iterateForObject(oid, readable=True):
  File "neo/client/pool.py", line 91, in iterateForObject
    pt = self.app.pt
  File "neo/client/app.py", line 146, in __getattr__
    return self.__getattribute__(attr)
AttributeError: 'Application' object has no attribute 'pt'

ec031cdf

28 Nov, 2016 4 commits
- storage: fix crash when a client loses connection to the master just before voting · 15472c62
  Julien Muchembled authored Nov 25, 2016
```
This is similar to commit 7aecdada
and for completeness, we also protect unlock the same way.
```
  15472c62
- no change: only some code reindentation · 298cb6c4
  Julien Muchembled authored Nov 28, 2016
```
For 2 previous commits, we didn't reindent in order to keep the diff readable.
```
  298cb6c4
- Enable branch coverage measurement by default · f22d5c4e
  Julien Muchembled authored Nov 28, 2016
  
  f22d5c4e
- coverage: add support for functional tests · bec29220
  Julien Muchembled authored Nov 28, 2016
  
  bec29220
27 Nov, 2016 11 commits

Bump protocol version · 8eb14b01
Julien Muchembled authored Nov 27, 2016

8eb14b01

Fix identification issues, including a race condition causing id conflicts · 9385706f

Julien Muchembled authored Nov 24, 2016

The added test describes how the new id timestamps fix the race condition.
These timestamps could be any unique opaque values, and the protocol is
extended to exchange them along with node ids.

Internally, nodes also reuse timestamps as a marker to identify the first
NotifyNodeInformation packets from the master: since this packet is a complete
list of nodes in the cluster, any other node in the node manager has left the
cluster definitely and is removed.
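
The pruning rule of the previous paragraph can be sketched as follows. It only models what happens when a complete list is received: how the timestamp marks a packet as the first one is not shown, and the function name and dict-based node manager are hypothetical.

```python
def apply_complete_node_list(known_nodes, notified_nodes):
    """Sync `known_nodes` with a complete list of nodes from the master.

    Both arguments are hypothetical dicts {node id: node info}. Nodes that
    are known locally but absent from the complete list have left the
    cluster definitely: they are removed and their ids are returned.
    """
    removed = sorted(set(known_nodes) - set(notified_nodes))
    for node_id in removed:
        del known_nodes[node_id]
    known_nodes.update(notified_nodes)
    return removed
```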

The secondary masters didn't receive updates about master nodes.
It's also useless to send them information about non-master nodes.

9385706f

protocol: simplify definition of Struct-based items · 54e819ff
Julien Muchembled authored Nov 24, 2016

54e819ff

Remove AskNodeInformation packet · d048a52d

Julien Muchembled authored Nov 25, 2016

When client (including backup master) and admin nodes are identified,
the primary master now automatically sends them all nodes with
NotifyNodeInformation, as with storage nodes.

d048a52d

master: fix crashes in identification due to buggy nodes · 35664759
Julien Muchembled authored Nov 24, 2016
```
- check address conflicts
- on invalid values, reject peer instead of dying
```
35664759

lib.node: fix NodeManager accessors returning identified nodes · e7cccf01

Julien Muchembled authored Nov 23, 2016

Listing connected/connecting nodes with a UUID is used:
- in one place by storage nodes: here, it does not matter if we skip nodes that
  aren't really identified
- in many places by the master, only for server connections, in which case we
  have equivalence with real identification

So in practice, NodeManager is only simplified to reuse the 'identified'
property of nodes.

e7cccf01

lib.node: code refactoring · 5941b27d
Julien Muchembled authored Nov 23, 2016

5941b27d
storage: only accept clients that are known by the master · c17f5f91
Julien Muchembled authored Nov 23, 2016
```
Therefore, a client node in the node manager is always RUNNING.
```
c17f5f91

Give new ids to clients whose ids were already reallocated · d752aadb

Julien Muchembled authored Nov 21, 2016

Although the change applies to any node with a temporary id (all but storage),
only clients don't have addresses and are therefore not recognizable.

After a client is disconnected from the master and before it reconnects, another
client may join the cluster and "steal" the id of the first client. This issue
leads to stuck clients, failing in a loop with exceptions like the following one:

    ERROR ZODB.Connection Couldn't load state for 0x0251
    Traceback (most recent call last):
      File "ZODB/Connection.py", line 860, in setstate
        self._setstate(obj)
      File "ZODB/Connection.py", line 901, in _setstate
        p, serial = self._storage.load(obj._p_oid, '')
      File "neo/client/Storage.py", line 82, in load
        return self.app.load(oid)[:2]
      File "neo/client/app.py", line 353, in load
        data, tid, next_tid, _ = self._loadFromStorage(oid, tid, before_tid)
      File "neo/client/app.py", line 373, in _loadFromStorage
        for node, conn in self.cp.iterateForObject(oid, readable=True):
      File "neo/client/pool.py", line 91, in iterateForObject
        pt = self.app.pt
      File "neo/client/app.py", line 145, in __getattr__
        self._getMasterConnection()
      File "neo/client/app.py", line 214, in _getMasterConnection
        result = self.master_conn = self._connectToPrimaryNode()
      File "neo/client/app.py", line 246, in _connectToPrimaryNode
        handler=handler)
      File "neo/lib/threaded_app.py", line 154, in _ask
        _handlePacket(qconn, qpacket, kw, handler)
      File "neo/lib/threaded_app.py", line 135, in _handlePacket
        handler.dispatch(conn, packet, kw)
      File "neo/lib/handler.py", line 66, in dispatch
        method(conn, *args, **kw)
      File "neo/lib/handler.py", line 188, in error
        getattr(self, Errors[code])(conn, message)
      File "neo/client/handlers/__init__.py", line 23, in protocolError
        raise StorageError("protocol error: %s" % message)
    StorageError: protocol error: already connected

d752aadb

spelling: oudated -> outdated · b62b8dc3
Julien Muchembled authored Nov 27, 2016

b62b8dc3
Fix spelling mistakes · 6e32ebb7
Julien Muchembled authored Nov 21, 2016

6e32ebb7

25 Nov, 2016 2 commits
- coverage: CacheItem.__repr__ (client) · b61f8745
  Julien Muchembled authored Nov 24, 2016
  
  b61f8745
- New neotestrunner option for code coverage testing · 5de0ff3a
  Julien Muchembled authored Nov 24, 2016
  
  5de0ff3a
21 Nov, 2016 2 commits

client: fix item eviction from cache, which could break loading from storage · 4ef05b9e

Julien Muchembled authored Nov 18, 2016

`ClientCache._oid_dict` shall not have empty values. For a given oid, when the
last item is removed from the cache, the oid must be removed as well to free
memory. In some cases, this was not done.

A consequence of this bug is the following exception:

    ERROR ZODB.Connection Couldn't load state for 0x02d1e1e4
    Traceback (most recent call last):
      File "ZODB/Connection.py", line 860, in setstate
        self._setstate(obj)
      File "ZODB/Connection.py", line 901, in _setstate
        p, serial = self._storage.load(obj._p_oid, '')
      File "neo/client/Storage.py", line 82, in load
        return self.app.load(oid)[:2]
      File "neo/client/app.py", line 358, in load
        self._cache.store(oid, data, tid, next_tid)
      File "neo/client/cache.py", line 228, in store
        prev = item_list[-1]
    IndexError: list index out of range
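
The invariant stated above (no empty values in `ClientCache._oid_dict`) can be sketched as follows. This is not the code of `neo/client/cache.py`: the helper name and the plain dict of lists are assumptions made for the example.

```python
def remove_item(oid_dict, oid, item):
    """Remove `item` from the list of cached items of `oid`.

    `oid_dict` is a hypothetical dict {oid: list of items}. When the last
    item of an oid is removed, the oid is removed as well, so that no empty
    list is ever left behind.
    """
    item_list = oid_dict[oid]
    item_list.remove(item)
    if not item_list:
        # An empty list would waste memory and break code that assumes
        # item_list[-1] exists, as in the traceback above.
        del oid_dict[oid]
```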

4ef05b9e

Bump protocol version for new read-only mode in BACKUPING state · 2b3993f1
Julien Muchembled authored Nov 21, 2016

2b3993f1

15 Nov, 2016 2 commits

backup: Teach cluster in BACKUPING state to also serve regular ZODB clients in read-only mode · d4944062

Kirill Smelkov authored Nov 10, 2016

A backup cluster for tids <= backup_tid has all data to provide regular
read-only ZODB service. Having regular ZODB access to the data can be
handy e.g. for externally verifying data for consistency between
main and backup clusters. Peeking around without disturbing main
cluster might be also useful sometimes.

In this patch:

- master & storage nodes are taught:

  * to instantiate read-only or regular client service handler depending on
    cluster state:

      RUNNING -> regular
      BACKINGUP -> read-only

  * in read-only client handler:

    + to reject write-related operations
    + to provide read operations but adjust semantics as if last_tid in the
      database were equal to backup_tid

- new READ_ONLY_ACCESS protocol error code is introduced so that client can
  raise POSException.ReadOnlyError upon receiving it.
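
The handler selection and the read-only behaviour described in this list can be sketched as follows. This is a simplified model and not the patch itself: all class, method and function names are hypothetical, and `ProtocolError` is a local stand-in for the way the READ_ONLY_ACCESS error code is reported to the client.

```python
READ_ONLY_ACCESS = 'READ_ONLY_ACCESS'  # stand-in for the new error code


class ProtocolError(Exception):
    """Hypothetical error carrying a protocol error code."""

    def __init__(self, code):
        Exception.__init__(self, code)
        self.code = code


class RegularHandler(object):
    """Hypothetical client service handler for the RUNNING state."""

    def __init__(self, last_tid):
        self.last_tid = last_tid

    def store(self, oid, data):
        return 'stored'

    def last_transaction(self):
        return self.last_tid


class ReadOnlyHandler(RegularHandler):
    """Hypothetical client service handler for the BACKINGUP state."""

    def __init__(self, backup_tid):
        # Reads behave as if last_tid in the database were backup_tid.
        RegularHandler.__init__(self, backup_tid)

    def store(self, oid, data):
        # Write-related operations are rejected.
        raise ProtocolError(READ_ONLY_ACCESS)


def make_client_handler(cluster_state, last_tid, backup_tid):
    """Pick the handler matching the cluster state."""
    if cluster_state == 'RUNNING':
        return RegularHandler(last_tid)
    if cluster_state == 'BACKINGUP':
        return ReadOnlyHandler(backup_tid)
    raise ValueError('no client service in state %r' % (cluster_state,))
```

On the client side, receiving the error code would then be translated into POSException.ReadOnlyError, as stated above.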

I have not implemented back-channel for invalidations in read-only mode (yet ?).
This way once a client connects to cluster in backup state, it won't see
new data fetched by backup cluster from upstream after client connected.

The reason invalidations are not implemented is that for now (imho)
there is no off-hand ready infrastructure to get updates from
replicating node on transaction-by-transaction basis (it currently only
notifies when whole batch is done). For consistency verification (main
reason for this patch) we also don't need invalidations to work, as in
that task we always connect afresh to backup. So I simply only put
relevant TODOs about invalidations for now.

The patch is not very polished but should work.

/reviewed-on nexedi/neoppod!4

d4944062

tests/threaded: Add handy shortcuts to NEOCluster to concisely check cluster properties in tests · ab552d87
Kirill Smelkov authored Nov 10, 2016

ab552d87

27 Oct, 2016 1 commit

neoctl: make 'print ids' command display time of TIDs · d9dd39f0

Iliya Manolov authored Oct 12, 2016

Currently, the command "neoctl [arguments] print ids" has the following output:

    last_oid = 0x...
    last_tid = 0x...
    last_ptid = ...

or

    backup_tid = 0x...
    last_tid = 0x...
    last_ptid = ...

depending on whether the cluster is in normal or backup mode.

This is extremely unreadable since the admin is often interested in the time that corresponds to each tid. Now the output is:

    last_oid = 0x...
    last_tid = 0x... (yyyy-mm-dd hh:mm:ss.ssssss)
    last_ptid = ...

or

    backup_tid = 0x... (yyyy-mm-dd hh:mm:ss.ssssss)
    last_tid = 0x... (yyyy-mm-dd hh:mm:ss.ssssss)
    last_ptid = ...

/reviewed-on !2

d9dd39f0

17 Oct, 2016 1 commit

mysql: force _getNextTID() to use appropriate/whole index · eaa00a88

Kirill Smelkov authored Oct 16, 2016

Similarly to 13911ca3, on the same instance, after MariaDB was upgraded to
10.1.17, the following query started to execute very slowly, even after
`OPTIMIZE TABLE obj`:

    MariaDB [(none)]> SELECT tid FROM neo1.obj WHERE `partition`=5 AND oid=79613 AND tid>268707071353462798 ORDER BY tid LIMIT 1;
    +--------------------+
    | tid                |
    +--------------------+
    | 268707072758797063 |
    +--------------------+
    1 row in set (4.82 sec)

Both EXPLAIN and ANALYZE say the query uses the `partition` key but only partially (note key_len is only 10, not 18):

    MariaDB [(none)]> SHOW INDEX FROM neo1.obj;
    +-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
    | Table | Non_unique | Key_name  | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
    +-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
    | obj   |          0 | PRIMARY   |            1 | partition   | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
    | obj   |          0 | PRIMARY   |            2 | tid         | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
    | obj   |          0 | PRIMARY   |            3 | oid         | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
    | obj   |          0 | partition |            1 | partition   | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
    | obj   |          0 | partition |            2 | oid         | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
    | obj   |          0 | partition |            3 | tid         | A         |    28755928 |     NULL | NULL   |      | BTREE      |         |               |
    | obj   |          1 | data_id   |            1 | data_id     | A         |    28755928 |     NULL | NULL   | YES  | BTREE      |         |               |
    +-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
    7 rows in set (0.00 sec)

    MariaDB [(none)]> explain SELECT tid FROM neo1.obj WHERE `partition`=5 AND oid=79613 AND tid>268707071353462798 ORDER BY tid LIMIT 1;
    +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+--------------------------+
    | id   | select_type | table | type | possible_keys     | key       | key_len | ref         | rows | Extra                    |
    +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+--------------------------+
    |    1 | SIMPLE      | obj   | ref  | PRIMARY,partition | partition | 10      | const,const |    2 | Using where; Using index |
    +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+--------------------------+
    1 row in set (0.00 sec)

    MariaDB [(none)]> analyze SELECT tid FROM neo1.obj WHERE `partition`=5 AND oid=79613 AND tid>268707071353462798 ORDER BY tid LIMIT 1;
    +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+------------+----------+------------+--------------------------+
    | id   | select_type | table | type | possible_keys     | key       | key_len | ref         | rows | r_rows     | filtered | r_filtered | Extra                    |
    +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+------------+----------+------------+--------------------------+
    |    1 | SIMPLE      | obj   | ref  | PRIMARY,partition | partition | 10      | const,const |    2 | 9741121.00 |   100.00 |       0.00 | Using where; Using index |
    +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+------------+----------+------------+--------------------------+
    1 row in set (4.93 sec)

Explicitly forcing use of the (partition, oid, tid) index, which is precisely designed to serve this and similar queries, keeps the query from being slow:

    MariaDB [(none)]> analyze SELECT tid FROM neo1.obj FORCE INDEX(`partition`) WHERE `partition`=5 AND oid=79613 AND tid>268707071353462798 ORDER BY tid LIMIT 1;
    +------+-------------+-------+-------+---------------+-----------+---------+------+------+--------+----------+------------+--------------------------+
    | id   | select_type | table | type  | possible_keys | key       | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra                    |
    +------+-------------+-------+-------+---------------+-----------+---------+------+------+--------+----------+------------+--------------------------+
    |    1 | SIMPLE      | obj   | range | partition     | partition | 18      | NULL |    2 |   1.00 |   100.00 |     100.00 | Using where; Using index |
    +------+-------------+-------+-------+---------------+-----------+---------+------+------+--------+----------+------------+--------------------------+
    1 row in set (0.00 sec)
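
As a sketch, the forced-index query can be built as shown below. This is not the actual code of `_getNextTID()`: the helper name is hypothetical and only the query text comes from the session above.

```python
def next_tid_query(partition, oid, tid):
    """Build the query for the first tid of `oid` that is greater than `tid`.

    FORCE INDEX(`partition`) makes MariaDB use the whole
    (partition, oid, tid) index instead of only its first two columns.
    """
    # %d only accepts numbers, so the values cannot inject SQL here.
    return ("SELECT tid FROM obj FORCE INDEX(`partition`)"
            " WHERE `partition`=%d AND oid=%d AND tid>%d"
            " ORDER BY tid LIMIT 1" % (partition, oid, tid))
```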

/cc @jm, @vpelltier, @Tyagov

/reviewed-on nexedi/neoppod!1

eaa00a88

12 Sep, 2016 1 commit

Add support for latest versions of ZODB (4.4.3 & 5.0.1) · c39d5c67

Julien Muchembled authored Jun 15, 2016

Many patches have been merged upstream :)

A notable change is that lastTransaction() does not ping the master anymore
(but it still causes a connection to the master if the client is disconnected).

c39d5c67

29 Aug, 2016 2 commits

mysql: fix use of wrong SQL index when checking for dropped partitions · 13911ca3

Julien Muchembled authored Aug 29, 2016

After partitions were dropped with TokuDB, we had a case where MariaDB 10.1.14
stopped using the most appropriate index.

MariaDB [neo0]> explain SELECT DISTINCT data_id FROM obj WHERE `partition`=5;
+------+-------------+-------+-------+-------------------+---------+---------+------+------+---------------------------------------+
| id   | select_type | table | type  | possible_keys     | key     | key_len | ref  | rows | Extra                                 |
+------+-------------+-------+-------+-------------------+---------+---------+------+------+---------------------------------------+
|    1 | SIMPLE      | obj   | range | PRIMARY,partition | data_id | 11      | NULL |   10 | Using where; Using index for group-by |
+------+-------------+-------+-------+-------------------+---------+---------+------+------+---------------------------------------+
MariaDB [neo0]> SELECT SQL_NO_CACHE DISTINCT data_id FROM obj WHERE `partition`=5;
Empty set (1 min 51.47 sec)

Expected:

MariaDB [neo1]> explain SELECT DISTINCT data_id FROM obj WHERE `partition`=4;
+------+-------------+-------+------+-------------------+---------+---------+-------+------+------------------------------+
| id   | select_type | table | type | possible_keys     | key     | key_len | ref   | rows | Extra                        |
+------+-------------+-------+------+-------------------+---------+---------+-------+------+------------------------------+
|    1 | SIMPLE      | obj   | ref  | PRIMARY,partition | PRIMARY | 2       | const |    1 | Using where; Using temporary |
+------+-------------+-------+------+-------------------+---------+---------+-------+------+------------------------------+
1 row in set (0.00 sec)
MariaDB [neo1]> SELECT SQL_NO_CACHE DISTINCT data_id FROM obj WHERE `partition`=4;
Empty set (0.00 sec)

Restarting the server or 'OPTIMIZE TABLE obj; ' does not help.

Such an issue could prevent the cluster from starting due to timeouts, by always
going back to RECOVERING state.

13911ca3

Update TODO · 00ffb1ef
Julien Muchembled authored Aug 29, 2016

00ffb1ef