- 27 Apr, 2019 2 commits
-
-
Julien Muchembled authored
neoctl gets a new command to change the number of replicas.
The number of replicas becomes a new partition table attribute and, like the PT id, it is stored in the config table. On the other hand, the configuration value for the number of partitions is dropped, since it can be computed from the partition table, which is always stored in full. The -p/-r master options now only apply at database creation.
Some implementation notes:
  - The protocol is slightly optimized in that the master now automatically sends the whole partition table to the admin & client nodes upon connection, as it does for storage nodes. This makes the protocol more consistent, and the master is the only remaining node requesting partition tables, during recovery.
  - Some parts become tricky because app.pt can be None in more cases. For example, the extra condition in NodeManager.update (before app.pt.dropNode) was added for this reason. Another example is the 'loadPartitionTable' method (storage), which is not inlined because of unit tests. Overall, this commit simplifies more than it complicates.
  - In the master handlers, we stop hijacking the 'connectionCompleted' method for tasks to be performed (often sending the full partition table) on handler switches.
  - The admin's 'bootstrapped' flag could have been removed earlier: race conditions can't happen since the AskNodeInformation packet was removed (commit d048a52d).
-
Julien Muchembled authored
It is often faster to set up replicas by stopping a node (and any underlying database server like MariaDB) and doing a raw copy of the database (e.g. with rsync). So far, this required stopping the whole cluster and using tools like 'mysql' or 'sqlite3' to edit:
  - the 'pt' table in databases,
  - the 'config.nid' values of the new nodes.
With this new option, if you already have 1 replica, you can set up new replicas with such a fast raw copy, without interruption of service. Obviously, this implies less redundancy during the operation.
-
- 26 Apr, 2019 3 commits
-
-
Julien Muchembled authored
--kill-mysqld should be combined with something like -f .3 -r .1 to give storage nodes enough time to recover, and also with -D 0 to focus testing on the storage backend rather than NEO.
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 16 Apr, 2019 3 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 05 Apr, 2019 3 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
This fixes up commit be839e92.
-
Julien Muchembled authored
-
- 21 Mar, 2019 2 commits
-
-
Julien Muchembled authored
This is not used currently.
-
Julien Muchembled authored
This breaks compatibility but it was mentioned from the beginning that these options are only there for testing purposes.
TODO: rename all remaining occurrences of UUID into NID in the code
-
- 16 Mar, 2019 1 commit
-
-
Julien Muchembled authored
If the source DB is lost during the import and then restored from a backup, all new transactions have to be written back again on resume. This is the most common case in which the writeback hits the maximum number of transactions per partition to process at each iteration; the previous code was buggy in that it could skip transactions.
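The failure mode described above can be illustrated with a minimal sketch of a batched writeback loop. This is not NEO's importer code: the function name `writeback`, the `pending` list and the `limit` parameter are hypothetical, and TIDs are modelled as plain integers. The point is only that the cursor advances to the last TID actually processed, so hitting the per-iteration limit never skips a transaction.

```python
import bisect

def writeback(pending, limit):
    """Process sorted TIDs in chunks of at most `limit`, without skipping any.

    `pending` is a sorted list of TIDs still to be written back for one
    partition (hypothetical representation). Each iteration resumes right
    after the last TID that was actually processed.
    """
    done = []
    last_tid = None
    while True:
        # Resume strictly after the last processed TID.
        start = 0 if last_tid is None else bisect.bisect_right(pending, last_tid)
        chunk = pending[start:start + limit]
        if not chunk:
            return done
        done.extend(chunk)      # stand-in for the real write to the source DB
        last_tid = chunk[-1]
```

A loop that instead advanced its cursor by a fixed amount, or by the newest TID seen rather than the newest TID processed, would be one way to skip transactions when the limit is reached.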
-
- 11 Mar, 2019 1 commit
-
-
Julien Muchembled authored
-
- 31 Dec, 2018 1 commit
-
-
Julien Muchembled authored
This makes commit 3c7a3160 (storage: speed up reads by indexing 'obj' primarily by 'oid') effective for SQLite. The fake changes in test data are because we don't force upgrade for this optimization.
-
- 05 Dec, 2018 1 commit
-
-
Julien Muchembled authored
neolog has new options: -N for old behaviour, and -C to show the cluster name.
-
- 15 Nov, 2018 2 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 08 Nov, 2018 4 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
This fixes:
  Traceback (most recent call last):
    File "neo/client/Storage.py", line 108, in tpc_vote
      return self.app.tpc_vote(transaction)
    File "neo/client/app.py", line 546, in tpc_vote
      self.waitStoreResponses(txn_context)
    File "neo/client/app.py", line 539, in waitStoreResponses
      _waitAnyTransactionMessage(txn_context)
    File "neo/client/app.py", line 160, in _waitAnyTransactionMessage
      self._handleConflicts(txn_context)
    File "neo/client/app.py", line 514, in _handleConflicts
      self._store(txn_context, oid, serial, data)
    File "neo/client/app.py", line 452, in _store
      self._waitAnyTransactionMessage(txn_context, False)
    File "neo/client/app.py", line 155, in _waitAnyTransactionMessage
      self._waitAnyMessage(queue, block=block)
    File "neo/client/app.py", line 142, in _waitAnyMessage
      _handlePacket(conn, packet, kw)
    File "neo/lib/threaded_app.py", line 133, in _handlePacket
      handler.dispatch(conn, packet, kw)
    File "neo/lib/handler.py", line 72, in dispatch
      method(conn, *args, **kw)
    File "neo/client/handlers/storage.py", line 143, in answerRebaseObject
      assert cached == data
  AssertionError
-
Julien Muchembled authored
During rebase, writes could stay lockless although the partition was replicated. Another transaction could then take locks prematurely, leading to the following crash:
  Traceback (most recent call last):
    File "neo/lib/handler.py", line 72, in dispatch
      method(conn, *args, **kw)
    File "neo/storage/handlers/master.py", line 36, in notifyUnlockInformation
      self.app.tm.unlock(ttid)
    File "neo/storage/transactions.py", line 329, in unlock
      self.abort(ttid, even_if_locked=True)
    File "neo/storage/transactions.py", line 573, in abort
      not self._replicated.get(self.getPartition(oid))), x
  AssertionError: ('\x00\x00\x00\x00\x00\x03\x03v', '\x03\xca\xb44J\x13\x99\x88', '\x03\xca\xb44J\xe0\xdcU', {}, set(['\x00\x00\x00\x00\x00\x03\x03v']))
-
Julien Muchembled authored
Not doing so was an incorrect optimization. Checking serials does take write-locks and they must not be released when a client-storage connection breaks between vote and lock, otherwise a concurrent transaction modifying such serials may finish before.
-
- 07 Nov, 2018 4 commits
-
-
Julien Muchembled authored
Without this new mechanism to detect oids that aren't write-locked, a transaction could be committed successfully without detecting conflicts. In the added test, the resulting value was 2, whereas it should be 5 if there was no node failure.
-
Julien Muchembled authored
-
Julien Muchembled authored
Nothing wrong actually happens.
  Traceback (most recent call last):
    File "neo/scripts/neostorage.py", line 32, in main
      app.run()
    File "neo/storage/app.py", line 194, in run
      self._run()
    File "neo/storage/app.py", line 225, in _run
      self.doOperation()
    File "neo/storage/app.py", line 310, in doOperation
      poll()
    File "neo/storage/app.py", line 134, in _poll
      self.em.poll(1)
    File "neo/lib/event.py", line 168, in poll
      self._poll(0)
    File "neo/lib/event.py", line 220, in _poll
      if conn.readable():
    File "neo/lib/connection.py", line 483, in readable
      self._closure()
    File "neo/lib/connection.py", line 541, in _closure
      self.close()
    File "neo/lib/connection.py", line 533, in close
      handler.connectionClosed(self)
    File "neo/storage/handlers/client.py", line 46, in connectionClosed
      app.tm.abortFor(conn.getUUID())
    File "neo/storage/transactions.py", line 594, in abortFor
      self.abort(ttid)
    File "neo/storage/transactions.py", line 570, in abort
      self._replicated.get(self.getPartition(oid))), x
  AssertionError: ('\x00\x00\x00\x00\x00\x01a\xe5', '\x03\xcaZ\x04\x14o\x8e\xbb', '\x03\xcaZ\x04\x0eX{\xbb', {1: None, 21: '\x03\xcaZ\x04\x11\xc6\x94\xf6'}, set([]))
-
Julien Muchembled authored
-
- 05 Nov, 2018 1 commit
-
-
Julien Muchembled authored
-
- 06 Sep, 2018 1 commit
-
-
Julien Muchembled authored
Here is what happened after simulating a network failure between a client and a storage:
C8
  DEBUG recv failed for <SSLSocketConnectorIPv6 at 0x7f8198027f90 fileno 17 ('xxxx:xxxx:120:cd8::90a1', 53970), opened to ('xxxx:xxxx:60:4c2c::25c3', 39085)>: ECONNRESET (Connection reset by peer)
  DEBUG connection closed for <MTClientConnection(uuid=S2, address=[xxxx:xxxx:60:4c2c::25c3]:39085, handler=StorageEventHandler, closed, client) at 7f81939a0950>
  DEBUG connection started for <MTClientConnection(uuid=S2, address=[xxxx:xxxx:60:4c2c::25c3]:39085, handler=StorageEventHandler, fd=17, on_close=onConnectionClosed, connecting, client) at 7f8192eb17d0>
  PACKET #0x0000 RequestIdentification > S2 ([xxxx:xxxx:60:4c2c::25c3]:39085) | (<EnumItem CLIENT (2)>, -536870904, None, '...', [], 1535555463.455761)
  DEBUG SSL handshake done for <SSLSocketConnectorIPv6 at 0x7f8192eb1850 fileno 17 ('xxxx:xxxx:120:cd8::90a1', 54014), opened to ('xxxx:xxxx:60:4c2c::25c3', 39085)>: ECDHE-RSA-AES256-GCM-SHA384 256
  DEBUG connection completed for <MTClientConnection(uuid=S2, address=[xxxx:xxxx:60:4c2c::25c3]:39085, handler=StorageEventHandler, fd=17, on_close=onConnectionClosed, client) at 7f8192eb17d0> (from xxxx:xxxx:120:cd8::90a1:54014)
  DEBUG <SSLSocketConnectorIPv6 at 0x7f8192eb1850 fileno 17 ('xxxx:xxxx:120:cd8::90a1', 54014), opened to ('xxxx:xxxx:60:4c2c::25c3', 39085)> closed in recv
  DEBUG connection closed for <MTClientConnection(uuid=S2, address=[xxxx:xxxx:60:4c2c::25c3]:39085, handler=StorageEventHandler, closed, client) at 7f8192eb17d0>
  ERROR Connection to <StorageNode(uuid=S2, address=[xxxx:xxxx:60:4c2c::25c3]:39085, state=RUNNING, connection=None, not identified) at 7f81a8874690> failed
S2
  DEBUG accepted a connection from xxxx:xxxx:120:cd8::90a1:54014
  DEBUG SSL handshake done for <SSLSocketConnectorIPv6 at 0x7f657144a910 fileno 22 ('xxxx:xxxx:60:4c2c::25c3', 39085), opened from ('xxxx:xxxx:120:cd8::90a1', 54014)>: ECDHE-RSA-AES256-GCM-SHA384 256
  DEBUG connection completed for <ServerConnection(uuid=None, address=[xxxx:xxxx:120:cd8::90a1]:54014, handler=IdentificationHandler, fd=22, server) at 7f657144a090> (from xxxx:xxxx:60:4c2c::25c3:39085)
  PACKET #0x0000 RequestIdentification < None ([xxxx:xxxx:120:cd8::90a1]:54014) | (<EnumItem CLIENT (2)>, -536870904, None, '...', [], 1535555463.455761)
  DEBUG connection closed for <ServerConnection(uuid=None, address=[xxxx:xxxx:120:cd8::90a1]:54014, handler=IdentificationHandler, closed, server) at 7f657144a090>
  WARNING A connection was lost during identification
  ERROR Pre-mortem data:
  ERROR Traceback (most recent call last):
  ERROR   File "neo/storage/app.py", line 194, in run
  ERROR     self._run()
  ERROR   File "neo/storage/app.py", line 225, in _run
  ERROR     self.doOperation()
  ERROR   File "neo/storage/app.py", line 310, in doOperation
  ERROR     poll()
  ERROR   File "neo/storage/app.py", line 134, in _poll
  ERROR     self.em.poll(1)
  ERROR   File "neo/lib/event.py", line 160, in poll
  ERROR     to_process.process()
  ERROR   File "neo/lib/connection.py", line 499, in process
  ERROR     self._handlers.handle(self, self._queue.pop(0))
  ERROR   File "neo/lib/connection.py", line 85, in handle
  ERROR     self._handle(connection, packet)
  ERROR   File "neo/lib/connection.py", line 100, in _handle
  ERROR     pending[0][1].packetReceived(connection, packet)
  ERROR   File "neo/lib/handler.py", line 123, in packetReceived
  ERROR     self.dispatch(*args)
  ERROR   File "neo/lib/handler.py", line 72, in dispatch
  ERROR     method(conn, *args, **kw)
  ERROR   File "neo/storage/handlers/identification.py", line 56, in requestIdentification
  ERROR     assert not node.isConnected(), node
  ERROR AssertionError: <ClientNode(uuid=C8, state=RUNNING, connection=<ServerConnection(uuid=C8, address=[xxxx:xxxx:120:cd8::90a1]:53970, handler=ClientOperationHandler, fd=18, on_close=onConnectionClosed, server) at 7f657147d7d0>) at 7f65714d6cd0>
-
- 07 Aug, 2018 1 commit
-
-
Julien Muchembled authored
Besides the use of another module for option parsing, the main change is that there's no more Config class that mixes configuration for different components. Application classes now take a simple 'dict' with parsed values.
The changes in 'neoctl' are somewhat ugly, because command-line options are not defined on the command-line class, but this component is likely to disappear in the future.
It remains possible to pass options via a configuration file. The code is a bit complex but isolated in neo.lib.config.
For SSL, the code may be simpler if we change for a single --ssl option that takes 3 paths. This was not done, to avoid breaking compatibility. Hence, the hack with an extra OptionList class in neo.lib.app.
A new functional test tests the 'neomigrate' script, instead of just the internal API to migrate data.
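The pattern described above (parse options once, then hand each application a plain 'dict') can be sketched as follows. This is only an illustration: the commit does not name the parsing module, so the use of `argparse` is an assumption, and `parse_storage_options`, `StorageApp` and the `--cluster`/`--masters` options are hypothetical names rather than NEO's actual API.

```python
import argparse

def parse_storage_options(argv):
    """Parse command-line options and return them as a plain dict.

    The option names below are illustrative only.
    """
    parser = argparse.ArgumentParser(prog="storage-sketch")
    parser.add_argument("--cluster", required=True, help="cluster name")
    parser.add_argument("--masters", default="", help="space-separated master addresses")
    return vars(parser.parse_args(argv))

class StorageApp:
    """Hypothetical application class that only sees parsed values."""

    def __init__(self, config):
        # No shared Config object: just dictionary lookups.
        self.cluster = config["cluster"]
        self.masters = config["masters"].split()
```

With this shape, a test can build an application from a literal dict without going through option parsing at all.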
-
- 22 Jun, 2018 2 commits
-
-
Julien Muchembled authored
This commit adds a constraint when tweaking the partition table with replicas, so that cells of each partition are assigned as far as possible from each other, e.g. not on the same machine even if each one has several disks, and in any case not on the same storage device.
Currently, the topology path of each node is automatically calculated by the storage backend. Both MySQL and SQLite return a 2-tuple (host, st_dev).
To be improved:
  - Add a storage option to override the path: the 'tweak' algorithm can already handle topology paths of any length, so something like (room, machine, disk) could be done easily.
  - Write OS-specific code to determine the real hardware behind st_dev (e.g. 2 different 'st_dev' values may actually refer to the same disk, because of layers like partitioning, device-mapper, loop, btrfs subvolumes, and so on).
  - Make 'neoctl' report in some way if the PT is optimal. Meanwhile, if it isn't, the master only logs a WARNING during tweak.
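As an illustration of the idea, here is a minimal sketch of how a (host, st_dev) topology path might be computed and how "as far as possible" could be measured between paths of any length. It is not the 'tweak' algorithm itself: `topology_path`, `distance` and `farthest` are hypothetical helpers, and the greedy choice in `farthest` is a simplification chosen for the example.

```python
import os
import socket

def topology_path(db_path):
    """Return a (host, st_dev) 2-tuple for the device holding db_path."""
    return socket.gethostname(), os.stat(db_path).st_dev

def distance(a, b):
    """Number of path components below the longest common prefix.

    0 means same device, 1 means same host but different device,
    2 means different hosts (for 2-tuples); longer paths work the same way.
    """
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return max(len(a), len(b)) - common

def farthest(candidates, used):
    """Greedily pick the candidate farthest from all already-used paths."""
    return max(
        candidates,
        key=lambda c: min((distance(c, u) for u in used), default=float("inf")),
    )
```

For example, with one replica already on `("m1", 1)`, the candidates `("m1", 2)` and `("m2", 1)` are at distance 1 and 2 respectively, so the sketch prefers the other machine.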
-
Julien Muchembled authored
This is a follow-up of commit b3dd6973 ("Optimize resumption of replication by starting from a greater TID"). I missed the case where a storage node is restarted while it is replicating: it lost the TID where it was interrupted. Although we commit after each replicated chunk, to avoid transferring again all the data from the beginning, it could still waste time to check that the data are already replicated.
-
- 21 Jun, 2018 1 commit
-
-
Julien Muchembled authored
-
- 19 Jun, 2018 1 commit
-
-
Julien Muchembled authored
-
- 04 Jun, 2018 1 commit
-
-
Julien Muchembled authored
-
- 30 May, 2018 3 commits
-
-
Julien Muchembled authored
/reviewed-on !9
-
Julien Muchembled authored
Although data that are already transferred aren't transferred again, checking that the data are there for a whole partition can still be a lot of work for big databases. This commit is a major performance improvement in that a storage node that gets disconnected for a short time now becomes fully operational almost instantaneously, because it only has to replicate the new data. Before, the time to recover depended on the size of the DB.
For OUT_OF_DATE cells, the difficult part was that they are writable and can therefore contain holes, so we can't just take the last TID in trans/obj (we wrongly did that at the beginning, and then committed 6b1f198f as a workaround). We solve that by storing up to where the cell was up-to-date: this value is initialized from the last TIDs in trans/obj when the state switches from UP_TO_DATE/FEEDING. There's actually one such OUT_OF_DATE TID per assigned cell (backends store these values in the 'pt' table). Otherwise, a cell that still has a lot to replicate would still cause all other cells to resume from a very small TID, or even ZERO_TID; the worst case is when a new cell is assigned to a node (as a result of tweak).
For UP_TO_DATE cells of a backup cluster, replication was resumed from the maximum TID at which all assigned cells are known to be fully replicated. Like for OUT_OF_DATE cells, the presence of a late cell could cause a lot of extra work for others, the worst case being when setting up a backup cluster (it always restarted from ZERO_TID as long as at least 1 cell was still empty). Because UP_TO_DATE cells are guaranteed to have no holes, there's no need to store extra information: we simply look at the last TIDs in trans/obj. We even handle trans & obj independently, to minimize the work in 1 table (i.e. trans since it's processed first) if the other is late (obj).
There's a small change in the protocol so that the OUT_OF_DATE enum value equals 0. This way, backends can store the OUT_OF_DATE TID (verbatim) in the same column as the cell state.
Note about MySQL changes in commit ca58ccd7: what we did as a workaround is not one any more. Now, we do so much on the Python side that it's unlikely we could reduce the number of queries using GROUP BY. We even stopped doing that for SQLite.
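The benefit of tracking one resume point per cell, rather than a single one shared by all cells, can be shown with a small sketch. It is a simplification under stated assumptions: TIDs are plain integers, trans and obj are not handled separately, the helper names are hypothetical, and only `OUT_OF_DATE == 0` comes from the commit message (the value used for `UP_TO_DATE` is arbitrary).

```python
OUT_OF_DATE = 0   # stated in the commit message
UP_TO_DATE = 1    # arbitrary value chosen for this sketch

def resume_tid(state, stored_tid, last_tid):
    """Return the TID from which replication of one cell can resume.

    UP_TO_DATE cells have no holes, so the last TID found in the tables is
    enough. OUT_OF_DATE cells may have holes, so only the stored
    'up-to-date until' TID can be trusted.
    """
    return last_tid if state == UP_TO_DATE else stored_tid

def per_cell_resume(cells):
    """Map each partition to its own resume TID."""
    return {p: resume_tid(*cell) for p, cell in cells.items()}

def shared_resume(cells):
    """Old-style behaviour: every cell restarts from the smallest TID."""
    return min(per_cell_resume(cells).values())
```

With cells `{0: (UP_TO_DATE, None, 900), 1: (OUT_OF_DATE, 500, 950), 2: (OUT_OF_DATE, 0, 0)}`, the shared approach restarts everything from 0 because of the newly assigned cell 2, whereas the per-cell approach lets cells 0 and 1 resume from 900 and 500.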
-
Julien Muchembled authored
-
- 24 May, 2018 2 commits
-
-
Julien Muchembled authored
It was confusing and there's already the 'Unlock TXN' log just before abort() is called (in this case, it's more a cleanup than an abort).
-
Julien Muchembled authored
Future migration steps are likely to alter tables, possibly with transformation of data, and this is complicated for both supported backends.
-