- 01 Dec, 2015 2 commits
Julien Muchembled authored
-
Julien Muchembled authored
With the previous commit, the request to truncate the DB was not stored persistently, which means that this operation was still vulnerable to the case where the master is restarted after some nodes, but not all, have already truncated. The master didn't have the information to fix this, and the result was a partially truncated DB.
-> On a Truncate packet, a storage node only stores the tid somewhere, to send it back to the master, which stays in RECOVERING state as long as any node has a different value than that of the node with the latest partition table.
We also want to make sure that there is no unfinished data, because a user may truncate at a tid higher than a locked one.
-> Truncation is now effective at the end of the VERIFYING phase, just before returning the last ids to the master.
Finally, all nodes should be truncated, to prevent an offline node from coming back with a different history. Currently, this would not be an issue since replication always restarts from the beginning, but later we'd like nodes to remember where they stopped replicating.
-> If a truncation is requested, the master waits for all nodes to be pending, even if the cluster was previously started (the user can still force the cluster to start with neoctl). And any node lost during verification also causes the master to go back to recovery.
Obviously, the protocol has been changed to split the LastIDs packet and introduce a new Recovery packet, since it no longer makes sense to ask for last ids during recovery.
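The recovery condition on stored truncation tids can be sketched as follows. This is a hypothetical illustration, not NEO's master code: `StorageRecovery`, `nodes_to_truncate` and `may_leave_recovery` are invented names, and treating the result of `nodes_to_truncate` as the nodes to send Truncate to again is an assumption not stated in the commit message.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class StorageRecovery:
    # Hypothetical summary of what a storage node reports during RECOVERING.
    uuid: str
    ptid: int                    # id of the partition table this node holds
    truncate_tid: Optional[int]  # tid stored on a Truncate packet, or None


def nodes_to_truncate(nodes: Iterable[StorageRecovery]) -> List[str]:
    """Return the uuids whose stored truncate tid differs from the reference.

    The reference is the node with the latest partition table (highest ptid).
    """
    nodes = list(nodes)
    if not nodes:
        return []
    reference = max(nodes, key=lambda n: n.ptid)
    return [n.uuid for n in nodes if n.truncate_tid != reference.truncate_tid]


def may_leave_recovery(nodes: Iterable[StorageRecovery]) -> bool:
    # The master stays in RECOVERING while any node disagrees.
    nodes = list(nodes)
    return bool(nodes) and not nodes_to_truncate(nodes)
```

For example, if S1 (ptid 5) stored tid 100 and S2 (ptid 3) stored nothing, `nodes_to_truncate` returns `["S2"]` and `may_leave_recovery` is false.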
-
- 30 Nov, 2015 10 commits
Julien Muchembled authored
Currently, the database may only be truncated when leaving backup mode, but the issue will be the same when neoctl gets a new command to truncate at an arbitrary tid: we want to be sure that all nodes are truncated before anything else. Therefore, we no longer send Truncate orders just before stopping operation, because nodes could fail/exit before actually processing them. Truncation must also happen before asking nodes for their last ids.
With this commit, if a truncation is requested:
- this is always the first thing done when a storage node connects to the primary master during the RECOVERING phase,
- and the cluster does not start automatically if there are missing nodes, unless an admin forces it.
Other changes:
- Connections to storage nodes don't need to be aborted anymore when leaving backup mode.
- The master always initiates communication when a storage node identifies, which simplifies code and reduces the number of exchanged packets.
-
Julien Muchembled authored
At some point, the master asks a storage node for its partition table. If this node is lost before answering, another node (or the same one if it comes back) must be asked. Before this change, the master node had to be restarted.
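A minimal sketch of the retry logic, assuming a hypothetical `PartitionTableRequest` tracker and the assumed policy of asking the connected node with the highest ptid; neither is NEO's actual API. The point it illustrates is that losing the asked node clears the pending request instead of leaving the master waiting forever.

```python
from typing import Dict, Optional


class PartitionTableRequest:
    """Hypothetical tracker for the 'ask partition table' round-trip."""

    def __init__(self) -> None:
        self.asked: Optional[str] = None  # uuid of the node currently asked
        self.answered = False

    def pick(self, candidates: Dict[str, int]) -> Optional[str]:
        """Choose a node to ask (highest ptid), unless a request is pending.

        'candidates' maps connected node uuids to their ptid.
        """
        if self.answered or self.asked is not None or not candidates:
            return None
        self.asked = max(candidates, key=candidates.get)
        return self.asked

    def node_lost(self, uuid: str) -> None:
        # The fix: if the node we wait for disappears, forget it so that
        # another node (or the same one once it reconnects) can be asked.
        if self.asked == uuid and not self.answered:
            self.asked = None

    def answer(self, uuid: str) -> None:
        if uuid == self.asked:
            self.answered = True
```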
-
Julien Muchembled authored
The important bugfix is to update the last oid when the master verifies a transaction with new oids. By resetting the transaction manager at the beginning of the recovery phase, it becomes possible to avoid tid/oid holes:
- by reallocating previously unused allocated oids,
- when going back "in the past", i.e. reverting to an older version of the database (with fewer oids) and/or adjusting the clock.
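The oid side of this change can be sketched with a hypothetical `OidAllocator` that hands out consecutive integer oids (NEO's real transaction manager and oid encoding differ). `reset` models resetting at the start of recovery from the highest oid actually stored, and `verified_transaction` models the bugfix of raising the last oid when a verified transaction contains newer oids.

```python
from typing import Iterable, List


class OidAllocator:
    """Hypothetical master-side oid counter (integers for simplicity)."""

    def __init__(self) -> None:
        self.reset(0)

    def reset(self, last_oid: int) -> None:
        # Start of recovery: restart from the highest oid actually stored,
        # so oids that were allocated but never used get reallocated.
        self.last_oid = last_oid

    def new_oids(self, count: int) -> List[int]:
        first = self.last_oid + 1
        self.last_oid += count
        return list(range(first, self.last_oid + 1))

    def verified_transaction(self, oids: Iterable[int]) -> None:
        # Bugfix: a transaction verified during recovery may contain oids
        # above the counter; never hand them out again.
        self.last_oid = max([self.last_oid, *oids])
```

For instance, after allocating oids 1 to 3 and resetting to 1 (only oid 1 was stored), the next two allocated oids are 2 and 3 again, so no hole is left.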
-
Julien Muchembled authored
This fixes several cases where the partition table could become corrupt and the whole cluster get stuck in VERIFYING state. This also reduces the probability of having out-of-date cells when restarting several storage nodes simultaneously. Finally, if a master node becomes primary again, the cluster must not be started automatically if nodes with readable cells are missing, in order to avoid a split of the database. This could happen if this master node was previously forced to start it.
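The automatic-start rule can be sketched as a check over the partition table. The reduced set of cell states, the assumption that UP_TO_DATE and FEEDING are the readable ones, and the name `may_autostart` are all hypothetical; a forced start is outside this sketch.

```python
from enum import Enum
from typing import Dict, List, Set


class CellState(Enum):
    UP_TO_DATE = "UP_TO_DATE"
    OUT_OF_DATE = "OUT_OF_DATE"
    FEEDING = "FEEDING"


# Assumption: these states hold data that clients may read.
READABLE = {CellState.UP_TO_DATE, CellState.FEEDING}


def may_autostart(partition_table: List[Dict[str, CellState]],
                  connected: Set[str]) -> bool:
    """partition_table[i] maps node uuid -> state of its cell for partition i.

    Refuse to start automatically if any node holding a readable cell is
    missing, since starting without it could split the database.
    """
    for row in partition_table:
        for uuid, state in row.items():
            if state in READABLE and uuid not in connected:
                return False
    return True
```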
-
Julien Muchembled authored
NEO did not ensure that all data and metadata were written to disk before tpc_finish, and it was, for example, vulnerable to ENOSPC errors. In other words, some work had to be moved to tpc_vote:
- In tpc_vote, all involved storage nodes are now asked to write all metadata to ttrans/tobj and _commit_. Because the final tid is not known yet, the tid column of ttrans and tobj now contains NULL and the ttid respectively.
- In tpc_finish, AskLockInformation is still required for read locking, ttrans.tid is updated with the final value and this change is _committed_.
- The verification phase is greatly simplified, more reliable and faster. For all voted transactions, we can know if a tpc_finish was started by getting the final tid from the ttid, either from ttrans or from trans. And we know that such transactions can't be partial, so we don't need to check oids.
So in addition to minimizing the risk of failures during tpc_finish, we also fix a bug causing the verification phase to discard transactions with objects for which readCurrent was called.
On the performance side:
- Although tpc_vote now asks all involved storages, instead of only those storing the transaction metadata, the client has been improved to do this in parallel. The additional commits are also all done in parallel.
- A possible improvement to compensate for the additional commits is to delay the commit done by the unlock.
- By minimizing the time to lock transactions, objects are read-locked for a much shorter period. This is all the more important as locked transactions must be unlocked in the same order.
Transactions with too many modified objects will now time out inside tpc_vote instead of tpc_finish. Of course, such transactions may still cause other transactions to time out in tpc_finish.
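A minimal sketch of the vote/finish split, using Python's built-in sqlite3 module. The reduced schemas, the function names (`vote`, `finish`, `unlock`, `final_tid`) and the idea that unlocking moves the row from ttrans to trans are simplifying assumptions, not NEO's storage backend code. It shows how a voted transaction keeps a NULL tid until tpc_finish, and how verification can recover the final tid from the ttid.

```python
import sqlite3
from typing import Iterable, Optional


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # Heavily simplified, hypothetical schemas (real tables have more columns).
    db.execute("CREATE TABLE ttrans (ttid INTEGER PRIMARY KEY, tid INTEGER)")
    db.execute("CREATE TABLE tobj (oid INTEGER, tid INTEGER, PRIMARY KEY (oid, tid))")
    db.execute("CREATE TABLE trans (tid INTEGER PRIMARY KEY, ttid INTEGER)")
    return db


def vote(db: sqlite3.Connection, ttid: int, oids: Iterable[int]) -> None:
    # tpc_vote: the final tid is unknown, so ttrans.tid is NULL and
    # tobj.tid holds the ttid. Everything is committed before returning.
    with db:
        db.execute("INSERT INTO ttrans (ttid, tid) VALUES (?, NULL)", (ttid,))
        db.executemany("INSERT INTO tobj (oid, tid) VALUES (?, ?)",
                       [(oid, ttid) for oid in oids])


def finish(db: sqlite3.Connection, ttid: int, tid: int) -> None:
    # tpc_finish: only the final tid is written, then committed.
    with db:
        db.execute("UPDATE ttrans SET tid = ? WHERE ttid = ?", (tid, ttid))


def unlock(db: sqlite3.Connection, ttid: int) -> None:
    # Hypothetical: move transaction metadata to the permanent table
    # (moving tobj rows to a permanent object table is omitted).
    with db:
        (tid,) = db.execute("SELECT tid FROM ttrans WHERE ttid = ?",
                            (ttid,)).fetchone()
        db.execute("INSERT INTO trans (tid, ttid) VALUES (?, ?)", (tid, ttid))
        db.execute("DELETE FROM ttrans WHERE ttid = ?", (ttid,))


def final_tid(db: sqlite3.Connection, ttid: int) -> Optional[int]:
    """Verification: None means tpc_finish was never started for this ttid."""
    for table in ("ttrans", "trans"):
        row = db.execute(f"SELECT tid FROM {table} WHERE ttid = ?",
                         (ttid,)).fetchone()
        if row is not None:
            return row[0]  # NULL (None) in ttrans if only voted
    return None
```

After `vote(db, 10, [1, 2])`, `final_tid(db, 10)` is `None`, so verification would treat the transaction as unfinished. After `finish(db, 10, 12)` it returns `12`, whether or not `unlock` has run.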
-
Julien Muchembled authored
-
Julien Muchembled authored
This fixes a regression in commit 83fe64bf when ttrans has several rows to the same data_id.
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
- Server connections can now be in 'connecting' state.
- connectionAccepted event (which has never been used so far) is merged into connectionCompleted.
-
- 25 Nov, 2015 13 commits
Julien Muchembled authored
This is a workaround to fix holes if replication is interrupted after new data is committed.
-
Julien Muchembled authored
-
Julien Muchembled authored
AssertionErrors are certainly more severe than any other exception (including OperationFailure) because the process is in an unknown state.
-
Julien Muchembled authored
This could have been useful in testStorageFailureDuringTpcFinish: close() could not be called from answerTransactionFinished because it deadlocked while trying to send notifications.
-
Julien Muchembled authored
The test was relying on the fact that 'c.abort()' caused an assertion failure, which closed the connection and then raised OperationFailure. Actually, I wanted to close the connection on the master, but it's clearer this way.
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
Previous code relied on the fact that the exception target is kept past the end of the except clause. 2to3 is not smart enough to detect that. Without this change, a different OperationalError exception would be ignored because there's already a local variable of the same name.
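The Python 3 scoping rule behind this fix can be shown with a small standalone example (the function names are illustrative only). The target of `except ... as e` is deleted when the clause ends, whereas Python 2 kept it; 2to3 rewrites `except E, e:` into `except E as e:` but, as noted above, does not detect later uses of `e`.

```python
def reuse_target_after_except():
    """Python 2-era pattern: use the except target after the clause ends."""
    try:
        raise ValueError("boom")
    except ValueError as e:
        pass
    # On Python 3, 'e' was deleted at the end of the except clause,
    # so this raises UnboundLocalError (a subclass of NameError).
    return e


def keep_exception_explicitly():
    error = None
    try:
        raise ValueError("boom")
    except ValueError as e:
        error = e  # bind to another name before 'e' is deleted
    return error
```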
-
Julien Muchembled authored
This was forgotten when this table was introduced in commit f9a8500d
-
Julien Muchembled authored
If needed, sortStorageList can be extended in the future to support a 'readable' parameter.
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
We can never receive several answers from the same node. testVerification is dropped for the same reason as testEvent and most of testConnection, since there are many upcoming changes to verification.
-
- 03 Nov, 2015 2 commits
Julien Muchembled authored
-
Julien Muchembled authored
- Last known TID was not updated when recovering a transaction.
- Missing OIDs were ignored, which caused partial transactions to be committed instead of being deleted.
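Both fixes can be sketched as a single hypothetical decision function for one transaction found during verification. The name `verify_transaction`, the `(oid, tid)` representation of stored object records, and advancing the last TID only on commit are assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple


def verify_transaction(tid: int, oids: Iterable[int],
                       found: Set[Tuple[int, int]],
                       last_tid: int) -> Tuple[bool, int]:
    """Decide the fate of a transaction found during verification.

    'found' holds the (oid, tid) object records present on storage nodes.
    Returns (commit, new_last_tid).
    """
    missing = [oid for oid in oids if (oid, tid) not in found]
    if missing:
        # Partial transaction: it must be deleted, not committed.
        return False, last_tid
    # Committing a recovered transaction must also advance the last known TID.
    return True, max(last_tid, tid)
```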
-
- 29 Oct, 2015 2 commits
Julien Muchembled authored
-
Julien Muchembled authored
-
- 26 Oct, 2015 2 commits
Julien Muchembled authored
-
Julien Muchembled authored
The previous SQL query caused a full table scan of the 'trans' table at startup.
-
- 21 Oct, 2015 3 commits
Julien Muchembled authored
I used git-diff for each file and concatenated the results to preserve the order.
-
Julien Muchembled authored
-
Julien Muchembled authored
This fixes invalid next_serial entries in cache, and the following error for values not in cache:
    Traceback (most recent call last):
      File "ZODB/Connection.py", line 856, in setstate
        self._setstate(obj)
      File "ZODB/Connection.py", line 894, in _setstate
        self._load_before_or_conflict(obj)
      File "ZODB/Connection.py", line 922, in _load_before_or_conflict
        if not self._setstate_noncurrent(obj):
      File "ZODB/Connection.py", line 945, in _setstate_noncurrent
        assert end is not None
    AssertionError
-
- 20 Oct, 2015 1 commit
Julien Muchembled authored
-
- 19 Oct, 2015 4 commits
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
When run with MySQL, testBasicStore (neo.tests.threaded.test.Test) was slow and the generated log exceeded 29MB.
-
Julien Muchembled authored
-
- 16 Oct, 2015 1 commit
Julien Muchembled authored
This increases the number of rows to check per AskCheck*Range packets.