neoppod: commits at 55eb90c157639e6b4daa6ffabdcac802c5cc8c5b
https://lab.nexedi.com/nexedi/neoppod/-/commits/55eb90c157639e6b4daa6ffabdcac802c5cc8c5b
All commits below are by Julien Muchembled <jm@nexedi.com>; each one is
reachable at https://lab.nexedi.com/nexedi/neoppod/-/commit/<hash>.

55eb90c1  2016-02-26T17:02:35+01:00  doc: rename CHANGES/README/UPGRADE for GitLab
cc72e972  2016-02-26T12:20:14+01:00  tests: new NEO_DB_SOCKET environment variable to chose the MySQL server to use
9bd524ab  2016-02-26T12:20:14+01:00  BUGS: deadlock avoidance can also happen with only 1 storage node
d2d77437  2016-02-05T17:36:36+01:00  client: make the cache tolerant to late invalidations when the entry is in th...
This fixes the following scenario:
1. the master sends invalidations to clients, and unlocks to storages
   (oid1, tid1)
2. the storage receives/processes the unlock
3. the client asks data (oid1, tid0)
4. the storage returns tid1 as next tid, whereas it's still None in the cache
   (before, it caused an assertion failure)
6. the client processes invalidations

a7f50dfc  2016-01-25T21:32:14+01:00  Release version 1.6
5a8e9d04  2016-01-25T20:39:47+01:00  Update copyright year
321b0bf8  2016-01-21T18:57:43+01:00  Update neo/debug.py example
e5c056b9  2016-01-21T18:46:25+01:00  tests: document Patch class
d43bd510  2016-01-12T23:08:32+01:00  client: remove obsolete comment in Storage.load
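The scenario above can be sketched with a toy cache. All names and structures here (the `(data, tid, next_tid)` tuples, `store`/`invalidate`) are illustrative stand-ins, not NEO's actual client cache API:

```python
cache = {}  # oid -> (data, tid, next_tid)

def store(oid, data, tid, next_tid):
    entry = cache.get(oid)
    if entry is not None and entry[1] == tid:
        # Tolerate a late invalidation: the storage may already know
        # next_tid while the client has not processed the matching
        # invalidation yet, so the cached value may still be None.
        assert entry[2] in (None, next_tid)
    cache[oid] = (data, tid, next_tid)

def invalidate(oid, tid):
    entry = cache.get(oid)
    if entry is not None and entry[2] is None:
        cache[oid] = (entry[0], entry[1], tid)

# 1-2. tid1 is committed and unlocked on the storage
# 3.   an earlier load put (oid1, tid0) in the cache, next tid unknown
cache["oid1"] = (b"x", "tid0", None)
# 4.   the storage answers the load with next_tid=tid1 while the cache
#      entry still has None: accepted instead of failing an assertion
store("oid1", b"x", "tid0", "tid1")
# 6.   the client finally processes the invalidation: nothing left to do
invalidate("oid1", "tid1")
print(cache["oid1"])
```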
See commit c277ed20
("client: really process all invalidations in poll thread").

50a6cf41  2015-12-16T04:28:05+01:00  neoctl: don't print 'None' on successful check/truncate commands
82d95846  2015-12-16T04:28:05+01:00  interfaces: check signature of methods
f419f974  2015-12-13T18:53:20+01:00  storage: define interface for backends and check they implement it
c6b80f7b  2015-12-13T03:26:41+01:00  importer: allow truncation after the last tid to import, during or after the ...
This is a partial implementation. To truncate at a smaller tid, you must wait
until data is imported up to this tid, then stop using the Importer backend.

24a9f1b8  2015-12-13T03:21:37+01:00  importer: do not implement deleteTransaction, now only used for replication
This backend does not support replication. Even if we implemented it, such a
node could only be a source for other nodes, so we should never delete
transactions.

af8a8370  2015-12-12T22:25:38+01:00  neolog: fix crash on unknown packets
9e543d76  2015-12-11T11:56:09+01:00  client: dump cache stats on SIGRTMIN+2
06a64d80  2015-12-09T17:46:57+01:00  client: fix spurious connection timeouts
This fixes a regression caused by
commit eef52c27 ("Tickless poll loop, for lowest latency and cpu usage").

f180b00e  2015-12-02T16:02:41+01:00  Release version 1.6
cd669221  2015-12-01T18:16:33+01:00  master: fix verification when nodes don't have any readable cell
ca2caf87  2015-12-01T18:16:27+01:00  Bump protocol version and upgrade storages automatically
d3c8b76d  2015-12-01T18:14:59+01:00  Safer DB truncation, new 'truncate' ctl command
With the previous commit, the request to truncate the DB was not stored
persistently, which means that this operation was still vulnerable to the case
where the master is restarted after some nodes, but not all, have already
truncated. The master didn't have the information to fix this and the result
was a partially truncated DB.
-> On a Truncate packet, a storage node only stores the tid somewhere, to send
   it back to the master, which stays in RECOVERING state as long as any node
   has a different value than that of the node with the latest partition table.
We also want to make sure that there is no unfinished data, because a user may
truncate at a tid higher than a locked one.
-> Truncation is now effective at the end of the VERIFYING phase, just before
   returning the last ids to the master.
Finally, all nodes should be truncated, to avoid that an offline node comes
back with a different history. Currently, this would not be an issue since
replication always restarts from the beginning, but later we'd like nodes to
remember where they stopped replicating.
-> If a truncation is requested, the master waits for all nodes to be pending,
   even if the cluster was previously started (the user can still force it to
   start with neoctl). And any node lost during verification also causes the
   master to go back to recovery.
Obviously, the protocol has been changed to split the LastIDs packet and
introduce a new Recovery packet, since it no longer makes sense to ask for
last ids during recovery.

3e3eab5b  2015-11-30T14:04:52+01:00  Perform DB truncation during recovery, send PT to storages before verification
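The RECOVERING rule above amounts to a consistency check over the tids the storage nodes stored. A minimal sketch, with purely illustrative data structures (not NEO's actual node table):

```python
# The master may leave RECOVERING only when every storage node reports
# the same pending truncation tid as the reference node (the one with
# the latest partition table).
def may_leave_recovery(truncate_tids, reference_node):
    """truncate_tids: mapping node name -> stored truncation tid (or None)."""
    reference = truncate_tids[reference_node]
    return all(tid == reference for tid in truncate_tids.values())

tids = {"s1": 1000, "s2": 1000, "s3": None}  # s3 has not stored the tid yet
print(may_leave_recovery(tids, "s1"))  # False: s3 disagrees
tids["s3"] = 1000
print(may_leave_recovery(tids, "s1"))  # True: all nodes agree
```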
Currently, the database may only be truncated when leaving backup mode, but
the issue will be the same when neoctl gets a new command to truncate at an
arbitrary tid: we want to be sure that all nodes are truncated before anything
else.
Therefore, we stop sending Truncate orders before stopping operation, because
nodes could fail/exit before actually processing them. Truncation must also
happen before asking nodes for their last ids.
With this commit, if a truncation is requested:
- it is always the first thing done when a storage node connects to the
  primary master during the RECOVERING phase,
- and the cluster does not start automatically if there are missing nodes,
  unless an admin forces it.
Other changes:
- Connections to storage nodes no longer need to be aborted when leaving
  backup mode.
- The master always initiates communication when a storage node identifies
  itself, which simplifies the code and reduces the number of exchanged
  packets.

2485f151  2015-11-30T14:04:52+01:00  master: fix possible blockage during recovery after a storage disconnection
At some point, the master asks a storage node for its partition table. If this
node is lost before an answer is received, another node (or the same one if it
comes back) must be asked.
Before this change, the master node had to be restarted.

dec81519  2015-11-30T14:04:52+01:00  master: last tid/oid after recovery/verification
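The fix above amounts to retrying with another candidate instead of blocking. `Node` and `ask_pt` below are hypothetical stand-ins for the real messaging code:

```python
class Node:
    """Toy storage node: answers with its partition table, or is lost."""
    def __init__(self, pt=None):
        self.pt = pt
    def ask_pt(self):
        if self.pt is None:
            raise ConnectionError("node lost before answering")
        return self.pt

def get_partition_table(candidates):
    # Ask nodes in turn until one answers, instead of waiting forever
    # on a node that was lost before replying.
    for node in candidates:
        try:
            return node.ask_pt()
        except ConnectionError:
            continue  # this node is lost: try the next one
    raise RuntimeError("no storage node could provide a partition table")

print(get_partition_table([Node(), Node(pt="PT")]))  # first node lost
```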
The important bugfix is to update the last oid when the master verifies a
transaction with new oids.
By resetting the transaction manager at the beginning of the recovery phase,
it becomes possible to avoid tid/oid holes:
- by reallocating previously unused allocated oids
- when going back "in the past", i.e. reverting to an older version of the
  database (with fewer oids) and/or adjusting the clock

e1f9a7da  2015-11-30T14:04:52+01:00  Go back/stay in RECOVERING state when the partition table can't be operational
This fixes several cases where the partition table could become corrupt and
the whole cluster would get stuck in VERIFYING state.
This also reduces the probability of having out-of-date cells when restarting
several storage nodes simultaneously.
Finally, if a master node becomes primary again, a cluster must not be started
automatically if nodes with readable cells are missing, in order to avoid a
split of the database. This could happen if this master node was previously
forced to start it.

7eb7cf1b  2015-11-30T14:04:45+01:00  Minimize the amount of work during tpc_finish
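The "do not start automatically" condition above can be sketched as a check over the partition table; the representation used here (each partition as a list of `(node, readable)` cells) is hypothetical:

```python
# Only start the cluster automatically if, for every partition, at least
# one readable cell lives on a connected node; otherwise a forced start
# could split the database.
def can_start(partitions, up_nodes):
    return all(any(node in up_nodes for node, readable in cells if readable)
               for cells in partitions)

partitions = [
    [("s1", True), ("s2", False)],  # partition 0: readable copy on s1 only
    [("s2", True)],                 # partition 1: readable copy on s2 only
]
print(can_start(partitions, {"s1", "s2"}))  # True
print(can_start(partitions, {"s1"}))        # False: s2's readable cell missing
```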
NEO did not ensure that all data and metadata were written to disk before
tpc_finish, and it was for example vulnerable to ENOSPC errors.
In other words, some work had to be moved to tpc_vote:
- In tpc_vote, all involved storage nodes are now asked to write all metadata
  to ttrans/tobj and _commit_. Because the final tid is not known yet, the tid
  column of ttrans and tobj now contains NULL and the ttid respectively.
- In tpc_finish, AskLockInformation is still required for read locking;
  ttrans.tid is updated with the final value and this change is _committed_.
- The verification phase is greatly simplified, more reliable and faster. For
  all voted transactions, we can know whether a tpc_finish was started by
  getting the final tid from the ttid, either from ttrans or from trans. And
  we know that such transactions can't be partial, so we don't need to check
  oids.
So in addition to minimizing the risk of failures during tpc_finish, we also
fix a bug causing the verification phase to discard transactions with objects
for which readCurrent was called.
On the performance side:
- Although tpc_vote now asks all involved storages, instead of only those
  storing the transaction metadata, the client has been improved to do this
  in parallel. The additional commits are also all done in parallel.
- A possible improvement to compensate for the additional commits is to delay
  the commit done by the unlock.
- By minimizing the time to lock transactions, objects are read-locked for a
  much shorter period. This is all the more important because locked
  transactions must be unlocked in the same order.
Transactions with too many modified objects will now time out inside tpc_vote
instead of tpc_finish. Of course, such transactions may still cause other
transactions to time out in tpc_finish.

99ac542c  2015-11-30T14:00:34+01:00  Do not send useless node information to bootstraping node
cff279af  2015-11-30T12:44:32+01:00  fixup! storage: fix pruning of data when deleting partial transactions during...
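The tpc_vote/tpc_finish split described above can be sketched with SQLite as a toy model of the ttrans idea; this is not NEO's actual schema or API:

```python
import sqlite3

# Heavy, durable writes happen at vote time under the temporary
# transaction id (ttid), with the tid column left NULL; tpc_finish only
# binds the final tid and commits, so it has minimal work left to do.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ttrans (ttid INTEGER, tid INTEGER, meta TEXT)")

def tpc_vote(ttid, meta):
    # all metadata is written and committed before the final tid is known
    db.execute("INSERT INTO ttrans VALUES (?, NULL, ?)", (ttid, meta))
    db.commit()

def tpc_finish(ttid, tid):
    # minimal work: update the tid column with the final value and commit
    db.execute("UPDATE ttrans SET tid=? WHERE ttid=?", (tid, ttid))
    db.commit()

tpc_vote(100, "oids and metadata")
tpc_finish(100, 4242)
print(db.execute("SELECT tid FROM ttrans WHERE ttid=100").fetchone()[0])  # 4242
```

A crash between the two calls leaves a committed ttrans row whose NULL/non-NULL tid tells verification whether tpc_finish had started, which is the property the commit message relies on.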
This fixes a regression in commit 83fe64bf ("storage: fix pruning of data when
deleting partial transactions during verification") when ttrans has several
rows referring to the same data_id.

a63bf12f  2015-11-30T12:07:38+01:00  threaded: prevent neoctl to loop forever when something went wrong during the...
fe487c07  2015-11-30T12:07:38+01:00  ssl: fix handshaking connections being stuck when they're aborted
aaefaf8b  2015-11-30T12:07:38+01:00  ssl: consider connections completed after the handshake
- Server connections can now be in 'connecting' state.
- connectionAccepted event (which has never been used so far) is merged into
  connectionCompleted.

6b1f198f  2015-11-25T17:07:24+01:00  storage: always restart replication of outdated cells from the beginning (ZER...
This is a workaround to fix holes if replication is interrupted after new data
is committed.

949f7e0f  2015-11-25T17:07:24+01:00  threaded: fix typo
34a2fea3  2015-11-25T17:07:24+01:00  Ignore but log exceptions while closing a connection for which a assertion fa...
An AssertionError is certainly more severe than any other exception
(including OperationFailure) because the process is in an unknown state.

50134569  2015-11-25T17:07:24+01:00  threaded: make it possible to send packets from a connection filter
This could have been useful in testStorageFailureDuringTpcFinish:
close() could not be called from answerTransactionFinished because it
deadlocked while trying to send notifications.

c5913373  2015-11-25T17:07:24+01:00  tests: clarify intention in testStorageFailureDuringTpcFinish
The test was relying on the fact that 'c.abort()' caused an assertion
failure, which closed the connection and then raised OperationFailure.
Actually, I wanted to close the connection on the master, but it's clearer
this way.

20b7cecd  2015-11-25T17:07:24+01:00  TODO: review election timeouts and transaction aborting on client disconnection
79ea07c8  2015-11-25T17:07:24+01:00  Small optimizations & cleanups
0d36de7b  2015-11-25T17:07:24+01:00  Fix 2 'except' statements that will bug when moving to Python 3
Previous code relied on the fact that the exception target is kept past
the end of the except clause, which is no longer the case in Python 3, and
2to3 is not smart enough to detect that.
Without this change, a different OperationalError exception would be
ignored because there's already a local variable of the same name.

b0023b43  2015-11-25T17:07:24+01:00  mysql: drop 'bigdata' table when erasing the database
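The Python 2 vs 3 difference described above can be demonstrated directly; this is a generic sketch, not the actual NEO code:

```python
# In Python 2, the target of 'except ... as e' survives past the clause;
# in Python 3 it is deleted when the clause exits, so later reads of the
# same name no longer see the caught exception.
def target_survives():
    try:
        raise ValueError("boom")
    except ValueError as e:
        pass
    try:
        return e is not None  # Python 2: 'e' is still bound here
    except NameError:         # Python 3: deleted (UnboundLocalError)
        return False

print(target_survives())  # False under Python 3
```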
This was forgotten when this table was introduced in
commit f9a8500d ("mysql: the largest value allowed by TokuDB enginge is 32 MB").