Commit 63324838 authored by Julien Muchembled

TODO: safer tpc_finish and faster storage

parent 6275f7c6
@@ -16,13 +16,6 @@ Although this should happen rarely enough not to affect performance, this can
be an issue if your application can't afford restarting the transaction,
e.g. because it interacted with external environment.
Client always raises in tpc_finish if a failure happens
-------------------------------------------------------
This is wrong because the failure may actually happen just after the transaction
is committed. The client should not raise if it can reconnect and note
that the transaction exists.
Storage failure or update may lead to POSException or break undoLog()
----------------------------------------------------------------------
......
@@ -58,9 +58,12 @@
committed by future transactions.
- Add a 'devid' storage configuration so that the master does not distribute
replicated partitions on storages with the same 'devid' (see the sketch after this diff).
- Make tpc_finish safer as described in its __doc__: moving work to
tpc_vote and recovering from master failure when possible.
Storage
- Use HailDB instead of a stand-alone MySQL server.
- Use libmysqld instead of a stand-alone MySQL server.
- It should be possible to defer the commit at the end of finishTransaction.
- Notify master when storage becomes available for clients (LATENCY)
Currently, storage presence is broadcasted to client nodes too early, as
the storage node would refuse them until it has only up-to-date data (not
@@ -131,14 +134,6 @@
table hasn't changed by pinging the master and retry if necessary.
- Implement IStorageRestoreable (ZODB API) in order to preserve data
serials (i.e. undo information).
- tpc_finish might raise while the transaction got successfully committed.
This can happen if it gets disconnected from the primary master while waiting
for AnswerFinishTransaction after the primary received it and hence will
commit the transaction independently of client presence. The client could
legitimately think the transaction is not committed, and might decide to
retry. To solve this, the client can know whether its TTID got successfully
committed by looking at the currently unused '(t)trans.ttid' column.
See neo.threaded.test.Test.testStorageFailureDuringTpcFinish
- Fix and reenable deadlock avoidance (SPEED). This is required for
neo.threaded.test.Test.testDeadlockAvoidance
......
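As an illustration of the 'devid' idea in the TODO hunk above (not part of this commit), here is a minimal Python sketch of how a master could pick replicas of a partition while refusing storages that share a device. The StorageNode structure and the choose_replicas helper are hypothetical names, not NEO's real classes.

from collections import namedtuple

# Hypothetical stand-in for a NEO storage node; only the fields used here.
StorageNode = namedtuple('StorageNode', 'uuid devid')

def choose_replicas(nodes, replica_count):
    # Pick up to replica_count nodes for one partition, never reusing a
    # devid: replicas stored on the same physical device would be lost
    # together, which defeats replication.
    chosen, used_devids = [], set()
    for node in nodes:
        if node.devid in used_devids:
            continue  # this device already holds a replica of the partition
        chosen.append(node)
        used_devids.add(node.devid)
        if len(chosen) == replica_count:
            break
    return chosen

# Example: nodes 1 and 2 share 'disk1', so only one of them is picked.
nodes = [StorageNode(1, 'disk1'), StorageNode(2, 'disk1'), StorageNode(3, 'disk2')]
print([n.uuid for n in choose_replicas(nodes, 2)])  # -> [1, 3]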
@@ -660,7 +660,45 @@ class Application(ThreadedApplication):
self.dispatcher.forget_queue(txn_context['queue'], flush_queue=False)
def tpc_finish(self, transaction, tryToResolveConflict, f=None):
"""Finish current transaction.""" """Finish current transaction
To avoid inconsistencies between several databases involved in the
same transaction, an IStorage implementation must do its best not to
fail in tpc_finish. In particular, making a transaction permanent
should ideally be as simple as switching a bit permanently.
In NEO, tpc_finish breaks this promise by not ensuring earlier that all
data and metadata are written, and it is for example vulnerable to
ENOSPC errors. In other words, some work should be moved to tpc_vote.
TODO: - In tpc_vote, all involved storage nodes must be asked to write
all metadata to ttrans/tobj and _commit_. AskStoreTransaction
can be extended for this: for nodes that don't store anything
in ttrans, it can just contain the ttid. The final tid is not
known yet, so ttrans/tobj would contain the ttid.
- In tpc_finish, AskLockInformation is still required for read
locking, ttrans.tid must be updated with the final value and
ttrans _committed_.
- The Verification phase would need some change because
ttrans/tobj may contain data for which tpc_finish was not
called. The ttid is also in trans, so a mapping ttid<->tid is
always possible and can be forwarded via the master so that all
storage nodes are still able to update the tid column with the
final value when moving rows from tobj to obj.
The resulting cost is:
- additional RPCs in tpc_vote
- 1 updated row in ttrans + commit
TODO: We should recover from master failures when the transaction got
successfully committed. More precisely, we should not raise:
- if any failure happens after all storage nodes have successfully
processed the LockInformation packets from the master;
- and if we can reconnect to the cluster to check that the ttid
got successfully committed, which is possible because storage
nodes remember the ttid of all transactions.
See neo.threaded.test.Test.testStorageFailureDuringTpcFinish
This bug exists in ZEO.
"""
txn_container = self._txn_container
if 'voted' not in txn_container.get(transaction):
self.tpc_vote(transaction, tryToResolveConflict)
......
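The first TODO in the docstring above is essentially a split of responsibilities between the two commit phases. The following rough sketch (not the commit's code: store_temporary and set_final_tid are invented placeholders, and the real protocol goes through the master with AskStoreTransaction and AskLockInformation packets) only shows where the expensive, failure-prone work would move.

def tpc_vote_sketch(storages, ttid, metadata, objects):
    # Proposed vote phase: every storage writes the transaction metadata and
    # object data to ttrans/tobj under the temporary id (the final tid is not
    # known yet) and commits, so failures such as ENOSPC surface while the
    # transaction can still be aborted cleanly.
    for storage in storages:
        storage.store_temporary(ttid, metadata, objects)

def tpc_finish_sketch(storages, ttid, tid):
    # Proposed finish phase, now close to "switching a bit permanently":
    # each storage only updates ttrans.tid from the ttid to the final tid
    # and commits that single-row change.
    for storage in storages:
        storage.set_final_tid(ttid, tid)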
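The second TODO describes when the client may avoid raising. Below is a simplified sketch of that decision, assuming hypothetical helpers reconnect and ask_if_committed (NEO's real exception and packet classes differ).

class ConnectionLost(Exception):
    # Placeholder for losing the connection to the primary master.
    pass

def finish_with_recovery(app, ttid, wait_for_answer):
    # Only raise if the transaction is known NOT to be committed.
    try:
        return wait_for_answer()  # normal case: the master answers with the final tid
    except ConnectionLost:
        # The master may have committed the transaction just before the
        # connection dropped, so do not raise blindly: reconnect and check,
        # which is possible because storage nodes remember committed ttids.
        app.reconnect()
        tid = app.ask_if_committed(ttid)
        if tid is not None:
            return tid  # the transaction actually went through
        raise  # confirmed not committed: re-raising is correct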