Commit 82eea0cd authored by Julien Muchembled

master: fix tpc_finish possibly trying to kill too many nodes after client-storage failures

When concurrent transactions fail with different storages (e.g. only network
issues between C1-S2 and C2-S1), in such a way that each transaction can be
committed but not both (or the cluster would be non-operational), and if the
first transaction is aborted (between tpc_vote and tpc_finish), then the second
wrongly failed with INCOMPLETE_TRANSACTION.

And if both transactions could be committed (e.g. more than 1 replica),
some nodes would be disconnected for nothing.
parent 5ee0b0a3
@@ -321,17 +321,17 @@ class TransactionManager(EventQueue):
                 # No way to commit this transaction because there are
                 # non-replicated storage nodes with failed stores.
                 return False
-            failed = failed.copy()
+            all_failed = failed.copy()
             for t in self._ttid_dict.itervalues():
-                failed |= t._failed
-            if not operational(failed):
+                all_failed |= t._failed
+            if not operational(all_failed):
                 # Other transactions were voted and unless they're aborted,
                 # we won't be able to finish this one, because that would make
                 # the cluster non-operational. Let's tell the caller to retry
                 # later.
                 raise DelayEvent
             # Allow the client to finish the transaction,
-            # even if it will disconnect storage nodes.
+            # even if this will disconnect storage nodes.
             txn._failed = failed
         return True
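The effect of the change can be illustrated with a small standalone sketch. It is not the NEO implementation: `Txn`, `make_operational`, the free function `vote` and its `merge_into_txn` flag are hypothetical names introduced here, and the "operational" test is a toy model in which every partition must keep at least one storage node. With `merge_into_txn=True` the function mimics the pre-fix behaviour, where the failures of all voted transactions ended up in `txn._failed`; with the default it follows the fixed hunk, where the merged set is only used for the operational check. The sketch covers the unnecessary disconnections and the `DelayEvent` retry; it does not attempt to reproduce the INCOMPLETE_TRANSACTION failure.

```python
class DelayEvent(Exception):
    """Raised to ask the caller to retry the vote later."""


class Txn:
    """Hypothetical stand-in for a voted transaction."""

    def __init__(self):
        # Storage nodes that will be disconnected when this
        # transaction is finished.
        self._failed = frozenset()


def make_operational(partitions):
    """Toy model of the partition table check (an assumption):
    the cluster is operational if every partition keeps at least
    one storage node outside of `failed`."""
    def operational(failed):
        return all(cells - failed for cells in partitions)
    return operational


def vote(transactions, txn, failed, operational, merge_into_txn=False):
    """Sketch of the vote logic around the patched hunk.

    merge_into_txn=True emulates the code before the fix.
    """
    failed = set(failed)
    if failed:
        if not operational(failed):
            # This transaction alone cannot be committed.
            return False
        all_failed = failed.copy()
        for t in transactions:
            all_failed |= t._failed
        if not operational(all_failed):
            # Finishing this transaction together with the other
            # voted ones would break the cluster: retry later.
            raise DelayEvent
        # After the fix, only this transaction's own failures are kept.
        txn._failed = all_failed if merge_into_txn else failed
    return True


def demo(merge_into_txn):
    """One partition on three nodes: both transactions can commit."""
    operational = make_operational([{"S1", "S2", "S3"}])
    t1, t2 = Txn(), Txn()
    txns = [t1, t2]
    vote(txns, t1, {"S2"}, operational, merge_into_txn)
    vote(txns, t2, {"S1"}, operational, merge_into_txn)
    return t2._failed
```

In this model, `demo(True)` leaves S2 in the second transaction's `_failed` set even though only S1 failed for it, so S2 would be disconnected even if the first transaction were aborted before tpc_finish. `demo(False)` keeps only S1.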