Commit 7f754b5e authored by Julien Muchembled

storage: fix bug not replicating unfinished transactions when the last ones are aborted

This was caught by the first assertion of answerRebaseObject (client side): a
storage node had missed a few transactions and reported a conflict with a serial
older than the one being stored. This must never happen, so this commit also
adds a more generic assertion on the storage side.
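A minimal sketch of the storage-side invariant, modeled on the TransactionManager hunk: when the stored serial differs from the last committed one, the committed one can only be newer, so an older `previous_serial` means the node is missing data. Serials are plain integers here, and `check_store` and this local `ConflictError` class are hypothetical stand-ins, not NEO's actual API.

```python
class ConflictError(Exception):
    """Hypothetical stand-in for the storage's conflict exception."""

    def __init__(self, serial):
        super().__init__(serial)
        self.serial = serial  # the newer serial the client must rebase on


def check_store(serial, previous_serial):
    """Reject a store whose base revision is out of date.

    serial: revision the client based its change on.
    previous_serial: last committed revision known to this node,
    or None if the object does not exist yet.
    """
    if previous_serial is not None and previous_serial != serial:
        # Anything committed after the client's read is newer, so an older
        # previous_serial means this node is missing transactions.
        assert serial < previous_serial, (serial, previous_serial)
        raise ConflictError(previous_serial)


check_store(5, 5)        # same revision: no conflict
check_store(5, None)     # new object: no conflict
try:
    check_store(5, 7)    # object changed since revision 5
except ConflictError as e:
    print("conflict with", e.serial)  # prints "conflict with 7"
```

Calling `check_store(7, 5)` trips the assertion instead of raising a conflict. Like any Python `assert`, the check is skipped when running under `python -O`.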

The above case happened when the "first phase" of replicating a partition
(all history up to the tid before the unfinished transactions) ended after
those unfinished transactions were finished: this was a corruption bug,
where UP_TO_DATE cells could be missing data.

Otherwise, if the "first phase" ended before, the partition remained stuck
in the OUT_OF_DATE state; restarting the storage node was enough to recover.
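To illustrate why filling `replicate_dict` early matters, here is a toy, self-contained model of the bookkeeping patched in the Replicator hunk. It is a sketch under stated assumptions: `ReplicatorModel`, `transaction_finished`, the `fixed` flag, integer TIDs and `float("inf")` as `INVALID_TID` are all hypothetical; `_nextPartition()` is omitted and `tm.replicated()` is reduced to appending to a list. The pre-fix branch is inferred from the removed and context lines of the diff.

```python
INVALID_TID = float("inf")  # assumption: sentinel above any real TID


class Partition:
    def __init__(self, max_ttid=None):
        # Highest TTID that was unfinished when replication of this
        # partition started; None once they are all finished or aborted.
        self.max_ttid = max_ttid


class ReplicatorModel:
    """Toy model of Replicator's bookkeeping (hypothetical, simplified)."""

    def __init__(self, ttids, partitions, fixed=True):
        self.ttid_set = set(ttids)          # unfinished transactions
        self.partition_dict = partitions    # offset -> Partition
        self.replicate_dict = {}            # offset -> tid to replicate up to
        self.current_partition = None
        self.replicated = []                # offsets marked fully replicated
        self.fixed = fixed                  # False: pre-fix logic

    def transaction_finished(self, ttid, max_tid):
        """max_tid is the commit TID, or None if ttid was aborted."""
        self.ttid_set.discard(ttid)
        min_ttid = min(self.ttid_set) if self.ttid_set else INVALID_TID
        for offset, p in self.partition_dict.items():
            if not p.max_ttid:
                continue
            if not self.fixed:
                # Pre-fix: only the transaction that finishes *last* can
                # fill replicate_dict, and only if it was committed.
                if p.max_ttid < min_ttid:
                    p.max_ttid = None
                    if max_tid:
                        self.replicate_dict[offset] = max_tid
                continue
            # Fixed: record every commit while the partition still waits.
            if max_tid:
                self.replicate_dict[offset] = max_tid
            if p.max_ttid < min_ttid:
                # No more unfinished transaction for this partition.
                if not (offset == self.current_partition
                        or offset in self.replicate_dict):
                    self.replicated.append(offset)  # all were aborted
                p.max_ttid = None


# TTIDs 10 and 20 are unfinished; 10 commits at TID 15, then 20 is aborted.
for fixed in (False, True):
    r = ReplicatorModel([10, 20], {0: Partition(max_ttid=20)}, fixed)
    r.transaction_finished(10, 15)
    r.transaction_finished(20, None)
    print(fixed, r.replicate_dict)
# prints "False {}" then "True {0: 15}"
```

In this model the old logic leaves `replicate_dict` empty, so the transaction committed at TID 15 is never replicated. The fixed logic records the commit as soon as it happens, and marks a partition as fully replicated only when every unfinished transaction was aborted: calling `transaction_finished(10, None)` then `transaction_finished(20, None)` appends offset 0 to `replicated`.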
parent 44452395
@@ -144,10 +144,29 @@ class Replicator(object):
             return
         min_ttid = min(self.ttid_set) if self.ttid_set else INVALID_TID
         for offset, p in self.partition_dict.iteritems():
-            if p.max_ttid and p.max_ttid < min_ttid:
-                p.max_ttid = None
+            if p.max_ttid:
                 if max_tid:
+                    # Filling replicate_dict while there are still unfinished
+                    # transactions for this partition is not the most
+                    # efficient (due to the overhead of potentially replicating
+                    # the last transactions in several times), but that's a
+                    # simple way to make sure it is filled even if the
+                    # remaining unfinished transactions are aborted.
                     self.replicate_dict[offset] = max_tid
+                if p.max_ttid < min_ttid:
+                    # no more unfinished transaction for this partition
+                    if not (offset == self.current_partition
+                            or offset in self.replicate_dict):
+                        logging.debug(
+                            "All unfinished transactions have been aborted."
+                            " Mark partition %u as already fully replicated",
+                            offset)
+                        # We don't have anymore the previous value of
+                        # self.replicate_dict[offset], but p.max_ttid is not
+                        # wrong. Anyway here, we're not in backup mode and this
+                        # value will be ignored.
+                        self.app.tm.replicated(offset, p.max_ttid)
+                    p.max_ttid = None
         self._nextPartition()
 
     def getBackupTID(self):
@@ -382,6 +382,7 @@ class TransactionManager(EventQueue):
         # "C+A vs. B -> C+A+B" rarely costs more than "C+A vs. C+B -> C+A+B".
         # However, this would be against the optimistic principle of ZODB.
         if previous_serial is not None and previous_serial != serial:
+            assert serial < previous_serial, (serial, previous_serial)
             logging.info('Conflict on %s:%s with %s',
                 dump(oid), dump(ttid), dump(previous_serial))
             raise ConflictError(previous_serial)