Commit 42fd89bc authored by Julien Muchembled's avatar Julien Muchembled

client: fix load error during conflict resolution in case of late invalidation

parent b22a94df
@@ -544,6 +544,8 @@ class Application(ThreadedApplication):
                 # A later serial has already been resolved, skip.
                 resolved_serial_set.update(conflict_serial_set)
                 continue
+            if self.last_tid < conflict_serial:
+                self.sync() # possible late invalidation (very rare)
             try:
                 new_data = tryToResolveConflict(oid, conflict_serial,
                                                 serial, data)
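For context, here is a toy model of what the two added lines guard against (this is not NEO's code; ToyClient and pending_invalidations are made up for illustration): if the conflicting serial is newer than the last invalidation the client has processed, sync() with the master first so that last_tid catches up before the conflict is resolved.

```python
# Toy model of the late-invalidation guard added above.  NOT NEO's code:
# ToyClient and pending_invalidations are hypothetical.

class ToyClient:
    def __init__(self):
        self.last_tid = 0
        # Invalidations already sent by the master but not yet processed.
        self.pending_invalidations = []   # list of (tid, oids)

    def sync(self):
        # Drain pending invalidations so that last_tid catches up.
        while self.pending_invalidations:
            tid, _oids = self.pending_invalidations.pop(0)
            self.last_tid = max(self.last_tid, tid)

    def before_resolving(self, conflict_serial):
        if self.last_tid < conflict_serial:
            self.sync()   # possible late invalidation (very rare)
        # Loads at conflict_serial are no longer "in the future" for this
        # client, so conflict resolution can proceed.
        return self.last_tid >= conflict_serial


client = ToyClient()
client.pending_invalidations.append((10, [b'\x00' * 8]))
assert client.before_resolving(10)   # sync() brings last_tid from 0 to 10
```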
  • Why are we subject to last_tid during conflict resolution? I thought conflict resolution always happened outside of the transaction snapshot by using loadSerial. Then the cache should miss and the load should reach the storage without needing to explicitly wait for the invalidation.

    On a related note, I see tryToResolveConflict supports an extra argument, committedData, which would allow storage to send us the conflicting data without requiring an extra round-trip. It may be nice to make use of this.
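    A rough sketch of that suggestion (a toy, not ZODB's or NEO's actual code): as far as I know, the conflict resolution helper only loads the committed revision when committedData is not supplied, so sending the conflicting pickle along with the conflict answer would remove that round-trip.

```python
# Toy sketch, NOT the real tryToResolveConflict: it only shows that the
# committed revision has to be fetched from storage when the caller did not
# already provide it.

def resolve(oid, committed_serial, old_serial, new_pickle,
            committed_data=b'', load_serial=None):
    if not committed_data:
        # This is the extra round-trip the comment above is about.
        committed_data = load_serial(oid, committed_serial)
    # Real code would unpickle the three states and let the object's class
    # merge them (_p_resolveConflict); here we only return the inputs.
    return old_serial, committed_data, new_pickle

# With committed_data supplied, load_serial is never called:
assert resolve(b'\x00' * 8, 2, 1, b'new', b'committed')[1] == b'committed'
```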

    Edited by Vincent Pelletier
  • I found this while writing testPartialConflict in 94e4f469d087159ee67928ae6e63b3b5cd403d6b (a previous version of 74c69d54). In this test:

    • I delay communications with the master
    • 1 storage node is being disconnected by the master because it missed a transaction (actually the one that caused the conflict)

    IOW:

    • 2 clients C1, C2
    • 2 storages S1, S2 replicated
    • network cut between C2 & S1 just before the last committed transaction
    • network delay between C1 & M
    • TCP FIN delayed or dropped from M to S1

    The client is not designed to load data past last_tid. Without this, it fails in 3 ways:

    1. Failed assertion in _loadFromCache

      Easy to solve: return None if result or result[1] < at_tid. That's what I did at the beginning.

    2. Failed assertion when storing the value in cache (assuming _loadFromStorage works)

      Also easy: just return there if item.tid < tid. But this commit is better with respect to the cache. (Both "easy" fixes are sketched at the end of this comment.)

    3. A storage may fail to read the data

      The notification that this storage is down may also be late, and _loadFromStorage may try it first. As long as the node is not yet aware that it is no longer up-to-date, it happily answers that the oid does not exist at this revision. I don't want to change _loadFromStorage to ignore this error when a second load from another cell works: it looks too difficult to make sure this is reliable.

    Dealing with an outdated partition table is actually a subject we'll have to review.
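    For reference, here is a rough sketch of the two "easy" fixes from points 1 and 2 (a hypothetical, heavily simplified cache, not NEO's implementation; the exact conditions differ from the real code):

```python
# Hypothetical, heavily simplified cache; NOT NEO's implementation.  It only
# illustrates the two "easy" alternatives from points 1 and 2 above.

class ToyCache:
    def __init__(self):
        self._entries = {}              # oid -> (data, tid)

    def load(self, oid, at_tid):
        # Point 1: instead of asserting that a hit matches at_tid, treat an
        # entry older than at_tid as a miss, so that the caller falls back
        # to _loadFromStorage.
        entry = self._entries.get(oid)
        if not entry or entry[1] < at_tid:
            return None
        return entry

    def store(self, oid, data, tid):
        # Point 2: instead of asserting, silently ignore a store that is
        # older than what the cache already holds.
        entry = self._entries.get(oid)
        if entry and tid < entry[1]:
            return
        self._entries[oid] = (data, tid)
```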

    Edited by Julien Muchembled
  • I see tryToResolveConflict supports an extra argument, committedData, which would allow storage to send us the conflicting data without requiring an extra round-trip. It may be nice to make use of this.

    2 problems:

    • the data may already be loaded by another thread
    • with replicas, the data would be transferred several times on the network
  • The client is not designed to load data past last_tid. Without this, it fails in 3 ways:

    Ways 1 and 2 are related to the current cache's rigidity. I happened to drop both asserts in my old load-lock-removal patchset.

    For 3, I think the client has to try the next storage on such a failure during a loadSerial, as it is not supposed to be scoped to any running transaction AFAIK (see the sketch at the end of this comment). It must indeed be legal for a storage to fall out of date without either it or the client knowing about it through a partition table update.

    2 problems:

    • the data may already be loaded by another thread

    Not sure this is very relevant: the more clients there are, the less likely the current one caused the invalidation, so the less likely the data is in its local cache. Also, the more clients there are, the higher the latency is likely to be (actual networking becomes more likely, instead of localhost use). And the fewer clients there are, the less likely the database will be the bottleneck (client-side CPU being the more likely one).

    • with replicas, the data would be transferred several times on the network

    Good point, although I'm not sure of the cost in practice (how many replicas will be in use? how long does an individual pickle take to be transferred? what is the typical latency?).
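    To make the idea for 3 concrete, a sketch with hypothetical helpers (not NEO's API): on the error described above, loadSerial falls through to the next cell instead of failing immediately.

```python
# Hypothetical helpers, NOT NEO's API: fall through to the next cell when a
# possibly outdated storage claims the oid has no data at this serial.

class OidDoesNotExist(Exception):
    """Answer from a cell that believes the oid has no data at this serial."""

def load_serial(cells, oid, serial):
    last_error = None
    for cell in cells:                  # all cells holding oid's partition
        try:
            return cell.loadSerial(oid, serial)
        except OidDoesNotExist as e:
            # The cell may simply not know yet that it is outdated;
            # remember the error and try the next replica.
            last_error = e
    raise last_error or OidDoesNotExist((oid, serial))
```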

  • as it is not supposed to be scoped to any running transaction

    What I mean here is that while we can expect consistency across storages while we are inside a transaction snapshot (and I believe this still holds when partition table updates are lagging: if a storage was good enough for a given transaction before whatever caused it to become outdated, it should still be good enough between the time it became outdated and the time the client receives the updated partition table), we should probably not expect consistency outside of a transaction snapshot.

  • I forgot another reason. The commit mentioned in my previous comment describes the same scenario in TransactionManager.lockObject (storage), which now rejects multiple stores that aren't undos: the master disconnects S1 (node/pt notifications) before the invalidation* for the conflicting transaction (missed by S1), so this sync guarantees that the client does not store the same oid twice to S1 (which is either down, or up again with a new TM).

    So reverting this commit also means that the storage should reset itself in the above case, something like app.master_conn.close() (sketched below).

    * actually, just before locking the transaction (see TransactionManager.prepare), to make sure that the verification phase works with a new PT.
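    Purely as an illustration of that alternative (everything below is hypothetical except app.master_conn.close(), which comes from the comment above): the storage-side TransactionManager would reset the node when it sees an unexpected second store of the same oid, instead of relying on the client's sync().

```python
# Hypothetical sketch of the storage-side alternative; only
# app.master_conn.close() comes from the discussion, the rest is made up.

class ToyTransactionManager:
    def __init__(self, app):
        self.app = app
        self._stored = set()            # (ttid, oid) pairs seen so far

    def lockObject(self, ttid, oid, data, undo=False):
        if not undo and (ttid, oid) in self._stored:
            # A second, non-undo store of the same oid means this node
            # missed a transaction: reset by closing the connection to the
            # master so the node rejoins the cluster with a fresh state.
            self.app.master_conn.close()
            return
        self._stored.add((ttid, oid))
        # ... actual locking and storing elided ...
```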

  • I'm going to revert the raise in case of multiple stores of an oid. There's one valid case: when the previous undo resulted in a resolved conflict.
