      Fix oids remaining write-locked forever · 9b33b1db
      Julien Muchembled authored
      This happened in 2 cases:
      - Commit a4c06242 ("Review aborting of
        transactions") introduced a race condition causing oids to remain
        write-locked forever after the transaction modifying them is aborted.
      - An unfinished transaction is not locked/unlocked during tpc_finish: oids
        must be unlocked when being notified that the transaction is finished.
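
The two cases above can be pictured with a toy lock table. This is a minimal sketch, not NEO's actual code: the class `WriteLockTable` and its method names are hypothetical, and transaction ids are plain integers. It only shows that oids have to be released on both paths, abort and the "transaction finished" notification.

```python
class WriteLockTable:
    """Toy model of per-oid write locks (hypothetical, not NEO's implementation)."""

    def __init__(self):
        self._owner = {}  # oid -> ttid currently holding the write lock
        self._oids = {}   # ttid -> set of oids locked by that transaction

    def lock(self, ttid, oid):
        # Take the lock, or fail if another transaction already holds it.
        owner = self._owner.setdefault(oid, ttid)
        if owner != ttid:
            raise RuntimeError("oid %r is write-locked by %r" % (oid, owner))
        self._oids.setdefault(ttid, set()).add(oid)

    def _release(self, ttid):
        # Drop every lock still owned by this transaction.
        for oid in self._oids.pop(ttid, ()):
            if self._owner.get(oid) == ttid:
                del self._owner[oid]

    def abort(self, ttid):
        # Case 1: the transaction is aborted.
        self._release(ttid)

    def notify_finished(self, ttid):
        # Case 2: the node only learns that the transaction is finished,
        # without having gone through its own lock/unlock sequence.
        self._release(ttid)

    def locked_oids(self):
        return set(self._owner)
```

Both `abort` and `notify_finished` funnel into the same release step, so neither path can leave an oid locked after its transaction is gone.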
      storage: fix bug not replicating unfinished transactions when the last ones are aborted · 7f754b5e
      Julien Muchembled authored
      This was found by the first assertion of answerRebaseObject (client) because
      a storage node missed a few transactions and reported a conflict with an older
      serial than the one being stored: this must never happen and this commit adds a
      more generic assertion on the storage side.
      The above case is when the "first phase" of replication of a partition
      (all history up to the tid before unfinished transactions) ended after
      the unfinished transactions were finished: this was a corruption bug,
      where UP_TO_DATE cells could miss data.
      Otherwise, if the "first phase" ended before, then the partition remained stuck
      in OUT_OF_DATE state. Restarting the storage node was enough to recover.
      client: fix an AssertionError while processing late AnswerRebaseObject · 44452395
      Traceback (most recent call last):
        File "neo/client/app.py", line 507, in tpc_vote
        File "neo/client/app.py", line 500, in waitStoreResponses
        File "neo/client/app.py", line 150, in _waitAnyTransactionMessage
        File "neo/client/app.py", line 474, in _handleConflicts
          self._store(txn_context, oid, conflict_serial, data)
        File "neo/client/app.py", line 410, in _store
          self._waitAnyTransactionMessage(txn_context, False)
        File "neo/client/app.py", line 145, in _waitAnyTransactionMessage
          self._waitAnyMessage(queue, block=block)
        File "neo/client/app.py", line 133, in _waitAnyMessage
          _handlePacket(conn, packet, kw)
        File "neo/lib/threaded_app.py", line 133, in _handlePacket
          handler.dispatch(conn, packet, kw)
        File "neo/lib/handler.py", line 72, in dispatch
          method(conn, *args, **kw)
        File "neo/client/handlers/storage.py", line 122, in answerRebaseObject
          assert txn_context.conflict_dict[oid] == (serial, conflict)
      0. unanswered rebase from S2
      1. conflict resolved between t1 and t2 -> S1 & S2
      2. S1 reports a new conflict
      3. S2 answers the rebase:
         returned serial (t1) is smaller than in conflict_dict (t2)
      4. S2 reports the same conflict as in 2
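
The sequence above can be replayed with a small model. This is only a sketch under assumptions: `merge_conflict` is a hypothetical helper, tids are plain integers, and ignoring an answer based on an older serial is one possible way to tolerate step 3, not necessarily what this commit does.

```python
def merge_conflict(conflict_dict, oid, serial, conflict):
    """Record a conflict report for oid (hypothetical helper).

    Returns False when the report is based on a serial older than the one
    already recorded, i.e. it is a late answer and is ignored.
    """
    known = conflict_dict.get(oid)
    if known is None or known[0] < serial:
        conflict_dict[oid] = (serial, conflict)
        return True
    if serial < known[0]:
        return False  # late answer to an earlier rebase
    # Same serial: every storage node must report the same conflict.
    assert known == (serial, conflict)
    return True


t1, t2, t3 = 1, 2, 3
conflict_dict = {}
step2 = merge_conflict(conflict_dict, "oid", t2, t3)  # 2. S1 reports a new conflict
step3 = merge_conflict(conflict_dict, "oid", t1, t2)  # 3. late rebase answer from S2
step4 = merge_conflict(conflict_dict, "oid", t2, t3)  # 4. S2 reports the same conflict
```

With a strict equality check alone, step 3 would fail because `(t1, t2)` differs from the recorded `(t2, t3)`.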
      storage: fix an AssertionError in internal replication · 560e4fb1
      Traceback (most recent call last):
        File "neo/storage/handlers/storage.py", line 111, in answerFetchObjects
        File "neo/storage/replicator.py", line 370, in finish
        File "neo/storage/replicator.py", line 279, in _nextPartition
          assert app.pt.getCell(offset, app.uuid).isOutOfDate()
      The scenario is:
      1. partition A: start of replication, with unfinished transactions
      2. partition A: all unfinished transactions are finished
      3. partition A: end of replication with ReplicationDone notification
      4. replication of partition B
      5. partition A: AssertionError when starting replication
      The bug is in step 3: at that point, partition A is only partially
      replicated, so the storage node must not notify the master.
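
The rule in the last sentence can be sketched as follows. This is an illustrative model only: `Partition`, `finish_pass` and the `notify_master` callback are hypothetical names, and the real replicator tracks far more state.

```python
class Partition:
    """Toy replication state for one partition (hypothetical)."""

    def __init__(self, offset, unfinished_at_start):
        self.offset = offset
        # True if transactions were still unfinished when replication began:
        # the first pass then only covers history up to the tid before them.
        self.partial = unfinished_at_start


def finish_pass(partition, notify_master):
    """Called at the end of a replication pass; returns True if up to date."""
    if partition.partial:
        # Only the first phase is done: stay out of date and replicate again.
        partition.partial = False
        return False
    notify_master(partition.offset)
    return True


notified = []
part_a = Partition(offset=0, unfinished_at_start=True)
first = finish_pass(part_a, notified.append)   # partial: master not notified
second = finish_pass(part_a, notified.append)  # complete: master notified
```

In this model the cell is still out of date when replication of partition A starts again, which is the condition the failing assertion checks.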
