1. 27 Feb, 2017 3 commits
      Fix oids remaining write-locked forever · 9b33b1db
      Julien Muchembled authored
      This happened in 2 cases:
      - Commit a4c06242 ("Review aborting of
        transactions") introduced a race condition causing oids to remain
        write-locked forever after the transaction modifying them is aborted.
      - An unfinished transaction is not locked/unlocked during tpc_finish, so its
        oids must instead be unlocked upon notification that the transaction is finished.
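      Both cases come down to the same invariant: every write lock taken for a
      transaction must eventually be released, whether the transaction is aborted
      or finishes without going through tpc_finish. The sketch below is a
      hypothetical, simplified model of that invariant, not NEO's lock manager;
      the class WriteLockTable and its methods are illustrative names, and it does
      not reproduce the race itself.

```python
class WriteLockTable:
    """Hypothetical model of per-oid write locks: oid -> ttid holding the lock."""

    def __init__(self):
        self._locks = {}

    def lock(self, ttid, oid):
        # Grant the lock unless another transaction already holds it.
        holder = self._locks.get(oid)
        if holder is not None and holder != ttid:
            return False
        self._locks[oid] = ttid
        return True

    def _release_all(self, ttid):
        # Drop every write lock held by ttid.
        for oid in [o for o, t in self._locks.items() if t == ttid]:
            del self._locks[oid]

    def abort(self, ttid):
        # Case 1: an aborted transaction must release all its locks.
        self._release_all(ttid)

    def notify_finished(self, ttid):
        # Case 2: an unfinished transaction is not unlocked by tpc_finish,
        # so its locks are released when it is reported as finished.
        self._release_all(ttid)

    def holder(self, oid):
        return self._locks.get(oid)
```

      If either release path is skipped, holder() keeps returning the old ttid and
      any later transaction storing the same oid waits forever.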
      storage: fix bug not replicating unfinished transactions when the last ones are aborted · 7f754b5e
      Julien Muchembled authored
      This was found by the first assertion of answerRebaseObject (client): a
      storage node had missed a few transactions and reported a conflict with an
      older serial than the one being stored. This must never happen, so this commit
      adds a more generic assertion on the storage side.
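      The storage-side invariant can be sketched as follows. This is an assumption
      about its shape, not the assertion added by the commit: check_store is a
      hypothetical helper, and serials are modeled as integers.

```python
def check_store(serial, last_serial):
    """Hypothetical conflict check for storing an object.

    serial: the serial the client based its change on.
    last_serial: the last committed serial this storage node knows for the oid.
    Returns None if the store can proceed, else the conflicting serial.
    """
    # A node that missed transactions could see last_serial < serial.
    # Reporting that as a conflict would be wrong, so it is treated as a bug.
    assert last_serial >= serial, (last_serial, serial)
    if last_serial != serial:
        return last_serial  # the object was modified after `serial`
    return None
```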
      
      The above case is when the "first phase" of replication of a partition
      (all history up to the tid before the unfinished transactions) ended after
      the unfinished transactions were finished: this was a corruption bug,
      where UP_TO_DATE cells could miss data.
      
      Otherwise, if the "first phase" ended before, the partition remained stuck
      in the OUT_OF_DATE state. Restarting the storage node was enough to recover.
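      Both outcomes are ordering problems: whichever of "first phase ended" and
      "unfinished transactions finished or aborted" happens first, the cell must
      become UP_TO_DATE only once every committed transaction has been fetched.
      The sketch below models that order-independent condition. It is a
      hypothetical illustration, not NEO's replicator; PartitionReplication and
      its methods are made-up names.

```python
class PartitionReplication:
    """Hypothetical replication state of one partition on a storage node."""

    def __init__(self, unfinished):
        self.first_phase_done = False
        self.unfinished = set(unfinished)  # transactions still in progress
        self.to_replicate = set()          # committed, data not fetched yet

    def transaction_finished(self, tid, committed):
        self.unfinished.discard(tid)
        if committed:
            self.to_replicate.add(tid)
        # An aborted transaction leaves nothing to replicate.

    def first_phase_ended(self):
        self.first_phase_done = True

    def replicated(self, tid):
        self.to_replicate.discard(tid)

    def up_to_date(self):
        # Same answer whatever order the events above arrived in.
        return (self.first_phase_done and not self.unfinished
                and not self.to_replicate)
```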
      client: fix an AssertionError while processing late AnswerRebaseObject · 44452395
      Julien Muchembled authored
      Traceback (most recent call last):
        ...
        File "neo/client/app.py", line 507, in tpc_vote
          self.waitStoreResponses(txn_context)
        File "neo/client/app.py", line 500, in waitStoreResponses
          _waitAnyTransactionMessage(txn_context)
        File "neo/client/app.py", line 150, in _waitAnyTransactionMessage
          self._handleConflicts(txn_context)
        File "neo/client/app.py", line 474, in _handleConflicts
          self._store(txn_context, oid, conflict_serial, data)
        File "neo/client/app.py", line 410, in _store
          self._waitAnyTransactionMessage(txn_context, False)
        File "neo/client/app.py", line 145, in _waitAnyTransactionMessage
          self._waitAnyMessage(queue, block=block)
        File "neo/client/app.py", line 133, in _waitAnyMessage
          _handlePacket(conn, packet, kw)
        File "neo/lib/threaded_app.py", line 133, in _handlePacket
          handler.dispatch(conn, packet, kw)
        File "neo/lib/handler.py", line 72, in dispatch
          method(conn, *args, **kw)
        File "neo/client/handlers/storage.py", line 122, in answerRebaseObject
          assert txn_context.conflict_dict[oid] == (serial, conflict)
      AssertionError
      
      Scenario:
      0. unanswered rebase from S2
      1. conflict resolved between t1 and t2 -> S1 & S2
      2. S1 reports a new conflict
      3. S2 answers the rebase:
         the returned serial (t1) is smaller than the one in conflict_dict (t2)
      4. S2 reports the same conflict as in 2
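      In this scenario the answer in step 3 is late: it refers to a rebase that was
      superseded by the resolution in step 1. One way to tolerate it is to recognize
      stale answers by their serial and keep the original check only for current
      ones. This is an assumption for illustration, not necessarily what commit
      44452395 does; classify_rebase_answer is a hypothetical helper and serials
      are modeled as integers.

```python
def classify_rebase_answer(conflict_dict, oid, serial, conflict):
    """Hypothetical check of an AnswerRebaseObject-like reply.

    conflict_dict maps oid -> (serial, conflict) for the conflict the client
    is currently resolving.
    """
    recorded_serial, recorded_conflict = conflict_dict[oid]
    if serial < recorded_serial:
        # Step 3: the reply belongs to an older rebase, so ignore it
        # instead of asserting on it.
        return "stale"
    # A non-stale reply must describe the conflict being resolved,
    # which is what the failing assertion checked.
    assert (serial, conflict) == (recorded_serial, recorded_conflict)
    return "current"
```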
  2. 24 Feb, 2017 2 commits
      storage: fix an AssertionError in internal replication · 560e4fb1
      Julien Muchembled authored
      Traceback (most recent call last):
        ...
        File "neo/storage/handlers/storage.py", line 111, in answerFetchObjects
          self.app.replicator.finish()
        File "neo/storage/replicator.py", line 370, in finish
          self._nextPartition()
        File "neo/storage/replicator.py", line 279, in _nextPartition
          assert app.pt.getCell(offset, app.uuid).isOutOfDate()
      AssertionError
      
      The scenario is:
      1. partition A: start of replication, with unfinished transactions
      2. partition A: all unfinished transactions are finished
      3. partition A: end of replication with ReplicationDone notification
      4. replication of partition B
      5. partition A: AssertionError when starting replication
      
      The bug is that at step 3, partition A is only partially replicated, so the
      storage node must not notify the master.
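      The rule can be sketched as a guard on the end of replication: notify the
      master only when the partition has reached the tid it needed, otherwise keep
      it OUT_OF_DATE and schedule the rest. finish_partition, its parameters and the
      two callbacks are hypothetical names, not NEO's replicator API.

```python
def finish_partition(replicated_up_to, target_tid, notify_master, requeue):
    """Hypothetical end-of-replication step for one partition.

    replicated_up_to: last tid fetched for the partition.
    target_tid: tid the partition must reach to be up to date.
    """
    if replicated_up_to >= target_tid:
        notify_master()  # e.g. send a ReplicationDone notification
        return True
    # Partially replicated (step 3 of the scenario): do not notify,
    # schedule replication of the remaining range instead.
    requeue(replicated_up_to)
    return False
```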
  3. 23 Feb, 2017 1 commit
  4. 21 Feb, 2017 6 commits
  5. 14 Feb, 2017 8 commits
  6. 02 Feb, 2017 10 commits
  7. 26 Jan, 2017 1 commit
  8. 19 Jan, 2017 2 commits
  9. 18 Jan, 2017 4 commits
  10. 17 Jan, 2017 3 commits