1. 23 Mar, 2017 9 commits
  2. 22 Mar, 2017 1 commit
  3. 21 Mar, 2017 1 commit
  4. 20 Mar, 2017 2 commits
  5. 18 Mar, 2017 1 commit
    • master: fix crash when a transaction begins while a storage node starts operation · 781b4eb5
      Julien Muchembled authored
      Traceback (most recent call last):
        ...
        File "neo/lib/handler.py", line 72, in dispatch
          method(conn, *args, **kw)
        File "neo/master/handlers/client.py", line 70, in askFinishTransaction
          conn.getPeerId(),
        File "neo/master/transactions.py", line 387, in prepare
          assert node_list, (ready, failed)
      AssertionError: (set([]), frozenset([]))
      
      Master log leading to the crash:
        PACKET    #0x0009 StartOperation                 > S1
        PACKET    #0x0004 BeginTransaction               < C1
        DEBUG     Begin <...>
        PACKET    #0x0004 AnswerBeginTransaction         > C1
        PACKET    #0x0001 NotifyReady                    < S1
      
      It was wrong to process BeginTransaction before receiving NotifyReady.
      
      The changes in the storage are cosmetic: the 'ready' attribute has become
      redundant with 'operational'.
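A toy model of the ordering constraint described above, for illustration only. `ToyMaster` and its methods are hypothetical names, not NEO's API, and the commit may implement the fix differently. The sketch only assumes that BeginTransaction is held back until some storage node has sent NotifyReady.

```python
class ToyMaster(object):
    """Toy model only; none of these names come from NEO."""

    def __init__(self):
        self.ready = set()    # storage nodes that have sent NotifyReady
        self.delayed = []     # clients whose BeginTransaction is waiting
        self.answered = []    # clients that received AnswerBeginTransaction

    def begin_transaction(self, client):
        # Answer only if at least one storage node is ready;
        # otherwise keep the request for later.
        if self.ready:
            self.answered.append(client)
        else:
            self.delayed.append(client)

    def notify_ready(self, storage):
        self.ready.add(storage)
        # Replay the requests that arrived too early.
        delayed, self.delayed = self.delayed, []
        for client in delayed:
            self.begin_transaction(client)
```

With this ordering, a client that sends BeginTransaction right after StartOperation is answered only once S1 has reported that it is ready, so a later prepare step never sees an empty node list.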
  6. 17 Mar, 2017 3 commits
  7. 14 Mar, 2017 4 commits
  8. 07 Mar, 2017 1 commit
  9. 03 Mar, 2017 1 commit
    • qa: fix random failure of check_checkCurrentSerialInTransaction · fec9a3a5
      Julien Muchembled authored
      Generators are not thread-safe:
      
      Exception in thread T2:
      Traceback (most recent call last):
        ...
        File "ZODB/tests/StorageTestBase.py", line 157, in _dostore
          r2 = self._storage.tpc_vote(t)
        File "neo/client/Storage.py", line 95, in tpc_vote
          return self.app.tpc_vote(transaction)
        File "neo/client/app.py", line 507, in tpc_vote
          self.waitStoreResponses(txn_context)
        File "neo/client/app.py", line 500, in waitStoreResponses
          _waitAnyTransactionMessage(txn_context)
        File "neo/client/app.py", line 145, in _waitAnyTransactionMessage
          self._waitAnyMessage(queue, block=block)
        File "neo/client/app.py", line 128, in _waitAnyMessage
          conn, packet, kw = get(block)
        File "neo/lib/locking.py", line 203, in get
          self._lock()
        File "neo/tests/threaded/__init__.py", line 590, in _lock
          for i in TIC_LOOP:
      ValueError: generator already executing
      
      ======================================================================
      FAIL: check_checkCurrentSerialInTransaction (neo.tests.zodb.testBasic.BasicTests)
      ----------------------------------------------------------------------
      Traceback (most recent call last):
        File "neo/tests/zodb/testBasic.py", line 33, in check_checkCurrentSerialInTransaction
          super(BasicTests, self).check_checkCurrentSerialInTransaction()
        File "ZODB/tests/BasicStorage.py", line 294, in check_checkCurrentSerialInTransaction
          utils.load_current(self._storage, b'\0\0\0\0\0\0\0\xf4')[1])
      failureException: False is not true
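The `ValueError` in the traceback above is raised when a generator is resumed while it is already running. The sketch below reproduces it deterministically and shows one common workaround, serializing `next()` behind a lock. `ticker`, `LockedIterator`, `reentrant_error` and `run_threads` are illustrative names, and the commit itself may fix the test differently.

```python
import threading


def ticker():
    """Endless generator; a stand-in for a shared loop such as TIC_LOOP."""
    n = 0
    while True:
        n += 1
        yield n


class LockedIterator(object):
    """Serialize next() calls so only one thread runs the generator body."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self._lock:
            return next(self._it)

    next = __next__  # Python 2 spelling


def reentrant_error():
    """Resume a generator while it is already executing; return the message."""
    def gen():
        yield next(g)  # re-enters g from inside its own body

    g = gen()
    try:
        next(g)
    except ValueError as e:
        return str(e)


def run_threads(thread_count=4, per_thread=1000):
    """Pull values from one shared, lock-protected generator in many threads."""
    shared = LockedIterator(ticker())
    results = []

    def drain():
        for _ in range(per_thread):
            results.append(next(shared))

    threads = [threading.Thread(target=drain) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Without the lock, two threads calling `next()` on the same generator can hit the same error as `reentrant_error()`, but only intermittently, which matches the random nature of the test failure.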
  10. 02 Mar, 2017 2 commits
    • storage: fix PT updates in case of late AnswerUnfinishedTransactions · a74937c8
      Julien Muchembled authored
      This is done by moving
              self.replicator.populate()
      after the switch to MasterOperationHandler, so that the latter is not delayed.
      
      This change comes with some refactoring of the main loop,
      to clean up app.checker and app.replicator properly (like app.tm).
      
      Another option could have been to process notifications with the last handler,
      instead of the first one. But if possible, cleaning up the whole code so that
      handlers are no longer delayed looks like the best option.
    • mysql: code clean up · 041a3eda
      Julien Muchembled authored
  11. 27 Feb, 2017 3 commits
    • Fix oids remaining write-locked forever · 9b33b1db
      Julien Muchembled authored
      This happened in 2 cases:
      - Commit a4c06242 ("Review aborting of transactions") introduced a race
        condition causing oids to remain write-locked forever after the
        transaction modifying them is aborted.
      - An unfinished transaction is not locked/unlocked during tpc_finish: oids
        must be unlocked when being notified that the transaction is finished.
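The two cases above share one requirement: every path that ends a transaction must release its write locks. A minimal sketch of that invariant follows. `ToyLockTable` and its methods are hypothetical and do not reflect NEO's transaction manager.

```python
class ToyLockTable(object):
    """Toy write-lock table; names are illustrative, not NEO's API."""

    def __init__(self):
        self.locked = {}  # oid -> id of the transaction holding the lock

    def lock(self, ttid, oid):
        # Take the lock if it is free; report whether ttid holds it.
        holder = self.locked.setdefault(oid, ttid)
        return holder == ttid

    def release(self, ttid):
        # Drop every lock held by ttid.
        for oid in [o for o, t in self.locked.items() if t == ttid]:
            del self.locked[oid]

    # Both ways of ending a transaction go through the same release step.
    abort = release
    notify_finished = release
```

Routing abort and the "transaction finished" notification through a single release step is one way to make sure no path leaves an oid locked.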
    • storage: fix bug not replicating unfinished transactions when the last ones are aborted · 7f754b5e
      Julien Muchembled authored
      This was found by the first assertion of answerRebaseObject (client) because
      a storage node missed a few transactions and reported a conflict with an older
      serial than the one being stored: this must never happen and this commit adds a
      more generic assertion on the storage side.
      
      The above case is when the "first phase" of replication of a partition
      (all history up to the tid before unfinished transactions) ended after
      the unfinished transactions were finished: this was a corruption bug,
      where UP_TO_DATE cells could miss data.
      
      Otherwise, if the "first phase" ended before, then the partition remained stuck
      in OUT_OF_DATE state. Restarting the storage node was enough to recover.
    • client: fix an AssertionError while processing late AnswerRebaseObject · 44452395
      Julien Muchembled authored
      Traceback (most recent call last):
        ...
        File "neo/client/app.py", line 507, in tpc_vote
          self.waitStoreResponses(txn_context)
        File "neo/client/app.py", line 500, in waitStoreResponses
          _waitAnyTransactionMessage(txn_context)
        File "neo/client/app.py", line 150, in _waitAnyTransactionMessage
          self._handleConflicts(txn_context)
        File "neo/client/app.py", line 474, in _handleConflicts
          self._store(txn_context, oid, conflict_serial, data)
        File "neo/client/app.py", line 410, in _store
          self._waitAnyTransactionMessage(txn_context, False)
        File "neo/client/app.py", line 145, in _waitAnyTransactionMessage
          self._waitAnyMessage(queue, block=block)
        File "neo/client/app.py", line 133, in _waitAnyMessage
          _handlePacket(conn, packet, kw)
        File "neo/lib/threaded_app.py", line 133, in _handlePacket
          handler.dispatch(conn, packet, kw)
        File "neo/lib/handler.py", line 72, in dispatch
          method(conn, *args, **kw)
        File "neo/client/handlers/storage.py", line 122, in answerRebaseObject
          assert txn_context.conflict_dict[oid] == (serial, conflict)
      AssertionError
      
      Scenario:
      0. unanswered rebase from S2
      1. conflict resolved between t1 and t2 -> S1 & S2
      2. S1 reports a new conflict
      3. S2 answers the rebase:
         returned serial (t1) is smaller than in conflict_dict (t2)
      4. S2 reports the same conflict as in 2
  12. 24 Feb, 2017 2 commits
    • storage: fix an AssertionError in internal replication · 560e4fb1
      Julien Muchembled authored
      Traceback (most recent call last):
        ...
        File "neo/storage/handlers/storage.py", line 111, in answerFetchObjects
          self.app.replicator.finish()
        File "neo/storage/replicator.py", line 370, in finish
          self._nextPartition()
        File "neo/storage/replicator.py", line 279, in _nextPartition
          assert app.pt.getCell(offset, app.uuid).isOutOfDate()
      AssertionError
      
      The scenario is:
      1. partition A: start of replication, with unfinished transactions
      2. partition A: all unfinished transactions are finished
      3. partition A: end of replication with ReplicationDone notification
      4. replication of partition B
      5. partition A: AssertionError when starting replication
      
      The bug is in step 3: partition A is only partially replicated at that
      point, so the storage node must not notify the master.
  13. 23 Feb, 2017 1 commit
  14. 21 Feb, 2017 6 commits
  15. 14 Feb, 2017 3 commits