• Traceback (most recent call last):
      ...
      File "neo/lib/handler.py", line 72, in dispatch
        method(conn, *args, **kw)
      File "neo/master/handlers/client.py", line 70, in askFinishTransaction
        conn.getPeerId(),
      File "neo/master/transactions.py", line 387, in prepare
        assert node_list, (ready, failed)
    AssertionError: (set([]), frozenset([]))
    
    Master log leading to the crash:
      PACKET    #0x0009 StartOperation                 > S1
      PACKET    #0x0004 BeginTransaction               < C1
      DEBUG     Begin <...>
      PACKET    #0x0004 AnswerBeginTransaction         > C1
      PACKET    #0x0001 NotifyReady                    < S1
    
    It was wrong to process BeginTransaction before receiving NotifyReady.
    
    The changes in the storage are cosmetic: the 'ready' attribute has become
    redundant with 'operational'.
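
The ordering constraint above can be illustrated with a toy model. This is only a sketch of one possible way to enforce it, not the code of this commit: `ToyMaster`, `notify_ready` and `begin_transaction` are hypothetical names, and the sketch simply holds back BeginTransaction requests until at least one storage node has sent NotifyReady.

```python
class ToyMaster:
    """Hypothetical model: client requests wait until a storage is ready."""

    def __init__(self):
        self.ready = set()    # storage ids that sent NotifyReady
        self.delayed = []     # client ids whose BeginTransaction is held back

    def begin_transaction(self, client_id):
        # Without any ready storage, the transaction would have an empty
        # node list (the situation that tripped the assertion above).
        if not self.ready:
            self.delayed.append(client_id)
            return None
        return (client_id, frozenset(self.ready))

    def notify_ready(self, storage_id):
        self.ready.add(storage_id)
        # Replay the requests that arrived too early, in arrival order.
        delayed, self.delayed = self.delayed, []
        return [self.begin_transaction(c) for c in delayed]
```

With the packet order from the log (BeginTransaction from C1, then NotifyReady from S1), the request is answered only once S1 is ready.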
    by Julien Muchembled
     

  • Generators are not thread-safe:
    
    Exception in thread T2:
    Traceback (most recent call last):
      ...
      File "ZODB/tests/StorageTestBase.py", line 157, in _dostore
        r2 = self._storage.tpc_vote(t)
      File "neo/client/Storage.py", line 95, in tpc_vote
        return self.app.tpc_vote(transaction)
      File "neo/client/app.py", line 507, in tpc_vote
        self.waitStoreResponses(txn_context)
      File "neo/client/app.py", line 500, in waitStoreResponses
        _waitAnyTransactionMessage(txn_context)
      File "neo/client/app.py", line 145, in _waitAnyTransactionMessage
        self._waitAnyMessage(queue, block=block)
      File "neo/client/app.py", line 128, in _waitAnyMessage
        conn, packet, kw = get(block)
      File "neo/lib/locking.py", line 203, in get
        self._lock()
      File "neo/tests/threaded/__init__.py", line 590, in _lock
        for i in TIC_LOOP:
    ValueError: generator already executing
    
    ======================================================================
    FAIL: check_checkCurrentSerialInTransaction (neo.tests.zodb.testBasic.BasicTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "neo/tests/zodb/testBasic.py", line 33, in check_checkCurrentSerialInTransaction
        super(BasicTests, self).check_checkCurrentSerialInTransaction()
      File "ZODB/tests/BasicStorage.py", line 294, in check_checkCurrentSerialInTransaction
        utils.load_current(self._storage, b'\0\0\0\0\0\0\0\xf4')[1])
    failureException: False is not true
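
The first traceback shows a shared generator (TIC_LOOP) being advanced from two threads at once. A minimal sketch of one common workaround follows, assuming Python 3; `LockedIterator` is a hypothetical helper and is not claimed to be the fix applied by this commit. It serializes `next()` calls with a lock so that the generator is never re-entered.

```python
import threading


class LockedIterator:
    """Hypothetical wrapper: serialize next() calls on a shared iterator."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # Only one thread at a time may advance the underlying generator.
        with self._lock:
            return next(self._it)


def count_up(n):
    for i in range(n):
        yield i


def consume(shared, out):
    for item in shared:
        out.append(item)


shared = LockedIterator(count_up(10000))
results = []
threads = [threading.Thread(target=consume, args=(shared, results))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each item is delivered to exactly one thread; the order in which threads receive items is not defined.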
    by Julien Muchembled
     

  • This is done by moving
            self.replicator.populate()
    after the switch to MasterOperationHandler, so that the latter is not delayed.
    
    This change comes with some refactoring of the main loop,
    to clean up app.checker and app.replicator properly (like app.tm).
    
    Another option could have been to process notifications with the last handler,
    instead of the first one. But if possible, cleaning up the whole code so that
    handlers are no longer delayed looks like the best option.
    by Julien Muchembled
     
  • by Julien Muchembled
     

  • This happened in 2 cases:
    - Commit a4c06242 ("Review aborting of transactions") introduced a race
      condition causing oids to remain write-locked forever after the
      transaction modifying them is aborted.
    - An unfinished transaction is not locked/unlocked during tpc_finish: oids
      must be unlocked when being notified that the transaction is finished.
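
Both cases come down to the same invariant: every write-lock taken by a transaction must be released when that transaction ends, whether by abort or by finish. The sketch below is a toy illustration of that invariant only; `ToyLockTable` and its methods are hypothetical and do not reflect NEO's actual transaction manager.

```python
class ToyLockTable:
    """Hypothetical write-lock table keyed by oid."""

    def __init__(self):
        self._owner = {}  # oid -> id of the transaction holding the lock

    def lock(self, ttid, oid):
        # Returns True if ttid holds the lock after the call.
        return self._owner.setdefault(oid, ttid) == ttid

    def release(self, ttid):
        # Must be called on abort AND when notified that ttid is finished,
        # otherwise the oids stay write-locked forever.
        for oid in [o for o, t in self._owner.items() if t == ttid]:
            del self._owner[oid]

    def locked_oids(self):
        return set(self._owner)
```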
    by Julien Muchembled
     
  • This was found by the first assertion of answerRebaseObject (client) because
    a storage node missed a few transactions and reported a conflict with an older
    serial than the one being stored: this must never happen and this commit adds a
    more generic assertion on the storage side.
    
    The above case is when the "first phase" of replication of a partition
    (all history up to the tid before unfinished transactions) ended after
    the unfinished transactions were finished: this was a corruption bug,
    where UP_TO_DATE cells could miss data.
    
    Otherwise, if the "first phase" ended before, then the partition remained stuck
    in OUT_OF_DATE state. Restarting the storage node was enough to recover.
    by Julien Muchembled
     
  • Traceback (most recent call last):
      ...
      File "neo/client/app.py", line 507, in tpc_vote
        self.waitStoreResponses(txn_context)
      File "neo/client/app.py", line 500, in waitStoreResponses
        _waitAnyTransactionMessage(txn_context)
      File "neo/client/app.py", line 150, in _waitAnyTransactionMessage
        self._handleConflicts(txn_context)
      File "neo/client/app.py", line 474, in _handleConflicts
        self._store(txn_context, oid, conflict_serial, data)
      File "neo/client/app.py", line 410, in _store
        self._waitAnyTransactionMessage(txn_context, False)
      File "neo/client/app.py", line 145, in _waitAnyTransactionMessage
        self._waitAnyMessage(queue, block=block)
      File "neo/client/app.py", line 133, in _waitAnyMessage
        _handlePacket(conn, packet, kw)
      File "neo/lib/threaded_app.py", line 133, in _handlePacket
        handler.dispatch(conn, packet, kw)
      File "neo/lib/handler.py", line 72, in dispatch
        method(conn, *args, **kw)
      File "neo/client/handlers/storage.py", line 122, in answerRebaseObject
        assert txn_context.conflict_dict[oid] == (serial, conflict)
    AssertionError
    
    Scenario:
    0. unanswered rebase from S2
    1. conflict resolved between t1 and t2 -> S1 & S2
    2. S1 reports a new conflict
    3. S2 answers the rebase:
       returned serial (t1) is smaller than in conflict_dict (t2)
    4. S2 reports the same conflict as in 2
    by Julien Muchembled
     

  • Traceback (most recent call last):
      ...
      File "neo/storage/handlers/storage.py", line 111, in answerFetchObjects
        self.app.replicator.finish()
      File "neo/storage/replicator.py", line 370, in finish
        self._nextPartition()
      File "neo/storage/replicator.py", line 279, in _nextPartition
        assert app.pt.getCell(offset, app.uuid).isOutOfDate()
    AssertionError
    
    The scenario is:
    1. partition A: start of replication, with unfinished transactions
    2. partition A: all unfinished transactions are finished
    3. partition A: end of replication with ReplicationDone notification
    4. replication of partition B
    5. partition A: AssertionError when starting replication
    
    The bug is that in step 3, partition A is only partially replicated, so the
    storage node must not notify the master.
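
The condition described above can be expressed as a simple guard. This is a toy sketch under stated assumptions, not the replicator's real logic: `replication_done_is_safe` is a hypothetical function, tids are modelled as plain integers, and the check is that no transaction finished beyond the point up to which the partition was replicated.

```python
def replication_done_is_safe(replicated_up_to_tid, finished_tids):
    """Hypothetical guard: True only if every finished transaction
    is covered by what has been replicated so far."""
    return all(tid <= replicated_up_to_tid for tid in finished_tids)
```

In the scenario, transactions finished in step 2 lie beyond the replicated range, so the guard would be false in step 3 and no ReplicationDone notification would be sent.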
    by Julien Muchembled
     
  • by Julien Muchembled
     