1. 25 Nov, 2016 1 commit
  2. 29 Aug, 2016 1 commit
  3. 22 Mar, 2016 1 commit
  4. 04 Mar, 2016 1 commit
    • storage: defer commit when unlocking a transaction (-> better performance) · eaa07e25
      Julien Muchembled authored
      Before this change, a storage node did 3 commits per transaction:
      - once all data are stored
      - when locking the transaction
      - when unlocking the transaction
      The last one is not important for ACID. In case of a crash, the transaction
      is unlocked again (verification phase). By deferring it by 1 second, we
      only have 2 commits per transaction during high activity because all pending
      changes are merged with the commits caused by other transactions.
      This change compensates the extra commit(s) per transaction that were
      introduced in commit 7eb7cf1b
      ("Minimize the amount of work during tpc_finish").
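      The deferred-commit scheme described above can be sketched as follows (illustrative Python; the class and method names are hypothetical, not NEO's actual API):

```python
import time

class DeferredCommitter:
    """Batch the commits caused by transaction unlocks: instead of
    committing immediately, arm a timer so that pending changes from
    several transactions share a single disk commit."""

    def __init__(self, delay=1.0, clock=time.monotonic):
        self.delay = delay      # how long an unlock commit may be deferred
        self.clock = clock
        self.deadline = None    # when the deferred commit must happen
        self.commits = 0        # number of real commits performed

    def commit_now(self):
        # Commits for stores and locks stay immediate (needed for ACID).
        self.deadline = None
        self.commits += 1

    def commit_deferred(self):
        # Unlock commit: only arm the timer if no commit is already pending.
        if self.deadline is None:
            self.deadline = self.clock() + self.delay

    def poll(self):
        # Called from the event loop: flush the deferred commit when due.
        if self.deadline is not None and self.clock() >= self.deadline:
            self.commit_now()
```

      Under high activity, many unlocks fall within the same one-second window, so the third commit per transaction effectively disappears.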
  5. 01 Dec, 2015 1 commit
    • Safer DB truncation, new 'truncate' ctl command · d3c8b76d
      Julien Muchembled authored
      With the previous commit, the request to truncate the DB was not stored
      persistently, which means that this operation was still vulnerable to the case
      where the master is restarted after some nodes, but not all, have already
      truncated. The master didn't have the information to fix this and the result
      was a DB partially truncated.
      -> On a Truncate packet, a storage node only stores the tid somewhere, to
         send it back to the master, which stays in the RECOVERING state as long
         as any node has a different value from that of the node with the latest
         partition table.
      We also want to make sure that there is no unfinished data, because a user may
      truncate at a tid higher than a locked one.
      -> Truncation is now effective at the end of the VERIFYING phase, just
         before returning the last ids to the master.
      Finally, all nodes should be truncated, to avoid an offline node coming
      back with a different history. Currently, this would not be an issue since
      replication always restarts from the beginning, but later we'd like nodes
      to remember where they stopped replicating.
      -> If a truncation is requested, the master waits for all nodes to be
         pending, even if the cluster was previously started (the user can still
         force the cluster to start with neoctl). And any node lost during
         verification also causes the master to go back to recovery.
      Obviously, the protocol has been changed to split the LastIDs packet and
      introduce a new Recovery packet, since it no longer makes sense to ask for
      last ids during recovery.
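      The recovery condition described above can be sketched like this (illustrative Python; the dictionaries stand in for the per-node state the master collects, and all names are hypothetical):

```python
def can_leave_recovery(nodes):
    """The master stays in RECOVERING while any node reports a pending
    truncate tid different from that of the node with the latest
    partition table (highest ptid)."""
    reference = max(nodes, key=lambda n: n['ptid'])
    return all(n['truncate_tid'] == reference['truncate_tid']
               for n in nodes)
```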
  6. 30 Nov, 2015 1 commit
    • Minimize the amount of work during tpc_finish · 7eb7cf1b
      Julien Muchembled authored
      NEO did not ensure that all data and metadata were written to disk before
      tpc_finish, and it was, for example, vulnerable to ENOSPC errors.
      In other words, some work had to be moved to tpc_vote:
      - In tpc_vote, all involved storage nodes are now asked to write all metadata
        to ttrans/tobj and _commit_. Because the final tid is not known yet, the tid
        column of ttrans and tobj now contains NULL and the ttid respectively.
      - In tpc_finish, AskLockInformation is still required for read locking,
        ttrans.tid is updated with the final value and this change is _committed_.
      - The verification phase is greatly simplified, more reliable and faster. For
        all voted transactions, we can know if a tpc_finish was started by getting
        the final tid from the ttid, either from ttrans or from trans. And we know
        that such transactions can't be partial so we don't need to check oids.
      So in addition to minimizing the risk of failures during tpc_finish, we also
      fix a bug causing the verification phase to discard transactions with objects
      for which readCurrent was called.
      On performance side:
      - Although tpc_vote now asks all involved storages, instead of only those
        storing the transaction metadata, the client has been improved to do this
        in parallel. The additional commits are also all done in parallel.
      - A possible improvement to compensate the additional commits is to delay the
        commit done by the unlock.
      - By minimizing the time to lock transactions, objects are read-locked for
        a much shorter period. This matters all the more because locked
        transactions must be unlocked in the same order.
      Transactions with too many modified objects will now time out inside
      tpc_vote instead of tpc_finish. Of course, such transactions may still
      cause other transactions to time out in tpc_finish.
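      A toy model of the vote/finish split above (illustrative Python, not NEO's actual schema; ttrans is reduced to a dict mapping ttid to final tid, with NULL represented by None):

```python
class StorageSim:
    """Minimal model of a storage node's ttrans table across
    tpc_vote / tpc_finish / verification."""

    def __init__(self):
        self.ttrans = {}  # ttid -> final tid (None until tpc_finish)

    def tpc_vote(self, ttid):
        # All metadata is written and committed before the vote succeeds;
        # the final tid is not known yet, so store None (NULL).
        self.ttrans[ttid] = None

    def tpc_finish(self, ttid, tid):
        # Only a small update + commit remains in tpc_finish.
        self.ttrans[ttid] = tid

    def verify(self):
        # Verification: a voted transaction whose tid is still None never
        # reached tpc_finish, so it can be discarded without checking oids.
        return {t: tid for t, tid in self.ttrans.items() if tid is not None}
```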
  7. 25 Nov, 2015 1 commit
  8. 29 Oct, 2015 1 commit
  9. 05 Oct, 2015 1 commit
  10. 24 Sep, 2015 1 commit
  11. 23 Sep, 2015 1 commit
  12. 15 Sep, 2015 1 commit
  13. 28 Aug, 2015 1 commit
    • Fix occasional deadlocks in threaded tests · 0b93b1fb
      Julien Muchembled authored
      Deadlocks mainly happened while stopping a cluster, hence the complete
      review of NEOCluster.stop().
      A major change is to make the client node handle its lock like other nodes
      (i.e. in the polling thread itself) to better know when to call
      Serialized.background() (there was a race condition with the test of
      'self.poll_thread.isAlive()' in ClientApplication.close).
  14. 14 Aug, 2015 1 commit
    • Do not reconnect too quickly to a node after an error · d898a83d
      Julien Muchembled authored
      For example, a backup storage node that was rejected because the upstream
      cluster was not ready could reconnect in a loop without delay, using 100%
      CPU and flooding the logs.
      A new 'setReconnectionNoDelay' method on Connection can be used for cases where
      it's legitimate to quickly reconnect.
      With this new delayed reconnection, it's possible to remove the remaining
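      Such a delayed-reconnection policy might look like this (illustrative sketch; only the method name setReconnectionNoDelay comes from the commit, everything else is hypothetical):

```python
class ReconnectPolicy:
    """After an error, wait before reconnecting, doubling the delay up
    to a cap, unless an immediate reconnection was explicitly allowed."""

    def __init__(self, initial=1.0, maximum=30.0):
        self.initial = initial
        self.maximum = maximum
        self.delay = 0.0        # 0 means no previous failed attempt
        self.no_delay = False

    def setReconnectionNoDelay(self):
        # For the cases where it's legitimate to quickly reconnect.
        self.no_delay = True

    def next_delay(self):
        # One immediate retry if explicitly allowed, then back off.
        if self.no_delay:
            self.no_delay = False
            return 0.0
        self.delay = min(self.delay * 2 if self.delay else self.initial,
                         self.maximum)
        return self.delay
```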
  15. 12 Aug, 2015 1 commit
  16. 13 Jul, 2015 1 commit
  17. 10 Jul, 2015 1 commit
  18. 25 Jul, 2014 2 commits
  19. 20 Jun, 2014 1 commit
    • client: clean up import/export code · d562bf8f
      Julien Muchembled authored
      - Remove leftover warning about a bug that was fixed in
        commit e76af297
      - In neomigrate script, open NEO storage read-only.
      - IStorageIteration is already implemented.
      - Review comments.
      - In neomigrate script, warn that IStorageRestoreable is not implemented.
      - Do not call 'close' method on source iterator. BaseStorage does not do
        it and this is not part of the ZODB API. In the case of FileStorage,
        resources are freed automatically during garbage collection.
  20. 03 Jun, 2014 1 commit
  21. 29 May, 2014 1 commit
  22. 07 Jan, 2014 1 commit
    • Add test showing that clients may be stuck on an old snapshot in case of failure during tpc_finish · fd4cfaa9
      Julien Muchembled authored
      If anything goes wrong after a transaction is locked and before the end of
      onTransactionCommitted, the recovery phase should be run again, so that
      the master gets the correct last tid.
      The following patch by Vincent is an attempt to fix this:
      --- a/neo/master/app.py
      +++ b/neo/master/app.py
      @@ -329,8 +329,8 @@ def playPrimaryRole(self):
               # recover the cluster status at startup
      -        self.runManager(RecoveryManager)
               while True:
      +            self.runManager(RecoveryManager)
                   if self.backup_tid:
      @@ -338,10 +338,6 @@ def playPrimaryRole(self):
                       raise RuntimeError("No upstream cluster to backup"
                                          " defined in configuration")
      -            # Reset connection with storages (and go through a
      -            # recovery phase) when leaving backup mode in order
      -            # to get correct last oid/tid.
      -            self.runManager(RecoveryManager)
               except OperationFailure:
  23. 23 Aug, 2012 1 commit
  24. 20 Aug, 2012 2 commits
    • Comment about backup limitations · dd556379
      Julien Muchembled authored
    • More bugfixes to backup mode · 08742377
      Julien Muchembled authored
      - catch OperationFailure
      - reset transaction manager when leaving backup mode
      - send appropriate target tid to a storage that updates an outdated cell
      - clean up partition table when leaving BACKINGUP state unexpectedly
      - make sure all readable cells of a partition have the same 'backup_tid'
        if they have the same data, so that we know when internal replication is
        finished when leaving backup mode
      - fix the case of a storage that has not finished internal replication
        when leaving backup mode
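      The backup_tid consistency check described above can be sketched as (illustrative Python; cells are reduced to dictionaries, names hypothetical):

```python
def internal_replication_finished(partitions):
    """Internal replication is finished when, in every partition, all
    readable cells report the same backup_tid."""
    return all(
        len({cell['backup_tid'] for cell in cells if cell['readable']}) <= 1
        for cells in partitions)
```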
  25. 16 Aug, 2012 1 commit
  26. 15 Aug, 2012 1 commit
  27. 10 Aug, 2012 1 commit
    • Start renaming UUID into NID, because node IDs are no longer 128 bits long · b81ae60a
      Julien Muchembled authored
      SQL tables can be upgraded using:
        UPDATE config SET name = 'nid' WHERE name = 'uuid';
      then for MySQL:
        ALTER TABLE pt CHANGE uuid nid INT NOT NULL;
      or SQLite, which cannot rename a column in place (pt must first be
      recreated with the 'uuid' column renamed to 'nid'):
        ALTER TABLE pt RENAME TO old_pt;
        -- CREATE TABLE pt ... with column 'nid' instead of 'uuid'
        INSERT INTO pt SELECT * FROM old_pt;
        DROP TABLE old_pt;
  28. 23 Jul, 2012 2 commits
  29. 13 Jul, 2012 1 commit
  30. 06 Jul, 2012 1 commit
  31. 05 Jul, 2012 1 commit
  32. 23 Apr, 2012 1 commit
    • Document an RC bug on tpc_finish. · 6c500078
      Vincent Pelletier authored
      Also, change the way TTIDs are generated in preparation for that bug's
      fix: we will need TTIDs to be monotonic across master restarts, and the
      TID generator provides this feature.
  33. 12 Mar, 2012 2 commits
  34. 24 Feb, 2012 1 commit
    • Implements backup using specialised storage nodes and relying on replication · 8e3c7b01
      Julien Muchembled authored
      Replication is also fully reimplemented:
      - It is no longer done on whole partitions.
      - It runs at the lowest priority so as not to degrade performance for
        client nodes.
      The schema of the MySQL table is changed to optimize storage layout: rows
      are now grouped by age, for good partial-replication performance. This
      certainly also speeds up simple loads/stores.
  35. 29 Sep, 2011 1 commit
  36. 09 Sep, 2011 1 commit