1. 30 Nov, 2015 4 commits
    • Perform DB truncation during recovery, send PT to storages before verification · 3e3eab5b
      Julien Muchembled authored
      Currently, the database can only be truncated when leaving backup mode, but
      the issue will be the same when neoctl gets a new command to truncate at an
      arbitrary tid: we want to be sure that all nodes are truncated before
      anything else.
      
      Therefore, Truncate orders are no longer sent before stopping operation,
      because nodes could fail or exit before actually processing them.
      Truncation must also happen before asking nodes for their last ids.
      
      With this commit, if a truncation is requested:
      - it is always the first thing done when a storage node connects to the
        primary master during the RECOVERING phase (see the sketch below),
      - and the cluster does not start automatically if nodes are missing,
        unless an admin forces it.
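      
      A minimal sketch of the intended ordering on the master side; the class,
      method and packet names below are illustrative assumptions, not NEO's
      actual API:
      
        class RecoveryHandler:
            """Master-side handling of a storage node during RECOVERING."""
      
            def __init__(self, truncate_tid=None):
                # tid to truncate at, or None if no truncation is pending
                self.truncate_tid = truncate_tid
      
            def storageIdentified(self, conn):
                # The master initiates all communication once the storage
                # node has identified itself.
                if self.truncate_tid is not None:
                    # A pending truncation is always ordered first, so all
                    # nodes are truncated before anything else happens.
                    conn.send(('Truncate', self.truncate_tid))
                # Only then is the node asked for its last ids.
                conn.ask('AskLastIDs')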
      
      Other changes:
      - Connections to storage nodes no longer need to be aborted when leaving
        backup mode.
      - The master always initiates communication once a storage node has
        identified itself, which simplifies the code and reduces the number of
        exchanged packets.
    • Go back/stay in RECOVERING state when the partition table can't be operational · e1f9a7da
      Julien Muchembled authored
      This fixes several cases where the partition table could become corrupt,
      leaving the whole cluster stuck in the VERIFYING state.
      
      This also reduces the probability of having out-of-date cells when
      restarting several storage nodes simultaneously.
      
      Finally, if a master node becomes primary again, the cluster must not be
      started automatically while nodes with readable cells are missing, in
      order to avoid a split of the database. This could happen if this master
      node was previously forced to start the cluster.
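      
      A hedged sketch of the intended behaviour; the method and attribute names
      are assumptions, not NEO's actual API:
      
        def checkPartitionState(master):
            pt, nm = master.partition_table, master.node_manager
            if not pt.operational(nm.getRunningStorageSet()):
                # The partition table cannot be operational: go back to
                # (or stay in) RECOVERING instead of getting stuck in
                # VERIFYING with a corrupt partition table.
                master.changeClusterState('RECOVERING')
            elif master.start_forced or not pt.hasMissingReadableCells(nm):
                # Never start automatically while nodes with readable
                # cells are missing, to avoid a split of the database.
                master.changeClusterState('VERIFYING')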
    • Minimize the amount of work during tpc_finish · 7eb7cf1b
      Julien Muchembled authored
      NEO did not ensure that all data and metadata were written to disk before
      tpc_finish, which made it vulnerable, for example, to ENOSPC errors.
      In other words, some work had to be moved to tpc_vote (see the sketch
      after this list):
      
      - In tpc_vote, all involved storage nodes are now asked to write all metadata
        to ttrans/tobj and _commit_. Because the final tid is not known yet, the tid
        column of ttrans and tobj now contains NULL and the ttid respectively.
      
      - In tpc_finish, AskLockInformation is still required for read locking,
        ttrans.tid is updated with the final value and this change is _committed_.
      
      - The verification phase is greatly simplified, more reliable and faster.
        For all voted transactions, we can tell whether a tpc_finish was started
        by looking up the final tid from the ttid, either in ttrans or in trans.
        And we know that such transactions can't be partial, so we don't need to
        check oids.
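      
      The storage-side split might look roughly like this; the schema follows
      the commit message (ttrans.tid holding NULL, then the final tid) but the
      SQL and function names are simplified assumptions:
      
        def vote(db, ttid, oid_list):
            # tpc_vote: write all metadata and commit. The final tid is
            # unknown, so ttrans.tid is NULL and tobj.tid holds the ttid.
            db.execute("INSERT INTO ttrans (tid, ttid) VALUES (NULL, ?)",
                       (ttid,))
            db.executemany("INSERT INTO tobj (tid, oid) VALUES (?, ?)",
                           [(ttid, oid) for oid in oid_list])
            db.commit()  # durable even if tpc_finish never happens
      
        def finish(db, ttid, tid):
            # tpc_finish: the only write left is filling in the final tid,
            # which is then committed.
            db.execute("UPDATE ttrans SET tid=? WHERE ttid=?", (tid, ttid))
            db.commit()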
      
      So in addition to minimizing the risk of failures during tpc_finish, we also
      fix a bug causing the verification phase to discard transactions with objects
      for which readCurrent was called.
      
      On the performance side:
      
      - Although tpc_vote now asks all involved storages, instead of only those
        storing the transaction metadata, the client has been improved to do this
        in parallel. The additional commits are also all done in parallel.
      
      - A possible improvement to compensate for the additional commits is to
        delay the commit done by the unlock.
      
      - Because the time during which transactions stay locked is minimized,
        objects are read-locked for a much shorter period. This matters all the
        more because locked transactions must be unlocked in the same order.
      
      Transactions with too many modified objects will now time out inside
      tpc_vote instead of tpc_finish. Of course, such transactions may still
      cause other transactions to time out in tpc_finish.
    • fixup! storage: fix pruning of data when deleting partial transactions during verification · cff279af
      Julien Muchembled authored
      This fixes a regression introduced in commit 83fe64bf
      when ttrans has several rows referring to the same data_id.
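      
      Purely as an illustration of the likely shape of such a fix (the schema
      and the refcount column are assumptions, not NEO's actual code): the
      data_ids collected from ttrans must be deduplicated before pruning, so
      that a data_id shared by several rows is only dereferenced once:
      
        def pruneTransactionData(db, ttid):
            # Collect data_ids as a set so a data_id appearing in several
            # ttrans rows is only dereferenced once.
            data_ids = {row[0] for row in db.execute(
                "SELECT data_id FROM ttrans WHERE ttid=?", (ttid,))}
            db.executemany(
                "UPDATE data SET refcount=refcount-1 WHERE id=?",
                [(i,) for i in data_ids])
            db.execute("DELETE FROM data WHERE refcount=0")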
  2. 25 Nov, 2015 4 commits
  3. 03 Nov, 2015 1 commit
  4. 26 Oct, 2015 1 commit
  5. 20 Oct, 2015 1 commit
  6. 19 Oct, 2015 3 commits
  7. 16 Oct, 2015 1 commit
  8. 13 Oct, 2015 1 commit
  9. 12 Oct, 2015 1 commit
  10. 05 Oct, 2015 1 commit
  11. 02 Oct, 2015 1 commit
  12. 01 Oct, 2015 1 commit
    • Review API between connections and connectors · 57481c35
      Julien Muchembled authored
      - Review error handling. Only 2 exceptions remain in connector.py:
      
        - Drop useless exception handling for EAGAIN since it should not happen
          if the kernel says the socket is ready.
        - Do not distinguish other socket errors. Just close and log in a generic way.
        - No need to raise a specific exception for EOF.
        - Make 'connect' return a boolean instead of raising an exception.
        - Raise appropriate exception when answer/ask/notify is called on a closed
          non-MT connection.
      
      - Add support for more complex connectors, which may need to write during
        a read operation, or to read when there's pending data to send. This
        will be required for SSL support (more precisely, so that the handshake
        can be done in a transparent way); see the sketch after this list:
      
        - Move write buffer to connector.
        - Make 'receive' fill the read buffer, instead of returning the read data.
        - Make 'receive' & 'send' return a boolean to switch polling for writing.
        - Tolerate that sockets return 0 as number of bytes sent.
      
      - In testConnection, simply delete all failing tests, as announced
        in commit 71e30fb9.
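      
      A hedged sketch of the resulting connector contract; only the return
      conventions and the buffer placement follow the commit message, the names
      are assumptions:
      
        import socket
      
        class Connector:
            def __init__(self, sock):
                self.socket = sock
                self.read_buf = bytearray()   # 'receive' fills this buffer
                self.write_buf = bytearray()  # write buffer moved here
      
            def connect(self, addr):
                # Return a boolean instead of raising an exception.
                try:
                    self.socket.connect(addr)
                except socket.error:
                    return False
                return True
      
            def receive(self):
                # Fill the read buffer instead of returning the read data.
                # The boolean tells the poller whether to also watch for
                # writability (e.g. an SSL handshake writing during a read).
                self.read_buf += self.socket.recv(4096)
                return False
      
            def send(self):
                # Tolerate 0 bytes sent; keep polling for writes while the
                # buffer is non-empty.
                n = self.socket.send(self.write_buf)
                del self.write_buf[:n]
                return bool(self.write_buf)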
  13. 24 Sep, 2015 2 commits
  14. 28 Aug, 2015 1 commit
    • storage: fix history() not waiting for oid to be unlocked · e27358d1
      Julien Muchembled authored
      This fixes a random failure in testClientReconnection:
      
      Traceback (most recent call last):
        File "neo/tests/threaded/test.py", line 754, in testClientReconnection
          self.assertTrue(cluster.client.history(x1._p_oid))
      failureException: None is not true
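      
      A minimal sketch of the idea, with assumed names (not NEO's actual
      handler code): the storage delays its answer while the oid is still
      locked, so history() is never answered from a half-committed state:
      
        def askObjectHistory(storage, conn, oid):
            if storage.tm.isLocked(oid):
                # Requeue the request; it is retried once the oid is
                # unlocked by the transaction that holds it.
                storage.tm.queueEvent(askObjectHistory, storage, conn, oid)
                return
            conn.answer(storage.dm.getObjectHistory(oid))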
  15. 14 Aug, 2015 1 commit
    • Do not reconnect too quickly to a node after an error · d898a83d
      Julien Muchembled authored
      For example, a backup storage node that was rejected because the upstream
      cluster was not ready could reconnect in a loop without any delay, using
      100% CPU and flooding the logs.
      
      A new 'setReconnectionNoDelay' method on Connection can be used for cases where
      it's legitimate to quickly reconnect.
      
      With this new delayed reconnection, it's possible to remove the remaining
      time.sleep().
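      
      A hedged sketch of the mechanism; only 'setReconnectionNoDelay' comes
      from the commit message, the rest is an assumed implementation:
      
        import random, time
      
        class Connection:
            def __init__(self):
                self._next_attempt = 0
      
            def setReconnectionNoDelay(self):
                # For the cases where it's legitimate to reconnect quickly.
                self._next_attempt = 0
      
            def onError(self):
                # After an error, postpone the next attempt instead of
                # retrying in a tight loop (100% CPU, flooded logs).
                self._next_attempt = time.time() + random.uniform(1, 2)
      
            def mayReconnect(self):
                return time.time() >= self._next_attempt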
  16. 12 Aug, 2015 2 commits
  17. 24 Jun, 2015 4 commits
  18. 15 Jun, 2015 1 commit
  19. 09 Jun, 2015 2 commits
  20. 21 May, 2015 1 commit
  21. 05 May, 2015 1 commit
  22. 26 Apr, 2015 2 commits
  23. 05 Dec, 2014 2 commits
  24. 07 Nov, 2014 1 commit