1. 30 Nov, 2015 9 commits
    • Julien Muchembled's avatar
      master: fix possible blockage during recovery after a storage disconnection · 2485f151
      Julien Muchembled authored
      At some point, the master asks a storage node for its partition table. If this node
      is lost before getting an answer, another node (or the same one if it comes
      back) must be asked.
      
      Before this change, the master node had to be restarted.
      2485f151
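The retry behaviour described in this commit message can be pictured with a minimal sketch. Everything below (`RecoverySketch`, its method names, the node identifiers) is hypothetical and only illustrates the idea of forgetting a pending request when its target is lost; it is not NEO's actual recovery code.

```python
class RecoverySketch:
    """Hypothetical, simplified view of the master during recovery."""

    def __init__(self, storage_nodes):
        self.candidates = set(storage_nodes)  # storage nodes that can be asked
        self.pending = None                   # node we are waiting on, if any
        self.partition_table = None

    def ask_next(self):
        # Ask a node only if no answer was received and no request is in flight.
        if (self.partition_table is None and self.pending is None
                and self.candidates):
            self.pending = min(self.candidates)  # deterministic pick for the example
        return self.pending

    def connection_lost(self, node):
        self.candidates.discard(node)
        if node == self.pending:
            # The answer will never arrive: drop the request and ask another node.
            self.pending = None
            self.ask_next()

    def node_connected(self, node):
        # A new (or returning) node is a valid candidate again.
        self.candidates.add(node)
        self.ask_next()

    def answer_partition_table(self, node, partition_table):
        if node == self.pending:
            self.pending = None
            self.partition_table = partition_table
```

Without the reset of `pending` in `connection_lost`, the sketch would wait forever for an answer, which mirrors the blockage that previously required restarting the master.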
    • Julien Muchembled's avatar
      master: last tid/oid after recovery/verification · dec81519
      Julien Muchembled authored
      The important bugfix is to update the last oid when the master verifies a
      transaction with new oids.
      
      By resetting the transaction manager at the beginning of the recovery phase,
      it becomes possible to avoid tid/oid holes:
      - by reallocating previously unused allocated oids
      - when going back "in the past", i.e. reverting to an older version of the
        database (with fewer oids) and/or adjusting the clock
      dec81519
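A small sketch of the bookkeeping this commit message describes, under simplifying assumptions: oids and tids are plain integers here, and `TransactionManagerSketch` with its methods is a hypothetical stand-in rather than NEO's transaction manager.

```python
class TransactionManagerSketch:
    """Hypothetical model of last tid/oid tracking on the master."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Called at the beginning of recovery: forget everything, so the
        # values can also go *down* (older database, adjusted clock).
        self.last_oid = 0
        self.last_tid = 0

    def allocate_oids(self, count):
        first = self.last_oid + 1
        self.last_oid += count
        return list(range(first, self.last_oid + 1))

    def set_last(self, tid, oid):
        # Values reported by storage nodes during recovery.
        self.last_tid = max(self.last_tid, tid)
        self.last_oid = max(self.last_oid, oid)

    def verified(self, tid, oids):
        # A verified transaction may contain new oids: take them into account.
        self.set_last(tid, max(oids, default=0))


tm = TransactionManagerSketch()
tm.allocate_oids(5)          # oids 1..5 handed out, but only 1 and 2 get committed
tm.reset()                   # the cluster restarts and recovery begins
tm.set_last(100, 2)          # storages report what they really hold
tm.verified(101, [3])        # verification finds a transaction with a new oid
reused = tm.allocate_oids(2)
```

In this example `reused` starts right after the highest oid actually stored, so the oids that were allocated but never used before the restart do not leave a hole.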
    • Julien Muchembled's avatar
      Go back/stay in RECOVERING state when the partition table can't be operational · e1f9a7da
      Julien Muchembled authored
      This fixes several cases where the partition table could become corrupt and
      the whole cluster could get stuck in VERIFYING state.
      
      This also reduces the probability of having out-of-date cells when restarting
      several storage nodes simultaneously.
      
      Lastly, if a master node becomes primary again, a cluster must not be started
      automatically if nodes with readable cells are missing, in order to avoid
      a split of the database. This could happen if this master node was previously
      forced to start it.
      e1f9a7da
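The two conditions discussed in this commit message can be sketched as follows. The data layout (a list of partitions, each a list of `(node, cell_state)` pairs), the function names, and the set of states treated as readable are all assumptions made for illustration.

```python
# Assumption: cells in these states can serve reads.
READABLE = {"UP_TO_DATE", "FEEDING"}


def is_operational(partition_table, running_nodes):
    """Every partition has at least one readable cell on a running node."""
    return bool(partition_table) and all(
        any(state in READABLE and node in running_nodes
            for node, state in cells)
        for cells in partition_table)


def next_cluster_state(partition_table, running_nodes):
    # Stay in (or go back to) RECOVERING while the table can't be operational.
    if is_operational(partition_table, running_nodes):
        return "VERIFYING"
    return "RECOVERING"


def can_start_automatically(partition_table, running_nodes):
    """Stricter check: no node holding a readable cell may be missing."""
    return all(node in running_nodes
               for cells in partition_table
               for node, state in cells
               if state in READABLE)
```

The stricter check matters in the replicated case: with a single partition readable on both `s1` and `s2`, running only `s1` gives an operational table, yet starting automatically could let the two copies diverge if `s2` was previously started on its own.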
    • Julien Muchembled's avatar
      Minimize the amount of work during tpc_finish · 7eb7cf1b
      Julien Muchembled authored
      NEO did not ensure that all data and metadata were written to disk before
      tpc_finish, which made it vulnerable to ENOSPC errors, for example.
      To fix this, some work had to be moved to tpc_vote:
      
      - In tpc_vote, all involved storage nodes are now asked to write all metadata
        to ttrans/tobj and _commit_. Because the final tid is not known yet, the tid
        column of ttrans and tobj now contains NULL and the ttid respectively.
      
      - In tpc_finish, AskLockInformation is still required for read locking,
        ttrans.tid is updated with the final value and this change is _committed_.
      
      - The verification phase is greatly simplified, more reliable and faster. For
        all voted transactions, we can know if a tpc_finish was started by getting
        the final tid from the ttid, either from ttrans or from trans. And we know
        that such transactions can't be partial so we don't need to check oids.
      
      So in addition to minimizing the risk of failures during tpc_finish, we also
      fix a bug causing the verification phase to discard transactions with objects
      for which readCurrent was called.
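The split of work between tpc_vote and tpc_finish can be illustrated with a toy storage backed by SQLite. The table names follow the message above, but the column layout is reduced to the bare minimum, and the function names (`vote`, `lock`, `unlock`, `final_tid`) as well as the exact behaviour of the unlock step are assumptions for this sketch, not NEO's schema or API.

```python
import sqlite3


def setup():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE ttrans (tid INTEGER, ttid INTEGER NOT NULL);
        CREATE TABLE tobj (tid INTEGER NOT NULL, oid INTEGER NOT NULL);
        CREATE TABLE trans (tid INTEGER PRIMARY KEY, ttid INTEGER NOT NULL);
        CREATE TABLE obj (tid INTEGER NOT NULL, oid INTEGER NOT NULL);
    """)
    return db


def vote(db, ttid, oids):
    """tpc_vote: write all metadata and commit; the final tid is unknown."""
    with db:  # commits on success
        db.execute("INSERT INTO ttrans VALUES (NULL, ?)", (ttid,))
        db.executemany("INSERT INTO tobj VALUES (?, ?)",
                       [(ttid, oid) for oid in oids])  # tobj.tid holds the ttid


def lock(db, ttid, tid):
    """tpc_finish: only record the final tid, then commit."""
    with db:
        db.execute("UPDATE ttrans SET tid=? WHERE ttid=?", (tid, ttid))


def unlock(db, ttid):
    """Assumed unlock step: move temporary rows to the final tables."""
    with db:
        (tid,) = db.execute("SELECT tid FROM ttrans WHERE ttid=?",
                            (ttid,)).fetchone()
        db.execute("INSERT INTO trans VALUES (?, ?)", (tid, ttid))
        db.execute("INSERT INTO obj SELECT ?, oid FROM tobj WHERE tid=?",
                   (tid, ttid))
        db.execute("DELETE FROM ttrans WHERE ttid=?", (ttid,))
        db.execute("DELETE FROM tobj WHERE tid=?", (ttid,))


def final_tid(db, ttid):
    """Verification: return the final tid if tpc_finish was started, else None."""
    for table in ("ttrans", "trans"):
        row = db.execute(f"SELECT tid FROM {table} WHERE ttid=?",
                         (ttid,)).fetchone()
        if row and row[0] is not None:
            return row[0]
    return None
```

With this layout, verification only has to look up a tid from a ttid: a voted transaction whose `ttrans.tid` is still NULL never reached tpc_finish, and one that has a tid is known to be complete because everything was already written during the vote.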
      
      On the performance side:
      
      - Although tpc_vote now asks all involved storages, instead of only those
        storing the transaction metadata, the client has been improved to do this
        in parallel. The additional commits are also all done in parallel.
      
      - A possible improvement to compensate for the additional commits is to delay the
        commit done by the unlock.
      
      - By minimizing the time to lock transactions, objects are read-locked for a
        much shorter period. This matters all the more because locked transactions
        must be unlocked in the same order.
      
      Transactions with too many modified objects will now time out inside tpc_vote
      instead of tpc_finish. Of course, such transactions may still cause other
      transactions to time out in tpc_finish.
      7eb7cf1b
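The ordering constraint mentioned in the performance notes of this commit (locked transactions must be unlocked in the same order) is why a slow transaction can delay others in tpc_finish. A minimal sketch, with a hypothetical `LockQueue` class that is not part of NEO:

```python
from collections import OrderedDict


class LockQueue:
    """Hypothetical queue of locked transactions, kept in lock order."""

    def __init__(self):
        self._queue = OrderedDict()  # ttid -> True once ready to be unlocked

    def lock(self, ttid):
        self._queue[ttid] = False

    def ready(self, ttid):
        """Mark ttid as ready and return the ttids that can be unlocked now."""
        self._queue[ttid] = True
        unlocked = []
        while self._queue:
            head = next(iter(self._queue))
            if not self._queue[head]:
                break  # the oldest locked transaction blocks all later ones
            del self._queue[head]
            unlocked.append(head)
        return unlocked
```

If transaction 1 is locked before transaction 2, `ready(2)` releases nothing until `ready(1)` is called, at which point both are released in order. The shorter each transaction stays locked, the less it holds back the ones queued behind it.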
    • Julien Muchembled's avatar
      fixup! storage: fix pruning of data when deleting partial transactions during verification · cff279af
      Julien Muchembled authored
      This fixes a regression in commit 83fe64bf when ttrans has several rows
      referring to the same data_id.
      cff279af
    • Julien Muchembled's avatar
      ssl: consider connections completed after the handshake · aaefaf8b
      Julien Muchembled authored
      - Server connections can now be in 'connecting' state.
      - connectionAccepted event (which has never been used so far) is merged into
        connectionCompleted.
      aaefaf8b
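The change in connection life cycle can be sketched as a tiny state machine. Only the `connectionCompleted` event name and the 'connecting' state come from the commit message; `ConnectionSketch`, `RecordingHandler`, and the other method names are hypothetical.

```python
class ConnectionSketch:
    """Hypothetical connection that completes only after the SSL handshake."""

    def __init__(self, handler, use_ssl):
        self.handler = handler
        self.use_ssl = use_ssl
        self.connecting = True  # now also true for server-side connections

    def transport_established(self):
        # TCP connect (client side) or accept (server side) succeeded.
        if not self.use_ssl:
            self._completed()

    def handshake_done(self):
        self._completed()

    def _completed(self):
        if self.connecting:
            self.connecting = False
            # Single event for both sides; no separate connectionAccepted.
            self.handler.connectionCompleted(self)


class RecordingHandler:
    def __init__(self):
        self.events = []

    def connectionCompleted(self, conn):
        self.events.append("completed")
```

With SSL enabled, `transport_established()` leaves the connection in the connecting state, and the handler is notified only once `handshake_done()` is called.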
  2. 25 Nov, 2015 13 commits
  3. 03 Nov, 2015 2 commits
  4. 29 Oct, 2015 2 commits
  5. 26 Oct, 2015 2 commits
  6. 21 Oct, 2015 3 commits
  7. 20 Oct, 2015 1 commit
  8. 19 Oct, 2015 4 commits
  9. 16 Oct, 2015 1 commit
  10. 13 Oct, 2015 1 commit
  11. 12 Oct, 2015 1 commit
  12. 05 Oct, 2015 1 commit