1. 15 Jun, 2017 2 commits
  2. 14 Jun, 2017 1 commit
  3. 13 Jun, 2017 1 commit
  4. 12 Jun, 2017 7 commits
  5. 12 May, 2017 5 commits
  6. 11 May, 2017 1 commit
  7. 10 May, 2017 2 commits
  8. 04 May, 2017 1 commit
  9. 02 May, 2017 1 commit
    • master: fix identification of unknown masters · fbcf9c50
      Julien Muchembled authored
      This fixes the following crash:
      
        Traceback (most recent call last):
          ...
          File "neo/master/handlers/identification.py", line 94, in requestIdentification
            uuid = app.getNewUUID(uuid, address, node_type)
          File "neo/master/app.py", line 449, in getNewUUID
            assert uuid != self.uuid
        AssertionError
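      A minimal, hypothetical sketch of the invariant behind this assertion
      (illustrative names and logic, not the actual NEO code): getNewUUID
      assumes it is never asked for the master's own UUID, so the
      identification handler has to reject a peer claiming that UUID instead
      of letting the assert crash the master.

        # Hypothetical sketch of the failing invariant; not NEO's real code.
        class App(object):
            def __init__(self, uuid):
                self.uuid = uuid              # this master's own UUID
                self._next_uuid = 0

            def getNewUUID(self, uuid, address, node_type):
                # The allocator assumes callers never pass our own UUID.
                assert uuid != self.uuid
                if uuid is not None:
                    return uuid               # the peer keeps the UUID it claims
                self._next_uuid += 1
                return self._next_uuid        # otherwise allocate a fresh one

        def requestIdentification(app, uuid, address, node_type):
            # One way to avoid the crash: refuse such a peer here instead of
            # forwarding its UUID to the allocator.
            if uuid == app.uuid:
                raise ValueError('peer claims our own UUID')
            return app.getNewUUID(uuid, address, node_type)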
  10. 28 Apr, 2017 3 commits
    • Better logging of connector errors · 29e8323c
      Julien Muchembled authored
    • client: fix possible data corruption after conflict resolutions with replicas · 46c36465
      Julien Muchembled authored
      This really fixes the bug described in
      commit 40bac312,
      which only reduced the probability of failure and could probably be reverted now.
      
      What happened is that the second conflict on 'a' for t3 was first reported by
      an answer to the first store with:
      - a base serial at which a=0
      - a conflict serial at which a=7
      However, the cached data is not 8 anymore but 12, since a second store already
      occurred after the first conflict (reported by the other storage node).
      
      When this conflict was resolved before receiving the conflict for the second store,
      it gave:
      
        resolve(old=0, saved=7, new=12) -> 19
      
      instead of:
      
        resolve(old=4, saved=7, new=12) -> 15
      
      (if we still had the data of the first store, we could also do
        resolve(old=0, saved=7, new=8)
       but that would be inefficient from a memory point of view)
      
      The bug was difficult to reproduce: testNotifyReplicated had to be run many
      times before the race conditions triggered it. The test was changed to
      enforce some of them, so that the above scenario now happens almost always.
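      The numbers above are consistent with a counter-style three-way merge
      (resolved = saved + (new - old)), as in ZODB conflict resolution for
      counters. A minimal sketch (illustrative, not NEO's client code) of how
      a stale base corrupts the result:

        # Counter-style three-way merge: replay this transaction's delta
        # (new - old) on top of the concurrently committed value.
        def resolve(old, saved, new):
            return saved + (new - old)

        # Correct base for the second store: 'a' was 4 at its base serial.
        assert resolve(old=4, saved=7, new=12) == 15

        # Stale base from the first store (a=0): the delta of the first
        # conflict resolution is applied twice, corrupting the data.
        assert resolve(old=0, saved=7, new=12) == 19

        # Keeping the data of the first store would also give a correct
        # result, at the cost of extra memory:
        assert resolve(old=0, saved=7, new=8) == 15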
  11. 27 Apr, 2017 7 commits
  12. 25 Apr, 2017 4 commits
  13. 24 Apr, 2017 5 commits
    • Reimplement election (of the primary master) · 23b6a66a
      Julien Muchembled authored
      The election is not a separate process anymore.
      It happens during the RECOVERING phase, and timeouts are not used anymore.
      
      Each master node keeps a timestamp of when it started to play the primary role,
      and the node with the smallest timestamp is elected. The election stops when
      the cluster is started: as long as it is operational, the primary master can't
      be deposed.
      
      An election must happen whenever the cluster is not operational anymore, to
      handle the case of a network cut between a primary master and all other nodes:
      then another master node (secondary) takes over, and when the initial primary
      master is back, it loses against the new primary master if the cluster is
      already started.
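      A rough sketch of the rule described above (hypothetical data structures,
      not NEO's actual implementation): among the master nodes, the one that
      has been claiming the primary role the longest wins, and no election
      happens at all while the cluster is operational.

        # Hypothetical sketch of the smallest-timestamp rule; not NEO's code.
        def elect_primary(masters, cluster_operational, current_primary=None):
            # masters: iterable of (node, started_primary_at) pairs, where
            # started_primary_at is when that master began to play the
            # primary role (None if it never did).
            if cluster_operational:
                # As long as the cluster is started, the current primary
                # cannot be deposed: a primary coming back after a network
                # cut loses simply because no election takes place.
                return current_primary
            candidates = [(ts, node) for node, ts in masters if ts is not None]
            if not candidates:
                return None
            # Otherwise the master with the smallest (oldest) timestamp wins.
            return min(candidates, key=lambda c: c[0])[1]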
    • Remove BROKEN node state · 9d7f9795
      Julien Muchembled authored
    • Remove HIDDEN node state · b8210d58
      Julien Muchembled authored
    • On NM update, fix removal of nodes that aren't part of the cluster anymore · f051b7a0
      Julien Muchembled authored
      In order to do that correctly, this commit contains several other changes:
      
      When connecting to a primary master, a full node list always follows the
      identification. For storage nodes, this means that they now know all nodes
      during the RECOVERING phase.
      
      The initial full node list now always contains a node tuple for:
      - the server-side node (i.e. the primary master): on a master, this is
        done by always having a node describing itself in its node manager.
      - the client-side node, to make sure it gets an id timestamp:
        now an admin node also receives a node for itself.
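      A rough sketch of the guarantee described above (hypothetical structures,
      not the real NodeManager API): the node list sent right after
      identification always contains the primary master itself and the node
      that just identified, so the latter gets its id timestamp.

        # Hypothetical sketch; not the real NodeManager API.
        from collections import namedtuple

        Node = namedtuple('Node', 'type address uuid state id_timestamp')

        def initial_node_list(known_nodes, primary, connecting_node):
            # The primary always keeps a node describing itself in its node
            # manager, so the server side is necessarily in the list.
            nodes = {n.uuid: n for n in known_nodes}
            assert primary.uuid in nodes
            # The client side (storage, client or admin node that just
            # identified) is included too, so it receives its id timestamp.
            nodes.setdefault(connecting_node.uuid, connecting_node)
            return list(nodes.values())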