  1. 04 Jul, 2017 1 commit
  2. 30 Jun, 2017 1 commit
  3. 29 Jun, 2017 1 commit
  4. 16 Jun, 2017 1 commit
  5. 15 Jun, 2017 2 commits
  6. 14 Jun, 2017 1 commit
  7. 13 Jun, 2017 1 commit
  8. 12 Jun, 2017 7 commits
  9. 12 May, 2017 5 commits
  10. 11 May, 2017 1 commit
  11. 10 May, 2017 2 commits
  12. 04 May, 2017 1 commit
  13. 02 May, 2017 1 commit
    • master: fix identification of unknown masters · fbcf9c50
      Julien Muchembled authored
      This fixes the following crash:
      
        Traceback (most recent call last):
          ...
          File "neo/master/handlers/identification.py", line 94, in requestIdentification
            uuid = app.getNewUUID(uuid, address, node_type)
          File "neo/master/app.py", line 449, in getNewUUID
            assert uuid != self.uuid
        AssertionError
  14. 28 Apr, 2017 3 commits
    • Better logging of connector errors · 29e8323c
      Julien Muchembled authored
    • client: fix possible data corruption after conflict resolutions with replicas · 46c36465
      Julien Muchembled authored
      This really fixes the bug described in commit 40bac312. That commit only
      reduced the probability of failure and could probably be reverted.
      
      What happened is that the second conflict on 'a' for t3 was first reported by
      an answer to the first store, with:
      - a base serial at which a=0
      - a conflict serial at which a=7
      However, the cached data is not 8 anymore but 12, since a second store already
      occurred after the first conflict (reported by the other storage node).
      
      When this conflict was resolved before receiving the conflict for the second
      store, it gave:
      
        resolve(old=0, saved=7, new=12) -> 19
      
      instead of:
      
        resolve(old=4, saved=7, new=12) -> 15
      
      (if we still had the data of the first store, we could also do
        resolve(old=0, saved=7, new=8)
       but that would be inefficient from a memory point of view)
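
      The arithmetic above can be checked with a small sketch. The commit message
      does not say how the resolver is implemented; the function below is an
      assumption, chosen only because an additive counter merge reproduces the
      three figures quoted above. The parameter names follow the message, the body
      is hypothetical.

```python
def resolve(old, saved, new):
    """Hypothetical additive merge, not taken from the NEO code base.

    old:   value at the base serial the transaction started from
    saved: value committed concurrently (the conflict serial)
    new:   value the transaction wants to store
    """
    # Apply the transaction's own delta (new - old) on top of the saved value.
    return saved + (new - old)


# Stale base (a=0) combined with data that already includes the first
# resolution (12): the first delta is counted twice.
wrong = resolve(old=0, saved=7, new=12)

# Base matching the data that is actually cached.
right = resolve(old=4, saved=7, new=12)

# Alternative mentioned above: keep the data of the first store.
alternative = resolve(old=0, saved=7, new=8)
```

      Under this assumption, the result is only correct when the base and the new
      data belong to the same store, which is what the fix has to guarantee.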
      
      The bug was difficult to reproduce: testNotifyReplicated had to be run many
      times before the race conditions triggered it. The test was changed to
      enforce some of these race conditions, so the above scenario now happens
      almost always.
  15. 27 Apr, 2017 7 commits
  16. 25 Apr, 2017 4 commits
  17. 24 Apr, 2017 1 commit
    • Reimplement election (of the primary master) · 23b6a66a
      Julien Muchembled authored
      The election is not a separate process anymore.
      It happens during the RECOVERING phase, and there's no use of timeouts anymore.
      
      Each master node keeps a timestamp of when it started to play the primary role,
      and the node with the smallest timestamp is elected. The election stops when
      the cluster is started: as long as it is operational, the primary master can't
      be deposed.
      
      An election must happen whenever the cluster is not operational anymore, to
      handle the case of a network cut between a primary master and all other nodes:
      then another master node (secondary) takes over and when the initial primary
      master is back, it loses against the new primary master if the cluster is
      already started.
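
      The rule described above can be sketched as follows. This is an
      illustration only, not the code of the commit: MasterNode, primary_since,
      elect_primary and cluster_started are hypothetical names, and the sketch
      ignores networking and the RECOVERING phase entirely.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MasterNode:
    name: str
    # Hypothetical field: when this node started to play the primary role.
    primary_since: float


def elect_primary(candidates: Sequence[MasterNode],
                  current_primary: Optional[MasterNode] = None,
                  cluster_started: bool = False) -> MasterNode:
    """Pick the primary among master nodes that can see each other."""
    # Once the cluster is started, the primary can't be deposed.
    if cluster_started and current_primary is not None:
        return current_primary
    # Otherwise the node with the smallest timestamp wins.
    return min(candidates, key=lambda node: node.primary_since)


# Network cut scenario: A was the initial primary, B took over during the cut.
a = MasterNode("A", primary_since=10.0)
b = MasterNode("B", primary_since=20.0)

# Cluster not started yet: the older timestamp wins.
before_start = elect_primary([a, b])

# B already started the cluster: A loses when it comes back.
after_start = elect_primary([a, b], current_primary=b, cluster_started=True)
```

      Comparing timestamps gives every node the same answer without waiting,
      which is consistent with the removal of timeouts mentioned above.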