  1. 27 Apr, 2017 4 commits
    • Improvements to --dynamic-master-list · 8e7d4aa7
      Julien Muchembled authored
      - atomic write to disk to avoid corruption
      - update when the address changes (not only when a node is removed/added)
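
      The two points above can be sketched as follows. This is only an
      illustration, not the code of the commit: the names atomic_write and
      MasterListFile, and the one "host:port" per line file format, are
      assumptions. The technique shown is the usual one: write to a temporary
      file in the same directory, flush it to disk, then rename it over the
      target, and rewrite whenever the serialized list differs from the last
      one written (so an address change counts as a change).

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` so that readers see the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem as the target,
    # otherwise the final rename would not be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the data reached the disk
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


class MasterListFile:
    """Hypothetical helper: persist a list of (host, port) master addresses."""

    def __init__(self, path):
        self.path = path
        self._last = None  # last content written, to skip useless writes

    def update(self, addresses):
        """Write the list if it changed; return True if a write happened."""
        content = "".join("%s:%d\n" % a for a in sorted(addresses))
        if content == self._last:
            return False
        atomic_write(self.path, content)
        self._last = content
        return True
```

      Comparing the serialized content, rather than only reacting to nodes
      being added or removed, is what makes a changed address trigger a write
      in this sketch.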
    • Make NodeManager.remove stricter · 017f248d
      Julien Muchembled authored
    • Check protocol version, on both connection sides, before parsing any packet · a60e36e8
      Julien Muchembled authored
      This fixes 2 issues:
      - Because neoctl connects to admin nodes without requesting identification,
        the protocol version was not checked, which could even be dangerous
        (think of a user asking for information, but the packet sent by neoctl
        could be decoded as a packet to alter data, like Truncate).
      - In case of mismatched protocol version, the error was not logged on the
        node that initiated the connection.
      Compatibility is handled as follows:
      - For an old node receiving data from a new node, the 2 high-order bytes of
        the packet id, which is always 0 for the first packet, are decoded as the
        packet code. No packet with code 0 has ever existed, which results in
        PacketMalformedError.
      - For a new node receiving data from an old node, the id of the first packet,
        which is always 0, is decoded as the version, which results in a version
        mismatch error.
      This new protocol also guarantees that there's no conflict with SSL.
      For simplification, the packet length does not count the header anymore.
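
      A minimal sketch of the idea, under stated assumptions: both sides send
      a 4-byte version before anything else, and the receiver checks it before
      parsing any packet. The version value, the header layout (4-byte id,
      2-byte code, 4-byte length) and every name below are hypothetical and
      chosen for illustration; they are not taken from the NEO sources.

```python
import struct

PROTOCOL_VERSION = 1             # hypothetical value, must not be 0
HANDSHAKE = struct.Struct("!L")  # assumed: 4-byte version sent first
HEADER = struct.Struct("!LHL")   # assumed: packet id, code, payload length


class ProtocolVersionMismatch(Exception):
    pass


def handshake_bytes(version=PROTOCOL_VERSION):
    """Bytes each side sends right after the connection is established."""
    return HANDSHAKE.pack(version)


def check_handshake(data, expected=PROTOCOL_VERSION):
    """Check the peer version; return the bytes that follow the handshake."""
    if len(data) < HANDSHAKE.size:
        raise ValueError("incomplete handshake")
    (version,) = HANDSHAKE.unpack_from(data)
    if version != expected:
        raise ProtocolVersionMismatch(
            "expected version %r, got %r" % (expected, version))
    return data[HANDSHAKE.size:]


def encode_packet(msg_id, code, payload):
    # The length field counts the payload only, not the header.
    return HEADER.pack(msg_id, code, len(payload)) + payload


def decode_packet(data):
    msg_id, code, length = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated packet")
    return msg_id, code, payload
```

      In this sketch, a peer that skips the handshake and starts with a packet
      whose id is 0 has those first 4 bytes read as version 0, so
      check_handshake raises ProtocolVersionMismatch before any packet is
      decoded.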
  2. 25 Apr, 2017 4 commits
  3. 24 Apr, 2017 6 commits
  4. 19 Apr, 2017 2 commits
  5. 18 Apr, 2017 4 commits
  6. 13 Apr, 2017 1 commit
  7. 04 Apr, 2017 1 commit
  8. 31 Mar, 2017 14 commits
  9. 30 Mar, 2017 1 commit
  10. 23 Mar, 2017 3 commits
    • storage: in deadlock avoidance, fix performance issue that could freeze the cluster · 1280f73e
      Julien Muchembled authored
      In the worst case, with many clients trying to lock the same oids,
      the cluster could enter an infinite cascade of deadlocks.
      Here is an overview with 3 storage nodes and 3 transactions:
       S1     S2     S3     order of locking tids          # abbreviations:
       l1     l1     l2     123                            #  l: lock
       q23    q23    d1q3   231                            #  d: deadlock triggered
       r1:l3  r1:l2  (r1)   # for S3, we still have l2     #  q: queued
       d2q1   q13    q13    312                            #  r: rebase
      Above, we show what happens when a random transaction gets a lock just after
      another is rebased. Here, the result is that the last 2 lines are a
      permutation of the first 2, and this can repeat indefinitely with bad luck.
      This commit reduces the probability of deadlock by processing delayed
      stores/checks in the order of their locking tid. In the above example,
      S1 would give the lock to 2 when 1 is rebased, and 2 would vote successfully.
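
      The ordering rule can be illustrated with a much simplified model of a
      single oid on a single storage node. OidLockQueue and its methods are
      hypothetical names, and the sketch ignores rebasing, checks and the
      interaction between storage nodes; it only shows that waiters are served
      by increasing locking tid rather than in arrival order.

```python
import heapq


class OidLockQueue:
    """Hypothetical model: one lock, waiters served by smallest locking tid."""

    def __init__(self):
        self.holder = None   # locking tid that currently holds the lock
        self._waiting = []   # min-heap of delayed locking tids

    def request(self, locking_tid):
        """Try to take the lock; return True if granted, False if delayed."""
        if self.holder is None:
            self.holder = locking_tid
            return True
        heapq.heappush(self._waiting, locking_tid)
        return False

    def release(self):
        """Release the lock and grant it to the smallest delayed locking tid."""
        self.holder = heapq.heappop(self._waiting) if self._waiting else None
        return self.holder


# As on S1 in the overview: 1 holds the lock, then 3 and 2 get queued.
lock = OidLockQueue()
lock.request(1)
lock.request(3)
lock.request(2)
next_holder = lock.release()  # 1 gives up the lock (e.g. it is rebased)
```

      Here next_holder is 2, even though 3 was queued first, which matches the
      behaviour described for S1.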
    • storage: discard answers from aborted replications · ad43dcd3
      Julien Muchembled authored
      This fixes a bug that could lead to data corruption or crashes.