Commits · 9d4d5b40f07428c395c11eabeb2566338da43e04 · nexedi / neoppod

27 Apr, 2017 4 commits

doc: add advice about the number of master nodes to set up · 9d4d5b40
Julien Muchembled authored Apr 25, 2017

9d4d5b40

Improvements to --dynamic-master-list · 8e7d4aa7

Julien Muchembled authored Apr 20, 2017

- atomic write to disk to avoid corruption
- update when the address changes (not only when a node is removed/added)
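The first bullet refers to the usual way of writing a file atomically: write the data to a temporary file in the same directory, then rename it over the target. Below is a minimal Python sketch of that technique. It is not NEO's code, and both `write_master_list` and the one-address-per-line file format are hypothetical.

```python
import os
import tempfile


def write_master_list(path, addresses):
    """Atomically replace `path` with one master address per line.

    Hypothetical helper: the file format is an assumption for illustration.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem as the target,
    # otherwise os.replace() cannot rename it atomically.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix='.masters-')
    try:
        with os.fdopen(fd, 'w') as f:
            f.write(''.join(addr + '\n' for addr in addresses))
            f.flush()
            os.fsync(f.fileno())  # make sure the data reaches the disk first
        # Readers see either the complete old file or the complete new one.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

To cover the second bullet, the helper would be called whenever a master's address changes, and not only when a master is added or removed.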

8e7d4aa7

Make NodeManager.remove stricter · 017f248d
Julien Muchembled authored Apr 25, 2017

017f248d

Check protocol version, on both connection sides, before parsing any packet · a60e36e8

Julien Muchembled authored Apr 24, 2017

This fixes 2 issues:
- Because neoctl connects to admin nodes without requesting identification,
  the protocol version was not checked, which could even be dangerous: a user
  asking for information could send a packet that the admin node decodes as a
  packet that alters data, such as Truncate.
- In case of a protocol version mismatch, the error was not logged on the
  node that initiated the connection.

Compatibility is handled as follows:
- For an old node receiving data from a new node, the 2 high-order bytes of the
  packet id, which are always 0 for the first packet, are decoded as the packet
  code. Packet 0 has never existed, so this results in PacketMalformedError.
- For a new node receiving data from an old node, the id of the first packet,
  which is always 0, is decoded as the version, which results in a version
  mismatch error.

This new protocol also guarantees that there's no conflict with SSL.

For simplification, the packet length does not count the header anymore.
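The two compatibility cases above can be reproduced with Python's struct module. This is a sketch under assumed wire layouts: a 4-byte big-endian version sent once per connection, and a packet header consisting of a 4-byte id, a 2-byte code and a 4-byte length. Those layouts and the helper names are illustrative assumptions, not a copy of neo.lib.protocol.

```python
import struct

# Assumed layouts, for illustration only.
VERSION = struct.Struct('!L')   # protocol version, sent once per connection
HEADER = struct.Struct('!LHL')  # packet id, packet code, body length


def new_node_first_bytes(version, code, body):
    """What a new node sends: the version, then its first packet (id 0).

    The length field counts only the body, not the header.
    """
    return VERSION.pack(version) + HEADER.pack(0, code, len(body)) + body


def old_node_decode(data):
    """An old node expects a packet header right away: (id, code, length)."""
    return HEADER.unpack_from(data)


def new_node_decode_version(data):
    """A new node expects the protocol version first."""
    return VERSION.unpack_from(data)[0]
```

With these layouts, an old node decodes the version as the packet id and the 2 high-order bytes of the real packet id as code 0, while a new node decodes the old node's first packet id as version 0.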

a60e36e8

25 Apr, 2017 4 commits
- Clean up neo.lib.protocol · da5d7a89
  Julien Muchembled authored Apr 25, 2017
```
When using network byte order ('!'), the size of struct items is independent of
the platform. They have never changed from one version of Python to another.
```
  da5d7a89
- Use ProtocolError instead of Notify for unexpected answers, and drop Notify · e2e9c2f5
  Julien Muchembled authored Apr 25, 2017
  
  e2e9c2f5
- Rename node states: DOWN -> UNKNOWN, TEMPORARILY_DOWN -> DOWN · b27db46f
  Julien Muchembled authored Apr 25, 2017
  
  b27db46f
- Remove UNKNOWN node state · f39babe5
  Julien Muchembled authored Apr 25, 2017
  
  f39babe5
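The remark in da5d7a89 about network byte order can be checked directly with Python's struct module: with the '!' prefix, standard sizes are used and no alignment padding is inserted, whereas native mode ('@', the default) follows the platform's C types and alignment. The formats below are only examples.

```python
import struct

# With '!' (network byte order), 'H' is always 2 bytes, 'L' always 4 and
# 'Q' always 8, and no padding is inserted between fields.
for fmt in ('!H', '!L', '!Q', '!LHL'):
    print(fmt, struct.calcsize(fmt))

# Native mode uses the platform's sizes and alignment; for instance, on a
# typical 64-bit Linux build, a native 'L' is 8 bytes.
print('@LHL', struct.calcsize('@LHL'))
```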
24 Apr, 2017 6 commits

Reimplement election (of the primary master) · 23b6a66a

Julien Muchembled authored Apr 10, 2017

The election is no longer a separate process: it happens during the RECOVERING
phase, and timeouts are no longer used.

Each master node keeps a timestamp of when it started to play the primary role,
and the node with the smallest timestamp is elected. The election stops when
the cluster is started: as long as it is operational, the primary master can't
be deposed.

An election must happen whenever the cluster is no longer operational, to
handle the case of a network cut between a primary master and all other nodes:
in that case, a secondary master node takes over, and when the initial primary
master comes back, it loses against the new primary master if the cluster has
already been started.
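The election rule can be summarized as a small pure function. This is a hypothetical sketch, not NEO's master code. Candidates are represented as (name, timestamp) pairs, and breaking ties by name is an assumption of the sketch.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation: (node name, timestamp of when the node
# started to play the primary role).
Candidate = Tuple[str, float]


def elect_primary(candidates: Sequence[Candidate],
                  current_primary: Optional[str] = None,
                  cluster_operational: bool = False) -> str:
    """Return the name of the primary master.

    While the cluster is operational, the current primary is never deposed.
    Otherwise the candidate with the smallest timestamp wins (ties are broken
    by name here, which is an assumption of this sketch).
    """
    if cluster_operational and current_primary is not None:
        return current_primary
    name, _ = min(candidates, key=lambda c: (c[1], c[0]))
    return name
```

For example, `elect_primary([('M1', 1.0), ('M2', 5.0)], current_primary='M2', cluster_operational=True)` returns 'M2': M1 started earlier, but it comes back after the cut while M2 already runs a started cluster.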

23b6a66a

Use existing generic way to ignore AcceptIdentification on closed connections · 0a3dba8b
Julien Muchembled authored Apr 24, 2017

0a3dba8b
Remove BROKEN node state · 9d7f9795
Julien Muchembled authored Apr 01, 2017

9d7f9795
Remove HIDDEN node state · b8210d58
Julien Muchembled authored Apr 01, 2017

b8210d58

On NM update, fix removal of nodes that aren't part of the cluster anymore · f051b7a0

Julien Muchembled authored Apr 12, 2017

In order to do that correctly, this commit contains several other changes:

When connecting to a primary master, a full node list always follows the
identification. For storage nodes, this means that they now know all nodes
during the RECOVERING phase.

The initial full node list now always contains a node tuple for:
- the server-side node (i.e. the primary master): on a master, this is
  done by always having a node describing itself in its node manager.
- the client-side node, to make sure it gets an id timestamp:
  now an admin node also receives a node for itself.

f051b7a0

When processing an answer, also update timeout and handler switcher on exception · 9e54a8e0
Julien Muchembled authored Apr 21, 2017
```
This keeps the connection fully functional when a handler raises an exception.
```
9e54a8e0
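In Python, the usual way to make bookkeeping run whether or not a handler raises is try/finally. The sketch below shows that pattern. `AnswerDispatcher`, `pending` and `timeout_updates` are hypothetical names, and the class does not reproduce NEO's Connection API.

```python
class AnswerDispatcher:
    """Hypothetical sketch: per-connection bookkeeping around answer handling."""

    def __init__(self):
        self.pending = {}          # msg_id -> handler expecting the answer
        self.timeout_updates = 0   # stands in for "update the timeout"

    def expect(self, msg_id, handler):
        self.pending[msg_id] = handler

    def _update_timeout(self):
        self.timeout_updates += 1

    def process_answer(self, msg_id, packet):
        handler = self.pending.pop(msg_id)
        try:
            handler(packet)
        finally:
            # Runs even if the handler raised, so the connection does not
            # keep a stale timeout for an answer that was already consumed.
            self._update_timeout()
```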

19 Apr, 2017 2 commits

Silently ignore answers to packets that aren't ignored on closed connection · 4a82657b

Julien Muchembled authored Apr 19, 2017

Commits like 7eb7cf1b
("Minimize the amount of work during tpc_finish") dropped what was done in
commit 07b48079
("Ignore some requests, based on connection state") to protect request handlers
when they respond.

This commit fixes this in a generic way.

4a82657b
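One generic way to protect request handlers, rather than checking the connection state in each of them, is to make the answering method itself drop answers on closed connections. A minimal sketch of that idea follows. The `Connection` class here is hypothetical and is not NEO's.

```python
class Connection:
    """Hypothetical minimal connection, not NEO's class."""

    def __init__(self):
        self.closed = False
        self.sent = []

    def close(self):
        self.closed = True

    def answer(self, packet):
        # Checked once here instead of in every request handler: a handler
        # may finish its work after the connection was closed, and its
        # answer is then silently dropped.
        if self.closed:
            return
        self.sent.append(packet)
```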

Do not process any packet for aborted connections · 8b1189d3
Julien Muchembled authored Apr 19, 2017

8b1189d3

18 Apr, 2017 4 commits

Fix sorting of delayed events · 40bac312

Julien Muchembled authored Apr 18, 2017

The initial intention was to rely on stable sorting when several events have
the same key. For this to work, the sort must compare only the key, instead of
continuing the comparison with the second item of events when keys are equal.

This could lead to data corruption (conflict resolution with wrong base):

  FAIL: testNotifyReplicated (neo.tests.threaded.test.Test)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "neo/tests/threaded/__init__.py", line 1093, in wrapper
      return wrapped(self, cluster, *args, **kw)
    File "neo/tests/threaded/test.py", line 2019, in testNotifyReplicated
      self.assertEqual([15, 11, 13, 16], [r[x].value for x in 'abcd'])
    File "neo/tests/__init__.py", line 187, in assertEqual
      return super(NeoTestBase, self).assertEqual(first, second, msg=msg)
  failureException: Lists differ: [15, 11, 13, 16] != [19, 11, 13, 16]

  First differing element 0:
  15
  19

  - [15, 11, 13, 16]
  ?   ^

  + [19, 11, 13, 16]
  ?   ^

40bac312
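The difference between sorting on the key only and sorting whole event tuples can be shown in a few lines of Python. The events below are hypothetical (key, payload) pairs. They only illustrate the stable-sort property and are not NEO's actual event structure.

```python
# Hypothetical delayed events: (key, payload). Two events share key 1.
events = [(1, 'store oid B'), (0, 'store oid A'), (1, 'check oid A')]

# Sorting on the key only: Python's sort is stable, so events with equal
# keys keep their original relative order.
by_key = sorted(events, key=lambda e: e[0])

# Sorting the tuples themselves: for equal keys, the comparison continues
# with the second item, which can reorder them.
by_tuple = sorted(events)

print(by_key)
print(by_tuple)
```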

Do never parse any packet from aborted connection · 6841e2f2
Julien Muchembled authored Apr 18, 2017

6841e2f2
fixup! Add file descriptor and aborted flag to __repr__ of connections · 557b5bd5
Julien Muchembled authored Apr 18, 2017
```
'aborted' could appear twice.
```
557b5bd5
fixup! qa: add a basic assertion in Patch to detect when patched code changes · e95847fa
Julien Muchembled authored Apr 18, 2017

e95847fa

13 Apr, 2017 1 commit
- qa: fix occasional deadlock when starting subprocesses in functional tests · c0adf81c
  Julien Muchembled authored Apr 12, 2017
  
  c0adf81c
04 Apr, 2017 1 commit

client: Add support for zodburi · 01a01c8c

Kirill Smelkov authored Apr 04, 2017

zodburi[1] provides a way to open ZODB storages by URL/URI. It supports
file://, zeo://, zconfig://, memory:// and other schemes out of the box, and
storages that are third-party to ZODB can add support for their own schemes by
providing a zodburi.resolvers entry point.

For example, relstorage[2] and newtdb[3] do this.

Let's also teach NEO to open itself via the neo:// URI scheme.

[1] http://docs.pylonsproject.org/projects/zodburi
[2] https://github.com/zodb/relstorage/blob/2.1a1-15-g68c8cf1/relstorage/zodburi_resolver.py
[3] https://github.com/newtdb/db/blob/0.5.2-1-gbd36e90/src/newt/db/zodburi.py

01a01c8c
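As an illustration of what a resolver has to extract from such a URI, the sketch below uses only the standard library to parse a URI of the assumed form neo://<cluster name>@<master1>,<master2>?<options>. This layout, the helper name and the sample option are assumptions. NEO's actual resolver and its supported options may differ, and nothing here calls zodburi.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_neo_uri(uri):
    """Split an assumed neo:// URI into (cluster name, masters, options)."""
    parts = urlsplit(uri)
    if parts.scheme != 'neo':
        raise ValueError('not a neo:// URI: %r' % uri)
    # Assumed layout: the cluster name comes before '@', followed by a
    # comma-separated list of master addresses.
    name, sep, masters = parts.netloc.rpartition('@')
    if not sep or not name or not masters:
        raise ValueError('expected neo://<name>@<masters>: %r' % uri)
    return name, masters.split(','), dict(parse_qsl(parts.query))
```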

31 Mar, 2017 14 commits

bug: on exit/crash, storage space for non-voted data may be leaked · 3bf2a0c6

Julien Muchembled authored Mar 31, 2017

Commit 58d0b602 didn't fix the issue completely.
Storage space can be freed with the --repair option.

This adds an expectedFailure test.

3bf2a0c6

storage: fix commit activity when cells are discarded or when they become readable · 34d797e2

Julien Muchembled authored Mar 29, 2017

This is a follow up of commit 64afd7d2,
which focused on read accesses when there is no transaction activity.

This commit also includes a test to check a simpler scenario than the one
described in the previous commit.

34d797e2

client: speed up cell sorting on read-access · 6a75a654
Julien Muchembled authored Mar 10, 2017

6a75a654
Bump protocol version · 0e57eb05
Julien Muchembled authored Mar 31, 2017
```
Commit ad43dcd3 should have bumped it as well.
```
0e57eb05
qa: new ConnectionFilter.retry() · aefa65a2
Julien Muchembled authored Mar 29, 2017
```
Unused, but likely to be useful in the future.
```
aefa65a2

Fix race when tweak touches partitions that are being reported as replicated · 87c5178b

Julien Muchembled authored Mar 15, 2017

The bug could lead to data corruption (if a partition is wrongly marked as
UP_TO_DATE) or crashes (assertion failure on either the storage or the master).

The protocol is extended to handle the following scenario:

    S                                    M
    partition 0 outdated
      <-- UnfinishedTransactions ------>
    replication of partition 0 ...
    partition 1 outdated
      --- UnfinishedTransactions ...
    ... replication finished
      --- ReplicationDone ...
                                         tweak
      <-- partition 1 discarded --------
                                         tweak
      <-- partition 1 outdated ---------
          ... UnfinishedTransactions -->
          ... ReplicationDone --------->

The master can't simply mark all outdated cells as being updatable when it
receives an UnfinishedTransactions packet.

87c5178b

qa: add a basic assertion in Patch to detect when patched code changes · cb78e6b2
Julien Muchembled authored Mar 24, 2017

cb78e6b2

Forbid read-accesses to cells that are actually non-readable · 64afd7d2

Julien Muchembled authored Mar 08, 2017

After an attempt to read from a non-readable cell, which happens when a client
has a newer or older partition table (PT) than the storage node's, the client
now retries the read.

This bugfix is for all kinds of read-access except undoLog, which can still
report incomplete results.
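Client-side, this amounts to a retry loop around the read. The sketch below is hypothetical: `NonReadableCell`, `read_with_retry`, the callables it takes and the `max_attempts` bound are illustrative assumptions, since the commit only states that the client retries.

```python
class NonReadableCell(Exception):
    """Hypothetical error: the storage node's cell is not readable."""


def read_with_retry(load, refresh_partition_table, max_attempts=5):
    """Call load() until it succeeds, refreshing the PT between attempts.

    max_attempts is an assumption of this sketch; the commit only says
    that the client retries.
    """
    for _ in range(max_attempts):
        try:
            return load()
        except NonReadableCell:
            # Our partition table may be older or newer than the storage
            # node's: get a fresh view before asking again.
            refresh_partition_table()
    raise NonReadableCell('no readable cell after %d attempts' % max_attempts)
```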

64afd7d2

Fix potential EMFILE when retrying to connect indefinitely · 43fdd059
Julien Muchembled authored Mar 17, 2017

43fdd059
The partition table must forget dropped nodes · 6f86c773
Julien Muchembled authored Mar 15, 2017

6f86c773

master: make sure that storage nodes have an up-to-date PT/NM when they're added · 7ffc96fd

Julien Muchembled authored Mar 15, 2017

This reverts commit bddc1802,
to fix the following storage crash:

  Traceback (most recent call last):
    ...
    File "neo/lib/handler.py", line 72, in dispatch
      method(conn, *args, **kw)
    File "neo/storage/handlers/master.py", line 44, in notifyPartitionChanges
      app.pt.update(ptid, cell_list, app.nm)
    File "neo/lib/pt.py", line 231, in update
      assert node is not None, 'No node found for uuid ' + uuid_str(uuid)
  AssertionError: No node found for uuid S3

Partition table updates must also be processed with InitializationHandler
when nodes remain in the PENDING state because they're not added to the cluster.

7ffc96fd

In STOPPING cluster state, really wait for all transaction to be finished · 9e433594
Julien Muchembled authored Mar 15, 2017

9e433594
master: fix random crashes on shutdown when using several master nodes · 35737c9b
Julien Muchembled authored Mar 20, 2017

35737c9b
Mention RocksDB as a possible MySQL engine in neo.conf · 35468667
Julien Muchembled authored Mar 29, 2017

35468667

30 Mar, 2017 1 commit
- README: update location of automated test results · ca980d33
  Julien Muchembled authored Mar 30, 2017
  
  ca980d33
23 Mar, 2017 3 commits

storage: in deadlock avoidance, fix performance issue that could freeze the cluster · 1280f73e

Julien Muchembled authored Mar 14, 2017

In the worst case, with many clients trying to lock the same oids,
the cluster could enter an infinite cascade of deadlocks.

Here is an overview with 3 storage nodes and 3 transactions:

    S1      S2      S3      order of locking tids       # abbreviations:
    l1      l1      l2      123                         #   l: lock
    q23     q23     d1q3    231                         #   d: deadlock triggered
    r1:l3   r1:l2   (r1)    # for S3, we still have l2  #   q: queued
    d2q1    q13     q13     312                         #   r: rebase

Above, we show what happens when a random transaction gets a lock just after
another one is rebased. Here, the last 2 lines are a permutation of the first 2,
and with bad luck, this can repeat indefinitely.

This commit reduces the probability of deadlock by processing delayed
stores/checks in the order of their locking tid. In the above example,
S1 would give the lock to 2 when 1 is rebased, and 2 would vote successfully.
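Processing delayed requests by locking tid, rather than by arrival order, can be sketched with a priority queue. `DelayedQueue` and its methods are hypothetical names, and the integer tids and string requests are simplifications. The counter keeps the sort stable for equal tids and avoids ever comparing request objects.

```python
import heapq


class DelayedQueue:
    """Hypothetical queue of delayed stores/checks for one storage node.

    Entries are released by increasing locking tid rather than by arrival
    order.
    """

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker: FIFO among equal tids

    def delay(self, locking_tid, request):
        heapq.heappush(self._heap, (locking_tid, self._counter, request))
        self._counter += 1

    def pop_next(self):
        locking_tid, _, request = heapq.heappop(self._heap)
        return locking_tid, request
```

In the scenario above, this ordering makes S1 hand the lock to transaction 2 before transaction 3 once transaction 1 is rebased.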

1280f73e

qa: document and fortify testCascadedDeadlockAvoidanceOnCheckCurrent · 1b9f8f72
Julien Muchembled authored Mar 14, 2017

1b9f8f72
storage: discard answers from aborted replications · ad43dcd3
Julien Muchembled authored Mar 06, 2017
```
This fixes a bug that could lead to data corruption or crashes.
```
ad43dcd3