1. 15 Jun, 2017 2 commits
  2. 14 Jun, 2017 1 commit
  3. 13 Jun, 2017 1 commit
  4. 12 Jun, 2017 7 commits
  5. 12 May, 2017 5 commits
  6. 11 May, 2017 1 commit
  7. 10 May, 2017 2 commits
  8. 04 May, 2017 1 commit
  9. 02 May, 2017 1 commit
    • master: fix identification of unknown masters · fbcf9c50
      Julien Muchembled authored
      This fixes the following crash:
      
        Traceback (most recent call last):
          ...
          File "neo/master/handlers/identification.py", line 94, in requestIdentification
            uuid = app.getNewUUID(uuid, address, node_type)
          File "neo/master/app.py", line 449, in getNewUUID
            assert uuid != self.uuid
        AssertionError
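      A minimal, hypothetical sketch of the invariant behind this assertion
      (illustrative names and logic, not the actual NEO code): getNewUUID
      assumes it is never asked for the master's own UUID, so the
      identification handler has to reject a peer claiming that UUID instead
      of letting the assert crash the master.

        # Hypothetical sketch of the failing invariant; not NEO's real code.
        class App(object):
            def __init__(self, uuid):
                self.uuid = uuid              # this master's own UUID
                self._next_uuid = 0

            def getNewUUID(self, uuid, address, node_type):
                # The allocator assumes callers never pass our own UUID.
                assert uuid != self.uuid
                if uuid is not None:
                    return uuid               # the peer keeps the UUID it claims
                self._next_uuid += 1
                return self._next_uuid        # otherwise allocate a fresh one

        def requestIdentification(app, uuid, address, node_type):
            # One way to avoid the crash: refuse such a peer here instead of
            # forwarding its UUID to the allocator.
            if uuid == app.uuid:
                raise ValueError('peer claims our own UUID')
            return app.getNewUUID(uuid, address, node_type)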
  10. 28 Apr, 2017 3 commits
    • Better logging of connector errors · 29e8323c
      Julien Muchembled authored
    • client: fix possible data corruption after conflict resolutions with replicas · 46c36465
      Julien Muchembled authored
      This really fixes the bug described in
      commit 40bac312,
      which only reduced the probability of failure and could probably be reverted now.
      
      What happened is that the second conflict on 'a' for t3 was first reported by
      an answer to the first store with:
      - a base serial at which a=0
      - a conflict serial at which a=7
      However, the cached data is not 8 anymore but 12, since a second store already
      occurred after the first conflict (reported by the other storage node).
      
      When this conflict was resolved before receiving the conflict for the second store,
      it gave:
      
        resolve(old=0, saved=7, new=12) -> 19
      
      instead of:
      
        resolve(old=4, saved=7, new=12) -> 15
      
      (if we still had the data of the first store, we could also do
        resolve(old=0, saved=7, new=8)
       but that would be inefficient from a memory point of view)
      
      The bug was difficult to reproduce: testNotifyReplicated had to be run many
      times before the race conditions triggered it. The test was changed to
      enforce some of them, so that the above scenario now happens almost always.
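      The numbers above are consistent with a counter-style three-way merge
      (resolved = saved + (new - old)), as in ZODB conflict resolution for
      counters. A minimal sketch (illustrative, not NEO's client code) of how
      a stale base corrupts the result:

        # Counter-style three-way merge: replay this transaction's delta
        # (new - old) on top of the concurrently committed value.
        def resolve(old, saved, new):
            return saved + (new - old)

        # Correct base for the second store: 'a' was 4 at its base serial.
        assert resolve(old=4, saved=7, new=12) == 15

        # Stale base from the first store (a=0): the delta of the first
        # conflict resolution is applied twice, corrupting the data.
        assert resolve(old=0, saved=7, new=12) == 19

        # Keeping the data of the first store would also give a correct
        # result, at the cost of extra memory:
        assert resolve(old=0, saved=7, new=8) == 15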
  11. 27 Apr, 2017 7 commits
  12. 25 Apr, 2017 4 commits
  13. 24 Apr, 2017 5 commits
    • Reimplement election (of the primary master) · 23b6a66a
      Julien Muchembled authored
      The election is not a separate process anymore.
      It happens during the RECOVERING phase, and timeouts are not used anymore.
      
      Each master node keeps a timestamp of when it started to play the primary role,
      and the node with the smallest timestamp is elected. The election stops when
      the cluster is started: as long as it is operational, the primary master can't
      be deposed.
      
      An election must happen whenever the cluster is not operational anymore, to
      handle the case of a network cut between a primary master and all other nodes:
      then another master node (secondary) takes over, and when the initial primary
      master is back, it loses against the new primary master if the cluster is
      already started.
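      A rough sketch of the rule described above (hypothetical data structures,
      not NEO's actual implementation): among the master nodes, the one that
      has been claiming the primary role the longest wins, and no election
      happens at all while the cluster is operational.

        # Hypothetical sketch of the smallest-timestamp rule; not NEO's code.
        def elect_primary(masters, cluster_operational, current_primary=None):
            # masters: iterable of (node, started_primary_at) pairs, where
            # started_primary_at is when that master began to play the
            # primary role (None if it never did).
            if cluster_operational:
                # As long as the cluster is started, the current primary
                # cannot be deposed: a primary coming back after a network
                # cut loses simply because no election takes place.
                return current_primary
            candidates = [(ts, node) for node, ts in masters if ts is not None]
            if not candidates:
                return None
            # Otherwise the master with the smallest (oldest) timestamp wins.
            return min(candidates, key=lambda c: c[0])[1]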
    • Remove BROKEN node state · 9d7f9795
      Julien Muchembled authored
    • Remove HIDDEN node state · b8210d58
      Julien Muchembled authored
    • On NM update, fix removal of nodes that aren't part of the cluster anymore · f051b7a0
      Julien Muchembled authored
      In order to do that correctly, this commit contains several other changes:
      
      When connecting to a primary master, a full node list always follows the
      identification. For storage nodes, this means that they now know all nodes
      during the RECOVERING phase.
      
      The initial full node list now always contains a node tuple for:
      - the server-side node (i.e. the primary master): on a master, this is
        done by always having a node describing itself in its node manager.
      - the client-side node, to make sure it gets an id timestamp:
        now an admin node also receives a node for itself.
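      A rough sketch of the guarantee described above (hypothetical structures,
      not the real NodeManager API): the node list sent right after
      identification always contains the primary master itself and the node
      that just identified, so the latter gets its id timestamp.

        # Hypothetical sketch; not the real NodeManager API.
        from collections import namedtuple

        Node = namedtuple('Node', 'type address uuid state id_timestamp')

        def initial_node_list(known_nodes, primary, connecting_node):
            # The primary always keeps a node describing itself in its node
            # manager, so the server side is necessarily in the list.
            nodes = {n.uuid: n for n in known_nodes}
            assert primary.uuid in nodes
            # The client side (storage, client or admin node that just
            # identified) is included too, so it receives its id timestamp.
            nodes.setdefault(connecting_node.uuid, connecting_node)
            return list(nodes.values())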