Commits · 2b321c023fb4c5b83aad497c53f86238f59f3ab6 · Vincent Pelletier / neoppod

07 Oct, 2015 1 commit

WIP: Make admin node a web-app · 2b321c02

Julien Muchembled authored Aug 31, 2015

The goal is to get rid of the neoctl command-line tool, and to manage the
cluster via a web browser, or tools like 'wget'. Then, it will be possible to
provide a web user interface to connect to the underlying DB of any storage
node, usually a SQL client.

The design of admin app is finished:
- it's threaded like clients
- it's a WSGI app

I also hacked an HTTP API as quickly as possible to make all tests pass.
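
As a rough illustration of what "a WSGI app" exposing an HTTP API means here, the following is a minimal sketch. The `/cluster` path, the JSON payload and the `CLUSTER_STATE` dictionary are assumptions made for the example; they are not the API of this commit, which is explicitly still to be defined.

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical state; a real admin node would obtain it from the master.
CLUSTER_STATE = {"state": "RUNNING"}


def admin_app(environ, start_response):
    """Minimal WSGI callable: GET /cluster returns the state as JSON."""
    if environ["REQUEST_METHOD"] == "GET" and environ["PATH_INFO"] == "/cluster":
        status = "200 OK"
        content_type = "application/json"
        body = json.dumps(CLUSTER_STATE).encode("utf-8")
    else:
        status = "404 Not Found"
        content_type = "text/plain"
        body = b"not found"
    start_response(status, [("Content-Type", content_type),
                            ("Content-Length", str(len(body)))])
    return [body]


def call(app, path):
    """Invoke a WSGI app in-process, without opening any socket."""
    environ = {}
    setup_testing_defaults(environ)  # fills REQUEST_METHOD=GET and friends
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

Such an app could then be queried with a browser or 'wget' once it is served by any WSGI server.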

TODO:
- define a better HTTP API
- there's no UI at all yet
- remove all unused packets from the protocol (those that were only used
  between neoctl and admin node)

There's currently no UI implemented.

There are a few dead files, not deleted yet, in case they contain a few
pieces of useful code:
 neo/neoctl/app.py
 neo/neoctl/handler.py
 neo/scripts/neoctl.py

2b321c02

05 Oct, 2015 3 commits
- Release version 1.5 · 9bd14bf2
  Julien Muchembled authored Oct 05, 2015
  
  9bd14bf2
- Add SSL support · bff5c82f
  Julien Muchembled authored Oct 02, 2015
  
  bff5c82f
- neoctl: make -l option log everything on disk automatically · 107ca7df
  Julien Muchembled authored Oct 02, 2015
  
  107ca7df
02 Oct, 2015 2 commits
- In importer.conf example, explain why the source DB can't be open read-only · 9d26bb51
  Julien Muchembled authored Oct 02, 2015
  
  9d26bb51
- Expand ~(user) construction for all paths in configuration · f2babb12
  Julien Muchembled authored Oct 02, 2015
```
Before, it was only done for 'logfile'.
```
  f2babb12
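
The ~(user) expansion mentioned in f2babb12 can be sketched with `os.path.expanduser`. The option names in `PATH_OPTIONS` are assumptions for illustration (only 'logfile' is named in the commit message), and `expand_paths` is a hypothetical helper, not a function of neo.lib.config.

```python
import os

# Hypothetical list of options holding paths; only 'logfile' comes from
# the commit message.
PATH_OPTIONS = ("logfile", "database")


def expand_paths(config, path_options=PATH_OPTIONS):
    """Return a copy of config with ~ and ~user expanded in path options."""
    expanded = dict(config)
    for key in path_options:
        value = expanded.get(key)
        if value:
            expanded[key] = os.path.expanduser(value)
    return expanded
```

Options that are not paths are left untouched, so a value that merely starts with '~' elsewhere is not rewritten.
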
01 Oct, 2015 1 commit

Review API between connections and connectors · 57481c35

Julien Muchembled authored Sep 25, 2015

- Review error handling. Only 2 exceptions remain in connector.py:

  - Drop useless exception handling for EAGAIN since it should not happen
    if the kernel says the socket is ready.
  - Do not distinguish other socket errors. Just close and log in a generic way.
  - No need to raise a specific exception for EOF.
  - Make 'connect' return a boolean instead of raising an exception.
  - Raise appropriate exception when answer/ask/notify is called on a closed
    non-MT connection.

- Add support for more complex connectors, which may need to write for a read
  operation, or to read when there's pending data to send. This will be
  required for SSL support (more exactly, the handshake will be done in
  a transparent way):

  - Move write buffer to connector.
  - Make 'receive' fill the read buffer, instead of returning the read data.
  - Make 'receive' & 'send' return a boolean to switch polling for writing.
  - Tolerate that sockets return 0 as number of bytes sent.

- In testConnection, simply delete all failing tests, as announced
  in commit 71e30fb9.
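
The reviewed API can be pictured with the following sketch for a plain (non-SSL) socket. It is only an approximation under stated assumptions: the class and exception names are hypothetical, and the exact return conventions of the real connector.py may differ.

```python
import socket


class ConnectorError(Exception):
    """Single generic error (hypothetical name) for any socket failure or EOF."""


class BufferedConnector:
    """Hypothetical connector that owns the write buffer of its socket."""

    def __init__(self, sock):
        sock.setblocking(False)
        self.socket = sock
        self.queued = bytearray()  # pending outgoing data

    def queue(self, data):
        self.queued += data

    def _fail(self, message, cause=None):
        # No distinction between errors: close, then raise generically.
        self.socket.close()
        raise ConnectorError(message) from cause

    def send(self):
        """Flush what the kernel accepts; return True to keep polling for writing."""
        if self.queued:
            try:
                sent = self.socket.send(self.queued)  # 0 is tolerated
            except OSError as e:
                self._fail("send failed", e)
            del self.queued[:sent]
        return bool(self.queued)

    def receive(self, read_buf):
        """Fill read_buf; return True if polling for writing is needed.

        A plain socket never needs to write in order to read, so this sketch
        always returns False. An SSL connector could return True during a
        handshake.
        """
        try:
            data = self.socket.recv(65536)
        except OSError as e:
            self._fail("recv failed", e)
        if not data:
            self._fail("connection closed by peer")
        read_buf.extend(data)
        return False
```

The caller only has to look at the returned boolean to decide whether the socket must also be polled for writing, which is what makes a transparent handshake possible.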

57481c35

30 Sep, 2015 1 commit
- tests: add "last" symlink to last temporary directory · 36a32f23
  Julien Muchembled authored Sep 30, 2015
  
  36a32f23
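
A "last" symlink of this kind can be maintained as in the sketch below. The function name and the choice of a relative link target are assumptions; the test suite may do it differently.

```python
import os
import tempfile


def make_temp_dir(parent):
    """Create a temporary directory under parent and point 'last' at it."""
    path = tempfile.mkdtemp(dir=parent)
    link = os.path.join(parent, "last")
    # lexists() is also true for a dangling link left by a deleted directory.
    if os.path.lexists(link):
        os.remove(link)
    # A relative target keeps the link valid if parent itself is moved.
    os.symlink(os.path.basename(path), link)
    return path
```
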
24 Sep, 2015 4 commits
- Allow to specify log file in configuration file, and expand ~(user) construction · 518c7588
  Julien Muchembled authored Sep 24, 2015
  
  518c7588
- Move common command-line options to neo.lib.config · 32b2d173
  Julien Muchembled authored Sep 24, 2015
  
  32b2d173
- Pass app as first parameter of (*Client|Listening)Connection · 727899e2
  Julien Muchembled authored Sep 24, 2015
```
Application will hold SSL parameters.
```
  727899e2
- Fix remaining memory leaks and make handler instances become singletons · 7d5b1559
  Julien Muchembled authored Sep 23, 2015
  
  7d5b1559
23 Sep, 2015 3 commits
- Simplify cleanup of HandlerSwitcher when closing a connection · 9fdd750f
  Julien Muchembled authored Sep 23, 2015
```
This frees a reference to the last handler and there's no need to make the
instance reusable.
```
  9fdd750f
- client: do nothing (instead of raising) if a closed Storage is closed again · aaf2251e
  Julien Muchembled authored Sep 22, 2015
```
This follows the behaviour of FileStorage.
```
  aaf2251e
- Fix leak of file descriptors in unit tests · d75fcc59
  Julien Muchembled authored Sep 21, 2015
```
There remains only one leak in ClientApplicationTests.test_connectToPrimaryNode
because of Mock objects.
```
  d75fcc59
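
The change in aaf2251e above (closing an already closed Storage does nothing) amounts to making close() idempotent. A minimal sketch, using a hypothetical `Storage` stand-in rather than NEO's client code:

```python
class Storage:
    """Hypothetical stand-in for a storage object with an idempotent close()."""

    def __init__(self):
        self.app = object()  # placeholder for the underlying application
        self.released = 0    # counts how many times resources were released

    def close(self):
        # Swap first, so that a second call sees None and returns early.
        app, self.app = self.app, None
        if app is None:
            return
        self.released += 1   # the real code would release resources here
```
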
15 Sep, 2015 8 commits

TODO: document which mock library we should use · c88c6ac5
Julien Muchembled authored Sep 15, 2015

c88c6ac5

admin: do not reset the list of known masters from configuration (or command... · a72ddfb3

Julien Muchembled authored Aug 31, 2015

admin: do not reset the list of known masters from configuration (or command line) when reconnecting

This is questionable but a lot of NodeManager code must be reviewed if we want to do
differently. At least, admin nodes now behave like clients.

a72ddfb3

Simplify setup of monkey-patches in threaded tests · 6f6d071d
Julien Muchembled authored Aug 31, 2015

6f6d071d

Simplify polling thread in threaded apps · 3e1ed6a4

Julien Muchembled authored Aug 25, 2015

For a long time now, the polling thread has never ended and has not needed to be
restarted. On the other hand, the admin will need to define a different polling
loop, hence the move from threaded_poll to threaded_app.

3e1ed6a4

Move code from neo.client to neo.lib, since admins will be also multi-threaded · f5f42522
Julien Muchembled authored Aug 21, 2015

f5f42522
Drop 'background' mode completely in threaded tests · 50d25d00
Julien Muchembled authored Sep 09, 2015
```
It was still used to stop a cluster.
```
50d25d00

Stop using 'background' mode in threaded tests · 4253d24f

Julien Muchembled authored Sep 09, 2015

This makes tests easier to write, with more determinism.
If only I had the idea to monkey-patch SimpleQueue several years ago.

4253d24f

Rewrite of scheduler for threaded tests · 7025db52

Julien Muchembled authored Sep 03, 2015

The previous implementation was built around a 'pending' global variable that
was set by a few monkey-patches when some network activity was pending between
nodes. All this is replaced by an extra epoll object that is used to wait for
nodes that have pending network events: this is simpler, and faster since it
significantly reduces the number of context switches.

7025db52

14 Sep, 2015 1 commit
- Thread.isAlive is deprecated · 61009341
  Julien Muchembled authored Sep 14, 2015
  
  61009341
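
For reference, the non-deprecated spelling is `Thread.is_alive()`; the camelCase `isAlive()` alias was removed in Python 3.9. A small self-contained check (the helper name is made up for the example):

```python
import threading


def run_and_check():
    """Return is_alive() observed while a thread runs and after it is joined."""
    started = threading.Event()
    release = threading.Event()

    def worker():
        started.set()
        release.wait()

    thread = threading.Thread(target=worker)
    thread.start()
    started.wait()
    alive_while_running = thread.is_alive()
    release.set()
    thread.join()
    return alive_while_running, thread.is_alive()
```
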
07 Sep, 2015 1 commit

Fix potential deadlock when connecting to primary master · af06676a

Julien Muchembled authored Sep 07, 2015

This is a regression caused by commit eef52c27
("Tickless poll loop, for lowest latency and cpu usage"), affecting:
- admins
- storages
- primary masters of backup clusters

af06676a

28 Aug, 2015 6 commits

client: drop now useless wrapper to log safely in poll thread during shutdown · 9531c9cb
Julien Muchembled authored Aug 28, 2015
```
Recent Python already catches exceptions due to garbage collection on exit.
```
9531c9cb

storage: fix history() not waiting for oid to be unlocked · e27358d1

Julien Muchembled authored Aug 28, 2015

This fixes a random failure in testClientReconnection:

Traceback (most recent call last):
  File "neo/tests/threaded/test.py", line 754, in testClientReconnection
    self.assertTrue(cluster.client.history(x1._p_oid))
failureException: None is not true

e27358d1

Fix random failure in testRecycledClientUUID · 79be7787

Julien Muchembled authored Aug 28, 2015

Traceback (most recent call last):
  File "neo/tests/threaded/test.py", line 838, in testRecycledClientUUID
    x = client.load(ZERO_TID)
  [...]
  File "neo/tests/threaded/test.py", line 822, in notReady
    m2s.remove(delayNotifyInformation)
  File "neo/tests/threaded/__init__.py", line 482, in remove
    del self.filter_dict[filter]
KeyError: <function delayNotifyInformation at 0x7f511063a578>

79be7787

Fix several random failures in tests that didn't wait for transaction to be unlocked · c4ac45a8

Julien Muchembled authored Aug 28, 2015

NEOCluster.tic() gets a new 'slave' parameter that must be True when a client
node is in 'master' mode (i.e. setPoll(True)). In this case, tic() will wait
until all nodes finish their work and the client polls with a non-zero timeout.

Here, tic(slave=1) is used to wait for the storage to process
NotifyUnlockInformation notification from the master.

Traceback (most recent call last):
  File "neo/tests/threaded/test.py", line 80, in testBasicStore
    self.assertEqual(data_info, cluster.storage.getDataLockInfo())
  File "neo/tests/__init__.py", line 170, in assertEqual
    return super(NeoTestBase, self).assertEqual(first, second, msg=msg)
failureException: {('\x0b\xee\xc7\xb5\xea?\x0f\xdb\xc9]\r\xd4\x7f<[\xc2u\xda\x8a3', 0): 0} != {('\x0b\xee\xc7\xb5\xea?\x0f\xdb\xc9]\r\xd4\x7f<[\xc2u\xda\x8a3', 0): 1}

c4ac45a8

Several improvements to verbose locks · 5dc1f06c

Julien Muchembled authored Aug 28, 2015

All these changes were useful to debug deadlocks in threaded tests:
- New verbose Semaphore.
- Logs with numerical 'ident' were too annoying to read so revert to thread
  name (before commit 5b69d553), with an exception for threaded tests. There
  remains one case where the result is not unique: when several client apps
  are instantiated.
- Make deadlock detection optional.
- Make it possible to name locks.
- Make output more compact.
- Remove useless 'debug_lock' option.
- Add timing information.
- Make exception more verbose when an un-acquired lock is released.

Here is how I used 'locking':

--- a/neo/tests/threaded/__init__.py
+++ b/neo/tests/threaded/__init__.py
@@ -37,0 +38 @@
+from neo.lib.locking import VerboseSemaphore
@@ -71 +72,2 @@ def init(cls):
-        cls._global_lock = threading.Semaphore(0)
+        cls._global_lock = VerboseSemaphore(0, check_owner=False,
+                                            name="Serialized._global_lock")
@@ -265 +267,2 @@ def start(self):
-        self.em._lock = l = threading.Semaphore(0)
+        self.em._lock = l = VerboseSemaphore(0, check_owner=False,
+                                             name=self.node_name)
@@ -346 +349,2 @@ def __init__(self, master_nodes, name, **kw):
-        self.em._lock = threading.Semaphore(0)
+        self.em._lock = VerboseSemaphore(0, check_owner=False,
+                                         name=repr(self))

5dc1f06c

Fix occasional deadlocks in threaded tests · 0b93b1fb

Julien Muchembled authored Aug 28, 2015

Deadlocks mainly happened while stopping a cluster, hence the complete review
of NEOCluster.stop()

A major change is to make the client node handle its lock like other nodes
(i.e. in the polling thread itself) to better know when to call
Serialized.background() (there was a race condition with the test of
'self.poll_thread.isAlive()' in ClientApplication.close).

0b93b1fb

14 Aug, 2015 2 commits

Remove useless assert in a private method of MTClientConnection · 1ab594b4
Julien Muchembled authored Aug 12, 2015

1ab594b4

Do not reconnect too quickly to a node after an error · d898a83d

Julien Muchembled authored Aug 09, 2015

For example, a backup storage node that was rejected because the upstream
cluster was not ready could reconnect in a loop without delay, using 100% CPU
and flooding logs.

A new 'setReconnectionNoDelay' method on Connection can be used for cases where
it's legitimate to quickly reconnect.

With this new delayed reconnection, it's possible to remove the remaining
time.sleep().
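
The idea of delaying reconnection without sleeping can be sketched as follows. `ReconnectionPolicy`, its method names and the 1-second default are assumptions for illustration; in the commit, the logic lives on Connection (with 'setReconnectionNoDelay') and is driven by the poll loop.

```python
import time


class ReconnectionPolicy:
    """Hypothetical helper tracking when a node may be contacted again."""

    def __init__(self, delay=1.0, clock=time.monotonic):
        self.delay = delay
        self.clock = clock
        self.next_attempt = 0.0  # absolute time of the next allowed attempt

    def on_error(self):
        """Record a failure: the next attempt must wait for 'delay' seconds."""
        self.next_attempt = self.clock() + self.delay

    def set_no_delay(self):
        """Allow an immediate reconnection, for legitimate cases."""
        self.next_attempt = 0.0

    def timeout(self):
        """Seconds to wait before reconnecting (0.0 means connect now)."""
        return max(0.0, self.next_attempt - self.clock())
```

Instead of blocking in time.sleep(), the caller passes timeout() to its polling call and retries once it reaches zero.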

d898a83d

12 Aug, 2015 7 commits
- Remove useless testEvent · 71e30fb9
  Julien Muchembled authored Aug 12, 2015
```
This kind of test has never helped to detect regressions and any bug in
EpollEventManager would be quickly reported by other tests.

testConnection may go the same way if it keeps annoying me too much.
```
  71e30fb9
- client: do not wait for the remote to close the connection if it's not ready · f9df31be
  Julien Muchembled authored Aug 10, 2015
```
This is currently not an issue because the 'time.sleep(1)' in iterateForObject
(storage) and _connectToPrimaryNode (master) leave enough time. What could
happen is a new connection attempt for a node that already has a connection
(causing an assertion failure in Node.setConnection).
```
  f9df31be
- Fix invalid processing of unregistered connections · a4731a0c
  Julien Muchembled authored Aug 09, 2015
```
This could happen if a file descriptor was reallocated by the kernel.
```
  a4731a0c
- Simplify API to establish connections and accept mix of IPv4/IPv6 · ed50edca
  Julien Muchembled authored Aug 08, 2015
  
  ed50edca
- Rename parameter of polling methods now that _poll computes the timeout itself · c2c97752
  Julien Muchembled authored Aug 12, 2015
  
  c2c97752
- Tickless poll loop, for lowest latency and cpu usage · eef52c27
  Julien Muchembled authored Aug 02, 2015
```
With this patch, the epolling object is not awoken every second to check
if a timeout has expired. The API of Connection is changed to get the smallest
timeout.
```
  eef52c27
- tests: make Patch usable as a context manager · fd0b9c98
  Julien Muchembled authored Aug 05, 2015
  
  fd0b9c98
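
The tickless poll loop of eef52c27 in the list above can be approximated as follows: rather than waking up every second, the loop sleeps exactly until the earliest pending timeout, or indefinitely if there is none. This sketch uses the standard `selectors` module and a plain dict of deadlines; both are assumptions, since the real code uses its own epoll manager and asks each Connection for its timeout.

```python
import selectors
import time


def poll_once(selector, deadlines, clock=time.monotonic):
    """Block until I/O is ready or the earliest deadline expires.

    deadlines maps a name to an absolute time on 'clock'.
    Returns (ready_events, names_of_expired_deadlines).
    """
    timeout = None  # no deadline at all: sleep until there is I/O
    if deadlines:
        timeout = max(0.0, min(deadlines.values()) - clock())
    events = selector.select(timeout)
    now = clock()
    expired = sorted(name for name, when in deadlines.items() if when <= now)
    return events, expired
```

With no deadline registered, the process consumes no CPU at all while idle, which is the point of dropping the 1-second tick.
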