- 07 Oct, 2015 1 commit
-
-
Julien Muchembled authored
The goal is to get rid of the neoctl command-line tool, and to manage the cluster via a web browser, or tools like 'wget'. Then, it will be possible to provide a web user interface to connect to the underlying DB of any storage node, usually a SQL client.

The design of the admin app is finished:
- it's threaded like clients
- it's a WSGI app

I also hacked an HTTP API as quickly as possible to make all tests pass.

TODO:
- define a better HTTP API
- there's no UI at all yet
- remove all unused packets from the protocol (those that were only used between neoctl and admin node)

There are a few dead files, not deleted yet, in case they contain a few pieces of useful code:
  neo/neoctl/app.py
  neo/neoctl/handler.py
  neo/scripts/neoctl.py
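For illustration only, here is a minimal sketch of what "an admin node that is a WSGI app" could look like. It is not the code of this commit: the '/cluster' route, the CLUSTER_STATE dictionary and the call() helper are all hypothetical names chosen for the example, and the real HTTP API is explicitly still to be defined.

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical stand-in for the state an admin node might expose.
CLUSTER_STATE = {"state": "RUNNING"}


def admin_app(environ, start_response):
    """Minimal WSGI callable; the '/cluster' route is illustrative only."""
    path = environ.get("PATH_INFO", "/")
    method = environ.get("REQUEST_METHOD", "GET")
    if path == "/cluster" and method == "GET":
        status = "200 OK"
        body = json.dumps(CLUSTER_STATE).encode("utf-8")
    else:
        status = "404 Not Found"
        body = b'{"error": "not found"}'
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def call(app, path, method="GET"):
    """Invoke a WSGI app in-process, the way a test (or 'wget') would."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the mandatory WSGI keys
    environ["PATH_INFO"] = path
    environ["REQUEST_METHOD"] = method
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

Because a WSGI app is just a callable, it can be exercised in-process as above, or served by any WSGI server and queried with a browser or 'wget'.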
-
- 05 Oct, 2015 3 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 02 Oct, 2015 2 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
Before, it was only done for 'logfile'.
-
- 01 Oct, 2015 1 commit
-
-
Julien Muchembled authored
- Review error handling. Only 2 exceptions remain in connector.py:
  - Drop useless exception handling for EAGAIN since it should not happen if the kernel says the socket is ready.
  - Do not distinguish other socket errors. Just close and log in a generic way.
  - No need to raise a specific exception for EOF.
  - Make 'connect' return a boolean instead of raising an exception.
  - Raise appropriate exception when answer/ask/notify is called on a closed non-MT connection.
- Add support for more complex connectors, which may need to write for a read operation, or to read when there's pending data to send. This will be required for SSL support (more exactly, the handshake will be done in a transparent way):
  - Move write buffer to connector.
  - Make 'receive' fill the read buffer, instead of returning the read data.
  - Make 'receive' & 'send' return a boolean to switch polling for writing.
  - Tolerate that sockets return 0 as number of bytes sent.
- In testConnection, simply delete all failing tests, as announced in commit 71e30fb9.
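The connector contract described above can be sketched roughly as follows. This is an assumption-laden illustration, not the actual connector.py: the class name SketchConnector and its queue() method are hypothetical, and only the ideas come from the commit message (the write buffer lives in the connector, 'receive' fills a read buffer, 'send' and 'receive' return a boolean telling the event loop whether to poll for writing, and any socket error or EOF simply closes the connector).

```python
import socket


class SketchConnector:
    """Hypothetical connector that owns its write buffer.

    'send' and 'receive' return True when the event loop should keep
    polling the socket for writing, False otherwise.
    """

    def __init__(self, sock):
        sock.setblocking(False)
        self.sock = sock
        self.queued = bytearray()  # write buffer, owned by the connector
        self.closed = False

    def queue(self, data):
        self.queued += data
        return True  # pending data: ask to be polled for writing

    def close(self):
        self.closed = True
        self.sock.close()

    def send(self):
        try:
            n = self.sock.send(self.queued)
        except OSError:
            # No distinction between socket errors: just close.
            self.close()
            return False
        del self.queued[:n]  # n == 0 is tolerated, nothing is dropped
        return bool(self.queued)

    def receive(self, read_buf):
        try:
            data = self.sock.recv(65536)
        except OSError:
            self.close()
            return False
        if not data:  # EOF: no dedicated exception, just close
            self.close()
            return False
        read_buf.extend(data)  # fill the caller's buffer, return no data
        return bool(self.queued)
```

Returning a boolean rather than the data leaves room for connectors that must write in order to complete a read (as during an SSL handshake), since the caller only learns whether write polling is needed.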
-
- 30 Sep, 2015 1 commit
-
-
Julien Muchembled authored
-
- 24 Sep, 2015 4 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
Application will hold SSL parameters.
-
Julien Muchembled authored
-
- 23 Sep, 2015 3 commits
-
-
Julien Muchembled authored
This frees a reference to the last handler and there's no need to make the instance reusable.
-
Julien Muchembled authored
This follows the behaviour of FileStorage.
-
Julien Muchembled authored
There remains only one leak, in ClientApplicationTests.test_connectToPrimaryNode, because of Mock objects.
-
- 15 Sep, 2015 8 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
admin: do not reset the list of known masters from configuration (or command line) when reconnecting

This is questionable, but a lot of NodeManager code would have to be reviewed if we wanted to do it differently. At least, admin nodes now behave like clients.
-
Julien Muchembled authored
-
Julien Muchembled authored
For a long time now, the polling thread has never ended and has not needed to be restarted. On the other hand, the admin will need to define a different polling loop, hence the move from threaded_poll to threaded_app.
-
Julien Muchembled authored
-
Julien Muchembled authored
It was still used to stop a cluster.
-
Julien Muchembled authored
This makes tests easier to write, with more determinism. If only I had had the idea to monkey-patch SimpleQueue several years ago.
-
Julien Muchembled authored
The previous implementation was built around a 'pending' global variable that was set by a few monkey-patches when some network activity was pending between nodes. All this is replaced by an extra epoll object that is used to wait for nodes that have pending network events: this is simpler, and faster since it significantly reduces the number of context switches.
-
- 14 Sep, 2015 1 commit
-
-
Julien Muchembled authored
-
- 07 Sep, 2015 1 commit
-
-
Julien Muchembled authored
This is a regression caused by commit eef52c27 ("Tickless poll loop, for lowest latency and cpu usage"), affecting:
- admins
- storages
- primary masters of backup clusters
-
- 28 Aug, 2015 6 commits
-
-
Julien Muchembled authored
Recent Python already catches exceptions due to garbage collection on exit.
-
Julien Muchembled authored
This fixes a random failure in testClientReconnection:

Traceback (most recent call last):
  File "neo/tests/threaded/test.py", line 754, in testClientReconnection
    self.assertTrue(cluster.client.history(x1._p_oid))
failureException: None is not true
-
Julien Muchembled authored
Traceback (most recent call last):
  File "neo/tests/threaded/test.py", line 838, in testRecycledClientUUID
    x = client.load(ZERO_TID)
  [...]
  File "neo/tests/threaded/test.py", line 822, in notReady
    m2s.remove(delayNotifyInformation)
  File "neo/tests/threaded/__init__.py", line 482, in remove
    del self.filter_dict[filter]
KeyError: <function delayNotifyInformation at 0x7f511063a578>
-
Julien Muchembled authored
NEOCluster.tic() gets a new 'slave' parameter that must be True when a client node is in 'master' mode (i.e. setPoll(True)). In this case, tic() will wait until all nodes finish their work and the client polls with a non-zero timeout.

Here, tic(slave=1) is used to wait for the storage to process the NotifyUnlockInformation notification from the master.

Traceback (most recent call last):
  File "neo/tests/threaded/test.py", line 80, in testBasicStore
    self.assertEqual(data_info, cluster.storage.getDataLockInfo())
  File "neo/tests/__init__.py", line 170, in assertEqual
    return super(NeoTestBase, self).assertEqual(first, second, msg=msg)
failureException: {('\x0b\xee\xc7\xb5\xea?\x0f\xdb\xc9]\r\xd4\x7f<[\xc2u\xda\x8a3', 0): 0} != {('\x0b\xee\xc7\xb5\xea?\x0f\xdb\xc9]\r\xd4\x7f<[\xc2u\xda\x8a3', 0): 1}
-
Julien Muchembled authored
All these changes were useful to debug deadlocks in threaded tests:
- New verbose Semaphore.
- Logs with numerical 'ident' were too annoying to read so revert to thread name (before commit 5b69d553), with an exception for threaded tests. There remains one case where the result is not unique: when several client apps are instantiated.
- Make deadlock detection optional.
- Make it possible to name locks.
- Make output more compact.
- Remove useless 'debug_lock' option.
- Add timing information.
- Make exception more verbose when an un-acquired lock is released.

Here is how I used 'locking':

--- a/neo/tests/threaded/__init__.py
+++ b/neo/tests/threaded/__init__.py
@@ -37,0 +38 @@
+from neo.lib.locking import VerboseSemaphore
@@ -71 +72,2 @@ def init(cls):
-        cls._global_lock = threading.Semaphore(0)
+        cls._global_lock = VerboseSemaphore(0, check_owner=False,
+            name="Serialized._global_lock")
@@ -265 +267,2 @@ def start(self):
-        self.em._lock = l = threading.Semaphore(0)
+        self.em._lock = l = VerboseSemaphore(0, check_owner=False,
+            name=self.node_name)
@@ -346 +349,2 @@ def __init__(self, master_nodes, name, **kw):
-        self.em._lock = threading.Semaphore(0)
+        self.em._lock = VerboseSemaphore(0, check_owner=False,
+            name=repr(self))
-
Julien Muchembled authored
Deadlocks mainly happened while stopping a cluster, hence the complete review of NEOCluster.stop().

A major change is to make the client node handle its lock like other nodes (i.e. in the polling thread itself) to better know when to call Serialized.background() (there was a race condition with the test of 'self.poll_thread.isAlive()' in ClientApplication.close).
-
- 14 Aug, 2015 2 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
For example, a backup storage node that was rejected because the upstream cluster was not ready could reconnect in a loop without delay, using 100% CPU and flooding logs. A new 'setReconnectionNoDelay' method on Connection can be used for cases where it's legitimate to quickly reconnect. With this new delayed reconnection, it's possible to remove the remaining time.sleep().
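As a rough sketch of the idea (not the actual implementation), delayed reconnection can be modelled as a "not before" timestamp that is pushed forward when a connection closes and cleared when a quick reconnection is legitimate. Everything below is hypothetical: the ReconnectionGate class, its method names and the 1-second default delay are assumptions made for the example; only the 'setReconnectionNoDelay' concept comes from the commit message.

```python
import time


class ReconnectionGate:
    """Hypothetical helper tracking when a reconnection may be attempted."""

    def __init__(self, delay=1.0, clock=time.monotonic):
        self.delay = delay          # assumed delay, in seconds
        self.clock = clock          # injectable for deterministic tests
        self._not_before = float("-inf")

    def connection_closed(self):
        # Next attempt is allowed only after 'delay' seconds.
        self._not_before = self.clock() + self.delay

    def set_no_delay(self):
        # Counterpart of the 'setReconnectionNoDelay' idea: allow an
        # immediate retry when it is legitimate to reconnect quickly.
        self._not_before = float("-inf")

    def time_until_retry(self):
        # 0.0 means "reconnect now"; otherwise seconds left to wait.
        return max(0.0, self._not_before - self.clock())
```

An event loop can use the returned value as a timeout instead of calling time.sleep(), which keeps the thread responsive to other events while a reconnection is pending.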
-
- 12 Aug, 2015 7 commits
-
-
Julien Muchembled authored
This kind of test has never helped to detect regressions and any bug in EpollEventManager would be quickly reported by other tests. testConnection may go the same way if it keeps annoying me too much.
-
Julien Muchembled authored
This is currently not an issue because the 'time.sleep(1)' calls in iterateForObject (storage) and _connectToPrimaryNode (master) leave enough time. What could happen is a new connection attempt for a node that already has a connection (causing an assertion failure in Node.setConnection).
-
Julien Muchembled authored
This could happen if a file descriptor was reallocated by the kernel.
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
With this patch, the epoll object is no longer woken up every second to check whether a timeout has expired. The API of Connection is changed so that the smallest timeout can be retrieved.
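The principle of a tickless loop can be sketched as follows: instead of polling with a fixed 1-second timeout, the loop asks for the earliest pending deadline and sleeps exactly that long. The function name next_poll_timeout and the representation of deadlines as plain numbers (or None when a connection has no timeout) are assumptions made for this example, not the actual API of Connection.

```python
def next_poll_timeout(deadlines, now):
    """Return how long the poll call may block.

    deadlines: iterable of absolute times (same clock as 'now'), or None
               for connections without any pending timeout.
    Returns None when nothing can expire (block until network activity),
    otherwise the number of seconds until the earliest deadline.
    """
    pending = [d for d in deadlines if d is not None]
    if not pending:
        return None
    # Never return a negative value: an already expired deadline means
    # the loop must not block at all.
    return max(0.0, min(pending) - now)
```

With this approach an idle process is not woken up at all, and an expiring timeout is handled as soon as it is due rather than up to one second late.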
-
Julien Muchembled authored
-