neoppod:f5f42522a9b4ceb54bd80d34e345df2b9993001a commits
https://lab.nexedi.com/nexedi/neoppod/-/commits/f5f42522a9b4ceb54bd80d34e345df2b9993001a
2015-09-15T16:48:55+02:00

https://lab.nexedi.com/nexedi/neoppod/-/commit/f5f42522a9b4ceb54bd80d34e345df2b9993001a
Move code from neo.client to neo.lib, since admins will be also multi-threaded
2015-09-15T16:48:55+02:00
Julien Muchembled <jm@nexedi.com>

https://lab.nexedi.com/nexedi/neoppod/-/commit/50d25d007dc49edc9d2c11d71e9c2f9d4c7eda9a
Drop 'background' mode completely in threaded tests
2015-09-15T16:41:09+02:00
Julien Muchembled <jm@nexedi.com>
It was still used to stop a cluster.

https://lab.nexedi.com/nexedi/neoppod/-/commit/4253d24fe0c01b5ae66c87b6dcf709cbe1506573
Stop using 'background' mode in threaded tests
2015-09-15T16:37:44+02:00
Julien Muchembled <jm@nexedi.com>
This makes tests easier to write, with more determinism.
If only I had the idea to monkey-patch SimpleQueue several years ago.

https://lab.nexedi.com/nexedi/neoppod/-/commit/7025db52513639f881e5996c8a87850cdc4c3fa5
Rewrite of scheduler for threaded tests
2015-09-15T15:53:38+02:00
Julien Muchembled <jm@nexedi.com>
The previous implementation was built around a 'pending' global variable that
was set by a few monkey-patches when some network activity was pending between
nodes. All this is replaced by an extra epoll object that is used to wait for nodes
that have pending network events: this is simpler, and faster since it
significantly reduces the number of context switches.

https://lab.nexedi.com/nexedi/neoppod/-/commit/610093411e2c24d68c40f5b2696660000754f407
Thread.isAlive is deprecated
2015-09-14T18:04:47+02:00
Julien Muchembled <jm@nexedi.com>

https://lab.nexedi.com/nexedi/neoppod/-/commit/af06676a2404e688f7e4f30f3b593d083aab6057
Fix potential deadlock when connecting to primary master
2015-09-07T11:30:05+02:00
Julien Muchembled <jm@nexedi.com>
This is a regression caused by commit eef52c27
("Tickless poll loop, for lowest latency and cpu usage"), affecting:
- admins
- storages
- primary masters of backup clusters

https://lab.nexedi.com/nexedi/neoppod/-/commit/9531c9cb70e5f86747fd980822995bb3a2a526dd
client: drop now useless wrapper to log safely in poll thread during shutdown
2015-08-28T20:52:09+02:00
Julien Muchembled <jm@nexedi.com>
Recent Python already catches exceptions due to garbage collection on exit.

https://lab.nexedi.com/nexedi/neoppod/-/commit/e27358d130e1222d5244d9bb686a769ed22a979d
storage: fix history() not waiting oid to be unlocked
2015-08-28T20:52:09+02:00
Julien Muchembled <jm@nexedi.com>
This fixes a random failure in testClientReconnection:
Traceback (most recent call last):
  File "neo/tests/threaded/test.py", line 754, in testClientReconnection
    self.assertTrue(cluster.client.history(x1._p_oid))
failureException: None is not true

https://lab.nexedi.com/nexedi/neoppod/-/commit/79be7787add43ed9cd2d100cf87586466cb0cf6e
Fix random failure in testRecycledClientUUID
2015-08-28T20:52:09+02:00
Julien Muchembled <jm@nexedi.com>
Traceback (most recent call last):
  File "neo/tests/threaded/test.py", line 838, in testRecycledClientUUID
    x = client.load(ZERO_TID)
  [...]
  File "neo/tests/threaded/test.py", line 822, in notReady
    m2s.remove(delayNotifyInformation)
  File "neo/tests/threaded/__init__.py", line 482, in remove
    del self.filter_dict[filter]
KeyError: <function delayNotifyInformation at 0x7f511063a578>

https://lab.nexedi.com/nexedi/neoppod/-/commit/c4ac45a8e8059c2af887742863c9c926139bca0b
Fix several random failures in tests that didn't wait for transaction to be u...
2015-08-28T20:52:09+02:00
Julien Muchembled <jm@nexedi.com>
NEOCluster.tic() gets a new 'slave' parameter that must be True when a client
node is in 'master' mode (i.e. setPoll(True)). In this case, tic() waits until
all nodes have finished their work and the client polls with a non-zero timeout.
Here, tic(slave=1) is used to wait for the storage to process the
NotifyUnlockInformation notification from the master.
Traceback (most recent call last):
  File "neo/tests/threaded/test.py", line 80, in testBasicStore
    self.assertEqual(data_info, cluster.storage.getDataLockInfo())
  File "neo/tests/__init__.py", line 170, in assertEqual
    return super(NeoTestBase, self).assertEqual(first, second, msg=msg)
failureException: {('\x0b\xee\xc7\xb5\xea?\x0f\xdb\xc9]\r\xd4\x7f<[\xc2u\xda\x8a3', 0): 0} != {('\x0b\xee\xc7\xb5\xea?\x0f\xdb\xc9]\r\xd4\x7f<[\xc2u\xda\x8a3', 0): 1}

https://lab.nexedi.com/nexedi/neoppod/-/commit/5dc1f06cc6f20c547e3bcd8c8d49f4832a5042fd
Several improvements to verbose locks
2015-08-28T20:51:55+02:00
Julien Muchembled <jm@nexedi.com>
All these changes were useful to debug deadlocks in threaded tests:
- New verbose Semaphore.
- Logs with numerical 'ident' were too annoying to read so revert to thread
  name (before commit 5b69d553), with an exception for threaded tests. There
  remains one case where the result is not unique: when several client apps
  are instantiated.
- Make deadlock detection optional.
- Make it possible to name locks.
- Make output more compact.
- Remove useless 'debug_lock' option.
- Add timing information.
- Make exception more verbose when an un-acquired lock is released.
Here is how I used 'locking':
--- a/neo/tests/threaded/__init__.py
+++ b/neo/tests/threaded/__init__.py
@@ -37,0 +38 @@
+from neo.lib.locking import VerboseSemaphore
@@ -71 +72,2 @@ def init(cls):
- cls._global_lock = threading.Semaphore(0)
+ cls._global_lock = VerboseSemaphore(0, check_owner=False,
+ name="Serialized._global_lock")
@@ -265 +267,2 @@ def start(self):
- self.em._lock = l = threading.Semaphore(0)
+ self.em._lock = l = VerboseSemaphore(0, check_owner=False,
+ name=self.node_name)
@@ -346 +349,2 @@ def __init__(self, master_nodes, name, **kw):
- self.em._lock = threading.Semaphore(0)
+ self.em._lock = VerboseSemaphore(0, check_owner=False,
+ name=repr(self))

https://lab.nexedi.com/nexedi/neoppod/-/commit/0b93b1fb4f8418fc898a6660933daad1b01a1246
Fix occasional deadlocks in threaded tests
2015-08-28T20:13:52+02:00
Julien Muchembled <jm@nexedi.com>
Deadlocks mainly happened while stopping a cluster, hence the complete review
of NEOCluster.stop().
A major change is to make the client node handle its lock like other nodes
(i.e. in the polling thread itself) to better know when to call
Serialized.background() (there was a race condition with the test of
'self.poll_thread.isAlive()' in ClientApplication.close).

https://lab.nexedi.com/nexedi/neoppod/-/commit/1ab594b412834706e502250493e0de9cedce64af
Remove useless assert in a private method of MTClientConnection
2015-08-14T12:01:48+02:00
Julien Muchembled <jm@nexedi.com>

https://lab.nexedi.com/nexedi/neoppod/-/commit/d898a83d51e6d7701c14fb5cb6bb33b4869ad8a7
Do not reconnect too quickly to a node after an error
2015-08-14T12:01:16+02:00
Julien Muchembled <jm@nexedi.com>
For example, a backup storage node that was rejected because the upstream
cluster was not ready could reconnect in a loop without delay, using 100% CPU
and flooding logs.
A new 'setReconnectionNoDelay' method on Connection can be used for cases where
it's legitimate to quickly reconnect.
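
As a rough illustration of the idea described above (not NEO's actual code), a
delayed reconnection can be modelled by remembering, per address, the earliest
time at which a new attempt is allowed. The class name, method names and the
1-second delay below are hypothetical; only 'setReconnectionNoDelay' is named
in this commit message, and set_no_delay() merely mimics its intent.

```python
import time

RECONNECT_DELAY = 1.0  # seconds; hypothetical value chosen for illustration


class ReconnectThrottle:
    """Hypothetical helper: remembers when each address may be retried."""

    def __init__(self, delay=RECONNECT_DELAY, clock=time.monotonic):
        self._delay = delay
        self._clock = clock
        self._next_allowed = {}  # address -> earliest time of next attempt

    def connection_failed(self, addr):
        # After an error, forbid a new attempt until the delay has elapsed.
        self._next_allowed[addr] = self._clock() + self._delay

    def set_no_delay(self, addr):
        # For cases where reconnecting immediately is legitimate.
        self._next_allowed.pop(addr, None)

    def time_to_wait(self, addr):
        # Remaining time before reconnecting; 0.0 means "connect now".
        return max(0.0, self._next_allowed.get(addr, 0.0) - self._clock())
```

A poll loop could use time_to_wait() as (part of) its timeout instead of
sleeping, which is one way to avoid both busy reconnection and blocking sleeps.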
With this new delayed reconnection, it's possible to remove the remaining
time.sleep().

https://lab.nexedi.com/nexedi/neoppod/-/commit/71e30fb9b8941b200dc1768d59a58e73fe5b354f
Remove useless testEvent
2015-08-12T19:18:46+02:00
Julien Muchembled <jm@nexedi.com>
This kind of test has never helped to detect regressions, and any bug in
EpollEventManager would be quickly reported by other tests.
testConnection may go the same way if it keeps annoying me too much.

https://lab.nexedi.com/nexedi/neoppod/-/commit/f9df31be57e13a47f49448a26e784594fe09261f
client: do not wait for the remote to close the connection if it's not ready
2015-08-12T19:18:46+02:00
Julien Muchembled <jm@nexedi.com>
This is currently not an issue because the 'time.sleep(1)' in iterateForObject
(storage) and _connectToPrimaryNode (master) leave enough time. What could
happen is a new connection attempt for a node that already has a connection
(causing an assertion failure in Node.setConnection).

https://lab.nexedi.com/nexedi/neoppod/-/commit/a4731a0c7b8a0e8d938d114b9ca0151f3f21e98d
Fix invalid processing of unregistered connections
2015-08-12T19:18:46+02:00
Julien Muchembled <jm@nexedi.com>
This could happen if a file descriptor was reallocated by the kernel.

https://lab.nexedi.com/nexedi/neoppod/-/commit/ed50edca14d92fbb332594cb8f80ea25e533f028
Simplify API to establish connections and accept mix of IPv4/IPv6
2015-08-12T19:18:46+02:00
Julien Muchembled <jm@nexedi.com>

https://lab.nexedi.com/nexedi/neoppod/-/commit/c2c97752b0809a4bcf19be09467bcf30d3cc7e46
Rename parameter of polling methods now that _poll computes the timeout itself
2015-08-12T19:18:46+02:00
Julien Muchembled <jm@nexedi.com>

https://lab.nexedi.com/nexedi/neoppod/-/commit/eef52c27bc9955f8e68f0442089afb8fc03987f7
Tickless poll loop, for lowest latency and cpu usage
2015-08-12T19:18:46+02:00
Julien Muchembled <jm@nexedi.com>
With this patch, the epolling object is not awoken every second to check
if a timeout has expired. The API of Connection is changed to get the smallest
timeout.

https://lab.nexedi.com/nexedi/neoppod/-/commit/fd0b9c98384184a675dc45c1bfe33ab9782250bd
tests: make Patch usable as a context manager
2015-08-12T15:55:52+02:00
Julien Muchembled <jm@nexedi.com>

https://lab.nexedi.com/nexedi/neoppod/-/commit/91c663569c6b77bdff6cb82663173bb2e5cd84d7
Add file descriptor and aborted flag to __repr__ of connections
2015-08-12T15:55:52+02:00
Julien Muchembled <jm@nexedi.com>

https://lab.nexedi.com/nexedi/neoppod/-/commit/cb8a5a88bc9660e6eb4aaa1be5889e028f21df6c
client: replace Event by a pipe as a way to stop the poll loop
2015-08-12T15:55:52+02:00
Julien Muchembled <jm@nexedi.com>
This is a prerequisite for tickless poll loops.

https://lab.nexedi.com/nexedi/neoppod/-/commit/4a328ade971b8f6d773b7f48384a5a50905f683f
Fix 100% CPU usage when the closure of a connection is delayed
2015-08-12T15:55:46+02:00
Julien Muchembled <jm@nexedi.com>

https://lab.nexedi.com/nexedi/neoppod/-/commit/4e739de4f78549097c2b5f2a25759a04d55678b7
client: review connection locking (MTClientConnection)
2015-08-12T12:25:28+02:00
Julien Muchembled <jm@nexedi.com>
This mainly changes several methods to lock automatically instead of asserting
that the caller did it. This removes any overhead for non-MT classes, and
the use of 'with' instead of lock/unlock methods also simplifies the API.

https://lab.nexedi.com/nexedi/neoppod/-/commit/e438f86477f6c1e38484b596c37b80b45949ac5c
client: a simple lock is enough for the connection pool
2015-08-12T12:25:11+02:00
Julien Muchembled <jm@nexedi.com>

https://lab.nexedi.com/nexedi/neoppod/-/commit/c319b0658bb4850bd3807a8814ace3ae17226554
Remove useless socket shutdown on close
2015-08-12T12:25:11+02:00
Julien Muchembled <jm@nexedi.com>
shutdown is implicit because we don't duplicate sockets.

https://lab.nexedi.com/nexedi/neoppod/-/commit/19745e7c6c839f68d8eec8015c4a8d549990b30b
Small optimizations & cleanups
2015-08-12T12:25:06+02:00
Julien Muchembled <jm@nexedi.com>

https://lab.nexedi.com/nexedi/neoppod/-/commit/5b69d5531955191c78aebfc3898398dc6787dd6e
Better output of verbose locks
2015-08-12T12:23:27+02:00
Julien Muchembled <jm@nexedi.com>
- For all threads except the main one, the id is displayed instead of the name,
  because the latter is not always unique.
- Outputs may be interlaced by concurrent threads, so tracebacks are also
  prefixed by their idents.

https://lab.nexedi.com/nexedi/neoppod/-/commit/ede173f8801d67628c3894427a6dcf12a8ef2d20
Fix verbose locks when acquiring without blocking
2015-08-12T12:23:27+02:00
Julien Muchembled <jm@nexedi.com>

https://lab.nexedi.com/nexedi/neoppod/-/commit/52ed5aab05204110ca4454dd640f7afaab24e7ad
Add a neo/debug.py example to display tracebacks of threads
2015-07-28T20:26:52+02:00
Julien Muchembled <jm@nexedi.com>

https://lab.nexedi.com/nexedi/neoppod/-/commit/f4e656f634d0b2ef6b7ec06020be5b1231c6388b
Release version 1.4
2015-07-13T19:57:50+02:00
Julien Muchembled <jm@nexedi.com>

https://lab.nexedi.com/nexedi/neoppod/-/commit/167ad36ba5e958d9f4218f4e7df0c9039e09b798
Better handling of NotReady error
2015-07-13T14:00:47+02:00
Julien Muchembled <jm@nexedi.com>

https://lab.nexedi.com/nexedi/neoppod/-/commit/8ec873792aca98068233d4401fcc17cf2a1befdd
Some documentation cleanup
2015-07-10T17:01:36+02:00
Julien Muchembled <jm@nexedi.com>

https://lab.nexedi.com/nexedi/neoppod/-/commit/197054be2f1cf43a64c1b0b5fc5ba5222f8e4195
client: fix misleading exception message in case of mismatch checksum
2015-07-09T21:03:02+02:00
Julien Muchembled <jm@nexedi.com>

https://lab.nexedi.com/nexedi/neoppod/-/commit/9e026d082842b3a9152f21c964b71c73b39577f9
Fix neo/debug.py example for clients
2015-07-03T18:09:57+02:00
Julien Muchembled <jm@nexedi.com>

https://lab.nexedi.com/nexedi/neoppod/-/commit/e03a836a31ee673631a97d403ea9e73e725146dd
client: prevent RTMIN+3 from connecting to master if not connected yet
2015-07-03T11:52:30+02:00
Julien Muchembled <jm@nexedi.com>

https://lab.nexedi.com/nexedi/neoppod/-/commit/c324955d72edd7839724ad676565f540761d1d91
client: fix "signal only works in main thread" when adding a ZODB Mount Point...
2015-07-03T11:50:52+02:00
Julien Muchembled <jm@nexedi.com>

https://lab.nexedi.com/nexedi/neoppod/-/commit/79fca358d3a269415b38e4e528604335c089d667
Update changelog
2015-07-01T12:25:49+02:00
Julien Muchembled <jm@nexedi.com>

https://lab.nexedi.com/nexedi/neoppod/-/commit/02a5b4e3ea012eb78315edc7de20106c03da279b
Add upgrade notes about MySQL/SQLite schema changes since NEO 1.3
2015-06-30T17:20:53+02:00
Julien Muchembled <jm@nexedi.com>