Sort cell list after randomising it.
There are 2 objectives:
- Prevent randomly trying to connect to an unresponsive storage node, which impairs performance a lot. Note that this happens only when the master didn't notice the disconnection, so the node is still in running state in the node manager.
- Increase connection reuse, saving the cost of establishing a new connection and of a slot in the connection pool.
Randomisation should be kept to even out storage node use.
git-svn-id: https://svn.erp5.org/repos/neo/trunk@2173 71dcc9de-d417-0410-9af5-da40c76e7ee4
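The commit title can be read as "shuffle, then stable-sort by preference". Below is a minimal sketch of that idea. It assumes, hypothetically, that cells are reduced to plain node ids and that a set named connected stands in for the pool's open connections. Neither is NEO's actual data structure.

```python
import random

# Minimal sketch of "sort the cell list after randomising it".
# Hypothetical stand-ins: cells are plain node ids, and 'connected' is the
# set of ids we already hold a connection to (not the NEO data structures).


def order_cells(cells, connected, rng=random):
    """Return cells with connected nodes first, in random order within each group."""
    cells = list(cells)
    rng.shuffle(cells)  # randomise first, to even out storage node use
    # list.sort is stable, so the shuffled order survives inside each group.
    # The key is False (sorts first) for connected nodes, True for the others.
    cells.sort(key=lambda uuid: uuid not in connected)
    return cells


if __name__ == "__main__":
    print(order_cells(["S1", "S2", "S3", "S4"], connected={"S3"}))
```

Because Python's sort is stable, the shuffle is what randomises the order among equally preferred nodes. The sort only moves connected nodes to the front.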
-
Owner
That's not ideal. At startup, or after nodes come back, this prevents full load balancing until some data is written. And for some not-so-rare use cases, the imbalance may last forever.
That's actually what happened during an upgrade: a migration had to read a lot of data and I noticed that only 1 node was busy.
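A toy, read-only simulation shows how a connected-first ordering can leave only one node busy. It makes several hypothetical assumptions: every storage node holds every object, no node fails, and nothing is written, so no read ever has a reason to open a second connection. The key mirrors the lines removed in the diff that follows.

```python
import random
from collections import Counter

# Hypothetical simulation, not NEO code: every storage node holds a replica of
# every object, each read goes to the node with the lowest sort key, and the
# connection opened for that read is kept.


def connected_first_key(uuid, connected, rng):
    # Ordering of the removed lines: connected nodes get a key in [0, 1),
    # other healthy nodes a key in [1, 2).
    return rng.random() if uuid in connected else 1 + rng.random()


def simulate_reads(nodes, n_reads, seed=0):
    """Return how many reads each node served."""
    rng = random.Random(seed)
    connected = set()
    busy = Counter()
    for _ in range(n_reads):
        node = min(nodes, key=lambda uuid: connected_first_key(uuid, connected, rng))
        connected.add(node)  # reading from a node leaves a connection open
        busy[node] += 1
    return busy


if __name__ == "__main__":
    print(simulate_reads(["S1", "S2", "S3"], 1000))
```

Only the first read is randomised. After it, the chosen node is the only one with a key below 1, so it serves every remaining read.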
Since this commit, the code has changed a lot, but the idea of distinguishing connected nodes from other good nodes was kept. I am considering dropping it. The code would become:
--- a/neo/client/pool.py
+++ b/neo/client/pool.py
@@ -62,19 +62,16 @@ def _initNodeConnection(self, node):
     def getCellSortKey(self, cell, random=random.random):
         # The use of 'random' suffles cells to randomise node to access.
         uuid = cell.getUUID()
-        # First, prefer a connected node.
-        if uuid in self.connection_dict:
-            return random()
-        # Then one that didn't fail recently.
+        # Prefer a node that didn't fail recently.
         failure = self.node_failure_dict.get(uuid)
         if failure:
             if time.time() < failure:
-                # At last, order by date of connection failure.
+                # Or order by date of connection failure.
                 return failure
             # Do not use 'del' statement: we didn't lock, so another
             # thread might have removed uuid from node_failure_dict.
             self.node_failure_dict.pop(uuid, None)
-        return 1 + random()
+        return random()
 
     def getConnForNode(self, node):
         """Return a locked connection object to a given node
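For experimenting with the proposed ordering outside the pool class, here is a standalone sketch of the new getCellSortKey. The function name cell_sort_key and the explicit node_failure_dict parameter are hypothetical. The body follows the diff, and failure values are treated as deadlines, as the comparison time.time() < failure implies.

```python
import random
import time

# Standalone sketch of the proposed getCellSortKey, lifted out of the pool
# class. 'node_failure_dict' maps a node uuid to a time.time() deadline until
# which the node counts as recently failed. As in the diff, the 'random'
# parameter defaults to random.random.


def cell_sort_key(uuid, node_failure_dict, random=random.random):
    failure = node_failure_dict.get(uuid)
    if failure:
        if time.time() < failure:
            # Recently failed: the key is the deadline, a timestamp far above 1,
            # so such nodes sort after every healthy node, earliest deadline first.
            return failure
        # Deadline passed: forget the failure (pop, not del, as in the diff).
        node_failure_dict.pop(uuid, None)
    # Healthy node, connected or not: a plain random key in [0, 1).
    return random()


if __name__ == "__main__":
    failures = {"S2": time.time() + 60}  # hypothetical: S2 failed, retry in 60 s
    cells = ["S1", "S2", "S3"]
    print(sorted(cells, key=lambda uuid: cell_sort_key(uuid, failures)))
```

Recently failed nodes always sort last. Among healthy nodes the order is now purely random, whether a connection exists or not.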
-
Owner
A compromise could be to overlap the 2 cases, e.g. random() vs 0.5 + random(), but I doubt it's worth the complexity.
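To get a feel for how strongly the overlap would still favour connected nodes, consider one connected node and one unconnected node. Their keys are U and 0.5 + V, where U and V are independent and uniform on [0, 1). The unconnected node wins only when U - V > 0.5, which is a triangle of area 1/8 in the unit square, so it serves about 12.5% of such reads. Here is a Monte Carlo sketch under the same assumptions (the helper names are hypothetical):

```python
import random

# Sketch of the "overlap the 2 cases" compromise: connected nodes get a key in
# [0, 1), other healthy nodes a key in [0.5, 1.5), so an unconnected node can
# still win now and then. Function names are hypothetical.


def overlapping_key(uuid, connected, rng):
    return rng.random() if uuid in connected else 0.5 + rng.random()


def unconnected_win_rate(trials=200_000, seed=0):
    """Estimate how often the unconnected node S2 beats the connected node S1."""
    rng = random.Random(seed)
    connected = {"S1"}
    wins = 0
    for _ in range(trials):
        chosen = min(("S1", "S2"), key=lambda u: overlapping_key(u, connected, rng))
        wins += chosen == "S2"
    return wins / trials


if __name__ == "__main__":
    print(unconnected_win_rate())  # close to the exact value 1/8
```

Changing the 0.5 offset tunes the trade-off. An offset of 0 gives pure randomisation, and an offset of 1 or more restores strict preference for connected nodes.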
Owner
It is related to the original idea of limiting the number of connections kept open, yes.
About having to write for actual balancing to happen, what about ordering cells when there is no connection, and then continuing with this approach?
Or, if startup time does not suffer too much from it, what about establishing all connections from the beginning?
-
Owner
"About having to write for actual balancing to happen, what about ordering cells when there is no connection, and then continuing with this approach?"
I don't understand.
Anyway, absent any objection to the fact that we no longer limit the number of connections, my original suggestion is the simplest, so I'll do that.
-
Owner
Here was my thought process:
The way I understand "we have to write" affecting node selection for reads is that until a write happens, we will not have connections to all nodes.
The way I understand "not having connections to all nodes" affecting node selection is that we prefer nodes we are already connected with.
The way I understand "nodes we are already connected with" being a load balancing issue is that there is no randomisation before the first connection is established (so all first connections end up on the same node).
So it looks like randomisation is not applied where it should be, i.e. whenever there is no clearly outstanding node, which includes the case where no connection exists at all.
Am I missing or misunderstanding something? I did not re-read the code beyond the above diff, nor look at the most recent implementation.
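The third step can be checked against the ordering removed in the diff above. In that ordering, a connected node gets random() and any other healthy node gets 1 + random(). The minimal sketch below uses node ids and a set named connected as hypothetical stand-ins for cells and connection_dict. It counts which node wins the sort, first with no connection and then with one connection.

```python
import random
from collections import Counter

# Hypothetical check of the key removed in the diff: connected -> random(),
# other healthy nodes -> 1 + random(). Node ids and the 'connected' set stand
# in for NEO's cells and connection_dict.


def removed_key(uuid, connected, rng):
    return rng.random() if uuid in connected else 1 + rng.random()


def first_pick_counts(nodes, connected, trials=30_000, seed=0):
    """Count which node sorts first, over many independent draws."""
    rng = random.Random(seed)
    picks = Counter()
    for _ in range(trials):
        picks[min(nodes, key=lambda u: removed_key(u, connected, rng))] += 1
    return picks


if __name__ == "__main__":
    nodes = ["S1", "S2", "S3"]
    print(first_pick_counts(nodes, connected=set()))   # spread over all nodes
    print(first_pick_counts(nodes, connected={"S2"}))  # always S2
```

With no connection, every healthy node gets a key in [1, 2), so the pick is spread over all nodes. Once one connection exists, its key in [0, 1) always wins.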