WIP: Teach NEO/go to handle multiple master nodes
Hello Kirill,
in the process of migrating to WC2 on our WWM/wind clone instance, I realized that it's currently not yet possible to use a NEO/go client with a NEO server which has more than one master node.
There are multiple reasons why this isn't possible yet.
Not all of them are related to NEO/go: I also made a mistake in nexedi/slapos@0cf70a6e when re-working the NEO URL formatting for WCFS (I forgot to split the master nodes IPv6 from each other with a comma ,
).
On the NEO/go side - as far as I can see - we need to finish the following steps:
-
We need to fix the deadlock when a node connects to another node and this other node sends a packet with a different connection id (which is documented as to-fix in the source code). This need to be fixed here, because when we have multiple masters and we try to connect to a secondary master, this secondary master sends us a
NotPrimaryMaster
packet, which is not an answer ofRequestIdentification
and therefore has a different msg id and therefore results in a deadlock.Dial
should instead return an error, so thatTalkMaster
could try a different node. -
We need to finish the TODO in
TalkMaster
: if our initially tried master node is not the primary master, we should try the other nodes (as it's stated in the protocol definition). -
If I'm not mistaken, this also means we need to change the
MasterAddr
attribute ofNode
type to something likeMasterAddrArray
(and do the same in related code for instanceNewMasteredNode
). -
Finally we also need to split the host part of a NEO URI by
,
toMasterAddrArray
as we do it in NEO/py.
I'm fine with working on all steps. I want to open this MR to discuss what's the best way how this should be done. I hope (2), (3) and (4) shouldn't be too difficult. But (1) seems to be more complicated.
Of course, the easiest solution would be if any answer to a request would have the same connection id, I guess this would make most sense for the NEO/go connection based model. I also guess this won't happen. So let's find a different way.
You already sketched a possible solution with
link.CloseAccept()
link.Ask1(reqID, accept)
link.Listen()
where I assume that Listen
is meant to be Accept
in current NEO/go/t?
When trying this I had the problem that the other nodes packet was dropped/not accepted in serveRecv
, then in serveRecv
s next iteration it catched an EOF
error which made the NodeLink
to be closed and therefore creates an error
in Conn.recvPkt
, which is finally propagated back to Expect
-> Ask1
-> Dial
.
Is there another case where we send a message to another node and the other node is expected to reply with a packet which is not the answer e.g. has a different message id (and is not a simple Error
or CloseClient
message)?
I can't think of any solution which stays in the isolated-connection-model as long as we can receive answers with a different connection id.
If this is an exception we could introduce something like Ask1RPC
.
In pseudo-code this could do something like this:
func Ask1RPC(link, request, ...response):
conn = link.newConn()
conn.sendMsgDirect(request)
return link.ExpectRPC(request, ...response)
func ExpectRPC(link, request, ...response):
loop forever:
for each connection in link:
if connection.hasNewPackage:
if connection.newPackage is any of response:
return connection.newPackage index in response
else: return error
e.g. Ask1RPC
could temporarily ignore the connection-isolation and process any incoming packet from any connection.
I think in case of an initial dialing this could be ok, because there aren't so many different packet we can expected to arrive from the dialed node.
What do you think about this Kirill, do you think this could be an acceptable trial, would you choose a different approach?
Best, Levin
/cc @kirr