Kirill Smelkov / neo / Merge requests / !2

Merged
Created Mar 29, 2023 by Levin Zimmermann (@levin.zimmermann)

WIP: Teach NEO/go to handle multiple master nodes


Hello Kirill,

while migrating to WC2 on our WWM/wind clone instance, I realized that it is currently not possible to use a NEO/go client with a NEO cluster that has more than one master node.

There are multiple reasons why this isn't possible yet, and not all of them are related to NEO/go: I also made a mistake in nexedi/slapos@0cf70a6e when reworking the NEO URL formatting for WCFS (I forgot to separate the master nodes' IPv6 addresses from each other with a comma).

On the NEO/go side, as far as I can see, we need to complete the following steps:

  1. We need to fix the deadlock that occurs when a node connects to another node and the other node replies with a packet on a different connection id (this is documented as a to-fix in the source code). This needs to be fixed here because, with multiple masters, connecting to a secondary master makes it send us a NotPrimaryMaster packet. That packet is not an answer to RequestIdentification, so it has a different msg id, and the result is a deadlock. Dial should instead return an error so that TalkMaster can try a different node.

  2. We need to finish the TODO in TalkMaster: if the master node we try first is not the primary master, we should try the other master nodes (as stated in the protocol definition).

  3. If I'm not mistaken, this also means we need to change the MasterAddr attribute of the Node type to something like MasterAddrArray (and do the same in related code, for instance NewMasteredNode).

  4. Finally, we also need to split the host part of a NEO URI by commas into MasterAddrArray, as we do in NEO/py.
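For steps (3) and (4), a minimal Go sketch of turning the host part into a list of master addresses, assuming the host part has already been extracted from the URI. The helper name splitMasterAddrs is hypothetical, not existing NEO/go API. Each entry is checked with net.SplitHostPort, which requires IPv6 hosts to be bracketed and also rejects addresses glued together without a comma, such as "[2001:db8::1]:5551[2001:db8::2]:5551".

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// splitMasterAddrs splits the host part of a NEO URI, e.g.
// "[2001:db8::1]:5551,[2001:db8::2]:5551", into one address per master node.
// Hypothetical helper: name and error messages are illustrative only.
func splitMasterAddrs(hostPart string) ([]string, error) {
	var addrs []string
	for _, a := range strings.Split(hostPart, ",") {
		a = strings.TrimSpace(a)
		if a == "" {
			return nil, fmt.Errorf("empty master address in %q", hostPart)
		}
		// Validate host:port syntax; IPv6 hosts must be in brackets.
		if _, _, err := net.SplitHostPort(a); err != nil {
			return nil, fmt.Errorf("invalid master address %q: %w", a, err)
		}
		addrs = append(addrs, a)
	}
	return addrs, nil
}

func main() {
	addrs, err := splitMasterAddrs("[2001:db8::1]:5551,[2001:db8::2]:5551")
	fmt.Println(addrs, err)
}
```

The resulting slice is what a MasterAddrArray-style field would hold.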

I'm happy to work on all of these steps. I'm opening this MR to discuss how best to do this. I expect (2), (3) and (4) to be fairly straightforward, but (1) seems more complicated.

Of course, the easiest solution would be for every answer to a request to carry the same connection id as the request; I guess this would fit NEO/go's connection-based model best. I also guess this won't happen, so let's find a different way.

You already sketched a possible solution with

link.CloseAccept()
link.Ask1(reqID, accept)
link.Listen()

where I assume that Listen is meant to be Accept in the current NEO/go/t?

When I tried this, the other node's packet was dropped (not accepted) in serveRecv. On serveRecv's next iteration it then caught an EOF error, which caused the NodeLink to be closed and therefore produced an error in Conn.recvPkt, which is finally propagated back through Expect -> Ask1 -> Dial.
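To make the failure mode concrete, here is a toy model of a link that routes incoming packets to connections by connection id and drops packets for unknown ids once accepting is closed. It is not NEO/go code: toyLink, deliver and acceptOK are made-up names. The reply meant for connection 1 arrives on connection 2 and is dropped, so anything waiting on connection 1 would block; in the test described above, the drop was then followed by the EOF that closed the NodeLink.

```go
package main

import "fmt"

// packet is a toy model of a NEO packet: connection id plus message name.
type packet struct {
	ConnID uint32
	Msg    string
}

// toyLink routes incoming packets to per-connection queues by connection id.
// Hypothetical model of the behaviour described above, not NEO/go code.
type toyLink struct {
	conns    map[uint32][]packet // pending packets for each open connection
	acceptOK bool                // accept packets that open a new connection id?
	dropped  []packet            // packets nobody was waiting for
}

// deliver routes one incoming packet.
func (l *toyLink) deliver(p packet) {
	if _, ok := l.conns[p.ConnID]; ok {
		l.conns[p.ConnID] = append(l.conns[p.ConnID], p)
		return
	}
	if l.acceptOK {
		// A new connection initiated by the peer; it would be handed to Accept.
		l.conns[p.ConnID] = []packet{p}
		return
	}
	// Accepting is closed, so a packet for an unknown connection id is dropped.
	l.dropped = append(l.dropped, p)
}

func main() {
	// We sent RequestIdentification on connection 1 and closed accepting.
	link := &toyLink{conns: map[uint32][]packet{1: {}}}
	// A secondary master answers NotPrimaryMaster on connection 2.
	link.deliver(packet{ConnID: 2, Msg: "NotPrimaryMaster"})
	fmt.Println(len(link.conns[1]), len(link.dropped))
}
```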

Is there another case where we send a message to another node and the other node is expected to reply with a packet that is not the answer, i.e. one that has a different message id (and is not a simple Error or CloseClient message)?

I can't think of a solution that stays within the isolated-connection model as long as we can receive answers with a different connection id. If this case is an exception, we could introduce something like Ask1RPC. In pseudo-code it could look like this:

func Ask1RPC(link, request, ...responses):
	conn = link.newConn()
	conn.sendMsgDirect(request)
	return link.ExpectRPC(request, ...responses)

func ExpectRPC(link, request, ...responses):
	loop forever:
		for each connection in link:
			if connection.hasNewPacket:
				if connection.newPacket is any of responses:
					return index of connection.newPacket in responses
				else:
					return error

i.e. Ask1RPC would temporarily ignore connection isolation and process any incoming packet from any connection. I think this could be acceptable for the initial dial, because only a few different packets can be expected to arrive from the dialed node.

What do you think, Kirill? Would this be an acceptable approach to try, or would you choose a different one?

Best, Levin

/cc @kirr

Source branch: t-with-multiple-master-nodes