Rendezvous protocol #144

Closed
acolytec3 opened this issue Apr 22, 2022 · 13 comments

@acolytec3
Contributor

acolytec3 commented Apr 22, 2022

One major roadblock to the Portal Network working at the desired scale on consumer and mobile devices is that most of these devices sit behind routers, firewalls, or NATs of various sorts and cannot be reached by inbound requests from other nodes. As a result, the network will only scale as far as the number of people running nodes on publicly accessible IP addresses (which is not most people). This is not really new: it affects Ethereum today, and any other p2p network aspiring to true scale.
Inspired by this work done in libp2p, I did some experimentation within Ultralight to see if we could implement something similar that would let nodes behind a firewall/NAT be reached via inbound connections. It still requires publicly accessible nodes to act as the relay (the "rendezvous", if you will, and yes, also stolen shamelessly from Protocol Labs/libp2p) to set up the initial connection, but it would at least give this conceptually much larger set of semi-unreachable nodes a chance to participate in the network, since they can find a much larger set of nodes than just the ones with public IP addresses.

My initial work on it is in this PR and the basic flow is as below.

    sequenceDiagram
        Requestor->>RendezvousNode: FIND TargetNodeId
        RendezvousNode->>TargetNode: Liveness check PING
        TargetNode->>RendezvousNode: PONG
        RendezvousNode->>Requestor: TargetNode ENR
        Requestor->>RendezvousNode: SYNC TargetNodeId
        RendezvousNode->>TargetNode: SYNC Requestor ENR
        Requestor->>TargetNode: DIRECT PortalNetwork PING
        TargetNode->>Requestor: DIRECT PortalNetwork PING
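
To make the ordering in the diagram concrete, here is a minimal Python sketch that only records the message trace for one rendezvous attempt. It is not Ultralight's implementation: `Node`, `rendezvous_flow`, the `target_alive` flag, and the string ENRs are hypothetical stand-ins, and no real networking or discv5 encoding happens.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    node_id: str
    enr: str  # placeholder string standing in for a real, signed ENR


def rendezvous_flow(requestor: Node, rendezvous: Node, target: Node,
                    target_alive: bool = True) -> List[str]:
    """Return the ordered message trace for one rendezvous attempt."""
    trace: List[str] = []

    def send(src: Node, dst: Node, msg: str) -> None:
        trace.append(f"{src.node_id}->{dst.node_id}: {msg}")

    # Requestor asks the rendezvous node for the target.
    send(requestor, rendezvous, f"FIND {target.node_id}")
    # Rendezvous node checks that the target is still alive.
    send(rendezvous, target, "PING")
    if not target_alive:
        return trace  # no PONG, so there is nothing useful to hand back
    send(target, rendezvous, "PONG")
    send(rendezvous, requestor, f"ENR {target.enr}")
    # Requestor asks for a SYNC; the rendezvous node forwards the requestor's ENR.
    send(requestor, rendezvous, f"SYNC {target.node_id}")
    send(rendezvous, target, f"SYNC {requestor.enr}")
    # Both sides now send directly, so each NAT sees an outbound packet to the peer.
    send(requestor, target, "PING")
    send(target, requestor, "PING")
    return trace


if __name__ == "__main__":
    a, r, t = Node("A", "enr:A"), Node("R", "enr:R"), Node("T", "enr:T")
    for line in rendezvous_flow(a, r, t):
        print(line)
```

Running it prints the eight messages in the same order as the diagram; with `target_alive=False` the trace stops after the unanswered liveness PING.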

I envision this as an extension of the recursive nodeLookup and contentLookup: when a node receives ENRs (either as a FOUNDCONTENT payload or in a NODES response) and then tries and fails to connect to one of them, it initiates the rendezvous process outlined above with the node that provided the ENRs, in order to establish a connection with the previously unreachable node.
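
A minimal sketch of that fallback, assuming the client exposes two hooks (the hypothetical `direct_ping` and `rendezvous` callables below). It only shows the control flow: try each ENR directly first, and only then ask the node that supplied it to act as the rendezvous.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical hooks supplied by the client:
#   direct_ping(enr) -> bool              : did a normal PING get a PONG?
#   rendezvous(provider_id, enr) -> bool  : did the FIND/SYNC flow succeed?
DirectPing = Callable[[str], bool]
Rendezvous = Callable[[str, str], bool]


def contact_enrs(provider_id: str, enrs: List[str], direct_ping: DirectPing,
                 rendezvous: Rendezvous) -> Dict[str, Optional[str]]:
    """For each ENR returned by `provider_id` (in NODES or FOUNDCONTENT),
    record how it was reached: "direct", "rendezvous", or None."""
    result: Dict[str, Optional[str]] = {}
    for enr in enrs:
        if direct_ping(enr):
            result[enr] = "direct"
        elif rendezvous(provider_id, enr):
            # The provider just handed us this ENR, so it is the natural
            # rendezvous node for reaching it.
            result[enr] = "rendezvous"
        else:
            result[enr] = None
    return result
```

Using the supplying node as the rendezvous point means no extra lookup is needed to find a relay.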

Some additional considerations here:

  1. We would need relatively frequent liveness checks to keep the UDP entries in firewall/NAT/router tables active (maybe every 30 seconds is a reasonable threshold).
  2. These sorts of connections would have to be reestablished each time a node behind NAT/firewall restarts.
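
The 30-second figure in the first point could drive a simple keepalive timer. This sketch assumes the client tracks the time of its last outbound packet per peer; `peers_due`, `keepalive_tick`, and `send_ping` are illustrative names, and 30 s is the suggestion from the list, not a measured NAT timeout.

```python
from typing import Callable, Dict, List

KEEPALIVE_INTERVAL = 30.0  # seconds; the figure suggested above, not a measured NAT timeout


def peers_due(last_sent: Dict[str, float], now: float,
              interval: float = KEEPALIVE_INTERVAL) -> List[str]:
    """Peers whose last outbound packet is at least `interval` seconds old."""
    return sorted(p for p, t in last_sent.items() if now - t >= interval)


def keepalive_tick(last_sent: Dict[str, float], now: float,
                   send_ping: Callable[[str], None],
                   interval: float = KEEPALIVE_INTERVAL) -> List[str]:
    """Ping every overdue peer and record the send time.

    Any outbound packet refreshes the NAT/firewall mapping for that peer,
    so the timestamp is updated as soon as the PING is sent.
    """
    due = peers_due(last_sent, now, interval)
    for peer in due:
        send_ping(peer)
        last_sent[peer] = now
    return due
```

A real client would call `keepalive_tick` from a periodic timer; NAT idle timeouts differ between devices, so the interval would likely need tuning.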

Otherwise, this feels like a relatively lightweight extension of our current wire spec that would allow a much larger number of devices to potentially participate in the network.

@pipermerriam
Member

One idea that @carver and I had was to have nodes "signal" their willingness to be rendezvous nodes via something like a key in their ENR.

The other structure we discussed was having nodes only be willing to do rendezvous for nodes that are in their routing table...

This has the appealing property of anytime you find a node that you want to connect to via a FINDNODES request, and the node serving the request has signaled that they are willing to be a rendezvous node, then you as the "requester" can have some confidence that you are already talking to a node that is able and willing to establish a connection to the target node.

The actual message flow would be similar to, or even the same as, what is diagrammed above.
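
Both ideas can be combined into a small policy check. This is a sketch under stated assumptions: the ENR is modelled as a plain dict of decoded key/value pairs, the key name `rdv` is invented for illustration (nothing in this thread names one), and the routing table is a set of node IDs.

```python
from typing import Dict, Set

# Hypothetical ENR key signalling "I am willing to act as a rendezvous node".
# No such key is standardized; "rdv" is made up for this sketch.
RENDEZVOUS_ENR_KEY = "rdv"


def advertises_rendezvous(enr_kv: Dict[str, bytes]) -> bool:
    """`enr_kv` stands in for the decoded key/value pairs of an ENR."""
    return RENDEZVOUS_ENR_KEY in enr_kv


def will_serve_rendezvous(own_enr: Dict[str, bytes], routing_table: Set[str],
                          target_id: str) -> bool:
    """Serving side: only help with targets already in our routing table,
    since a node we already track is one we should be able to reach."""
    return advertises_rendezvous(own_enr) and target_id in routing_table


def worth_asking(responder_enr: Dict[str, bytes], target_id: str,
                 nodes_response: Set[str]) -> bool:
    """Requester side: if the responder advertises the key and returned the
    target from its own table, it should be able to reach the target."""
    return advertises_rendezvous(responder_enr) and target_id in nodes_response
```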

@acolytec3
Contributor Author

> The other structure we discussed was having nodes only be willing to do rendezvous for nodes that are in their routing table...
>
> This has the appealing property of anytime you find a node that you want to connect to via a FINDNODES request, and the node serving the request has signaled that they are willing to be a rendezvous node, then you as the "requester" can have some confidence that you are already talking to a node that is able and willing to establish a connection to the target node.

This variant works nicely with the idea of only supporting rendezvous for ENRs you included in a NODES response, in the case where the node you sent them to discovers it can't reach one of the nodes from that response.
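
That variant needs the rendezvous node to remember which ENRs it handed to which requester. A minimal sketch, assuming an in-memory log with an arbitrary 60-second expiry; `ServedEnrLog` and its methods are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable


class ServedEnrLog:
    """Hypothetical bookkeeping for the "only for ENRs I handed out" variant:
    remember which node IDs were returned to which requester in NODES responses."""

    def __init__(self, ttl: float = 60.0):
        self.ttl = ttl  # illustrative window; nothing in the thread fixes a value
        self._served: Dict[str, Dict[str, float]] = defaultdict(dict)

    def record_nodes_response(self, requester_id: str,
                              node_ids: Iterable[str], now: float) -> None:
        for node_id in node_ids:
            self._served[requester_id][node_id] = now

    def may_rendezvous(self, requester_id: str, target_id: str, now: float) -> bool:
        """Accept a SYNC only for a target we recently sent to this requester."""
        sent_at = self._served.get(requester_id, {}).get(target_id)
        return sent_at is not None and now - sent_at <= self.ttl
```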

@carver
Contributor

carver commented May 6, 2022

> The other structure we discussed was having nodes only be willing to do rendezvous for nodes that are in their routing table...

Yes, I like this variant best. It's the best way to know that the rendezvous node can help you contact the relevant peer. Also, it means all peers are rendezvous nodes by default.

> 2. These sorts of connections would have to be reestablished each time a node behind NAT/firewall restarts.

It's not clear to me why this part would be true. The NAT router doesn't know that your process restarted.

@carver
Contributor

carver commented May 6, 2022

Also, I think there's a big benefit to having the TargetNode send some packet that does not initiate a connection. That way the direction-ness is maintained. It could just send a garbage packet. I don't think it matters; the point is just to trigger the NAT to open the port.

If we ping both ways, then the client logic for deciding whether the connection is truly outbound or inbound gets a lot more complex. I think simplicity here is worth the cost of possibly forcing the Requestor to send an extra PING or two to resolve the race condition.
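
The race can be illustrated with a toy model of an address-restricted NAT in front of the TargetNode. This is an assumption-heavy sketch (a single NAT, discrete ticks, and the punch packet taking effect before a PING sent in the same tick), meant only to show why the Requestor may need one or two retries while keeping its role as the initiator. `RestrictedNat` and `ping_attempts_needed` are hypothetical names.

```python
from typing import Optional, Set


class RestrictedNat:
    """Toy address-restricted NAT in front of the TargetNode: inbound packets
    from a remote address are delivered only after the inside host has sent
    something to that address."""

    def __init__(self) -> None:
        self.contacted: Set[str] = set()

    def outbound(self, remote: str) -> None:
        self.contacted.add(remote)

    def inbound_allowed(self, remote: str) -> bool:
        return remote in self.contacted


def ping_attempts_needed(punch_tick: int, max_pings: int = 3) -> Optional[int]:
    """The Requestor sends one PING per tick starting at tick 0; the TargetNode
    sends its throwaway "punch" packet at `punch_tick`. Return the 1-based PING
    that gets through the target's NAT, or None if every PING is dropped."""
    nat = RestrictedNat()
    for tick in range(max_pings):
        if tick == punch_tick:
            # Opens the mapping without starting a session, so the Requestor
            # remains the side that initiates the connection.
            nat.outbound("requestor")
        if nat.inbound_allowed("requestor"):
            return tick + 1
    return None
```

If the punch lands one tick late, the Requestor's second PING is the one that gets through, which is the "extra PING or two" trade-off described above.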

@pipermerriam
Member

I started to codify this into something resembling a spec over here; FYI, we might move discussion over that way: ethereum/devp2p#207

@acolytec3
Contributor Author

> Also, I think there's a big benefit to having the TargetNode send some packet that does not initiate a connection. That way the direction-ness is maintained. It could just send a garbage packet. I don't think it matters; the point is just to trigger the NAT to open the port.

Ah, I hadn't thought about that. Pinging both ways could definitely lead to additional missed handshakes if both sides send a WHOAREYOU/random packet at the same time. I may update my PoC with some of this and see how it goes. I haven't integrated my rendezvous code into any automated processes yet, so I'm not sure how well it will behave when I'm not triggering it manually.

@acolytec3
Contributor Author

> I started to codify this into something resembling a spec over here; FYI, we might move discussion over that way: ethereum/devp2p#207

Just so I don't ask ignorant questions in larger forums (😅), is your issue/PR on devp2p aimed at adding new message types to the discv5 wire protocol or within our Portal Network layer?

@acolytec3
Contributor Author

> 2. These sorts of connections would have to be reestablished each time a node behind NAT/firewall restarts.

> It's not clear to me why this part would be true. The NAT router doesn't know that your process restarted.

Good point. This is more likely to apply to our mobile app, where the app could go into the background and be unable to maintain liveness checks, while the NAT the phone is connected to garbage-collects its mapping entries in the meantime.
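
The difference between a process restart and an idle period can be shown with a toy NAT mapping table. In this sketch the 60-second idle timeout and the example endpoint are illustrative assumptions, `NatMappingTable` is a hypothetical name, and real NATs differ in how they key and expire mappings.

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (internal IP, internal UDP port)


class NatMappingTable:
    """Toy NAT table. It is keyed by internal endpoint and knows nothing about
    processes: a restart that rebinds the same port finds the mapping intact,
    while a mapping with no traffic for `idle_timeout` seconds is dropped."""

    def __init__(self, idle_timeout: float) -> None:
        self.idle_timeout = idle_timeout
        self.last_seen: Dict[Endpoint, float] = {}

    def packet_out(self, internal: Endpoint, now: float) -> None:
        self.last_seen[internal] = now

    def is_mapped(self, internal: Endpoint, now: float) -> bool:
        t = self.last_seen.get(internal)
        return t is not None and now - t < self.idle_timeout


if __name__ == "__main__":
    nat = NatMappingTable(idle_timeout=60.0)
    phone = ("192.168.1.20", 9000)
    nat.packet_out(phone, now=0.0)
    # Client restarts at t=10 and rebinds port 9000: the NAT cannot tell.
    print(nat.is_mapped(phone, now=30.0))   # True
    # App backgrounded, no keepalives until t=120: the mapping is gone.
    print(nat.is_mapped(phone, now=120.0))  # False
```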

@pipermerriam
Member

> is your issue/PR on devp2p aimed at adding new message types to the discv5 wire protocol or within our Portal Network layer?

Aimed at the discv5 layer, but we should just implement the equivalent at the Portal layer, since the two should be roughly identical.

@acolytec3
Contributor Author

Circling back on this as we're refactoring some code and deciding whether to keep this in our codebase or just remove it entirely. It was an interesting prototype, but what I currently have is a complete hack and not worth keeping without properly rethinking the spec, with SSZ types and all that. Is there any appetite to work on this in the near to medium term across the implementations?

@carver
Contributor

carver commented Aug 2, 2022

For now I see it as a distraction from the most important next step: interop testing on the Alpha network. I'm happy to put attention on it soon after we feel confident about interop.

@acolytec3
Contributor Author

I looked at the issue @pipermerriam opened on the discv5 specs repo, and it looks like Sigma Prime is working on a proof of concept for this at the discv5 level. ChainSafe has also indicated openness to implementing it there, so I suggest we support that work rather than pursue it further at our level.

@pipermerriam
Member

Closing because this is being officially handled by the base protocol now (soon).


3 participants