Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

discv5: additional text around NAT hole punching #231

Open
wants to merge 5 commits into
base: discv5-v5.2
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
123 changes: 117 additions & 6 deletions discv5/discv5-rationale.md
Original file line number Diff line number Diff line change
Expand Up @@ -340,40 +340,151 @@ disturb the operation of the protocol. Session keys per node-ID/IP generally pre
replay across sessions. The `request-id`, mirrored in response packets, prevents replay of
responses within a session.

### Session message, the new packet type

The [session message packet] is a unicast notification so by definition it doesn't trigger a
response. Introducing this new message container removes the otherwise induced DoS attack
vector that sending a [RELAYMSG] to a target will trigger the target to send 2 messages, a
response as well as the [RELAYINIT] message.

A session message doesn't trigger a response on the higher layer, the discv5 equivalent to
TLS if you may, neither. If a session is not at hand to encrypt a session message, the session
message is dropped, contrary to an ordinary message packet which then would be sent a random
packet and await a [WHOAREYOU] response from the peer.

### Redundancy of ENRs in NODES responses and connectivity status assumptions about Relay and Bob

Often the same peers get passed around in NODES responses by different peers. The chance
of seeing a peer received in a NODES response again in another NODES response is high as
k-buckets favour long lived connections to new ones. This makes the need for a storing
back up relays for peers small.

Apart from the state that is saved by not storing more than the last peer to send us an
ENR as its potential relay, the longer time that has passed since a peer sent us an ENR,
the less guarantee we have that the peer is in fact connected to the owner of that ENR and
hence of its ability to relay.

### Job of keeping the hole punched falls on Bob and Bob's incentive to do so

UDP session table entry lifetimes are configurable, though a common lower bound is 20
seconds. Bob must periodically reset the session table entry for Alice in its NAT to keep
the hole for Alice punched. If Alice too is behind NAT, it must do the same for Bob.
Implementations must ensure, when a node behind NAT does not send a packet to a peer
within the entry lifetime then an empty packet is sent that is dropped by the peer.

NAT hole punching unavoidably creates overhead for all nodes in the network but once a
hole is punched, keeping it punched requires no interaction between peers. The incentive
for nodes behind NAT to keep holes for its peers punched is to avoid reestablishing a
session. If there is no hole for Alice in Bob's NAT when Alice carries out a [liveness
check], Bob is considered offline and the session to Bob useless. If the node behind NAT
intends to frequently communicate with a peer, reestablishing the session is more costly
than managing the interval of sent packets to that peer.

### Discovering if the local node is behind NAT and must keep holes for peers punched

A node may at start-up be assigned an externally reachable socket to advertise as well as
a listen socket. If those sockets are not equivalent, the node is behind NAT and the
[mechanism for keeping holes punched] is activated. Like so, a node assumes it is behind
NAT if an externally reachable socket is omitted from the initial configuration and must
activate the mechanism for keeping holes punched. The [runtime address discovery]
mechanism can discovery the external endpoint address used by the local node. Once a new
externally reachable endpoint is known, implementations will try to bind to its IP address
at some number of randomly selected ports from a given range of probably unused ports. If
binding succeeds with any port, the node is not behind NAT and the mechanism for keeping
holes punched is deactivated.

This solution assumes, in most scenarios where port-forwarding cannot be configured the
local node host's address is private to the address realm of the device operating the NAT
level furthest from the local node host. If the host and NAT device use the same IP
address, binding will always succeed, so this method may give a false negative. However,
this is not detrimental. A node behind NAT that deactivates the mechanism for keeping
holes punched will more frequently have to re-establish sessions to its peers.

### Limiting resource consumption of peers behind symmetric NATs, useful for light-clients

Peers with unreachable ENRs do not get inserted into node table buckets. Nodes that are
behind symmetric NATs will naturally never succeed in pinpointing one external socket for
peers to reach them on by [runtime address discovery] and therefore their unreachable ENRs
will never update to reachable ENRs.

This means, these peers will never respond to requests, as only peers in the node table
are sent requests. This does not cohere with the p2p-model, rather the server-client model
where the peer with a unreachable ENR acts as the client. This misalignment is especially
bothersome for well behaving (externally reachable) light-clients operating on limited
resources. Discv5.2 corrects this side-effect of runtime address discovery by introducing
a configurable limit to the number of sessions at a time with peers with unreachable ENRs,
the lower limit being 1. Nodes must accept sessions with at least one peer with a
unreachable ENR to for runtime address discovery to be enabled on the discv5.2 network.

### Fault tolerance and connectivity

As was already the case in discv5, if a request to a node times out, (after the configured
number of retries) that peer is considered unresponsive and the session can be failed.
Theoretically, peers behind NAT that repeatedly fail at taking responsibility for their
inconvenient topology, i.e. fail at keeping holes punched in their NAT for their peers,
could be banned by other peers from re-connecting within a set duration. This would
certainly add to the incentive for nodes behind NAT to behave, however it can result in
bad recovery from network-wide failure if many nodes simultaneously get blocked from
further sessions establishment due to unresponsiveness.

The relation between two discv5 peers' k-buckets is not necessarily symmetric, one node
having a peer in its k-buckets doesn't necessarily mean vv is true. Behaving nodes may very
well have sessions to peers they don't have in their k-buckets. Increasing the frequency of
discv5's [PING] liveness check for peers in k-buckets is therefore not enough to ensure
connectivity across discv5.2. To keep NATs open for new incoming connections, packets sent
to keep holes punched must be sent to all peers which a node has a session with.

# References

- Petar Maymounkov and David Mazières.
*Kademlia: A Peer-to-peer Information System Based on the XOR Metric.* 2002.\
<https://www.scs.stanford.edu/~dm/home/papers/kpos.pdf>

- Atul Singh, Tsuen-Wan “Johnny” Ngan, Peter Druschel, Dan S. Wallach.
- Atul Singh, Tsuen-Wan “Johnny” Ngan, Peter Druschel and Dan S. Wallach.
*Eclipse Attacks on Overlay Networks: Threats and Defenses*. 2006.\
<https://www.cs.rice.edu/~dwallach/pub/eclipse-infocom06.pdf>

- Ingmar Baumgart and Sebastian Mies.
*S/Kademlia: A Practicable Approach Towards Secure Key-Based Routing.* 2007.\
<https://telematics.tm.kit.edu/publications/Files/267/SKademlia_2007.pdf>

- Xin Sun, Ruben Torres and Sanjay Rao. *Feasiblity of DDoS Attacks with P2P Systems and
- Xin Sun, Ruben Torres and Sanjay Rao. *Feasibility of DDoS Attacks with P2P Systems and
Prevention through Robust Membership Management.* 2007.\
<https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1357&context=ecetr>

- Erik Hjelmvik, Wolfgang John. *Breaking and Improving Protocol Obfuscation.* 2010.\
- Erik Hjelmvik and Wolfgang John. *Breaking and Improving Protocol Obfuscation.* 2010.\
<https://internetstiftelsen.se/docs/hjelmvik_breaking.pdf>

- Adam Langley, Wan-Teh Chang. *QUIC Crypto*. 2016.\
- Adam Langley and Wan-Teh Chang. *QUIC Crypto*. 2016.\
<https://docs.google.com/document/d/1g5nIXAIkN_Y-7XJW5K45IblHd_L2f5LTaDUDwvZ5L6g>

- W3C Credentials Community Group. *Decentralized Identifiers (DIDs) Spec.* 2017.\
<https://w3c-ccg.github.io/did-spec>

- Seoung Kyun Kim, Zane Ma, Siddharth Murali, Joshua Mason, Andrew Miller, Michael Bailey.
- Seoung Kyun Kim, Zane Ma, Siddharth Murali, Joshua Mason, Andrew Miller and Michael Bailey.
*Measuring Ethereum Network Peers*. 2018.\
<http://mdbailey.ece.illinois.edu/publications/imc18_ethereum.pdf>

- Yuval Marcus, Ethan Heilman, Sharon Goldberg.
- Yuval Marcus, Ethan Heilman and Sharon Goldberg.
*Low-Resource Eclipse Attacks on Ethereum’s Peer-to-Peer Network.* 2018.\
<https://eprint.iacr.org/2018/236.pdf>

- Dan Kegel, Bryan Ford and Pyda Srisuresh.
*Peer-to-peer communication across network address translators - USENIX*. 2005.\
<https://www.usenix.org/legacy/event/usenix05/tech/general/full_papers/ford/ford.pdf>

- Cullen Fluffy Jennings and François Audet.
*RFC 478: FTP Server-server interaction - II, IETF Datatracker*. 1973.\
<https://datatracker.ietf.org/doc/html/rfc478>

- Philip Matthews, Iljitsch Van Beijnum and Marcelo Bagnulo.
*RFC FT-IETF-behave-V6V4-xlate-stateful: Stateful NAT64: Network address and protocol translation from IPv6 clients to ipv4 servers, IETF Datatracker*. 2011.\
<https://datatracker.ietf.org/doc/html/rfc6146>

[wire protocol]: ./discv5-wire.md
[node records]: ../enr.md
[session message packet]: ./discv5-wire.md#session-message-packet-flag--3
[WHOAREYOU]: ./discv5-wire.md#whoareyou-packet-flag--1
[PING]: ./discv5-wire.md#ping-request-0x01
[RELAYINIT]: ./discv5-wire.md#relayinit-notification-0x07
[RELAYMSG]: ./discv5-wire.md#relaymsg-notification-0x08
41 changes: 25 additions & 16 deletions discv5/discv5-theory.md
Original file line number Diff line number Diff line change
Expand Up @@ -354,9 +354,7 @@ regardless of the host they are sent to. The hole-punching mechanism does not wo
Node Alice may or may not behind a symmetric NAT.

Finally, it is assumed that a common lower bound on lifetime of NAT mappings is 20
seconds, and that mappings will be refreshed when any packet is sent through them. For
more background information about common NAT setups, please consult [RFC4787], [RFC6146]
and [this paper][natpaper].
seconds, and that mappings will be refreshed when any packet is sent through them.

### Message flow

Expand Down Expand Up @@ -396,25 +394,37 @@ The Relay sends a [RELAYMSG] notification containing Alice's message nonce and E

Bob disassembles the [RELAYMSG] and uses the `nonce` to assemble a [WHOAREYOU packet],
then sends it to Alice. Bob knows about Alice's endpoint from the `initiator-enr` given in
RELAYMSG.
[RELAYMSG].

Bob's NAT adds the mapping `(Bob's-LAN-ip, Bob's-LAN-port, Alice's-WAN-ip,
Alice's-WAN-port, entry-lifetime)`. A hole is punched in Bob's NAT for Alice for the
duration of `entry-lifetime`.

From here on it's business as usual. See [Sessions].

### Redundancy of ENRs in NODES responses and connectivity status assumptions about Relay and Bob
### Roles initiator, target and relay

Often the same peers get passed around in NODES responses by different peers. The chance
of seeing a peer received in a NODES response again in another NODES response is high as
k-buckets favour long lived connections to new ones. This makes the need for a storing
back up relays for peers small.
The initiator must store each yet not contacted node together with the `node-id` of the node
which sent it. Upon timed out request, this way the initiator has a relay at hand which is
more likely than most other nodes to have a connection to the target. Upon successful session
establishment, the initiator can forget about the relay stored for that node. Implementations
may store relays for nodes as they like, although it is recommend in the case that one ENR is
received from more than one peer in the time window between sending the initial request (random
packet) to it timing out, to replace the relay stored with the latest seen relay. It is crucial
that the initiator doesn't discard the timed out request when initiating a hole punch attempt,
it is need to map the [WHOAREYOU packet] to an expected response.

Apart from the state that is saved by not storing more than the last peer to send us an
ENR as its potential relay, the longer time that has passed since a peer sent us an ENR,
the less guarantee we have that the peer is in fact connected to the owner of that ENR and
hence of its ability to relay.
The target is a node the initiator assumes is behind NAT. Upon receiving a [RELAYMSG] from a
relay it must use the `nonce` in that message to assemble a [WHOAREYOU packet] to the
initiator.

Relays are only expected to relay to targets that are in its k-buckets, as almost only then is
it possible that the relay has sent the target's ENR in a NODES response and the peer requesting
the relay, the initiator, can hence make assumptions about the connectivity between the relay and
the target.

Regardless of role, all nodes behind NAT are responsible for keeping holes punched for their
peers after initial session establishment to further allow their requests to reach them.

[EIP-778]: ../enr.md
[identity scheme]: ../enr.md#record-structure
Expand All @@ -430,6 +440,5 @@ hence of its ability to relay.
[RELAYMSG]: ./discv5-wire.md#relaymsg-notification-0x08

[Sessions]: ./discv5-theory.md#sessions
[natpaper]: https://pdos.csail.mit.edu/papers/p2pnat.pdf
[RFC4787]: https://datatracker.ietf.org/doc/html/rfc4787
[RFC6146]: https://datatracker.ietf.org/doc/html/rfc6146
[runtime address discovery]: ./discv5-theory.md#maintaining-the-local-node-record
[mechanism for keeping holes punched]: ./discv5-theory.md#job-of-keeping-the-hole-punched-falls-on-bob-and-bobs-incentive-to-do-so