Adjust approach of randomNodes to a more lookup based system for providing peers #271

kdeme · 2020-07-09T11:00:06Z

Currently lookups are done automatically at specific interval.
Applications using discoveryv5 can then request nodes with randomNodes. This will select nodes from the routing table.
If wanted, lookup or randomLookup can still be called to provide potentially extra peers.

It might however be of interest to only use randomLookup (or lookup) to provide peers. Discovery could have a timer to be sure that a lookup is done every x minutes, in case it would not be trigger by the application.

EDIT: Some discussion and thinking has brought us to the conclusion (see comments below) that randomNodes in its current form is not only inefficient, but also likely to have security flaws. Additionally, the idea of using randomLookup directly does not seem to get much approval. The alternative idea is something proposed a while back with an additional structure holding latest discovered nodes through lookup, which would be triggered to provide these peers after a new lookup.

The text was updated successfully, but these errors were encountered:

kdeme · 2020-07-13T09:59:56Z

The interest might be two-folded:

More usefull nodes might be passed to the application using them, especially in case of a small routing table (say bitsPerHop=1).
It sounds like a more resilient way in case of a big amount of malicious nodes in the routing table. Say that the routing table is practically full, and contains a lot of malicious nodes, then lookups should still provide a decent amount of other nodes (at least if there are enough good peers still in the routing table).

kdeme · 2020-07-23T07:50:58Z

TLDR: I believe there is a way to abuse the current implementation of randomNodes, especially if it is called on short intervals. And especially at startup of discovery.

On the topic of efficiency, suggestions have been made (mostly by @cheatfate) to work on improving the randomNodes call.
One such suggestion was to trigger an event when new nodes are added to the routing table, and to make randomNodes an async proc waiting on this event. Next to that, to return only nodes that were not returned in previous call (or perhaps even calls).

Giving this some thought, I believe this can introduce a security risk. Especially when the number of nodes in the routing table is still low or almost none, e.g. before or during first lookup.
A malicious actor could make it so that the first set of nodes to be passed on for the application to be used, be a set of nodes that it controls (or at least a big part of them). And it could probably make the next sets biased.

This could be done passively, e.g. if it would control a bootstrap node or manage to poison a bootstrap node with many of its nodes. And then if that node would be within the first ones used to do a lookup. But this would not be so easy achievable (as findNode requests are done in parallel by 3).
But this could also be done actively, by pinging the target node with many of its controlled nodes. If this is timed right, this would already fill the routing table with many nodes, while only few nodes are discovered yet through lookup (this depend of course on how fast the lookup is going, and might be difficult because of this). The event would make these nodes quickly available. Here I see an immediate security concern.

And giving this some further thought, the exact same abuse can be done currently already with the randomNodes call. As this is e.g. in NBC called every second, which is likely to be frequent enough to have practical same behaviour as the above event based way.
So this is actually worse than the original concern I had about randomNodes for which this issue was created.

All this might be difficult to abuse in practise, but nevertheless we need to be careful when trying to optimise the provisioning of peers and keep this in mind. Else the original intent of using kademlia based discovery might be ignored/negated.

The way I see forward now is:

Either do the lookupRandom directly from application, and use those nodes.
Or have some additional structure inside discovery that provides the newly discovered nodes, possible with some filtering, but need to be careful there not introducing bias. This is what I was thinking about before the NBC side of seen peers filtering was introduced.

cheatfate · 2020-07-23T08:29:57Z

This could be done passively, e.g. if it would control a bootstrap node or manage to poison a bootstrap node with many of its nodes. And then if that node would be within the first ones used to do a lookup. But this would not be so easy achievable (as findNode requests are done in parallel by 3).

The proper fix for this issue is to maintain predefined set of peers which was obtained via previous runs and saved in some sort of peer table.

But this could also be done actively, by pinging the target node with many of its controlled nodes. If this is timed right, this would already fill the routing table with many nodes, while only few nodes are discovered yet through lookup (this depend of course on how fast the lookup is going, and might be difficult because of this). The event would make these nodes quickly available. Here I see an immediate security concern.

To fix this issue you should maintain proper peerPool and number of places which can be controlled by remote nodes should be limited. So for example you can have 80% of places which you populating by your own discovery and 20% of places can be occupied by remote pings. Also you need to implement network rate limits.

kdeme · 2020-07-23T08:38:38Z

The proper fix for this issue is to maintain predefined set of peers which was obtained via previous runs and saved in some sort of peer table.

Yes, this is what I proposed as option at the end of previous comment: "Or have some additional structure inside discovery that provides the newly discovered nodes, possible with some filtering, but need to be careful there not introducing bias. This is what I was thinking about before the NBC side of seen peers filtering was introduced."

To fix this issue you should maintain proper peerPool and number of places which can be controlled by remote nodes should be limited. So for example you can have 80% of places which you populating by your own discovery and 20% of places can be occupied by remote pings. Also you need to implement network rate limits.

Yes, with structure mentioned above in place, which fills only discovered nodes, by which I meant, discovered through lookup, you would be practically doing this. Just a general lookup from application would also do this, but I understand you don't want to deal with that from the application.
Perhaps extra restriction is needed on adds through ping (and also incoming findNode requests) in the routing table itself, thank you for pointing that out, TBI.

Rate limits is another point, that has not only implications on the poisoning of peers and thus that deserves its separate issue. Created that issue: #284

kdeme · 2021-01-06T09:35:32Z

The idea that the lookup approach would be more resilient than the randomNodes on a poisoned routing table is quite flawed.
Both would likely provide an equally amount malicious nodes.

Additionally the lookup approach could also be abused by a malicious actor by precomputing lots of IDs to be able to provide the closest ids and as such filling the lookup request with only/mostly the node ids of the malicious actor. This is described in this paper: https://arxiv.org/pdf/1908.10141.pdf

For this reason, the query proc (next to lookup proc) was introduced, which will directly call findNode to the current 16 closest nodes in the routing table. And all discovered nodes (including the initial neighbours set taken from the routing table) will be provided (not just the closests), and in no closest sorted order. FindNode requests are also limited to 16 nodes returned max. If needed, an additional measure could be take here to also shuffle the sequence, so that no 16 nodes returned by a malicious actor would be provided in consecutive order.

Because of this, the query call will be more resilient than the lookup call, and also faster. Thus it can be used instead of randomNodes, and will provide more unique peers.
lookup can still be used of course, if needed, to find a specific node or search for closest nodes.

An additional Node/ENR cache could still be added, that provides a set of "latest discovered" nodes (and perhaps with some timer on which ones were last provided to the application above), but I'm not sure if would give very different results. It might turn out to be useful however if we need to get new peers with a specific ENR field (e.g. eth2 attnets) fast (no time to do findnode requests).

kdeme · 2021-01-14T08:56:28Z

Resolved with #322

kdeme added the discoveryv5 Discovery v5 related issues label Jul 9, 2020

kdeme mentioned this issue Jul 10, 2020

Use brHmacDrbgGenerate for all random operations in discovery #272

Merged

kdeme added the security Issues that are certain to have security implications label Jul 12, 2020

kdeme changed the title ~~Investigate approach of randomNodes versus lookupRandom for providing peers~~ Adjust approach of randomNodes to a more lookup based system for providing peers Jul 23, 2020

kdeme mentioned this issue Jul 23, 2020

Rate limiting for discovery v5 messages and/or packets #284

Open

kdeme mentioned this issue Jan 6, 2021

Only do discovery queries to refresh the table #322

Merged

kdeme closed this as completed Jan 14, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adjust approach of randomNodes to a more lookup based system for providing peers #271

Adjust approach of randomNodes to a more lookup based system for providing peers #271

kdeme commented Jul 9, 2020 •

edited

Loading

kdeme commented Jul 13, 2020

kdeme commented Jul 23, 2020 •

edited

Loading

cheatfate commented Jul 23, 2020 •

edited

Loading

kdeme commented Jul 23, 2020 •

edited

Loading

kdeme commented Jan 6, 2021

kdeme commented Jan 14, 2021

Adjust approach of randomNodes to a more lookup based system for providing peers #271

Adjust approach of randomNodes to a more lookup based system for providing peers #271

Comments

kdeme commented Jul 9, 2020 • edited Loading

kdeme commented Jul 13, 2020

kdeme commented Jul 23, 2020 • edited Loading

cheatfate commented Jul 23, 2020 • edited Loading

kdeme commented Jul 23, 2020 • edited Loading

kdeme commented Jan 6, 2021

kdeme commented Jan 14, 2021

kdeme commented Jul 9, 2020 •

edited

Loading

kdeme commented Jul 23, 2020 •

edited

Loading

cheatfate commented Jul 23, 2020 •

edited

Loading

kdeme commented Jul 23, 2020 •

edited

Loading