Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust approach of randomNodes to a more lookup based system for providing peers #271

Closed
kdeme opened this issue Jul 9, 2020 · 6 comments
Labels
discoveryv5 Discovery v5 related issues security Issues that are certain to have security implications

Comments

@kdeme
Copy link
Contributor

kdeme commented Jul 9, 2020

Currently lookups are done automatically at specific interval.
Applications using discoveryv5 can then request nodes with randomNodes. This will select nodes from the routing table.
If wanted, lookup or randomLookup can still be called to provide potentially extra peers.

It might however be of interest to only use randomLookup (or lookup) to provide peers. Discovery could have a timer to be sure that a lookup is done every x minutes, in case it would not be trigger by the application.

EDIT: Some discussion and thinking has brought us to the conclusion (see comments below) that randomNodes in its current form is not only inefficient, but also likely to have security flaws. Additionally, the idea of using randomLookup directly does not seem to get much approval. The alternative idea is something proposed a while back with an additional structure holding latest discovered nodes through lookup, which would be triggered to provide these peers after a new lookup.

@kdeme kdeme added the discoveryv5 Discovery v5 related issues label Jul 9, 2020
@kdeme kdeme added the security Issues that are certain to have security implications label Jul 12, 2020
@kdeme
Copy link
Contributor Author

kdeme commented Jul 13, 2020

The interest might be two-folded:

  • More usefull nodes might be passed to the application using them, especially in case of a small routing table (say bitsPerHop=1).
  • It sounds like a more resilient way in case of a big amount of malicious nodes in the routing table. Say that the routing table is practically full, and contains a lot of malicious nodes, then lookups should still provide a decent amount of other nodes (at least if there are enough good peers still in the routing table).

@kdeme
Copy link
Contributor Author

kdeme commented Jul 23, 2020

TLDR: I believe there is a way to abuse the current implementation of randomNodes, especially if it is called on short intervals. And especially at startup of discovery.

On the topic of efficiency, suggestions have been made (mostly by @cheatfate) to work on improving the randomNodes call.
One such suggestion was to trigger an event when new nodes are added to the routing table, and to make randomNodes an async proc waiting on this event. Next to that, to return only nodes that were not returned in previous call (or perhaps even calls).

Giving this some thought, I believe this can introduce a security risk. Especially when the number of nodes in the routing table is still low or almost none, e.g. before or during first lookup.
A malicious actor could make it so that the first set of nodes to be passed on for the application to be used, be a set of nodes that it controls (or at least a big part of them). And it could probably make the next sets biased.

  • This could be done passively, e.g. if it would control a bootstrap node or manage to poison a bootstrap node with many of its nodes. And then if that node would be within the first ones used to do a lookup. But this would not be so easy achievable (as findNode requests are done in parallel by 3).
  • But this could also be done actively, by pinging the target node with many of its controlled nodes. If this is timed right, this would already fill the routing table with many nodes, while only few nodes are discovered yet through lookup (this depend of course on how fast the lookup is going, and might be difficult because of this). The event would make these nodes quickly available. Here I see an immediate security concern.

And giving this some further thought, the exact same abuse can be done currently already with the randomNodes call. As this is e.g. in NBC called every second, which is likely to be frequent enough to have practical same behaviour as the above event based way.
So this is actually worse than the original concern I had about randomNodes for which this issue was created.

All this might be difficult to abuse in practise, but nevertheless we need to be careful when trying to optimise the provisioning of peers and keep this in mind. Else the original intent of using kademlia based discovery might be ignored/negated.

The way I see forward now is:

  • Either do the lookupRandom directly from application, and use those nodes.
  • Or have some additional structure inside discovery that provides the newly discovered nodes, possible with some filtering, but need to be careful there not introducing bias. This is what I was thinking about before the NBC side of seen peers filtering was introduced.

@cheatfate
Copy link
Contributor

cheatfate commented Jul 23, 2020

This could be done passively, e.g. if it would control a bootstrap node or manage to poison a bootstrap node with many of its nodes. And then if that node would be within the first ones used to do a lookup. But this would not be so easy achievable (as findNode requests are done in parallel by 3).

The proper fix for this issue is to maintain predefined set of peers which was obtained via previous runs and saved in some sort of peer table.

But this could also be done actively, by pinging the target node with many of its controlled nodes. If this is timed right, this would already fill the routing table with many nodes, while only few nodes are discovered yet through lookup (this depend of course on how fast the lookup is going, and might be difficult because of this). The event would make these nodes quickly available. Here I see an immediate security concern.

To fix this issue you should maintain proper peerPool and number of places which can be controlled by remote nodes should be limited. So for example you can have 80% of places which you populating by your own discovery and 20% of places can be occupied by remote pings. Also you need to implement network rate limits.

@kdeme
Copy link
Contributor Author

kdeme commented Jul 23, 2020

The proper fix for this issue is to maintain predefined set of peers which was obtained via previous runs and saved in some sort of peer table.

Yes, this is what I proposed as option at the end of previous comment: "Or have some additional structure inside discovery that provides the newly discovered nodes, possible with some filtering, but need to be careful there not introducing bias. This is what I was thinking about before the NBC side of seen peers filtering was introduced."

To fix this issue you should maintain proper peerPool and number of places which can be controlled by remote nodes should be limited. So for example you can have 80% of places which you populating by your own discovery and 20% of places can be occupied by remote pings. Also you need to implement network rate limits.

Yes, with structure mentioned above in place, which fills only discovered nodes, by which I meant, discovered through lookup, you would be practically doing this. Just a general lookup from application would also do this, but I understand you don't want to deal with that from the application.
Perhaps extra restriction is needed on adds through ping (and also incoming findNode requests) in the routing table itself, thank you for pointing that out, TBI.

Rate limits is another point, that has not only implications on the poisoning of peers and thus that deserves its separate issue. Created that issue: #284

@kdeme kdeme changed the title Investigate approach of randomNodes versus lookupRandom for providing peers Adjust approach of randomNodes to a more lookup based system for providing peers Jul 23, 2020
@kdeme
Copy link
Contributor Author

kdeme commented Jan 6, 2021

The idea that the lookup approach would be more resilient than the randomNodes on a poisoned routing table is quite flawed.
Both would likely provide an equally amount malicious nodes.

Additionally the lookup approach could also be abused by a malicious actor by precomputing lots of IDs to be able to provide the closest ids and as such filling the lookup request with only/mostly the node ids of the malicious actor. This is described in this paper: https://arxiv.org/pdf/1908.10141.pdf

For this reason, the query proc (next to lookup proc) was introduced, which will directly call findNode to the current 16 closest nodes in the routing table. And all discovered nodes (including the initial neighbours set taken from the routing table) will be provided (not just the closests), and in no closest sorted order. FindNode requests are also limited to 16 nodes returned max. If needed, an additional measure could be take here to also shuffle the sequence, so that no 16 nodes returned by a malicious actor would be provided in consecutive order.

Because of this, the query call will be more resilient than the lookup call, and also faster. Thus it can be used instead of randomNodes, and will provide more unique peers.
lookup can still be used of course, if needed, to find a specific node or search for closest nodes.

An additional Node/ENR cache could still be added, that provides a set of "latest discovered" nodes (and perhaps with some timer on which ones were last provided to the application above), but I'm not sure if would give very different results. It might turn out to be useful however if we need to get new peers with a specific ENR field (e.g. eth2 attnets) fast (no time to do findnode requests).

@kdeme
Copy link
Contributor Author

kdeme commented Jan 14, 2021

Resolved with #322

@kdeme kdeme closed this as completed Jan 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discoveryv5 Discovery v5 related issues security Issues that are certain to have security implications
Projects
None yet
Development

No branches or pull requests

2 participants