-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Discv5 Protocol: Add support for banning nodes #769
Conversation
@kdeme Are there any other bans which I should put in place for Discv5 that you are aware of? |
../stubloglevel, | ||
./discv5_test_helper | ||
|
||
suite "Discovery v5 Ban Nodes Enabled Tests": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this being a copy of the tests without banlist it isn't really clear what we are testing or what we are testing extra in each test with respect to having banlist on or off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My goal here was to fully test that the existing functionality is still working both with and without the ban list being enabled. The existing tests have banNodes = false so that the ban list is not enabled. The new tests cover mostly the same scenarios except banNodes = true so the ban list is enabled. Some of the new tests were modified based on the changed expected behavior and verify the case when a node is banned etc so I believe all/most the scenarios are covered.
How would you suggest I improve this? Open to suggestions. I wasn't sure if these duplicated regression tests are necessary or not. I guess I could remove them and just add a few tests with node bans enabled to the existing tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I understand the intention. I wasn't aware that some of the test cases were modified, which confirms my point that it is not very clear what exactly is being tested. Which test cases are not exactly copy-pasted from the original tests?
I understand that there is some value in running (most of) the same tests again with the banNodes = true to verify if it doesn't break general usage, but I don't really like it much in this duplicated way:
- It will be difficult to maintain, the tests are already pretty complicated.
- It is not really clear what is being tested and in the future if something breaks it will not be immediatly clear what the behaviour should be in banNodes = true versus banNodes = false.
So I think it would be good to have some (perhaps simpler) more specific test cases that verify the banNodes = true versus false cases.
If we were to also run the existing tests in both scenarios, then I think they should be altered to be usable in both ways. But this is not mandatory for me. And I think this also depends on whether we want to remove the banNodes
option in the future and always have it enable. I was thinking of eventually removing this option once it is well tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I see your point. I'll clean up the duplicated tests and add a few more specific scenarios covering banning of nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the tests and added a few test cases specifically covering banned nodes.
@@ -437,6 +461,10 @@ proc receive*(d: Protocol, a: Address, packet: openArray[byte]) = | |||
let packet = decoded[] | |||
case packet.flag | |||
of OrdinaryMessage: | |||
if d.isBanned(packet.srcId): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An improvement for the future here could be to ban without actually doing the decryption of the message (only the header). But they way the decodePacket
call is currently designed this is not really possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I guess we could use the src-id in the authdata section of the packet header for ordinary messages and handshake messages.
@@ -464,6 +492,10 @@ proc receive*(d: Protocol, a: Address, packet: openArray[byte]) = | |||
else: | |||
debug "Timed out or unrequested whoareyou packet", address = a | |||
of HandshakeMessage: | |||
if d.isBanned(packet.srcIdHs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idem above.
eth/p2p/discoveryv5/protocol.nim
Outdated
discovery_message_requests_outgoing.inc(labelValues = ["invalid_response"]) | ||
return err("Invalid response to find node message") | ||
else: | ||
d.banNode(fromNode, NodeBanDurationNoResponse) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd prefer not to ban in this situation for now. Even with a 5min timer I'd like to understand the effects of it better. E.g. also in cases where the actual local node is overloaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will remove it for now.
Currently I'm only putting bans in place when checking the response of ping/findNode/talkReq messages. I thought about adding bans when handing messages in other scenarios such as in |
Yeah, I think it is fine to start with those reasons to ban. |
This PR implements banning of nodes in the Discv5 protocol. This allows the protocol to temporarily stop interacting with nodes that are misbehaving for various reasons. See the related task here: status-im/nimbus-eth1#2809
Currently there is only one ban in place:
For now the bans are rather short. They will become more useful if we implement some form of counter for each type of violation then after a certain number of events we can ban the node for a longer period of time. Banning a node for a single violation is too extreme which is why the current bans are short but having short bans isn't as beneficial when interacting with bad nodes.
I opted to only put these two bans in place because these were the only locations where we currently replace nodes in the routing table. There may be other reasons to ban nodes in which case we can add additional bans to either this PR or a future PR.
Expired bans are cleaned up in the background every 5 mins in one of the background loops.
The banning is configurable and can be enabled/disabled. When disabled the protocol should behave in exactly the same way as before this PR.