Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discv5 Protocol: Add support for banning nodes #769

Merged
merged 16 commits into from
Jan 30, 2025
Merged

Conversation

bhartnett
Copy link
Contributor

@bhartnett bhartnett commented Jan 24, 2025

This PR implements banning of nodes in the Discv5 protocol. This allows the protocol to temporarily stop interacting with nodes that are misbehaving for various reasons. See the related task here: status-im/nimbus-eth1#2809

Currently there is only one ban in place:

  • Nodes are banned when providing an invalid response message for a given request message

For now the bans are rather short. They will become more useful if we implement some form of counter for each type of violation then after a certain number of events we can ban the node for a longer period of time. Banning a node for a single violation is too extreme which is why the current bans are short but having short bans isn't as beneficial when interacting with bad nodes.

I opted to only put these two bans in place because these were the only locations where we currently replace nodes in the routing table. There may be other reasons to ban nodes in which case we can add additional bans to either this PR or a future PR.

Expired bans are cleaned up in the background every 5 mins in one of the background loops.
The banning is configurable and can be enabled/disabled. When disabled the protocol should behave in exactly the same way as before this PR.

@bhartnett bhartnett requested a review from kdeme January 24, 2025 06:20
@bhartnett
Copy link
Contributor Author

@kdeme Are there any other bans which I should put in place for Discv5 that you are aware of?

../stubloglevel,
./discv5_test_helper

suite "Discovery v5 Ban Nodes Enabled Tests":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this being a copy of the tests without banlist it isn't really clear what we are testing or what we are testing extra in each test with respect to having banlist on or off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal here was to fully test that the existing functionality is still working both with and without the ban list being enabled. The existing tests have banNodes = false so that the ban list is not enabled. The new tests cover mostly the same scenarios except banNodes = true so the ban list is enabled. Some of the new tests were modified based on the changed expected behavior and verify the case when a node is banned etc so I believe all/most the scenarios are covered.

How would you suggest I improve this? Open to suggestions. I wasn't sure if these duplicated regression tests are necessary or not. I guess I could remove them and just add a few tests with node bans enabled to the existing tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understand the intention. I wasn't aware that some of the test cases were modified, which confirms my point that it is not very clear what exactly is being tested. Which test cases are not exactly copy-pasted from the original tests?

I understand that there is some value in running (most of) the same tests again with the banNodes = true to verify if it doesn't break general usage, but I don't really like it much in this duplicated way:

  • It will be difficult to maintain, the tests are already pretty complicated.
  • It is not really clear what is being tested and in the future if something breaks it will not be immediatly clear what the behaviour should be in banNodes = true versus banNodes = false.

So I think it would be good to have some (perhaps simpler) more specific test cases that verify the banNodes = true versus false cases.

If we were to also run the existing tests in both scenarios, then I think they should be altered to be usable in both ways. But this is not mandatory for me. And I think this also depends on whether we want to remove the banNodes option in the future and always have it enable. I was thinking of eventually removing this option once it is well tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I see your point. I'll clean up the duplicated tests and add a few more specific scenarios covering banning of nodes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the tests and added a few test cases specifically covering banned nodes.

@@ -437,6 +461,10 @@ proc receive*(d: Protocol, a: Address, packet: openArray[byte]) =
let packet = decoded[]
case packet.flag
of OrdinaryMessage:
if d.isBanned(packet.srcId):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An improvement for the future here could be to ban without actually doing the decryption of the message (only the header). But they way the decodePacket call is currently designed this is not really possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I guess we could use the src-id in the authdata section of the packet header for ordinary messages and handshake messages.

@@ -464,6 +492,10 @@ proc receive*(d: Protocol, a: Address, packet: openArray[byte]) =
else:
debug "Timed out or unrequested whoareyou packet", address = a
of HandshakeMessage:
if d.isBanned(packet.srcIdHs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem above.

discovery_message_requests_outgoing.inc(labelValues = ["invalid_response"])
return err("Invalid response to find node message")
else:
d.banNode(fromNode, NodeBanDurationNoResponse)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer not to ban in this situation for now. Even with a 5min timer I'd like to understand the effects of it better. E.g. also in cases where the actual local node is overloaded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will remove it for now.

@bhartnett
Copy link
Contributor Author

bhartnett commented Jan 28, 2025

@kdeme Are there any other bans which I should put in place for Discv5 that you are aware of?

Currently I'm only putting bans in place when checking the response of ping/findNode/talkReq messages. I thought about adding bans when handing messages in other scenarios such as in receive when a packet fails to decode but because we use UDP, bans at such a low level probably aren't a good idea. I guess we could just go with the current bans and if any scenarios come up we could put more bans in place when it makes sense to do so.

@kdeme
Copy link
Contributor

kdeme commented Jan 29, 2025

@kdeme Are there any other bans which I should put in place for Discv5 that you are aware of?

Currently I'm only putting bans in place when checking the response of ping/findNode/talkReq messages. I thought about adding bans when handing messages in other scenarios such as in receive when a packet fails to decode but because we use UDP, bans at such a low level probably aren't a good idea. I guess we could just go with the current bans and if any scenarios come up we could put more bans in place when it makes sense to do so.

Yeah, I think it is fine to start with those reasons to ban.

@bhartnett bhartnett requested a review from kdeme January 29, 2025 15:08
@bhartnett bhartnett merged commit c640d3c into master Jan 30, 2025
18 checks passed
@bhartnett bhartnett deleted the discv5-ban-list branch January 30, 2025 11:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants