Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for connected-peers protocol (DSN). #1874

Merged
merged 4 commits into from
Aug 29, 2023
Merged

Conversation

shamil-gadelshin
Copy link
Member

This PR adds several tests to connected-peers protocol.

Code contributor checklist:

@shamil-gadelshin shamil-gadelshin added the networking Subspace networking (DSN) label Aug 24, 2023
@shamil-gadelshin shamil-gadelshin self-assigned this Aug 24, 2023
Copy link
Member

@nazar-pc nazar-pc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really appreciate some comments about what invariants you're testing and why the outcomes are what they are because they don't seem logical to me. No need to write useless water, just short explanation. Look at pallet-subspace or subspace-archiver for examples for such comments.

}

#[tokio::test()]
async fn test_connection_decision() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and other tests will all run for 2 seconds, are you sure we can't make it run MUCH faster? Like 100ms decision timeout and large delay of 200ms?

Copy link
Member Author

@shamil-gadelshin shamil-gadelshin Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 seconds are insignificant compared to the whole testing duration of the project. However, I can try setting it to 1s. Generally, I want to set delays several orders of magnitude higher than the expected overhead of all the scaffolding.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is significant. Tests on my machine take ~5 minutes to run. 5 tests here will increase time to run tests by at least 2 seconds when running all in parallel, while not really using any meaningful amount of CPU. So having faster tests is certainly a desirable property. And in CI we only have 2 cores right now, so these 5 tests will take 6 seconds to run (depending on how they are scheduled). 3 seconds for 5 simple test cases is a lot.

Comment on lines +141 to +142
assert!(!peer1.is_connected(peer2.local_peer_id()));
assert!(!peer2.is_connected(peer1.local_peer_id()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would they not remain connected if one of the peers still wants to?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A peer can't force remote peers to maintain connections.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... then we have a problem I think. Connected peers doesn't really maintain peers in connected state! It only prevents local end of the connection from being closed, which as you just mentioned is not sufficient to keep connections established.

So what this seems to mean is that we will still have a lot of reconnections unless both peers are interested in each other. More specifically this means nodes will continuously disconnect from farmers over and over again because farmers are never interested in nodes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... then we have a problem I think. Connected peers doesn't really maintain peers in connected state! It only prevents local end of the connection from being closed, which as you just mentioned is not sufficient to keep connections established.

I'm not sure I understand the problem. Peers can drop connections at any time with or without protocols for many reasons unless two peers decide to maintain them for some time.

So what this seems to mean is that we will still have a lot of reconnections unless both peers are interested in each other. More specifically this means nodes will continuously disconnect from farmers over and over again because farmers are never interested in nodes.

We could have reconnections with the same peers if you meant that but it would be the case after failing connections with all the other known peers.

Nodes will maintain connections with farmers because we have two instances of this protocol on the farmer and one of them maintains connections with nodes.

Comment on lines +264 to +266
.authenticate(PlainText2Config {
local_public_key: identity.public(),
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why plain text though, how do we benefit from it? Crypto cost should be minimal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially removed that but swarm-test crate has "test-only" quality and it doesn't work without it.

Copy link
Member Author

@shamil-gadelshin shamil-gadelshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really appreciate some comments about what invariants you're testing and why the outcomes are what they are because they don't seem logical to me. No need to write useless water, just short explanation. Look at pallet-subspace or subspace-archiver for examples for such comments.

Hmm, I write long test names but apparently, it's not enough. I will add some extra comments.

Comment on lines +141 to +142
assert!(!peer1.is_connected(peer2.local_peer_id()));
assert!(!peer2.is_connected(peer1.local_peer_id()));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A peer can't force remote peers to maintain connections.

Comment on lines +264 to +266
.authenticate(PlainText2Config {
local_public_key: identity.public(),
})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially removed that but swarm-test crate has "test-only" quality and it doesn't work without it.

}

#[tokio::test()]
async fn test_connection_decision() {
Copy link
Member Author

@shamil-gadelshin shamil-gadelshin Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 seconds are insignificant compared to the whole testing duration of the project. However, I can try setting it to 1s. Generally, I want to set delays several orders of magnitude higher than the expected overhead of all the scaffolding.

@shamil-gadelshin shamil-gadelshin enabled auto-merge (squash) August 29, 2023 06:48
}

#[tokio::test()]
async fn test_connection_decision() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is significant. Tests on my machine take ~5 minutes to run. 5 tests here will increase time to run tests by at least 2 seconds when running all in parallel, while not really using any meaningful amount of CPU. So having faster tests is certainly a desirable property. And in CI we only have 2 cores right now, so these 5 tests will take 6 seconds to run (depending on how they are scheduled). 3 seconds for 5 simple test cases is a lot.

Comment on lines +141 to +142
assert!(!peer1.is_connected(peer2.local_peer_id()));
assert!(!peer2.is_connected(peer1.local_peer_id()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... then we have a problem I think. Connected peers doesn't really maintain peers in connected state! It only prevents local end of the connection from being closed, which as you just mentioned is not sufficient to keep connections established.

So what this seems to mean is that we will still have a lot of reconnections unless both peers are interested in each other. More specifically this means nodes will continuously disconnect from farmers over and over again because farmers are never interested in nodes.

break;
}
}
}

// We don't maintain with new peers when we have enough connected peers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is helpful, but there are 3 blocks. The comment explains the high idea behind the whole test, it doesn't really comment what is happening here. And what is happening in these 3 blocks of assertions is checks where some peers are within limits and thus maintain connections and others are not and thus don't maintain connections.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the comment.

@nazar-pc nazar-pc disabled auto-merge August 29, 2023 16:42
@nazar-pc nazar-pc merged commit 70bc957 into main Aug 29, 2023
9 checks passed
@nazar-pc nazar-pc deleted the connected-peers-tests branch August 29, 2023 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
networking Subspace networking (DSN)
Projects
Development

Successfully merging this pull request may close these issues.

2 participants