ENG-578: IT (Part 5) - Docker start #744
Conversation
```rust
docker: Docker,
dropper: DropHandle,
```
+1 for splitting this up!
Yeah, it felt like both are needed most of the time, but it was very awkward.
```rust
///
/// The loop will exit when all clones of the sender channel have been dropped.
pub fn start(docker: Docker) -> DropHandle {
    let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel();
```
Ah, so this is a nice Rust way to implement the producer/consumer pattern between async tasks!
Yes, I'd say this is the more common practice, and with the `select!` macro in `tokio` you can do a whole lot of stuff like prioritising certain queues or doing timeouts, like here. The STM stuff is way less common. What I did with the runtime handle compiled but didn't run.
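For reference, a minimal sketch of that producer/consumer pattern with a `select!` timeout, assuming tokio with the `full` feature set; the `Command` type and the 5-second timeout are illustrative, not taken from this PR:

```rust
use std::time::Duration;

#[derive(Debug)]
enum Command {
    Ping,
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel::<Command>();

    let consumer = tokio::spawn(async move {
        loop {
            tokio::select! {
                msg = rx.recv() => match msg {
                    Some(cmd) => println!("received {cmd:?}"),
                    // `recv` returns None once all senders have been dropped.
                    None => break,
                },
                // The timeout branch only logs inactivity; the loop keeps going.
                _ = tokio::time::sleep(Duration::from_secs(5)) => {
                    println!("no message within 5s");
                }
            }
        }
    });

    tx.send(Command::Ping).unwrap();
    drop(tx); // dropping the last sender lets the consumer loop exit

    consumer.await.unwrap();
}
```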
```rust
lazy_static! {
    static ref CI_PROFILE: bool = std::env::var("PROFILE").unwrap_or_default() == "ci";
    static ref STARTUP_WAIT_SECS: u64 = if *CI_PROFILE { 20 } else { 15 };
```
Because of slow CI, I assume?
Yeah, CI was traditionally slower than the local environment, which is why this `ci.env` existed in the `cargo make` TOML files. It probably shouldn't be hardcoded, and there isn't even that much of a difference, but still, here we are.
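A sketch of one way to avoid hardcoding it, in the same `lazy_static` style as the snippet above; `FM_STARTUP_WAIT_SECS` is a hypothetical override variable, not something this PR introduces:

```rust
use lazy_static::lazy_static;

lazy_static! {
    static ref CI_PROFILE: bool = std::env::var("PROFILE").unwrap_or_default() == "ci";
    // Hypothetical: allow an explicit override, falling back to the profile default.
    static ref STARTUP_WAIT_SECS: u64 = std::env::var("FM_STARTUP_WAIT_SECS")
        .ok()
        .and_then(|s| s.parse().ok())
        .unwrap_or(if *CI_PROFILE { 20 } else { 15 });
}
```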
```rust
F: for<'a> FnOnce(
    &Manifest,
    &mut DockerMaterializer,
    &'a mut Testnet<DockerMaterials, DockerMaterializer>,
) -> Pin<Box<dyn Future<Output = anyhow::Result<()>> + 'a>>,
```
Djeesus 😵‍💫, that is some advanced Rust lifetime syntax! Haven't seen this `for<'a>` before; I looked it up [here](https://doc.rust-lang.org/stable/reference/trait-bounds.html#higher-ranked-trait-bounds), sounds like fun!
I'm not used to them at all either! The only other place where I have seen an example of this was here, where I read `impl<T> ResolvePool<T> where for<'a> ResolveKey: From<&'a T>` as "there exists a transformation from any reference to `T` to `ResolveKey`".

What's happening here, I didn't even know if it was possible. This is a very similar situation to the interpreters, which consume some `State` as input and then return it as output, partly because I didn't know this could work. The compromise there was that if the interpreter fails and returns an `Err`, the state is lost, but that was okay there because, as I said, any error like that can result in a panic and shut down Fendermint after logging. Here, however, we want to tear down the testnet, so I need it even if there is an error, but I can't expect it to be returned along with an error.

So this syntax means something like: whatever lifetime the testnet has, the returned future has the exact same lifetime, so it cannot outlive the testnet it borrowed. I first tried with two lifetimes, saying `'a: 'b` and `Future<...> + 'b`, but that didn't work.
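A stripped-down sketch of what that bound buys us, with hypothetical names: for every lifetime `'a`, the closure borrows the testnet for `'a` and returns a future that cannot outlive that borrow, so the caller still owns the testnet after awaiting and can tear it down even when the result is an `Err`:

```rust
use std::future::Future;
use std::pin::Pin;

struct Testnet; // stand-in for the real Testnet type

async fn run_then_teardown<F>(f: F) -> anyhow::Result<()>
where
    F: for<'a> FnOnce(&'a mut Testnet) -> Pin<Box<dyn Future<Output = anyhow::Result<()>> + 'a>>,
{
    let mut testnet = Testnet;
    // Keep the result instead of `?`-ing it: we still own the testnet here,
    // so teardown can run whether the closure succeeded or failed.
    let res = f(&mut testnet).await;
    drop(testnet); // teardown (containers first, then the network, in the real code)
    res
}
```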
```rust
    .context("failed to set up testnet")?;

// Allow time for things to consolidate and blocks to be created.
tokio::time::sleep(Duration::from_secs(*STARTUP_WAIT_SECS)).await;
```
nit: Optimally we want to wait until all nodes are up; maybe we can poll the RPC API on the CometBFT container here until a block has been produced?
Yeah, I agree, this is very crude. I think the place to do it, though, is in the test, where we at least know which nodes should exist and what interface is available to query. We can always assume CometBFT, I suppose, but I'm not sure every test will always be able to progress; maybe some setups deliberately don't start a node that would allow quorum.
Should we do these utilities as we add tests? I'm sure patterns will emerge.
Maybe I can add a loop to wait until some general API responds from CometBFT, not necessarily block production.
Done, just a loop that exercises all the APIs to see if they respond to the most basic query.
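The general idea, as a hedged sketch: poll the CometBFT `/status` RPC endpoint until it answers or a deadline passes, instead of sleeping for a fixed time. The function name and retry cadence are illustrative; this is not the loop the PR actually adds:

```rust
use std::time::Duration;

async fn wait_for_cometbft(rpc_url: &str, max_wait: Duration) -> anyhow::Result<()> {
    let deadline = tokio::time::Instant::now() + max_wait;
    loop {
        match reqwest::get(format!("{rpc_url}/status")).await {
            // Any successful response means the RPC server is up.
            Ok(resp) if resp.status().is_success() => return Ok(()),
            _ if tokio::time::Instant::now() >= deadline => {
                anyhow::bail!("CometBFT RPC at {rpc_url} not ready within {max_wait:?}")
            }
            _ => tokio::time::sleep(Duration::from_millis(500)).await,
        }
    }
}
```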
Co-authored-by: Friðrik Ásmundsson <[email protected]>
🥇
Builds on #700

Trying to implement `start_node`.

TODO:

- Clean up the docker materializer directory unless told not to. The test leaves the files around for inspection, removing them on the next run. We can add more parameters to the `DockerMaterializer` if we add a CLI interface to it.
- `Drop` for the `Testnet` to first drop containers, then the network.
- Stop and restart the docker node. The Eth API test demonstrates that things work; let's leave it at that in this PR.
- `Makefile`, exclude from regular tests.
- `eth-addr` to be the full version, not the shortened display one.

Writing tests
There are two modules to look at for how to write a test:

- `docker.rs`, which has the `with_testnet` method to set up a testnet from a manifest, test it, log its output, and tear it down.
- the test modules that use `with_testnet` from `docker.rs`.

The tests are imported by `docker.rs`, and not the other way around, so they can be a single compilation unit. I also thought I'd be able to apply `#[serial]` in `docker.rs`, but that doesn't seem to be the case, so don't forget to annotate tests with it, because the tests share a common materializer state file.

To write a test, one is supposed to create a manifest in `testing/materializer/tests/manifests` and then refer to it by name. Try to stick to using `anyhow::Result` rather than `.unwrap()` and `assert!()`, otherwise `with_testnet` won't print the logs on CI. (I thought about adding something like a `Drop` handler, but logs are async and the test would be panicking already when we'd be trying to prevent other cleanups; it might work, but it's not straightforward.)
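A hypothetical sketch of the test shape this describes; the manifest name, the exact `with_testnet` call shape, and the body are assumptions based on the closure bound shown earlier in this PR, not verbatim API:

```rust
use serial_test::serial;

#[serial] // the tests share a common materializer state file
#[tokio::test]
async fn test_standalone() {
    // Hypothetical manifest name under testing/materializer/tests/manifests.
    with_testnet("standalone.yaml", |_manifest, _materializer, _testnet| {
        Box::pin(async move {
            // Return anyhow::Result instead of unwrap()/assert!() so that
            // with_testnet can still print the container logs on CI.
            Ok(())
        })
    })
    .await
    .unwrap();
}
```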
Testing

```bash
cargo test -p fendermint_testing_materializer --test docker -- --nocapture
```
The test run leaves the files around for inspection: