ENG-578: IT (Part 5) - Docker start #744

Merged
merged 46 commits into from
Mar 1, 2024

Conversation

Contributor

@aakoshh aakoshh commented Feb 23, 2024

Builds on #700

Trying to implement start_node.

TODO:

  • Clean up docker materializer directory unless told not to. The test leaves the files around for inspection, removing them on the next run. We can add more parameters to the DockerMaterializer if we add a CLI interface to it.
  • Instead of passing around a handle to a runtime, start a background task to remove docker artifacts that receives orders over a channel
  • Implement Drop for the Testnet to first drop containers, then the network
  • Try to access the eth API from the test
  • Stop and restart the docker node. The Eth API test demonstrates that things work; let's leave it at that in this PR.
  • Check container status before starting
  • Check the exit code and return error if failed
  • Add to Makefile, exclude from regular tests
  • Log the error in the ABCI adapter; the panic doesn't show up in log files
  • Print the logs of all containers upon test failure, so we can see what went wrong on CI.
  • Fix the eth-addr to be the full version, not the shortened display one
  • Fix cometbft init steps to not be gated by the existence of the directory

Writing tests

There are two modules to look at for how to write a test:

  • docker.rs: this is the entry point for all docker materializer based tests, and provides the with_testnet method to set up a testnet from a manifest, test it, log output, and tear it down.
  • root_only.rs is an example of a test module that uses the root_only.yaml manifest and the with_testnet from docker.rs

The tests are imported by docker.rs and not the other way around so they can be a single compilation unit. I also thought I'd be able to apply #[serial] in docker.rs but that doesn't seem to be the case, so don't forget to annotate tests with it, because the tests share a common materializer state file.

To write a test, create a manifest in testing/materializer/tests/manifests and then refer to it by name. Try to stick to using anyhow::Result rather than .unwrap() and assert!(), otherwise with_testnet won't print the logs on CI. (I thought about adding something like a Drop handler, but logs are async and the test would already be panicking when we'd be trying to prevent other cleanups; it might work, but it's not straightforward.)

    #[serial_test::serial]
    #[tokio::test]
    async fn test_my_manifest() {
        with_testnet("my-manifest.yaml", |_materializer, _manifest, testnet| {
            let test = async {
                let my_node = testnet.root().node("my-node");
                let my_docker_node = testnet.node(&my_node)?;

                let my_provider = my_docker_node
                    .ethapi_http_provider()?
                    .ok_or_else(|| anyhow!("my-node has ethapi enabled"))?;

                let bn = my_provider.get_block_number().await?;

                if bn <= U64::one() {
                    bail!("expected higher blocks than genesis");
                }

                Ok(())
            };

            test.boxed_local()
        })
        .await
        .unwrap()
    }

Testing

cargo test -p fendermint_testing_materializer --test docker -- --nocapture

The test run leaves the files around for inspection:

tree testing/materializer/tests/docker-materializer-data/
testing/materializer/tests/docker-materializer-data/
├── materializer-state.json
├── scripts
│   └── docker-entry.sh
└── testnets
    └── root-only
        ├── accounts
        │   ├── alice
        │   │   ├── eth-addr
        │   │   ├── fvm-addr
        │   │   ├── public.b64
        │   │   ├── public.hex
        │   │   ├── secret.b64
        │   │   └── secret.hex
        │   ├── bob
        │   │   ├── eth-addr
        │   │   ├── fvm-addr
        │   │   ├── public.b64
        │   │   ├── public.hex
        │   │   ├── secret.b64
        │   │   └── secret.hex
        │   └── charlie
        │       ├── eth-addr
        │       ├── fvm-addr
        │       ├── public.b64
        │       ├── public.hex
        │       ├── secret.b64
        │       └── secret.hex
        └── root
            ├── genesis.json
            └── nodes
                ├── node-1
                │   ├── cometbft
                │   │   ├── config
                │   │   │   ├── addrbook.json
                │   │   │   ├── config.toml
                │   │   │   ├── genesis.json
                │   │   │   ├── node_key.json
                │   │   │   └── priv_validator_key.json
                │   │   └── data
                │   │       ├── blockstore.db
                │   │       │   ├── 000001.log
                │   │       │   ├── CURRENT
                │   │       │   ├── LOCK
                │   │       │   ├── LOG
                │   │       │   └── MANIFEST-000000
                │   │       ├── cs.wal
                │   │       │   └── wal
                │   │       ├── evidence.db
                │   │       │   ├── 000001.log
                │   │       │   ├── CURRENT
                │   │       │   ├── LOCK
                │   │       │   ├── LOG
                │   │       │   └── MANIFEST-000000
                │   │       ├── priv_validator_state.json
                │   │       ├── state.db
                │   │       │   ├── 000001.log
                │   │       │   ├── CURRENT
                │   │       │   ├── LOCK
                │   │       │   ├── LOG
                │   │       │   └── MANIFEST-000000
                │   │       └── tx_index.db
                │   │           ├── 000001.log
                │   │           ├── CURRENT
                │   │           ├── LOCK
                │   │           ├── LOG
                │   │           └── MANIFEST-000000
                │   ├── dynamic.env
                │   ├── fendermint
                │   │   ├── data
                │   │   │   └── rocksdb
                │   │   │       ├── 000004.log
                │   │   │       ├── CURRENT
                │   │   │       ├── IDENTITY
                │   │   │       ├── LOCK
                │   │   │       ├── LOG
                │   │   │       ├── MANIFEST-000005
                │   │   │       ├── OPTIONS-000013
                │   │   │       └── OPTIONS-000015
                │   │   ├── logs
                │   │   │   └── fendermint.2024-02-25.log
                │   │   └── snapshots
                │   ├── keys
                │   │   ├── cometbft-node-id
                │   │   ├── fendermint-peer-id
                │   │   ├── network_key.pk
                │   │   ├── network_key.sk
                │   │   └── validator_key.sk
                │   └── static.env
                └── node-2
                    ├── cometbft
                    │   ├── config
                    │   │   ├── addrbook.json
                    │   │   ├── config.toml
                    │   │   ├── genesis.json
                    │   │   ├── node_key.json
                    │   │   ├── priv_validator_key.json
                    │   │   └── write-file-atomic-06193269378025858889
                    │   └── data
                    │       ├── blockstore.db
                    │       │   ├── 000001.log
                    │       │   ├── CURRENT
                    │       │   ├── LOCK
                    │       │   ├── LOG
                    │       │   └── MANIFEST-000000
                    │       ├── cs.wal
                    │       │   └── wal
                    │       ├── evidence.db
                    │       │   ├── 000001.log
                    │       │   ├── CURRENT
                    │       │   ├── LOCK
                    │       │   ├── LOG
                    │       │   └── MANIFEST-000000
                    │       ├── priv_validator_state.json
                    │       ├── state.db
                    │       │   ├── 000001.log
                    │       │   ├── CURRENT
                    │       │   ├── LOCK
                    │       │   ├── LOG
                    │       │   └── MANIFEST-000000
                    │       └── tx_index.db
                    │           ├── 000001.log
                    │           ├── CURRENT
                    │           ├── LOCK
                    │           ├── LOG
                    │           └── MANIFEST-000000
                    ├── dynamic.env
                    ├── fendermint
                    │   ├── data
                    │   │   └── rocksdb
                    │   │       ├── 000004.log
                    │   │       ├── CURRENT
                    │   │       ├── IDENTITY
                    │   │       ├── LOCK
                    │   │       ├── LOG
                    │   │       ├── MANIFEST-000005
                    │   │       ├── OPTIONS-000013
                    │   │       └── OPTIONS-000015
                    │   ├── logs
                    │   │   └── fendermint.2024-02-25.log
                    │   └── snapshots
                    ├── keys
                    │   ├── cometbft-node-id
                    │   ├── fendermint-peer-id
                    │   ├── network_key.pk
                    │   └── network_key.sk
                    └── static.env

linear bot commented Feb 23, 2024

@aakoshh aakoshh force-pushed the 578-integ-part5 branch 3 times, most recently from 278f384 to 83ea43d Compare February 23, 2024 20:32
@aakoshh aakoshh marked this pull request as ready for review February 25, 2024 23:51
@aakoshh aakoshh mentioned this pull request Feb 26, 2024
18 tasks
@aakoshh aakoshh force-pushed the 578-integ-part4 branch 2 times, most recently from 19bf736 to f18ad79 Compare February 26, 2024 10:50
Base automatically changed from 578-integ-part4 to main February 26, 2024 19:41
fendermint/testing/materializer/src/docker/dropper.rs
Comment on lines +24 to +25
    docker: Docker,
    dropper: DropHandle,
Contributor

+1 for splitting this up!

Contributor Author

Yeah, it felt like both are needed most of the time, but it was very awkward.

    ///
    /// The loop will exit when all clones of the sender channel have been dropped.
    pub fn start(docker: Docker) -> DropHandle {
        let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel();
Contributor

Ah, so this is a nice Rust way to implement the producer/consumer pattern between async tasks, nice!

Contributor Author

Yes, I'd say this is the more common practice, and then with the select! macro in tokio you can do a whole lot of stuff like prioritising certain queues, doing timeouts, like here. The STM stuff is way less common. What I did with the runtime handle compiled but didn't run.
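
For readers following the thread, the channel-based cleanup can be sketched roughly as below. This is a minimal sketch only, not the code in dropper.rs: it swaps tokio's unbounded channel and task for std::sync::mpsc and a thread so it stays self-contained, and DropCommand, Container, and the recorded strings are hypothetical stand-ins for the real Docker removal calls.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical cleanup orders sent to the background worker.
enum DropCommand {
    RemoveContainer(String),
    RemoveNetwork(String),
}

/// Stand-in for the PR's DropHandle: here it is just the sending half of the channel.
type DropHandle = Sender<DropCommand>;

/// Start the worker. It returns what it "removed" so the sketch can be inspected;
/// a real worker would call the Docker API instead of recording strings.
fn start() -> (DropHandle, JoinHandle<Vec<String>>) {
    let (tx, rx) = channel::<DropCommand>();
    let worker = thread::spawn(move || {
        let mut removed = Vec::new();
        // recv() returns Err once every clone of the sender has been dropped,
        // which is what ends the loop.
        while let Ok(cmd) = rx.recv() {
            match cmd {
                DropCommand::RemoveContainer(name) => removed.push(format!("container:{name}")),
                DropCommand::RemoveNetwork(name) => removed.push(format!("network:{name}")),
            }
        }
        removed
    });
    (tx, worker)
}

/// A resource that asks the worker to clean it up when it goes out of scope.
struct Container {
    name: String,
    dropper: DropHandle,
}

impl Drop for Container {
    fn drop(&mut self) {
        // Drop cannot await or return errors, so it only enqueues the order
        // and ignores a closed channel.
        let _ = self.dropper.send(DropCommand::RemoveContainer(self.name.clone()));
    }
}
```

The point of the design is that Drop stays synchronous and cheap, while the potentially slow removal happens on the worker, in the order the commands were sent.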


    lazy_static! {
        static ref CI_PROFILE: bool = std::env::var("PROFILE").unwrap_or_default() == "ci";
        static ref STARTUP_WAIT_SECS: u64 = if *CI_PROFILE { 20 } else { 15 };
Contributor

because of slow CI I assume?

Contributor Author

Yeah it was traditionally slower than the local one, which is why this ci.env existed in the cargo make TOML files. It probably shouldn't be hardcoded, and there isn't even that much of a difference, but still, here we are.
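
One possible way to avoid the hardcoded values is sketched below, purely as an illustration. The function name startup_wait_secs and the idea of an explicit override value are assumptions, not something in this PR; only the PROFILE variable and the 20/15 second defaults come from the snippet above.

```rust
/// Pick the startup wait in seconds.
/// `profile` is the value of PROFILE, if set; `override_secs` is a hypothetical
/// explicit override (for example read from an environment variable).
fn startup_wait_secs(profile: Option<&str>, override_secs: Option<&str>) -> u64 {
    // An override wins if it parses as a number; anything else is ignored.
    if let Some(secs) = override_secs.and_then(|s| s.trim().parse::<u64>().ok()) {
        return secs;
    }
    // Otherwise fall back to the defaults from the snippet above.
    if profile == Some("ci") {
        20
    } else {
        15
    }
}
```

A caller could pass `std::env::var("PROFILE").ok().as_deref()` as the first argument; the variable to read for the override is left open here.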

Comment on lines 62 to 66
    F: for<'a> FnOnce(
            &Manifest,
            &mut DockerMaterializer,
            &'a mut Testnet<DockerMaterials, DockerMaterializer>,
        ) -> Pin<Box<dyn Future<Output = anyhow::Result<()>> + 'a>>,
Contributor

Djeesus 😵‍💫, that is some advanced Rust lifetime syntax!

Haven't seen this for<'a> before, looked it up [here](https://doc.rust-lang.org/stable/reference/trait-bounds.html#higher-ranked-trait-bounds), sounds like fun!

Contributor Author

I'm not used to them at all either! The only other place I have used this was here, where I read impl<T> ResolvePool<T> where for<'a> ResolveKey: From<&'a T> as "there exists a transformation from any reference to T to ResolveKey".

I didn't even know whether what's happening here was possible. The situation is very similar to the interpreters, which consume some State as input and then return it as output, partly because I didn't know this could work. The compromise there was that if the interpreter fails and returns an Err, the state is lost; that was okay because any such error can result in a panic and shut down Fendermint after logging. Here, however, we want to tear down the testnet, so I need it even if there is an error, and I can't expect it to be returned along with an error.

So this syntax means something like: whatever lifetime the testnet has, the future returned has the exact same lifetime, so it cannot outlive the testnet it borrowed. I tried first with two lifetimes and tried to say 'a: 'b and that Future<...> + 'b but that didn't work.
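
A stripped-down sketch of the same idea, using only the standard library, may help. Everything here is a simplified stand-in: Testnet is a toy struct, the error type is String instead of anyhow::Error, and block_on is a naive busy-polling executor that is only suitable for futures that complete without waiting on I/O.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for futures that are ready when polled.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Naive executor for the sketch: polls in a loop until the future is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

/// Toy stand-in for the real testnet.
struct Testnet {
    nodes: Vec<String>,
    torn_down: bool,
}

/// A boxed future that may borrow from something living for 'a.
type BoxFut<'a> = Pin<Box<dyn Future<Output = Result<(), String>> + 'a>>;

/// The higher-ranked bound says: for any lifetime 'a of the borrow, the returned
/// future lives no longer than 'a. Once the future has run, the borrow is over
/// and the testnet is still ours to tear down, whether the test passed or failed.
fn with_testnet<F>(f: F) -> (Result<(), String>, Testnet)
where
    F: for<'a> FnOnce(&'a mut Testnet) -> BoxFut<'a>,
{
    let mut testnet = Testnet {
        nodes: vec!["node-1".to_string()],
        torn_down: false,
    };
    let result = block_on(f(&mut testnet));
    // Teardown happens regardless of the test outcome.
    testnet.torn_down = true;
    (result, testnet)
}
```

A caller passes a closure such as `|t| Box::pin(async move { t.nodes.push("node-2".to_string()); Ok::<(), String>(()) })`; the test body can fail with an Err and the testnet is still torn down afterwards.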

fendermint/testing/materializer/tests/docker.rs
        .context("failed to set up testnet")?;

    // Allow time for things to consolidate and blocks to be created.
    tokio::time::sleep(Duration::from_secs(*STARTUP_WAIT_SECS)).await;
Contributor

nit: Optimally we want to wait until all nodes are up, maybe we can poll here the RPC API on the cometBFT container until a block has been produced?

Contributor Author

Yeah I agree, this is very crude. I think the place to do it though is in the test, where we at least know what nodes should exist, and what interface is available to query. We can always assume CometBFT I suppose, but I'm not sure every test will always be able to progress, maybe some setups are deliberately not starting a node that would allow quorum.

Should we do these utilities as we add tests? I'm sure patterns will emerge.

Contributor Author

Maybe I can add a loop to wait until some general API responds from CometBFT, not necessarily block production.

Contributor Author

Done, just a loop that exercises all the APIs to see if they respond to the most basic query.
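
The general shape of such a readiness loop might look like the following. This is a synchronous, std-only sketch rather than the code added in the PR: wait_until and its parameters are hypothetical, and the probe closure stands in for whatever basic query is sent to each API.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Call `probe` repeatedly until it reports success or `timeout` has elapsed.
/// Returns true if the probe succeeded, false if the deadline passed first.
fn wait_until<F>(timeout: Duration, interval: Duration, mut probe: F) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        // Probe first, so an already-ready service costs no waiting at all.
        if probe() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}
```

Compared with a fixed sleep, this returns as soon as the services answer and still gives up after a bounded time, which keeps fast local runs fast and slow CI runs from failing early.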

Contributor

@fridrik01 fridrik01 left a comment

🥇

@aakoshh aakoshh merged commit 3211f79 into main Mar 1, 2024
17 checks passed
@aakoshh aakoshh deleted the 578-integ-part5 branch March 1, 2024 13:24
2 participants