ENG-578: IT (Part 5) - Docker start #744

aakoshh · 2024-02-23T14:02:01Z

Builds on #700

Trying to implement start_node.

TODO:

Writing tests

There are two modules to look at for how to write a test:

docker.rs: this is the entry point for all docker materializer based tests, and provides the with_testnet method to set up a testnet from a manifest, test it, log output, and tear it down.
root_only.rs: an example of a test module that uses the root_only.yaml manifest and the with_testnet method from docker.rs.

The tests are imported by docker.rs and not the other way around so they can be a single compilation unit. I also thought I'd be able to apply #[serial] in docker.rs but that doesn't seem to be the case, so don't forget to annotate tests with it, because the tests share a common materializer state file.

To write a test, create a manifest in testing/materializer/tests/manifests and refer to it by name. Try to return anyhow::Result rather than use .unwrap() and assert!(), otherwise with_testnet won't print the logs on CI. (I thought about adding something like a Drop handler, but logs are async and the test would already be panicking when we'd be trying to prevent other cleanups; it might work, but it's not straightforward.)

    #[serial_test::serial]
    #[tokio::test]
    async fn test_my_manifest() {
        with_testnet("my-manifest.yaml", |_manifest, _materializer, testnet| {
            let test = async {
                let my_node = testnet.root().node("my-node");
                let my_docker_node = testnet.node(&my_node)?;

                let my_provider = my_docker_node
                    .ethapi_http_provider()?
                    .ok_or_else(|| anyhow!("my-node has ethapi enabled"))?;

                let bn = my_provider.get_block_number().await?;

                if bn <= U64::one() {
                    bail!("expected higher blocks than genesis");
                }

                Ok(())
            };

            test.boxed_local()
        })
        .await
        .unwrap()
    }
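Why returning a Result matters can be shown with a simplified, synchronous stand-in. This is only a sketch: `Fixture`, `with_fixture` and the event strings are hypothetical and merely mimic the shape of with_testnet. The point is that the wrapper can see an Err and print logs before teardown, whereas a panic inside the closure would unwind past that code.

```rust
/// Hypothetical stand-in for a testnet; it only records what happened to it.
#[derive(Default)]
struct Fixture {
    events: Vec<String>,
}

/// Runs `test` against a fresh fixture, prints the "logs" if the test
/// returned an error, and always tears the fixture down afterwards.
fn with_fixture<F>(test: F) -> (Result<(), String>, Vec<String>)
where
    F: FnOnce(&mut Fixture) -> Result<(), String>,
{
    let mut fixture = Fixture::default();
    fixture.events.push("setup".to_string());

    let result = test(&mut fixture);

    if let Err(e) = &result {
        // Only reachable because the failure came back as a value.
        println!("test failed: {e}; printing logs");
        fixture.events.push("logs printed".to_string());
    }

    fixture.events.push("teardown".to_string());
    (result, fixture.events)
}

fn main() {
    let (res, events) =
        with_fixture(|_f| Err("expected higher blocks than genesis".to_string()));
    assert!(res.is_err());
    println!("{events:?}");
}
```

With `.unwrap()` or `assert!()` inside the closure, the branch that prints the logs would never run.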

Testing

cargo test -p fendermint_testing_materializer --test docker -- --nocapture

The test run leaves the files around for inspection:

❯ tree testing/materializer/tests/docker-materializer-data/
testing/materializer/tests/docker-materializer-data/
├── materializer-state.json
├── scripts
│   └── docker-entry.sh
└── testnets
    └── root-only
        ├── accounts
        │   ├── alice
        │   │   ├── eth-addr
        │   │   ├── fvm-addr
        │   │   ├── public.b64
        │   │   ├── public.hex
        │   │   ├── secret.b64
        │   │   └── secret.hex
        │   ├── bob
        │   │   ├── eth-addr
        │   │   ├── fvm-addr
        │   │   ├── public.b64
        │   │   ├── public.hex
        │   │   ├── secret.b64
        │   │   └── secret.hex
        │   └── charlie
        │       ├── eth-addr
        │       ├── fvm-addr
        │       ├── public.b64
        │       ├── public.hex
        │       ├── secret.b64
        │       └── secret.hex
        └── root
            ├── genesis.json
            └── nodes
                ├── node-1
                │   ├── cometbft
                │   │   ├── config
                │   │   │   ├── addrbook.json
                │   │   │   ├── config.toml
                │   │   │   ├── genesis.json
                │   │   │   ├── node_key.json
                │   │   │   └── priv_validator_key.json
                │   │   └── data
                │   │       ├── blockstore.db
                │   │       │   ├── 000001.log
                │   │       │   ├── CURRENT
                │   │       │   ├── LOCK
                │   │       │   ├── LOG
                │   │       │   └── MANIFEST-000000
                │   │       ├── cs.wal
                │   │       │   └── wal
                │   │       ├── evidence.db
                │   │       │   ├── 000001.log
                │   │       │   ├── CURRENT
                │   │       │   ├── LOCK
                │   │       │   ├── LOG
                │   │       │   └── MANIFEST-000000
                │   │       ├── priv_validator_state.json
                │   │       ├── state.db
                │   │       │   ├── 000001.log
                │   │       │   ├── CURRENT
                │   │       │   ├── LOCK
                │   │       │   ├── LOG
                │   │       │   └── MANIFEST-000000
                │   │       └── tx_index.db
                │   │           ├── 000001.log
                │   │           ├── CURRENT
                │   │           ├── LOCK
                │   │           ├── LOG
                │   │           └── MANIFEST-000000
                │   ├── dynamic.env
                │   ├── fendermint
                │   │   ├── data
                │   │   │   └── rocksdb
                │   │   │       ├── 000004.log
                │   │   │       ├── CURRENT
                │   │   │       ├── IDENTITY
                │   │   │       ├── LOCK
                │   │   │       ├── LOG
                │   │   │       ├── MANIFEST-000005
                │   │   │       ├── OPTIONS-000013
                │   │   │       └── OPTIONS-000015
                │   │   ├── logs
                │   │   │   └── fendermint.2024-02-25.log
                │   │   └── snapshots
                │   ├── keys
                │   │   ├── cometbft-node-id
                │   │   ├── fendermint-peer-id
                │   │   ├── network_key.pk
                │   │   ├── network_key.sk
                │   │   └── validator_key.sk
                │   └── static.env
                └── node-2
                    ├── cometbft
                    │   ├── config
                    │   │   ├── addrbook.json
                    │   │   ├── config.toml
                    │   │   ├── genesis.json
                    │   │   ├── node_key.json
                    │   │   ├── priv_validator_key.json
                    │   │   └── write-file-atomic-06193269378025858889
                    │   └── data
                    │       ├── blockstore.db
                    │       │   ├── 000001.log
                    │       │   ├── CURRENT
                    │       │   ├── LOCK
                    │       │   ├── LOG
                    │       │   └── MANIFEST-000000
                    │       ├── cs.wal
                    │       │   └── wal
                    │       ├── evidence.db
                    │       │   ├── 000001.log
                    │       │   ├── CURRENT
                    │       │   ├── LOCK
                    │       │   ├── LOG
                    │       │   └── MANIFEST-000000
                    │       ├── priv_validator_state.json
                    │       ├── state.db
                    │       │   ├── 000001.log
                    │       │   ├── CURRENT
                    │       │   ├── LOCK
                    │       │   ├── LOG
                    │       │   └── MANIFEST-000000
                    │       └── tx_index.db
                    │           ├── 000001.log
                    │           ├── CURRENT
                    │           ├── LOCK
                    │           ├── LOG
                    │           └── MANIFEST-000000
                    ├── dynamic.env
                    ├── fendermint
                    │   ├── data
                    │   │   └── rocksdb
                    │   │       ├── 000004.log
                    │   │       ├── CURRENT
                    │   │       ├── IDENTITY
                    │   │       ├── LOCK
                    │   │       ├── LOG
                    │   │       ├── MANIFEST-000005
                    │   │       ├── OPTIONS-000013
                    │   │       └── OPTIONS-000015
                    │   ├── logs
                    │   │   └── fendermint.2024-02-25.log
                    │   └── snapshots
                    ├── keys
                    │   ├── cometbft-node-id
                    │   ├── fendermint-peer-id
                    │   ├── network_key.pk
                    │   └── network_key.sk
                    └── static.env

linear · 2024-02-23T14:02:04Z

ENG-578 Bootstrap the integration testing framework

fendermint/testing/materializer/src/docker/dropper.rs

fridrik01 · 2024-02-28T16:54:54Z

fendermint/testing/materializer/src/docker/container.rs

+    docker: Docker,
+    dropper: DropHandle,


+1 for splitting this up!

Yeah, it felt like both are needed most of the time, but it was very awkward.

fridrik01 · 2024-02-28T17:00:15Z

fendermint/testing/materializer/src/docker/dropper.rs

+///
+/// The loop will exit when all clones of the sender channel have been dropped.
+pub fn start(docker: Docker) -> DropHandle {
+    let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel();


Ah, so this is a nice Rust way to implement the producer/consumer pattern between async tasks, nice!

Yes, I'd say this is the more common practice, and then with the select! macro in tokio you can do a whole lot of stuff like prioritising certain queues, doing timeouts, like here. The STM stuff is way less common. What I did with the runtime handle compiled but didn't run.
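For readers new to the pattern, here is a minimal sketch of the same idea using std threads and std::sync::mpsc instead of tokio tasks, so it runs without extra crates. `DropCommand` and its variant are illustrative names, not the types from dropper.rs. It shows the property mentioned in the doc comment: the consumer loop exits once every clone of the sender has been dropped.

```rust
use std::sync::mpsc;
use std::thread;

/// Commands the consumer might receive; the variant name is illustrative only.
enum DropCommand {
    Remove(String),
}

/// Spawns a consumer thread and returns a sender plus the join handle.
/// The consumer loop ends once every clone of the sender has been dropped.
fn start() -> (mpsc::Sender<DropCommand>, thread::JoinHandle<Vec<String>>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut removed = Vec::new();
        // `recv` returns Err when all senders are gone, which ends the loop.
        while let Ok(DropCommand::Remove(name)) = rx.recv() {
            removed.push(name);
        }
        removed
    });
    (tx, handle)
}

fn main() {
    let (tx, handle) = start();
    let tx2 = tx.clone();
    tx.send(DropCommand::Remove("node-1".into())).unwrap();
    tx2.send(DropCommand::Remove("node-2".into())).unwrap();
    // Dropping both senders lets the consumer thread finish.
    drop(tx);
    drop(tx2);
    let removed = handle.join().unwrap();
    println!("{removed:?}");
}
```

The tokio version has the same shape, with `rx.recv().await` inside a spawned task; select! then allows adding timeouts or prioritising queues around that receive.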

fridrik01 · 2024-02-28T17:23:04Z

fendermint/testing/materializer/tests/docker.rs

+
+lazy_static! {
+    static ref CI_PROFILE: bool = std::env::var("PROFILE").unwrap_or_default() == "ci";
+    static ref STARTUP_WAIT_SECS: u64 = if *CI_PROFILE { 20 } else { 15 };


because of slow CI I assume?

Yeah it was traditionally slower than the local one, which is why this ci.env existed in the cargo make TOML files. It probably shouldn't be hardcoded, and there isn't even that much of a difference, but still, here we are.
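One way to avoid hardcoding the value could look like the sketch below. It keeps the 20/15 defaults from the diff and adds an optional override. The override and the `STARTUP_WAIT_SECS` environment variable name are assumptions for illustration, not something this PR implements.

```rust
/// Picks the startup wait in seconds.
/// `profile` mirrors the PROFILE variable from the diff; `override_secs` is a
/// hypothetical extra knob (e.g. read from an env var) that is NOT in the PR.
fn startup_wait_secs(profile: Option<&str>, override_secs: Option<&str>) -> u64 {
    // A parsable override wins over the profile-based default.
    if let Some(secs) = override_secs.and_then(|s| s.parse::<u64>().ok()) {
        return secs;
    }
    if profile == Some("ci") {
        20
    } else {
        15
    }
}

fn main() {
    let profile = std::env::var("PROFILE").ok();
    // Hypothetical variable name, used here only for illustration.
    let override_secs = std::env::var("STARTUP_WAIT_SECS").ok();
    let wait = startup_wait_secs(profile.as_deref(), override_secs.as_deref());
    println!("waiting {wait}s for startup");
}
```

Keeping the decision in a pure function makes it testable without touching the process environment.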

fridrik01 · 2024-02-28T17:29:03Z

fendermint/testing/materializer/tests/docker.rs

+    F: for<'a> FnOnce(
+        &Manifest,
+        &mut DockerMaterializer,
+        &'a mut Testnet<DockerMaterials, DockerMaterializer>,
+    ) -> Pin<Box<dyn Future<Output = anyhow::Result<()>> + 'a>>,


Djeesus 😵‍💫, that is some advanced Rust lifetime syntax!

Haven't seen this for<'a> before, looked it up [here](https://doc.rust-lang.org/stable/reference/trait-bounds.html#higher-ranked-trait-bounds), sounds like fun!

I'm not used to them at all either! The only other place I have used this was here, where I read impl<T> ResolvePool<T> where for<'a> ResolveKey: From<&'a T> as "there exists a transformation from any reference to T into a ResolveKey".
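A small sketch of that reading, with hypothetical `Key` and `Item` types standing in for ResolveKey and T. The bound has to hold for every lifetime because the function borrows a value it owns locally, and the caller cannot name that lifetime.

```rust
/// Hypothetical key type standing in for ResolveKey.
#[derive(Debug, PartialEq)]
struct Key(String);

/// Hypothetical item type from which a key can be derived by reference.
struct Item {
    name: String,
}

impl<'a> From<&'a Item> for Key {
    fn from(item: &'a Item) -> Self {
        Key(item.name.clone())
    }
}

/// Works for any `T` where a `Key` can be built from a reference to `T`
/// of *any* lifetime, including the short-lived local borrow taken here.
fn key_of<T>(value: T) -> Key
where
    for<'a> Key: From<&'a T>,
{
    let local = value; // owned locally, so the borrow below is short-lived
    Key::from(&local)
}

fn main() {
    let key = key_of(Item { name: "node-1".to_string() });
    println!("{key:?}");
}
```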

I didn't even know whether what's happening here was possible. This is a very similar situation to the interpreters, which consume some State as input and then return it as output, partly because I didn't know this could work. The compromise there was that if the interpreter fails and returns an Err, the state is lost; that was acceptable because any such error can result in a panic and shut down Fendermint after logging. Here, however, we want to tear down the testnet, so I need it even if there is an error, and I can't expect it to be returned along with the error.

So this syntax means something like: whatever lifetime the testnet has, the future returned has the exact same lifetime, so it cannot outlive the testnet it borrowed. I tried first with two lifetimes and tried to say 'a: 'b and that Future<...> + 'b but that didn't work.
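A std-only sketch of the same shape follows. `Net`, `with_net` and the tiny `block_on` executor are hypothetical stand-ins (the PR uses tokio and the real Testnet type). The closure receives a mutable borrow and returns a boxed future tied to that borrow, so the wrapper gets the value back for teardown whether or not the test failed.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for the testnet.
struct Net {
    log: Vec<&'static str>,
}

/// A boxed future that may borrow data for `'a`.
type TestFuture<'a> = Pin<Box<dyn Future<Output = Result<(), String>> + 'a>>;

/// Waker that does nothing; enough for the busy-polling executor below.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor for this sketch: polls in a loop until the future is
/// ready. Fine for futures that never really wait; not for production use.
fn block_on<'a, T>(mut fut: Pin<Box<dyn Future<Output = T> + 'a>>) -> T {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

/// For whatever lifetime `'a` the borrow has, the returned future may not
/// outlive it; this is what the `for<'a>` bound expresses.
fn with_net<F>(test: F) -> (Result<(), String>, Vec<&'static str>)
where
    F: for<'a> FnOnce(&'a mut Net) -> TestFuture<'a>,
{
    let mut net = Net { log: vec!["setup"] };
    // The future borrows `net` only until it completes...
    let result = block_on(test(&mut net));
    // ...so `net` is usable again here, even if the test failed.
    net.log.push("teardown");
    (result, net.log)
}

fn main() {
    let (res, log) = with_net(|net| {
        Box::pin(async move {
            net.log.push("test ran");
            Err::<(), String>("boom".to_string())
        })
    });
    println!("{res:?} {log:?}");
}
```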

fridrik01 · 2024-02-28T17:40:08Z

fendermint/testing/materializer/tests/docker.rs

+        .context("failed to set up testnet")?;
+
+    // Allow time for things to consolidate and blocks to be created.
+    tokio::time::sleep(Duration::from_secs(*STARTUP_WAIT_SECS)).await;


nit: Optimally we want to wait until all nodes are up, maybe we can poll here the RPC API on the cometBFT container until a block has been produced?

Yeah I agree, this is very crude. I think the place to do it though is in the test, where we at least know what nodes should exist, and what interface is available to query. We can always assume CometBFT I suppose, but I'm not sure every test will always be able to progress, maybe some setups are deliberately not starting a node that would allow quorum.

Should we do these utilities as we add tests? I'm sure patterns will emerge.

Maybe I can add a loop to wait until some general API responds from CometBFT, not necessarily block production.

Done, just a loop that exercises all the APIs to see if they respond to the most basic query.
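The general shape of such a wait loop, as a hedged sketch rather than the code from the commit: `wait_until` and its `probe` closure are hypothetical, and the probe stands in for "send the most basic query to an API and see whether it answers".

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Calls `probe` until it returns Ok or `timeout` elapses, sleeping
/// `interval` between attempts. Returns the number of attempts on success.
fn wait_until<F>(timeout: Duration, interval: Duration, mut probe: F) -> Result<u32, String>
where
    F: FnMut() -> Result<(), String>,
{
    let start = Instant::now();
    let mut attempts = 0;
    loop {
        attempts += 1;
        match probe() {
            Ok(()) => return Ok(attempts),
            // Give up once the deadline has passed, keeping the last error.
            Err(e) if start.elapsed() >= timeout => {
                return Err(format!("gave up after {attempts} attempts: {e}"));
            }
            Err(_) => sleep(interval),
        }
    }
}

fn main() {
    // Simulated API that starts responding on the third call.
    let mut calls = 0;
    let result = wait_until(Duration::from_secs(1), Duration::from_millis(10), || {
        calls += 1;
        if calls < 3 {
            Err("not ready".to_string())
        } else {
            Ok(())
        }
    });
    println!("{result:?}");
}
```

Compared to a fixed sleep, this returns as soon as the probe succeeds and still fails with a clear error if the node never comes up.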

Co-authored-by: Friðrik Ásmundsson <[email protected]>

fridrik01

🥇

aakoshh force-pushed the 578-integ-part5 branch 3 times, most recently from 278f384 to 83ea43d Compare February 23, 2024 20:32

aakoshh marked this pull request as ready for review February 25, 2024 23:51

aakoshh requested review from raulk, fridrik01 and cryptoAtwill February 25, 2024 23:55

aakoshh mentioned this pull request Feb 26, 2024

ENG-578: IT (Part 4) - Docker #700

Merged

18 tasks

aakoshh force-pushed the 578-integ-part5 branch from 7fdf31f to 8e9180e Compare February 26, 2024 10:19

aakoshh force-pushed the 578-integ-part4 branch 2 times, most recently from 19bf736 to f18ad79 Compare February 26, 2024 10:50

aakoshh force-pushed the 578-integ-part5 branch from 8e9180e to e3858ff Compare February 26, 2024 10:58

Base automatically changed from 578-integ-part4 to main February 26, 2024 19:41

aakoshh force-pushed the 578-integ-part5 branch from c7908e4 to 63d1727 Compare February 26, 2024 19:43

aakoshh added 15 commits February 27, 2024 09:34

ENG-578: Try capture docker output

407ed24

ENG-578: Configure dyanmic env vars

6af6069

ENG-578: Start containers

c36e8f3

ENG-578: Test manifest

33bc862

ENG-578: Try materialize a rootnet

702b524

ENG-578: Refactor tests. Remove testnet

0f3b6d4

ENG-578: Use channel with drop commands

07dc0b9

ENG-578: Fix dropping a testnet

6d640d5

ENG-578: Fixing and debugging

a475958

ENG-578: Serial tests

8d61d06

ENG-578: Try connecting to the network

95e74b5

ENG-578: Trying to remove by ID

4ed3172

ENG-578: Don't remove the link

906fc3b

ENG-578: Drop nodes first

b505a9b

ENG-578: Separate out DockerRunner

80621cf

aakoshh added 15 commits February 27, 2024 09:35

ENG-578: Fix bottom-up check period

cdff3f6

ENG-578: Adjust sleep time

b720694

ENG-578: Remove accidental checkin

d7f5cd5

ENG-578: Lint

48405de

ENG-578: Fix opts

87ff2f6

ENG-578: Fix eth-addr

bd4d618

ENG-578: Try to free disk space before e2e tests

94e557c

ENG-578: Variable name for kill timeout

3375f06

ENG-578: Fix resumable file creation

f0bd411

ENG-578: Longer wait time on CI

2da0595

ENG-578: Pass mutable testnet to closure

eab95e9

ENG-578: Print logs if failed on CI

2226f45

ENG-578: Split out test from the util

b61e854

ENG-578: Rename data dir

2e56627

FIX: Typo

80216b9

aakoshh force-pushed the 578-integ-part5 branch from 63d1727 to 80216b9 Compare February 27, 2024 09:35

ENG-578: Rename to fendermint_materializer

5a57215

aakoshh force-pushed the 578-integ-part5 branch from 84a09d4 to 5a57215 Compare February 27, 2024 12:25

ENG-578: lazy_static non-optional

ffc7f7a

aakoshh mentioned this pull request Feb 28, 2024

ENG-578: IT (Part 6) - Materializer CLI #753

Merged

fridrik01 reviewed Feb 28, 2024

aakoshh mentioned this pull request Feb 29, 2024

DEBUG: Log panics so they show up in the log files, not just the console #759

Merged

aakoshh and others added 4 commits February 29, 2024 19:02

Update fendermint/testing/materializer/src/docker/dropper.rs

02fe4ae

Co-authored-by: Friðrik Ásmundsson <[email protected]>

Update fendermint/testing/materializer/tests/docker.rs

f8e4fc1

Co-authored-by: Friðrik Ásmundsson <[email protected]>

Update fendermint/testing/materializer/tests/docker.rs

94f377b

Co-authored-by: Friðrik Ásmundsson <[email protected]>

ENG-578: Wait in a loop for the APIs to start

4d47cf6

aakoshh force-pushed the 578-integ-part5 branch from 14f02b6 to 4d47cf6 Compare February 29, 2024 20:19

fridrik01 approved these changes Mar 1, 2024

aakoshh merged commit 3211f79 into main Mar 1, 2024
17 checks passed

aakoshh deleted the 578-integ-part5 branch March 1, 2024 13:24
