[Bug]: Mainnet state exported localnet requires second node after upgrading to SDK 47 #17078

sampocs · 2023-07-20T14:18:50Z

Context

For each Stride upgrade, the upgrade is tested on a mainnet-state-exported local network in Docker (docs). Osmosis does the same (which is where localstride is borrowed from), and I assume many other chains follow a similar testing process.

Summary of Bug

When testing the upgrade from SDK 46 to 47, the upgrade itself succeeded, but the network halted immediately afterward. The fix wound up being to start a 2nd node and peer it with the first, which jump-started the first node. Once blocks started churning again, we were able to turn off the 2nd node.

Considering we've run this mainnet-state-exported upgrade process on all prior upgrades without seeing this issue, I'm led to believe it's something related to SDK 47.

Version

Upgrading from v0.46.7 to v0.47.3

Steps to Reproduce

The steps to reproduce are a bit complex. They involve following this guide, starting with Stride binary version v9.2.1 and upgrading to version v10.0.0.

I'm mostly posting this for awareness to other teams that test their upgrades with mainnet state exported testnets. That said, if you would like to debug this, I'm happy to hop on a call and walk through the setup to reproduce!

alexanderbez · 2023-07-20T15:31:20Z

Hi @sampocs, thanks for posting! Curious, what makes you think a 2nd node was needed? Obviously it worked after the 2nd node was started, but what indicated that this was necessary?

What were the logs from the 1st node? Was it stuck? Was it trying to produce a block?

sampocs · 2023-07-20T21:28:31Z

Unfortunately I don't have the logs handy anymore 😞. But it was producing endless p2p logs (which was the first hint).

The logs did show that the upgrade was successful and I added logs to the begin/end blocker that showed it completed the block that corresponded to the upgrade height. So this gave me the hunch that the issue was not related to any specific upgrade handler code and was likely a networking issue (this was later confirmed as this upgrade was successful on mainnet).

Truth be told, the real idea for adding a 2nd node came from an Osmosis engineer who I was debugging this with. Back in the day, he always had to add a 2nd node to these mainnet upgrade tests, before he realized that he could just disable fast sync instead. But in our case, fast sync was already disabled, and adding a 2nd node was more of a last-ditch guess, since nothing else we had tried could get it to run 😂. And neither of us has any guess as to why this solved it.

Apologies, I know that's not super helpful!

tac0turtle · 2024-02-20T09:53:38Z

This is an open issue on CometBFT; there isn't anything we can do in the SDK here.

sampocs added the T:Bug label Jul 20, 2023

tac0turtle closed this as completed Feb 20, 2024
