For each Stride upgrade, the upgrade is tested on a mainnet-state-exported local network in Docker (docs). Osmosis does the same (which is where localstride is borrowed from), and I assume many other chains follow a similar testing process.
Summary of Bug
When testing the upgrade from SDK 46 to 47, the upgrade itself passed successfully, but the network halted immediately afterward. The fix wound up being to start a 2nd node and peer it with the first, which jump-started the first node. Once blocks started churning again, we were able to shut the 2nd node down.
Considering we've run this mainnet-state-exported upgrade process on all prior upgrades without seeing this issue, I'm led to believe it's something related to SDK 47.
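For anyone trying the same workaround: it boils down to initializing a fresh second node and pointing it at the stalled one via persistent peers. A minimal sketch of the relevant `config.toml` settings on the second node (the node ID, host, and port are placeholders, not values from this setup):

```toml
# config.toml on the 2nd node — peer it with the stalled 1st node.
# The node ID comes from `strided tendermint show-node-id` on node 1;
# host and port are placeholders and depend on the local docker network.
[p2p]
persistent_peers = "<first-node-id>@<first-node-host>:26656"
```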
Version
Upgrading from v0.46.7 to v0.47.3
Steps to Reproduce
The steps to reproduce are a bit complex. It involves following this guide, and starting with stride binary version v9.2.1 and upgrading to version v10.0.0.
I'm mostly posting this for awareness to other teams that test their upgrades with mainnet state exported testnets. That said, if you would like to debug this, I'm happy to hop on a call and walk through the setup to reproduce!
Hi @sampocs, thanks for posting! Curious, what makes you think a 2nd node was needed? Obviously it worked after the 2nd node was started, but what indicated that this was necessary?
What were the logs from the 1st node? Was it stuck? Was it trying to produce a block?
Unfortunately I don't have the logs handy anymore 😞. But it was producing endless p2p logs (which was the first hint).
The logs did show that the upgrade was successful, and I added logs to the begin/end blockers showing that the block corresponding to the upgrade height completed. That gave me the hunch that the issue wasn't related to any specific upgrade handler code and was likely a networking issue (this was later confirmed, as the upgrade succeeded on mainnet).
Truth be told, the real idea for adding a 2nd node came from an Osmosis engineer I was debugging this with. Back in the day, he always had to add a 2nd node to these mainnet upgrade tests, before he realized he could just disable fast sync instead. But in our case, fast sync was already disabled, and adding a 2nd node was more of a last-ditch guess, since nothing else we had tried could get it to run 😂. And neither of us has any guess as to why this solved it.
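For reference, the fast-sync toggle mentioned above lives in the node's `config.toml`. In the Tendermint v0.34 line (used by SDK 0.46) it is a top-level key; later CometBFT releases rename it, so treat the exact key name as version-dependent:

```toml
# Tendermint v0.34-era config.toml: disable fast sync so a lone
# validator proceeds straight to consensus instead of waiting to
# sync blocks from peers it doesn't have.
fast_sync = false
```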