At some point recently, telemetry.polkadot.io went down with lots of errors like:
2022-09-30 10:33:26,536 ERROR [telemetry_core::aggregator::inner_loop] Cannot find ID for node with shard/connectionId of ConnId(5)/ShardNodeId(174701)
2022-09-30 10:33:26,538 ERROR [telemetry_core::aggregator::inner_loop] Cannot find ID for node with shard/connectionId of ConnId(1)/ShardNodeId(217267)
2022-09-30 10:33:26,905 ERROR [telemetry_core::aggregator::inner_loop] Cannot find ID for node with shard/connectionId of ConnId(1)/ShardNodeId(217346)
2022-09-30 10:33:27,001 ERROR [telemetry_core::aggregator::inner_loop] Cannot find ID for node with shard/connectionId of ConnId(5)/ShardNodeId(174702)
2022-09-30 10:33:27,001 ERROR [telemetry_core::aggregator::inner_loop] Cannot find ID for node with shard/connectionId of ConnId(5)/ShardNodeId(174702)
2022-09-30 10:33:27,070 ERROR [telemetry_core::aggregator::inner_loop] Cannot find ID for node with shard/connectionId of ConnId(2)/ShardNodeId(217363)
2022-09-30 10:33:27,070 ERROR [telemetry_core::aggregator::inner_loop] Cannot find ID for node with shard/connectionId of ConnId(2)/ShardNodeId(217363)
2022-09-30 10:33:27,202 ERROR [telemetry_core::aggregator::inner_loop] Cannot find ID for node with shard/connectionId of ConnId(2)/ShardNodeId(217364)
2022-09-30 10:33:27,204 ERROR [telemetry_core::aggregator::inner_loop] Cannot find ID for node with shard/connectionId of ConnId(2)/ShardNodeId(217364)
2022-09-30 10:33:27,834 ERROR [telemetry_core::aggregator::inner_loop] Cannot find ID for node with shard/connectionId of ConnId(5)/ShardNodeId(174703)
2022-09-30 10:33:27,834 ERROR [telemetry_core::aggregator::inner_loop] Cannot find ID for node with shard/connectionId of ConnId(5)/ShardNodeId(174703)
2022-09-30 10:33:28,577 ERROR [telemetry_core::aggregator::inner_loop] Cannot find ID for node with shard/connectionId of ConnId(5)/ShardNodeId(174704)
2022-09-30 10:33:28,577 ERROR [telemetry_core::aggregator::inner_loop] Cannot find ID for node with shard/connectionId of ConnId(5)/ShardNodeId(174704)
2022-09-30 10:33:28,680 ERROR [telemetry_core::aggregator::inner_loop] Cannot find ID for node with shard/connectionId of ConnId(3)/ShardNodeId(217030)
2022-09-30 10:33:29,421 ERROR [telemetry_core::aggregator::inner_loop] Cannot find ID for node with shard/connectionId of ConnId(3)/ShardNodeId(216564)
2022-09-30 10:33:29,458 ERROR [telemetry_core::aggregator::inner_loop] Cannot find ID for node with shard/connectionId of ConnId(3)/ShardNodeId(217031)
2022-09-30 10:33:29,458 ERROR [telemetry_core::aggregator::inner_loop] Cannot find ID for node with shard/connectionId of ConnId(3)/ShardNodeId(217031)
^C
Restarting the telemetry-core pod didn't help.
Restarting the shards made things work again.
These errors imply that shards were sending information about nodes that the core knew nothing about.
Is there a chance that the core was restarted at some point (perhaps due to being out of memory or whatnot) and the shards didn't properly handle this and send new node information?
Alternatively, is it possible that the connection between core and shards faltered and the core didn't properly clean up its internal state when this happened? (Right offhand I can't see anything that would drop all of the nodes in the core when a shard connection was lost).
The latter is also something that's a little harder to test locally (we'll have tested restarting shards and core plenty). Perhaps #497 also arose as a result of some connection issue like this that led to duplicates not being cleaned up?
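To make the two hypotheses a bit more concrete, here's a rough sketch of the failure mode. This is not the actual `telemetry_core` code: `CoreState`, `add_node`, `update_node` and `shard_disconnected` are all hypothetical names, and the real aggregator state is certainly more involved. It only assumes that the core maps a `(ConnId, ShardNodeId)` pair to some global node ID and logs the error above when that lookup fails.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the IDs seen in the log lines above;
// these are NOT the real telemetry_core types.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ConnId(u64);
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ShardNodeId(u64);
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(u64);

/// Assumed shape of the core's bookkeeping: one global ID per
/// (shard connection, shard-local node ID) pair.
#[derive(Default)]
struct CoreState {
    next_id: u64,
    node_ids: HashMap<(ConnId, ShardNodeId), NodeId>,
}

impl CoreState {
    /// A shard announces a new node; the core assigns it a global ID.
    fn add_node(&mut self, conn: ConnId, local: ShardNodeId) -> NodeId {
        let id = NodeId(self.next_id);
        self.next_id += 1;
        self.node_ids.insert((conn, local), id);
        id
    }

    /// A shard sends an update for a node it believes the core knows about.
    /// If the core has no entry (e.g. it restarted and the shard never
    /// re-announced the node), we get the error from the logs.
    fn update_node(&self, conn: ConnId, local: ShardNodeId) -> Result<NodeId, String> {
        self.node_ids.get(&(conn, local)).copied().ok_or_else(|| {
            format!(
                "Cannot find ID for node with shard/connectionId of {:?}/{:?}",
                conn, local
            )
        })
    }

    /// What cleanup on a lost shard connection might look like: drop every
    /// node that arrived via that connection. Returns how many were removed.
    fn shard_disconnected(&mut self, conn: ConnId) -> usize {
        let before = self.node_ids.len();
        self.node_ids.retain(|key, _| key.0 != conn);
        before - self.node_ids.len()
    }
}
```

In this model, swapping in a fresh `CoreState::default()` (a core restart) while a shard keeps calling `update_node` with its old IDs reproduces the error on every message until the shard re-sends `add_node`, which would fit with restarting the shards fixing things. Conversely, if something like `shard_disconnected` ran on one side of a flaky connection but the shard didn't notice and re-announce its nodes, we'd see the same thing.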