Occasionally we experience stability problems with our internal network, which cause Akka heartbeats to take a long time. This sometimes leads to SBR shutting down the cluster, with a log message like this:
"SBR detected instability and will down all nodes: reachability changed 1 times since 35285,0482 ms ago, latest change was 35285,0482 ms ago"
Wouldn't it be useful if Phobos exposed metrics showing gossip response times?
I think this would be helpful. The only tricky part is measuring this from outside the cluster heartbeat actors, since they are the ones that measure it. Probably the best way to handle this would be to modify some of the system actors to emit events when things are happening, so that measurement can occur out of band. This is what we did for tracking the total number of live actors, starts, and stops.
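To make the "emit events, measure out of band" idea concrete, here is a rough, library-neutral sketch in plain Python. It is not Akka.NET or Phobos code: `HeartbeatRttMeasured`, `EventStream`, `HeartbeatSender`, and `RttMetrics` are all hypothetical names invented for illustration, and the real heartbeat actors may be structured quite differently. The point is only that the internal component gains a single publish call, while all metric bookkeeping lives in a separate subscriber.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class HeartbeatRttMeasured:
    """Hypothetical event: one heartbeat round trip to a peer node."""
    peer: str
    rtt_ms: float


class EventStream:
    """Tiny in-process publish/subscribe bus (a stand-in, not a real actor-system API)."""

    def __init__(self) -> None:
        self._subscribers: Dict[type, List[Callable[[object], None]]] = {}

    def subscribe(self, event_type: type, handler: Callable[[object], None]) -> None:
        self._subscribers.setdefault(event_type, []).append(handler)

    def publish(self, event: object) -> None:
        # Deliver the event to every handler registered for its exact type.
        for handler in self._subscribers.get(type(event), []):
            handler(event)


class HeartbeatSender:
    """Stand-in for the internal heartbeat component.

    The only instrumentation added to it is the publish call in on_response.
    """

    def __init__(self, stream: EventStream) -> None:
        self._stream = stream
        self._sent_at: Dict[str, float] = {}

    def send(self, peer: str, now_ms: float) -> None:
        self._sent_at[peer] = now_ms

    def on_response(self, peer: str, now_ms: float) -> None:
        sent: Optional[float] = self._sent_at.pop(peer, None)
        if sent is not None:  # ignore responses with no matching heartbeat
            self._stream.publish(HeartbeatRttMeasured(peer, now_ms - sent))


class RttMetrics:
    """Out-of-band subscriber: collects round-trip samples per peer."""

    def __init__(self) -> None:
        self.samples: Dict[str, List[float]] = defaultdict(list)

    def record(self, event: HeartbeatRttMeasured) -> None:
        self.samples[event.peer].append(event.rtt_ms)

    def max_rtt(self, peer: str) -> float:
        return max(self.samples[peer])


# Wiring: the metrics collector subscribes; the sender knows nothing about it.
stream = EventStream()
metrics = RttMetrics()
stream.subscribe(HeartbeatRttMeasured, metrics.record)

sender = HeartbeatSender(stream)
sender.send("node-b", now_ms=0.0)
sender.on_response("node-b", now_ms=12.5)
print(metrics.max_rtt("node-b"))
```

With this shape, a monitoring layer could subscribe to the event and export it as a histogram or gauge without the heartbeat code depending on any metrics library.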