Metrics about heartbeat times can be useful #79

Open
object opened this issue Sep 26, 2024 · 3 comments
Labels
opentelemetry OTel metrics and tracing support.

Comments

@object

object commented Sep 26, 2024

Occasionally we experience stability problems with our internal network, causing Akka heartbeats to take a long time and sometimes leading to SBR shutting down the cluster, with a log message like this:
"SBR detected instability and will down all nodes: reachability changed 1 times since 35285,0482 ms ago, latest change was 35285,0482 ms ago"

Wouldn't it be useful if Phobos exposed some of the metrics that showed gossip response times?

Aaronontheweb added the opentelemetry (OTel metrics and tracing support) label Sep 26, 2024
@Aaronontheweb
Member

I think this would be helpful. The only tricky part is measuring this from outside the cluster heartbeat actors, which are the ones that measure it. Probably the best way to handle this would be to modify some of the system actors to emit events when things happen, so that measurement can occur out of band. This is what we did for tracking the total number of live actors, starts, and stops.
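
To make the out-of-band idea concrete, here is a minimal sketch in Python rather than C#. It does not use any real Akka.NET or Phobos API: `EventStream`, `HeartbeatSender`, `HeartbeatRoundTrip`, and `HeartbeatMetrics` are all hypothetical names invented for illustration. The heartbeat component only publishes an event per completed round trip, and a separate listener aggregates the timings.

```python
import statistics
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class HeartbeatRoundTrip:
    """Hypothetical event: one heartbeat request/response pair completed."""
    remote_node: str
    rtt_ms: float


class EventStream:
    """Tiny in-process pub/sub, standing in for an actor system's event stream."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[HeartbeatRoundTrip], None]] = []

    def subscribe(self, handler: Callable[[HeartbeatRoundTrip], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: HeartbeatRoundTrip) -> None:
        for handler in self._subscribers:
            handler(event)


class HeartbeatSender:
    """Stand-in for a system actor: it emits events and keeps no metrics itself."""

    def __init__(self, stream: EventStream) -> None:
        self._stream = stream

    def on_heartbeat_response(self, remote_node: str,
                              sent_at_ms: float, received_at_ms: float) -> None:
        rtt = received_at_ms - sent_at_ms
        self._stream.publish(HeartbeatRoundTrip(remote_node, rtt))


class HeartbeatMetrics:
    """Out-of-band listener that aggregates round-trip times per remote node."""

    def __init__(self) -> None:
        self._samples: DefaultDict[str, List[float]] = defaultdict(list)

    def record(self, event: HeartbeatRoundTrip) -> None:
        self._samples[event.remote_node].append(event.rtt_ms)

    def max_rtt_ms(self, node: str) -> float:
        return max(self._samples[node])

    def mean_rtt_ms(self, node: str) -> float:
        return statistics.fmean(self._samples[node])


# Wiring: the metrics listener subscribes; the sender never references it.
stream = EventStream()
metrics = HeartbeatMetrics()
stream.subscribe(metrics.record)

sender = HeartbeatSender(stream)
sender.on_heartbeat_response("node-b", sent_at_ms=0.0, received_at_ms=12.0)
sender.on_heartbeat_response("node-b", sent_at_ms=1000.0, received_at_ms=1030.0)
```

The point of the split is that the heartbeat path stays unaware of whether anything is listening, so a metrics package could subscribe (or not) without changing the cluster code beyond the event emission itself.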

@Aaronontheweb
Member

FYI @object, I added a proposal for this here: akkadotnet/akka.net#7427

@object
Author

object commented Dec 20, 2024

👍
