
Harvester Stopped Reporting #114

Open
Jahorse opened this issue Oct 21, 2023 · 2 comments

Jahorse commented Oct 21, 2023

I'm using chia-exporter v0.10.0 running as a service on Ubuntu 22.04.3 with Chia v2.0.0.

My chia-exporter and my harvester both seem to be running fine, but the harvester metrics the exporter reports haven't updated in a week. It looks like there was an i/o timeout a week ago, and after that only chia_full_node activity was being logged, even though this machine only runs a harvester. I see this in the log:

Oct 14 16:35:42 ... level=info msg="recv: chia_harvester farming_info\n"
Oct 14 16:35:45 ... level=info msg="cron: chia_full_node updating file sizes"
Oct 14 16:36:02 ... level=info msg="cron: chia_full_node updating file sizes"
Oct 14 16:36:02 ... write tcp 127.0.0.1:60624->127.0.0.1:55400: i/o timeout
Oct 14 16:36:02 ... level=error msg="write tcp 127.0.0.1:60624->127.0.0.1:55400: i/o timeout\n"
Oct 14 16:36:02 ... level=error msg="write tcp 127.0.0.1:60624->127.0.0.1:55400: i/o timeout\n"
Oct 14 16:36:02 ... Trying to reconnect...
Oct 14 16:39:35 ... Reconnected!
Oct 14 16:39:35 ... level=info msg="cron: chia_full_node updating file sizes"

From that point on, the log is filled with nothing but those chia_full_node updating file sizes lines. Before that, I saw everything, including the chia_harvester events.

Restarting the service fixed the issue, but it kind of sucks that all the counters got reset. Also, I'd need to be paying really close attention to catch when this happens: the exporter continues to serve metrics, they just don't update.
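
In the meantime, alerting on a harvester counter going flat could catch this without watching the logs. This is just a sketch, and chia_harvester_farming_info is a placeholder name; you'd pick an actual counter from the exporter's /metrics output that increments on every farming_info event:

# Fires when a normally busy harvester counter hasn't changed in 15 minutes.
# chia_harvester_farming_info is a placeholder; substitute a real counter name.
changes(chia_harvester_farming_info[15m]) == 0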

@cmmarslender
Contributor

Thanks for reporting. I think this is happening for another person internally as well. I'm not entirely sure what causes it, but I can look and see if it's possible to detect when events stop coming in and do something about it.

As far as counter resets go, that's just a quirk of how Prometheus counters work. Any metrics here that are counters you'd typically want to query with rate() or similar Prometheus functions that account for counter resets: https://prometheus.io/docs/prometheus/latest/querying/functions/#rate
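
For example (just a sketch with a placeholder metric name; substitute one of the exporter's actual counters):

# Per-second increase over the last 5 minutes; rate() treats any drop in
# the counter as a reset, so restarts don't show up as negative spikes.
rate(chia_harvester_proofs_found_total[5m])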


Jahorse commented Oct 21, 2023

Thanks, yeah, I'm not too worried about the counters resetting or about why it's happening; it would just be nice if the exporter could detect when a service's metrics are no longer being received and restart whatever it needs to fix itself.
