-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(#70): disable all non-custom Postgres metrics #79
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The postgres-exporter
container kept exiting with this log:
postgres_exporter: error: unknown long flag '--no-collector.postmaster', try --help
I must have had an old version of the postgres-exporter image because everything worked as expected after I ran docker pull prometheuscommunity/postgres-exporter:latest
Let's keep an eye on logs after this gets merged and deployed because I have a feeling we might encounter this on the server as well 😅
@jkuester Just a couple of thoughts about this PR:
|
The difference is big thank you for this improvement. |
Thanks for the feedback @lorerod!
Good call! I have logged medic/cht-docs#1143 to add documentation about the supported Postgres version. For the postgres-exporter things are a bit more tricky since we currently are not really tracking the version for the exporter (but just always testing against
This is a good point. Currently we do not have any automated tests that verify the postgres-exporter functionality. However, if/when those get added it would be valuable to run those tests against both the min version of Postgres and the latest. For now, it is possible to manually test connecting Watchdog to the min version of Postgres by following these steps: #59 (review)
Also a good thought (though probably not as high on the priority list yet). The good news is that the Allies Watchdog instance is configured to run against a large number of CHT instances as well as a large Postgres dataset. So, it provides a good manual check (it is how we identified this issue to begin with...)! |
🎉 This PR is included in version 1.8.2 🎉 The release is available on GitHub release Your semantic-release bot 📦🚀 |
Closes #70
The goal of this PR is to minimize the number of metrics collected from Postgres. This will address 2 problems at the same time! First, the random log errors and other various querying issues when running against crazy old Postgres versions are being triggered when collecting the various default metrics that the postgres_exporter collects about the Postgres instance and its databases. Turning off these metrics should avoid the errors and quiet the logs. Secondly, when testing this against the prod Postgres instance, we observed that metric scraping was insanely slow (well above 10sec). Radically reducing the amount of metrics we are actually collecting from Postgres should help improve the scrape time!
Note that none of the metrics that are being removed were being used in any dashboards or alerts. They were all simply monitoring data that was focused on the Postgres instance itself (and not directly connected to monitoring the health of the CHT instance). Some of these metrics might be valuable to track in the future (to help understand the health of the Postgres instance), but they can be re-enabled when they are needed. Currently, when running against the fake instance, the postgres_exporter scrape returns
1153
lines of data in its response to each scrape. After these changes, the scrape response is only20
lines (containing just our custom metrics and a few metrics directly connected to the performance of the exporter itself).Regarding the config changes, it may look redundant to disable all the collectors individually in addition to setting
--disable-default-metrics
and--disable-settings-metrics
. The main thing I can say here is that I tested various combinations and this was the way that ultimately resulted in the least amount of metrics being collected. Based on this comment I believe the postgres_exporter is being gradually being re-factored to use these "collectors" so that going forwards thedisable-*-metrics
flags will be removed, but for now thedisable-*-metrics
flags seem to disable the metrics that have not yet been refactored, while theno-collector
flags disable to new-style metrics. 🤷Testing considerations
The main thing to verify after these changes is that our custom
couch2pg_progress_sequence
metric is still being collected. This can be done by running Watchdog with the fake-cht configuration:CHT Admin Overview
dashboardCouch2Pg Backlog
panel (the value is dynamic, but it should not beNo data
)Additionally, if you want to manually check the response data from the postgres_exporter (e.g. to do a before/after comparison) here is how:
docker exec -it cht-watchdog-fake-cht-1 bash
curl "http://postgres-exporter:9187/probe?auth_module=postgres%3A5432%2Fcht&target=postgresql%3A%2F%2Fpostgres%3A5432%2Fcht"