
feat: add cht-sync collector and update the sql exporter example #111

Merged
merged 6 commits into main from add-cht-sync-collector on Oct 18, 2024

Conversation

njuguna-n
Contributor

Add a collector for CHT Sync

@njuguna-n
Contributor Author

@mrjones-plip please have another look now that the docs PR is merged.

@mrjones-plip
Contributor

I went to test this and started by setting up the all-inclusive CHT Sync, and hit a snag.

However, now I realize I don't have a good way of putting data into that standalone CouchDB instance, so maybe I'll try using an existing docker helper instance to test.

stay tuned!

@mrjones-plip (Contributor) left a comment

While I totally think this should work, I can't get it to work following the new docs we just published :/

I keep on getting the error in the cht-watchdog-sql_exporter-1 service:

ts=2024-09-05T21:14:26.875Z caller=klog.go:157 level=error func=Fatalf msg="Error creating exporter: unknown collector \"cht-sync\" referenced in job \"db_targets\""

Checking, this looks right, as the collectors entry here matches where we define the collector.

renaming the collector file to exporters/postgres/cht-sync_collector.yml didn't help, and changing everything to either cht_sync or chtsync didn't fix it either 🤷

Happy to pair with you if you wanna see what I see!

@njuguna-n
Contributor Author

@mrjones-plip I needed to update collector_files as well. Give it another try now.
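
For context, sql_exporter only loads collector definitions from files matching a collector_files glob, and each job then references them by collector_name, so both pieces have to line up. A rough sketch (the file name, glob, and placeholder query below are illustrative, not necessarily the exact paths in this repo):

    # cht-sync collector file, e.g. exporters/postgres/cht-sync-collector.yml (name illustrative)
    collector_name: cht-sync
    metrics:
      - metric_name: couch2pg_progress_sequence
        type: counter
        help: 'current sequence number for couch2pg'
        values: [seq]
        query: |
          SELECT 1 AS seq  -- placeholder query for this sketch

    # sql_exporter.yml: the glob must match the collector file above, and the
    # job must reference the collector by the same name
    collector_files:
      - 'exporters/postgres/*collector.yml'  # illustrative glob
    jobs:
      - job_name: db_targets
        collectors: [cht-sync]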

@@ -29,4 +29,3 @@ jobs:
# sql server (eg "postgres-rds-prod", "postgres-rds-dev1" etc.)
# 'postgres://USERNAME:PASSWORD@DB_SERVER_IP/DATABASE:PORT
"db1": 'postgres://cht_couch2pg:[email protected]:5432/cht?sslmode=disable' # //NOSONAR - password is safe to commit
@mrjones-plip (Contributor) commented:

Is this ever going to be the cht database? I think, for the near term, it's always going to be data, yeah? I got this error initially:

ts=2024-10-01T22:28:33.507Z caller=klog.go:134 level=error func=Errorf msg="Error gathering metrics: [from Gatherer #1] [job=db_targets,target=db1] pq: database \"cht\" does not exist"

So I think this is the fix. Also, since we default to postgres as the user, I think we should use that here too, yeah? Finally, I think we can further demote couch2pg by just using password:

Suggested change
"db1": 'postgres://cht_couch2pg:cht_couch2pg_password@172.17.0.1:5432/cht?sslmode=disable' # //NOSONAR - password is safe to commit
"db1": 'postgres://postgres:password@172.17.0.1:5432/code?sslmode=disable' # //NOSONAR - password is safe to commit

@mrjones-plip (Contributor) left a comment

thanks for circling back on this!

I gave a suggestion to ease how much the user has to work (what is the DB name? cht? data? other?), but I am also getting this error now:

ts=2024-10-01T22:29:22.502Z caller=klog.go:134 level=error func=Errorf msg="Error gathering metrics: [from Gatherer #1] [job=db_targets,target=db1,collector=cht-sync,query=couch2pg-query] pq: syntax error at or near \"{\""

Steps to reproduce were:

  1. set up docker helper cht core instance
  2. set up bare bones watchdog to monitor that instance
  3. set up cht sync to monitor docker helper instance
  4. follow the steps to update watchdog to monitor CHT sync
  5. edit sql_servers.yml to have this target:
    "db1": 'postgres://postgres:[email protected]:5432/data?sslmode=disable'
    
  6. run compose command: docker compose -f docker-compose.yml -f exporters/postgres/compose.yml up -d
  7. check logs with docker logs cht-watchdog-sql_exporter-1

@njuguna-n
Contributor Author

@mrjones-plip the error is being caused by how we define the queries with environment variables. I tried updating them to something like ${POSTGRES_SCHEMA} and then using the envsubst command to replace the variables, but updating the run command to command: ["/bin/sh", "-c", "envsubst < /etc/sql_exporter/cht_sync_collector.yml > /tmp/sql_exporter/cht_sync_collector.yml && sql_exporter --config /etc/sql_exporter/cht_sync_collector.yml"] (and different variations of it) kept giving me errors.

A simple solution is to hard-code the default values as they are defined in CHT Sync and update the documentation to note that they should be changed if they differ. What do you think of this approach?
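
Concretely, hard-coding would mean replacing the templated schema reference in the collector queries with the CHT Sync default, roughly like this (the placeholder syntax and table name are illustrative; the v1 schema matches what is used later in this PR):

    -- before: the schema placeholder is never expanded, so Postgres sees the
    -- literal braces and fails with: syntax error at or near "{"
    SELECT MAX(saved_timestamp) AS latest FROM {{POSTGRES_SCHEMA}}.document_metadata;

    -- after: hard-coded CHT Sync default, with a docs note to change it
    -- if the deployment uses a different schema
    SELECT MAX(saved_timestamp) AS latest FROM v1.document_metadata;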

@mrjones-plip
Contributor

Sorry this took so long to get back to! I didn't even read it and thought I'd need to set up my whole test suite, when it was just a question 😭. I'll do better next time!

So I see that we have hard-coded variables (or simply no database/schema names?) in the dbt-latency query. As well, the old couch2pg query doesn't specify a schema. Can we just not specify it and have it implicitly use the right one based off the target definition? I suspect my naïveté around postgres specifics is showing here 😅

If we must hard-code a deeply linked v1 schema in the sql_servers_example.yml file, ultimately I think this is fine. I suspect this won't change often (less than once a year), and we can handle migrations from one schema to another as needed.

@njuguna-n
Contributor Author

Can we just not specify it and have it implicitly use the right one based off the target definition?

Failing to specify the schema would default the queries to running in the public schema, which would almost certainly be wrong. I would rather keep the hard-coded schema but make sure the docs say it should be changed if it is different.
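
To illustrate why, a minimal sketch (the schema and table names follow those used later in this PR):

    -- Unqualified: resolved via search_path, which defaults to "$user", public,
    -- so this would look for public.couchdb_progress and miss the CHT Sync data.
    SELECT MAX(updated_at) FROM couchdb_progress;

    -- Schema-qualified: always targets the CHT Sync schema.
    SELECT MAX(updated_at) FROM v1.couchdb_progress;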

@njuguna-n
Contributor Author

@mrjones-plip I have added the hard-coded schema and a comment stating the values should be updated as needed. Please review.

@mrjones-plip
Contributor

sounds good! I'll get to this review today for sure.

@mrjones-plip
Contributor

oh no! I might not get to this 'til tomorrow... I'll keep ya posted.

@mrjones-plip (Contributor) left a comment

OK!!! We're much closer. Thanks for pushing through on this!

I'm still seeing some errors though; see my deep dive in the comment below.

EXTRACT(EPOCH FROM(couchdb.latest - dbt_root.latest)) AS dbt_latency
FROM
(SELECT MAX(saved_timestamp) as latest FROM v1.document_metadata) dbt_root,
(SELECT MAX(saved_timestamp) as latest FROM v1.couchdb) couchdb
@mrjones-plip (Contributor) commented:

I'm getting an error:

ts=2024-10-16T22:22:52.357Z caller=klog.go:134 level=error func=Errorf msg="Error gathering metrics: [from Gatherer #1] [job=db_targets,target=db1,collector=cht-sync,query=dbt-latency] pq: relation \"v1.couchdb\" does not exist"
ts=2024-10-16T22:22:53.357Z caller=klog.go:134 level=error func=Errorf msg="Error gathering metrics: [from Gatherer #1] [job=db_targets,target=db1,collector=cht-sync,query=dbt-latency] pq: relation \"v1.couchdb\" does not exist"
ts=2024-10-16T22:22:54.356Z caller=klog.go:134 level=error func=Errorf msg="Error gathering metrics: [from Gatherer #1] [job=db_targets,target=db1,collector=cht-sync,query=dbt-latency] pq: relation \"v1.couchdb\" does not exist"

Doing some digging into the actual data in my DB, this might work?

Suggested change
(SELECT MAX(saved_timestamp) as latest FROM v1.couchdb) couchdb
(SELECT max(v1.couchdb_progress.updated_at) as latest FROM v1.couchdb_progress) couchdb
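
With that suggestion applied, the full dbt-latency query would read roughly:

    SELECT
      EXTRACT(EPOCH FROM (couchdb.latest - dbt_root.latest)) AS dbt_latency
    FROM
      (SELECT MAX(saved_timestamp) AS latest FROM v1.document_metadata) dbt_root,
      (SELECT MAX(v1.couchdb_progress.updated_at) AS latest FROM v1.couchdb_progress) couchdb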

Checking Prometheus targets at http://localhost:9090/targets?search= I see http://sql_exporter:9399/metrics marked as UP, and checking that URL I see the metrics I'd expect:

# HELP couch2pg_progress_pending approximate number of changes left to sync from couch to postgres
# TYPE couch2pg_progress_pending gauge
couch2pg_progress_pending{cht_instance="192-168-68-26.local-ip.medicmobile.org",db="medic",job="db_targets",target="db1"} 0
# HELP couch2pg_progress_sequence current sequence number for couch2pg
# TYPE couch2pg_progress_sequence counter
couch2pg_progress_sequence{cht_instance="192-168-68-26.local-ip.medicmobile.org",db="medic",job="db_targets",target="db1"} 221
# HELP couch2pg_up 1 if couch2pg is running and has updated in the last minute, 0 if not
# TYPE couch2pg_up gauge
couch2pg_up{cht_instance="192-168-68-26.local-ip.medicmobile.org",db="medic",job="db_targets",target="db1"} 1
# HELP dbt_execution_time dbt run last execution time (ms)
# TYPE dbt_execution_time gauge
dbt_execution_time{job="db_targets",table_name="contact",target="db1"} 0.09859967231750488
dbt_execution_time{job="db_targets",table_name="contact_type",target="db1"} 0.07459473609924316
dbt_execution_time{job="db_targets",table_name="data_record",target="db1"} 0.10951995849609375
dbt_execution_time{job="db_targets",table_name="dbt_results",target="db1"} 0.23592472076416016
dbt_execution_time{job="db_targets",table_name="document_metadata",target="db1"} 0.2093191146850586
dbt_execution_time{job="db_targets",table_name="patient",target="db1"} 0.0865786075592041
dbt_execution_time{job="db_targets",table_name="person",target="db1"} 0.10456228256225586
dbt_execution_time{job="db_targets",table_name="place",target="db1"} 0.16501379013061523
dbt_execution_time{job="db_targets",table_name="user",target="db1"} 0.0803828239440918
# HELP dbt_latency difference between last timestamp in dbt models and current time (seconds)
# TYPE dbt_latency gauge
dbt_latency{job="db_targets",target="db1"} 1180.199879
# HELP scrape_duration_seconds How long it took to scrape the target in seconds
# TYPE scrape_duration_seconds gauge
scrape_duration_seconds{job="db_targets",target="db1"} 0.000738378
# HELP up 1 if the target is reachable, or 0 if the scrape failed
# TYPE up gauge
up{job="db_targets",target="db1"} 1

that all said, my couch2pg backlog panel in Grafana shows no data:

[screenshot: Grafana couch2pg backlog panel showing no data]
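
For reference, the backlog panel presumably graphs the pending metric from the scrape output above; a minimal PromQL sketch (the exact panel query is an assumption):

    # assumed panel query; metric name taken from the /metrics output above
    couch2pg_progress_pending{job="db_targets", target="db1"}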

@njuguna-n (Contributor Author) commented:

@mrjones-plip I suspect the first error you got is because your POSTGRES_TABLE env variable was set to medic, but that was updated to couchdb in this commit. I have managed to replicate the No data issue, but I don't know why, since the SQL query works as expected. I have booked a 30-minute session in your calendar to go through this together and hopefully get this over the line.
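
In other words, the CHT Sync deployment needs the updated value, roughly (the file name is illustrative):

    # cht-sync .env (file name illustrative)
    POSTGRES_TABLE=couchdb   # previously set to "medic"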

@mrjones-plip (Contributor) left a comment

Thanks, thanks, THANKS for the big push to see this through! Really good work here.

We got it all working and added the JSON to the PR.

Approved!

@njuguna-n
Contributor Author

I really appreciate the time you took to review this on a call. Your attention to detail and insistence on making sure it works as expected led us to uncover a few issues that we can fix to make this better.

@njuguna-n njuguna-n merged commit b31f6b0 into main Oct 18, 2024
4 checks passed
@njuguna-n njuguna-n deleted the add-cht-sync-collector branch October 18, 2024 08:17
@medic-ci
Collaborator

🎉 This PR is included in version 1.17.0 🎉

The release is available on GitHub Releases.

Your semantic-release bot 📦🚀
