
add PrometheusRules for DataHub #125

Closed
wants to merge 2 commits into from

tom-webber
Contributor

@tom-webber tom-webber commented May 16, 2024

Add PrometheusRule alerts for the DataHub namespaces in Cloud Platform, covering the AWS resources (RDS, OpenSearch), resource usage metrics for the datahub-gms pod, deployment metrics, pod status metrics (out of memory, CrashLoopBackOff, frequent restarts), and ingress metrics (ModSecurity blocking events, error responses served).

The OpenSearch metrics may not be sufficient: during a recent bottleneck event, OpenSearch was unresponsive, but its Prometheus metrics were absent for the whole period of downtime, so alerts based on those metrics alone may not fire in a similar outage.
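
For illustration, the pod status alerts described above might look roughly like the sketch below. This is not the manifest from this PR: the rule name, the `datahub-example` namespace, the thresholds, and the `severity` label value are all assumptions. The metric names are the standard kube-state-metrics series.

```yaml
# Illustrative sketch only: names, namespace, thresholds and labels are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: datahub-pod-status          # hypothetical rule name
  namespace: datahub-example        # hypothetical namespace
spec:
  groups:
    - name: datahub-pod-status
      rules:
        # Container is stuck waiting in CrashLoopBackOff.
        - alert: DataHubPodCrashLooping
          expr: kube_pod_container_status_waiting_reason{namespace="datahub-example", reason="CrashLoopBackOff"} == 1
          for: 5m
          labels:
            severity: warning       # assumed routing label
          annotations:
            message: "Pod {{ $labels.pod }} is in CrashLoopBackOff."
        # Container's most recent termination was an out-of-memory kill.
        - alert: DataHubPodOOMKilled
          expr: kube_pod_container_status_last_terminated_reason{namespace="datahub-example", reason="OOMKilled"} == 1
          labels:
            severity: warning
          annotations:
            message: "Container {{ $labels.container }} in pod {{ $labels.pod }} was OOMKilled."
        # More than 3 restarts in the last hour (threshold is an assumption).
        - alert: DataHubPodFrequentRestarts
          expr: increase(kube_pod_container_status_restarts_total{namespace="datahub-example"}[1h]) > 3
          labels:
            severity: warning
          annotations:
            message: "Pod {{ $labels.pod }} restarted more than 3 times in the last hour."
```

A rule like this only fires while the underlying series exist, which is the gap noted above for OpenSearch.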

loop over prometheus alert yaml files for apply
export `opensearch_domain` and `rds_domain` for populating env vars in yaml files
@tom-webber tom-webber linked an issue May 17, 2024 that may be closed by this pull request
7 tasks
@tom-webber
Contributor Author

No longer necessary: the PrometheusRule is now being deployed via cloud-platform.

@tom-webber tom-webber closed this May 21, 2024
@tom-webber tom-webber deleted the add-cp-prom-alerts branch May 21, 2024 12:42
Development

Successfully merging this pull request may close these issues.

Set up Cloud Platform Alerting for DataHub
1 participant