What happened?
We have run into issues several times now with upstream dependencies (SocketLabs for email, Cinder for abuse reports) where the service is down for a while, and we only find out about it indirectly and after a delay.
What did you expect to happen?
When a critical upstream dependency is down, we should be alerted promptly, probably via Slack in our production channel.
Idea: We can reuse our existing path for sending Slack notifications by adding a scheduled GitHub Actions workflow that polls our monitors.json endpoint and posts to Slack if any service reports "state" == false.
Idea: We could probably integrate this with PagerDuty, but we don't really use PagerDuty in AMO (yet), and even then we cannot post to Slack directly from PagerDuty because our production channel is private.
Idea: We could use an AMO-controlled cron job, but this would require some way to post to Slack directly from AMO. That might not be a bad thing to have, but it is still more work than the first idea.
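A minimal sketch of the first idea, in Python, showing the check a scheduled workflow step might run. It assumes monitors.json returns a JSON object mapping each service name to an object with a boolean "state" and a "status" string, and that a Slack incoming-webhook URL for the production channel is stored as a secret. The module name check_monitors.py and the environment variables MONITORS_URL and SLACK_WEBHOOK_URL are hypothetical.

```python
"""Hypothetical check_monitors.py: poll monitors.json and post to Slack
when any monitored service does not report "state": true."""
import json
import os
import urllib.error
import urllib.request


def failing_services(monitors):
    """Return the sorted names of services whose "state" is not True.

    Assumes each value looks like {"state": bool, "status": str}; a missing
    or malformed entry is treated as failing rather than silently ignored.
    """
    return sorted(
        name
        for name, info in monitors.items()
        if not (isinstance(info, dict) and info.get("state") is True)
    )


def build_slack_message(failing, monitors):
    """Build a Slack incoming-webhook payload listing each failing service."""
    lines = [":rotating_light: Upstream dependency check failed:"]
    for name in failing:
        info = monitors.get(name)
        status = info.get("status", "") if isinstance(info, dict) else ""
        lines.append(f"- {name}: {status or 'no status message'}")
    return {"text": "\n".join(lines)}


def fetch_json(url, timeout=10):
    """Fetch and decode JSON, also reading the body of HTTP error responses.

    Assumption: the endpoint may answer with a 5xx status while still
    returning per-service JSON, so the error body is parsed too.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        return json.load(err)


def post_to_slack(webhook_url, payload, timeout=10):
    """POST a JSON payload to a Slack incoming webhook."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()


def main():
    """Return 0 when everything is healthy, 1 after alerting Slack."""
    # Both variable names are hypothetical; in a GitHub Actions workflow
    # they would be populated from repository secrets.
    monitors = fetch_json(os.environ["MONITORS_URL"])
    failing = failing_services(monitors)
    if failing:
        post_to_slack(os.environ["SLACK_WEBHOOK_URL"],
                      build_slack_message(failing, monitors))
        return 1  # a non-zero exit also marks the workflow run as failed
    return 0
```

A workflow with an `on: schedule` cron trigger could run it as `python -c "import check_monitors, sys; sys.exit(check_monitors.main())"`. As sketched, it posts on every run while a dependency stays down, so some deduplication may be wanted; and if monitors.json itself is unreachable, the script raises instead of alerting, which might also deserve a Slack message.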