Monitoring and alerting overview

Our main internal monitoring is based on Datadog. It reports the details when machines are down or running out of disk, memory, or CPU. These alerts are sent via the infrastructure email.

Pingdom runs HTTP checks against our public services. Pingdom also creates tickets in helpdesk.

We have accounts in PagerDuty, which alerts whoever is on call. PagerDuty can be triggered by critical tickets in helpdesk or by Pingdom alerts.
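The routing described so far can be summarized in a small sketch. It is only an illustrative model of the flow in this overview, not our actual configuration: the `Alert` class, the `route` function, and the destination names are all hypothetical, and it assumes every Pingdom alert both opens a helpdesk ticket and can page the on-call person.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record; field names are illustrative only."""
    source: str            # "datadog", "pingdom", or "helpdesk"
    critical: bool = False


def route(alert: Alert) -> list[str]:
    """Return where an alert ends up, following the overview above."""
    destinations: list[str] = []
    if alert.source == "datadog":
        # Machine-level alerts (down, disk, memory, CPU) go out by email.
        destinations.append("infrastructure-email")
    elif alert.source == "pingdom":
        # Failed HTTP checks open a helpdesk ticket and can page on-call.
        destinations += ["helpdesk-ticket", "pagerduty"]
    elif alert.source == "helpdesk" and alert.critical:
        # Only critical helpdesk tickets page the on-call person.
        destinations.append("pagerduty")
    return destinations
```

For example, `route(Alert("helpdesk"))` returns an empty list because a non-critical ticket does not page anyone, while `route(Alert("helpdesk", critical=True))` returns `["pagerduty"]`.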

We also have a dashboard showing the status of our infrastructure, with data coming from Pingdom.

Read this before updating this wiki.
