
Monitoring and alerting overview

Cintia Del Rio edited this page Jan 28, 2025 · 1 revision

Our main internal monitoring is based on Datadog. It has all the details when machines are down or running out of disk, memory, or CPU. Its alerts are sent through the infrastructure email.

Pingdom runs HTTP checks on public services. Pingdom also creates tickets in helpdesk.

We also have accounts in PagerDuty, which alerts whoever is on call. PagerDuty can be triggered by critical tickets in helpdesk or by Pingdom alerts.
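The paging rule above can be written out as a small decision function. This is only an illustrative sketch, not our actual configuration: the `Event` type, the `should_page` function, and the source labels `"helpdesk"` and `"pingdom"` are names made up for this example, and the real routing lives in the integrations between the services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; field names are assumptions for this sketch."""
    source: str             # "helpdesk" or "pingdom" (labels chosen for this example)
    critical: bool = False  # only meaningful for helpdesk tickets


def should_page(event: Event) -> bool:
    """Return True when the event would alert whoever is on call."""
    if event.source == "pingdom":
        # Pingdom alerts can trigger a page.
        return True
    if event.source == "helpdesk":
        # Only critical helpdesk tickets can trigger a page.
        return event.critical
    # This page does not describe any other source as a paging trigger.
    return False
```

For example, `should_page(Event("helpdesk", critical=True))` returns `True`, while a non-critical helpdesk ticket does not page anyone.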

We also have a dashboard showing the status of our infrastructure, with data coming from Pingdom.
