Parallel HTTP health monitoring using HEAD requests, built for large-scale website monitoring.
The service relies on confirmation from external servers to verify that sites are indeed offline. This mitigates false positives caused by "Internet weather", meaning transient network problems between Jetmon and the monitored site. The code for these servers can be found in the `veriflier` directory.
Every 5 minutes, Jetmon loops over the list of Jetpack sites and sends each one a HEAD request to check its current status.
When it detects a status change, Jetmon notifies WPCOM and includes the related notification data in the request.
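Since the checks run in parallel, one round over the site list might look like the sketch below, which keeps a fixed number of checks in flight at once. The `MonitoredSite` and `SiteCheckResult` shapes, the `runCheckRound` name and the concurrency cap are all hypothetical; in Jetmon itself the work is split across worker processes.

```typescript
// Hypothetical shapes: Jetmon's real site rows and check results may differ.
interface MonitoredSite {
  blogId: number;
  monitorUrl: string;
}

interface SiteCheckResult {
  blogId: number;
  up: boolean;
}

// Runs `checkSite` over every site, keeping at most `concurrency` checks in flight.
// Results are returned in the same order as `sites`.
async function runCheckRound(
  sites: MonitoredSite[],
  concurrency: number,
  checkSite: (site: MonitoredSite) => Promise<SiteCheckResult>,
): Promise<SiteCheckResult[]> {
  const results: SiteCheckResult[] = new Array(sites.length);
  let next = 0;

  // Each lane claims the next unchecked index (synchronously, so no two lanes
  // get the same one) and awaits its check before claiming another.
  async function lane(): Promise<void> {
    while (next < sites.length) {
      const index = next++;
      results[index] = await checkSite(sites[index]);
    }
  }

  const laneCount = Math.max(1, Math.min(concurrency, sites.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

Repeating the round every 5 minutes could then be as simple as `setInterval(() => runCheckRound(sites, 50, check), 5 * 60 * 1000)`, where `50` and `check` are placeholders.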
Here are the possible flows, depending on the status change:
| Previous status | Current status | Action |
|---|---|---|
| DOWN | UP | Notify WPCOM about status change |
| UP | DOWN | Verify status down via the Veriflier services and notify WPCOM about status change |
| DOWN | DOWN (confirmed) | Notify WPCOM about status change |
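The flow table can be read as a small transition function. This sketch uses the `status_id` values from the notification data (`0` down, `1` running, `2` confirmed down); the `actionForTransition` name and the action strings are hypothetical. It also treats a recovery from confirmed down like DOWN → UP, which the table does not spell out.

```typescript
// status_id values from the notification data: 0 down, 1 running, 2 confirmed down.
const SITE_DOWN = 0;
const SITE_RUNNING = 1;
const SITE_CONFIRMED_DOWN = 2;
type SiteStatusId = typeof SITE_DOWN | typeof SITE_RUNNING | typeof SITE_CONFIRMED_DOWN;

// Hypothetical action names for the rows of the flow table.
type TransitionAction = "notify" | "verify_then_notify" | "none";

function actionForTransition(previous: SiteStatusId, current: SiteStatusId): TransitionAction {
  const wasDown = previous === SITE_DOWN || previous === SITE_CONFIRMED_DOWN;
  // DOWN -> UP (assumption: recovering from confirmed down is handled the same way).
  if (wasDown && current === SITE_RUNNING) return "notify";
  // UP -> DOWN: confirm with the Verifliers before notifying WPCOM.
  if (previous === SITE_RUNNING && current === SITE_DOWN) return "verify_then_notify";
  // DOWN -> DOWN (confirmed).
  if (previous === SITE_DOWN && current === SITE_CONFIRMED_DOWN) return "notify";
  // No status change, or a combination the table does not list.
  return "none";
}
```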
The Jetmon master service communicates with the database to fetch the list of sites to check. Every five seconds it spawns and re-allocates workers, and it updates stats at the interval set by `STATS_UPDATE_INTERVAL_MS`.
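One way a master could re-allocate work is to split the `bucket_no` values of the main table evenly across the current workers. This is an assumption: the README does not say how buckets map to workers, and the `allocateBuckets` helper and `BucketRange` type are hypothetical.

```typescript
// Hypothetical: each worker receives a contiguous, inclusive range of bucket numbers.
interface BucketRange {
  min: number;
  max: number;
}

// Splits buckets [firstBucket, lastBucket] as evenly as possible across workerCount workers.
function allocateBuckets(firstBucket: number, lastBucket: number, workerCount: number): BucketRange[] {
  const total = lastBucket - firstBucket + 1;
  if (total <= 0 || workerCount <= 0) return [];
  const workers = Math.min(workerCount, total); // never hand out an empty range
  const base = Math.floor(total / workers);
  const extra = total % workers; // the first `extra` workers get one more bucket
  const ranges: BucketRange[] = [];
  let start = firstBucket;
  for (let i = 0; i < workers; i++) {
    const size = base + (i < extra ? 1 : 0);
    ranges.push({ min: start, max: start + size - 1 });
    start += size;
  }
  return ranges;
}
```

Recomputing the ranges whenever the worker count changes keeps every bucket covered exactly once.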
The jetmon-workers internally use a Node addon written in C++ to check each site by sending a HEAD request to its server.
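For illustration, here is roughly what a single HEAD check does, written with the global `fetch` (Node 18+) instead of the C++ addon the workers actually use. The `headCheck` name, the 10-second default timeout and the "status below 400 means up" rule are assumptions; the `code` and `rtt` fields mirror the per-check fields in the notification data.

```typescript
// Swapped-in technique: Jetmon's workers use a C++ Node addon; this sketch uses fetch.
interface HeadCheckResult {
  up: boolean;
  code: number; // HTTP status code, or 0 when no response arrived
  rtt: number; // round-trip time in milliseconds
}

async function headCheck(url: string, timeoutMs = 10_000): Promise<HeadCheckResult> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const started = Date.now();
  try {
    // fetch follows redirects by default, so `code` is the final response's status.
    const response = await fetch(url, { method: "HEAD", signal: controller.signal });
    // Assumption: any status below 400 counts as up.
    return { up: response.status < 400, code: response.status, rtt: Date.now() - started };
  } catch {
    // DNS failure, refused connection, TLS error or timeout: report the site as down.
    return { up: false, code: 0, rtt: Date.now() - started };
  } finally {
    clearTimeout(timer);
  }
}
```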
The Veriflier service, written in C++ with the Qt framework, performs a check similar to the Node addon's but runs on its own server. Note that the production environment runs multiple Verifliers, whereas the local development environment runs a single Veriflier service.
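The README does not say how many Verifliers must agree before a site counts as confirmed down. A simple majority rule, shown here only as an assumption, could look like this; `VeriflierReport` and `isConfirmedDown` are hypothetical names.

```typescript
// Hypothetical shape of one Veriflier's answer for a site.
interface VeriflierReport {
  host: string;
  up: boolean;
}

// Assumption: a site is confirmed down when at least `quorum` Verifliers saw it down.
// By default the quorum is a strict majority of the reports received, so no reports
// (for example, every Veriflier unreachable) never confirms an outage.
function isConfirmedDown(
  reports: VeriflierReport[],
  quorum: number = Math.floor(reports.length / 2) + 1,
): boolean {
  const downVotes = reports.filter((report) => !report.up).length;
  return downVotes >= quorum;
}
```

With a single local Veriflier the default quorum is 1, so its answer alone decides.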
Here is the notification data Jetmon currently sends to WPCOM when it detects a site status change:

- `blog_id`: The site's WPCOM ID.
- `monitor_url`: The URL Jetmon checked.
- `status_id`: The site's current status. Enum: `0` is status down, `1` is status running, and `2` is status confirmed down.
- `last_check`: The datetime of the last check.
- `last_status_change`: The datetime of the last status change.
- `checks`: An array of the check results from both the Jetmon and Veriflier services. Each entry consists of:
  - `type`: Enum: `1` refers to a Jetmon check, while `2` refers to a Veriflier check.
  - `host`: The server hostname.
  - `status`: The site's current status. Enum: `0` is status down, `1` is status running, and `2` is status confirmed down.
  - `rtt`: Round-trip time (RTT) in milliseconds (ms).
  - `code`: The HTTP response status code.
To set up the local development environment:

- Make sure you have installed Docker and Docker Compose.
- Clone the Jetmon monorepo.
- Copy the environment variables file in the `docker` folder: `cp jetmon/docker/.env-sample jetmon/docker/.env`
- Open `jetmon/docker/.env` and make any modifications you'd like.
- Run `docker compose build` from within the `docker` folder.
The Jetmon configuration lives in `config/config.json`. If this file is not present, it is generated each time you run the Jetmon service, from `config-sample.json` and the corresponding environment variables defined in `docker/.env`.
Feel free to modify your local config file as needed.
The Veriflier configuration lives in `veriflier/config/veriflier.json`. If this file is not present, it is generated each time you run the Veriflier service, from `veriflier-sample.json` and the corresponding environment variables defined in `docker/.env`.
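Both configuration files follow the same pattern: use the existing file if there is one, otherwise start from the sample and fill in values from the environment. The sketch below assumes environment variable names match top-level config keys, which the README does not state; `resolveConfig` is a hypothetical name, and reading and writing the files is left out.

```typescript
type JsonConfig = Record<string, unknown>;

// Returns the config to use: an existing file wins, otherwise the sample is copied
// and any key that also appears in `env` is overridden (assumption: same names).
function resolveConfig(
  existing: JsonConfig | undefined,
  sample: JsonConfig,
  env: Record<string, string | undefined>,
): JsonConfig {
  // Local edits to an existing config file survive restarts.
  if (existing !== undefined) return existing;
  const generated: JsonConfig = { ...sample };
  for (const key of Object.keys(sample)) {
    const value = env[key];
    if (value === undefined) continue;
    // Keep the sample's value type: numbers and booleans are parsed, strings kept as-is.
    const template = sample[key];
    if (typeof template === "number") generated[key] = Number(value);
    else if (typeof template === "boolean") generated[key] = value === "true";
    else generated[key] = value;
  }
  return generated;
}
```

In the services, `existing` would be the parsed `config/config.json` (or `veriflier/config/veriflier.json`), `sample` the matching sample file, and the result would be written back so later runs reuse it.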
To start the local environment, run `docker compose up -d` from within the `docker` folder.
Main table schema:

```sql
CREATE TABLE `jetpack_monitor_sites` (
`jetpack_monitor_site_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
`blog_id` bigint(20) unsigned NOT NULL,
`bucket_no` smallint(2) unsigned NOT NULL,
`monitor_url` varchar(300) NOT NULL,
`monitor_active` tinyint(1) unsigned NOT NULL DEFAULT 1,
`site_status` tinyint(1) unsigned NOT NULL DEFAULT 1,
`last_status_change` timestamp NULL DEFAULT current_timestamp(),
`check_interval` tinyint(1) unsigned NOT NULL DEFAULT 5,
INDEX `blog_id_monitor_url` (`blog_id`, `monitor_url`),
INDEX `bucket_no_monitor_active_check_interval` (`bucket_no`, `monitor_active`, `check_interval`)
);
```
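A worker responsible for a range of buckets might fetch its active sites with a query like the one built below, which filters on the leading columns of the `bucket_no_monitor_active_check_interval` index. The `sitesForBuckets` helper is hypothetical, and the `?` placeholders assume a MySQL driver that supports positional parameters.

```typescript
// Hypothetical parameterized query; pass `sql` and `params` to your MySQL driver.
interface SiteQuery {
  sql: string;
  params: number[];
}

function sitesForBuckets(minBucket: number, maxBucket: number): SiteQuery {
  // Reject fractional, NaN or inverted ranges before they reach the database.
  if (Math.floor(minBucket) !== minBucket || Math.floor(maxBucket) !== maxBucket || minBucket > maxBucket) {
    throw new RangeError(`invalid bucket range ${minBucket}..${maxBucket}`);
  }
  return {
    sql:
      "SELECT jetpack_monitor_site_id, blog_id, monitor_url, site_status, last_status_change, check_interval " +
      "FROM jetpack_monitor_sites " +
      "WHERE bucket_no BETWEEN ? AND ? AND monitor_active = 1",
    params: [minBucket, maxBucket],
  };
}
```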