-
Notifications
You must be signed in to change notification settings - Fork 535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add GitHub Actions health check workflow #23036
base: master
Are you sure you want to change the base?
Conversation
359ed4d
to
14df65d
Compare
Introduces a new GitHub Actions workflow to periodically check service health and version information for specified endpoints. The workflow includes: - Scheduled runs every 5 minutes - Checks version information - Monitors service status - Raises an error if any services are down TMP: test Test it
14df65d
to
311332d
Compare
- name: Set up Python | ||
uses: actions/setup-python@v5 | ||
with: | ||
python-version: '3.11' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python-version: '3.11' | |
python-version: '3.12' |
For consistency with addons-server itself
|
||
try: | ||
response = requests.get(url) | ||
response.raise_for_status() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We return a 500
error in the monitor view if one or more services is down alongside a valid json response, so if you raise for status, data
is not set and that causes your script to raise UnboundLocalError
. We want to do the data = response.json()
bit first because it can work even if the response status is not 20x.
# TODO: maybe we could use the local environmnet here | ||
('test', ''), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove all the special test
bits here and below and just use:
# TODO: maybe we could use the local environmnet here | |
('test', ''), | |
# For local environments hit the nginx container as set in docker-compose.yml | |
('local', 'http://nginx'), |
'up': {'state': True}, | ||
'down': {'state': False, 'status': 'something is wrong'}, | ||
} | ||
return self._fetch('services/monitor.json') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We likely want to check __heartbeat__
and services/__heartbeat__
(separately) instead
data=$(echo $monitors | jq -r 'to_entries[] | select(.value.state == false) | .key') | ||
for monitor in $data; do | ||
message="$message\n- $monitor: $(echo $monitors | jq -r ".[\"$monitor\"].status")" | ||
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(goes with my suggestion below to not raise_for_status()
too early)
I know this is the first step but we likely want to exit 1
here too if one of the state
values is false
so that the health check action is recorded as failing
Relates to: mozilla/addons#15308
Description
Introduces a new GitHub Actions workflow to periodically check service health and version information for specified endpoints. The workflow includes:
Context
This PR adds the workflow for checking the health of our services/dependencies. A future PR will integrate a slack notification when things are found
Testing
Running the workflow
You can run the workflow manually via bash (example here)
Select "Health Check" and then visit https://github.com/mozilla/addons-server/actions/workflows/health_check.yml to see your run
Running locally
To inspect the logic of the health check script you can run it manually passing in an environment to ping.
Note
You should do this inside a running web container if you don't want to have to install dependencies manually on your host
Warning
Passing no environment should error
Checking for failure
You can manually run the bash script the healthcheck runs and modify the monitors json to test for error state.
false
and add a status message.Checklist
#ISSUENUM
at the top of your PR to an existing open issue in the mozilla/addons repository.