Add GitHub Actions health check workflow #23036

Open
wants to merge 1 commit into
base: master
67 changes: 67 additions & 0 deletions .github/workflows/health_check.yml
@@ -0,0 +1,67 @@
name: Health Check

on:
  # Run the workflow test on push events
  push:
  # Run the main workflow on workflow_dispatch or schedule
  workflow_dispatch:
  schedule:
    # Every 5 minutes
    - cron: '*/5 * * * *'

jobs:
  health_check:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        environment: ${{fromJson(github.event_name == 'push' && '["test"]' || '["dev","stage","prod"]')}}

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
Member

Suggested change
-          python-version: '3.11'
+          python-version: '3.12'

For consistency with addons-server itself

          cache: 'pip'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install requests

      - name: Run Health Checks
        shell: bash
        run: |
          set -ue

          environment="${{ matrix.environment }}"
          output_file="out.json"
          ./scripts/health_check.py --env $environment --verbose --output $output_file

          version=$(cat $output_file | jq -r '.version')
          monitors=$(cat $output_file | jq -r '.monitors')

          echo "Version: $version"
          echo "Monitors: $monitors"

          if [ "$version" = "null" ] || [ "$monitors" = "null" ]; then
            echo "Environment $environment is not reachable"
            exit 1
          fi

          message=""

          data=$(echo $monitors | jq -r 'to_entries[] | select(.value.state == false) | .key')
          for monitor in $data; do
            message="$message\n- $monitor: $(echo $monitors | jq -r ".[\"$monitor\"].status")"
          done
Comment on lines +56 to +59
Member

(goes with my suggestion below to not raise_for_status() too early)

I know this is the first step, but we likely want to exit 1 here too if one of the state values is false, so that the health check action is recorded as failing.
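
One way to read this suggestion, as a minimal Python sketch rather than the PR's actual code: collect the monitors whose `state` is false and turn that into a process exit code. The helper names `failing_monitors` and `exit_code_for` are hypothetical, and the payload shape is assumed to match the test data in `scripts/health_check.py`.

```python
def failing_monitors(monitors):
    """Return {name: status} for every monitor whose 'state' is False."""
    return {
        name: info.get('status', '')
        for name, info in monitors.items()
        if info.get('state') is False
    }


def exit_code_for(monitors):
    """Map a monitors payload to a process exit code (0 = healthy)."""
    # None means the environment could not be reached at all.
    if monitors is None:
        return 1
    return 1 if failing_monitors(monitors) else 0


# Sample payload mirroring the 'test' environment data in the script.
sample = {
    'up': {'state': True},
    'down': {'state': False, 'status': 'something is wrong'},
}
print(failing_monitors(sample))
print(exit_code_for(sample))
```

A caller could pass the result to `sys.exit()`, so that the workflow step is recorded as failing whenever any monitor is down.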


          echo "Environment: $environment"
          echo "Message:"
          echo $message

87 changes: 87 additions & 0 deletions scripts/health_check.py
@@ -0,0 +1,87 @@
#!/usr/bin/env python3

import argparse
import json
from enum import Enum

import requests


ENV_ENUM = Enum(
    'ENV',
    [
        ('dev', 'https://addons-dev.allizom.org'),
        ('stage', 'https://addons.allizom.org'),
        ('prod', 'https://addons.mozilla.org'),
        # TODO: maybe we could use the local environmnet here
        ('test', ''),
Comment on lines +16 to +17
Member

I would remove all the special test bits here and below and just use:

Suggested change
-        # TODO: maybe we could use the local environmnet here
-        ('test', ''),
+        # For local environments hit the nginx container as set in docker-compose.yml
+        ('local', 'http://nginx'),

    ],
)


class Fetcher:
    def __init__(self, env: ENV_ENUM, verbose: bool = False):
        self.environment = ENV_ENUM[env]
        self.verbose = verbose

    def _fetch(self, path: str) -> dict[str, str] | None:
        url = f'{self.environment.value}/{path}'
        if self.verbose:
            print(f'Requesting {url} for {self.environment.name}')

        try:
            response = requests.get(url)
            response.raise_for_status()
Member

The monitor view returns a 500 error alongside a valid JSON response if one or more services are down. So if you raise for status first, data is never set, and that causes your script to raise UnboundLocalError. We want to do the data = response.json() bit first because it can work even if the response status is not 2xx.
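
A minimal sketch of the ordering described in this comment, not the PR's code: parse the body first, then look at the status. `FakeResponse` is a hypothetical stand-in for the two members of a `requests` response used here (`status_code` and `json()`), so that the example runs without a network. `read_json` is likewise a hypothetical helper name.

```python
import json


class FakeResponse:
    """Hypothetical stand-in for a response object, for illustration only."""

    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text

    def json(self):
        return json.loads(self.text)


def read_json(response):
    """Return (data, ok): decode the body first, then check the status."""
    # Initialising data up front avoids an unbound variable on any error path.
    data = None
    try:
        data = response.json()
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        pass
    ok = 200 <= response.status_code < 300
    return data, ok


# A 500 response that still carries a usable JSON body.
data, ok = read_json(FakeResponse(500, '{"down": {"state": false}}'))
print(data, ok)
```

Returning the status alongside the data lets the caller report which monitors are down and still treat the non-2xx response as a failure.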

            try:
                data = response.json()
            except json.JSONDecodeError as e:
                if self.verbose:
                    print(f'Error decoding JSON for {url}: {e}')

        except requests.exceptions.HTTPError as e:
            if self.verbose:
                print(f'Error fetching {url}: {e}')

        if self.verbose and data is not None:
            print(json.dumps(data, indent=2))

        return data

    def version(self):
        if self.environment.name == 'test':
            return {}
        return self._fetch('__version__')

    def monitors(self):
        if self.environment.name == 'test':
            return {
                'up': {'state': True},
                'down': {'state': False, 'status': 'something is wrong'},
            }
        return self._fetch('services/monitor.json')
Member

We likely want to check __heartbeat__ and services/__heartbeat__ (separately) instead
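
As a rough sketch of checking the two endpoints separately: the paths come from the comment above, but everything else is an assumption. In particular, `check_heartbeats` is a hypothetical helper, `fetch` is any callable that returns parsed JSON or `None`, the payload is assumed to map a monitor name to a dict with a `state` flag (as in the script's test data), and the monitor names `db` and `cache` are invented for the example.

```python
HEARTBEAT_PATHS = ('__heartbeat__', 'services/__heartbeat__')


def check_heartbeats(fetch):
    """Call fetch(path) once per endpoint and keep the results separate."""
    results = {}
    for path in HEARTBEAT_PATHS:
        data = fetch(path)
        if data is None:
            # Unreachable endpoint or undecodable body.
            results[path] = None
        else:
            # Assumed shape: {monitor_name: {'state': bool, ...}}
            results[path] = sorted(
                name for name, info in data.items() if info.get('state') is False
            )
    return results


# Hypothetical canned payloads standing in for real HTTP responses.
canned = {
    '__heartbeat__': {'db': {'state': True}},
    'services/__heartbeat__': {'cache': {'state': False}},
}
print(check_heartbeats(canned.get))
```

Keeping one entry per path means a failure in one endpoint does not hide the state of the other.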



def main(env: ENV_ENUM, verbose: bool = False, output: str | None = None):
    fetcher = Fetcher(env, verbose)

    version_data = fetcher.version()
    monitors_data = fetcher.monitors()

    if output:
        with open(output, 'w') as f:
            json.dump({'version': version_data, 'monitors': monitors_data}, f, indent=2)
    elif monitors_data is not None:
        if any(monitor['state'] is False for monitor in monitors_data.values()):
            raise ValueError(f'Some monitors are failing {monitors_data}')


if __name__ == '__main__':
    args = argparse.ArgumentParser()
    args.add_argument(
        '--env', type=str, choices=list(ENV_ENUM.__members__.keys()), required=True
    )
    args.add_argument('--verbose', action='store_true')
    args.add_argument('--output', type=str, required=False)
    args = args.parse_args()

    main(args.env, args.verbose, args.output)