CICD: Runs a full GPU install on an EC2 instance (#157)
Pretty big (but useful!) change. Adds a GitHub Actions workflow step that, fully automatically:

- Creates a new g4 EC2 instance.
- Installs EE on it (installs K3s, then installs our YAMLs).
- Checks that the EE install script runs successfully.
- Checks that the k8s deployments come up.
- Checks that the SDK can do minimal things through the EE (`whoami`, `list-detectors`).
- Does not tear down the EC2 infra. A sweeper runs every 15 minutes and does that asynchronously, so the pipeline doesn't have to wait for it; turning off a g4 instance takes about 7 minutes.

It uses Pulumi for infra and relies on Pulumi infra defined in the GL_Public AWS account.

It does NOT yet:

- Check that any inference models work.

(Note: this partially replicates #169, which improves the workflow YAML and its validation directly.)

(Note: this relies on resources defined in the `gl_public` account, which are defined in the internal `ci-infra` repo.)

---------

Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Ubuntu <robotrapta@groundlight>
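The deployment and SDK checks listed above could look roughly like the following bash sketch. It assumes `kubectl` already points at the EE's k3s cluster, that the namespace is `gl-edge` (as in the install script), that the `groundlight` CLI from the Python SDK is installed, and that you pass in the URL where the edge endpoint is reachable (the SDK reads it from `GROUNDLIGHT_ENDPOINT`; the host and port are not given in this commit). The function names and the `KUBECTL`/`GROUNDLIGHT` overrides are hypothetical, added only so the logic can be dry-run.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the post-install checks described in the commit message.
# KUBECTL and GROUNDLIGHT can be overridden (e.g. with "echo") for a dry run.
KUBECTL="${KUBECTL:-kubectl}"
GROUNDLIGHT="${GROUNDLIGHT:-groundlight}"

# Wait until every deployment in the namespace reports the Available condition.
check_deployments() {
    local namespace="$1" timeout="${2:-600s}"
    $KUBECTL wait --namespace "$namespace" --for=condition=Available \
        deployment --all --timeout="$timeout"
}

# Run minimal SDK calls through the edge endpoint.
# GROUNDLIGHT_ENDPOINT tells the SDK where to send its requests.
check_sdk() {
    local endpoint="$1"
    GROUNDLIGHT_ENDPOINT="$endpoint" $GROUNDLIGHT whoami || return 1
    GROUNDLIGHT_ENDPOINT="$endpoint" $GROUNDLIGHT list-detectors || return 1
}

# Run both checks in order; stop at the first failure.
run_post_install_checks() {
    local namespace="$1" endpoint="$2"
    check_deployments "$namespace" || { echo "Deployments did not become Available" >&2; return 1; }
    check_sdk "$endpoint" || { echo "SDK checks failed" >&2; return 1; }
    echo "Post-install checks passed"
}
```

Checking the deployments first keeps the SDK calls from failing with confusing connection errors while pods are still starting.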
1 parent 063e8d3, commit 1695c3f

Showing 16 changed files with 1,724 additions and 5 deletions.
```diff
@@ -12,3 +12,4 @@ rules:
   comments: disable
   trailing-spaces: disable
   empty-lines: disable
+  new-line-at-end-of-file: disable
```
`@@ -0,0 +1,61 @@`

```yaml
name: sweeper-eeut
# This workflow tears down old EEUT stacks from pulumi.
# We do this as a background sweeper job, because the teardown is VERY slow (~7 minutes for a g4)
# and we don't want to slow down the main pipeline for that.
on:
  schedule:
    - cron: '*/15 * * * *'  # Every 15 minutes
      # Note cron workflows only run from the main branch.
  push:
    branches:
      # If you're working on this stuff, name your branch e2e-something and this will run.
      - e2e*
concurrency:
  group: sweeper-eeut
env:
  PYTHON_VERSION: "3.11"

jobs:
  destroy-expired-eeut-stacks:
    #runs-on: ubuntu-22.04  # preferably
    # Currently running on self-hosted because something is wrong with the AWS perms on the GH runners.
    runs-on: self-hosted
    env:
      PULUMI_ACCESS_TOKEN: ${{ secrets.PULUMI_CICD_PAT }}
    defaults:
      run:
        working-directory: cicd/pulumi
    steps:
      - name: Check out code
        uses: actions/checkout@v3

      - name: Set AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-region: us-west-2
          # TODO: move these back to GH-provided secrets
          # Currently using IAM roles on the self-hosted runner instance.
          #aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          #aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          #aws-session-token: ${{ secrets.AWS_SESSION_TOKEN }}

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ env.PYTHON_VERSION }}

      - name: Install Pulumi
        run: |
          curl -fsSL https://get.pulumi.com | sh
          export HOME=$(eval echo ~$(whoami))
          echo "$HOME/.pulumi/bin" >> $GITHUB_PATH
      - name: Check that pulumi is installed and authenticated
        run: |
          set -ex
          pulumi whoami
      - name: Destroy old EEUT stacks
        working-directory: cicd/pulumi
        run: |
          ./sweep-destroy-eeut-stacks.sh
```
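The `sweep-destroy-eeut-stacks.sh` script this workflow calls is not part of this view. A minimal sketch of an age-based sweep could look like this, assuming (hypothetically) that it receives one `stack-name last-update-epoch-seconds` pair per line on stdin, that EEUT stacks share a name prefix, and that anything older than a cutoff is safe to destroy. `pulumi destroy --yes --stack` and `pulumi stack rm --yes` are standard Pulumi CLI commands; the prefix, the cutoff, the input format, and the function names are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an age-based EEUT stack sweeper.
# Input (assumed): one "<stack-name> <last-update-epoch-seconds>" pair per line on stdin.
# PULUMI can be overridden (e.g. PULUMI=echo) for a dry run.
PULUMI="${PULUMI:-pulumi}"

# Succeeds if more than max_age seconds have passed since last_update.
is_expired() {
    local last_update="$1" now="$2" max_age="$3"
    (( now - last_update > max_age ))
}

# Destroy and remove every stack with the given prefix older than max_age seconds.
# Returns non-zero if any stack failed to sweep.
sweep_stacks() {
    local prefix="$1" max_age="$2"
    local now name last_update failures=0
    now="$(date +%s)"
    while read -r name last_update; do
        [[ -z "$name" ]] && continue
        # Only touch stacks created by the EEUT pipeline.
        [[ "$name" == "$prefix"* ]] || continue
        if ! is_expired "$last_update" "$now" "$max_age"; then
            echo "Keeping $name (still within $max_age s)"
            continue
        fi
        echo "Destroying expired stack $name"
        # Tear down the cloud resources, then delete the stack record itself.
        # </dev/null keeps pulumi from consuming the remaining input lines.
        if ! { $PULUMI destroy --yes --stack "$name" </dev/null &&
               $PULUMI stack rm --yes "$name" </dev/null; }; then
            echo "Failed to sweep $name" >&2
            failures=$((failures + 1))
        fi
    done
    (( failures == 0 ))
}
```

It would be called as something like `list_stacks | sweep_stacks eeut- 3600`, where `list_stacks` stands for whatever produces the name/timestamp pairs (for example, post-processing `pulumi stack ls --json`). The cutoff should comfortably exceed a pipeline run, so the sweeper never destroys an instance a test is still using; the workflow's `concurrency` group already keeps two sweeps from overlapping.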
`@@ -0,0 +1,102 @@`

```bash
#! /bin/bash
# This script is intended to run on a new ubuntu instance to set it up
# Sets up an edge-endpoint environment.
# It is tested in the CICD pipeline to install the edge-endpoint on a new
# g4dn.xlarge EC2 instance with Ubuntu 22.04LTS.

# As a user-data script on ubuntu, this file probably lands at
# /var/lib/cloud/instance/user-data.txt
echo "Setting up Groundlight Edge Endpoint. Follow along at /var/log/cloud-init-output.log" > /etc/motd

echo "Starting cloud init. Uptime: $(uptime)"

# Set up signals about the status of the installation
mkdir -p /opt/groundlight/ee-install-status
touch /opt/groundlight/ee-install-status/installing
SETUP_COMPLETE=0
record_result() {
    if [ "$SETUP_COMPLETE" -eq 0 ]; then
        echo "Setup failed at $(date)"
        touch /opt/groundlight/ee-install-status/failed
        echo "Groundlight Edge Endpoint setup FAILED. See /var/log/cloud-init-output.log for details." > /etc/motd
    else
        echo "Setup complete at $(date)"
        echo "Groundlight Edge Endpoint setup complete. See /var/log/cloud-init-output.log for details." > /etc/motd
        touch /opt/groundlight/ee-install-status/success
    fi
    # Remove "installing" at the end to avoid a race where there is no status
    rm -f /opt/groundlight/ee-install-status/installing
}
trap record_result EXIT

set -e  # Exit on error of any command.

wait_for_apt_lock() {
    # We wait for any apt or dpkg processes to finish to avoid lock collisions
    # Unattended-upgrades can hold the lock and cause the install to fail
    while sudo fuser /var/lib/dpkg/lock-frontend >/dev/null 2>&1; do
        echo "Another apt/dpkg process is running. Waiting for it to finish..."
        sleep 5
    done
}

# Install basic tools
wait_for_apt_lock
sudo apt update
wait_for_apt_lock
sudo apt install -y \
    git \
    vim \
    tmux \
    htop \
    curl \
    wget \
    tree \
    bash-completion \
    ffmpeg

# Download the edge-endpoint code
CODE_BASE=/opt/groundlight/src/
mkdir -p ${CODE_BASE}
cd ${CODE_BASE}
git clone https://github.com/groundlight/edge-endpoint
cd edge-endpoint/
# The launching script should update this to a specific commit.
SPECIFIC_COMMIT="__EE_COMMIT_HASH__"
if [ -n "$SPECIFIC_COMMIT" ]; then
    # See if the string got substituted by comparing only the 11-character
    # prefix "__EE_COMMIT". Can't compare to the whole placeholder because
    # that would be substituted too!
    if [ "${SPECIFIC_COMMIT:0:11}" != "__EE_COMMIT" ]; then
        echo "Checking out commit ${SPECIFIC_COMMIT}"
        git checkout "${SPECIFIC_COMMIT}"
    else
        echo "It appears the commit hash was not substituted. Staying on main."
    fi
else
    echo "A blank commit hash was provided. Staying on main."
fi

# Set up k3s with GPU support
./deploy/bin/install-k3s-nvidia.sh

# Set up some shell niceties
TARGET_USER="ubuntu"
echo "alias k='kubectl'" >> /home/${TARGET_USER}/.bashrc
echo "source <(kubectl completion bash)" >> /home/${TARGET_USER}/.bashrc
echo "complete -F __start_kubectl k" >> /home/${TARGET_USER}/.bashrc
echo "set -o vi" >> /home/${TARGET_USER}/.bashrc

# Configure the edge-endpoint with environment variables
export DEPLOYMENT_NAMESPACE="gl-edge"
export INFERENCE_FLAVOR="GPU"
export GROUNDLIGHT_API_TOKEN="api_token_not_set"

# Install the edge-endpoint
kubectl create namespace gl-edge
kubectl config set-context edge --namespace=gl-edge --cluster=default --user=default
kubectl config use-context edge
./deploy/bin/setup-ee.sh

# Indicate that setup is complete
SETUP_COMPLETE=1
echo "EE is installed into kubernetes, which will attempt to finish the setup."
```
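The install script reports its outcome through marker files in `/opt/groundlight/ee-install-status`: `installing` while it runs, then exactly one of `success` or `failed` from the `EXIT` trap. A CI job could poll those markers as sketched below. The function name, the timeout and polling interval, and how the job reaches the instance (for example by running this over SSH) are assumptions, not taken from this commit.

```bash
#!/usr/bin/env bash
# Sketch: poll the EE install status markers written by the cloud-init script.
# Arguments: status dir, timeout in seconds, polling interval in seconds.
# Returns 0 on success, 1 on failure, 2 on timeout.
wait_for_ee_install() {
    local status_dir="${1:-/opt/groundlight/ee-install-status}"
    local timeout_s="${2:-1800}" interval_s="${3:-10}"
    local waited=0
    while (( waited < timeout_s )); do
        # Terminal markers: the EXIT trap creates exactly one of these
        # before it removes "installing".
        if [[ -e "$status_dir/failed" ]]; then
            echo "EE install failed" >&2
            return 1
        fi
        if [[ -e "$status_dir/success" ]]; then
            echo "EE install succeeded"
            return 0
        fi
        # "installing", or no marker yet because cloud-init hasn't started: keep waiting.
        sleep "$interval_s"
        waited=$(( waited + interval_s ))
    done
    echo "Timed out after ${timeout_s}s waiting for EE install" >&2
    return 2
}
```

Distinguishing a timeout (2) from an explicit failure (1) lets a pipeline report "the instance never finished booting" separately from "the install script errored".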
`@@ -0,0 +1,3 @@`

```bash
echo "This is a uv project. Remember to 'uv run ...' everything"
uv sync
```
`@@ -0,0 +1,5 @@`

```
*.pyc
venv/
.venv/
__pycache__/
```
`@@ -0,0 +1,11 @@`

```yaml
name: ee-cicd
runtime:
  name: python
  options:
    toolchain: uv
description: CI/CD for Edge Endpoint
config:
  ee-cicd:instanceType: g4dn.xlarge
  # Default to "main" so things are sensible if this doesn't get customized.
  # But for testing purposes, this should be set to the specific commit you want to test.
  ee-cicd:targetCommit: main
```
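The `targetCommit` value has to reach the `__EE_COMMIT_HASH__` placeholder in the install script before that script is handed to EC2 as user data. The Pulumi program that does this is not shown in this view; the bash sketch below only illustrates the idea with `sed`, using hypothetical `render_user_data` and `placeholder_was_substituted` helpers. The second helper mirrors the install script's own check, which compares the first 11 characters (the length of `__EE_COMMIT`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: fill the commit placeholder in the install script template.
# Usage: render_user_data <template-file> <commit-ish>   (writes to stdout)
render_user_data() {
    local template="$1" commit="$2"
    # Refuse values that could break the sed expression or the shell quoting
    # in the rendered script (only branch/SHA-like characters allowed).
    if [[ ! "$commit" =~ ^[A-Za-z0-9._/-]+$ ]]; then
        echo "Refusing suspicious commit value: $commit" >&2
        return 1
    fi
    sed "s|__EE_COMMIT_HASH__|${commit}|g" "$template"
}

# An unsubstituted value still starts with the 11-character prefix "__EE_COMMIT".
placeholder_was_substituted() {
    local value="$1"
    [[ "${value:0:11}" != "__EE_COMMIT" ]]
}
```

Validating the commit string before substitution matters because the result runs as root via cloud-init; a value containing quotes or `;` would otherwise be spliced straight into the script.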
`@@ -0,0 +1,5 @@`

```markdown
# Pulumi automation

Pulumi automation to build an EE from scratch in EC2 and run basic integration tests.
```
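To bring up an EE by hand from this Pulumi project, a run might look like the sketch below. It uses standard Pulumi CLI commands (`pulumi stack init`, `pulumi config set --stack`, `pulumi up --yes`) and the `ee-cicd:targetCommit` key from the project file above; the `eeut_up` function, the stack name in the usage line, and the `PULUMI_CMD` override (there only so the steps can be dry-run) are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of a manual EEUT run from the Pulumi project directory.
# PULUMI_CMD can be set to "echo" to print the commands instead of running them.
PULUMI_CMD="${PULUMI_CMD:-pulumi}"

# Create a throwaway stack pinned to a specific edge-endpoint commit and deploy it.
eeut_up() {
    local stack="$1" commit="$2"
    $PULUMI_CMD stack init "$stack" || return 1
    # Overrides the "main" default from the project config.
    $PULUMI_CMD config set --stack "$stack" ee-cicd:targetCommit "$commit" || return 1
    $PULUMI_CMD up --yes --stack "$stack"
}
```

For example, `eeut_up eeut-manual-test "$(git rev-parse HEAD)"` would pin the instance to your current commit, provided that commit has been pushed, since the install script clones from GitHub. Remember to run `pulumi destroy` afterwards: whether the sweeper picks up a hand-made stack depends on how it selects stacks, which this view does not show.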