CICD: Runs a full GPU install on an EC2 instance #157
Open
robotrapta wants to merge 126 commits into main from e2e-cicd
+1,722 −3
126 commits
c9cc231
first crack at pulumi automation for cicd
robotrapta 3c4a81c
Merge remote-tracking branch 'origin/main' into e2e-cicd
robotrapta 6fc83cc
Adding e2e test in the main pipeline yaml.
robotrapta 7688c2b
Merge branch 'main' into e2e-cicd
robotrapta ff802b1
Fixing pulumi typo
robotrapta f5c493e
moving test-install-g4 onto self-hosted runnner
robotrapta d18739b
sets default dir
robotrapta dece567
Commenting out pulumi up
robotrapta 4301572
Changing triggers on main pipeline to only include PR's not every push.
robotrapta cda512b
Removing redundant runs-on
robotrapta 344aa8d
Adding check on workflow formatting.
robotrapta 8e24ea2
Adding yamllint config
robotrapta daed2f4
Iterating on yamllint rules.
robotrapta 34f3d29
YAMLlint should be working now.
robotrapta d0cecdb
Tweaking yamllint. Fixing deliberate failure.
robotrapta 68cbc0e
Working on self-hosted runner check.
robotrapta 7aa119b
Check for this specific PR while developing.
robotrapta 59bf1b6
Fixing path on pulumi
robotrapta 2833ae1
faster iteration
robotrapta c2b6b8f
Tweaking pulumi auth & install.
robotrapta bdf03e9
fixing GHA yaml
robotrapta dcf7bd7
Trying to get pulumi on the path.
robotrapta cdb1c97
Trying again to set pulumi in the path.
robotrapta 04c1f9e
path path path
robotrapta 39514bb
Switching pulumi to use uv
robotrapta fbaa838
Iterating on installing uv
robotrapta dfbec55
iterating.
robotrapta 1d5a5c5
tweak
robotrapta a7ba440
uv
robotrapta 005a5db
installing python
robotrapta d7e376f
installing pulumi
robotrapta 5d44aa2
switching to frigging pip
robotrapta d841730
Cleaning out useless uv stuff.
robotrapta e0a4bfa
Getting the names right of the network resources.
robotrapta aa69069
name tag, not name.
robotrapta 2e964ec
Find the firstrun script.
robotrapta f2e2444
Actually stand up the stack!!!
robotrapta fce0874
Adding some automated reporting on setup success/failure.
robotrapta a6ebbad
Using smaller (non-gpu) instance type - maybe faster?
robotrapta 80d09d8
Adding first crack at fabric commands to verify if EEUT is working.
robotrapta 6009351
Adding fab tests, which can't possibly pass yet.
robotrapta 6c2090a
actually gets the private ip of the eeut
24e90cb
Fab can connect to EEUT
c2e1a32
Adding a script to connect to eeut.
b185ba9
rename
b84a265
Activate fab!
robotrapta 5e8071b
Make fab more patient to connect over ssh
robotrapta a5a29d1
Disabling ipv6 in EEUT. Fixing fab call for ee-setup check
robotrapta edfdc1a
More patience waiting for init script to run.
robotrapta f2f4242
Tweaking EEUT install tests.
robotrapta 0a0fcc7
Give the EEUT a public IP.
robotrapta fb2b35c
yamllint is not a workflow.
robotrapta 9c08a38
Switching to g4 for test.
robotrapta f4c355d
Comment on script.
robotrapta 2eaed9e
Merge remote-tracking branch 'origin/main' into e2e-cicd
robotrapta f35cded
Adding workflow to validate workflow yamls.
robotrapta 32dff5f
Taking out the TODO's in the workflows pipeline.
robotrapta 2007c53
Merge remote-tracking branch 'origin/main' into e2e-cicd
robotrapta a4cfb5c
Delays deleting stacks until sweeper runs, to speed up the pipeline.
robotrapta 380ff5d
Tweaking GHA rules.
robotrapta 471d498
yaml lint
robotrapta e8502b7
Improving the workflow validation to catch semantic errors.
robotrapta 09bfde7
FIxing sweeper-eeut gha yaml
robotrapta 56162e5
Merge remote-tracking branch 'origin/main' into validate-workflow-yamls
robotrapta cb0b882
Improving the workflow validation to catch semantic errors.
robotrapta 2fc60a9
Fixing comment.
robotrapta 38c4c5f
Runs actionlint twice - once for errors, again for warnings.
robotrapta 0c6861c
Ignoring shellcheck warnings.
robotrapta 5981d7c
Merge branch 'validate-workflow-yamls' into e2e-cicd
robotrapta d2f1a68
Setting aws region.
robotrapta 1f788a6
Setting wd
robotrapta 4972363
Correct filename
robotrapta e83f546
Cleanup output on sweep-destroy.
robotrapta 35615ae
Using instance profile with rights to pull from ECR
robotrapta eacb689
Serious crack at checking k8
robotrapta 911fab9
Finds the instance profile properly
c90f9ec
Decent looking k8 test.
c4029c6
Runs the e2e test on all PRs
02eb995
Runs the check k8 deployment test e2e
80425a2
Refactoring some checking and expiration code.
c2c44a8
Further refactoring.
a33ea3f
Adding a server-port check.
c393bbb
Using serverport check
9143ae2
(Barely) functional SDK test
18a86ed
More disk!
ba77382
Adding full-check.
89e4868
Fixup pipeline dependency naming miss.
88a2a11
Basic OO fail
32c020d
Avoid collision with unattended-upgrade
c3df51f
Reordering things.
robotrapta 4946f2d
Longer timeout for GPU to come online. Also installing into /opt/gro…
robotrapta ebca6c9
bugfix on expiring the stack
robotrapta 4dc4656
Don't rename the stack. Don't `rm` the stack because it's not workin…
robotrapta d67e1eb
Always terminate g4 at the end.
robotrapta 72e5ac3
Forgot to activate venv
robotrapta d6d8a17
typo in fab
robotrapta 060b413
Switching to uv for faster pipelines.
robotrapta eeac387
worfklow syntax error.
robotrapta 2039b38
Tweaking uv setup
robotrapta 338bf54
activating uv's venv
robotrapta dd713e9
syntax error in uv cache.
robotrapta 09fde70
losing uv venv
robotrapta cccc3fe
Explicitly installing pulumi again.
robotrapta d0de5e0
Taking out comments in pipeline.
robotrapta 2588af2
Adding uv sync.
robotrapta 797b38e
Swallows error shutting down instance.
robotrapta 096faad
Makes sure the EEUT uses the code in our current branch - Derp!
robotrapta 5fc91bf
forgot import - tired.
robotrapta a9cdd0a
WOrking around pulumi stupid
robotrapta 8439e93
tweak
robotrapta 80bbd57
robustificating again.
robotrapta 8998ef4
Trying again to load the correct code.
robotrapta bd4d841
ANother attempt to set the proper code into the test envirohnment.
robotrapta 28ec988
Simpler
robotrapta 1d91a88
Moving sweeper to self-hosted runners.
robotrapta d9ae8ef
Trying to understand commit hashes
robotrapta 375b903
USing self-hosted runner aws creds
robotrapta c7148cc
iterating debugging
robotrapta f625676
trying more
robotrapta f7c1d0e
AVoiding merge commit for test.
robotrapta 4867cbf
Taking out the debugging job.
robotrapta 4aee362
minor comments
robotrapta 78f918f
upping GPU ready timeout to 10 minutes
robotrapta e086943
Deliberately broken YAML for edge deployment.
robotrapta 7c59b30
fixing deliberately broken YAML
robotrapta 8a1adaa
Merge remote-tracking branch 'origin/main' into e2e-cicd
robotrapta
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,3 +12,4 @@ rules: | |
comments: disable | ||
trailing-spaces: disable | ||
empty-lines: disable | ||
new-line-at-end-of-file: disable |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
name: sweeper-eeut | ||
# This workflow tears down old EEUT stacks from pulumi. | ||
# We do this as a background sweeper job, because the teardown is VERY slow (~7 minutes for a g4) | ||
# and we don't want to slow down the main pipeline for that. | ||
on: | ||
schedule: | ||
- cron: '*/15 * * * *' # Every 15 minutes | ||
# Note: scheduled (cron) workflows only run from the default branch (main). | ||
push: | ||
branches: | ||
# If you're working on this stuff, name your branch e2e-something and this will run. | ||
- e2e* | ||
concurrency: | ||
group: sweeper-eeut | ||
env: | ||
PYTHON_VERSION: "3.11" | ||
|
||
jobs: | ||
destroy-expired-eeut-stacks: | ||
#runs-on: ubuntu-22.04 # preferably | ||
# Currently running on self-hosted because something is wrong with the AWS perms on the GH runners. | ||
runs-on: self-hosted | ||
env: | ||
PULUMI_ACCESS_TOKEN: ${{ secrets.PULUMI_CICD_PAT }} | ||
defaults: | ||
run: | ||
working-directory: cicd/pulumi | ||
steps: | ||
- name: Check out code | ||
uses: actions/checkout@v3 | ||
|
||
- name: Set AWS credentials | ||
uses: aws-actions/configure-aws-credentials@v2 | ||
with: | ||
aws-region: us-west-2 | ||
# TODO: move these back to GH-provided secrets | ||
# Currently using IAM roles on the self-hosted runner instance. | ||
#aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }} | ||
#aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }} | ||
#aws-session-token: ${{ secrets.AWS_SESSION_TOKEN }} | ||
|
||
- name: Set up Python | ||
uses: actions/setup-python@v4 | ||
with: | ||
python-version: ${{ env.PYTHON_VERSION }} | ||
|
||
- name: Install Pulumi | ||
run: | | ||
curl -fsSL https://get.pulumi.com | sh | ||
export HOME=$(eval echo ~$(whoami)) | ||
echo "$HOME/.pulumi/bin" >> "$GITHUB_PATH" | ||
|
||
- name: Check that pulumi is installed and authenticated | ||
run: | | ||
set -ex | ||
pulumi whoami | ||
|
||
- name: Destroy old EEUT stacks | ||
working-directory: cicd/pulumi | ||
run: | | ||
./sweep-destroy-eeut-stacks.sh |
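
The contents of `sweep-destroy-eeut-stacks.sh` are not shown in this diff. As a rough illustration of the sweeper idea described in the workflow comments (tear down only stacks that are past their expiry, out of band from the main pipeline), here is a minimal sketch in plain bash. The input format (one `stack-name expiry-epoch` pair per line on stdin) and the function name `list_expired_stacks` are assumptions made for this example, not something taken from the PR.

```shell
#!/bin/bash
# Hypothetical sketch: select expired stacks from a list on stdin.
# Assumed input format (not from the PR): "<stack-name> <expiry-epoch-seconds>" per line.
list_expired_stacks() {
    # First argument: the current time in epoch seconds (defaults to now).
    local now="${1:-$(date +%s)}"
    local name expires_at
    while read -r name expires_at; do
        # Skip blank lines and lines whose expiry is not a plain integer.
        case "$expires_at" in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$expires_at" -le "$now" ]; then
            echo "$name"
        fi
    done
}
```

A caller could pipe the resulting names into whatever teardown command the real script uses; passing the current time as an argument keeps the selection logic easy to test.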
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,102 @@ | ||
#! /bin/bash | ||
# This script is intended to run on a new Ubuntu instance and set it up | ||
# as an edge-endpoint environment. | ||
# It is tested in the CICD pipeline to install the edge-endpoint on a new | ||
# g4dn.xlarge EC2 instance with Ubuntu 22.04 LTS. | ||
|
||
# As a user-data script on ubuntu, this file probably lands at | ||
# /var/lib/cloud/instance/user-data.txt | ||
echo "Setting up Groundlight Edge Endpoint. Follow along at /var/log/cloud-init-output.log" > /etc/motd | ||
|
||
echo "Starting cloud init. Uptime: $(uptime)" | ||
|
||
# Set up signals about the status of the installation | ||
mkdir -p /opt/groundlight/ee-install-status | ||
touch /opt/groundlight/ee-install-status/installing | ||
SETUP_COMPLETE=0 | ||
record_result() { | ||
if [ "$SETUP_COMPLETE" -eq 0 ]; then | ||
echo "Setup failed at $(date)" | ||
touch /opt/groundlight/ee-install-status/failed | ||
echo "Groundlight Edge Endpoint setup FAILED. See /var/log/cloud-init-output.log for details." > /etc/motd | ||
else | ||
echo "Setup complete at $(date)" | ||
echo "Groundlight Edge Endpoint setup complete. See /var/log/cloud-init-output.log for details." > /etc/motd | ||
touch /opt/groundlight/ee-install-status/success | ||
fi | ||
# Remove "installing" at the end to avoid a race where there is no status | ||
rm -f /opt/groundlight/ee-install-status/installing | ||
} | ||
trap record_result EXIT | ||
|
||
set -e # Exit on error of any command. | ||
|
||
wait_for_apt_lock() { | ||
# We wait for any apt or dpkg processes to finish to avoid lock collisions | ||
# Unattended-upgrades can hold the lock and cause the install to fail | ||
while sudo fuser /var/lib/dpkg/lock-frontend >/dev/null 2>&1; do | ||
echo "Another apt/dpkg process is running. Waiting for it to finish..." | ||
sleep 5 | ||
done | ||
} | ||
|
||
# Install basic tools | ||
wait_for_apt_lock | ||
sudo apt update | ||
wait_for_apt_lock | ||
sudo apt install -y \ | ||
git \ | ||
vim \ | ||
tmux \ | ||
htop \ | ||
curl \ | ||
wget \ | ||
tree \ | ||
bash-completion \ | ||
ffmpeg | ||
|
||
# Download the edge-endpoint code | ||
CODE_BASE=/opt/groundlight/src/ | ||
mkdir -p ${CODE_BASE} | ||
cd ${CODE_BASE} | ||
git clone https://github.com/groundlight/edge-endpoint | ||
cd edge-endpoint/ | ||
# The launching script should update this to a specific commit. | ||
SPECIFIC_COMMIT="__EE_COMMIT_HASH__" | ||
if [ -n "$SPECIFIC_COMMIT" ]; then | ||
# See if the string got substituted. Note we can't compare to the whole thing | ||
# because that would be substituted too! So compare only the 11-character prefix. | ||
if [ "${SPECIFIC_COMMIT:0:11}" != "__EE_COMMIT" ]; then | ||
echo "Checking out commit ${SPECIFIC_COMMIT}" | ||
git checkout "${SPECIFIC_COMMIT}" | ||
else | ||
echo "It appears the commit hash was not substituted. Staying on main." | ||
fi | ||
else | ||
echo "A blank commit hash was provided. Staying on main." | ||
fi | ||
|
||
# Set up k3s with GPU support | ||
./deploy/bin/install-k3s-nvidia.sh | ||
|
||
# Set up some shell niceties | ||
TARGET_USER="ubuntu" | ||
echo "alias k='kubectl'" >> /home/${TARGET_USER}/.bashrc | ||
echo "source <(kubectl completion bash)" >> /home/${TARGET_USER}/.bashrc | ||
echo "complete -F __start_kubectl k" >> /home/${TARGET_USER}/.bashrc | ||
echo "set -o vi" >> /home/${TARGET_USER}/.bashrc | ||
|
||
# Configure the edge-endpoint with environment variables | ||
export DEPLOYMENT_NAMESPACE="gl-edge" | ||
export INFERENCE_FLAVOR="GPU" | ||
export GROUNDLIGHT_API_TOKEN="api_token_not_set" | ||
|
||
# Install the edge-endpoint | ||
kubectl create namespace gl-edge | ||
kubectl config set-context edge --namespace=gl-edge --cluster=default --user=default | ||
kubectl config use-context edge | ||
./deploy/bin/setup-ee.sh | ||
|
||
# Indicate that setup is complete | ||
SETUP_COMPLETE=1 | ||
echo "EE is installed into kubernetes, which will attempt to finish the setup." |
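
The setup script above signals its progress through marker files (`installing`, `failed`, `success`) under `/opt/groundlight/ee-install-status`. A test harness could poll those markers roughly as sketched below. The function names `ee_install_status` and `wait_for_ee_install`, and the timeout and interval defaults, are hypothetical; the PR's actual checks (run via fab, per the commit messages) are not shown here.

```shell
#!/bin/bash
# Hypothetical sketch: read the install status markers written by the setup script.
ee_install_status() {
    local status_dir="${1:-/opt/groundlight/ee-install-status}"
    # Check the final markers first: the setup script writes success/failed
    # before it removes "installing", so both can exist briefly.
    if [ -e "${status_dir}/success" ]; then
        echo "success"
    elif [ -e "${status_dir}/failed" ]; then
        echo "failed"
    elif [ -e "${status_dir}/installing" ]; then
        echo "installing"
    else
        echo "unknown"
    fi
}

# Poll until the install finishes or the timeout elapses.
# Returns 0 on success, 1 on failure, 2 on timeout.
wait_for_ee_install() {
    local status_dir="$1"
    local timeout_s="${2:-600}"   # assumed default, not from the PR
    local interval_s="${3:-5}"    # assumed default, not from the PR
    local waited=0
    while [ "$waited" -lt "$timeout_s" ]; do
        case "$(ee_install_status "$status_dir")" in
            success) return 0 ;;
            failed) return 1 ;;
        esac
        sleep "$interval_s"
        waited=$((waited + interval_s))
    done
    return 2
}
```

For example, `wait_for_ee_install /opt/groundlight/ee-install-status 900 10` would wait up to 15 minutes, checking every 10 seconds.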
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
echo "This is a uv project. Remember to 'uv run ...' everything" | ||
uv sync | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
|
||
*.pyc | ||
venv/ | ||
.venv/ | ||
__pycache__/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
name: ee-cicd | ||
runtime: | ||
name: python | ||
options: | ||
toolchain: uv | ||
description: CI/CD for Edge Endpoint | ||
config: | ||
ee-cicd:instanceType: g4dn.xlarge | ||
# Default to "main" so things are sensible if this doesn't get customized. | ||
# But for testing purposes, this should be set to the specific commit you want to test. | ||
ee-cicd:targetCommit: main |
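
The `targetCommit` setting presumably feeds the `__EE_COMMIT_HASH__` placeholder in the first-run script, whose comment says the launching script should replace it with a specific commit. The code that does the substitution is not shown in this diff (and is likely Python inside the Pulumi program), so the following is only a minimal bash sketch of the idea. The function name `render_user_data` and the ref validation rule are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical sketch: substitute the commit placeholder in a user-data template.
render_user_data() {
    local template="$1"
    local commit="$2"
    # Accept only characters that normally appear in hashes or simple ref names,
    # so the value cannot break the sed expression below.
    if ! [[ "$commit" =~ ^[A-Za-z0-9._/-]+$ ]]; then
        echo "Refusing suspicious commit ref: ${commit}" >&2
        return 1
    fi
    # Replace every occurrence of the full placeholder; the shorter
    # "__EE_COMMIT" prefix used by the script's own check is left untouched.
    sed "s|__EE_COMMIT_HASH__|${commit}|g" "$template"
}
```

Usage would look like `render_user_data template.sh "$(git rev-parse HEAD)" > user-data.sh`, where the file names are placeholders for whatever paths the pipeline actually uses.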
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Pulumi automation | ||
|
||
Pulumi automation to build an EE from scratch in EC2 and run basic integration tests. | ||
|
||
|
Oooohhh