Production Deploy 2024-03-27 #278

Merged
merged 41 commits into from
Mar 27, 2024
0a86910
Begin DR creation
ronardcaktus Jan 27, 2023
aaae16f
Begin DR creation
ronardcaktus Jan 27, 2023
139f403
Begin DR creation
ronardcaktus Jan 27, 2023
ac1c384
Commit missing files
ronardcaktus Jan 27, 2023
03833f2
add dr domain name
ronardcaktus Mar 1, 2023
c6265b8
Fix suggestions
ronardcaktus Mar 2, 2023
e32c12a
Add namespace
ronardcaktus Mar 7, 2023
df59adb
Refactor prod, staging, and dr envs
ronardcaktus Mar 7, 2023
b7fa932
Refactor prod, staging, and dr envs
ronardcaktus Mar 7, 2023
79c4698
Merge branch 'develop' of github.com:caktus/philly-hip into CU-30tprq…
ronardcaktus Mar 23, 2023
7d3cd56
Merge pull request #262 from caktus/CU-30tprqy_HIP-Philly
ronardcaktus Apr 3, 2023
3ef72b0
Upgrade Wagtail
ronardcaktus Apr 10, 2023
073f7e1
Merge pull request #267 from caktus/CU-862jkp7xj_Update-Wagtail-to-422
ronardcaktus Apr 11, 2023
272e6a0
Add aws-cloudwatch-metrics Helm chart and alarms (#268)
tobiasmcnulty May 19, 2023
ad7adf5
Add jq to Dockerfile dev container layer
ronardcaktus Jun 1, 2023
811625c
Update deployment requirements
ronardcaktus Jun 2, 2023
a22c8f2
Update dev requirements
ronardcaktus Jun 2, 2023
58a5678
Update descheduler to v0.25.1
ronardcaktus Jun 2, 2023
e3ba2ff
Update cert manager chart version
ronardcaktus Jun 6, 2023
2085ba4
Update kube client version and helm
ronardcaktus Jun 6, 2023
ce8677d
update gitignore
ronardcaktus Jun 6, 2023
2322e7b
Merge pull request #269 from caktus/CU-86774uhzq_Philly-hip
ronardcaktus Jun 13, 2023
2b403a2
return a 404 status code response when serving HealthAlertDetailPage …
dchukhin Jun 30, 2023
ed75e58
add tests for HealthAlertDetailPage's serve() and serve_preview() met…
dchukhin Jun 30, 2023
e0db1b6
Merge pull request #270 from caktus/8684y6f05-fix-server-error-on-hea…
Jul 3, 2023
ce65485
update to latest Django bugfix version
dchukhin Jul 5, 2023
c94eb26
Update pyyaml, awscli, boto3, and botocore
ronardcaktus Jul 31, 2023
f7850ac
explicitly add PyYAML dependency to base requirements file, so we can …
dchukhin Aug 2, 2023
b7b252c
update generated requirements files based on changes to .in files
dchukhin Aug 2, 2023
6fec16d
temporarily enable CI to deploy this branch
dchukhin Aug 4, 2023
b6eb26f
revert temporary change to CI config file
dchukhin Aug 4, 2023
a5efd7a
Merge pull request #272 from caktus/django-3-20
Aug 4, 2023
b37b59c
Merge pull request #273 from caktus/pyyaml-fix
ronardcaktus Aug 4, 2023
be637e4
upgrade to latest bugfix version
dchukhin Sep 11, 2023
ca5e35f
upgrade Django to latest bugfix version
dchukhin Oct 5, 2023
eb829e4
Enables solely LTS version notification
ronardcaktus Oct 30, 2023
b5cabbb
Reload workers after a specified amount of requests
ronardcaktus Oct 30, 2023
cbb63d0
Move max request up in file
ronardcaktus Nov 1, 2023
68040f0
Merge pull request #275 from caktus/max-requests
ronardcaktus Nov 2, 2023
49fc9a0
Update Django to latest bugfix release
ronardcaktus Nov 2, 2023
cd4e19f
Merge pull request #274 from caktus/django-3-2-21
Nov 2, 2023
3 changes: 3 additions & 0 deletions .gitignore
@@ -18,6 +18,8 @@ env-local.sh
.direnv
venv
webpack-stats.json
bin
.kube

# Project Static Files
node_modules/*
@@ -28,6 +30,7 @@ media/*
hip/static/bundles/main.js

# Ansible
bin/
deploy/roles/*
.vault_pass
*.retry
8 changes: 6 additions & 2 deletions Dockerfile
@@ -85,6 +85,9 @@ ENV UWSGI_HTTP=:8000 UWSGI_MASTER=1 UWSGI_HTTP_AUTO_CHUNKED=1 UWSGI_HTTP_KEEPALI
# Number of uWSGI workers and threads per worker (customize as needed):
ENV UWSGI_WORKERS=2 UWSGI_THREADS=4

# Reload workers after the specified amount of managed requests (avoid memory leaks)
ENV UWSGI_MAX_REQUESTS=1000

# uWSGI static file serving configuration (customize or comment out if not needed):
ENV UWSGI_STATIC_MAP="/static/=/code/static/" UWSGI_STATIC_EXPIRES_URI="/static/.*\.[a-f0-9]{12,}\.(css|js|png|jpg|jpeg|gif|ico|woff|ttf|otf|svg|scss|map|txt) 315360000"

@@ -118,8 +121,8 @@ RUN groupadd --gid $USER_GID $USERNAME \
# openssh-client -- for git over SSH
# sudo -- to run commands as superuser
# vim -- enhanced vi editor for commits
-ENV KUBE_CLIENT_VERSION="v1.22.15"
-ENV HELM_VERSION="3.8.2"
+ENV KUBE_CLIENT_VERSION="v1.25.10"
+ENV HELM_VERSION="3.12.0"
RUN --mount=type=cache,target=/var/cache/apt --mount=type=cache,target=/var/lib/apt \
--mount=type=cache,mode=0755,target=/root/.cache/pip \
set -ex \
@@ -129,6 +132,7 @@ RUN --mount=type=cache,target=/var/cache/apt --mount=type=cache,target=/var/lib/
docker-compose-plugin \
git-core \
gnupg2 \
jq \
libpcre3 \
libpq-dev \
libpng-dev \
5 changes: 5 additions & 0 deletions apps/health_alerts/models.py
@@ -1,6 +1,7 @@
import datetime

from django.db import models
from django.http import Http404
from django.shortcuts import redirect

from phonenumber_field.modelfields import PhoneNumberField
@@ -108,6 +109,10 @@ def get_priority_color(self):
        return ""

    def serve(self, request):
        """Return the URL for the HealthAlertDetailPage's alert_file (or a 404 page)."""
        # If the HealthAlertDetailPage does not have an alert_file, then return a 404 page.
        if not self.alert_file:
            raise Http404()
        return redirect(self.alert_file.url)

    # Because we have overridden the serve() method of this model, we also need to
51 changes: 51 additions & 0 deletions apps/health_alerts/tests/test_models.py
@@ -0,0 +1,51 @@
from django.http import Http404

import pytest

from apps.hip.tests.factories import DocumentFactory

from .factories import HealthAlertDetailPageFactory


@pytest.mark.parametrize(
    "page_is_live,page_has_alert_file,expected_response_status_code",
    [
        (True, True, 302),
        (True, False, 404),
        (False, True, 302),
        (False, False, 404),
    ],
)
def test_serve_health_alert_detail_page(
    db,
    client,
    request,
    page_is_live,
    page_has_alert_file,
    expected_response_status_code,
):
    """Assert that loading a HealthAlertDetailPage does not cause a server error."""
    health_alert_page = HealthAlertDetailPageFactory(live=page_is_live, alert_file=None)
    if page_has_alert_file:
        health_alert_page.alert_file = DocumentFactory()
        health_alert_page.save()

    # Call the serve() method, and verify that the response is as expected.
    if expected_response_status_code == 404:
        with pytest.raises(Http404):
            health_alert_page.serve(request)
    else:
        response_serve = health_alert_page.serve(request)
        assert expected_response_status_code == response_serve.status_code
        if expected_response_status_code == 302:
            assert health_alert_page.alert_file.url == response_serve.url

    # Call the serve_preview() method, and verify that the response is as expected.
    if expected_response_status_code == 404:
        with pytest.raises(Http404):
            health_alert_page.serve_preview(request, "")
    else:
        response_serve_preview = health_alert_page.serve_preview(request, "")
        assert expected_response_status_code == response_serve_preview.status_code
        if expected_response_status_code == 302:
            assert health_alert_page.alert_file.url == response_serve_preview.url
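The guard added to `serve()` can be illustrated without Django or Wagtail installed. What follows is a minimal sketch, not the project's code: `Http404`, `FakeDocument`, `FakePage`, `status_for`, and the tuple returned in place of a redirect response are all hypothetical stand-ins. Only the control flow (raise a 404 when `alert_file` is empty, otherwise redirect to its URL) mirrors the diff.

```python
class Http404(Exception):
    """Stand-in for django.http.Http404 (assumption: a plain exception)."""


class FakeDocument:
    """Hypothetical document exposing only the attribute serve() reads."""

    def __init__(self, url):
        self.url = url


class FakePage:
    """Hypothetical page mirroring the control flow of the patched serve()."""

    def __init__(self, alert_file=None):
        self.alert_file = alert_file

    def serve(self, request):
        # Without an alert_file there is nothing to redirect to, so signal a
        # 404 instead of failing on `None.url`.
        if not self.alert_file:
            raise Http404()
        # Stand-in for django.shortcuts.redirect(): (status code, target URL).
        return 302, self.alert_file.url


def status_for(page):
    """Return the status code a client would see for the given page."""
    try:
        status, _url = page.serve(request=None)
    except Http404:
        return 404
    return status
```

Under these assumptions, `status_for(FakePage())` yields 404 and a page with a document yields 302, which matches the four parametrized cases in the test above regardless of the `live` flag.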
5 changes: 5 additions & 0 deletions deploy/db-restore.yml
@@ -0,0 +1,5 @@
- hosts: k8s
  tasks:
    - import_role:
        name: caktus.k8s-hosting-services
        tasks_from: restore
57 changes: 57 additions & 0 deletions deploy/deploy-cluster.yml
@@ -9,6 +9,7 @@
    - role: caktus.k8s-web-cluster
  tasks:
    - name: Add AWS for fluent bit helm chart (centralized logging)
      tags: fluentbit
      community.kubernetes.helm:
        context: "{{ k8s_context|mandatory }}"
        kubeconfig: "{{ k8s_kubeconfig }}"
@@ -26,3 +27,59 @@
          elasticsearch:
            enabled: false
        wait: yes
    - name: Create Amazon CloudWatch Metrics namespace
      tags: cloudwatch
      community.kubernetes.k8s:
        context: "{{ k8s_context|mandatory }}"
        kubeconfig: "{{ k8s_kubeconfig }}"
        name: "{{ k8s_aws_cloudwatch_metrics_namespace }}"
        api_version: v1
        kind: Namespace
        state: present
    - name: Add AWS CloudWatch Metrics helm chart (monitoring)
      tags: cloudwatch
      community.kubernetes.helm:
        context: "{{ k8s_context|mandatory }}"
        kubeconfig: "{{ k8s_kubeconfig }}"
        chart_repo_url: "https://aws.github.io/eks-charts"
        chart_ref: aws-cloudwatch-metrics
        # https://artifacthub.io/packages/helm/aws/aws-cloudwatch-metrics
        chart_version: "{{ k8s_aws_cloudwatch_metrics_chart_version }}"
        release_name: aws-cloudwatch-metrics
        release_namespace: "{{ k8s_aws_cloudwatch_metrics_namespace }}"
        release_values:
          clusterName: philly-hip-stack-cluster
        wait: yes
    - name: Create alarms
      tags: cloudwatch
      amazon.aws.cloudwatch_metric_alarm:
        state: present
        region: us-east-1
        name: "{{ item.name }}"
        description: "{{ item.description }}"
        metric: "{{ item.metric }}"
        namespace: "ContainerInsights"
        dimensions:
          ClusterName: philly-hip-stack-cluster
        statistic: Average
        comparison: "{{ item.comparison }}"
        threshold: "{{ item.threshold }}"
        period: "{{ item.period }}"
        evaluation_periods: "{{ item.evaluation_periods }}"
        alarm_actions:
          - arn:aws:sns:us-east-1:061553509755:HIP_Errors_CloudWatch_Alarms_Topic
      loop:
        - name: node-cpu-high
          description: This will alarm when a instance's CPU usage average is greater than 50% for 15 minutes.
          metric: node_cpu_utilization
          comparison: GreaterThanOrEqualToThreshold
          threshold: 50
          period: 300
          evaluation_periods: 3
        - name: node-count-low
          description: This will alarm when a cluster's node count drops below 2 for 15 minutes.
          metric: cluster_node_count
          comparison: LessThanThreshold
          threshold: 2
          period: 300
          evaluation_periods: 3
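The "15 minutes" in both alarm descriptions follows from `period: 300` (seconds) times `evaluation_periods: 3`. The sketch below works through that arithmetic and a deliberately simplified alarm rule: it assumes an alarm fires only when every one of the most recent `evaluation_periods` datapoints breaches the threshold, and it does not model CloudWatch's "M out of N" datapoints setting or its missing-data handling. The function names are hypothetical.

```python
import operator

# Comparison names as used in the playbook, mapped to Python operators.
COMPARISONS = {
    "GreaterThanOrEqualToThreshold": operator.ge,
    "LessThanThreshold": operator.lt,
}


def alarm_window_minutes(period_seconds, evaluation_periods):
    """Length of the window an alarm looks at, in minutes."""
    return period_seconds * evaluation_periods / 60


def is_in_alarm(datapoints, comparison, threshold, evaluation_periods):
    """Simplified model: alarm when each of the most recent periods breaches."""
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return False  # assumption of this sketch: too little data means no alarm
    breaches = COMPARISONS[comparison]
    return all(breaches(value, threshold) for value in recent)
```

One detail this makes visible: `node-cpu-high` uses `GreaterThanOrEqualToThreshold`, so in this model a datapoint of exactly 50 counts as breaching, even though the description says "greater than 50%".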
87 changes: 82 additions & 5 deletions deploy/group_vars/all.yml
@@ -1,5 +1,4 @@
---

# ----------------------------------------------------------------------------
# Global: Common configuration variables for all inventory groups
# ----------------------------------------------------------------------------
@@ -12,9 +11,17 @@ stack_name: "{{ long_app_name }}-stack"
aws_profile: "{{ long_app_name }}"
cluster_name: "{{ stack_name }}-cluster"

ansible_connection: local
ansible_python_interpreter: "{{ ansible_playbook_python }}"

k8s_cluster_name: "{{ cluster_name }}"
k8s_namespace: "{{ app_name }}-{{ env_name }}"

# CloudFormation Outputs
# These values are taken from the CF 'Output' tab
# aws eks describe-cluster --name=philly-hip-stack-cluster | grep endpoint
ClusterEndpoint: https://C3219F3CB49E4B1C82CFE8C82A846345.sk1.us-east-1.eks.amazonaws.com
# aws rds describe-db-instances
DatabaseAddress: pd13w6wwn2hbn7f.cp7c2yqiusbp.us-east-1.rds.amazonaws.com
RepositoryURL: 061553509755.dkr.ecr.us-east-1.amazonaws.com/philly-hip-stack-applicationrepository-kk92mehevd86

@@ -69,15 +76,21 @@ cloudformation_stack:
# --------------------------------------------------------------------------

k8s_cluster_type: aws
# aws eks describe-cluster --name=philly-hip-stack-cluster --query 'cluster.arn'
k8s_context: "arn:aws:eks:us-east-1:061553509755:cluster/{{ cluster_name }}"
-k8s_ingress_nginx_chart_version: "4.4.2"
-k8s_cert_manager_chart_version: "v1.11.0"
+k8s_ingress_nginx_chart_version: "4.6.0"
+k8s_cert_manager_chart_version: "v1.11.1"
k8s_letsencrypt_email: [email protected]
k8s_iam_users: [noop] # https://github.com/caktus/ansible-role-k8s-web-cluster/issues/17
# aws-for-fluent-bit
# - https://github.com/aws/eks-charts/tree/master/stable/aws-for-fluent-bit
# - https://artifacthub.io/packages/helm/aws/aws-for-fluent-bit
k8s_aws_fluent_bit_chart_version: "0.1.18"
# aws-cloudwatch-metrics:
# - https://github.com/aws/eks-charts/tree/master/stable/aws-cloudwatch-metrics
# - https://artifacthub.io/packages/helm/aws/aws-cloudwatch-metrics
k8s_aws_cloudwatch_metrics_chart_version: "0.0.9"
k8s_aws_cloudwatch_metrics_namespace: amazon-cloudwatch

# ----------------------------------------------------------------------------
# caktus.django-k8s: Shared configuration variables for staging and production
@@ -86,7 +99,6 @@ k8s_aws_fluent_bit_chart_version: "0.1.18"

k8s_auth_host: "{{ ClusterEndpoint }}"
k8s_auth_ssl_ca_cert: "k8s_auth_ssl_ca_cert.txt"
-k8s_namespace: "{{ app_name }}-{{ env_name }}"
k8s_memcached_enabled: true

# App pod configuration:
@@ -95,6 +107,13 @@ k8s_container_port: 8000
k8s_container_image: "{{ RepositoryURL }}"
k8s_container_image_pull_policy: Always
k8s_container_replicas: 2

# Lower resources to preserve Node resources
k8s_container_resources:
  requests:
    memory: "256Mi"
    cpu: "50m"

k8s_migrations_enabled: true
k8s_collectstatic_enabled: false
k8s_container_ingress_annotations:
@@ -129,12 +148,70 @@ k8s_ci_username: hip-ci-user
k8s_ci_repository_arn: "arn:aws:ecr:us-east-1:061553509755:repository/philly-hip-stack-applicationrepository-kk92mehevd86"
k8s_ci_vault_password_arn: "arn:aws:secretsmanager:us-east-1:061553509755:secret:hip-ansible-vault-password-JYhbao"

# Email:
env_email_host: email-smtp.us-east-1.amazonaws.com
env_email_use_tls: "true"
env_email_host_user: !vault |
$ANSIBLE_VAULT;1.1;AES256
30613761343565386331633239623831303665313461356663393563346633373533316134633031
3766633834376434363137646333666266353865343937360a613838306134663961333237393030
39356265383036633765363635633232373066633639323763363935373934313632303830323964
3265383761653137350a366134306338383537336336353266353439303539316334346330313439
31666161613437643239373566303238353663653931343637353866303435666364
env_email_host_pass: !vault |
$ANSIBLE_VAULT;1.1;AES256
62303635346364383964393536613631623730363337333337343930653030333865373539643736
6138643665383165383863346239323066636233623937620a306137363539356362653935343338
30366561316361633936613731333639373136323732616638313837633438343135323530623134
6339646533356361340a633535323165653935376136303135353866353762663366663032376536
32636365613634373961353564626336343930393866393130656666316634316431353431386330
3561616461636134373033316665613035303736646133613630

# Azure SSO settings
azure_client_id: "f0629cf8-f6f4-4142-94c3-11b8beaaa510"
azure_tenant_id: "2046864f-68ea-497d-af34-a6629a6cd700"
azure_client_secret: !vault |
$ANSIBLE_VAULT;1.1;AES256
34653665623939373232343266393962386662373738363135313965636461303362656235353739
3833373532646436326463663233616238316431306633330a333664363061313630646565613465
34393634623231333964346166306639613438623330343865663066643239383634633538613130
3234326436376638370a353262656662656334653234666565313032333237353135336132636136
33613062323365303165663261356138616634656331373037363031326161383832333662333266
6339636266626239303165666261353362626564363636346665

k8s_environment_variables:
  DATABASE_URL: "{{ env_database_url }}"
  DJANGO_SETTINGS_MODULE: "{{ env_django_settings }}"
  DJANGO_DEBUG: "False"
  # DOMAIN is the ALLOWED_HOST
  DOMAIN: "{{ k8s_domain_names[0] }}"
  # join ALLOWED_HOSTS with a colon, because they are split by colon in deploy.py
  ALLOWED_HOSTS: "{{ k8s_domain_names|join(':') }}"
  ENVIRONMENT: "{{ env_name }}"
  CACHE_HOST: "{{ env_cache_host }}"
  # *** Uploaded media
  DEFAULT_FILE_STORAGE: "{{ env_default_file_storage }}"
  MEDIA_STORAGE_BUCKET_NAME: "{{ env_media_storage_bucket_name }}"
  AWS_DEFAULT_ACL: "{{ env_aws_default_acl }}"
  AWS_DEFAULT_REGION: "{{ aws_region }}"
  MEDIA_LOCATION: "{{ env_media_location }}"
  # *** Email
  EMAIL_HOST: "{{ env_email_host }}"
  EMAIL_HOST_USER: "{{ env_email_host_user }}"
  EMAIL_HOST_PASSWORD: "{{ env_email_host_pass }}"
  EMAIL_USE_TLS: "{{ env_email_use_tls }}"
  DJANGO_SECRET_KEY: "{{ env_django_secret_key }}"
  # Azure SSO settings
  AZURE_CLIENT_ID: "{{ azure_client_id }}"
  AZURE_TENANT_ID: "{{ azure_tenant_id }}"
  AZURE_CLIENT_SECRET: "{{ azure_client_secret }}"
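The comment on `ALLOWED_HOSTS` says the domain list is joined with colons here and split on colons in `deploy.py`. That file is not part of this diff, so the following is only a sketch of the round trip: `join_hosts` mirrors the Jinja2 expression `k8s_domain_names|join(':')`, while `split_hosts` and the example domain names are hypothetical.

```python
def join_hosts(domain_names):
    """Mirror of the Jinja2 expression k8s_domain_names|join(':')."""
    return ":".join(domain_names)


def split_hosts(raw):
    """Hypothetical inverse on the settings side: split on colons, drop empties."""
    return [host for host in raw.split(":") if host]


# Example values are placeholders, not the project's real domains.
hosts = ["example.org", "www.example.org"]
round_tripped = split_hosts(join_hosts(hosts))
```

Dropping empty strings means an unset or empty variable produces an empty list rather than a list containing one empty host name.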

# Install Descheduler to attempt to spread out pods again after node failures
k8s_install_descheduler: yes
# You must set the k8s_descheduler_chart_version to match the Kubernetes
# node version (0.23.x -> K8s 1.23.x); see:
# https://github.com/kubernetes-sigs/descheduler#compatibility-matrix
-k8s_descheduler_chart_version: v0.22.1
+k8s_descheduler_chart_version: v0.25.2
# See values.yaml for options:
# https://github.com/kubernetes-sigs/descheduler/blob/master/charts/descheduler/values.yaml#L63
k8s_descheduler_release_values:
31 changes: 31 additions & 0 deletions deploy/group_vars/staging_shared.yml
@@ -0,0 +1,31 @@
# ----------------------------------------------------------------------------
# caktus.django-k8s
# ----------------------------------------------------------------------------

env_database_url: "postgres://{{ app_name }}_staging:{{ database_password }}@{{ DatabaseAddress }}:5432/{{ app_name }}_{{ env_name }}"
# pwgen -s 40 1|tr -d '\n'|ansible-vault encrypt_string
database_password: !vault |
$ANSIBLE_VAULT;1.1;AES256
31633236353234623634616635306235633431373936393632656266333831333335326565353536
6565353434336561386234346639626634363164613139620a333134643231643634373137323638
62366363386436613935323163663562313865326236306662333765613565353037386332323134
3631346233636465350a646135386165336631383837383436653638346361333862663035613066
61383363666334303639353661353862363433383833643164623865636363383162343666636132
3135306661623561633630323062613065623738383866653833

# Disaster Recovery: DB restore configuration
k8s_restore_namespace: "{{ k8s_namespace }}"
k8s_restore_target_db_url: "{{ env_database_url }}"
k8s_restore_sql_commands: [CREATE EXTENSION IF NOT EXISTS citext;]
k8s_restore_maint_user: hip_admin
k8s_restore_maint_host: "{{ DatabaseAddress }}"
k8s_restore_maint_port: "5432"
k8s_restore_maint_name: hip
k8s_restore_maint_pass: !vault |
$ANSIBLE_VAULT;1.1;AES256
63666133363834643339373132356631656633343463313761376363613138383035353532346236
3162396136333434303539346435306361336636636232620a353139306565616231303763646366
31636366666262323933643061626135346663646564656534313437393063396633626332663831
3565323636626163320a633963393664626563313265363632633161643833626366373265643835
30383637393636336335303231653434666536623535313439646136663239383139323533613239
6666643563326336613864366161623264363331656632333761
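The `env_database_url` template in this file concatenates a user, password, host, port, and database name into one `postgres://` URL. As a rough sketch of the same composition in Python: `build_database_url` and every example value are hypothetical. Percent-encoding the password is an addition of this sketch; the template above interpolates it directly, which is safe for the alphanumeric output of `pwgen -s` but not for arbitrary passwords.

```python
from urllib.parse import quote, unquote, urlsplit


def build_database_url(app_name, env_name, password, host, port=5432):
    """Assemble a URL shaped like env_database_url in staging_shared.yml."""
    user = f"{app_name}_staging"  # the template hard-codes the _staging suffix
    db_name = f"{app_name}_{env_name}"
    # Percent-encode the password so characters such as '@' or '/' cannot be
    # mistaken for URL delimiters.
    encoded = quote(password, safe="")
    return f"postgres://{user}:{encoded}@{host}:{port}/{db_name}"


# Placeholder values only; none of these come from the repository.
url = build_database_url("hip", "staging", "p@ss/word", "db.example.internal")
parts = urlsplit(url)
```

Parsing the result back with `urlsplit` recovers each component, and `unquote(parts.password)` returns the original password.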