Create and Deploy an ML model using Google Cloud Run, GitHub Actions and Terraform

In this post, I will explain how to expose a trained model as an API, and how to use CI/CD best practices (GitHub Actions) and IaC (Terraform) to automate infrastructure creation.

Prerequisites

Docker Desktop
Git
GitHub account
Google Cloud Platform account with owner permissions

Google Cloud Run

Cloud Run is a serverless platform from Google Cloud for deploying and running containers. It can be used to serve RESTful web APIs, WebSocket applications, or microservices connected by gRPC.

In this project we will need:

An IAM account with permissions to create a service account
Cloud Storage Admin permissions
Cloud Registry Admin permissions
Google Cloud Run Admin permissions

In case the API is not exposed for public access:

In terraform/main.tf, remove the resource "google_cloud_run_service_iam_member" "run_all_users".

Ideally, set the IAM accounts that are allowed to call this API, either in the Google Cloud Run UI or in Terraform. This approach adds no extra latency for the client because it relies on the built-in IAM roles and permissions of Google Cloud.
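Once the service is private, callers have to present a Google-signed identity token in the Authorization header. The following is a minimal sketch of such a call using only the Python standard library. The service URL and the token are placeholders, and the sketch assumes the token was obtained separately (for example with gcloud auth print-identity-token).

```python
import json
import urllib.request


def build_authorized_request(url: str, id_token: str, payload: dict) -> urllib.request.Request:
    """Build a POST request that carries an identity token as a Bearer credential."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {id_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical values: replace them with your own service URL and a real token.
    request = build_authorized_request(
        "https://YOUR-SERVICE-URL/predict/",
        "YOUR_ID_TOKEN",
        {"test_array": [0, 1, 0]},
    )
    # Only the request is built here; nothing is sent over the network.
    print(request.get_method(), request.full_url)
    print(request.get_header("Authorization"))
```

Passing the returned object to urllib.request.urlopen sends the request. A caller whose account lacks permission to invoke the service should get an HTTP error instead of a prediction.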

Terraform

Terraform is a popular open-source tool for managing infrastructure as code. It uses HCL, a declarative language, to describe infrastructure. The basic flow is:

terraform init: Initializes the plugins, the backend and the configuration files Terraform uses to keep track of the infrastructure.
terraform plan: Generates an execution plan for all the infrastructure declared in terraform/main.tf.
terraform apply: Applies all the changes listed in the plan.

All of these steps are declared in .github/workflows/workflow.yaml.
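To try the same flow outside of GitHub Actions, the three commands can be driven from a small Python script. This is only a sketch of the init, plan and apply order described above, not the contents of the workflow file. The terraform directory name comes from terraform/main.tf, while the plan file name tfplan is an assumption.

```python
import subprocess


def terraform_commands(workdir: str = "terraform", plan_file: str = "tfplan") -> list[list[str]]:
    """Return the init, plan and apply commands in the order they should run."""
    base = ["terraform", f"-chdir={workdir}"]
    return [
        base + ["init", "-input=false"],
        base + ["plan", "-input=false", f"-out={plan_file}"],
        # Applying a saved plan file does not prompt for confirmation.
        base + ["apply", "-input=false", plan_file],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in turn and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so the script is safe to try.
    for command in terraform_commands():
        print(" ".join(command))
```

Calling run_all(terraform_commands()) executes the three steps for real, so it needs the Terraform CLI installed and valid Google Cloud credentials.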

Run it locally

Fork the repo.
Clone it to your computer.
Run docker build -t ml-api . in the root of the project to build the API image.
Run docker run -d --name ml -p 80:8080 ml-api to create a container from the ml-api image you just built.
Open http://localhost to test the project.
On the POST /predict/ endpoint, you can use this body as an example:

{
  "test_array": [0,0,0,0,0,1,0,1,0,0,1,0,1,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,1,1,0,1,0,1,0]
}

You should get a 200 response containing "prediction": 0, which means the flight wasn't delayed.
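The same request can be sent from a script instead of the browser. Below is a minimal sketch using only the Python standard library. It assumes the container from the previous steps is running and published on port 80, and that the response is a JSON object as described above.

```python
import json
import urllib.request

# Example feature vector copied from the request body above.
TEST_ARRAY = [
    0, 0, 0, 0, 0, 1, 0, 1, 0, 0,
    1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
    0, 1, 0, 0, 1, 0, 1, 0, 0, 1,
    1, 1, 0, 1, 0, 1, 0,
]


def build_body() -> bytes:
    """Encode the example payload as UTF-8 JSON."""
    return json.dumps({"test_array": TEST_ARRAY}).encode("utf-8")


def predict(base_url: str = "http://localhost") -> dict:
    """POST the example body to /predict/ and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url + "/predict/",
        data=build_body(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the container running, calling predict() from a Python shell should return the prediction. Nothing is sent until the function is called.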

Deploy it

Generate a service account key and upload it to GitHub Secrets as GCLOUD_SERVICE_KEY.
Push any change to the main branch.
That's it! :)
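Before uploading the key, it can help to check that the file is a well-formed service account key, since a truncated or wrong file only fails later in the pipeline. The sketch below is an optional sanity check and not part of the repository. It only looks for a few fields that service account key files normally contain.

```python
import json

# Fields that a service account key file is expected to contain.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw: str) -> list[str]:
    """Return a list of problems found in the key; an empty list means it looks fine."""
    try:
        key = json.loads(raw)
    except json.JSONDecodeError as error:
        return [f"not valid JSON: {error}"]
    if not isinstance(key, dict):
        return ["expected a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in key]
    # A missing "type" is already reported above, so only flag a wrong value here.
    if key.get("type") not in (None, "service_account"):
        problems.append(f"unexpected type: {key['type']}")
    return problems


if __name__ == "__main__":
    # Fake key for illustration only; never print or commit a real key.
    fake_key = json.dumps({"type": "service_account", "project_id": "demo"})
    for problem in check_service_account_key(fake_key):
        print(problem)
```

The check is purely structural. It does not verify that the key is still valid or that the account has the permissions listed earlier.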

Stress Testing

On macOS, install wrk with brew install wrk.
Run wrk -t12 -c200 -d45s -s request.lua https://mlops-api-backend-1-5gdi5qltoq-uc.a.run.app/predict/ to start 12 threads that keep 200 HTTP connections open for 45 seconds.
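If wrk is not available, a rough equivalent can be sketched in Python with a thread pool. This is not a replacement for wrk, and its worker count does not map exactly onto wrk's threads and connections. The send callable is a hypothetical placeholder for whatever function issues one request to /predict/.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_load(send: Callable[[], None], total_requests: int, workers: int) -> list[float]:
    """Call send() total_requests times across a thread pool; return latencies in seconds."""

    def timed_call(_: int) -> float:
        start = time.perf_counter()
        send()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(timed_call, range(total_requests)))


def percentile(latencies: list[float], fraction: float) -> float:
    """Nearest-rank percentile, for example fraction=0.99 for p99."""
    if not latencies:
        raise ValueError("latencies must not be empty")
    ordered = sorted(latencies)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Stand-in for a real request: sleep for one millisecond.
    results = run_load(lambda: time.sleep(0.001), total_requests=50, workers=5)
    print(f"requests: {len(results)}")
    print(f"p50: {percentile(results, 0.50) * 1000:.2f} ms")
    print(f"p99: {percentile(results, 0.99) * 1000:.2f} ms")
```

Reporting percentiles rather than the mean makes slow outliers visible, which matters once the results are compared against a latency target.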

How can we improve the results

The best approach would be horizontal scaling. In this case, we can create a second Google Cloud Run instance and use load balancing to distribute the traffic between both instances.
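To illustrate what distributing traffic between two instances means, here is a toy round-robin selector in Python. In practice a managed load balancer would do this job. The sketch only shows the idea, and the backend URLs are hypothetical.

```python
from itertools import cycle
from typing import Sequence


class RoundRobinBalancer:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends: Sequence[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._rotation = cycle(list(backends))

    def next_backend(self) -> str:
        """Return the backend that should receive the next request."""
        return next(self._rotation)


if __name__ == "__main__":
    # Hypothetical instance URLs, for illustration only.
    balancer = RoundRobinBalancer(
        ["https://instance-1.example", "https://instance-2.example"]
    )
    for request_number in range(4):
        print(request_number, balancer.next_backend())
```

Each instance receives every other request, so with two equally sized instances the load per instance is roughly halved.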

SLO and SLIs

Availability of 99.95% for the Cloud Run instance over a year, which sets an error budget of 0.05% (roughly 4.4 hours per year) to absorb problems such as an outage in the GCP region.
Latency under 200 ms under a load of 50k requests over 45 seconds.
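The arithmetic behind these targets is simple enough to check in a few lines of Python. An availability target of 99.95% leaves 100% - 99.95% = 0.05% as error budget, and the latency SLI is the share of requests that finish in under 200 ms. The helper names below are illustrative and not part of the repository.

```python
HOURS_PER_YEAR = 365 * 24


def error_budget_fraction(availability_percent: float) -> float:
    """Fraction of the period in which the service is allowed to be unavailable."""
    return (100.0 - availability_percent) / 100.0


def allowed_downtime_hours(availability_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Error budget expressed as hours of downtime over the period."""
    return error_budget_fraction(availability_percent) * period_hours


def latency_sli(latencies_ms: list[float], threshold_ms: float = 200.0) -> float:
    """Fraction of requests that completed in under threshold_ms."""
    if not latencies_ms:
        raise ValueError("latencies_ms must not be empty")
    fast = sum(1 for value in latencies_ms if value < threshold_ms)
    return fast / len(latencies_ms)


if __name__ == "__main__":
    print(f"error budget: {error_budget_fraction(99.95) * 100:.2f}%")
    print(f"allowed downtime: {allowed_downtime_hours(99.95):.2f} hours per year")
    # Made-up latencies, for illustration only.
    print(f"latency SLI: {latency_sli([120.0, 180.0, 250.0, 199.0]):.2f}")
```

Feeding the latencies measured during the stress test into latency_sli gives the number to compare against the 200 ms objective.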

Repository: AlvaroRaul7/fastapi-MlOps
