Merge pull request #2 from yindia/oauth2
Control Plane + Data Plane
yindia authored Oct 16, 2024
2 parents bb7d012 + 9f303b1 commit 7c25c4c
Showing 65 changed files with 4,353 additions and 1,397 deletions.
81 changes: 80 additions & 1 deletion .env
@@ -9,4 +9,83 @@ DB_USERNAME=admin
DB_PASSWORD="admin"
DB_SSL_MODE=disable
DB_POOL_MAX_CONNS=50
TASK_TIME_OUT=3
TASK_TIME_OUT=3
# OAUTH2_ISSUER=https://dev-736553.okta.com
OAUTH2_CLIENT_ID=64660401062-s9nm4vp7esak8g9a6im8c9712jkk2lbb.apps.googleusercontent.com
OAUTH2_CLIENT_SECRET=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
OAUTH2_PROVIDER=google


















# // Tasks 1M (run_query)
# // Worker Count per deployment = 10000
# // Worker Deployment = 1M/10000 = 100
# // Timeout = 4s
# // Retry Count = 5
# // Delay between each retry = 2s
# // Max Time = 12s
# // Min Time = 4s
# // Failed Task = 20% (Cascade effect) Retry Required

# Initial run:
# Successful tasks (800,000): 800,000 * 4s = 3,200,000s
# Failed tasks (200,000): 200,000 * 1s = 200,000s
# Total time for initial run: 3,400,000s

# Retry 1:
# Total Retry Task Failed 1 = 200,000 * 20% = 40,000
# Total Retry Task Success 1 = 200,000 - 40,000 = 160,000
# Time: (160,000 * 4s) + (40,000 * (4s + 2s)) = 880,000s

# Retry 2:
# Total Retry Task Failed 2 = 40,000 * 20% = 8,000
# Total Retry Task Success 2 = 40,000 - 8,000 = 32,000
# Time: (32,000 * 4s) + (8,000 * (4s + 2s + 2s)) = 192,000s

# Retry 3:
# Total Retry Task Failed 3 = 8,000 * 20% = 1,600
# Total Retry Task Success 3 = 8,000 - 1,600 = 6,400
# Time: (6,400 * 4s) + (1,600 * (4s + 2s + 2s + 2s)) = 41,600s

# Retry 4:
# Total Retry Task Failed 4 = 1,600 * 20% = 320
# Total Retry Task Success 4 = 1,600 - 320 = 1,280
# Time: (1,280 * 4s) + (320 * (4s + 2s + 2s + 2s + 2s)) = 8,960s

# Retry 5:
# Total Retry Task Failed 5 = 320 * 20% = 64
# Total Retry Task Success 5 = 320 - 64 = 256
# Time: (256 * 4s) + (64 * (4s + 2s + 2s + 2s + 2s + 2s)) = 1,920s

# Total Task Failed = 64
# Total Time = 3,400,000s + 880,000s + 192,000s + 41,600s + 8,960s + 1,920s = 4,524,480s
# Total Time in hours: 4,524,480s / 3600 ≈ 1,256.80 hours
# Total Time in days: 1,256.80 hours / 24 ≈ 52.37 days
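# A small Go sketch of the retry-time model above, so the arithmetic is easy to reproduce;
# the 1M task count, 20% failure rate, 4s timeout, 1s initial-failure cost, and 2s retry delay
# come from the comments, while the program structure and names are illustrative only.

```go
package main

import "fmt"

func main() {
	const (
		totalTasks  = 1_000_000
		failureRate = 0.20 // 20% of tasks fail on every attempt (cascade effect)
		taskTime    = 4.0  // seconds per successful attempt (timeout)
		failTime    = 1.0  // seconds assumed per failed attempt in the initial run
		retryDelay  = 2.0  // seconds of delay added per retry attempt
		maxRetries  = 5
	)

	failed := totalTasks * failureRate
	total := (totalTasks-failed)*taskTime + failed*failTime // initial run: 3,400,000s

	for retry := 1.0; retry <= maxRetries; retry++ {
		stillFailing := failed * failureRate
		recovered := failed - stillFailing
		// Recovered tasks pay the task time; failing tasks also pay the accumulated retry delays.
		total += recovered*taskTime + stillFailing*(taskTime+retryDelay*retry)
		failed = stillFailing
	}

	fmt.Printf("tasks still failing after %d retries: %.0f\n", maxRetries, failed)
	fmt.Printf("total task time: %.0fs (~%.1f hours, ~%.1f days)\n", total, total/3600, total/3600/24)
}
```

# Running it gives 64 tasks still failing and 4,524,480s total (~1,256.8 hours, ~52.4 days),
# matching the figures above.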





# // TODO(Reconcile Improvement):
# // To scale up, run the system with 1M dummy tasks and gather insights into system behavior
# // 1. Track the retry number in the history table along with the task table
# // 2. Reconcile based on the task's history rather than the task table
# // 3. Reconcile based on the time difference between states rather than a fixed time interval
# // 4. Check the task's latest state, compute the time difference, and reconcile based on that (sketched below)
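# The reconcile ideas in items 3–4 could look roughly like the sketch below; the TaskState struct,
# the state names, and the timeout values are hypothetical and only meant to illustrate reconciling
# on the age of the latest state instead of a fixed interval.

```go
package reconciler

import "time"

// TaskState is a hypothetical view of a task's latest recorded state,
// combining the task table with the retry count from the history table (item 1).
type TaskState struct {
	State     string    // e.g. "QUEUED", "RUNNING"
	UpdatedAt time.Time // when the task last changed state
	Retries   int       // retry attempts recorded so far
}

// stateTimeout says how long a task may sit in a given state before it is
// considered stuck; the values here are illustrative only.
var stateTimeout = map[string]time.Duration{
	"QUEUED":  5 * time.Minute,
	"RUNNING": 15 * time.Minute,
}

// NeedsReconcile looks at the age of the latest state (item 4) instead of
// running on a fixed interval, and respects the retry budget.
func NeedsReconcile(t TaskState, now time.Time, maxRetries int) bool {
	timeout, ok := stateTimeout[t.State]
	if !ok {
		// Terminal states (e.g. SUCCEEDED, FAILED) are never reconciled.
		return false
	}
	return now.Sub(t.UpdatedAt) > timeout && t.Retries < maxRetries
}
```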


33 changes: 33 additions & 0 deletions Dockerfile.controller
@@ -0,0 +1,33 @@
# Build the manager binary
FROM golang:1.22 AS builder
ARG TARGETOS
ARG TARGETARCH

WORKDIR /workspace
# Copy the Go Modules manifests
COPY go.mod go.mod
COPY go.sum go.sum
# cache deps before building and copying source so that we don't need to re-download as much
# and so that source changes don't invalidate our downloaded layer
RUN go mod download

# Copy the go source
COPY cmd/main.go cmd/main.go
COPY api/ api/
COPY internal/controller/ internal/controller/

# Build
# GOARCH has no default value so that the binary is built for the host where the command
# was called. For example, if we run make docker-build locally on Apple Silicon (M1),
# the docker BUILDPLATFORM arg will be linux/arm64, while for Apple x86 it will be linux/amd64.
# By leaving it empty we ensure that the container and the binary shipped in it target the same platform.
RUN CGO_ENABLED=0 GOOS=${TARGETOS:-linux} GOARCH=${TARGETARCH} go build -a -o manager cmd/main.go

# Use distroless as minimal base image to package the manager binary
# Refer to https://github.com/GoogleContainerTools/distroless for more details
FROM gcr.io/distroless/static:nonroot
WORKDIR /
COPY --from=builder /workspace/manager .
USER 65532:65532

ENTRYPOINT ["/manager"]
39 changes: 28 additions & 11 deletions Makefile
@@ -8,37 +8,47 @@ SERVER_SRC := ./server/root
DASHBOARD_SRC := ./clients/dashboard

# Docker configuration
DOCKER_REPO := ghcr.io/yindia
DOCKER_REPO := ghcr.io/bruin-hiring
VERSION := $(shell git describe --tags --always --dirty)
DOCKER_CLI_NAME := task-cli
DOCKER_SERVER_NAME := task-server
DOCKER_DASHBOARD_NAME := task-dashboard

# Colors for output
# ANSI color codes for prettier output
NO_COLOR := \033[0m
OK_COLOR := \033[32;01m
ERROR_COLOR := \033[31;01m
WARN_COLOR := \033[33;01m

# Declare phony targets (targets that don't represent files)
.PHONY: all bootstrap deps check-go check-npm build test docker-build docker-push helm-template helm-lint helm-fmt helm-install helm helm-dep-update

# Default target: run deps, tests, and build
all: deps test build

# Install all dependencies
deps: deps-go deps-npm

deps: check-go check-npm
# Install Go dependencies
deps-go: check-go
go mod download
go fmt ./...
go generate ./...
npm config set @buf:registry https://buf.build/gen/npm/v1/

# Install npm dependencies
deps-npm: check-npm
npm install --force

# Check if Go is installed
check-go:
@which go > /dev/null || (echo "$(ERROR_COLOR)Go is not installed$(NO_COLOR)" && exit 1)

# Check if npm is installed
check-npm:
@which npm > /dev/null || (echo "$(ERROR_COLOR)npm is not installed$(NO_COLOR)" && exit 1)

# CLI targets
build-cli: deps
build-cli: deps-go
@echo "$(OK_COLOR)==> Building the CLI...$(NO_COLOR)"
@CGO_ENABLED=0 go build -v -ldflags="-s -w" -o "$(BUILD_DIR)/$(CLI_NAME)" "$(CLI_SRC)"

@@ -55,7 +65,7 @@ docker-push-cli: docker-build-cli
docker push $(DOCKER_REPO)/$(DOCKER_CLI_NAME):$(VERSION)

# Server targets
build-server: deps
build-server: deps-go
@echo "$(OK_COLOR)==> Building the server...$(NO_COLOR)"
@CGO_ENABLED=0 go build -v -ldflags="-s -w" -o "$(BUILD_DIR)/$(SERVER_NAME)" "$(SERVER_SRC)"

@@ -72,17 +82,17 @@ docker-push-server: docker-build-server
docker push $(DOCKER_REPO)/$(DOCKER_SERVER_NAME):$(VERSION)

# Dashboard targets
build-dashboard: deps
build-dashboard: deps-npm
@echo "$(OK_COLOR)==> Building the dashboard...$(NO_COLOR)"
npm run build

run-dashboard: deps
run-dashboard: deps-npm
@echo "$(OK_COLOR)==> Running the dashboard...$(NO_COLOR)"
npm run dev

docker-build-dashboard:
@echo "$(OK_COLOR)==> Building Docker image for dashboard...$(NO_COLOR)"
docker build -f Dockerfile.client -t $(DOCKER_REPO)/$(DOCKER_DASHBOARD_NAME):$(VERSION) .
docker build -f Dockerfile.client -t $(DOCKER_REPO)/$(DOCKER_DASHBOARD_NAME):$(VERSION) .

docker-push-dashboard: docker-build-dashboard
@echo "$(OK_COLOR)==> Pushing Docker image for dashboard...$(NO_COLOR)"
@@ -112,6 +122,11 @@ helm-fmt:
@echo "$(OK_COLOR)==> Formatting Helm charts...$(NO_COLOR)"
helm lint --strict charts/task

helm-docs:
@echo "$(OK_COLOR)==> Generating Helm charts README.md...$(NO_COLOR)"
go install github.com/norwoodj/helm-docs/cmd/helm-docs@latest
helm-docs -c ./charts/task/

helm-install:
@echo "$(OK_COLOR)==> Installing Helm charts...$(NO_COLOR)"
helm install my-release charts/task
@@ -120,10 +135,12 @@ helm-dep-update:
@echo "$(OK_COLOR)==> Updating Helm dependencies...$(NO_COLOR)"
helm dependency update ./charts/task/

helm: helm-dep-update helm-template helm-lint helm-fmt
# Run all Helm-related tasks
helm: helm-dep-update helm-template helm-lint helm-fmt helm-docs
@echo "$(OK_COLOR)==> Helm template, lint, and format completed.$(NO_COLOR)"

# Set up development environment
bootstrap:
curl -fsSL https://pixi.sh/install.sh | bash
brew install bufbuild/buf/buf
pixi shell
pixi shell
95 changes: 14 additions & 81 deletions README.md
@@ -39,12 +39,6 @@ make bootstrap

# Run Database
docker-compose up -d

# Install river
go install github.com/riverqueue/river/cmd/river@latest

# Run River migration (It will create the river resource in the database)
river migrate-up --database-url "postgres://admin:[email protected]:5432/tasks?sslmode=disable"
```


@@ -75,7 +69,7 @@ Access at https://127.0.0.1:3000
### 5. Worker (Data Plane)
Start worker instances:
```bash
./bin/task-cli serve -n 10
./bin/task-cli serve --log-level debug
```

## Project Structure
@@ -113,16 +107,20 @@ graph TD
A[Dashboard Client] -->|Sends Request| B(Server)
C[CLI Client] -->|Sends Request| B(Server)
%% Server and its connections
B(Server) -->|Reads/Writes| D[(PostgreSQL Database)]
B(Server) -->|Publishes Message| E(RiverQueue)
%% RabbitMQ and Worker
E(RiverQueue) -->|Sends Message| F(Worker)
F(Worker) -->|Consumes Data| G[Executes Work]
%% Control Plane
subgraph Control Plane
B(Server) -->|Reads/Writes| D[(PostgreSQL Database)]
end
%% Optional back-and-forth communication if needed
F(Worker) -->|Update Status| B[(Server)]
%% Data Plane
subgraph Data Plane
E[Agent] -->|Initiates Connection| B[Server]
B[Server] -->|Publish W| E[Agent]
E -->|Creates CRD| H[CRD]
F[Controller] -->|Watches CRD| H
F -->|Executes Task| J[Task Execution]
F -->|Sends Status Update| B
end
```
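
For reference, a minimal sketch of what the Task custom resource created by the agent and watched by the controller might look like; the group/version, field names, and types are assumptions for illustration, not the actual definitions under api/ in this change.

```go
// Hypothetical shape of the Task CRD; the real schema lives under api/ in this change.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// TaskSpec describes the work the data-plane controller should run.
type TaskSpec struct {
	Payload string `json:"payload"`           // task input handed to the worker
	Retries int32  `json:"retries,omitempty"` // how many times the controller may re-run it
}

// TaskStatus is what the controller reports back to the control plane.
type TaskStatus struct {
	Phase string `json:"phase,omitempty"` // e.g. Queued, Running, Succeeded, Failed
}

// Task is the custom resource the agent creates and the controller reconciles.
type Task struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   TaskSpec   `json:"spec,omitempty"`
	Status TaskStatus `json:"status,omitempty"`
}
```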

This architecture allows for:
@@ -267,20 +265,6 @@ graph TD
K --> L
```

Reconciliation job (runs every 10 minutes) as a background job

```mermaid
graph TD
%% Reconciliation Job Flow
subgraph Reconciliation Job
M[Start Reconciliation Job] --> N[Get List of Stuck Jobs]
N --> O{Jobs Found?}
O -->|Yes| P[Update Status: Queued]
P --> Q[Enqueue Message to River Queue]
O -->|No| R[End Reconciliation Job]
Q --> R
end
```

## API Documentation
- [Proto Docs](https://buf.build/evalsocket/cloud)
@@ -568,54 +552,3 @@ kind delete cluster --name task-service
This setup allows you to test the entire Task Service stack, including the server, workers, and dependencies, in a local Kubernetes environment. It's an excellent way to validate the Helm charts and ensure everything works together as expected in a Kubernetes setting.


## Future Improvements

As we continue to evolve the Task Service, we are exploring several enhancements to improve its scalability, reliability, and management.

### Kubernetes-Native Task Execution

We are considering leveraging Kubernetes Custom Resource Definitions (CRDs) and custom controllers to manage task execution. This approach would enable us to fully utilize Kubernetes' scheduling and scaling capabilities.

#### High-Level Architecture

```mermaid
graph TD
%% Clients
A[Dashboard Client] -->|Sends Request| B(Server)
C[CLI Client] -->|Sends Request| B(Server)
%% Control Plane
subgraph Control Plane
B(Server) -->|Reads/Writes| D[(PostgreSQL Database)]
end
%% Data Plane
subgraph Data Plane
E[Agent] -->|Initiates Connection| B[Server]
E -->|Creates CRD| H[CRD]
F[Controller] -->|Watches CRD| H
F -->|Creates Pod for Task| I[Pod]
I -->|Executes Task| J[Task Execution]
F -->|Sends Status Update| B
end
```

In this architecture:

1. Our agent initiates a streaming connection with the control plane and listens for events.
2. When a new task is created, the control plane generates an event for the agent.
3. Upon receiving the event, the agent creates a Custom Resource Definition (CRD) for the task in Kubernetes.
4. A custom Worker Controller watches for these CRDs and creates pods to execute the tasks.
5. Each task runs in its own pod, allowing for improved isolation and resource management.
6. The Worker Controller monitors task execution and sends status updates back to the server.


#### Design Advantages

- **Separation of Concerns**: The customer does not need to open a port; our agent initiates the connection, and only the agent has permission to create resources inside the Data Plane.
- **Single Point of Setup**: Only the agent is required to set up the Data Plane, creating the necessary resources such as the controller, CRD, and other components.
- **Multiple Data Planes**: Customers can run multiple Data Planes with one Control Plane based on their requirements (from bare metal to any cloud). In the future, we can add functionality to route tasks to specific Data Planes as needed.
- **Security**: No sensitive information is stored in the Control Plane; we only retain metadata, ensuring enhanced security.
- **Infinite Scalability**: The architecture supports scaling as needed to accommodate varying workloads.
- **Co-location Flexibility**: Customers can run both the Data Plane and Control Plane together inside their VPC for easier management.
- **Secure Storage**: All input parameters are stored as S3 objects, with only references to these objects kept in the metadata, optimizing storage usage.
5 changes: 5 additions & 0 deletions buf.gen.yaml
@@ -11,6 +11,11 @@ plugins:
out: pkg/gen/
opt:
- paths=source_relative
- plugin: buf.build/grpc/go:v1.5.1
out: pkg/gen/
opt:
- paths=source_relative

- plugin: buf.build/protocolbuffers/go
out: pkg/gen/
opt: