Run benchmarks on ubuntu-latest-16-cores
Signed-off-by: Stefan Prodan <[email protected]>
stefanprodan committed Dec 5, 2023
1 parent bb79f53 commit 97b9dee
Showing 4 changed files with 51 additions and 73 deletions.
25 changes: 0 additions & 25 deletions .github/workflows/event.yaml

This file was deleted.

2 changes: 1 addition & 1 deletion .github/workflows/test.yaml
@@ -15,7 +15,7 @@ jobs:
strategy:
matrix:
RESOURCES: [ 100, 500, 1000 ]
- runs-on: ubuntu-latest
+ runs-on: ubuntu-latest-16-cores
steps:
- name: Checkout
uses: actions/checkout@v3
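The `RESOURCES` matrix above fans the job out into one run per object count. A minimal sketch of how a step can consume the matrix value (the step name, script path, and env mapping are illustrative assumptions, not this repository's workflow):

```yaml
# Hypothetical step consuming the matrix value; the benchmark script
# path and env var name are assumptions, not this repo's code.
steps:
  - name: Run benchmark
    run: ./benchmark.sh
    env:
      RESOURCES: ${{ matrix.RESOURCES }}
```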
49 changes: 2 additions & 47 deletions README.md
@@ -6,6 +6,8 @@
for [Flux](https://fluxcd.io) release candidates,
made with [Timoni](https://github.com/stefanprodan/timoni).

The benchmark results can be found in [RESULTS.md](RESULTS.md).

## Prerequisites

Start by cloning the repository locally:
@@ -106,50 +108,3 @@
Remove all Flux resources and the benchmark namespaces with:
```shell
timoni bundle delete flux-benchmark
```

## MTTP Benchmark Results (Flux v2.2 RC)

The Mean Time To Production (MTTP) benchmark measures the time it takes for Flux
to deploy application changes into production. We measure the time spent on fetching
app packages from the registry (Flux OCI artifacts and Helm charts) and the time spent
reconciling app definitions on the Kubernetes cluster.

For this benchmark, we assume 100, 500 and 1000 app packages pushed to the registry at the same time.
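As a rough illustration of how an MTTP figure can be taken, the sketch below times a run with `date`. In the real benchmark the timed step would be a wait for the Flux objects to become Ready; the `kubectl wait` invocation shown in the comment is an assumption and needs a live cluster, so a `sleep` stands in here:

```shell
# Sketch: time one reconciliation run. In the benchmark the timed step
# would be something like (hypothetical invocation, needs a cluster):
#   kubectl wait kustomization --all -n benchmark --for=condition=Ready --timeout=15m
start=$(date +%s)
sleep 1  # placeholder for the reconcile-and-wait step
end=$(date +%s)
echo "MTTP: $((end - start))s"
```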

| Objects | Type | Flux component | Duration Apple M1 | Duration Intel | Max Memory |
|---------|---------------|----------------------|-------------------|----------------|------------|
| 100 | OCIRepository | source-controller | 35s | 35s | 38Mi |
| 100 | Kustomization | kustomize-controller | 38s | 38s | 32Mi |
| 100 | HelmChart | source-controller | 35s | 35s | 40Mi |
| 100 | HelmRelease | helm-controller | 42s | 42s | 140Mi |
| 500 | OCIRepository | source-controller | 45s | 45s | 65Mi |
| 500 | Kustomization | kustomize-controller | **1m50s** | **3m50s** | 72Mi |
| 500 | HelmChart | source-controller | 1m10s | 1m10s | 68Mi |
| 500 | HelmRelease | helm-controller | **1m58s** | **4m40s** | 350Mi |
| 1000 | OCIRepository | source-controller | 1m30s | 1m30s | 67Mi |
| 1000 | Kustomization | kustomize-controller | **3m58s** | **4m50s** | 112Mi |
| 1000 | HelmChart | source-controller | 1m45s | 1m45s | 110Mi |
| 1000 | HelmRelease | helm-controller | **5m10s** | **14m10s** | 620Mi |

### Observations

Increasing kustomize-controller's concurrency above 10 does yield better results,
but the tmp dir must be on tmpfs to avoid kustomize build disk thrashing.

Setting `DisableStatusPollerCache` in kustomize-controller is a must when reconciling more than 100
objects in a namespace; otherwise the poller cache will fill all the available memory.

Increasing helm-controller's concurrency above 10 does not yield better results,
as the Helm SDK overloads the Kubernetes OpenAPI endpoint.
Higher concurrency would probably require an HA Kubernetes control plane with multiple API replicas.

### Specs

- Apple Macbook Pro (M1 Max 8cpu)
- GitHub hosted-runner (CPU Intel 2cpu)
- Kubernetes Kind (v1.28.0 / 3 nodes)
- Flux source-controller (1CPU / 1Gi / concurrency 10)
- Flux kustomize-controller (1CPU / 1Gi / concurrency 20)
- Flux helm-controller (2CPU / 1Gi / concurrency 10)
- Helm repository (oci://ghcr.io/stefanprodan/charts/podinfo)
- App manifests (Deployment scaled to zero, Service Account, Service, Ingress)
48 changes: 48 additions & 0 deletions RESULTS.md
@@ -0,0 +1,48 @@
# MTTP Benchmark Results

The Mean Time To Production (MTTP) benchmark measures the time it takes for Flux
to deploy application changes into production. We measure the time spent on fetching
app packages from the registry (Flux OCI artifacts and Helm charts) and the time spent
reconciling app definitions on the Kubernetes cluster.

For this benchmark, we assume 100, 500 and 1000 app packages pushed to the registry at the same time.

**Specs**

- GitHub hosted-runner (ubuntu-latest-16-cores)
- Kubernetes Kind (v1.28.0 / 3 nodes)
- Flux source-controller (1CPU / 1Gi / concurrency 10)
- Flux kustomize-controller (1CPU / 1Gi / concurrency 20)
- Flux helm-controller (2CPU / 1Gi / concurrency 10)
- Helm repository (oci://ghcr.io/stefanprodan/charts/podinfo)
- App manifests (Deployment scaled to zero, Service Account, Service, Ingress)
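The controller concurrency figures in the specs above are not Flux defaults; they are typically set with a bootstrap patch. A minimal sketch of such a patch (the `flux-system` layout is the Flux bootstrap default and `--concurrent` is the controllers' standard flag, but the exact file is an assumption, not this repository's manifests):

```yaml
# Hypothetical flux-system/kustomization.yaml raising kustomize-controller
# concurrency to 20, mirroring the specs above.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - target:
      kind: Deployment
      name: kustomize-controller
    patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --concurrent=20
```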

## Flux v2.2.0

| Objects | Type | Flux component | Duration | Max Memory |
|---------|---------------|----------------------|----------|------------|
| 100 | OCIRepository | source-controller | 35s | 38Mi |
| 100 | Kustomization | kustomize-controller | 38s | 32Mi |
| 100 | HelmChart | source-controller | 35s | 40Mi |
| 100 | HelmRelease | helm-controller | 42s | 140Mi |
| 500 | OCIRepository | source-controller | 45s | 65Mi |
| 500 | Kustomization | kustomize-controller | 3m50s | 72Mi |
| 500 | HelmChart | source-controller | 1m10s | 68Mi |
| 500 | HelmRelease | helm-controller | 4m40s | 350Mi |
| 1000 | OCIRepository | source-controller | 1m30s | 67Mi |
| 1000 | Kustomization | kustomize-controller | 4m50s | 112Mi |
| 1000 | HelmChart | source-controller | 1m45s | 110Mi |
| 1000 | HelmRelease | helm-controller | 14m10s | 620Mi |

### Observations

Increasing kustomize-controller's concurrency above 10 does yield better results,
but the tmp dir must be on tmpfs to avoid kustomize build disk thrashing.

Setting `DisableStatusPollerCache` in kustomize-controller is a must when reconciling more than 100
objects in a namespace; otherwise the poller cache will fill all the available memory.
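A minimal sketch of enabling this feature gate through a kustomize-controller deployment patch (the flag spelling follows the Flux feature-gate convention and the patch shape is a JSON 6902 append to the container args; verify both against the v2.2 release notes):

```yaml
# Hypothetical bootstrap patch enabling the DisableStatusPollerCache
# feature gate on kustomize-controller; flag spelling is an assumption.
patches:
  - target:
      kind: Deployment
      name: kustomize-controller
    patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --feature-gates=DisableStatusPollerCache=true
```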

Increasing helm-controller's concurrency above 10 does not yield better results,
as the Helm SDK overloads the Kubernetes OpenAPI endpoint.
Higher concurrency would probably require an HA Kubernetes control plane with multiple API replicas.
