Skip to content

Commit

Permalink
Merge pull request #108 from zalando-incubator/vpa-1.0.0
Browse files Browse the repository at this point in the history
vpa: Update to version 1.0.0
  • Loading branch information
mikkeloscar authored Mar 26, 2024
2 parents fbbd118 + f6ca5d1 commit c8b9704
Show file tree
Hide file tree
Showing 10,774 changed files with 1,658,481 additions and 676,967 deletions.
The diff you're trying to view is too large. We only load the first 3000 changed files.
17 changes: 17 additions & 0 deletions .github/ISSUE_TEMPLATE/vpa_release.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
---
name: VPA release
about: Create an issue for tracking VPA release
title: 'VPA release X.Y.Z'
labels: 'vertical-pod-autoscaler'
assignees: ''

---

Please follow instructions in
[RELEASE.md](https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/RELEASE.md) and note each
completed step on this issue.

Please provide any information that is related to the release:

- When we plan to do the release?
- Are there any issues / PRs blocking the release?
6 changes: 0 additions & 6 deletions .github/PULL_REQUEST_TEMPLATE.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,3 @@
#### Which component this PR applies to?

<!--
Which autoscaling component hosted in this repository (cluster-autoscaler, vertical-pod-autoscaler, addon-resizer, helm charts) this PR applies to?
-->

#### What type of PR is this?

<!--
Expand Down
16 changes: 16 additions & 0 deletions .github/dependabot.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
version: 2
updates:
- package-ecosystem: gomod
directory: "/vertical-pod-autoscaler"
schedule:
interval: daily
open-pull-requests-limit: 0 # setting this to 0 means only allowing security updates, see https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file#open-pull-requests-limit
labels:
- "vertical-pod-autoscaler"
- package-ecosystem: docker
directory: "/vertical-pod-autoscaler/builder"
schedule:
interval: daily
open-pull-requests-limit: 10
labels:
- "vertical-pod-autoscaler"
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ jobs:
- name: Set up Go
uses: actions/setup-go@v2
with:
go-version: 1.18.1
go-version: '>=1.20.0'

- uses: actions/checkout@v2
with:
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/pr.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ jobs:
run: |
changed=$(ct list-changed)
if [[ -n "$changed" ]]; then
echo "::set-output name=changed::true"
echo "changed=true" >> $GITHUB_OUTPUT
fi
- if: steps.list-changed.outputs.changed == 'true'
name: Create kind cluster
Expand Down
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -28,3 +28,5 @@
Session.vim
.netrwhist

# Binary files
bin/
4 changes: 2 additions & 2 deletions OWNERS
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
approvers:
- mwielgus
- maciekpytel
- bskiba
- gjtempleton
reviewers:
- mwielgus
- maciekpytel
- bskiba
- gjtempleton
emeritus_approvers:
- bskiba # 2022-09-30
9 changes: 5 additions & 4 deletions addon-resizer/OWNERS
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@
approvers:
- bskiba
- wojtek-t
- jbartosik
reviewers:
- bskiba
- wojtek-t
- jbartosik
emeritus_approvers:
- bskiba # 2022-09-30
- wojtek-t # 2022-09-30
labels:
- addon-resizer
2 changes: 1 addition & 1 deletion addon-resizer/deploy/example.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ spec:
spec:
serviceAccountName: pod-nanny
containers:
- image: k8s.gcr.io/autoscaling/addon-resizer:1.8.14
- image: registry.k8s.io/autoscaling/addon-resizer:1.8.14
imagePullPolicy: Always
name: pod-nanny
resources:
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,123 @@
# KEP-5546: Scaling based on container count

<!-- toc -->
- [Summary](#summary)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Notes](#notes)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
<!-- /toc -->

## Summary

Currently Addon Resizer supports scaling based on the number of nodes. Some workloads use resources proportionally to
the number of containers in the cluster. Since number of containers per node is very different in different clusters
it's more resource-efficient to scale such workloads based directly on the container count.

### Goals

- Allow scaling workloads based on count of containers in a cluster.
- Allow this for Addon Resizer 1.8 ([used by metrics server]).

### Non-Goals

- Using both node and container count to scale workloads.
- Bringing this change to the `master` branch of Addon Resizer.

## Proposal

Add flag `--scaling-mode` to Addon Resizer on the [`addon-resizer-release-1.8`] branch. Flag will
have two valid values:

- `node-proportional` - default, current behavior.
- `container-proportional` - addon resizer will set resources, using the same algorithm it's using now but using number
of containers where it's currently using number of nodes.

### Notes

Addon Resizer 1.8 assumes in multiple places that it's scaling based on the number of nodes:

- [Flag descriptions] that directly reference node counts (`--extra-cpu`, `--extra-memory`, `--extra-storage`, and
`--minClusterSize`) will need to be updated to instead refer to cluster size.
- [README] will need to be updated to reference cluster size instead of node count and explain that cluster size refers
to either node count or container count, depending on the value of the `--scaling-mode` flag.
- Many variable names in code which now refer to node count will refer to cluster size and should be renamed accordingly.

In addition to implementing the feature we should also clean up the code and documentation.

### Risks and Mitigations

One potential risk is that Addon resizer can obtain cluster size (node count or container count):
- from metrics or
- by querying Cluster Api Server to list all objects of the appropriate type

depending on the configuration. There can be many times more containers in a cluster that there are nodes. So listing
all containers could result in higher load on the Cluster API server. Since Addon Resizer is requesting very few fields
I don't expect this effect to be noticeable.

Also I expect metrics-server to test for this before using the feature and any other users of Addon Resizer are likely
better off using metrics (which don't have this problem).

## Design Details

- Implement function `kubernetesClient.CountContainers()`. It will be analogous to the existing
[`kubernetesClient.CountNodes()`] function.
- If using metrics to determine number of containers in the cluster:
- Fetch pod metrics (similar to [fetching node metrics] but use `/pods` URI instead of `/nodes`).
- For each pod obtain number of containers (length of the `containers` field).
- Sum container counts for all pods.
- If using API server:
- Fetch list pods (similar to [listing nodes])
- Fetch only [`Spec.InitContainers`], [`Spec.Containers`], and [`Spec.EphemeralContainers`] fields.
- Exclude pods in terminal states ([selector excluding pods in terminal states in VPA])
- Sum container count over pods.
- Add the `--scaling-mode` flag, with two valid values:
- `node-proportional` - default, current behavior, scaling based on clusters node count and
- `container-proportional` - new behavior, scaling based on clusters container count
- Pass value indicating if we should use node count or container count to the [`updateResources()`] function.
- In `updateResources()` use node count or container count, depending on the value.

Check that listing containers directly works

Coinsider listing pods, getting containers only for working pods

### Test Plan

In addition to unit tests we will run manual e2e test:

- Create config based on [`example.yaml`] but scaling the deployment based on the number of containers in the cluster.
- Create config starting deployment with 100 `pause` containers.

Test the feature by:

- Starting the deployment scaled by Addon Resizer, based on node count.
- Observe size of the deployment and that it's stable.
- Start deployment with 100 `pause` containers.
- Observe the scaled deployment change resources appropriately.

Test the node-based scaling:

- Apply [`example.yaml`].
- Observe amount and stability assigned resources.
- Resize cluster.
- Observe change in assigned resources.

Both tests should be performed with metrics- and API- based scaling.

[used by metrics server]: https://github.com/kubernetes-sigs/metrics-server/blob/0c47555e9b49cfe0719db1a0b7fb6c8dcdff3d38/charts/metrics-server/values.yaml#L121
[`addon-resizer-release-1.8`]: https://github.com/kubernetes/autoscaler/tree/addon-resizer-release-1.8
[Flag descriptions]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/nanny/main/pod_nanny.go#L47
[README]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/README.md?plain=1#L1
[`kubernetesClient.CountNodes()`]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/nanny/kubernetes_client.go#L58
[fetching node metrics]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/nanny/kubernetes_client.go#L150
[listing nodes]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/nanny/kubernetes_client.go#L71
[`Spec.InitContainers`]: https://github.com/kubernetes/api/blob/1528256abbdf8ff2510112b28a6aacd239789a36/core/v1/types.go#L3143
[`Spec.Containers`]: https://github.com/kubernetes/api/blob/1528256abbdf8ff2510112b28a6aacd239789a36/core/v1/types.go#L3150
[`Spec.EphemeralContainers`]: https://github.com/kubernetes/api/blob/1528256abbdf8ff2510112b28a6aacd239789a36/core/v1/types.go#L3158
[`Status.Phase`]: https://github.com/kubernetes/api/blob/1528256abbdf8ff2510112b28a6aacd239789a36/core/v1/types.go#L4011
[selector excluding pods in terminal states in VPA]: https://github.com/kubernetes/autoscaler/blob/04e5bfc88363b4af9fdeb9dfd06c362ec5831f51/vertical-pod-autoscaler/e2e/v1beta2/common.go#L195
[`updateResources()`]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/nanny/nanny_lib.go#L126
[`example.yaml`]: https://github.com/kubernetes/autoscaler/blob/c8d612725c4f186d5de205ed0114f21540a8ed39/addon-resizer/deploy/example.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
# KEP-5546: Automatic reload of nanny configuration when updated

<!-- toc -->
- [Summary](#summary)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Notes](#notes)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
<!-- /toc -->

Sure, here's the enhancement proposal in the requested format:

## Summary
- **Goals:** The goal of this enhancement is to improve the user experience for applying nanny configuration changes in the addon-resizer 1.8 when used with the metrics server. The proposed solution involves automatically reloading the nanny configuration whenever changes occur, eliminating the need for manual intervention and sidecar containers.
- **Non-Goals:** This proposal does not aim to update the functional behavior of the addon-resizer.

## Proposal
The proposed solution involves updating the addon-resizer with the following steps:
- Create a file system watcher using `fsnotify` under `utils/fswatcher` to watch nanny configurations' changes. It should run as a goroutine in the background.
- Detect changes of the nanny configurations' file using the created `fswatcher` trigger the reloading process when configuration changes are detected. Events should be sent in a channel.
- Re-execute the method responsible for building the NannyConfiguration `loadNannyConfiguration` to apply the updated configuration to the addon-resizer.
- Proper error handling should be implemented to manage scenarios where the configuration file is temporarily inaccessible or if there are parsing errors in the configuration file.

### Risks and Mitigations
- There is a potential risk of filesystem-related issues causing the file watcher to malfunction. Proper testing and error handling should be implemented to handle such scenarios gracefully.
- Errors in the configuration file could lead to unexpected behavior or crashes. The addon-resizer should handle parsing errors and fall back to the previous working configuration if necessary.

## Design Details
- Create a new package for the `fswatcher` under `utils/fswatcher`. It would contain the `fswatcher` struct and methods and unit-tests.
- `FsWatcher` struct would look similar to this:
```go
type FsWatcher struct {
*fsnotify.Watcher

Events chan struct{}
ratelimit time.Duration
names []string
paths map[string]struct{}
}
```
- Implement the following functions:
- `CreateFsWatcher`: Instantiates a new `FsWatcher` and start watching on file system.
- `initWatcher`: Initializes the `fsnotify` watcher and initialize the `paths` that would be watched.
- `add`: Adds a new file to watch.
- `reset`: Re-initializes the `FsWatcher`.
- `watch`: watches for the configured files.
- In the main function, we create a new `FsWatcher` and then we wait in an infinite loop to receive events indicating
filesystem changes. Based on these changes, we re-execute `loadNannyConfiguration` function.

> **Note:** The expected configuration file format is YAML. It has the same structure as the NannyConfiguration CRD.

### Test Plan
To ensure the proper functioning of the enhanced addon-resizer, the following test plan should be executed:
1. **Unit Tests:** Write unit tests to validate the file watcher's functionality and ensure it triggers events when the configuration file changes.
2. **Manual e2e Tests:** Deploy the addon-resizer with `BaseMemory` of `300Mi` and then we change the `BaseMemory` to `100Mi`. We should observer changes in the behavior of watched pod.
[fsnotify]: https://github.com/fsnotify/fsnotify
1 change: 1 addition & 0 deletions addon-resizer/enhancements/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Lightweight enhancements proposals for Addon Resizer. Please use [KEP template](https://github.com/kubernetes/enhancements/blob/master/keps/NNNN-kep-template/README.md) as guidance.
6 changes: 6 additions & 0 deletions balancer/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
FROM gcr.io/distroless/static:latest
MAINTAINER Marcin Wielgus "[email protected]"

COPY balancer /

ENTRYPOINT ["/balancer"]
59 changes: 59 additions & 0 deletions balancer/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
all: build

FLAGS=
OFFICIAL_NAME=balancer

build: clean
go build .

build-linux-amd64: clean
GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build .

test-unit: clean
go test --test.short -race ./... $(FLAGS)

docker-build:
ifndef REGISTRY
ERR = $(error REGISTRY is undefined)
$(ERR)
endif
ifndef TAG
ERR = $(error TAG is undefined)
$(ERR)
endif
docker build --pull -t ${REGISTRY}/${OFFICIAL_NAME}:${TAG} .

docker-push:
ifndef REGISTRY
ERR = $(error REGISTRY is undefined)
$(ERR)
endif
ifndef TAG
ERR = $(error TAG is undefined)
$(ERR)
endif
docker push ${REGISTRY}/${OFFICIAL_NAME}:${TAG}

docker-builder:
docker build -t vpa-autoscaling-builder ../vertical-pod-autoscaler/builder

build-in-docker: clean docker-builder
docker run -v `pwd`/..:/gopath/src/k8s.io/autoscaler vpa-autoscaling-builder:latest bash -c 'cd /gopath/src/k8s.io/autoscaler/balancer && make build-linux-amd64'

build-image-in-docker: build-in-docker docker-build

test-in-docker: build-in-docker
docker run -v `pwd`/..:/gopath/src/k8s.io/autoscaler vpa-autoscaling-builder:latest bash -c 'cd /gopath/src/k8s.io/autoscaler/balancer && make test-unit'

release: build-image-in-docker docker-push
@echo "Full in-docker release ${OFFICIAL_NAME}:${TAG} completed"

clean:
rm -f balancer

format:
test -z "$$(find . -path ./vendor -prune -type f -o -name '*.go' -exec gofmt -s -d {} + | tee /dev/stderr)" || \
test -z "$$(find . -path ./vendor -prune -type f -o -name '*.go' -exec gofmt -s -w {} + | tee /dev/stderr)"

.PHONY: all build test-unit clean format release

8 changes: 8 additions & 0 deletions balancer/OWNERS
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
approvers:
- mwielgus
- kgolab
reviewers:
- mwielgus
- kgolab
labels:
- balancer
Loading

0 comments on commit c8b9704

Please sign in to comment.