[Proposal] Efficient and Greener way to use k8s cluster for benchmarking tasks #67

Open
dipankardas011 opened this issue Feb 22, 2024 · 2 comments

Comments

@dipankardas011
Contributor

Proposal

Context: Use of autoscaler to scale the cluster up / down

Why: This assumes that benchmarking and other project-specific tasks run only for finite intervals, and that the event that triggers them is a release event for any of the supported projects.

Expected outcome: When we need to run a specific project's benchmark tasks, we can use OpenTofu to add a node, attach node labels, etc., and then schedule our workload onto it. Once all the tasks have been processed, we can store the results in Grafana or something similar and then de-provision the node to free up the resources we allocated.
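
As a rough illustration of the "attach node labels, then schedule our workload to it" step, here is a minimal Python sketch that builds a Kubernetes Job manifest pinned to a labelled node through `nodeSelector`. The label key `example.com/benchmark-project`, the project name, and the image are hypothetical placeholders, not names used by this project.

```python
import json


def benchmark_job_manifest(
    project: str,
    image: str,
    node_label_key: str = "example.com/benchmark-project",  # hypothetical label key
) -> dict:
    """Build a Job manifest that only schedules onto nodes labelled for `project`."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"benchmark-{project}"},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed benchmark run
            "template": {
                "spec": {
                    # The on-demand node is expected to carry this label,
                    # so the benchmark pod lands only on that node.
                    "nodeSelector": {node_label_key: project},
                    "restartPolicy": "Never",
                    "containers": [{"name": "benchmark", "image": image}],
                }
            },
        },
    }


# Hypothetical project name and image, for illustration only.
manifest = benchmark_job_manifest(
    "example-project", "registry.example.invalid/benchmark:latest"
)
manifest_json = json.dumps(manifest, indent=2)
```

The resulting JSON could be applied with the usual tooling once the node has joined and been labelled; de-provisioning would follow after the Job completes and the results are stored.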

Achievement: Reduced costs; it also demonstrates how we can optimize the tests for each project.

Challenges:

  • @AntonioDiTuri 📔 Adding a node on demand would be nice for the next release. I guess a tradeoff would be startup time: a new node might take a while to set up.
@rossf7
Contributor

rossf7 commented Apr 16, 2024

@dipankardas011 Thanks again for creating this proposal. Reducing the footprint of the cluster is important to us as a WG and will be important as we onboard more projects.

We deferred this from the Q2 pipeline automation work we are starting because the scope of that work is already large. However, IMO the automation we will be developing in #84 would also help with this in the future.

There are some challenges here, and I expect it will take around 10 mins to join nodes to the cluster. However, none of the metrics we capture are time sensitive, and the benchmarking also takes time, so I don't see that as a blocker.
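
To put the join time in perspective, a small Python sketch of the arithmetic follows. Only the 10 minute join estimate comes from the comment above; the 50 minute benchmark duration, the 4 runs per month, and the 30 day month are assumptions chosen purely for illustration.

```python
JOIN_MINUTES = 10  # estimate from the comment above


def on_demand_node_minutes(
    runs: int, benchmark_minutes: float, join_minutes: float = JOIN_MINUTES
) -> float:
    """Node time used when a node is provisioned for each run and then released."""
    return runs * (join_minutes + benchmark_minutes)


def join_overhead_share(
    benchmark_minutes: float, join_minutes: float = JOIN_MINUTES
) -> float:
    """Fraction of each on-demand run that is spent joining the cluster."""
    return join_minutes / (join_minutes + benchmark_minutes)


# Assumed figures: 4 benchmark runs in a month, 50 minutes each.
on_demand = on_demand_node_minutes(runs=4, benchmark_minutes=50)
always_on = 30 * 24 * 60  # one node kept running for a 30 day month
overhead = join_overhead_share(benchmark_minutes=50)
```

Under these assumed numbers the on-demand node is up for 240 minutes against 43,200 minutes for an always-on node, with about a sixth of each run spent on joining.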

@dipankardas011
Contributor Author

Updates

We are planning for a subset of the Kubernetes cluster to be scalable. The worker nodes where the benchmarking jobs run will be made scalable to zero.

Later, we can also think about batching benchmarking jobs: once N benchmarking jobs are waiting, we spin up a node and add it to the cluster, run all N jobs on it, and then free up the bare metal once they are done (suggestion by @leonardpahlke).
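
A minimal Python sketch of the batching idea, assuming jobs are simply queued by name and grouped in arrival order. The job names and batch size are hypothetical, and this is not an existing component of the pipeline.

```python
from typing import List


def batch_jobs(jobs: List[str], batch_size: int) -> List[List[str]]:
    """Group pending benchmarking jobs into batches of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [jobs[i:i + batch_size] for i in range(0, len(jobs), batch_size)]


def provisioning_cycles(job_count: int, batch_size: int) -> int:
    """Number of times a node is added and later freed (one cycle per batch)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return -(-job_count // batch_size)  # ceiling division


# Hypothetical pending jobs, batched with N = 2.
pending = ["job-a", "job-b", "job-c", "job-d", "job-e"]
batches = batch_jobs(pending, batch_size=2)
cycles = provisioning_cycles(len(pending), batch_size=2)
```

With five jobs and N = 2, the node is provisioned three times instead of five, so the join time is paid once per batch rather than once per job.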

Projects
Status: Ready
3 participants