[Feature] Should --num-cpu be based on CPU requests instead of limits? #2361
Comments
Hey @andrewsykim, funny enough, I was just debugging a major performance degradation while running a Ray workload on Kubernetes, and it looks like having a CPU limit is the problem (due to CPU throttling). I started googling and came across this issue. I tried starting a cluster without CPU limits, but the pods do not show up. My feedback is that it would make much more sense to use CPU requests for num-cpus. Also, is there documented behavior for using KubeRay without CPU limits (you mentioned setting rayStartParams["num-cpus"])?
I'm deploying via the ray-cluster Helm chart and tried setting num-cpus in the values file like this:
and removed the CPU limits. I tried both KubeRay 1.1.1 and 1.2.1.
I'm not sure it matters, but the value needs to be a string:
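A minimal Python sketch of that check, assuming (as in the KubeRay RayCluster CRD) that rayStartParams is a map of string keys to string values, so an unquoted YAML value such as num-cpus: 2 is parsed as an integer rather than the string "2". The helpers non_string_start_params and stringify_start_params are hypothetical and not part of KubeRay or the ray-cluster Helm chart; the dict stands in for an already-parsed rayStartParams block from a values file.

```python
from typing import Any, Dict, List


def non_string_start_params(ray_start_params: Dict[str, Any]) -> List[str]:
    """Return (sorted) the keys whose values are not strings."""
    return sorted(k for k, v in ray_start_params.items() if not isinstance(v, str))


def stringify_start_params(ray_start_params: Dict[str, Any]) -> Dict[str, str]:
    """Convert every value to the string a quoted YAML value would produce."""
    out = {}
    for key, value in ray_start_params.items():
        if isinstance(value, bool):
            # YAML booleans are lower-case ("true"), unlike Python's str(True).
            out[key] = "true" if value else "false"
        else:
            out[key] = str(value)
    return out


params = {"num-cpus": 2, "dashboard-host": "0.0.0.0"}
print(non_string_start_params(params))  # ['num-cpus']
print(stringify_start_params(params))   # {'num-cpus': '2', 'dashboard-host': '0.0.0.0'}
```

In the values file itself, the equivalent fix is simply quoting the value ("2").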
I generally agree with this feedback, but I'm worried it's a breaking change. Maybe it's not, if we assume requests == limits?
See https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html#resources
Yes, it can be a breaking change: I'm now trying to simply get rid of the limits, and no pods are created.
I tried setting it as a string; that does not help. I checked the kuberay-operator logs, and it throws an error:
Full trace:
I believe the operator sets a default value of 1 CPU if limits are not present, so it essentially enforces setting limits. @andrewsykim, can you confirm?
I don't think the operator does this. Would you happen to have a LimitRange or another admission policy in your cluster that might be defaulting CPU limits?
Yeah, I always thought it was weird that I have to set CPU limits in KubeRay (including on the head pod), which can end up throttling the deployment. In general, I have followed past Google GKE recommendations not to set a CPU limit and to always match memory requests and limits.
I think there's a way to add this without introducing a breaking change: only default to requests if the limit is not specified. I started a PR here: #2365
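The defaulting order proposed above could look roughly like the following Python sketch. It is not the code from #2365 or KubeRay's actual Go implementation: the function names are hypothetical, the CPU parser only handles plain and millicore quantities, and rounding fractional CPUs up is an assumption based on ray start taking an integer --num-cpus.

```python
import math
from typing import Dict, Optional


def parse_cpu_quantity(quantity: str) -> float:
    """Parse a simple Kubernetes CPU quantity ("2", "1.5", "500m") into cores.

    Assumption: other suffixes and exponent forms are not handled here.
    """
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def default_num_cpus(
    ray_start_params: Dict[str, str],
    limits: Dict[str, str],
    requests: Dict[str, str],
) -> Optional[str]:
    """Hypothetical defaulting order: explicit param, then limit, then request."""
    # An explicit num-cpus in rayStartParams always wins.
    if "num-cpus" in ray_start_params:
        return ray_start_params["num-cpus"]
    # Check the CPU limit first (current behavior), then fall back to the request.
    for resources in (limits, requests):
        cpu = resources.get("cpu")
        if cpu is not None and parse_cpu_quantity(cpu) > 0:
            # Round fractional CPUs up, e.g. "1500m" -> "2" (assumed rounding).
            return str(math.ceil(parse_cpu_quantity(cpu)))
    # Neither is set: leave num-cpus unset so Ray falls back to its own detection.
    return None


print(default_num_cpus({}, {}, {"cpu": "1500m"}))  # 2
```

Because the request is consulted only when no CPU limit is set, clusters that already configure limits (including requests == limits) keep their current num-cpus, which is why this ordering should not be a breaking change.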
I did not set any LimitRanges. I ran:
Any mutating webhooks in your cluster that may be defaulting limits?
None that I'm aware of. I'm running on minikube; could that somehow be the reason?
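Kubernetes itself fills a missing request from the limit, never the other way around, so if CPU limits appear on a pod whose spec had none, something in the admission path (a LimitRange or a mutating webhook) most likely added them. A hypothetical Python sketch of that comparison, taking a container's resources block from the submitted manifest and from the live pod as plain dicts (fetching them from the cluster is left out):

```python
from typing import Dict, List

Resources = Dict[str, Dict[str, str]]


def fields_added_by_admission(submitted: Resources, admitted: Resources) -> List[str]:
    """List resource fields present on the admitted container but not in the submitted spec.

    Requests that merely mirror a limit set in the spec are skipped, because
    Kubernetes fills a missing request from the limit on its own.
    """
    added = []
    for kind in ("limits", "requests"):
        for name in sorted(admitted.get(kind, {})):
            if name in submitted.get(kind, {}):
                continue
            if kind == "requests" and name in submitted.get("limits", {}):
                continue  # defaulted by Kubernetes from the submitted limit
            added.append(f"{kind}.{name}")
    return added


submitted = {"requests": {"cpu": "1", "memory": "2Gi"}}
admitted = {"requests": {"cpu": "1", "memory": "2Gi"}, "limits": {"cpu": "1"}}
print(fields_added_by_admission(submitted, admitted))  # ['limits.cpu']
```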
@anovv, would you mind opening a new issue for it?
Search before asking
Description
Opening this issue to gather feedback on whether --num-cpus set on Ray workers should be based on CPU requests instead of limits. In my experience, there are few reasons to ever set CPU limits, and setting CPU requests is often good enough. Today it's possible to omit limits, but doing so also requires setting rayStartParams["num-cpus"] to match the CPU requests. Would it be beneficial for KubeRay to set num-cpus based on requests instead of limits? Would this be considered a breaking change, or would it be a no-op in most cases since most people configure requests == limits?
Use case
No response
Related issues
No response
Are you willing to submit a PR?