
[kubectl-plugin] Support specifying number of head GPUs and worker GPUs for Rayjob #2989

Open
wants to merge 1 commit into master
Conversation

@win5923 (Contributor) commented Feb 9, 2025

Why are these changes needed?

Adds --head-gpu and --worker-gpu flags to kubectl ray job submit so the number of GPUs for the RayJob's head and worker pods can be specified.

Manual Test

$ kubectl ray job submit --name ray-job-sample --working-dir ~/workdir --head-gpu 1 --worker-gpu 1 --runtime-env ~/workdir/runtimeEnv.yaml -- python sample_code.py

For worker pod: (screenshot of the worker pod spec with the GPU resource applied)

For head pod: (screenshot of the head pod spec with the GPU resource applied)
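
One way to spot-check the result from the command line, without screenshots (the pod name below is an assumption; substitute the actual head or worker pod name):

$ kubectl get pod ray-job-sample-head -o jsonpath='{.spec.containers[0].resources}'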

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

Comment on lines -104 to -131
// If the HeadGPU resource is set with a value, then proceed with parsing.
if rayClusterSpecObject.HeadGPU != "" {
	headGPUResource := resource.MustParse(rayClusterSpecObject.HeadGPU)
	if !headGPUResource.IsZero() {
		var requests, limits corev1.ResourceList
		requests = *rayClusterSpec.HeadGroupSpec.Template.Spec.Containers[0].Resources.Requests
		limits = *rayClusterSpec.HeadGroupSpec.Template.Spec.Containers[0].Resources.Limits
		requests[corev1.ResourceName(resourceNvidiaGPU)] = headGPUResource
		limits[corev1.ResourceName(resourceNvidiaGPU)] = headGPUResource

		rayClusterSpec.HeadGroupSpec.Template.Spec.Containers[0].Resources.Requests = &requests
		rayClusterSpec.HeadGroupSpec.Template.Spec.Containers[0].Resources.Limits = &limits
	}
}

// If the WorkerGPU resource is set with a value, then proceed with parsing.
if rayClusterSpecObject.WorkerGPU != "" {
	workerGPUResource := resource.MustParse(rayClusterSpecObject.WorkerGPU)
	if !workerGPUResource.IsZero() {
		var requests, limits corev1.ResourceList
		requests = *rayClusterSpec.WorkerGroupSpecs[0].Template.Spec.Containers[0].Resources.Requests
		limits = *rayClusterSpec.WorkerGroupSpecs[0].Template.Spec.Containers[0].Resources.Limits
		requests[corev1.ResourceName(resourceNvidiaGPU)] = workerGPUResource
		limits[corev1.ResourceName(resourceNvidiaGPU)] = workerGPUResource

		rayClusterSpec.WorkerGroupSpecs[0].Template.Spec.Containers[0].Resources.Requests = &requests
		rayClusterSpec.WorkerGroupSpecs[0].Template.Spec.Containers[0].Resources.Limits = &limits
	}
}
win5923 (Contributor, author) commented:
Previously, kubectl ray job submit did not support setting headGPU and workerGPU, which caused a panic. With this addition, the original implementation can now be restored.
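
For anyone skimming the diff, the core of that block is: parse the flag value into a resource.Quantity and, if it is non-zero, write it into the container's requests and limits under nvidia.com/gpu. Below is a minimal, self-contained sketch of that technique, not the plugin's actual code; the setGPU helper, the container name, and the use of ParseQuantity (instead of MustParse, which panics on bad input) are illustrative assumptions.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

const resourceNvidiaGPU = "nvidia.com/gpu"

// setGPU is a hypothetical helper: it parses gpu (e.g. "1") and, if non-zero,
// writes it into both the requests and limits of the given container.
func setGPU(container *corev1.Container, gpu string) error {
	if gpu == "" {
		return nil
	}
	qty, err := resource.ParseQuantity(gpu) // avoids MustParse's panic on bad input
	if err != nil {
		return err
	}
	if qty.IsZero() {
		return nil
	}
	if container.Resources.Requests == nil {
		container.Resources.Requests = corev1.ResourceList{}
	}
	if container.Resources.Limits == nil {
		container.Resources.Limits = corev1.ResourceList{}
	}
	container.Resources.Requests[corev1.ResourceName(resourceNvidiaGPU)] = qty
	container.Resources.Limits[corev1.ResourceName(resourceNvidiaGPU)] = qty
	return nil
}

func main() {
	c := corev1.Container{Name: "ray-head"}
	if err := setGPU(&c, "1"); err != nil {
		fmt.Println("invalid GPU value:", err)
		return
	}
	fmt.Println(c.Resources.Requests, c.Resources.Limits)
}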
