Configure ray pod runtime class based on custom pod specs #6199
Conversation
Signed-off-by: Jason Parraga <[email protected]>
Codecov Report: All modified and coverable lines are covered by tests ✅

@@ Coverage Diff @@
##           master    #6199      +/-   ##
==========================================
- Coverage   37.08%   37.07%   -0.01%
==========================================
  Files        1318     1318
  Lines      132707   132710       +3
==========================================
- Hits        49208    49204       -4
- Misses      79244    79251       +7
  Partials     4255     4255

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Code Review Agent Run #97d248 (Actionable Suggestions: 3)
Review Details
Changelist by Bito: This pull request implements the following key changes.
if customPodSpec.RuntimeClassName != nil {
	podSpec.RuntimeClassName = customPodSpec.RuntimeClassName
}
Consider adding validation for RuntimeClassName before assignment. The value should be checked for compliance with Kubernetes naming conventions.
Code suggestion
Check the AI-generated fix before applying
  if customPodSpec.RuntimeClassName != nil {
-     podSpec.RuntimeClassName = customPodSpec.RuntimeClassName
+     if validateRuntimeClassName(*customPodSpec.RuntimeClassName) {
+         podSpec.RuntimeClassName = customPodSpec.RuntimeClassName
+     }
  }
Code Review Run #97d248
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
nvidiaRuntimeClassName := "nvidia-cdi"

headPodSpecCustomResources := &corev1.PodSpec{
Consider moving the nvidiaRuntimeClassName constant to a package-level constant, since it appears to be a reusable configuration value.
Code suggestion
Check the AI-generated fix before applying
serviceAccount = "ray_sa"
+ nvidiaRuntimeClassName = "nvidia-cdi"
)
@@ -457,3 +457,1 @@
- nvidiaRuntimeClassName := "nvidia-cdi"
-
- headPodSpecCustomResources := &corev1.PodSpec{
Code Review Run #97d248
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
headPodSpec := rayJob.Spec.RayClusterSpec.HeadGroupSpec.Template.Spec
headPodResources := headPodSpec.Containers[0].Resources
Consider extracting the pod spec and resources access into a helper function to avoid code duplication. A similar pattern is repeated for both head and worker pods.
Code suggestion
Check the AI-generated fix before applying
- headPodSpec := rayJob.Spec.RayClusterSpec.HeadGroupSpec.Template.Spec
- headPodResources := headPodSpec.Containers[0].Resources
+ headPodSpec, headPodResources := getPodSpecAndResources(rayJob.Spec.RayClusterSpec.HeadGroupSpec.Template)
@@ -571,2 +571,1 @@
- workerPodSpec := workerGroupSpec.Template.Spec
- workerPodResources := workerPodSpec.Containers[0].Resources
+ workerPodSpec, workerPodResources := getPodSpecAndResources(workerGroupSpec.Template)
Code Review Run #97d248
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
Thank you!
Tracking issue
Closes #6198
Why are the changes needed?
When you run GPU workloads on a ray cluster you need to configure the runtime class for GPU pods. Folks need a way to do this without also configuring the runtime class (and associated GPUs) for the submitter pod, which never needs GPUs.
What changes were proposed in this pull request?
Runtime class names are pulled out of the custom head/worker pod specs and injected into the pod templates set on the kuberay CR.
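The injection described above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: PodSpec is a minimal stand-in for corev1.PodSpec reduced to the one field this change touches, and applyRuntimeClassName is a hypothetical helper name.

```go
package main

import "fmt"

// PodSpec is a minimal stand-in for corev1.PodSpec, reduced to the
// field this PR is concerned with.
type PodSpec struct {
	RuntimeClassName *string
}

// applyRuntimeClassName copies RuntimeClassName from a custom pod spec
// onto the target pod template spec, but only when one was actually set,
// so pods without a custom runtime class are left untouched.
func applyRuntimeClassName(podSpec, customPodSpec *PodSpec) {
	if customPodSpec.RuntimeClassName != nil {
		podSpec.RuntimeClassName = customPodSpec.RuntimeClassName
	}
}

func main() {
	rc := "nvidia-cdi"

	// Head pod: custom spec sets a runtime class, so it is propagated.
	head := &PodSpec{}
	applyRuntimeClassName(head, &PodSpec{RuntimeClassName: &rc})
	fmt.Println(*head.RuntimeClassName) // prints "nvidia-cdi"

	// Submitter pod: no custom runtime class, so nothing is injected.
	submitter := &PodSpec{}
	applyRuntimeClassName(submitter, &PodSpec{})
	fmt.Println(submitter.RuntimeClassName == nil) // prints "true"
}
```

Because the copy is gated on a nil check, the submitter pod template (which never needs GPUs) is not forced onto the GPU runtime class.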
How was this patch tested?
Unit tests
Check all the applicable boxes
Summary by Bito
This PR implements runtime class name configuration support for Ray pod specifications, focusing on GPU workload requirements. The changes enable separate runtime class settings for head and worker pods while keeping the submitter pod independent. The implementation includes custom pod spec handling and kuberay CR pod template integration. Unit tests added: True
Estimated effort to review (1-5, lower is better): 2