Configure ray pod runtime class based on custom pod specs #6199

Merged
merged 1 commit into flyteorg:master
Jan 29, 2025

Conversation

Sovietaced
Contributor

@Sovietaced Sovietaced commented Jan 28, 2025

Tracking issue

Closes #6198

Why are the changes needed?

When you run GPU workloads on a ray cluster you need to configure the runtime class for GPU pods. Folks need a way to do this without also configuring the runtime class (and associated GPUs) for the submitter pod, which never needs GPUs.

What changes were proposed in this pull request?

Runtime class names are pulled out of the custom head/worker pod specs and injected into the pod templates set on the kuberay CR.
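
The change described above can be sketched roughly as follows. This is a minimal illustration, not the code from ray.go: `PodSpec` here is a local stand-in for `corev1.PodSpec` (reduced to its `RuntimeClassName *string` field) so the example runs without Kubernetes dependencies, and `applyRuntimeClass` is a hypothetical helper name.

```go
package main

import "fmt"

// PodSpec is a local stand-in for corev1.PodSpec, reduced to the one
// field this sketch needs.
type PodSpec struct {
	RuntimeClassName *string
}

// applyRuntimeClass (hypothetical name) copies the runtime class from a
// user-supplied custom pod spec onto the pod spec that ends up in the
// head or worker pod template. A nil custom spec, or one without a
// runtime class, leaves the target untouched.
func applyRuntimeClass(target *PodSpec, custom *PodSpec) {
	if custom == nil || custom.RuntimeClassName == nil {
		return
	}
	target.RuntimeClassName = custom.RuntimeClassName
}

func main() {
	name := "nvidia-cdi"
	workerCustom := &PodSpec{RuntimeClassName: &name}

	worker := &PodSpec{}
	submitter := &PodSpec{} // no custom spec is ever applied to the submitter

	applyRuntimeClass(worker, workerCustom)
	applyRuntimeClass(submitter, nil)

	fmt.Println("worker runtime class:", *worker.RuntimeClassName)
	fmt.Println("submitter runtime class set:", submitter.RuntimeClassName != nil)
}
```

Because only the head and worker custom pod specs feed this step, the submitter pod keeps whatever runtime class it would otherwise have.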

How was this patch tested?

Unit tests

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Summary by Bito

This PR implements runtime class name configuration support for Ray pod specifications, focusing on GPU workload requirements. The changes enable separate runtime class settings for head and worker pods while maintaining submitter pod independence. The implementation includes custom pod spec handling and kuberay CR pod template integration.

Unit tests added: True

Estimated effort to review (1-5, lower is better): 2

@flyte-bot
Collaborator

Code Review Agent Run Status

  • Limitations and other issues: ❌ Failure - The AI Code Review Agent skipped reviewing this change because it is configured to exclude certain pull requests based on the source/target branch or the pull request status. You can change the settings here, or contact the agent instance creator at [email protected].

@Sovietaced Sovietaced added the "added" label (Merged changes that add new functionality) Jan 28, 2025
@Sovietaced Sovietaced marked this pull request as ready for review January 28, 2025 07:34

codecov bot commented Jan 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 37.07%. Comparing base (45ce4c0) to head (8ba7051).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6199      +/-   ##
==========================================
- Coverage   37.08%   37.07%   -0.01%     
==========================================
  Files        1318     1318              
  Lines      132707   132710       +3     
==========================================
- Hits        49208    49204       -4     
- Misses      79244    79251       +7     
  Partials     4255     4255              
Flag Coverage Δ
unittests-datacatalog 51.58% <ø> (ø)
unittests-flyteadmin 54.31% <ø> (-0.03%) ⬇️
unittests-flytecopilot 30.99% <ø> (ø)
unittests-flytectl 62.29% <ø> (ø)
unittests-flyteidl 7.23% <ø> (ø)
unittests-flyteplugins 53.87% <100.00%> (+<0.01%) ⬆️
unittests-flytepropeller 42.73% <ø> (ø)
unittests-flytestdlib 55.33% <ø> (-0.02%) ⬇️


@Sovietaced Sovietaced changed the title from "Configure pod runtime class based on custom pod specs" to "Configure ray pod runtime class based on custom pod specs" Jan 28, 2025
@flyte-bot
Collaborator

flyte-bot commented Jan 28, 2025

Code Review Agent Run #97d248

Actionable Suggestions - 3
  • flyteplugins/go/tasks/plugins/k8s/ray/ray.go - 1
    • Consider validating RuntimeClassName before assignment · Line 548-550
  • flyteplugins/go/tasks/plugins/k8s/ray/ray_test.go - 2
    • Consider moving runtime class name constant · Line 457-459
    • Consider extracting pod spec access pattern · Line 561-562
Review Details
  • Files reviewed - 2 · Commit Range: 8ba7051..8ba7051
    • flyteplugins/go/tasks/plugins/k8s/ray/ray.go
    • flyteplugins/go/tasks/plugins/k8s/ray/ray_test.go
  • Files skipped - 0
  • Tools
    • Golangci-lint (Linter) - ✖︎ Failed
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful


@flyte-bot
Collaborator

Changelist by Bito

This pull request implements the following key changes.

Key Change: Feature Improvement - Ray Pod Runtime Class Configuration Enhancement

Files Impacted:
  • ray.go - Added support for configuring runtime class names in Ray pod specs
  • ray_test.go - Added test cases for runtime class name configuration in Ray pods

Comment on lines +548 to +550
	if customPodSpec.RuntimeClassName != nil {
		podSpec.RuntimeClassName = customPodSpec.RuntimeClassName
	}
Collaborator

Consider validating RuntimeClassName before assignment

Consider adding validation for RuntimeClassName before assignment. The value should be checked for compliance with Kubernetes naming conventions.

Code suggestion
Check the AI-generated fix before applying
Suggested change
 -	if customPodSpec.RuntimeClassName != nil {
 -		podSpec.RuntimeClassName = customPodSpec.RuntimeClassName
 -	}
 +	if customPodSpec.RuntimeClassName != nil {
 +		if validateRuntimeClassName(*customPodSpec.RuntimeClassName) {
 +			podSpec.RuntimeClassName = customPodSpec.RuntimeClassName
 +		}
 +	}
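
The suggestion calls `validateRuntimeClassName` without defining it. Below is a minimal sketch of what such a helper could look like, assuming the intent is to enforce the DNS subdomain rules that Kubernetes applies to RuntimeClass object names (lowercase alphanumerics, '-' and '.', at most 253 characters). The regular expression is hand-written for illustration; code that already depends on k8s.io/apimachinery could use `validation.IsDNS1123Subdomain` instead.

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123Subdomain matches one or more dot-separated labels, each made of
// lowercase alphanumerics and '-', starting and ending with an alphanumeric.
var dns1123Subdomain = regexp.MustCompile(
	`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// validateRuntimeClassName is a hypothetical helper: it reports whether
// name looks like a valid DNS subdomain of at most 253 characters.
func validateRuntimeClassName(name string) bool {
	return len(name) > 0 && len(name) <= 253 && dns1123Subdomain.MatchString(name)
}

func main() {
	for _, name := range []string{"nvidia-cdi", "Nvidia_CDI", ""} {
		fmt.Printf("%q valid: %v\n", name, validateRuntimeClassName(name))
	}
}
```

Note that the suggested change silently skips an invalid name; returning an error would make a misconfigured pod spec easier to diagnose.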

Code Review Run #97d248


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

Comment on lines +457 to +459
nvidiaRuntimeClassName := "nvidia-cdi"

headPodSpecCustomResources := &corev1.PodSpec{
Collaborator

Consider moving runtime class name constant

Consider moving the nvidiaRuntimeClassName constant to a package-level constant since it appears to be a reusable configuration value.

Code suggestion
Check the AI-generated fix before applying
  	serviceAccount = "ray_sa"
 +	nvidiaRuntimeClassName = "nvidia-cdi"
  )
 @@ -457,3 +457,1 @@
 -	nvidiaRuntimeClassName := "nvidia-cdi"
 -
 -	headPodSpecCustomResources := &corev1.PodSpec{

Code Review Run #97d248


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

Comment on lines +561 to +562
headPodSpec := rayJob.Spec.RayClusterSpec.HeadGroupSpec.Template.Spec
headPodResources := headPodSpec.Containers[0].Resources
Collaborator

Consider extracting pod spec access pattern

Consider extracting the pod spec and resources access into a helper function to avoid code duplication. Similar pattern is repeated for both head and worker pods.

Code suggestion
Check the AI-generated fix before applying
 -			headPodSpec := rayJob.Spec.RayClusterSpec.HeadGroupSpec.Template.Spec
 -			headPodResources := headPodSpec.Containers[0].Resources
 +			headPodSpec, headPodResources := getPodSpecAndResources(rayJob.Spec.RayClusterSpec.HeadGroupSpec.Template)
 @@ -571,2 +571,1 @@
 -				workerPodSpec := workerGroupSpec.Template.Spec
 -				workerPodResources := workerPodSpec.Containers[0].Resources
 +				workerPodSpec, workerPodResources := getPodSpecAndResources(workerGroupSpec.Template)
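
The suggested `getPodSpecAndResources` helper is not defined anywhere in the review. A minimal sketch of one possible shape follows, using local stand-in types in place of the `corev1` ones so it runs on its own; the type definitions and the helper body are assumptions for illustration only.

```go
package main

import "fmt"

// Local stand-ins for the corev1 types, reduced to the fields used here.
type ResourceRequirements struct {
	Limits map[string]string
}

type Container struct {
	Resources ResourceRequirements
}

type PodSpec struct {
	Containers []Container
}

type PodTemplateSpec struct {
	Spec PodSpec
}

// getPodSpecAndResources (hypothetical) returns the template's pod spec
// together with the resources of its first container. Like the test code
// it would replace, it assumes the pod has at least one container and
// panics otherwise.
func getPodSpecAndResources(tmpl PodTemplateSpec) (PodSpec, ResourceRequirements) {
	spec := tmpl.Spec
	return spec, spec.Containers[0].Resources
}

func main() {
	head := PodTemplateSpec{Spec: PodSpec{Containers: []Container{
		{Resources: ResourceRequirements{Limits: map[string]string{"nvidia.com/gpu": "1"}}},
	}}}

	spec, resources := getPodSpecAndResources(head)
	fmt.Println("containers:", len(spec.Containers))
	fmt.Println("gpu limit:", resources.Limits["nvidia.com/gpu"])
}
```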

Code Review Run #97d248


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

Contributor

@davidmirror-ops davidmirror-ops left a comment


Thank you!

@davidmirror-ops davidmirror-ops merged commit 448aba9 into flyteorg:master Jan 29, 2025
53 checks passed
Labels
added Merged changes that add new functionality
Development

Successfully merging this pull request may close these issues.

[BUG] Ray plugin should support setting runtime class from head/worker pod specs
3 participants