Support NVIDIA MIG for more secure multi-tenant GPU sharing / allocation #4252

Open
jcmcken opened this issue Oct 22, 2024 · 1 comment
Labels: status/needs-triage (Pending triage or re-evaluation), type/enhancement (New feature or request)

Comments

@jcmcken

jcmcken commented Oct 22, 2024

What I'd like:

NVIDIA time-slicing landed in Bottlerocket 1.25 (see #2347). While that's a step forward, it can't really be used securely in a multi-tenant EKS cluster (as mentioned in the documentation).

NVIDIA MIG provides a more secure alternative.

Here are some references:

Any alternatives you've considered:

NVIDIA vGPU is another alternative, but it has fewer isolation guarantees than MIG. It's also not clear how it would be used in EKS, and I think it might require a commercial license.

Today, in our multi-tenant clusters, the only solution seems to be giving each pod its own GPU worker node, which is highly cost-inefficient.

@jcmcken added the status/needs-triage and type/enhancement labels on Oct 22, 2024
@piyush-jena self-assigned this on Oct 29, 2024
@piyush-jena
Contributor

Hi @jcmcken! MIG is a prioritized feature on the Bottlerocket roadmap, and the team is actively working on its development. While we cannot commit to a specific delivery date, we aim to provide an update on the progress in the next two weeks. Please use this issue for tracking purposes. Thank you for your interest.
