Nvidia container-runtime API for GPU allocation #4052
Closed
Co-authored-by: Monirul Islam
Revives: #3994
Description of changes:
This PR exposes two new APIs that allow customers to configure the values of `accept-nvidia-visible-devices-as-volume-mounts` and `accept-nvidia-visible-devices-envvar-when-unprivileged` for the NVIDIA container runtime. We introduced the default behavior of injecting NVIDIA GPUs using volume mounts in #3718. This PR lets users opt in to the previous behavior, which allows unprivileged pods to access all GPUs when `NVIDIA_VISIBLE_DEVICES=all` is set, and makes both behaviors configurable.

`settings.kubernetes.nvidia.container-runtime.visible-devices-as-volume-mounts`
- Maps to: the `accept-nvidia-visible-devices-as-volume-mounts` value of the k8s container toolkit.
- Accepted values: `true` | `false`
- Default: `true`

The `visible-devices-as-volume-mounts` setting alters the method of GPU detection and integration within container environments. Setting this parameter to `true` enables the NVIDIA runtime to recognize GPU devices listed in the `NVIDIA_VISIBLE_DEVICES` environment variable and mount them as volumes, which permits applications within the container to interact with and leverage the GPUs as if they were local resources.
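As a rough illustration, the sketch below shows this key in Bottlerocket TOML user data. Only the setting name comes from this PR; placing it in user data is simply the usual way Bottlerocket settings are supplied and is an assumption here.

```toml
# Bottlerocket user data (TOML) -- a minimal sketch.
# Keep the default volume-mount based GPU injection explicitly enabled.
[settings.kubernetes.nvidia.container-runtime]
visible-devices-as-volume-mounts = true
```

On a running node the same key should also be settable through the API, e.g. `apiclient set settings.kubernetes.nvidia.container-runtime.visible-devices-as-volume-mounts=true`.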
`settings.kubernetes.nvidia.container-runtime.visible-devices-envvar-when-unprivileged`
- Maps to: the `accept-nvidia-visible-devices-envvar-when-unprivileged` setting of the NVIDIA container runtime for the k8s variant.
- Accepted values: `true` | `false`
- Default: `false`

When set to `false`, this prevents unprivileged containers from accessing all GPU devices on the host by default. If `NVIDIA_VISIBLE_DEVICES` is set to `all` within the container image and `visible-devices-envvar-when-unprivileged` is set to `true`, all GPUs on the host will be accessible to the containers, regardless of the limits set via `nvidia.com/gpu`. This could lead to situations where more GPUs are allocated to a pod than intended, which can affect resource scheduling and isolation.
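For operators who deliberately want to opt back in to the previous env-var based behavior described above, a minimal user-data sketch follows. Again, only the setting names come from this PR; combining the two keys this way is an assumption, and enabling the env-var path carries the over-allocation risk just described.

```toml
# Bottlerocket user data (TOML) -- a minimal sketch, opting in to the
# previous behavior: unprivileged pods that set NVIDIA_VISIBLE_DEVICES=all
# can see every GPU on the host, bypassing nvidia.com/gpu limits.
[settings.kubernetes.nvidia.container-runtime]
visible-devices-envvar-when-unprivileged = true
# Assumption: also disabling the volume-mount based injection to more
# closely match the pre-#3718 behavior.
visible-devices-as-volume-mounts = false
```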
Testing done:
- Verified the `nvidia-container-runtime` config exists (see the sketch of the relevant keys after this list).
- Tested migration from 1.20.1 to the new version.
- Tested migration back to 1.20.1.
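For reference, the first testing item checks that the corresponding toolkit keys are rendered into the NVIDIA container runtime config. A rough sketch of what the relevant lines might look like with the defaults is below; the file path and surrounding layout are assumptions, only the two key names come from this PR.

```toml
# Sketch of the relevant keys in the rendered NVIDIA container runtime
# config (path assumed, e.g. /etc/nvidia-container-runtime/config.toml).
accept-nvidia-visible-devices-as-volume-mounts = true
accept-nvidia-visible-devices-envvar-when-unprivileged = false
```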
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.