Nvidia settings API for container runtime #3994
Signed-off-by: Monirul Islam <[email protected]>
visible-devices-as-volume-mounts = false
visible-devices-envvar-when-unprivileged = true

[metadata.settings.kubernetes.nvidia.container-runtime]
This PR is missing a migration to remove the affected services on a downgrade.
accept-nvidia-visible-devices-as-volume-mounts = {{settings.kubernetes.nvidia.container-runtime.visible-devices-as-volume-mounts}}
accept-nvidia-visible-devices-envvar-when-unprivileged = {{settings.kubernetes.nvidia.container-runtime.visible-devices-envvar-when-unprivileged}}
Let's be safe and use the `{{default}}` helper; otherwise, if `settings.kubernetes.nvidia.container-runtime.visible-devices-as-volume-mounts` isn't present, the render will fail.
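A minimal sketch of what this suggestion could look like in the template. It assumes the `default` helper takes the fallback value first and the setting path second; the fallback values mirror the defaults stated in the PR description and are illustrative only:

```toml
# Hypothetical template lines: if the setting is absent, the fallback
# value is rendered instead of the render failing.
accept-nvidia-visible-devices-as-volume-mounts = {{default true settings.kubernetes.nvidia.container-runtime.visible-devices-as-volume-mounts}}
accept-nvidia-visible-devices-envvar-when-unprivileged = {{default false settings.kubernetes.nvidia.container-runtime.visible-devices-envvar-when-unprivileged}}
```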
@@ -0,0 +1,14 @@
[settings.kubernetes.nvidia.container-runtime]
visible-devices-as-volume-mounts = false
visible-devices-envvar-when-unprivileged = true
Let's use the default values to prevent unprivileged pods from accessing all the devices:
accept-nvidia-visible-devices-envvar-when-unprivileged = false
@@ -0,0 +1 @@
../../../shared-defaults/nvidia-k8s-container-toolkit.toml
The PR is missing symlinks for the other variants. We need symlinks for:
- aws-k8s-1.24-nvidia
- aws-k8s-1.25-nvidia
- aws-k8s-1.26-nvidia
Opened #4052 instead.
Issue number:
Closes #
Description of changes:
This PR exposes two new API settings that allow customers to configure the values of `accept-nvidia-visible-devices-as-volume-mounts` and `accept-nvidia-visible-devices-envvar-when-unprivileged` for the NVIDIA container runtime.

`settings.kubernetes.nvidia.container-runtime.visible-devices-as-volume-mounts`
- Sets the `accept-nvidia-visible-devices-as-volume-mounts` value for the k8s container-toolkit.
- Values: `true` | `false`, default: `true`
- The `visible-devices-as-volume-mounts` setting alters the method of GPU detection and integration within container environments. Setting this parameter to `true` enables the NVIDIA runtime to recognize GPU devices listed in the `NVIDIA_VISIBLE_DEVICES` environment variable and mount them as volumes, which permits applications within the container to interact with and leverage the GPUs as if they were local resources.

`settings.kubernetes.nvidia.container-runtime.visible-devices-envvar-when-unprivileged`
- Sets the `accept-nvidia-visible-devices-envvar-when-unprivileged` setting of the NVIDIA container runtime for the k8s variant.
- Values: `true` | `false`, default: `false`
- When set to `false`, it prevents unprivileged containers from accessing all GPU devices on the host by default. If `NVIDIA_VISIBLE_DEVICES` is set to `all` within the container images and `visible-devices-envvar-when-unprivileged` is set to `true`, all GPUs on the host will be accessible to the containers, regardless of the limits set via `nvidia.com/gpu`. This could lead to situations where more GPUs are allocated to a pod than intended, which can affect resource scheduling and isolation.

Testing done:
Yes.
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.
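
For illustration, a user-data fragment that sets both options described above might look like the following. This is a sketch that assumes the settings are exposed under the path named in the description; the values shown are the defaults stated there.

```toml
# Sketch only: values mirror the defaults stated in the description.
[settings.kubernetes.nvidia.container-runtime]
# Rendered into accept-nvidia-visible-devices-as-volume-mounts
visible-devices-as-volume-mounts = true
# Rendered into accept-nvidia-visible-devices-envvar-when-unprivileged;
# keeping this false stops unprivileged pods from seeing every GPU on the host
visible-devices-envvar-when-unprivileged = false
```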