Generate CDI specification including additional GIDs #630

elezar · 2024-08-05T12:06:17Z

This change adds logic to include the AdditionalGIDs required for a device node in a generated CDI specification.

When a device node requires specific group membership for access, a non-root user running in a container (e.g. using the -u Docker command line argument) may not have access to the device.

Support for specifying additional GIDs was added in the v0.7.0 CDI specification. These changes extract the required information from the device nodes on the host, if available.
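A minimal sketch of the decision described above, not the code in this PR. The `deviceNode` type, the `additionalGIDs` helper, and the example GID 109 are hypothetical. Skipping world-accessible nodes is also an assumption; the description only says the GID is taken from the host device node when available.

```go
package main

import "fmt"

// deviceNode is a hypothetical, simplified stand-in for the metadata
// read from a device node on the host.
type deviceNode struct {
	Path string
	Mode uint32 // permission bits, e.g. 0o660
	GID  uint32 // owning group of the node
}

// additionalGIDs returns the group a non-root container user would need
// in order to open the node, or nil when no extra group is required.
func additionalGIDs(dn deviceNode, allow bool) []uint32 {
	if !allow {
		return nil // feature is opt-in
	}
	if dn.GID == 0 {
		return nil // owned by the root group, or GID unknown
	}
	if dn.Mode&0o006 == 0o006 {
		return nil // assumption: world read/write nodes need no extra group
	}
	return []uint32{dn.GID}
}

func main() {
	render := deviceNode{Path: "/dev/dri/renderD128", Mode: 0o660, GID: 109}
	fmt.Println(additionalGIDs(render, true))  // feature enabled
	fmt.Println(additionalGIDs(render, false)) // feature disabled
}
```

Keeping the decision in a pure function makes it easy to test without real device nodes.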

This is disabled by default.

It can be enabled for internal CDI representations such as those used for Tegra-based systems or /dev/dri devices by setting:

features.allow-additional-gids = true

in the config.toml file.

This can also be opted in to on a per-container basis by setting NVIDIA_ALLOW_ADDITIONAL_GIDS=enabled.
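As a sketch of how the two opt-in paths could combine (an assumption, not the PR's implementation): the feature is treated as enabled if either the config flag is set or the container environment carries `NVIDIA_ALLOW_ADDITIONAL_GIDS=enabled`. The helper name `additionalGIDsAllowed` is hypothetical, and exact matching on the value `enabled` is assumed from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// Name of the per-container opt-in variable, as given in the PR description.
const allowAdditionalGIDsEnvvar = "NVIDIA_ALLOW_ADDITIONAL_GIDS"

// additionalGIDsAllowed reports whether additional GIDs should be injected.
// featureEnabled mirrors features.allow-additional-gids from config.toml;
// containerEnv is the container's environment as KEY=VALUE strings.
func additionalGIDsAllowed(featureEnabled bool, containerEnv []string) bool {
	if featureEnabled {
		return true
	}
	for _, kv := range containerEnv {
		name, value, ok := strings.Cut(kv, "=")
		if ok && name == allowAdditionalGIDsEnvvar && value == "enabled" {
			return true
		}
	}
	return false
}

func main() {
	env := []string{"PATH=/usr/bin", "NVIDIA_ALLOW_ADDITIONAL_GIDS=enabled"}
	fmt.Println(additionalGIDsAllowed(false, env))
}
```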

The nvidia-ctk command can be used to toggle this feature by running:

sudo nvidia-ctk config --in-place --set features.allow-additional-gids

It can be enabled when running nvidia-ctk cdi generate by specifying the --allow-additional-gids command line argument:

nvidia-ctk cdi generate --allow-additional-gids
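For orientation, the following sketch prints a minimal v0.7.0-style spec carrying the `additionalGids` container edit. The struct types are simplified local stand-ins rather than the CDI `specs-go` package, and the device name `0`, the device path, and GID 109 are illustrative assumptions, not output captured from `nvidia-ctk`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for the CDI spec types; only the fields needed
// for this illustration are included.
type deviceNode struct {
	Path string `json:"path"`
}

type containerEdits struct {
	DeviceNodes    []deviceNode `json:"deviceNodes,omitempty"`
	AdditionalGIDs []uint32     `json:"additionalGids,omitempty"`
}

type device struct {
	Name           string         `json:"name"`
	ContainerEdits containerEdits `json:"containerEdits"`
}

type spec struct {
	Version string   `json:"cdiVersion"`
	Kind    string   `json:"kind"`
	Devices []device `json:"devices"`
}

// renderSpec returns a compact JSON spec with one device node and one
// additional GID attached to that device's container edits.
func renderSpec(path string, gid uint32) (string, error) {
	s := spec{
		Version: "0.7.0",
		Kind:    "nvidia.com/gpu",
		Devices: []device{{
			Name: "0",
			ContainerEdits: containerEdits{
				DeviceNodes:    []deviceNode{{Path: path}},
				AdditionalGIDs: []uint32{gid},
			},
		}},
	}
	out, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := renderSpec("/dev/dri/renderD128", 109)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```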

elezar · 2024-08-05T12:07:19Z

Note: This updates the default spec version to v0.7.0 since the DRM devices generally require additional GIDs. As such, this should probably be an opt-in (or at least an opt-out) feature.

klueska · 2024-08-21T09:51:23Z

cmd/nvidia-ctk/cdi/generate/generate.go

@@ -169,6 +174,11 @@ func (m command) build() *cli.Command {
 			Usage:       "Specify a pattern the CSV mount specifications.",
 			Destination: &opts.csv.ignorePatterns,
 		},
+		&cli.BoolFlag{
+			Name:        "--allow-additional-gids",
+			Usage:       "Allow the use of the additionalGIDs field for generated CDI specifications. Note this will generate a v0.7.0 CDI specification.",


Which versions of containerd / cri-o (and other runtimes for that matter) support this?

I would have to check. Note that this is not enabled by default when generating the CDI spec and only for internal representations.

containerd (>= 1.7.16) - containerd/containerd@7a2f49f

containerd (>= 1.8.0, >= 2.0.0) - containerd/containerd@1b62224

docker (>= 26.1.0) - moby/moby@745e235

podman (>= 5.1.0) - containers/podman@a40cf31

crio (>= 1.30.0) - cri-o/cri-o@fd9aa76
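The minimum versions listed above could be checked with a small helper like the one below. This is an illustrative sketch only: `supportsAdditionalGIDs` and `parseVersion` are hypothetical names, only plain `MAJOR.MINOR.PATCH` strings are handled (pre-release suffixes are rejected), and for containerd only the 1.7.16 lower bound is encoded.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Minimum runtime versions taken from the review thread.
var minVersions = map[string]string{
	"containerd": "1.7.16",
	"docker":     "26.1.0",
	"podman":     "5.1.0",
	"crio":       "1.30.0",
}

// parseVersion splits "MAJOR.MINOR.PATCH" (optionally prefixed with "v")
// into three integers.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	if len(parts) != 3 {
		return out, fmt.Errorf("expected MAJOR.MINOR.PATCH, got %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version component %q in %q", p, v)
		}
		out[i] = n
	}
	return out, nil
}

// supportsAdditionalGIDs reports whether the given runtime version is at
// least the minimum listed for that runtime.
func supportsAdditionalGIDs(runtime, version string) (bool, error) {
	minimum, ok := minVersions[runtime]
	if !ok {
		return false, fmt.Errorf("unknown runtime %q", runtime)
	}
	have, err := parseVersion(version)
	if err != nil {
		return false, err
	}
	want, err := parseVersion(minimum)
	if err != nil {
		return false, err
	}
	for i := range have {
		if have[i] != want[i] {
			return have[i] > want[i], nil
		}
	}
	return true, nil // versions are equal
}

func main() {
	ok, err := supportsAdditionalGIDs("podman", "5.1.0")
	fmt.Println(ok, err)
}
```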

klueska · 2024-08-21T09:54:00Z

internal/config/features.go

+	MOFED            *feature `toml:"mofed,omitempty"`
+	NVSWITCH         *feature `toml:"nvswitch,omitempty"`
+	GDRCopy          *feature `toml:"gdrcopy,omitempty"`
+	NoAdditionalGIDs *feature `toml:"no-additional-gids,omitempty"`


Why is this reversed (i.e. no rather than allow) from the flag passed in?

Because this has different default behaviour. For the NVIDIA Container Runtime I want to generate an internal representation including the additionalGIDs container edits for device nodes. This allows users to opt out if there is an issue.

I suppose we could make the argument that if we backport this (which we may want to due to the Jetpack feature request) we should keep the existing behaviour and not make these modifications either.

Ok, after having thought this through again, I have disabled this by default and switched the feature flag to --allow-additional-gids.
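To illustrate why an allow-style flag gives a disabled-by-default feature, here is a rough sketch of a pointer-backed feature toggle. The `feature` type, its `isEnabled` method, and the `AllowAdditionalGIDs` field are hypothetical simplifications modelled on the diff above; the struct tag is shown for context only and no TOML library is used.

```go
package main

import "fmt"

// feature is a hypothetical boolean toggle. A nil pointer means the
// option was not set in config.toml.
type feature bool

// isEnabled treats an unset (nil) feature as disabled, which is what
// makes the option opt-in.
func (f *feature) isEnabled() bool {
	return f != nil && bool(*f)
}

// features mirrors the shape of the struct in the diff, reduced to the
// single option discussed here.
type features struct {
	AllowAdditionalGIDs *feature `toml:"allow-additional-gids,omitempty"`
}

func main() {
	var cfg features
	fmt.Println(cfg.AllowAdditionalGIDs.isEnabled()) // unset in config

	on := feature(true)
	cfg.AllowAdditionalGIDs = &on
	fmt.Println(cfg.AllowAdditionalGIDs.isEnabled()) // explicitly enabled
}
```

With a `no-` style option the same nil value would have to mean "enabled", which is the default behaviour the thread decided against.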

tariq1890 · 2024-08-21T15:54:22Z

internal/edits/device.go

 	deviceNode, err := d.toSpec()
 	if err != nil {
 		return nil, err
 	}

+	var additionalGIDs []uint32
+	if allowAdditionalGIDs {
+		if requiredGID, _ := d.getRequiredGID(); requiredGID != 0 {


Can we log the error instead of swallowing it? Alternatively, the function could simply not return an error.

Updated to not return an error.
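One way such a lookup could look without an error return is sketched below. This is an assumption about the shape of the helper, not the code merged in the PR: `requiredGID` is a hypothetical free function, it relies on `syscall.Stat_t` (Linux and other Unix-like hosts), and it returns 0 whenever the group cannot be determined.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// requiredGID returns the owning group ID of the node at path.
// It returns 0 when the node cannot be inspected, so callers only need
// to check for a non-zero result.
func requiredGID(path string) uint32 {
	info, err := os.Stat(path)
	if err != nil {
		return 0 // node missing or not accessible
	}
	st, ok := info.Sys().(*syscall.Stat_t)
	if !ok {
		return 0 // platform does not expose ownership information
	}
	return st.Gid
}

func main() {
	// The device path is illustrative; it may not exist on every host.
	if gid := requiredGID("/dev/dri/renderD128"); gid != 0 {
		fmt.Println("additional GID required:", gid)
	} else {
		fmt.Println("no additional GID required or node unavailable")
	}
}
```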

internal/edits/resource.go

cmd/nvidia-ctk/cdi/generate/generate.go

internal/edits/lib.go

This change refactors the creation of container edits to make provision for injecting options that affect the generated CDI specifications. Signed-off-by: Evan Lezar <[email protected]>

This change adds support for injecting additional GIDs using the internal CDI representations. (Applicable for Tegra-based systems and /dev/dri devices) This is disabled by default, but can be opted in to by setting the features.allow-additional-gids feature flag. This can also be done by running sudo nvidia-ctk config --in-place --set features.allow-additional-gids Signed-off-by: Evan Lezar <[email protected]>

Signed-off-by: Evan Lezar <[email protected]>

elezar requested review from klueska and cdesiniotis August 5, 2024 12:07

elezar self-assigned this Aug 12, 2024

elezar force-pushed the CNT-4739/additional-gids branch from 74441f4 to 285ea44 Compare August 21, 2024 09:30

elezar requested a review from tariq1890 August 21, 2024 09:36

elezar marked this pull request as ready for review August 21, 2024 09:36

klueska reviewed Aug 21, 2024

tariq1890 reviewed Aug 21, 2024

[no-relnote] Refactor CDI ContainerEdit creation

50278cb

This change refactors the creation of container edits to make provision for injecting options that affect the generated CDI specifications. Signed-off-by: Evan Lezar <[email protected]>

elezar force-pushed the CNT-4739/additional-gids branch from 285ea44 to 141772c Compare August 22, 2024 12:21

elezar requested review from tariq1890 and klueska August 22, 2024 12:22

elezar force-pushed the CNT-4739/additional-gids branch 2 times, most recently from 7ecd467 to fab0c6d Compare August 22, 2024 16:12

elezar added 2 commits August 22, 2024 18:17

Add --allow-additional-gids to cdi generate command

5d55813

Signed-off-by: Evan Lezar <[email protected]>

elezar force-pushed the CNT-4739/additional-gids branch from fab0c6d to 5d55813 Compare August 22, 2024 16:17
