Generate CDI specification including additional GIDs #630

Open
wants to merge 3 commits into
base: main
Conversation

elezar
Member

@elezar elezar commented Aug 5, 2024

This change adds logic to include the AdditionalGIDs required for a device node in a generated CDI specification.

When a device node requires specific group membership for access, a non-root user running in a container (e.g. using the -u Docker command line argument) may not have access to the device.

Version v0.7.0 of the CDI specification added support for specifying additional GIDs. These changes extract the required information from the device nodes on the host, if available.

This is disabled by default.

It can be enabled for internal CDI representations such as those used for Tegra-based systems or /dev/dri devices by setting:

features.allow-additional-gids = true

in the config.toml file.

The feature can also be enabled on a per-container basis by setting NVIDIA_ALLOW_ADDITIONAL_GIDS=enabled.
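
As a rough illustration of the two opt-in paths described above, the decision could be sketched as follows. The helper name `allowAdditionalGIDs` and the map-based environment are hypothetical, and the way the toolkit actually combines the config flag with the environment variable is an assumption here, not taken from the change itself.

```go
package main

import "fmt"

// allowAdditionalGIDs is a hypothetical helper: it reports whether additional
// GIDs should be injected, given the features.allow-additional-gids config
// value and the container's environment.
// Assumption: either opt-in path on its own is sufficient.
func allowAdditionalGIDs(featureEnabled bool, containerEnv map[string]string) bool {
	if featureEnabled {
		return true
	}
	// Per-container opt-in as described in the PR text.
	return containerEnv["NVIDIA_ALLOW_ADDITIONAL_GIDS"] == "enabled"
}

func main() {
	env := map[string]string{"NVIDIA_ALLOW_ADDITIONAL_GIDS": "enabled"}
	fmt.Println(allowAdditionalGIDs(false, env))
}
```

With both inputs unset the helper returns false, matching the disabled-by-default behaviour.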

The nvidia-ctk command can be used to toggle this feature by running:

sudo nvidia-ctk config --in-place --set features.allow-additional-gids

It can be enabled when running nvidia-ctk cdi generate by specifying the --allow-additional-gids command line argument:

nvidia-ctk cdi generate --allow-additional-gids

@elezar
Member Author

elezar commented Aug 5, 2024

Note: This updates the default spec version to v0.7.0 since the DRM devices generally require additional GIDs. As such, this should probably be an opt-in (or at least an opt-out) feature.

@elezar elezar self-assigned this Aug 12, 2024
@elezar elezar requested a review from tariq1890 August 21, 2024 09:36
@elezar elezar marked this pull request as ready for review August 21, 2024 09:36
@@ -169,6 +174,11 @@ func (m command) build() *cli.Command {
				Usage:       "Specify a pattern the CSV mount specifications.",
				Destination: &opts.csv.ignorePatterns,
			},
			&cli.BoolFlag{
				Name:  "--allow-additional-gids",
				Usage: "Allow the use of the additionalGIDs field for generated CDI specifications. Note this will generate a v0.7.0 CDI specification.",
Contributor

Which versions of containerd / cri-o (and other runtimes for that matter) support this?

Member Author

I would have to check. Note that this is not enabled by default when generating the CDI spec and only for internal representations.

Member Author

@elezar elezar Aug 21, 2024

MOFED *feature `toml:"mofed,omitempty"`
NVSWITCH *feature `toml:"nvswitch,omitempty"`
GDRCopy *feature `toml:"gdrcopy,omitempty"`
NoAdditionalGIDs *feature `toml:"no-additional-gids,omitempty"`
Contributor

Why is this reversed (i.e. no rather than allow) from the flag passed in?

Member Author

Because this has different default behaviour. For the NVIDIA Container Runtime I want to generate an internal representation that includes the additionalGIDs container edits for device nodes. This allows users to opt out if there is an issue.

I suppose we could make the argument that if we backport this (which we may want to due to the Jetpack feature request) we should keep the existing behaviour and not make these modifications either.

Member Author

Ok, after having thought this through again, I have disabled this by default and switched the feature flag to --allow-additional-gids.

deviceNode, err := d.toSpec()
if err != nil {
	return nil, err
}

var additionalGIDs []uint32
if allowAdditionalGIDs {
	if requiredGID, _ := d.getRequiredGID(); requiredGID != 0 {
Contributor

Can we log the error instead of swallowing it? Or we could just have the function not return an error.

Member Author

Updated to not return an error.
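
For readers following this thread, a lookup that reports "no GID" instead of returning an error could look roughly like the sketch below. This is not the code from the PR: the function names `requiredGID` and `additionalGIDs` are hypothetical, the sketch assumes a Linux host where `os.FileInfo.Sys()` yields a `*syscall.Stat_t`, and the device path in `main` is only an example.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// requiredGID returns the group ID that owns the device node at path.
// It returns 0 when the node cannot be inspected, so callers have no error
// to handle (or to swallow).
func requiredGID(path string) uint32 {
	info, err := os.Stat(path)
	if err != nil {
		return 0
	}
	stat, ok := info.Sys().(*syscall.Stat_t)
	if !ok {
		return 0
	}
	return stat.Gid
}

// additionalGIDs mirrors the guard in the diff above: a GID is only added
// when the feature is allowed and the node is not owned by the root group.
func additionalGIDs(allow bool, gid uint32) []uint32 {
	if !allow || gid == 0 {
		return nil
	}
	return []uint32{gid}
}

func main() {
	gid := requiredGID("/dev/dri/renderD128") // example path; may not exist
	fmt.Println(additionalGIDs(true, gid))
}
```

Treating 0 as "nothing to add" folds the error case and the root-owned case into one branch, which is what makes the error return unnecessary.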

This change refactors the creation of container edits to make
provision for injecting options that affect the generated CDI
specifications.

Signed-off-by: Evan Lezar <[email protected]>
@elezar elezar force-pushed the CNT-4739/additional-gids branch 2 times, most recently from 7ecd467 to fab0c6d August 22, 2024 16:12
This change adds support for injecting additional GIDs using the internal
CDI representations. (Applicable for Tegra-based systems and /dev/dri devices)
This is disabled by default, but can be opted in to by setting the
features.allow-additional-gids feature flag.

This can also be done by running

sudo nvidia-ctk config --in-place --set features.allow-additional-gids

Signed-off-by: Evan Lezar <[email protected]>
3 participants