feat(al2023): NVIDIA variant in isolated partitions - Install NVIDIA Container toolkit/deps #2032

Closed
wants to merge 4 commits into from

Conversation

whoix
Contributor

@whoix whoix commented Oct 31, 2024

Issue #, if available:

Description of changes:

The NVIDIA Container Toolkit and its dependencies are not in the Amazon Linux repos, so we have to fetch the necessary RPMs manually and install them locally. The same method is used for the AL2 GPU EKS AMI variants in isolated partitions.

RPMS=("libnvidia-container1-1.16.2-1.x86_64.rpm" "nvidia-container-toolkit-base-1.16.2-1.x86_64.rpm" "libnvidia-container-tools-1.16.2-1.x86_64.rpm" "nvidia-container-toolkit-1.16.2-1.x86_64.rpm")
for RPM in "${RPMS[@]}"; do
echo "pulling and installing rpms: (${RPM}) from s3 bucket: (${BINARY_BUCKET_NAME}) in region: (${BINARY_BUCKET_REGION})"
aws s3 cp --region "${BINARY_BUCKET_REGION}" "s3://${BINARY_BUCKET_NAME}/rpms/${RPM}" "${WORKING_DIR}/${RPM}"
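
The hunk above is cut off at the line under review, so the install step is not visible here. As a rough sketch only, the complete fetch-and-install flow might look something like the following. The helper name `fetch_and_install_rpms`, its positional arguments, and the single `dnf install` transaction over the downloaded files are assumptions for illustration, not the PR's actual code; the package list and the `rpms/` key prefix are taken from the hunk.

```shell
#!/usr/bin/env bash

# Package list copied from the hunk above.
RPMS=(
  "libnvidia-container1-1.16.2-1.x86_64.rpm"
  "nvidia-container-toolkit-base-1.16.2-1.x86_64.rpm"
  "libnvidia-container-tools-1.16.2-1.x86_64.rpm"
  "nvidia-container-toolkit-1.16.2-1.x86_64.rpm"
)

# Hypothetical helper (not from the PR): download every RPM from the
# bucket, then install all of them from local files.
# Usage: fetch_and_install_rpms <bucket> <region> <working-dir>
fetch_and_install_rpms() {
  local bucket="$1" region="$2" workdir="$3"
  local rpm
  local -a local_paths=()

  for rpm in "${RPMS[@]}"; do
    echo "pulling rpm: (${rpm}) from s3 bucket: (${bucket}) in region: (${region})"
    # Stop at the first failed download instead of installing a partial set.
    aws s3 cp --region "${region}" "s3://${bucket}/rpms/${rpm}" "${workdir}/${rpm}" || return 1
    local_paths+=("${workdir}/${rpm}")
  done

  # Assumption: the script runs as root. Installing all files in one
  # transaction lets dnf satisfy the packages' mutual dependencies from
  # the local RPMs, with no remote repository lookup for them.
  dnf install -y "${local_paths[@]}"
}
```

In the build script this would presumably be called as `fetch_and_install_rpms "${BINARY_BUCKET_NAME}" "${BINARY_BUCKET_REGION}" "${WORKING_DIR}"`. Quoting every expansion keeps the loop safe if a bucket name or path ever contains unexpected characters.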
Member

@Issacwww Issacwww Oct 31, 2024

My main concern with this approach is that it would require us to vendor these RPMs to a public S3 bucket for isolated regions; otherwise customers in those regions cannot build the AMI.

Contributor Author

The RPMs needed for the NVIDIA Container Toolkit should already be in the S3 bucket used to build the NVIDIA variant AMIs in isolated regions. This is the same approach we use for building the AL2 GPU AMI. I agree it's not the best approach, but unfortunately I don't see any other way to install the NVIDIA Container Toolkit, because isolated partitions have no internet connectivity.

If nvidia-container-toolkit and its dependencies were vended from the Amazon Linux repo, then yes, you would be correct that this would no longer be needed. It is probably worth another discussion with the Amazon Linux team to see whether they could vend it.

Eventually there will be a project to copy and patch EKS node variant AMIs directly from commercial to isolated partitions, but that is still under research.

@whoix whoix closed this by deleting the head repository Nov 1, 2024
2 participants