Add build scripts for building Nvidia and Neuron AMIs based on AL2023 #1924

Merged Aug 29, 2024 · 8 commits
Changes from 4 commits
8 changes: 7 additions & 1 deletion Makefile
@@ -30,6 +30,12 @@ ifeq ($(enable_fips), true)
AMI_VARIANT := $(AMI_VARIANT)-fips
endif

ifeq ($(os_distro), al2023)
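# On AL2023 builds, append the accelerator vendor (nvidia or neuron) to the AMI variant name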
ifdef accelerator_vendor
AMI_VARIANT := $(AMI_VARIANT)-$(accelerator_vendor)
endif
endif

ami_name ?= $(AMI_VARIANT)-node-$(K8S_VERSION_MINOR)-$(AMI_VERSION)

# ami owner overrides for cn/gov-cloud
@@ -91,7 +97,7 @@ validate: ## Validate packer config

.PHONY: k8s
k8s: validate ## Build default K8s version of EKS Optimized AMI
@echo "Building AMI [os_distro=$(os_distro) kubernetes_version=$(kubernetes_version) arch=$(arch)]"
@echo "Building AMI [os_distro=$(os_distro) kubernetes_version=$(kubernetes_version) arch=$(arch) $(if $(accelerator_vendor),accelerator_vendor=$(accelerator_vendor))]"
$(PACKER_BINARY) build -timestamp-ui -color=false $(PACKER_ARGS) $(PACKER_TEMPLATE_FILE)

# DEPRECATION NOTICE: `make` targets for each Kubernetes minor version will not be added after 1.28
16 changes: 15 additions & 1 deletion README.md
@@ -58,4 +58,18 @@ For security issues or concerns, please do not open an issue or pull request on

## ⚖️ License Summary

This sample code is made available under a modified MIT license. See the LICENSE file.
This sample code is made available under an MIT-0 license. See the LICENSE file.

Although this repository is released under the MIT license, when using Nvidia accelerated AMIs you agree to the NVIDIA Cloud End User License Agreement: https://s3.amazonaws.com/EULA/NVidiaEULAforAWS.pdf.

Although this repository is released under the MIT license, Nvidia accelerated AMIs
use the third party [open-gpu-kernel-modules](https://github.com/NVIDIA/open-gpu-kernel-modules). The open-gpu-kernel-modules project's licensing includes the dual MIT/GPLv2 license.

Although this repository is released under the MIT license, Nvidia accelerated AMIs
use the third party [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit). The nvidia-container-toolkit project's licensing includes the Apache-2.0 license.

Although this repository is released under the MIT license, Neuron accelerated AMIs
use the third party [Neuron Driver](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/runtime/aws-neuronx-dkms/index.html). The Neuron Driver project's licensing includes the GPLv2 license.

Although this repository is released under the MIT license, accelerated AMIs
use the third party [Elastic Fabric Adapter Driver](https://github.com/amzn/amzn-drivers/tree/master/kernel/linux/efa). The Elastic Fabric Adapter Driver project's licensing includes the GPLv2 license.
21 changes: 21 additions & 0 deletions doc/usage/al2023.md
@@ -5,6 +5,7 @@
<!-- template-variable-table-boundary -->
| Variable | Description |
| - | - |
| `accelerator_vendor` | Vendor that provides the GPU or accelerator hardware. Currently we support Neuron and Nvidia. |
| `ami_component_description` | |
| `ami_description` | |
| `ami_name` | |
@@ -28,6 +29,7 @@
| `kubernetes_build_date` | |
| `kubernetes_version` | |
| `launch_block_device_mappings_volume_size` | |
| `nvidia_major_driver_version` | Major version of the Nvidia driver to install; must be available in the upstream Nvidia yum repository. |
| `remote_folder` | Directory path for shell provisioner scripts on the builder instance |
| `runc_version` | |
| `security_group_id` | |
@@ -44,3 +46,22 @@
| `volume_type` | |
| `working_dir` | Directory path for ephemeral resources on the builder instance |
<!-- template-variable-table-boundary -->

## Accelerated images

One can build images that contain the Neuron or Nvidia drivers and runtime configuration. To build a Neuron image, execute:

```
make k8s=1.29 os_distro=al2023 accelerator_vendor=neuron
```

To build an Nvidia image, execute:
```
make k8s=1.29 os_distro=al2023 accelerator_vendor=nvidia
```

One can pass the Nvidia major driver version using the following:
```
make k8s=1.29 os_distro=al2023 accelerator_vendor=nvidia nvidia_major_driver_version=560
```
To see which driver versions are available, one can check the Nvidia AL2023 yum [repository](https://developer.download.nvidia.com/compute/cuda/repos/amzn2023/).
6 changes: 6 additions & 0 deletions nodeadm/internal/containerd/config.go
@@ -37,6 +37,12 @@ func writeContainerdConfig(cfg *api.NodeConfig) error {
if err != nil {
return err
}

err = applyInstanceTypeMixins(cfg, &containerdConfig)
if err != nil {
return err
}

// because the logic in containerd's import merge decides to completely
// overwrite entire sections, we want to implement this merging ourselves.
// see: https://github.com/containerd/containerd/blob/a91b05d99ceac46329be06eb43f7ae10b89aad45/cmd/containerd/server/config/config.go#L407-L431
78 changes: 78 additions & 0 deletions nodeadm/internal/containerd/runtime_config.go
@@ -0,0 +1,78 @@
package containerd

import (
"slices"
"strings"

"github.com/awslabs/amazon-eks-ami/nodeadm/internal/api"
"github.com/awslabs/amazon-eks-ami/nodeadm/internal/util"
"github.com/pelletier/go-toml/v2"
"go.uber.org/zap"
)

type instanceTypeMixin struct {
instanceFamilies []string
apply func(*[]byte) error
}

// matches reports whether the node's instance family (e.g. "p5" for "p5.48xlarge") is in the mixin's instanceFamilies list
func (m *instanceTypeMixin) matches(cfg *api.NodeConfig) bool {
instanceFamily := strings.Split(cfg.Status.Instance.Type, ".")[0]
return slices.Contains(m.instanceFamilies, instanceFamily)
}

var (
// TODO: fetch this list dynamically
nvidiaInstances = []string{"p3", "p3dn", "p4d", "p4de", "p5", "g4", "g4dn", "g5", "g6"}
NvidiaInstanceTypeMixin = instanceTypeMixin{
instanceFamilies: nvidiaInstances,
apply: applyNvidia,
}

mixins = []instanceTypeMixin{
NvidiaInstanceTypeMixin,
}
)

// applyInstanceTypeMixins adds the needed OCI hook options to containerd config.toml
// based on the instance family
func applyInstanceTypeMixins(cfg *api.NodeConfig, containerdConfig *[]byte) error {
for _, mixin := range mixins {
if mixin.matches(cfg) {
if err := mixin.apply(containerdConfig); err != nil {
return err
}
return nil
}
}
zap.L().Info("No containerd OCI configuration needed..", zap.String("instanceType", cfg.Status.Instance.Type))
return nil
}

// applyNvidia adds the needed Nvidia containerd options
func applyNvidia(containerdConfig *[]byte) error {
zap.L().Info("Configuring Nvidia OCI hook..")
nvidiaOptions := `
[plugins.'io.containerd.grpc.v1.cri'.containerd]
default_runtime_name = 'nvidia'
discard_unpacked_layers = true
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
base_runtime_spec = "/etc/containerd/base-runtime-spec.json"
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
BinaryName = "/usr/bin/nvidia-container-runtime"
SystemdCgroup = true
`
Review thread:

Member:
1. If this is a constant, let's move it out of this func.
2. If you're having to include keys like SystemdCgroup = true, etc. so they don't get lost during the merge, you should be able to just swap the order of the args you're passing to util.Merge.

Contributor Author:
1. Moved out of the function and declared as a constant.
2. I'm not sure how swapping will help. IIUC, [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia] and [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options] are not present in config.template.toml, so if we want them to be present in the resulting config, we'll have to pass everything we need as is (SystemdCgroup, BinaryName, etc.).

Member:
I see what you mean. That's really not ideal; we don't want to have to keep this blob in sync with what the rest of the config flow does to these nested structs. Does nvidia-ctk clone your existing runtime config and just change the name? How does this work there?

Contributor Author:
nvidia-ctk uses /etc/containerd/config.toml as a base and does an in-place change (adding whatever it needs to the base). This is why initially I was writing the config file first, calling nvidia-ctk, and then passing the resulting config back. Let me see what happens if we call it on an empty file and merge with our template.

Member:
I'm wondering how nvidia-ctk handles options like SystemdCgroup, base_runtime_spec, etc. It must be copying our existing runc runtime config and just plugging in nvidia and BinaryName.

Could you just update our existing containerd config template such that you can swap in nvidia? https://github.com/awslabs/amazon-eks-ami/blob/main/nodeadm/internal/containerd/config.template.toml

Contributor Author:
I think I addressed that, check the latest code.


if containerdConfig != nil {
containerdConfigMap, err := util.Merge(*containerdConfig, []byte(nvidiaOptions), toml.Marshal, toml.Unmarshal)
if err != nil {
return err
}
*containerdConfig, err = toml.Marshal(containerdConfigMap)
if err != nil {
return err
}
}

return nil
}
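
The review thread above turns on how util.Merge combines the rendered containerd config with the nvidiaOptions overlay. Below is a minimal, hypothetical sketch of the deep-merge behaviour that call is assumed to provide; the deepMerge helper and the sample TOML are illustrative only, not the repository's actual util.Merge implementation. Nested tables are merged key by key rather than replaced wholesale, which is exactly what containerd's own import merging does not do.

```go
// Hypothetical sketch: a recursive TOML map merge in the spirit of util.Merge.
// Overlay tables are merged into the base key by key; scalar values in the
// overlay win. Illustrative only, not the repository's implementation.
package main

import (
	"fmt"

	"github.com/pelletier/go-toml/v2"
)

// deepMerge merges overlay into base recursively and returns base.
func deepMerge(base, overlay map[string]interface{}) map[string]interface{} {
	for k, v := range overlay {
		if ov, ok := v.(map[string]interface{}); ok {
			if bv, ok := base[k].(map[string]interface{}); ok {
				base[k] = deepMerge(bv, ov)
				continue
			}
		}
		base[k] = v
	}
	return base
}

func main() {
	baseTOML := []byte(`
[plugins.'io.containerd.grpc.v1.cri'.containerd]
default_runtime_name = 'runc'
discard_unpacked_layers = false
`)
	overlayTOML := []byte(`
[plugins.'io.containerd.grpc.v1.cri'.containerd]
default_runtime_name = 'nvidia'
`)

	var base, overlay map[string]interface{}
	if err := toml.Unmarshal(baseTOML, &base); err != nil {
		panic(err)
	}
	if err := toml.Unmarshal(overlayTOML, &overlay); err != nil {
		panic(err)
	}

	merged, err := toml.Marshal(deepMerge(base, overlay))
	if err != nil {
		panic(err)
	}
	// default_runtime_name is now 'nvidia'; discard_unpacked_layers is preserved.
	fmt.Println(string(merged))
}
```

Because only keys present in the overlay are written, sections that already exist in the base keep their other settings, while sections absent from the template (such as the nvidia runtime) have to be supplied in full — the trade-off the reviewer and author discuss above.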
70 changes: 70 additions & 0 deletions nodeadm/internal/containerd/runtime_config_test.go
@@ -0,0 +1,70 @@
package containerd

import (
"reflect"
"testing"

"github.com/awslabs/amazon-eks-ami/nodeadm/internal/api"
)

func TestApplyInstanceTypeMixins(t *testing.T) {

var nvidiaExpectedOutput = []byte(`version = 2

[grpc]
address = '/run/foo/foo.sock'

[plugins]
[plugins.'io.containerd.grpc.v1.cri']
[plugins.'io.containerd.grpc.v1.cri'.containerd]
default_runtime_name = 'nvidia'
discard_unpacked_layers = true

[plugins.'io.containerd.grpc.v1.cri'.containerd.runtimes]
[plugins.'io.containerd.grpc.v1.cri'.containerd.runtimes.nvidia]
base_runtime_spec = '/etc/containerd/base-runtime-spec.json'
runtime_type = 'io.containerd.runc.v2'

[plugins.'io.containerd.grpc.v1.cri'.containerd.runtimes.nvidia.options]
BinaryName = '/usr/bin/nvidia-container-runtime'
SystemdCgroup = true
`)

var nonAcceleratedExpectedOutput = []byte(`
version = 2
[grpc]
address = '/run/foo/foo.sock'
`)
var tests = []struct {
name string
instanceType string
expectedOutput []byte
expectedError error
}{
{instanceType: "p5.xlarge", expectedOutput: nvidiaExpectedOutput, expectedError: nil},
// non accelerated instance
{instanceType: "m5.xlarge", expectedOutput: nonAcceleratedExpectedOutput, expectedError: nil},
}
for _, test := range tests {
var mockConfig = []byte(`
version = 2
[grpc]
address = '/run/foo/foo.sock'
`)
err := applyInstanceTypeMixins(&api.NodeConfig{
Status: api.NodeConfigStatus{
Instance: api.InstanceDetails{
Type: test.instanceType,
},
},
}, &mockConfig)

if err != test.expectedError {
t.Fatalf("unexpected error: %v", err)
}

if !reflect.DeepEqual(mockConfig, test.expectedOutput) {
t.Fatalf("unexpected output: %s, expecting: %s", mockConfig, test.expectedOutput)
}
}
}
@@ -0,0 +1,18 @@
---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
cluster:
name: my-cluster
apiServerEndpoint: https://example.com
certificateAuthority: Y2VydGlmaWNhdGVBdXRob3JpdHk=
cidr: 10.100.0.0/16
containerd:
config: |
version = 2
[grpc]
address = "/run/foo/foo.sock"
[plugins."io.containerd.grpc.v1.cri".containerd]
discard_unpacked_layers = false
@@ -0,0 +1,29 @@
root = '/var/lib/containerd'
state = '/run/containerd'
version = 2

[grpc]
address = '/run/foo/foo.sock'

[plugins]
[plugins.'io.containerd.grpc.v1.cri']
sandbox_image = '602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.5'

[plugins.'io.containerd.grpc.v1.cri'.cni]
bin_dir = '/opt/cni/bin'
conf_dir = '/etc/cni/net.d'

[plugins.'io.containerd.grpc.v1.cri'.containerd]
default_runtime_name = 'runc'
discard_unpacked_layers = false

[plugins.'io.containerd.grpc.v1.cri'.containerd.runtimes]
[plugins.'io.containerd.grpc.v1.cri'.containerd.runtimes.runc]
base_runtime_spec = '/etc/containerd/base-runtime-spec.json'
runtime_type = 'io.containerd.runc.v2'

[plugins.'io.containerd.grpc.v1.cri'.containerd.runtimes.runc.options]
SystemdCgroup = true

[plugins.'io.containerd.grpc.v1.cri'.registry]
config_path = '/etc/containerd/certs.d:/etc/docker/certs.d'
15 changes: 15 additions & 0 deletions nodeadm/test/e2e/cases/containerd-runtime-config-neuron/run.sh
@@ -0,0 +1,15 @@
#!/usr/bin/env bash

set -o errexit
set -o nounset
set -o pipefail

source /helpers.sh

# use the inf1 (Neuron) instance metadata mock config
mock::aws /etc/aemm-inf1-config.json
mock::kubelet 1.27.0
wait::dbus-ready

nodeadm init --skip run --config-source file://config.yaml

assert::files-equal /etc/containerd/config.toml expected-containerd-config.toml
@@ -0,0 +1,18 @@
---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
cluster:
name: my-cluster
apiServerEndpoint: https://example.com
certificateAuthority: Y2VydGlmaWNhdGVBdXRob3JpdHk=
cidr: 10.100.0.0/16
containerd:
config: |
version = 2
[grpc]
address = "/run/foo/foo.sock"
[plugins."io.containerd.grpc.v1.cri".containerd]
discard_unpacked_layers = false
@@ -0,0 +1,37 @@
root = '/var/lib/containerd'
state = '/run/containerd'
version = 2

[grpc]
address = '/run/foo/foo.sock'

[plugins]
[plugins.'io.containerd.grpc.v1.cri']
sandbox_image = '602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.5'

[plugins.'io.containerd.grpc.v1.cri'.cni]
bin_dir = '/opt/cni/bin'
conf_dir = '/etc/cni/net.d'

[plugins.'io.containerd.grpc.v1.cri'.containerd]
default_runtime_name = 'nvidia'
discard_unpacked_layers = false

[plugins.'io.containerd.grpc.v1.cri'.containerd.runtimes]
[plugins.'io.containerd.grpc.v1.cri'.containerd.runtimes.nvidia]
base_runtime_spec = '/etc/containerd/base-runtime-spec.json'
runtime_type = 'io.containerd.runc.v2'

[plugins.'io.containerd.grpc.v1.cri'.containerd.runtimes.nvidia.options]
BinaryName = '/usr/bin/nvidia-container-runtime'
SystemdCgroup = true

[plugins.'io.containerd.grpc.v1.cri'.containerd.runtimes.runc]
base_runtime_spec = '/etc/containerd/base-runtime-spec.json'
runtime_type = 'io.containerd.runc.v2'

[plugins.'io.containerd.grpc.v1.cri'.containerd.runtimes.runc.options]
SystemdCgroup = true

[plugins.'io.containerd.grpc.v1.cri'.registry]
config_path = '/etc/containerd/certs.d:/etc/docker/certs.d'
15 changes: 15 additions & 0 deletions nodeadm/test/e2e/cases/containerd-runtime-config-nvidia/run.sh
@@ -0,0 +1,15 @@
#!/usr/bin/env bash

set -o errexit
set -o nounset
set -o pipefail

source /helpers.sh

# use the g5 (Nvidia GPU) instance metadata mock config
mock::aws /etc/aemm-g5-config.json
mock::kubelet 1.27.0
wait::dbus-ready

nodeadm init --skip run --config-source file://config.yaml

assert::files-equal /etc/containerd/config.toml expected-containerd-config.toml
3 changes: 3 additions & 0 deletions nodeadm/test/e2e/infra/Dockerfile
@@ -34,6 +34,9 @@ COPY --from=imds-mock-build /imds-mock /usr/local/bin/imds-mock
# certificateAuthority: Y2VydGlmaWNhdGVBdXRob3JpdHk=
# cidr: 10.100.0.0/16
COPY test/e2e/infra/aemm-default-config.json /etc/aemm-default-config.json
COPY test/e2e/infra/aemm-inf1-config.json /etc/aemm-inf1-config.json
COPY test/e2e/infra/aemm-g5-config.json /etc/aemm-g5-config.json
COPY test/e2e/infra/nvidia-ctk /usr/bin/nvidia-ctk
COPY --from=nodeadm-build /nodeadm /usr/local/bin/nodeadm
COPY test/e2e/infra/systemd/kubelet.service /usr/lib/systemd/system/kubelet.service
COPY test/e2e/infra/systemd/containerd.service /usr/lib/systemd/system/containerd.service