Docker Container
Typical software stack
My Code
Tensorflow, PyTorch, Frameworks + Library Dependencies
Python
CPU ML libraries
Hardware Accelerator
AI accelerator ML libraries
AI accelerator drivers
OS
AI accelerator drivers (with matching versions)
OS Kernel
Host OS
Heterogeneous Hardware
Duplicating drivers = bloated VMs and containers
Hardware driver versions must match
Not portable (defeating the whole point of containers) and difficult to scale
Very brittle solution
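The version-matching constraint can be sketched as a quick host-side check. The `/proc/driver/nvidia/version` path and the `nvidia-smi` query are specific to the NVIDIA stack and only exist where that driver is installed, so they are shown as comments; the comparison helper itself is plain shell.

```shell
# versions_match: succeed only when two non-empty version strings are identical.
versions_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# On a host with the NVIDIA driver installed, feed it live values:
#   kernel_ver=$(sed -n 's/.*Kernel Module  *\([0-9.]*\).*/\1/p' /proc/driver/nvidia/version)
#   user_ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)
#   versions_match "$kernel_ver" "$user_ver" || echo "driver/library mismatch" >&2
versions_match "570.86.15" "570.86.15" && echo "match"   # prints "match"
```

A mismatch between the kernel module and the user-space library is exactly the brittleness described above: baking either half into the image breaks as soon as the host driver changes.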
runc/libcontainer/process_linux.go
func (p *initProcess) start() (retErr error) {
	ierr := parseSync(p.comm.syncSockParent, func(sync *syncT) error {
		switch sync.Type {
		case procHooks:
			if p.config.Config.HasHook(configs.Prestart, configs.CreateRuntime) {
				if err := hooks.Run(configs.Prestart, s); err != nil {
					return err
				}
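The runc excerpt shows where prestart/createRuntime hooks fire during container start. This is the mechanism the NVIDIA runtime relies on: it injects a prestart hook into the container's OCI spec and then delegates to runc, and the hook mounts the host driver files and device nodes into the container. A sketch of what such a hook entry looks like in the generated OCI `config.json` (the exact hook arguments vary by nvidia-container-toolkit version):

```json
{
  "hooks": {
    "prestart": [
      {
        "path": "/usr/bin/nvidia-container-runtime-hook",
        "args": ["nvidia-container-runtime-hook", "prestart"]
      }
    ]
  }
}
```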
/etc/docker/daemon.json
/etc/nvidia-container-runtime/config.toml
{
"runtimes" : {
"nvidia" : {
"args" : [],
"path" : " nvidia-container-runtime"
}
}
}
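Assuming the snippet above lives in `/etc/docker/daemon.json`, a small sanity check catches the most common mistakes (invalid JSON, stray whitespace in `path`) before restarting the daemon. `python3` is used here only as a portable JSON parser; the `systemctl`/`docker info` lines are commented because they require a live Docker host.

```shell
# check_daemon_json FILE: verify the nvidia runtime entry is well formed.
check_daemon_json() {
  python3 - "$1" <<'EOF'
import json, sys
cfg = json.load(open(sys.argv[1]))
rt = cfg["runtimes"]["nvidia"]
assert rt["path"] == rt["path"].strip(), "stray whitespace in path"
print("nvidia runtime ->", rt["path"])
EOF
}

# On a Docker host:
#   check_daemon_json /etc/docker/daemon.json \
#     && sudo systemctl restart docker \
#     && docker info | grep -A3 Runtimes
```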
docker run --rm -it --gpus all nvcr.io/nvidia/tritonserver:25.01-py3 bash
ls -Fl /dev | grep nvidia
crw-rw-rw- 1 root root 511, 0 Mar 3 03:09 nvidia-uvm
crw-rw-rw- 1 root root 511, 1 Mar 3 03:09 nvidia-uvm-tools
crw-rw-rw- 1 root root 195, 0 Mar 3 03:08 nvidia0
crw-rw-rw- 1 root root 195, 255 Mar 3 03:08 nvidiactl
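The listing shows character devices (the leading `c`) that the hook made available from the host: major 195 is NVIDIA's registered character-device major, while the `nvidia-uvm` major (511 here) is assigned dynamically. A minimal presence check (device paths assume the NVIDIA hook ran; the helper itself is plain shell):

```shell
# is_chardev PATH: succeed when PATH exists and is a character device.
is_chardev() { [ -c "$1" ]; }

# Inside a GPU container you would expect all of these to pass:
#   for d in /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm; do
#     is_chardev "$d" && echo "$d ok" || echo "$d missing" >&2
#   done
```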
nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.15 Driver Version: 570.86.15 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3070 Off | 00000000:2B:00.0 On | N/A |
| 0% 50C P3 49W / 270W | 1256MiB / 8192MiB | 21% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+