Add Ubuntu instructions #1

Status: Open, 1 commit to merge into base `master`.
2 changes: 1 addition & 1 deletion README.md
@@ -4,7 +4,7 @@ As part of the [spearow/juice](https://github.com/spearow/juice) efforts, it bec
cuda™/cudnn access and eventually also rocm and OpenCL™ support from within the container without granting
excessive privileges that would allow to remount the device tree.

All instructions here are for [`Fedora 32` / `Fedora 33`](https://getfedora.org).
All instructions here are for [`Fedora 32` / `Fedora 33`](https://getfedora.org). For Ubuntu-specific instructions, see [ubuntu/README.md](ubuntu/README.md).

Assumes concourse is unpacked under `/usr/local`, such that `/usr/local/concourse/bin/{gdn,concourse}` exist.

File renamed without changes.
File renamed without changes.
25 changes: 25 additions & 0 deletions ubuntu/README.md
@@ -0,0 +1,25 @@
# Ubuntu 20.04

This is a variation on the Fedora-style configuration in ../README.md; it explains how I set up a Concourse worker with GPU acceleration on Ubuntu 20.04.

## nvidia drivers

Run `sudo ubuntu-drivers devices` to list the available nvidia drivers and install the one marked `recommended`, or simply run `sudo ubuntu-drivers autoinstall` if you're feeling lucky.
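
Before moving on, it may help to confirm that the driver is actually usable. The following is a minimal sketch, not part of the original setup: the function name `check_nvidia_driver` is hypothetical, and it assumes `nvidia-smi` was installed alongside the driver package.

```shell
# Hypothetical helper: report whether the nvidia driver tooling is usable.
check_nvidia_driver() {
    # Return 1 when the nvidia-smi binary is not on PATH at all.
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; the driver package may not be installed" >&2
        return 1
    fi
    # nvidia-smi exits non-zero when it cannot talk to the kernel driver.
    if ! nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi cannot reach the driver (is a reboot pending?)" >&2
        return 2
    fi
    echo "nvidia driver OK"
}
```

Calling `check_nvidia_driver` after a reboot gives a quick signal on whether the runtime steps that follow have a working driver to build on.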

## nvidia runtime

Per the [docs](https://nvidia.github.io/nvidia-container-runtime/), install the [nvidia-container-runtime](https://github.com/NVIDIA/nvidia-container-runtime):

```
# Trust the repository signing key.
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
  sudo apt-key add -
# Build the distribution string (ID followed by VERSION_ID) that selects the package list.
distribution=$(. /etc/os-release; echo "$ID$VERSION_ID")
curl -s -L "https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list" | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
# Refresh the package index and install the runtime.
sudo apt-get update
sudo apt-get install nvidia-container-runtime
```
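
The `distribution` line in the block above is the least obvious step, so here is a small sketch of what it computes. The helper name `distribution_from` is hypothetical; it takes the os-release file as a parameter so it can be tried against a scratch file as well as `/etc/os-release`.

```shell
# Hypothetical helper: build the "$ID$VERSION_ID" string from an os-release file.
distribution_from() {
    # Source the file in a subshell so ID and VERSION_ID do not leak into the caller.
    ( . "$1" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}
```

On Ubuntu 20.04, `distribution_from /etc/os-release` prints `ubuntu20.04`, which is the path segment used in the repository list URL.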

## config

Copy the contents of the `etc` folder to `/etc` and customize to suit. The main difference from the Fedora instructions is that containerd on Ubuntu seems to ignore the `/etc/containers` directory, so the nvidia runtime has to be specified in `/etc/containerd/config.toml`.
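
The copy step might look like the sketch below. The helper name `install_etc` is hypothetical, and taking the destination as a parameter (so the copy can be rehearsed against a scratch directory before touching `/etc`) is an assumption, not part of the original instructions.

```shell
# Hypothetical helper: copy the repository's etc/ tree into a destination root.
# Usage: install_etc ubuntu/etc /etc
install_etc() {
    src="$1"
    dest="$2"
    # Refuse to continue when the source tree is missing.
    [ -d "$src" ] || { echo "source $src is not a directory" >&2; return 1; }
    mkdir -p "$dest"
    # The trailing "/." copies the directory contents rather than the directory itself.
    cp -R "$src/." "$dest/"
}
```

Against the real system this would need to run as root, followed by `sudo systemctl daemon-reload` so systemd picks up the new unit files, and then a start of the containerd, garden, and worker units.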
17 changes: 17 additions & 0 deletions ubuntu/etc/concourse/garden.ini
@@ -0,0 +1,17 @@
[server]
# avoid running out of ip addresses
# use a bigger subnet than the default
network-pool = 172.16.0.0/16
# local first
dns-server = 127.0.0.1
# avoid dns resolution failures
dns-server = 1.1.1.1
dns-server = 9.9.9.9

# failed attempts, ignore
#runtime-plugin = runc
#runtime-plugin-extra-arg = --debug
#runtime-plugin = /usr/bin/nvidia-container-runtime

# avoid running out of file descriptors
cleanup-process-dirs-on-wait = true
19 changes: 19 additions & 0 deletions ubuntu/etc/containerd/config.toml
@@ -0,0 +1,19 @@
root = "/media/cachy/containerd"
state = "/run/containerd"
#subreaper = true
#oom_score = 0

[grpc]
address = "/run/containerd/containerd.sock"
uid = 0
gid = 0

[debug]
# address = "/run/containerd/debug.sock"
# uid = 0
# gid = 0
level = "debug"

[plugins]
[plugins.linux]
runtime = "nvidia-container-runtime"
40 changes: 40 additions & 0 deletions ubuntu/etc/systemd/system/[email protected]
@@ -0,0 +1,40 @@
[Unit]
Description=concourse worker %i
After=suspend.target
After=hibernate.target
After=hybrid-sleep.target
After=network.target
Requires=network.target
After=garden.service
Requires=garden.service
RequiresMountsFor=/media/cachy
ConditionPathIsDirectory=/media/cachy/concourse

[Service]
Type=simple
Restart=always

Environment=CONCOURSE_KEY_DIR=/etc/concourse/keys/worker
Environment=CONCOURSE_WORK_DIR=/media/cachy/concourse
#Environment=CONCOURSE_ENABLE_LIDAR=true

ExecStartPre=-/usr/bin/mkdir ${CONCOURSE_WORK_DIR}
ExecStartPre=-/usr/local/concourse/bin/concourse --version
ExecStart=/usr/local/concourse/bin/concourse \
worker \
--name=%i \
--work-dir=${CONCOURSE_WORK_DIR} \
--tsa-host=ci.example.com:1111111 \
--tsa-worker-private-key=${CONCOURSE_KEY_DIR}/%i \
--tsa-public-key=${CONCOURSE_KEY_DIR}/tsa_host_key.pub \
--external-garden-url=http://localhost:7777/

RestartSec=5
RestartKillSignal=SIGUSR1
KillMode=process
KillSignal=SIGUSR2
TimeoutStopSec=180

[Install]
WantedBy=multi-user.target
12 changes: 12 additions & 0 deletions ubuntu/etc/systemd/system/containerd.service
@@ -0,0 +1,12 @@
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target

[Service]
ExecStart=/usr/bin/containerd
Delegate=yes
KillMode=process

[Install]
WantedBy=multi-user.target
37 changes: 37 additions & 0 deletions ubuntu/etc/systemd/system/garden.service
@@ -0,0 +1,37 @@
[Unit]
Description=garden container management

After=suspend.target
After=hibernate.target
After=hybrid-sleep.target
After=network.target
Requires=network.target
After=containerd.service
Requires=containerd.service
RequiresMountsFor=/media/cachy
ConditionPathIsDirectory=/media/cachy/concourse

[Service]
Type=simple
Restart=always
LimitNOFILE=50000
TasksMax=50000
User=root
Group=root
#ExecStartPre=-btrfschk --check --repair --backup /dev/yadayada
ExecStartPre=-/usr/local/concourse/bin/gdn -v
ExecStart=/usr/local/concourse/bin/gdn \
--config /etc/concourse/garden.ini \
server \
--use-containerd-for-processes \
--containerd-socket=/run/containerd/containerd.sock \
--log-level=info \
--bind-ip 127.0.0.1 \
--bind-port 7777 \
--depot /media/cachy/concourse/depot \
--properties-path /media/cachy/concourse/garden-properties.json \
--time-format rfc3339 \
--no-image-plugin

RestartSec=3
TimeoutStopSec=120