Merge pull request #322 from csc-training/10-containers-dev
Course development: Topic 10 - Containers and Apptainer
amsaren authored May 10, 2024
2 parents e3b5591 + ede2bbd commit 94c3208
Showing 3 changed files with 76 additions and 76 deletions.
63 changes: 37 additions & 26 deletions _slides/09_singularity.md
@@ -27,7 +27,7 @@ Unported License, [http://creativecommons.org/licenses/by-sa/4.0/](http://creati
# Containers

- Containers are a way of packaging software and their dependencies (libraries, etc.)
- Popular container engines include Docker, Apptainer (previously called Singularity), Shifter
- Popular container engines include Docker, Apptainer (previously called Singularity), Shifter, Podman, etc.
- Apptainer is most popular in HPC environments

# Containers vs. virtual machines (1/2)
@@ -43,12 +43,11 @@ Unported License, [http://creativecommons.org/licenses/by-sa/4.0/](http://creati
# Benefits of containers: Ease of installation

- Containers are becoming a popular way of distributing software
- A single-command installation
- A single-command installation from an existing image
- More portable since all dependencies are included
- Normal user rights are enough when using an existing container
- Root access on build system is enough
- Root access, package managers (yum, apt, etc.) can be utilized even when not available on the target system.
- Makes installing libraries easier
- Limited root privileges inside the container if the build system supports it
- Package managers (yum, apt, etc.) can be utilized even when not available on the target system.
- Some containers need full root access to build

# Benefits of containers: Environment isolation

@@ -66,12 +65,12 @@ Unported License, [http://creativecommons.org/licenses/by-sa/4.0/](http://creati
# Apptainer in a nutshell

- Containers can be run with user-level rights
- But: building new containers requires root access
- But: building new containers requires root access or support for the `--fakeroot` option
- Minimal performance overhead
- Supports MPI
- Requires containers tailored to the host system
- Can use host driver stack (Nvidia/CUDA)
- Add the option `--nv`
- Add option `--nv`
- Can import and run Docker containers
- Running Docker directly would require root privileges

@@ -119,7 +118,7 @@ export SING_IMAGE=/path/to/container.sif
apptainer_wrapper exec myprog <options>
```

- Since `$SING_IMAGE` is set, the image file name is not needed in the `apptainer_wrapper` command
- Additional options can be set with variable `$SING_FLAGS`, e.g. `export SING_FLAGS=--nv`
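
The two variables can be combined as in the following minimal sketch. It assumes the CSC-provided `apptainer_wrapper` is on the `PATH`; the image path and the `nvidia-smi` command are placeholders, not part of the original slides.

```bash
# Placeholder image path; replace with a real .sif file.
export SING_IMAGE=/path/to/container.sif
# Extra options for the wrapper, here enabling the host Nvidia driver stack.
export SING_FLAGS="--nv"

# Only run if both the wrapper and the image are actually available.
if command -v apptainer_wrapper >/dev/null 2>&1 && [ -f "$SING_IMAGE" ]; then
    # nvidia-smi is an assumed test command inside the container.
    apptainer_wrapper exec nvidia-smi
else
    echo "apptainer_wrapper or $SING_IMAGE not available; nothing was run" >&2
fi
```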

# Using Docker containers with Apptainer

@@ -139,31 +138,43 @@ apptainer_wrapper exec myprog <options>
- Complex installations with many dependencies/files
- Obsolete dependencies incompatible with the native environment
- Still needs to be kernel-compatible
- Should be considered even when other methods exist
- Image is a single file

# Just a random example (FASTX-toolkit)

- Tested installation methods:
- Native: 47 files, total size 1.9 MB
- Needed changes to source code to compile
- Conda: 27464 files, total size 1.1 GB
- Apptainer: 1 file, total size 339 MB
- Containers are not the solution for everything, but they do have their uses
- Especially Conda environments should always be containerized to avoid file system issues (see [Tykky](https://docs.csc.fi/computing/containers/tykky/))

# Building a new Apptainer container (1/2)
# Methods of building a new Apptainer container

- ‼️ Requires root access: Can not be done directly on, e.g., Puhti
- Building using [Tykky](https://docs.csc.fi/computing/containers/tykky/)
- Building from a definition (aka recipe) file
- Building in "sandbox" mode

- 1. Build a basic container in sandbox mode (`--sandbox`)
- Uses a folder structure instead of an image file
- 2. Open a shell in the container and install the software
- Depending on the base image system, package managers can be used to install libraries and dependencies (`apt install`, `yum install`, etc.)
- Installation following the instructions of the software developer

# Building a new Apptainer container (2/2)
# Building using Tykky

- Especially suited for Conda environments
- Can take an environment YAML file as an input
- Can be used for any application type
- Use `--post-install <file>` to run the installation commands
- See [example](https://github.com/CSCfi/hpc-container-wrapper/blob/master/examples/fftw.md)

# Building using a definition file

- Provides transparency
- Everybody can see what commands were used to build the container
- Definition files are reusable
- Updating the software typically only requires minor changes to the file
- Can be a bit cumbersome if you have to try many things (e.g. installing missing libraries)

# Building using sandbox mode

- 3. Build a production image from the sandbox
- 4. (optional) Make a definition file and build a production image from it
- Mostly necessary if you wish to distribute your container
- Also helps with updating and reusing containers
- The production image can be transferred to, e.g., Puhti and run with user-level rights
- Container created as a directory structure instead of an image file
- Installation done interactively
- Easier to test different options
- A production image needs to be built for general use
- Resulting image is a "black box"
- No record left of installation commands used
48 changes: 36 additions & 12 deletions part-2/containers/creating-containers.md
@@ -11,10 +11,12 @@ permalink: /hands-on/singularity/singularity_extra_creating-containers.html

# Creating Apptainer containers

This is an extra exercise which can not be run on Puhti. You will need access to a computer or virtual machine where you have root privileges and that has Apptainer (v1.1.x) installed.

In this tutorial we will create an Apptainer container and install the same software as we installed in the tutorial ["Installing a simple C code from source"](https://csc-training.github.io/csc-env-eff/hands-on/installing/installing_hands-on_mcl.html). Feel free to revisit that tutorial for more information on the installation commands.

CSC supercomputers support the `fakeroot` feature of Apptainer, so it is possible to build
container images without root privileges. There are some limitations, so you may run into problems, especially when using package managers. In these cases it is necessary to either use an
alternate installation method for the dependency, or build on a system where you do have root privileges.

We will only cover Apptainer basics here. Detailed instructions can be found in the [official Apptainer documentation](https://apptainer.org/docs/user/latest/quick_start.html).

## Sandbox mode
@@ -34,25 +36,41 @@ MirrorURL: http://mirror.centos.org/centos-%{OSVERSION}/%{OSVERSION}/os/$basearc
Include: yum
```

By default, Apptainer uses the home directory for cached files. As the home directory is quite
small and easily fills up, it is recommended to use some other directory. For example, to use
`$TMPDIR` (make sure it is defined), set:

```bash
export APPTAINER_CACHEDIR=$TMPDIR
```
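
If you are unsure whether `$TMPDIR` is defined, a slightly more defensive variant is sketched below. The `/tmp` fallback and the per-user subdirectory name are assumptions made for illustration, not CSC recommendations.

```bash
# Use $TMPDIR if it is set and non-empty; otherwise fall back to /tmp (assumed fallback).
cache_root="${TMPDIR:-/tmp}"

# Keep the cache in a per-user subdirectory (hypothetical naming scheme).
export APPTAINER_CACHEDIR="${cache_root}/apptainer-cache-$(id -un)"

# Make sure the directory exists before Apptainer tries to use it.
mkdir -p "$APPTAINER_CACHEDIR"
echo "Apptainer cache directory: $APPTAINER_CACHEDIR"
```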

You can clean the cache with the command:

```bash
apptainer cache clean
```

Using this definition file, build the container:

```bash
sudo apptainer build --sandbox mcl centos.def
apptainer build --fakeroot --sandbox mcl centos.def
```

Note that instead of an image file, we created a directory called `mcl`. If you need to include some reference files etc., you can copy them to the correct subdirectory.

We can now open a shell in the container. We need the container file system to be writable, so we include the option `--writable`:
We can now open a shell in the container. We need the container file system to be writable, so we include the option `--writable`. We will also need to include `--fakeroot`:

```bash
sudo apptainer shell --writable mcl
apptainer shell --fakeroot --writable mcl
```

The command prompt should now be `Apptainer>`

If there is a need to make the container as small as possible, we should only install the dependencies we need. Usually the size is not that critical, so we may opt for ease of use.
The base container images are typically very barebones and do not contain any compilers,
download tools, etc., so those need to be installed. If there is a need to make the container as small as possible, we should only install the dependencies we need. Usually the size is not that critical, so we may opt for ease of use.

In this case we will install the application group "Development Tools" that includes most of the components we need (C, C++, make), but also a lot of currently less important tools.
In this case we will install the application group "Development Tools" that includes most of the components we need (C, C++, make), but also a lot of tools not needed in this example. We also
install `wget` to download the source code.

Notice that unlike on CSC supercomputers, we are able to use package management tools (in this case `yum`). This will often make installing libraries and other dependencies easier. Also notice that it is not necessary to use `sudo` inside the container.

@@ -107,21 +125,21 @@ We can now exit the container:
exit
```

In order to run the container without root privileges, build a production image from the sandbox:
We can then build a production image from the sandbox:

```bash
sudo apptainer build mcl.sif mcl
apptainer build --fakeroot mcl.sif mcl
```

We can now test it. Note that `sudo` is no longer needed:
We can now test it:

```bash
apptainer exec mcl.sif mcl --version
```

## Definition file

The above method is applicable as is if you intend the container to be only used by you and your close collaborators. However, if you plan to distribute it wider, it's best to write a definition file for it. That way the other users can see what is in the container and they can, if they so choose, easily rebuild the production image.
The above method is fine if you intend the container to be used only by you and your close collaborators. However, if you plan to distribute it more widely, it's best to write a definition file for it. That way the other users can see what is in the container, and they can, if they so choose, easily rebuild the production image.

A definition file will also make it easier to modify and reuse the container later. For example, software updates can often be done simply by modifying the version number in the definition file and rebuilding the image.
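
For an image built from a definition file, the recipe is normally stored in the image metadata and can be printed with `apptainer inspect --deffile`. A minimal sketch, assuming an image called `mcl.sif` in the current directory:

```bash
# Image to inspect; mcl.sif is the name used in this tutorial.
image="mcl.sif"

# Only run the inspection if Apptainer and the image are both present.
if command -v apptainer >/dev/null 2>&1 && [ -f "$image" ]; then
    # Print the definition file recorded in the image metadata.
    apptainer inspect --deffile "$image"
else
    echo "apptainer or $image not found; skipping inspection" >&2
fi
```

For an image built from a sandbox, this output will not show the commands that were run interactively, which is why a definition file is preferable for distribution.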

@@ -170,4 +188,10 @@ Include: yum
exec /bin/bash "$@"
```

In more complex cases, it is often helpful to first build the image in the sandbox mode and make note of all the commands needed.
You can now build the image:

```bash
apptainer build --fakeroot mcl.sif mcl.def
```

In more complex cases, it is often helpful to first build the image in the sandbox mode and make note of all the commands needed. You can then write a definition file to replicate the necessary steps.
41 changes: 3 additions & 38 deletions part-2/containers/replicating-conda.md
@@ -11,9 +11,7 @@ permalink: /hands-on/singularity/singularity_extra_replicating-conda.html

# Replicating a Conda environment in a container

This is an extra exercise which can not be run on Puhti. You will need access to a computer or virtual machine where you have root privileges and that has Apptainer installed.

On Puhti, you can use [Tykky](https://docs.csc.fi/computing/containers/tykky/) to easily containerize Conda environments. This method is recommended over the manual procedure detailed in this exercise, which is mainly provided for you to develop your skills in working with containers. For tutorials on using Tykky, see:
On CSC supercomputers you can use [Tykky](https://docs.csc.fi/computing/containers/tykky/) to easily containerize Conda environments. This method is recommended over the manual procedure detailed in this exercise, which is mainly provided for you to develop your skills in working with containers. For tutorials on using Tykky, see:

- [Containerizing a Conda environment with Tykky](https://csc-training.github.io/csc-env-eff/hands-on/installing/installing_hands-on_python.html#example-containerizing-a-conda-environment-with-tykky)
- [Installing packages from Bioconda using Tykky](https://csc-training.github.io/csc-env-eff/hands-on/modules/module-exercise-with-aligners.html#extra-installing-packages-from-bioconda)
@@ -24,7 +22,7 @@ Conda is a useful tool for installing software with complex dependencies. It has

The main problems of Conda environments are related to storage. Conda environments are quite large, containing tens to hundreds of thousands of files. Just 3-4 environments are enough to fill the basic quota of a project's `/projappl` directory. Moreover, many of these files will be accessed each time you launch a program installed with Conda, generating massive I/O load which may degrade the performance of the system for all users.
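
To see how many files an existing environment contains, something like the following sketch can be used. The helper name `count_files` and the `ENV_DIR` variable are hypothetical; point `ENV_DIR` at your own Conda environment directory.

```bash
# Count the regular files under a directory tree.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Directory to examine; defaults to the current directory if ENV_DIR is unset.
env_dir="${ENV_DIR:-.}"

echo "Number of files: $(count_files "$env_dir")"
# Total size on disk in human-readable form.
du -sh "$env_dir"
```

Comparing the result with the file number quota of the project directory gives a quick idea of how many such environments would fit.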

Conda environments can also be somewhat sensitive to changes in the base system, meaning that, e.g., updates on Puhti can sometimes break existing Conda environments, necessitating a re-install.
Conda environments can also be somewhat sensitive to changes in the base system, meaning that, e.g., system updates can sometimes break existing Conda environments, necessitating a re-install.

Using an Apptainer container can help with both problems. A container is just a single file that is typically smaller than the total size of the Conda environment directory. It is also less sensitive to changes in the host system.

@@ -58,7 +56,6 @@ In addition to the `environment.yml` file, you will need an Apptainer definition

```text
Bootstrap: docker
From: continuumio/miniconda3
%files
@@ -82,7 +79,7 @@ From: continuumio/miniconda3
Make sure the files `environment.yml` and `conda_environment.def` are in the current directory and give the command:

```bash
sudo apptainer build fastx.sif conda_environment.def
apptainer build --fakeroot fastx.sif conda_environment.def
```

This will build an Apptainer image file called `fastx.sif`. We can now verify that it works:
@@ -91,35 +88,3 @@ This will build an Apptainer image file called `fastx.sif`. We can now verify th
apptainer exec fastx.sif fastq_to_fasta -h
```

The image file could now be transferred to and used on Puhti.

## Comparison of installation methods

This particular environment was chosen because it is a good "bad example" of the effects different installation methods can have.

The software package is a collection of applications written in C++ with only a few dependencies. Usually, similar packages are best installed natively. In this case, however, the code is quite old, and it will not compile with modern versions of `gcc` without some changes to the source code.

The software is available in the Bioconda repository, so it can also be installed with:

```bash
conda install fastx_toolkit
```

- Good: Can be done with user privileges
- Bad: Using this method, you will end up with a directory with a total size of about 1 GB and over 26000 files. The default file number limit for `/projappl` is 100000 files, so this single installation would already use more than 25 % of that.

Containerizing the Conda environment like we did in this exercise is better:

- Good: We ended up with a single 465 MB file. The default capacity limit of `/projappl` is 50 GB, so this installation would only use less than 1 % of the quota.
- Good: Although containerization as outlined above cannot be done directly on Puhti, you can use Tykky to circumvent the need for root privileges (see the tutorials linked at the top).

In this case there's also another good option – converting a ready-made Docker container:

```bash
apptainer build fastx.sif docker://biocontainers/fastx-toolkit:v0.0.14-6-deb_cv1
```

- Good: This can be done with user-level rights also on Puhti and you'll end up with a single 61 MB file.
- Bad: Finding a ready, working container may take some time.

Containers are not a "silver bullet" solution to all installation problems, but they are nonetheless a much more favorable alternative to direct Conda installations on HPC systems.
