Merge pull request #322 from csc-training/10-containers-dev

Course development: Topic 10 - Containers and Apptainer
csc-training · May 10, 2024 · 94c3208
2 parents e3b5591 + ede2bbd

Showing 3 changed files with 76 additions and 76 deletions.
diff --git a/_slides/09_singularity.md b/_slides/09_singularity.md
@@ -27,7 +27,7 @@ Unported License, [http://creativecommons.org/licenses/by-sa/4.0/](http://creati
 # Containers
 
 - Containers are a way of packaging software and their dependencies (libraries, etc.)
-- Popular container engines include Docker, Apptainer (previously called Singularity), Shifter
+- Popular container engines include Docker, Apptainer (previously called Singularity), Shifter, Podman, etc.
 - Apptainer is most popular in HPC environments
 
 # Containers vs. virtual machines (1/2)
@@ -43,12 +43,11 @@ Unported License, [http://creativecommons.org/licenses/by-sa/4.0/](http://creati
 # Benefits of containers: Ease of installation
 
 - Containers are becoming a popular way of distributing software
-  - A single-command installation
+  - A single-command installation from an existing image
   - More portable since all dependencies are included
-  - Normal user rights are enough when using an existing container
-- Root access on build system is enough
-  - Root access, package managers (yum, apt, etc.) can be utilized even when not available on the target system.
-  - Makes installing libraries easier
+- Limited root privileges inside the container if the build system supports it
+  - Package managers (yum, apt, etc.) can be utilized even when not available on the target system.
+  - Some containers need full root access to build
 
 # Benefits of containers: Environment isolation
 
@@ -66,12 +65,12 @@ Unported License, [http://creativecommons.org/licenses/by-sa/4.0/](http://creati
 # Apptainer in a nutshell
 
 - Containers can be run with user-level rights
-  - But: building new containers requires root access
+  - But: building new containers requires root access or support for the `--fakeroot` option
 - Minimal performance overhead
 - Supports MPI
   - Requires containers tailored to the host system
 - Can use host driver stack (Nvidia/CUDA)
-  - Add the option `--nv`
+  - Add option `--nv`
 - Can import and run Docker containers
   - Running Docker directly would require root privileges
 
@@ -119,7 +118,7 @@ export SING_IMAGE=/path/to/container.sif
 apptainer_wrapper exec myprog <options>
 ```
 
-- Since `$SING_IMAGE` is set, the image file name is not needed in the `apptainer_wrapper` command
+- Additional options can be set with the variable `$SING_FLAGS`, e.g. `export SING_FLAGS=--nv`
 
 # Using Docker containers with Apptainer
 
@@ -139,31 +138,43 @@ apptainer_wrapper exec myprog <options>
   - Complex installations with many dependencies/files
   - Obsolete dependencies incompatible with the native environment
     - Still needs to be kernel-compatible
-- Should be considered even when other methods exist
+  - Image is a single file
 
 # Just a random example (FASTX-toolkit)
 
 - Tested installation methods:
   - Native: 47 files, total size 1.9 MB
+    - Needed changes to source code to compile
   - Conda: 27464 files, total size  1.1 GB
   - Apptainer: 1 file, total size 339 MB
-- Containers are not the solution for everything, but they do have their uses
-  - Especially Conda environments should always be containerized to avoid file system issues (see [Tykky](https://docs.csc.fi/computing/containers/tykky/))
 
-# Building a new Apptainer container (1/2)
+# Methods of building a new Apptainer container
 
-- ‼️ Requires root access: Can not be done directly on, e.g., Puhti
+- Building using [Tykky](https://docs.csc.fi/computing/containers/tykky/)
+- Building from a definition (aka recipe) file
+- Building in "sandbox" mode
 
-- 1. Build a basic container in sandbox mode (`--sandbox`)
-  - Uses a folder structure instead of an image file
-- 2. Open a shell in the container and install the software
-  - Depending on the base image system, package managers can be used to install libraries and dependencies (`apt install`,s `yum install` etc.)
-  - Installation following the instructions of the software developer
-
-# Building a new Apptainer container (2/2)
+# Building using Tykky
+
+- Especially suited for Conda environments
+  - Can take an environment YAML file as an input
+- Can be used for any application type
+  - Use `--post-install <file>` to run the installation commands
+  - See [example](https://github.com/CSCfi/hpc-container-wrapper/blob/master/examples/fftw.md)
+
+# Building using a definition file
+
+- Provides transparency
+  - Everybody can see what commands were used to build the container
+- Definition files are reusable
+  - Updating the software typically only requires minor changes to the file
+- Can be a bit cumbersome if you have to try many things (e.g. installing missing libraries)  
+
+# Building using sandbox mode
 
-- 3. Build a production image from the sandbox
-- 4. (optional) Make a definition file and build a production image from it
-  - Mostly necessary if you wish to distribute your container
-  - Also helps with updating and reusing containers
-- The production image can be transferred to, e.g., Puhti and run with user-level rights
+- Container is created as a directory structure instead of an image file
+- Installation done interactively
+  - Easier to test different options
+- A production image needs to be built for general use
+- Resulting image is a "black box"
+  - No record left of installation commands used
diff --git a/part-2/containers/creating-containers.md b/part-2/containers/creating-containers.md
@@ -11,10 +11,12 @@ permalink: /hands-on/singularity/singularity_extra_creating-containers.html
 
 # Creating Apptainer containers
 
-This is an extra exercise which can not be run on Puhti. You will need access to a computer or virtual machine where you have root privileges and that has Apptainer (v1.1.x) installed.
-
 In this tutorial we will create an Apptainer container and install the same software as we installed in the tutorial ["Installing a simple C code from source"](https://csc-training.github.io/csc-env-eff/hands-on/installing/installing_hands-on_mcl.html). Feel free to revisit that tutorial for more information on the installation commands.
 
+CSC supercomputers support the `fakeroot` feature of Apptainer, so it is possible to build
+container images without root privileges. There are some limitations, so you may run into problems, especially when using package managers. In these cases it is necessary to either use an
+alternate installation method for the dependency, or build on a system where you do have root privileges.
+
 We will only cover Apptainer basics here. Detailed instructions can be found in the [official Apptainer documentation](https://apptainer.org/docs/user/latest/quick_start.html).
 
 ## Sandbox mode
@@ -34,25 +36,41 @@ MirrorURL: http://mirror.centos.org/centos-%{OSVERSION}/%{OSVERSION}/os/$basearc
 Include: yum
 ```
 
+By default, Apptainer uses the home directory for cached files. As the home directory is quite
+small and fills up easily, it is recommended to use some other directory. For example, to use
+`$TMPDIR` (make sure it is defined), set:
+
+```bash
+export APPTAINER_CACHEDIR=$TMPDIR
+```
+
+You can clean the cache with the command:
+
+```bash
+apptainer cache clean
+```
+
 Using this definition file, build the container:
 
 ```bash
-sudo apptainer build --sandbox mcl centos.def
+apptainer build --fakeroot --sandbox mcl centos.def
 ```
 
 Note that instead of an image file, we created a directory called `mcl`. If you need to include some reference files etc., you can copy them to the correct subdirectory.
 
-We can now open a shell in the container. We need the container file system to be writable, so we include the option `--writable`:
+We can now open a shell in the container. We need the container file system to be writable, so we include the option `--writable`. We will also need to include `--fakeroot`:
 
 ```bash
-sudo apptainer shell --writable mcl
+apptainer shell --fakeroot --writable mcl
 ```
 
 The command prompt should now be `Apptainer>`
 
-If there is a need to make the container as small as possible, we should only install the dependencies we need. Usually the size is not that critical, so we may opt for ease of use.
+The base container images are typically very barebones and do not contain any compilers,
+download tools, etc., so those need to be installed. If there is a need to make the container as small as possible, we should only install the dependencies we need. Usually the size is not that critical, so we may opt for ease of use.
 
-In this case we will install the application group "Development Tools" that includes most of the components we need (C, C++, make), but also a lot of currently less important tools.
+In this case we will install the application group "Development Tools" that includes most of the components we need (C, C++, make), but also a lot of tools not needed in this example. We also
+install `wget` to download the source code.
 
 Notice that unlike on CSC supercomputers, we are able to use package management tools (in this case `yum`). This will often make installing libraries and other dependencies easier. Also notice that it is not necessary to use `sudo` inside the container.
 
@@ -107,21 +125,21 @@ We can now exit the container:
 exit
 ```
 
-In order to run the container without root privileges, build a production image from the sandbox:
+We can then build a production image from the sandbox:
 
 ```bash
-sudo apptainer build mcl.sif mcl
+apptainer build --fakeroot mcl.sif mcl
 ```
 
-We can now test it. Note that `sudo` is no longer needed:
+We can now test it:
 
 ```bash
 apptainer exec mcl.sif mcl --version
 ```
 
 ## Definition file
 
-The above method is applicable as is if you intend the container to be only used by you and your close collaborators. However, if you plan to distribute it wider, it's best to write a definition file for it. That way the other users can see what is in the container and they can, if they so choose, easily rebuild the production image.
+The above method is fine if you intend the container to be used only by you and your close collaborators. However, if you plan to distribute it more widely, it's best to write a definition file for it. That way the other users can see what is in the container, and they can, if they so choose, easily rebuild the production image.
 
 A definition file will also make it easier to modify and reuse the container later. For example, software updates can often be done simply by modifying the version number in the definition file and rebuilding the image.
 
@@ -170,4 +188,10 @@ Include: yum
     exec /bin/bash "$@"
 ```
 
-In more complex cases, it often helpful to first build the image in the sandbox mode and make note of all the commands needed.
+You can now build the image:
+
+```bash
+apptainer build --fakeroot mcl.sif mcl.def
+```
+
+In more complex cases, it is often helpful to first build the image in the sandbox mode and make note of all the commands needed. You can then write a definition file to replicate the necessary steps.
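
The cache instructions added in this file say to make sure `$TMPDIR` is defined before pointing `APPTAINER_CACHEDIR` at it. One way to guard against an unset variable might look like the sketch below. The `/tmp` fallback and the `apptainer_cache` subdirectory name are assumptions made for illustration; they are not part of the tutorial.

```bash
# Sketch: pick a cache directory for Apptainer outside the home directory.
# Assumption: /tmp is an acceptable fallback when TMPDIR is unset or empty.
cache_root="${TMPDIR:-/tmp}"

# Hypothetical subdirectory name, used only to keep the cache files together.
export APPTAINER_CACHEDIR="${cache_root}/apptainer_cache"

# Create the directory if it does not exist yet.
mkdir -p "$APPTAINER_CACHEDIR"

echo "Apptainer cache directory: $APPTAINER_CACHEDIR"
```

Run these lines before the `apptainer build` commands in the same shell session so that the exported variable is visible to Apptainer.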
diff --git a/part-2/containers/replicating-conda.md b/part-2/containers/replicating-conda.md
@@ -11,9 +11,7 @@ permalink: /hands-on/singularity/singularity_extra_replicating-conda.html
 
 # Replicating a Conda environment in a container
 
-This is an extra exercise which can not be run on Puhti. You will need access to a computer or virtual machine where you have root privileges and that has Apptainer installed.
-
-On Puhti, you can use [Tykky](https://docs.csc.fi/computing/containers/tykky/) to easily containerize Conda environments. This method is recommended over the manual procedure detailed in this exercise, which is mainly provided for you to develop your skills in working with containers. For tutorials on using Tykky, see:
+On CSC supercomputers you can use [Tykky](https://docs.csc.fi/computing/containers/tykky/) to easily containerize Conda environments. This method is recommended over the manual procedure detailed in this exercise, which is mainly provided for you to develop your skills in working with containers. For tutorials on using Tykky, see:
 
 - [Containerizing a Conda environment with Tykky](https://csc-training.github.io/csc-env-eff/hands-on/installing/installing_hands-on_python.html#example-containerizing-a-conda-environment-with-tykky)
 - [Installing packages from Bioconda using Tykky](https://csc-training.github.io/csc-env-eff/hands-on/modules/module-exercise-with-aligners.html#extra-installing-packages-from-bioconda)
@@ -24,7 +22,7 @@ Conda is a useful tool for installing software with complex dependencies. It has
 
 The main problems of Conda environments are related to storage. Conda environments are quite large, containing tens to hundreds of thousands of files. Just 3-4 environments are enough to fill the basic quota of a project's `/projappl` directory. Moreover, many of these files will be accessed each time you launch a program installed with Conda, generating massive I/O load which may degrade the performance of the system for all users.
 
-Conda environments can also be somewhat sensitive to changes in the base system, meaning that, e.g., updates on Puhti can sometimes break existing Conda environments, necessitating a re-install.
+Conda environments can also be somewhat sensitive to changes in the base system, meaning that, e.g., system updates can sometimes break existing Conda environments, necessitating a re-install.
 
 Using an Apptainer container can help with both problems. A container is just a single file that is typically smaller than the total size of the Conda environment directory. It is also less sensitive to changes in the host system.
 
@@ -58,7 +56,6 @@ In addition to the `environment.yml` file, you will need an Apptainer definition
 
 ```text
 Bootstrap: docker
-
 From: continuumio/miniconda3
 
 %files
@@ -82,7 +79,7 @@ From: continuumio/miniconda3
 Make sure the files `environment.yml` and `conda_environment.def` are in the current directory and give the command:
 
 ```bash
-sudo apptainer build fastx.sif conda_environment.def
+apptainer build --fakeroot fastx.sif conda_environment.def
 ```
 
 This will build an Apptainer image file called `fastx.sif`. We can now verify that it works:
@@ -91,35 +88,3 @@ This will build an Apptainer image file called `fastx.sif`. We can now verify th
 apptainer exec fastx.sif fastq_to_fasta -h
 ```
 
-The image file could now be transferred to and used on Puhti.
-
-## Comparison of installation methods
-
-This particular environment was chosen because it is a good "bad example" of the effects different installation methods can have.
-
-The software package is a collection of applications written in C++ with only a few dependencies. Usually, similar packages are best installed natively. In this case, however, the code is quite old, and it will not compile with modern versions of `gcc` without some changes to the source code.
-
-The software is available in the Bioconda repository, so it can also be installed with:
-
-```bash
-conda install fastx_toolkit
-```
-
-- Good: Can be done with user privileges
-- Bad: Using this method, you will end up with a directory with a total size of about 1 GB and over 26000 files. The default file number limit for `/projappl` is 100000 files, so this single installation would already use more than 25 % of that.
-
-Containerizing the Conda environment like we did in this exercise is better:
-
-- Good: We ended up with a single 465 MB file. The default capacity limit of `/projappl` is 50 GB, so this installation would only use less than 1 % of the quota.
-- Good: Although containerization as outlined above cannot be done directly on Puhti, you can use Tykky to circumvent the need for root privileges (see the tutorials linked at the top).
-
-In this case there's also another good option – converting a ready-made Docker container:
-
-```bash
-apptainer build fastx.sif docker://biocontainers/fastx-toolkit:v0.0.14-6-deb_cv1
-```
-
-- Good: This can be done with user-level rights also on Puhti and you'll end up with a single 61 MB file.
-- Bad: Finding a ready, working container may take some time.
-
-Containers are not a "silver bullet" solution to all installation problems, but they are nonetheless a much more favorable alternative to direct Conda installations on HPC systems.