From da63258ad6104a48f4910fc9371f5f75d0dc1b54 Mon Sep 17 00:00:00 2001
From: Simon Wilson
Date: Fri, 5 Mar 2021 14:50:21 +0000
Subject: [PATCH] Updates to docs

---
 README.md     | 47 ++++++++++++++++++++++++++---------------------
 lfric.sub     |  4 ++--
 lfric_env.def |  2 +-
 3 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/README.md b/README.md
index 41becfa..123c874 100644
--- a/README.md
+++ b/README.md
@@ -6,12 +6,11 @@ It is based on [Fedora](https://getfedora.org/) and includes all of the software

A compiler is **not** required on the build and run machine where the container is deployed. All compilation of LFRic is done via the containerised compilers.

-LFRic components are built using a shell within the container.
-The shell automatically sets up the build environment when invoked.
+LFRic components are built using a shell within the container, and the shell automatically sets up the build environment when invoked.

-The LFRic source code is not containerised, it is retrieved as usual via subversion from within the container shell so there is no need to rebuild the container for LFRic trunk updates.
+The LFRic source code is not containerised; it is retrieved as usual via subversion from within the container shell, so there is no need to rebuild the container for LFRic code updates.

-The container is compatible with [slurm](https://slurm.schedmd.com/documentation.html) and the compiled executable can be run in batch, using the local MPI libraries, if the host system has an [MPICH ABI](https://www.mpich.org/abi/) compatible MPI.
+The container is compatible with [slurm](https://slurm.schedmd.com/documentation.html), and the compiled executable can be run in batch using the local MPI libraries, if the host system has an [MPICH ABI](https://www.mpich.org/abi/) compatible MPI.

A prebuilt container is available from [Sylabs Cloud](https://cloud.sylabs.io/library/simonwncas/default/test).
@@ -23,9 +22,10 @@ lfric.sub is an example ARCHER2 submission script.

# Requirements

## Base requirement
-Linux host to build and run.
-[Singularity](https://sylabs.io/) 3.0+ (3.7 preferred); Access to [Met Office Science Repository Service](https://code.metoffice.gov.uk)
+[Singularity](https://sylabs.io/) 3.0+ (3.7 preferred)
+
+Access to [Met Office Science Repository Service](https://code.metoffice.gov.uk)

## Optional requirements
@@ -44,7 +44,9 @@ singularity pull [--disable-cache] lfric_env.sif library://simonwncas/default/lf
```
Note: `--disable-cache` is required if using ARCHER2.

-* Build container using `lfric_env.sif`.
+or:
+
+* Build container using `lfric_env.def`.
```
sudo singularity build lfric_env.sif lfric_env.def
```
@@ -99,11 +101,10 @@ cd example
mpiexec -np 6 ../bin/gungho configuration.nml
```
Note: This uses the MPI runtime libraries built into the container. If the host machine has an MPICH-based MPI (MPICH, Intel MPI, Cray MPT, MVAPICH2), then see below on how to use [MPICH ABI](https://www.mpich.org/abi/) to access the local MPI, and therefore the fast interconnects, when running the executable via the container.
-OpenMPI will not work with this method.

# Using MPICH ABI

-This approach is a variation on the [Singularity MPI Bind model](https://sylabs.io/guides/3.7/user-guide/mpi.html#bind-model). The compiled model executable is run within the container with suitable options to allow access to the local MPI installation. At runtime, containerised libraries are used by the executable apart from the local MPI libraries.
+This approach is a variation on the [Singularity MPI Bind model](https://sylabs.io/guides/3.7/user-guide/mpi.html#bind-model). The compiled model executable is run within the container with suitable options to allow access to the local MPI installation. At runtime, containerised libraries are used by the executable apart from the local MPI libraries. OpenMPI will not work with this method.

Note: this only applies when a model is run; the executable is compiled using the method above, without any reference to local libraries.

@@ -113,11 +114,11 @@ A MPICH ABI compatible MPI is required. These have MPI libraries named `libmpifo

## Build bind points and LD_LIBRARY_PATH

-The local MPI libraries need to be made available to the container. Bind points are required so that containerised processes can access the local directories. Also the `LD_LIBRARY_PATH` inside the container needs updating to reflect the path to the local libraries.
+The local MPI libraries need to be made available to the container. Bind points are required so that containerised processes can access the local directories which contain the MPI libraries. Also, the `LD_LIBRARY_PATH` inside the container needs updating to reflect the path to the local libraries. This method has been tested with slurm, but should work for other job control systems.
For example, assuming the system MPI libraries are in `/opt/mpich/lib`, set the bind directory with
```
-export BIND_DIR=/opt/mpich
+export BIND_OPT="-B /opt/mpich"
```
then for Singularity versions <3.7
```
export SINGULARITYENV_LD_LIBRARY_PATH="/opt/mpich/lib:\$LD_LIBRARY_PATH"
```
@@ -128,38 +129,42 @@ for Singularity v3.7 and over
```
export LOCAL_LD_LIBRARY_PATH="/opt/mpich/lib:\$LD_LIBRARY_PATH"
```

+The entries in `BIND_OPT` are comma separated, while the entries in `[SINGULARITYENV_LOCAL_]LD_LIBRARY_PATH` are colon separated.
+
## Construct run command and submit

-For Singularity versions <3.7, the command to run gungho is now
+For Singularity versions <3.7, the command to run gungho within MPI is now
```
-singularity exec $BIND_DIR lfric_env.sif ../bin/gungho configuration.nml
+singularity exec $BIND_OPT /lfric_env.sif ../bin/gungho configuration.nml
```
for Singularity v3.7 and over
```
-singularity exec $BIND_DIR --env=LD_LIBRARY_PATH=$LOCAL_LD_LIBRARY_PATH lfric_env.sif ../bin/gungho configuration.nml
+singularity exec $BIND_OPT --env=LD_LIBRARY_PATH=$LOCAL_LD_LIBRARY_PATH /lfric_env.sif ../bin/gungho configuration.nml
```
Running with mpirun/slurm is straightforward: just use the standard command for running MPI jobs, e.g.
```
-mpirun -n singularity exec $BIND_DIR lfric_env.sif ../bin/gungho configuration.nml
+mpirun -n <nprocs> singularity exec $BIND_OPT lfric_env.sif ../bin/gungho configuration.nml
```
or
```
-srun --cpu-bind=cores singularity exec $BIND_DIR lfric_env.sif ../bin/gungho configuration.nml
+srun --cpu-bind=cores singularity exec $BIND_OPT lfric_env.sif ../bin/gungho configuration.nml
```
on ARCHER2

-If running with slurm, `/var/spool/slurmd` should be appended to `BIND_DIR`, separated with a comma.
+If running with slurm, `/var/spool/slurmd` should be appended to `BIND_OPT`, separated with a comma.

## Update for local MPI dependencies

-It could be possible that the local MPI libraries have other dependencies which are in other system directories. In this case `BIND_DIR` and `[SINGULARITYENV_]LOCAL_LD_LIBRARY_PATH` have to be updated to reflect these. For example on ARCHER2 these are
+It is possible that the local MPI libraries have other dependencies which are in other system directories. In this case `BIND_OPT` and `[SINGULARITYENV_]LOCAL_LD_LIBRARY_PATH` have to be updated to reflect these. For example, on ARCHER2 these are
```
-export BIND_DIR="-B /opt/cray,/usr/lib64:/usr/lib/host,/var/spool/slurmd"
+export BIND_OPT="-B /opt/cray,/usr/lib64:/usr/lib/host,/var/spool/slurmd"
```
and
```
export SINGULARITYENV_LOCAL_LD_LIBRARY_PATH=/opt/cray/pe/mpich/8.0.16/ofi/gnu/9.1/lib-abi-mpich:/opt/cray/libfabric/1.11.0.0.233/lib64:/opt/cray/pe/pmi/6.0.7/lib
```
-Discovering these is a process of trail and error where the executable is run via the container and any missing libraries included the the above environment variables.
-`/usr/lib/host` Is at the end of `LD_LIBRARY_PATH` in the container, so that the bind point can be used to provide any remaining system libraries dependencies in standard locations such as `/usr/lib64`.
+Discovering the missing dependencies is a process of trial and error where the executable is run via the container, and any missing libraries will cause an error and be reported. A suitable bind point and library path is then included in the above environment variables, and the process repeated.
+
+`/usr/lib/host` is at the end of `LD_LIBRARY_PATH` in the container, so this bind point can be used to provide any remaining system library dependencies in standard locations. In the above example, there are extra dependencies in `/usr/lib64`, so
+`/usr/lib64:/usr/lib/host` in `BIND_OPT` mounts `/usr/lib64` as `/usr/lib/host` inside the container, and therefore the contents of `/usr/lib64` are effectively appended to the container's `LD_LIBRARY_PATH`.
\ No newline at end of file
diff --git a/lfric.sub b/lfric.sub
index 023cf2b..5372205 100644
--- a/lfric.sub
+++ b/lfric.sub
@@ -19,7 +19,7 @@
cd /trunk/gungho/example

export SINGULARITYENV_LOCAL_LD_LIBRARY_PATH=/opt/cray/pe/mpich/8.0.16/ofi/gnu/9.1/lib-abi-mpich:/opt/cray/libfabric/1.11.0.0.233/lib64:/opt/cray/pe/pmi/6.0.7/lib

-export BIND_DIR="-B /opt/cray,/usr/lib64:/usr/lib/host,/var/spool/slurmd"
+export BIND_OPT="-B /opt/cray,/usr/lib64:/usr/lib/host,/var/spool/slurmd"

-srun --cpu-bind=cores singularity exec $BIND_DIR /lfric_env.sif ../bin/gungho configuration.nml
+srun --cpu-bind=cores singularity exec $BIND_OPT /lfric_env.sif ../bin/gungho configuration.nml
diff --git a/lfric_env.def b/lfric_env.def
index c78da41..8a2a224 100755
--- a/lfric_env.def
+++ b/lfric_env.def
@@ -162,7 +162,7 @@ export MPICH_DIR=$BASE_DIR/mpich
export PFUNIT=$INSTALL_DIR
export NETCDF_DIR=$BASE_DIR/netcdf
export CPPFLAGS="-I$INSTALL_DIR/include -I$NETCDF_DIR/include"
-export FFLAGS="-I$INSTALL_DIR/include -I$NETCDF_DIR/include -I$MPICH_DIR/include"
+export FFLAGS="-I$INSTALL_DIR/include -I$INSTALL_DIR/mod -I$NETCDF_DIR/include -I$MPICH_DIR/include"
export LDFLAGS="-L$INSTALL_DIR/lib -L$NETCDF_DIR/lib"
export PATH=$MPICH_DIR/bin:$NETCDF_DIR/bin:$INSTALL_DIR/bin:/opt/intel/oneapi/compiler/latest/linux/bin/intel64:$PATH
export PSYCLONE_CONFIG=/usr/local/share/psyclone/psyclone.cfg
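As a worked illustration of the trial-and-error step described in the README changes above, the sketch below shows one way to spot missing host libraries before submitting a job. It is a minimal sketch, not part of the patch itself: it assumes Singularity 3.7+ (the `--env` form used above), that `ldd` and `grep` are available inside the container image, that `BIND_OPT` and `LOCAL_LD_LIBRARY_PATH` have been exported as in the examples above, and that it is run from the `example` directory next to the compiled `gungho` binary. The `/opt/mpich/lib` path is the example location used earlier and will differ on other systems.
```
# Confirm the host MPI is MPICH ABI compatible: an ABI-compatible MPI ships
# libmpi.so.12 and libmpifort.so.12 (adjust the path for your system).
ls /opt/mpich/lib/libmpi*.so.12* /opt/mpich/lib/libmpifort*.so.12*

# Run ldd on the executable inside the container; every library reported as
# "not found" needs a bind point added to BIND_OPT and its directory added to
# the LD_LIBRARY_PATH passed into the container. No output means nothing is
# missing.
singularity exec $BIND_OPT --env=LD_LIBRARY_PATH=$LOCAL_LD_LIBRARY_PATH /lfric_env.sif \
  ldd ../bin/gungho | grep "not found"
```
For Singularity versions <3.7, the same check should work with the `SINGULARITYENV_` form of the library path variable exported as shown earlier, instead of the `--env` option.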