Notes on MPI support

This page contains notes on getting Grappa to run with various MPI implementations.

MPI requirements

We depend on several MPI-3 features, including non-blocking collectives, neighborhood collectives, and topology-aware communicators. These require a reasonably modern MPI build.
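
One quick way to confirm that an installation implements MPI-3 is to compile and run a small program that calls MPI_Get_version. A minimal sketch (the filename is illustrative):

cat > mpi_version_check.c <<'EOF'
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    int version, subversion;
    MPI_Init(&argc, &argv);
    MPI_Get_version(&version, &subversion);  /* reports the MPI standard version, e.g. 3.0 */
    printf("MPI standard version: %d.%d\n", version, subversion);
    MPI_Finalize();
    return 0;
}
EOF
mpicc mpi_version_check.c -o mpi_version_check
mpiexec -n 1 ./mpi_version_check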

MPI versions

OpenMPI

OpenMPI 1.7 and later support MPI-3.
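
To check which OpenMPI version is installed, the launcher reports it directly; for example:

mpirun --version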

MPICH

MPICH 3.0 and later support MPI-3.

MVAPICH

MVAPICH2 version 1.9 and later support MPI-3.

When running with mpiexec.hydra under Slurm, MVAPICH2 may set CPU affinity even when it was not requested, restricting each node's ranks to a fraction of the available cores. To work around this, disable affinity by passing the MV2_ENABLE_AFFINITY variable to the processes like this:

salloc --nodes 2 --ntasks-per-node 1 -- mpiexec.hydra -genv MV2_ENABLE_AFFINITY 0 mpi/pt2pt/osu_mbw_mr

Intel MPI

Intel MPI 5.0 and later support MPI-3.

Intel uses the Hydra process manager from MPICH.

Intel MPI and Slurm

Intel MPI supports direct job launch with Slurm by setting the environment variable

export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so

You will need to update the path to libpmi.so for your system. After setting this variable, Intel MPI jobs can be launched directly with srun.
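
For example, after exporting the variable, a job might be launched like this (the node/task counts and binary name are illustrative):

export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
srun --nodes 2 --ntasks-per-node 16 ./my_mpi_app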

Others

Schedulers

Slurm

This is what the Grappa team has used the most.

Slurm can be used as just a scheduler (with the salloc command) or as a combination scheduler/job launcher (with the srun command). We prefer the latter mode, but it requires proper configuration of Slurm, the MPI library, and possibly the binary. For Slurm 2.3 and above, it's generally sufficient to link the binary with the PMI library (-lpmi).
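
As a sketch of the two modes (binary name, sizes, and launcher choice are illustrative):

# scheduler only: salloc obtains the allocation, the MPI launcher starts the ranks
salloc --nodes 2 --ntasks-per-node 1 -- mpiexec.hydra ./my_mpi_app

# scheduler + launcher: link against PMI, then launch directly with srun
mpicc -o my_mpi_app my_mpi_app.c -lpmi
srun --nodes 2 --ntasks-per-node 1 ./my_mpi_app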