mpi4py-ve is an extension to mpi4py, which provides Python bindings for the Message Passing Interface (MPI). This package also supports communication of NLCPy array objects (nlcpy.ndarray) between MPI processes on x86 servers of SX-Aurora TSUBASA systems. Combining NLCPy with mpi4py-ve enables Python scripts to utilize the computing power of multiple VEs. The current version of mpi4py-ve is based on mpi4py version 3.0.3. For details of the API, please refer to the mpi4py manual.
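As a minimal sketch of the API (illustrative only), the following script prints each process's rank and the size of MPI_COMM_WORLD through mpi4pyve; it can be launched with the mpirun command of NEC MPI described in the execution section below:

from mpi4pyve import MPI   # importing the MPI module initializes MPI

comm = MPI.COMM_WORLD
print("rank", comm.Get_rank(), "of", comm.Get_size())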
Before the installation, the following components must be installed on the x86 node of your SX-Aurora TSUBASA system; a quick way to check some of these versions is shown after the list.
- Alternative VE Offloading (AVEO)
- required version: >= 3.0.2
- NEC MPI
- required NEC MPI version: >= 2.26.0 (for Mellanox OFED 4.x) or >= 3.5.0 (for Mellanox OFED 5.x)
- Python
- required version: 3.6, 3.7, or 3.8
- NumPy
- required version: v1.17, v1.18, v1.19, or v1.20
- NLC (optional)
- required version: >= 3.0.0
- NLCPy (optional)
- required version: >= 3.0.1
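The Python and NumPy versions already present can be checked as follows (a sketch; the python3 command and paths may differ in your environment):

$ python3 --version
$ python3 -c "import numpy; print(numpy.__version__)"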
Since December 2022, mpi4py-ve has been provided as part of NEC SDK (NEC Software Development Kit for Vector Engine). If NEC SDK on your machine has been properly installed or updated since then, mpi4py-ve is available through the /usr/bin/python3 command.
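To check that the bundled package is visible (assuming mpi4py-ve exposes a __version__ attribute, as mpi4py does):

$ /usr/bin/python3 -c "import mpi4pyve; print(mpi4pyve.__version__)"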
You can install mpi4py-ve in either of the following ways.
Install from PyPI
$ pip install mpi4py-ve
Install from your local computer
Download the wheel package from GitHub.
Put the wheel package in any directory.
Install the local wheel package via the pip command.
$ pip install <path_to_wheel>
- The shared objects for Vector Host, which are included in the wheel package, are compiled with gcc 4.8.5 and tested with the following software:
  - NEC MPI v2.26.0 and v3.5.0
  - NumPy v1.19.5
  - NLCPy v3.0.1
Before building this package, you need to execute the environment setup script necmpivars.sh or necmpivars.csh once in advance.
When using sh or its variant:
For VE30
$ source /opt/nec/ve3/mpi/X.X.X/bin/necmpivars.sh
For VE20, VE10, or VE10E
$ source /opt/nec/ve/mpi/X.X.X/bin/necmpivars.sh
When using csh or its variant:
For VE30
% source /opt/nec/ve3/mpi/X.X.X/bin/necmpivars.csh
For VE20, VE10, or VE10E
% source /opt/nec/ve/mpi/X.X.X/bin/necmpivars.csh
Here, X.X.X denotes the version number of NEC MPI.
After that, execute the following commands:
$ git clone https://github.com/SX-Aurora/mpi4py-ve.git
$ cd mpi4py-ve
$ python setup.py build --mpi=necmpi
$ python setup.py install
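To confirm that the installation is usable, a simple import check can be run under mpirun (a sketch; MPI.Get_version() is part of the mpi4py API on which mpi4py-ve is based):

$ mpirun -veo -np 1 $(which python) -c "from mpi4pyve import MPI; print(MPI.Get_version())"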
Transfer Array
Transfers an NLCPy ndarray from MPI rank 0 to rank 1 by using comm.Send() and comm.Recv():
from mpi4pyve import MPI
import nlcpy as vp
comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
if rank == 0:
    x = vp.array([1, 2, 3], dtype=int)
    comm.Send(x, dest=1)
elif rank == 1:
    y = vp.empty(3, dtype=int)
    comm.Recv(y, source=0)
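Saved as, for example, send_recv.py (the file name is a placeholder), the script needs at least two processes and can be launched with the mpirun command of NEC MPI described below:

$ mpirun -veo -np 2 $(which python) send_recv.py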
Sum of Numbers
Sums the numbers locally, and reduces all the local sums to the root rank (rank=0):
from mpi4pyve import MPI
import nlcpy as vp
comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
N = 1000000000
begin = N * rank // size
end = N * (rank + 1) // size
sendbuf = vp.arange(begin, end).sum()
recvbuf = comm.reduce(sendbuf, MPI.SUM, root=0)
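Only the root rank receives the reduced value; the other ranks get None from comm.reduce(). An illustrative check on the root (not part of the original example):

if rank == 0:
    print(recvbuf)    # expected: 0 + 1 + ... + (N - 1) = N * (N - 1) // 2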
The following table shows the performance results [msec] on VE Type 20B:

| np=1 | np=2 | np=3 | np=4 | np=5 | np=6 | np=7 | np=8 |
|------|------|------|------|------|------|------|------|
| 35.8 | 19.0 | 12.6 | 10.1 | 8.1  | 7.0  | 6.0  | 5.5  |
When executing a Python script that uses mpi4py-ve, use the mpirun command of NEC MPI on an x86 server of SX-Aurora TSUBASA. Before running the Python script, you need to execute one of the following environment setup scripts once in advance.
When using sh or its variant:
For VE30
$ source /opt/nec/ve3/mpi/X.X.X/bin/necmpivars.sh gnu 4.8.5
For VE20, VE10, or VE10E
$ source /opt/nec/ve/mpi/X.X.X/bin/necmpivars.sh gnu 4.8.5
When using csh or its variant:
For VE30
% source /opt/nec/ve3/mpi/X.X.X/bin/necmpivars.csh gnu 4.8.5
For VE20, VE10, or VE10E
% source /opt/nec/ve/mpi/X.X.X/bin/necmpivars.csh gnu 4.8.5
Here, X.X.X denotes the version number of NEC MPI.
When using the mpirun command:
$ mpirun -veo -np N $(which python) sample.py
or
$ export NMPI_USE_COMMAND_SEARCH_PATH=ON
$ mpirun -veo -np N python sample.py
The following examples show how to launch MPI programs that use mpi4py-ve and NLCPy on the SX-Aurora TSUBASA.
Interactive Execution
Execution on one VE
Example of using 4 processes on local VH and 4 VE processes (ncore / 4 OpenMP parallel per process) on VE#0 of local VH
$ mpirun -veo -np 4 python a.py
Execution on multiple VEs on a VH
Example of using 4 processes on local VH and 4 VE processes (1 process per VE, ncore OpenMP parallel per process) on VE#0 to VE#3 of local VH
$ VE_NLCPY_NODELIST=0,1,2,3 mpirun -veo -np 4 python a.py
Example of using 32 processes on local VH and 32 VE processes (8 processes per VE, ncore / 8 OpenMP parallel per process) on VE#0 to VE#3 of local VH
$ VE_NLCPY_NODELIST=0,1,2,3 mpirun -veo -np 32 python a.py
Execution on multiple VEs on multiple VHs
Example of using a total of 32 processes on two VHs host1 and host2, and a total of 32 VE processes on VE#0 and VE#1 of each VH (8 processes per VE, ncore / 8 OpenMP parallel per process)
$ VE_NLCPY_NODELIST=0,1 mpirun -hosts host1,host2 -veo -np 32 python a.py
NQSV Request Execution
Execution on a specific VH, on a VE
Example of using 32 processes on logical VH#0 and 32 VE processes on logical VE#0 to logical VE#3 on logical VH#0 (8 processes per VE, ncore / 8 OpenMP parallel per process)
#PBS -T necmpi
#PBS -b 2                  # The number of logical hosts
#PBS --venum-lhost=4       # The number of VEs per logical host
#PBS --cpunum-lhost=32     # The number of CPUs per logical host

source /opt/nec/ve/mpi/X.X.X/bin/necmpivars.sh
export NMPI_USE_COMMAND_SEARCH_PATH=ON
mpirun -host 0 -veo -np 32 python a.py
Execution on a specific VH, on a specific VE
Example of using 16 processes on logical VH#0, 16 VE processes in total on logical VE#0 and logical VE#3 on logical VH#0 (8 processes per VE, ncore / 8 OpenMP parallel per process)
#PBS -T necmpi
#PBS -b 2                  # The number of logical hosts
#PBS --venum-lhost=4       # The number of VEs per logical host
#PBS --cpunum-lhost=16     # The number of CPUs per logical host

source /opt/nec/ve/mpi/X.X.X/bin/necmpivars.sh
export NMPI_USE_COMMAND_SEARCH_PATH=ON
VE_NLCPY_NODELIST=0,3 mpirun -host 0 -veo -np 16 python a.py
Execution on all assigned VEs
Example of using 32 processes in total on 4 VHs and using 32 VE processes in total from logical VE#0 to logical VE#7 on each of VHs (1 process per VE, ncore OpenMP parallel per process).
#PBS -T necmpi
#PBS -b 4                  # The number of logical hosts
#PBS --venum-lhost=8       # The number of VEs per logical host
#PBS --cpunum-lhost=8      # The number of CPUs per logical host
#PBS --use-hca=2           # The number of HCAs

source /opt/nec/ve/mpi/X.X.X/bin/necmpivars.sh
export NMPI_USE_COMMAND_SEARCH_PATH=ON
mpirun -veo -np 32 python a.py
Here, X.X.X denotes the version number of NEC MPI.
NEC MPI provides the facility of displaying MPI communication information. There are two formats of MPI communication information available as follows:
| Format | Description |
|--------|-------------|
| Reduced Format | The maximum, minimum, and average values of the MPI communication information of all MPI processes are displayed. |
| Extended Format | The MPI communication information of each MPI process is displayed in ascending order of rank in the communicator MPI_COMM_WORLD, after the information in the reduced format. |
You can control the display and format of MPI communication information by setting the environment variable NMPI_COMMINF at runtime as shown in the following table.
The Settings of NMPI_COMMINF:
| NMPI_COMMINF | Displayed Information |
|--------------|------------------------|
| NO | (Default) No output |
| YES | Reduced format |
| ALL | Extended format |
When using the mpirun command:
$ export NMPI_COMMINF=ALL
$ mpirun -veo -np N python sample.py
The links below are useful when using mpi4py-ve with user-defined classes (without NLCPy):
The links below are useful for understanding mpi4py-ve in more detail:
- The current version of mpi4py-ve does not support some functions; these are listed in the section "List of Unsupported Functions" of the mpi4py-ve tutorial.
- Communication of bool data between NumPy and NLCPy will fail because the two libraries use a different number of bytes for the bool type (a possible workaround is sketched below).
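A possible workaround is to cast bool data to a fixed-width integer type before communication and convert it back afterwards. The sketch below is illustrative only, not an official API, and assumes two processes:

from mpi4pyve import MPI
import numpy as np
import nlcpy as vp

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
if rank == 0:
    x = np.array([True, False, True])
    comm.Send(x.astype('int32'), dest=1)    # send as int32 instead of bool
elif rank == 1:
    y = vp.empty(3, dtype='int32')
    comm.Recv(y, source=0)
    flags = (y != 0)                        # convert back to bool on the receiving side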
If you import NLCPy before calling MPI_Init()/MPI_Init_thread(), a runtime error will be raised.
Not recommended usage:
$ mpirun -veo -np 1 $(which python) -c "import nlcpy; from mpi4pyve import MPI"
RuntimeError: NLCPy must be import after MPI initialization
Recommended usage:
$ mpirun -veo -np 1 $(which python) -c "from mpi4pyve import MPI; import nlcpy"
MPI_Init() or MPI_Init_thread() is called when you import the MPI module from the mpi4pyve package.
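For example, assuming mpi4py-ve mirrors mpi4py's MPI.Is_initialized(), the following holds right after the import:

from mpi4pyve import MPI
assert MPI.Is_initialized()    # MPI_Init has already been called by the import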
If you use the Lock/Lock_all functions for one-sided communication with NLCPy array data, you need to insert NLCPy synchronization control.
Synchronization usage:
import mpi4pyve
from mpi4pyve import MPI
import nlcpy as vp

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
array = vp.array(0, dtype=int)

if rank == 0:
    win_n = MPI.Win.Create(array, comm=MPI.COMM_WORLD)
else:
    win_n = MPI.Win.Create(None, comm=MPI.COMM_WORLD)

if rank == 0:
    array.fill(1)
    array.venode.synchronize()
    comm.Barrier()
if rank != 0:
    comm.Barrier()
    win_n.Lock(MPI.LOCK_EXCLUSIVE, 0)
    win_n.Get([array, MPI.INT], 0)
    win_n.Unlock(0)
    assert array == 1

comm.Barrier()
win_n.Free()