iPIC3D with GPU acceleration, supporting multi-node, multi-GPU execution.
Markidis, Stefano, and Giovanni Lapenta. "Multi-scale simulations of plasma with iPIC3D." Mathematics and Computers in Simulation 80.7 (2010): 1509-1519.
To install and run iPIC3D-GPU, you need:
- CUDA/HIP compatible hardware, CUDA compute capability 7.5 or higher (you can check this as shown below)
- CMake, MPI (MPICH and OpenMPI are tested), HDF5 (optional), and a C/C++ compiler supporting the C++17 standard
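If you are on an NVIDIA system and unsure of your GPU's compute capability, recent NVIDIA drivers can report it directly; the query field below is not available on older drivers, in which case look up your GPU model on NVIDIA's website instead:
# print GPU name and compute capability (needs a reasonably recent driver)
nvidia-smi --query-gpu=name,compute_cap --format=csv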
To meet the compatibility requirements between CUDA and the host compiler, it is recommended to use a relatively recent compiler version, e.g. GCC 12.
If you are on a supercomputer or cluster, you can most likely use tools like module
to switch the compiler, MPI, or libraries.
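For example, on a system that provides environment modules, loading a toolchain might look like the sketch below; the module names and versions are placeholders, so check what your cluster offers with module avail:
# hypothetical module names -- adjust to what your system provides
module load gcc/12.2.0
module load cuda/12.3
module load openmpi/4.1.5
module load cmake hdf5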
Git clone this repository or download the zip file of the default branch. For example:
git clone https://github.com/iPIC3D/iPIC3D-GPU.git iPIC3D-GPU
cd ./iPIC3D-GPU
Now you are in the project folder.
- Create a build directory
mkdir build && cd build
- Use CMake to generate the makefiles
# use .. here because CMakeLists.txt is under the project root
cmake .. # using CUDA by default
cmake -DHIP_ON=ON .. # use HIP
If you'd like to use HIP, check the GPU architecture setting in CMakeLists.txt and change it according to your hardware to get the best performance:
set_property(TARGET iPIC3Dlib PROPERTY HIP_ARCHITECTURES gfx90a)
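If you are unsure which architecture name your AMD GPU reports, one quick way to check (assuming ROCm and its tools are installed) is:
# list the gfx architecture names of the visible AMD GPUs
rocminfo | grep gfx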
- Compile with make. If successful, you will find an executable named iPIC3D in the build directory.
make # single-threaded build; works, but slow
make -j4 # build with 4 threads
make -j # build with max threads, fast, recommended
iPIC3D uses input files to control the simulation; the text file is passed as the only command-line argument:
export OMP_NUM_THREADS=2
mpirun -np 8 ./iPIC3D ../share/inputfiles/magneticReconnection/testGEM3Dsmall.inp
With this command, you are using 8 MPI ranks with 2 OpenMP threads per rank.
Important: make sure the number of MPI processes equals XLEN x YLEN x ZLEN
as specified in the input file.
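For example, an input file containing the decomposition below needs exactly 2 x 2 x 2 = 8 MPI processes; the values here are illustrative, not necessarily those of testGEM3Dsmall.inp:
XLEN = 2
YLEN = 2
ZLEN = 2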
Critical: OpenMP is enabled by default; make sure the number of threads per process is reasonable. Refer to the OpenMP section for more details.
If you are on a supercomputer, especially a multi-node system, you will likely need to use srun
to launch the program.
How MPI processes are assigned to nodes and GPUs is vital for performance, since it determines the pipelining and subdomain decomposition in the program.
It's fine to use more than 1 MPI process per GPU. The following examples use 4 nodes, each equipped with 4 GPUs:
# 1 MPI process per GPU
srun --nodes=4 --ntasks=16 --ntasks-per-node=4 ./iPIC3D ../share/benchmark/GEM3Dsmall_4x2x2_100/testGEM3Dsmall.inp
# 2 MPI processes per GPU
srun --nodes=4 --ntasks=32 --ntasks-per-node=8 ./iPIC3D ../share/benchmark/GEM3Dsmall_4x4x2_100/testGEM3Dsmall.inp
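If your cluster uses a batch scheduler such as SLURM, a job script for the first case could look like this sketch; the GPU request flag and any partition or account options vary between systems and are assumptions here:
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-node=4   # GPU request syntax differs between clusters

export OMP_NUM_THREADS=2
srun ./iPIC3D ../share/benchmark/GEM3Dsmall_4x2x2_100/testGEM3Dsmall.inp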
iPIC3D-GPU will create a folder (usually named data) for the output results if it doesn't exist. However, it will delete everything in the folder if it already exists.
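So, if you want to keep the results of a previous run, move or rename the output folder before launching again, for example:
# preserve earlier results; 'data' is the usual name but may differ in your input file
mv data data_previous_run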
By default, the software is built with the Release build type, which means it is highly optimized by the compiler. If you'd like to debug, use:
cmake -DCMAKE_BUILD_TYPE=Debug ..
instead, and you will get an executable named iPIC3D_d. If you'd like to just have an unoptimized (slow) build:
cmake -DCMAKE_BUILD_TYPE=Default ..
In iPIC3D-GPU, the solver stays on the CPU side, which means the number of MPI processes affects not only the GPU work but also the solver's performance.
To speed up the CPU part, OpenMP is enabled by default:
cmake .. # default
cmake -DUSE_OPENMP=OFF .. # if you'd like to disable OpenMP
# set OpenMP threads for each MPI process
export OMP_NUM_THREADS=4
The CPU solver benefits from OpenMP, and this option is ON by default. It's important to control the number of threads per MPI process; make sure it stays in a reasonable range.
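As a rough rule of thumb (our assumption, not a hard requirement of iPIC3D), the number of MPI ranks per node multiplied by OMP_NUM_THREADS should not exceed the CPU cores of that node:
# check how many CPU cores this node has
nproc
# e.g. with 16 cores and 8 ranks on the node, 2 threads per rank fills it
export OMP_NUM_THREADS=2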
In the benchmark folder, we have prepared some scripts for profiling; please read benchmark/readme for more information.
There is a performance baseline file for your reference (2 threads per process were used).
You can find the corresponding data at ./share/benchmark/GH200_release_baseline.csv.
Feel free to contact Professor Stefano Markidis at KTH regarding the use of iPIC3D.