Skip to content

the GPU version of iPIC3D supporting multi-node multi-GPU simulations

License

Notifications You must be signed in to change notification settings

iPIC3D/iPIC3D-GPU

Repository files navigation

iPIC3D-GPU

iPIC3D with GPU acceleration, supporting multi-node multi-GPU.

          ,-.----.                           .--,-``-.                   
          \    /  \      ,---,   ,----..    /   /     '.       ,---,     
  ,--,    |   :    \  ,`--.' |  /   /   \  / ../        ;    .'  .' `\   
,--.'|    |   |  .\ : |   :  : |   :     : \ ``\  .`-    ' ,---.'     \  
|  |,     .   :  |: | :   |  ' .   |  ;. /  \___\/   \   : |   |  .`\  | 
`--'_     |   |   \ : |   :  | .   ; /--`        \   :   | :   : |  '  | 
,' ,'|    |   : .   / '   '  ; ;   | ;           /  /   /  |   ' '  ;  : 
'  | |    ;   | |`-'  |   |  | |   : |           \  \   \  '   | ;  .  | 
|  | :    |   | ;     '   :  ; .   | '___    ___ /   :   | |   | :  |  ' 
'  : |__  :   ' |     |   |  ' '   ; : .'|  /   /\   /   : '   : | /  ;  
|  | '.'| :   : :     '   :  | '   | '/  : / ,,/  ',-    . |   | '` ,/   
;  :    ; |   | :     ;   |.'  |   :    /  \ ''\        ;  ;   :  .'     
|  ,   /  `---'.|     '---'     \   \ .'    \   \     .'   |   ,.'       
 ---`-'     `---`                `---`       `--`-,,-'     '---'         
                                                                         

Citation

Markidis, Stefano, and Giovanni Lapenta. "Multi-scale simulations of plasma with iPIC3D." Mathematics and Computers in Simulation 80.7 (2010): 1509-1519.

Usage

Requirement

To install and run iPIC3D-GPU, you need:

  • CUDA/HIP compatible hardware, CUDA capabiliy 7.5 or higher
  • cmake, MPI(MPICH and OpenMPI are tested) and HDF5 (optional), C/C++ compiler supporting C++ 17 standard

To meet the requirements of compatability between CUDA and compiler, it's recommended to use a relatively new compiler version e.g. GCC 12

If you are on a super-computer or cluster, it's highly possible that you can use tools like module to change the compiler, MPI or libraries used.

Get the code

Git clone this repository or download the zip file of default branch. For example:

git clone https://github.com/iPIC3D/iPIC3D-GPU.git iPIC3D-GPU
cd ./iPIC3D-GPU

Now you are in the project folder

Build

  • Create a build directory
mkdir build && cd build
  • Use CMake to generate the make files
# use .. here because CMakeList.txt would be under project root 
cmake .. # using CUDA by default
cmake -DHIP_ON=ON .. # use HIP

If you's like to use HIP, notice the GPU architecture in CMakeLists.txt, change it according to your hardware to get bet performance:

set_property(TARGET iPIC3Dlib PROPERTY HIP_ARCHITECTURES gfx90a) 
  • Compile with make if successful, you will find an executable named iPIC3D in build directory
make # you can use this single-threaded compile command, but slow
make -j4 # build with 4 threads
make -j # build with max threads, fast, recommended

Run

iPIC3D uses inputfiles to control the simulation, we pass this text file as the only command line argument:

export OMP_NUM_THREADS=2
mpirun -np 8 ./iPIC3D ../share/inputfiles/magneticReconnection/testGEM3Dsmall.inp

With this command, you are using 8 MPI ranks, 2 OpenMP threads per rank.

Important: make sure number of MPI process = XLEN x YLEN x ZLEN as specified in the input file.

Critical: OpenMP is enabled by default, make sure the number of thread every process is reasonable. Refer to OpenMP for more details.

If you are on a super-computer, especially a multi-node system, it's likely that you should use srun to launch the program.

Multi-node and Multi-GPU

Assigning MPI processes to nodes and GPUs are vital in performance, for it decides the pipeline and subdomains in the program.

It's fine to use more than 1 MPI process per GPU. The following example uses 4 nodes, each equipped with 4 GPU:

# 1 MPI process per GPU
srun --nodes=4 --ntasks=16 --ntasks-per-node=4 ./iPIC3D ../share/benchmark/GEM3Dsmall_4x2x2_100/testGEM3Dsmall.inp 

# 2 MPI processes per GPU
srun --nodes=4 --ntasks=32 --ntasks-per-node=8 ./iPIC3D ../share/benchmark/GEM3Dsmall_4x4x2_100/testGEM3Dsmall.inp  

Result

This iPIC3D-GPU will create folder (usually named data) for the output results if it doesn't exist. However, it will delete everything in the folder if it already exits.

Build Options

Debug

By default, the software is built with Release build type, which means highly optimized by the compiler. If you'd like to debug, use:

cmake -DCMAKE_BUILD_TYPE=Debug ..

instead, and you'll have iPIC3D_d. If you'd like to just have an unoptimized (slow) version:

cmake -DCMAKE_BUILD_TYPE=Default ..

OpenMP

In this iPIC3D-GPU, the Solver stays on the CPU side, which means the number of MPI process will not only affect the GPU but also the Solver's performance.

To speedup the CPU part, the OpenMP is enabled by default:

cmake .. # default

cmake -DUSE_OPENMP=OFF .. # if you'd like to disable OpenMP

# set OpenMP threads for each MPI process
export OMP_NUM_THREADS=4

The solver on CPU will be benefited from OpenMP now, and this option is ON by default. It's important to control the number of threads per MPI process, make sure it's in a reasonable range.

Tool

Benchmark

In benchmark folder, we prepared some scripts for profiling, please read the benchmark/readme for more infomation.

There's a performance baseline file for your reference, 2 threads per process is used:

GH200

You can find the corresponding data at ./share/benchmark/GH200_release_baseline.csv.

Contact

Feel free to contact Professor Stefano Markidis at KTH for using iPIC3D.

About

the GPU version of iPIC3D supporting multi-node multi-GPU simulations

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •