OpenMP offload
Ye Luo edited this page Jan 23, 2020 · 42 revisions
To enable OpenMP offload to GPUs in QMCPACK, pass the following cmake flag:
-DENABLE_OFFLOAD=1
To use the CUDA math libraries together with OpenMP offload, also add the following cmake flag:
-DENABLE_CUDA=1 # This is not the QMC_CUDA flag for the CUDA kernels.
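The two flags above can be combined in one configure step. The sketch below assembles the argument list in a shell variable and prints the resulting command rather than running it. The names USE_CUDA_MATH and CMAKE_ARGS are illustrative only and are not QMCPACK options.

```shell
# Illustrative toggle (not a QMCPACK option): set to 0 to skip the CUDA math libraries.
USE_CUDA_MATH="${USE_CUDA_MATH:-1}"

# OpenMP offload is always requested in this sketch.
CMAKE_ARGS="-DENABLE_OFFLOAD=1"

# Add the CUDA math library flag only when requested.
if [ "$USE_CUDA_MATH" = "1" ]; then
  CMAKE_ARGS="$CMAKE_ARGS -DENABLE_CUDA=1"
fi

# Print the command; drop the echo to run it from a build directory.
echo cmake $CMAKE_ARGS ..
```

Compiler selection and compiler-specific flags still have to be added per platform, as in the full command lines on this page.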
The Linux version of IBM XL 16.1.1 is not fully C++14 compliant, but it supports enough of the standard for current QMCPACK needs. Use -qxflag=disable__cplusplusOverride
to override the __cplusplus macro and enable C++14 features.
On Summit (POWER9 + V100), use the following cmake line:
cmake -DCMAKE_C_COMPILER=mpixlc -DCMAKE_CXX_COMPILER=mpixlC \
-DENABLE_OFFLOAD=1 -DENABLE_CUDA=1 \
-DCMAKE_CXX_FLAGS="-qxflag=disable__cplusplusOverride -isystem /sw/summit/gcc/6.4.0/include/c++/6.4.0/powerpc64le-none-linux-gnu -qgcc_cpp_stdinc=/sw/summit/gcc/6.4.0/include/c++/6.4.0" \
-DCMAKE_CXX_STANDARD_LIBRARIES=/sw/summit/gcc/6.4.0/lib64/libstdc++.a \
..
Although the LLVM Clang compiler supports OpenMP offload, a few outstanding bugs prevent it from compiling and running QMCPACK. Known issues:
1. Only CUDA 10.0 and below are supported. https://bugs.llvm.org/show_bug.cgi?id=44587
2. A cmath/math.h header file conflict affects x86. https://bugs.llvm.org/show_bug.cgi?id=42061, https://bugs.llvm.org/show_bug.cgi?id=42798, https://bugs.llvm.org/show_bug.cgi?id=42799
3. Static linking of fat binaries is still broken and causes runtime errors. https://bugs.llvm.org/show_bug.cgi?id=42395 and https://bugs.llvm.org/show_bug.cgi?id=38703
4. The offload library is single threaded and uses the default CUDA stream, which constrains performance. http://lists.llvm.org/pipermail/openmp-dev/2019-December/002986.html
5. When OpenMP offload and CUDA are both enabled with the Clang compiler, some CUDA execution fails. (Only checked with Clang 8; not rechecked recently because of issues 1, 2, and 3.)
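The CUDA version limit in the list above can be checked before configuring. The following is a minimal sketch, assuming the version is available as a plain X.Y string; the helper name cuda_ok_for_clang is hypothetical and not part of QMCPACK or Clang.

```shell
# Hypothetical helper: succeeds when the given CUDA version is 10.0 or lower.
# Assumes a numeric version string such as "10.1" or "9.2.148".
cuda_ok_for_clang() {
  ver="$1"
  major="${ver%%.*}"              # text before the first dot, e.g. "10"
  case "$ver" in
    *.*) minor="${ver#*.}"        # text after the first dot, e.g. "1.243"
         minor="${minor%%.*}" ;;  # keep only the minor number, e.g. "1"
    *)   minor=0 ;;               # no dot given: treat as X.0
  esac
  [ "$major" -lt 10 ] || { [ "$major" -eq 10 ] && [ "$minor" -eq 0 ]; }
}

# On a real system the version could be read from nvcc, for example:
#   nvcc --version | sed -n 's/.*release \([0-9.]*\).*/\1/p'
for v in 9.2 10.0 10.1; do
  if cuda_ok_for_clang "$v"; then
    echo "CUDA $v: within the supported range"
  else
    echo "CUDA $v: too new for Clang OpenMP offload"
  fi
done
```

The 10.0 threshold is taken from the known issue above and would need updating once the linked bug is resolved.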
The Clang-derived Cray compiler 9.0 can compile but cannot link QMCPACK.
cmake -DCMAKE_C_COMPILER=cc -DCMAKE_CXX_COMPILER=CC \
-DENABLE_OFFLOAD=1 -DENABLE_CUDA=1 \
-DQMC_MIXED_PRECISION=1 -DCUDA_ARCH=sm_70 \
-DCUDA_HOST_COMPILER=`which gcc` -DENABLE_TIMERS=1 ..
Known issues:
- Clang issue #2 (the cmath/math.h header conflict) also affects Cray 9.1 and 10.
- Fat binary linker errors for modf, sincos, and sincosf with Cray 9.0:
@E@nvlink error : Undefined reference to 'modf' in '/tmp/cooltmp-fed625/tmp_cce_omp_offload_linkerlibqmcwfs.a__SplineC2ROMP.cpp.o__sec.cubin'
- Only the default stream is used in the Cray 9.0 OpenMP runtime library.
Using the AOMP compiler. Verified with the 0.7-6 release on a Radeon VII.
cmake -D CMAKE_C_COMPILER=/usr/lib/aomp/bin/clang -D CMAKE_CXX_COMPILER=/usr/lib/aomp/bin/clang++ \
-D CMAKE_C_FLAGS="-march=native" \
-D CMAKE_CXX_FLAGS="-march=native -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906" \
-D OFFLOAD_TARGET="amdgcn-amd-amdhsa" \
-D CMAKE_FIND_ROOT_PATH=/opt/math-libraries/OpenBLAS/current \
-D QMC_MPI=0 -D ENABLE_OFFLOAD=1 ..
- Because of Clang issue 4 (the single-threaded offload library), libomptarget is only safe to use with one thread. AOMP supports multiple GPU queues, and the data race in libomptarget causes multi-threaded runs to fail. https://github.com/ROCm-Developer-Tools/aomp/issues/23
- Excessive register use reduces performance. https://github.com/ROCm-Developer-Tools/aomp/issues/24
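Given the libomptarget data race noted above, one conservative way to run an AOMP build is to restrict the host OpenMP runtime to a single thread through the standard OMP_NUM_THREADS environment variable. This is only a sketch: the binary path and input file name are placeholders, and this page does not confirm that the setting avoids the race in every case.

```shell
# Limit the host OpenMP runtime to one thread (standard OpenMP environment variable).
export OMP_NUM_THREADS=1

# Placeholder paths: adjust to the actual build tree and input file.
QMCPACK_BIN="${QMCPACK_BIN:-./bin/qmcpack}"
INPUT_XML="${INPUT_XML:-input.xml}"

# Run only if the binary exists, so the snippet is harmless elsewhere.
if [ -x "$QMCPACK_BIN" ]; then
  "$QMCPACK_BIN" "$INPUT_XML"
else
  echo "qmcpack binary not found at $QMCPACK_BIN; OMP_NUM_THREADS=$OMP_NUM_THREADS"
fi
```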