OpenMP offload

To enable OpenMP offload to GPUs in QMCPACK, use the following cmake flag.

-DENABLE_OFFLOAD=1

Nvidia GPU

To use the CUDA math libraries together with OpenMP offload, also add the following cmake flag.

-DENABLE_CUDA=1 # This is not the QMC_CUDA flag for the CUDA kernels.
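
After configuring, one way to confirm that the offload flags were recorded is to look them up in the CMakeCache.txt file that cmake writes to the build directory. This is a minimal sketch; the helper name show_offload_flags is made up for illustration and is not part of QMCPACK.

```shell
# Print the ENABLE_OFFLOAD and ENABLE_CUDA entries of a CMake cache file.
# $1: path to CMakeCache.txt (defaults to the file in the current directory).
# Cache entries have the form NAME:TYPE=VALUE, so match up to the colon.
show_offload_flags() {
  grep -E '^ENABLE_(OFFLOAD|CUDA):' "${1:-CMakeCache.txt}"
}
```

Run it from the build directory as `show_offload_flags`, or pass the path to a cache file explicitly.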

XL

The Linux version of XL 16.1.1 is not fully C++14 compliant, but it supports enough of the standard for current QMCPACK needs. Use -qxflag=disable__cplusplusOverride to override the C++ macro and enable C++14 features. Use the following cmake line on Summit (P9+V100):

cmake -DCMAKE_C_COMPILER=mpixlc -DCMAKE_CXX_COMPILER=mpixlC \
      -DENABLE_OFFLOAD=1 -DENABLE_CUDA=1 \
      -DCMAKE_CXX_FLAGS="-qxflag=disable__cplusplusOverride -isystem /sw/summit/gcc/6.4.0/include/c++/6.4.0/powerpc64le-none-linux-gnu -qgcc_cpp_stdinc=/sw/summit/gcc/6.4.0/include/c++/6.4.0" \
      -DCMAKE_CXX_STANDARD_LIBRARIES=/sw/summit/gcc/6.4.0/lib64/libstdc++.a \
      ..

Clang

Although the LLVM Clang compiler supports OpenMP offload, a few outstanding bugs prevent it from compiling and running QMCPACK. Known issues:

1. Only CUDA 10.0 and below are supported. https://bugs.llvm.org/show_bug.cgi?id=44587
2. A cmath/math.h header file conflict affects x86. https://bugs.llvm.org/show_bug.cgi?id=42061, https://bugs.llvm.org/show_bug.cgi?id=42798, https://bugs.llvm.org/show_bug.cgi?id=42799
3. Static linking of fat binaries is still broken and causes runtime errors. https://bugs.llvm.org/show_bug.cgi?id=42395 and https://bugs.llvm.org/show_bug.cgi?id=38703
4. The offload library is single-threaded and uses the default CUDA stream, which constrains performance. http://lists.llvm.org/pipermail/openmp-dev/2019-December/002986.html
5. When OpenMP offload and CUDA are both enabled with the Clang compiler, some CUDA executions fail. (Only checked with Clang 8; not rechecked recently because of issues 1, 2, and 3.)
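
The CUDA version limit listed above (CUDA 10.0 and below) can be checked before configuring. The following is a minimal sketch, assuming the version is available as a major.minor string; the function name cuda_supported_by_clang is hypothetical.

```shell
# Succeed when a CUDA version string is 10.0 or older, fail otherwise.
# $1: version in major.minor form, such as "10.0" or "9.2".
cuda_supported_by_clang() {
  major=${1%%.*}      # text before the first dot
  rest=${1#*.}        # text after the first dot
  minor=${rest%%.*}   # drop any patch component
  [ "$major" -lt 10 ] || { [ "$major" -eq 10 ] && [ "$minor" -eq 0 ]; }
}
```

For example, `cuda_supported_by_clang 10.0 && echo ok` prints ok, while the same call with 10.1 prints nothing. How the version string is obtained is left to the user; reading it from the output of `nvcc --version` is one option.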

Cray

The Clang-derived Cray compiler 9.0 can compile QMCPACK but cannot link it.

cmake -DCMAKE_C_COMPILER=cc -DCMAKE_CXX_COMPILER=CC \
      -DENABLE_OFFLOAD=1 -DENABLE_CUDA=1 \
      -DQMC_MIXED_PRECISION=1 -DCUDA_ARCH=sm_70 \
      -DCUDA_HOST_COMPILER=`which gcc` -DENABLE_TIMERS=1 ..

Known issues:

Clang issue #2 affects Cray 9.1 and 10.
Fat binary linker error for modf, sincos, and sincosf with Cray 9.0:

@E@nvlink error   : Undefined reference to 'modf' in '/tmp/cooltmp-fed625/tmp_cce_omp_offload_linkerlibqmcwfs.a__SplineC2ROMP.cpp.o__sec.cubin'

Only the default stream is used in the Cray 9.0 OpenMP runtime library.

AMD GPU

AOMP

Use the AOMP compiler. Verified with the 0.7-6 release on a Radeon VII.

cmake -D CMAKE_C_COMPILER=/usr/lib/aomp/bin/clang  -D CMAKE_CXX_COMPILER=/usr/lib/aomp/bin/clang++ \
      -D CMAKE_C_FLAGS="-march=native" \
      -D CMAKE_CXX_FLAGS="-march=native -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906" \
      -D OFFLOAD_TARGET="amdgcn-amd-amdhsa" \
      -D CMAKE_FIND_ROOT_PATH=/opt/math-libraries/OpenBLAS/current \
      -D QMC_MPI=0 -D ENABLE_OFFLOAD=1 ..

Due to Clang issue 4, libomptarget is only safe to use with 1 thread. AOMP supports multiple GPU queues, and the data race in libomptarget causes multi-threaded runs to fail. https://github.com/ROCm-Developer-Tools/aomp/issues/23
Excessive register use reduces performance. https://github.com/ROCm-Developer-Tools/aomp/issues/24
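
Given the single-thread restriction above, one way to keep a run on one host thread is to set the standard OpenMP environment variable OMP_NUM_THREADS before launching. This is a minimal sketch; the executable path and input file name in the commented line are placeholders and do not come from this page.

```shell
# Limit the host OpenMP runtime to a single thread for this shell session.
export OMP_NUM_THREADS=1

# Placeholder run line (path and input file are assumptions):
# ./bin/qmcpack input.xml
```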
