
KOKKOS Branch

Luke Shulenburger edited this page Jun 7, 2019 · 8 revisions

Build instructions

To build with Kokkos, first clone github.com/kokkos/kokkos.git. We assume the top-level Kokkos directory is ${KOKKOS_ROOT}. Then navigate to the miniqmc/build directory. The following cmake command builds for a Power8-based CPU with OpenMP threading, assuming the GCC compiler is used:

> cmake -DQMC_USE_KOKKOS=1 \
      -DKOKKOS_PREFIX=${KOKKOS_ROOT} \
      -DKOKKOS_ARCH="Power8" \
      -DKOKKOS_ENABLE_OPENMP=true \
      -DKOKKOS_ENABLE_EXPLICIT_INSTANTIATION=false \
      -DCMAKE_CXX_FLAGS="-Drestrict=__restrict__ -D__forceinline=inline" .. 

The CMAKE_CXX_FLAGS definitions handle the "restrict" and "__forceinline" keywords that appear in miniqmc by mapping them to spellings that GCC accepts (__restrict__ and inline). For other compilers, consult the compiler's reference for the analogous keywords.

For CUDA, assuming a Power8 host and an NVIDIA P100 card, we use the following:

> cmake -DQMC_USE_KOKKOS=1 \
      -DKOKKOS_PREFIX=${KOKKOS_ROOT} \
      -DKOKKOS_ENABLE_CUDA=true \
      -DKOKKOS_ENABLE_OPENMP=false \
      -DKOKKOS_ARCH="Power8;Pascal60" \
      -DKOKKOS_ENABLE_CUDA_UVM=true \
      -DKOKKOS_ENABLE_CUDA_LAMBDA=true \
      -DKOKKOS_ENABLE_EXPLICIT_INSTANTIATION=false \
      -DCMAKE_CXX_COMPILER=${KOKKOS_ROOT}/bin/nvcc_wrapper \
      -DCMAKE_CXX_FLAGS="-Drestrict=__restrict__ -D__forceinline=inline " .. 

KOKKOS_ENABLE_CUDA=true and KOKKOS_ENABLE_CUDA_UVM=true must both be set for this build. Note also that CMAKE_CXX_COMPILER points to the Kokkos nvcc wrapper rather than to a host compiler.
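Because the CUDA configure line depends on ${KOKKOS_ROOT}/bin/nvcc_wrapper, it can help to confirm that the wrapper exists before running cmake. The following is a minimal sketch; the function name check_nvcc_wrapper is hypothetical and not part of Kokkos or miniqmc.

```shell
# Hypothetical helper: succeed only if the given Kokkos checkout contains an
# executable bin/nvcc_wrapper, which the CUDA configure line relies on.
check_nvcc_wrapper() {
  wrapper="$1/bin/nvcc_wrapper"
  if [ -x "$wrapper" ]; then
    echo "found: $wrapper"
  else
    echo "missing or not executable: $wrapper" >&2
    return 1
  fi
}

# Example: check the checkout pointed to by KOKKOS_ROOT, if that variable is set.
if [ -n "${KOKKOS_ROOT:-}" ]; then
  check_nvcc_wrapper "${KOKKOS_ROOT}" || echo "fix KOKKOS_ROOT before running cmake" >&2
fi
```

If the check fails, the usual cause is a KOKKOS_ROOT that points somewhere other than the top level of the Kokkos checkout.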

Runtime Instructions

OpenMP

It is recommended that the following environment variables be set at run time:

  • export OMP_PROC_BIND=spread
  • export OMP_PLACES=threads
  • export OMP_NUM_THREADS=[put available/desired number of threads here]
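As a concrete starting point, the settings above can be collected in a small script fragment. This is a sketch only: using the number of online logical CPUs as the thread count is an assumption made here, not a recommendation from miniqmc, and the NUM_THREADS variable is introduced purely for illustration.

```shell
# Use the number of online logical CPUs as a starting point (an assumption;
# adjust for your node). Falls back to 1 if getconf is unavailable.
NUM_THREADS="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"

export OMP_PROC_BIND=spread   # spread threads evenly across the available places
export OMP_PLACES=threads     # one place per hardware thread
export OMP_NUM_THREADS="${NUM_THREADS}"

echo "Running with OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```

Source the fragment (or paste it into a job script) before launching the miniqmc binary so that the variables are visible to the OpenMP runtime.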

Moreover, for asynchronous multi-walker moves using the partition_master construct, nested threading must be enabled. This is done with the following:

  • export OMP_NESTED=true
  • export OMP_NUM_THREADS=[comma separated list]

See https://gcc.gnu.org/onlinedocs/libgomp/OMP_005fNUM_005fTHREADS.html for the format of this list.
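For illustration, a two-level setup might look like the following. The thread counts 4 and 8 are arbitrary example values, and the OUTER_THREADS and INNER_THREADS variables are hypothetical names used only in this sketch; choose values whose product fits your node.

```shell
# Illustrative values only: 4 threads at the outer level, 8 at the nested level.
OUTER_THREADS=4
INNER_THREADS=8

export OMP_NESTED=true
# One entry per nesting level: first the outermost parallel region, then the nested one.
export OMP_NUM_THREADS="${OUTER_THREADS},${INNER_THREADS}"

echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"   # prints OMP_NUM_THREADS=4,8
```

OpenMP 5.0 deprecates OMP_NESTED in favor of OMP_MAX_ACTIVE_LEVELS, so with a newer runtime you may need to set that variable instead; check your compiler's documentation.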

Instructions for the new global_batched_kokkos branch

To build, clone Kokkos and check out its development branch, then clone miniqmc and check out its global_batched_kokkos branch. For example:

> mkdir kokkos
> cd kokkos
> git clone https://github.com/kokkos/kokkos.git .
> git checkout develop
> cd ..
> mkdir miniqmc
> cd miniqmc
> git clone https://github.com/QMCPACK/miniqmc.git .
> git checkout global_batched_kokkos
> cd build
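The cmake commands in these instructions refer to ${KOKKOS_ROOT}. With the directory layout created by the commands above (kokkos and miniqmc side by side), one way to set it from inside miniqmc/build is sketched here. The helper name kokkos_root_from_build_dir is hypothetical, and the sketch assumes that exact layout.

```shell
# Hypothetical helper: given the miniqmc build directory, print the absolute path
# of the sibling Kokkos checkout. Assumes <top>/kokkos next to <top>/miniqmc/build.
kokkos_root_from_build_dir() {
  (cd "$1/../../kokkos" 2>/dev/null && pwd)
}

# From inside miniqmc/build:
if root="$(kokkos_root_from_build_dir "$PWD")"; then
  export KOKKOS_ROOT="$root"
  echo "KOKKOS_ROOT=${KOKKOS_ROOT}"
else
  echo "no kokkos directory found two levels up from $PWD" >&2
fi
```

If Kokkos lives elsewhere, simply export KOKKOS_ROOT with the absolute path to that checkout.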

For a CPU build, first identify the architecture of the CPU you will use. Each entry below gives the architecture name followed by the CPU type it covers:

  • AMDAVX: AMD CPU
  • ARMv80: ARMv8.0 Compatible CPU
  • ARMv81: ARMv8.1 Compatible CPU
  • ARMv8-ThunderX: ARMv8 Cavium ThunderX CPU
  • BGQ: IBM Blue Gene Q
  • Power7: IBM POWER7 and POWER7+ CPUs
  • Power8: IBM POWER8 CPUs
  • Power9: IBM POWER9 CPUs
  • WSM: Intel Westmere CPUs
  • SNB: Intel Sandy/Ivy Bridge CPUs
  • HSW: Intel Haswell CPUs
  • BDW: Intel Broadwell Xeon E-class CPUs
  • SKX: Intel Sky Lake Xeon E-class HPC CPUs (AVX512)
  • KNC: Intel Knights Corner Xeon Phi
  • KNL: Intel Knights Landing Xeon Phi

Now set KOKKOS_ROOT to the directory containing Kokkos and pass the architecture name to cmake through KOKKOS_ARCH, like so:

cmake -DQMC_USE_KOKKOS=1 \
    -DQMC_MIXED_PRECISION=1 \
    -DKOKKOS_PREFIX=${KOKKOS_ROOT} \
    -DKOKKOS_ENABLE_CUDA=false \
    -DKOKKOS_ENABLE_OPENMP=true \
    -DKOKKOS_ARCH="SKX" \
    -DKOKKOS_ENABLE_EXPLICIT_INSTANTIATION=false \
    -DCMAKE_CXX_FLAGS="-Drestrict=__restrict__ -D__forceinline=inline" \
    -DCMAKE_CXX_COMPILER="icpc" \
    ..

Replace SKX with the appropriate architecture from the list above. Then build with:

make miniqmc_sync_move_noref

The code can now be run as bin/miniqmc_sync_move_noref -g "2 1 1" -n 5 -r 0.99 -16.

For the GPU, things are similar, but the list of GPU architectures includes:

  • Kepler30: NVIDIA Kepler generation CC 3.0
  • Kepler32: NVIDIA Kepler generation CC 3.2
  • Kepler35: NVIDIA Kepler generation CC 3.5
  • Kepler37: NVIDIA Kepler generation CC 3.7
  • Maxwell50: NVIDIA Maxwell generation CC 5.0
  • Maxwell52: NVIDIA Maxwell generation CC 5.2
  • Maxwell53: NVIDIA Maxwell generation CC 5.3
  • Pascal60: NVIDIA Pascal generation CC 6.0
  • Pascal61: NVIDIA Pascal generation CC 6.1
  • Volta70: NVIDIA Volta generation CC 7.0
  • Volta72: NVIDIA Volta generation CC 7.2
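If you know the compute capability (CC) of your card, the list above can be turned into a simple lookup. The sketch below covers only the architectures listed here; the function name kokkos_gpu_arch is hypothetical. On recent NVIDIA drivers the compute capability can reportedly be queried with nvidia-smi --query-gpu=compute_cap --format=csv,noheader, but availability of that field depends on the driver version, so treat it as an assumption.

```shell
# Hypothetical helper: map a CUDA compute capability (e.g. "6.0") to the
# KOKKOS_ARCH name from the list above. Unknown values print nothing and return 1.
kokkos_gpu_arch() {
  case "$1" in
    3.0) echo "Kepler30" ;;
    3.2) echo "Kepler32" ;;
    3.5) echo "Kepler35" ;;
    3.7) echo "Kepler37" ;;
    5.0) echo "Maxwell50" ;;
    5.2) echo "Maxwell52" ;;
    5.3) echo "Maxwell53" ;;
    6.0) echo "Pascal60" ;;
    6.1) echo "Pascal61" ;;
    7.0) echo "Volta70" ;;
    7.2) echo "Volta72" ;;
    *)   return 1 ;;
  esac
}

kokkos_gpu_arch "6.0"   # a P100 has CC 6.0; prints Pascal60
```

The GPU name is combined with the host architecture in KOKKOS_ARCH, separated by a semicolon, as in "Power8;Pascal60".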

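On a Power8 + P100 system, use the following, replacing the two /ascldap/... paths with the location of your own Kokkos checkout: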

cmake -DQMC_USE_KOKKOS=1 \
      -DQMC_MIXED_PRECISION=1 \
      -DKOKKOS_PREFIX=/ascldap/users/lshulen/new-sandbox/kokkos \
      -DKOKKOS_ENABLE_CUDA=true \
      -DKOKKOS_ENABLE_OPENMP=false \
      -DKOKKOS_ARCH="Power8;Pascal60" \
      -DKOKKOS_ENABLE_CUDA_UVM=false \
      -DKOKKOS_ENABLE_CUDA_LAMBDA=true \
      -DKOKKOS_ENABLE_DEBUG=true \
      -DCMAKE_CXX_COMPILER=/ascldap/users/lshulen/sandbox/kokkos/bin/nvcc_wrapper \
      -DCMAKE_CXX_FLAGS="-G0 -g -Drestrict=__restrict__ -D__forceinline=inline " ..

From there, the code is built and run in the same way as for the CPU build.
