This repository has been archived by the owner on May 20, 2024. It is now read-only.

Evaluate Kokkos in LowFlow context #40

Open
ian-bertolacci opened this issue Jul 15, 2019 · 2 comments
@ian-bertolacci

Implement a LowFlow mini-app variant using Kokkos to learn the infrastructure and its utility to the ParFlow project.

@ian-bertolacci ian-bertolacci self-assigned this Jul 15, 2019
@ian-bertolacci

Initial thoughts on Kokkos.
(Disclosure: I do not have a CUDA LowFlow variant yet.)
Implemented LowFlow using the Serial, Threads (using std::thread), and OpenMP Kokkos backends.
The port was relatively simple.
Requires C++11 (importantly, lambda functions).
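
To illustrate why lambdas matter here, below is a minimal sketch of the general pattern: the loop body is passed as a lambda to a function that owns the iteration, so the backend can be swapped without touching the body. This is plain C++11 and not Kokkos code; `SerialBackend`, `for_each_3d`, and `fill_sum` are hypothetical names invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical backend tag (not a Kokkos type). Other backends would add
// further overloads of for_each_3d with the same signature.
struct SerialBackend {};

// Runs body(i, j, k) over the 3D index space [0,nx) x [0,ny) x [0,nz).
template <class Body>
void for_each_3d(SerialBackend, int nx, int ny, int nz, const Body& body) {
  for (int i = 0; i < nx; ++i)
    for (int j = 0; j < ny; ++j)
      for (int k = 0; k < nz; ++k)
        body(i, j, k);
}

// Fills a flattened nx*ny*nz array with i + j + k, as in the loops in this thread.
std::vector<double> fill_sum(int nx, int ny, int nz) {
  assert(nx >= 0 && ny >= 0 && nz >= 0);
  std::vector<double> data(static_cast<std::size_t>(nx) * ny * nz);
  double* ptr = data.data();
  // The lambda captures by value, similar in spirit to a by-value device lambda.
  for_each_3d(SerialBackend{}, nx, ny, nz, [=](int i, int j, int k) {
    ptr[(i * ny + j) * nz + k] = static_cast<double>(i + j + k);
  });
  return data;
}
```

Calling `fill_sum(2, 3, 4)` yields 24 entries, with the last one (i=1, j=2, k=3) equal to 6.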

Difficulties:

  1. GPU code still (as far as I can currently tell) requires some management of data. It's relatively simple, but difficult to do automatically within macros.
  2. The heavily templated type system makes development difficult (for me, at least).

@ian-bertolacci

Some more about the data management.
Below are two examples. The first uses View objects: the data is allocated on the device, accessed on the host through the view's HostMirror, and copied from device to host with deep_copy. The second uses a single DualView, unwraps its device and host views, and uses the modify/sync calls to keep them consistent.

// Example 1: View + HostMirror + deep_copy
#include <cstdio>
#include <Kokkos_Core.hpp>

// Execution Policy
// Defines our Parallelism
typedef Kokkos::MDRangePolicy< Kokkos::Cuda, Kokkos::Rank< 3 > >
        Policy3D;

// View type for device
typedef Kokkos::View< double***, Kokkos::CudaSpace >
        device_view;

// View type for host
typedef device_view::HostMirror
        host_view;

int main (int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);

  // Scope the views so they are destroyed before Kokkos::finalize()
  {
    int nx = 2;
    int ny = 3;
    int nz = 4;

    // Create the execution policy (basically a domain)
    Policy3D domain(
      {{ 0,  0,  0}},
      {{nx, ny, nz}}
    );

    // Allocate the data on the device and create its host mirror
    device_view device_data( "data", nx, ny, nz );
    host_view host_data = Kokkos::create_mirror_view( device_data );

    // Execute on the device
    Kokkos::parallel_for( "test", domain,
      KOKKOS_LAMBDA (const int i, const int j, const int k ){
        device_data(i,j,k) = (double) i + j + k;
        printf( "on device data(%d,%d,%d) = %f\n", i, j, k, device_data(i,j,k) );
      }
    );

    // Copy data from device to host
    Kokkos::deep_copy(host_data, device_data);

    // Execute locally
    for( int i = 0; i < nx; ++i ){
      for( int j = 0; j < ny; ++j ){
        for( int k = 0; k < nz; ++k ){
          printf("on host data(%d,%d,%d) = %f\n", i, j, k, host_data(i,j,k) );
        }
      }
    }
  }

  Kokkos::finalize();
}

// Example 2: DualView + modify/sync
#include <cstdio>
#include <Kokkos_Core.hpp>
#include <Kokkos_DualView.hpp>

// Execution Policy
// Defines our Parallelism
typedef Kokkos::MDRangePolicy< Kokkos::Cuda, Kokkos::Rank< 3 > >
        Policy3D;

// DualView type
typedef Kokkos::DualView< double***, Kokkos::CudaSpace >
        dual_view;

// Unwrapped View type on device
typedef Kokkos::View< dual_view::scalar_array_type, dual_view::array_layout, dual_view::memory_space >
        device_view;

// Unwrapped View type on host
typedef Kokkos::View< dual_view::scalar_array_type, dual_view::array_layout, dual_view::host_mirror_space >
        host_view;

int main (int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);

  // Scope the views so they are destroyed before Kokkos::finalize()
  {
    int nx = 2;
    int ny = 3;
    int nz = 4;

    // Create the execution policy (basically a domain)
    Policy3D domain(
      {{ 0,  0,  0}},
      {{nx, ny, nz}}
    );

    // Create main view
    dual_view data( "data", nx, ny, nz );

    // Create view for accessing data on the device
    device_view device_data = data.template view< dual_view::memory_space >();

    // Create view for accessing data on the host
    host_view host_data = data.template view< dual_view::host_mirror_space >();

    // Execute on the device
    Kokkos::parallel_for( "test", domain,
      KOKKOS_LAMBDA (const int i, const int j, const int k ){
        device_data(i,j,k) = (double) i + j + k;
        printf( "on device data(%d,%d,%d) = %f\n", i, j, k, device_data(i,j,k) );
      }
    );

    // Mark the device view as modified
    data.modify< dual_view::memory_space >();
    // Tell Kokkos that the host view needs to be synchronized
    data.sync< dual_view::host_mirror_space >();

    // Execute locally
    for( int i = 0; i < nx; ++i ){
      for( int j = 0; j < ny; ++j ){
        for( int k = 0; k < nz; ++k ){
          printf("on host data(%d,%d,%d) = %f\n", i, j, k, host_data(i,j,k) );
        }
      }
    }
  }

  Kokkos::finalize();
}

The main issue is that, just as in OpenCL, we must make explicit calls (deep_copy, or modify/sync) on specific arrays, which requires knowing which arrays are used where and when. To do this purely with macros, a loop macro would need an extra input listing the names of the arrays it accesses so that they can be copied or unwrapped.
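
As a rough sketch of what such a macro interface might look like, the following uses a mock array type in place of a real DualView so that it runs without Kokkos or a GPU. Everything here is an assumption made for illustration: `MockDualArray`, `sync_all_to_device`, `mark_all_device_modified`, and `DEVICE_LOOP` are hypothetical names, both buffers live in host memory, and the split into separate "reads" and "writes" lists is just one possible design.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a DualView-like array: two buffers plus dirty flags.
// Both buffers are in host memory; "device" is only simulated.
struct MockDualArray {
  std::vector<double> host, device;
  bool host_dirty = false, device_dirty = false;
  int copies = 0;  // number of simulated transfers performed
  explicit MockDualArray(std::size_t n) : host(n, 0.0), device(n, 0.0) {}
  void modify_host()   { host_dirty = true; }
  void modify_device() { device_dirty = true; }
  void sync_device() { if (host_dirty)   { device = host; host_dirty = false;   ++copies; } }
  void sync_host()   { if (device_dirty) { host = device; device_dirty = false; ++copies; } }
};

// Variadic helpers applied to every array named in a list.
inline void sync_all_to_device() {}
template <class... Rest>
void sync_all_to_device(MockDualArray& a, Rest&... rest) {
  a.sync_device();
  sync_all_to_device(rest...);
}

inline void mark_all_device_modified() {}
template <class... Rest>
void mark_all_device_modified(MockDualArray& a, Rest&... rest) {
  a.modify_device();
  mark_all_device_modified(rest...);
}

// Loop macro that takes parenthesized lists of the arrays it reads and writes.
// Both lists are synced to the device first; only the writes are marked modified.
#define DEVICE_LOOP(n, idx, reads, writes, ...)        \
  do {                                                 \
    sync_all_to_device reads;                          \
    sync_all_to_device writes;                         \
    for (std::size_t idx = 0; idx < (n); ++idx)        \
      __VA_ARGS__                                      \
    mark_all_device_modified writes;                   \
  } while (0)

// Usage: b = 2 * a on the simulated device, then read b back on the host.
double demo_sum() {
  const std::size_t n = 4;
  MockDualArray a(n), b(n);
  for (std::size_t i = 0; i < n; ++i) a.host[i] = static_cast<double>(i);
  a.modify_host();

  DEVICE_LOOP(n, i, (a), (b), { b.device[i] = 2.0 * a.device[i]; });

  b.sync_host();  // host access still needs an explicit sync
  assert(a.copies == 1 && b.copies == 1);
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) sum += b.host[i];
  return sum;
}
```

The cost of this approach is visible at the call site: every loop has to name its arrays, and a wrong or incomplete list would silently read stale data.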
