Implement MPI functions for ArborX #53

Open
5 of 7 tasks
cwpearson opened this issue May 10, 2024 · 19 comments
cwpearson (Collaborator) commented May 10, 2024

@aprokop

  • MPI_Alltoall
  • MPI_Barrier
  • MPI_Allgather
    • in-place
  • MPI_Irecv
  • MPI_Isend
  • MPI Derived Datatypes

Probably skip these for a bare MPI wrapper:

  • MPI_Comm_dup / MPI_Comm_free
  • MPI_Init
  • MPI_Finalize
aprokop (Contributor) commented May 10, 2024

This sounds right. It would also be nice to do something about getting the rank and size from a communicator, but that's not a big deal.
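
For reference, the rank and size could be exposed as thin wrappers over the standard MPI calls (a minimal sketch; these helper names are assumptions, not an agreed API):

  #include <mpi.h>

  // Hypothetical convenience helpers around MPI_Comm_rank / MPI_Comm_size.
  inline int rank(MPI_Comm comm) {
    int r;
    MPI_Comm_rank(comm, &r);
    return r;
  }

  inline int size(MPI_Comm comm) {
    int s;
    MPI_Comm_size(comm, &s);
    return s;
  }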

Another pretty annoying thing we have to do is avoid passing --help to non-zero ranks, as otherwise Kokkos would print the help message on every rank. We do it like this:

  // Strip "--help" and "--kokkos-help" from the flags passed to Kokkos if we
  // are not on MPI rank 0 to prevent Kokkos from printing the help message
  // multiply.
  if (comm_rank != 0)
  {
    auto *help_it = std::find_if(argv, argv + argc, [](std::string const &x) {
      return x == "--help" || x == "--kokkos-help";
    });
    if (help_it != argv + argc)
    {
      std::swap(*help_it, *(argv + argc - 1));
      --argc;
    }
  }
  Kokkos::initialize(argc, argv);

aprokop (Contributor) commented May 10, 2024

Another thing that could be worth addressing at some point is being able to run with GPU-aware MPI implementations, rather than doing everything on the host. We have a pathway in ArborX for it (although we never saw it run faster; that was about 4 years ago, so maybe things are better now).

cwpearson (Collaborator, Author) commented:

I expect GPU-aware MPI to be pretty high on the priority list once we get some of these initial functions implemented!

janciesko (Collaborator) commented:

GPU-aware MPI installations should be supported out of the box. Note that the communication itself is still host-triggered.

dutkalex (Collaborator) commented May 13, 2024

GPU-aware MPI installations should be supported out of the box. Note that the communication itself is still host-triggered.

GPU-aware communications do work out of the box indeed, but a nice-to-have in my experience is a fallback mode that deep-copies onto the host, sends, and deep-copies back upon reception, for non-GPU-aware MPI installations (or badly installed, supposedly GPU-aware ones that just crash in practice...). It can easily be implemented with a few #ifdefs, and it lets users run on GPUs with a slight communication overhead instead of crashing with cryptic segfaults.

dssgabriel (Collaborator) commented:

So how do we handle this?

  1. Simply call the MPI function and pass the view's pointer to GPU memory (as we can with a GPU-aware MPI implementation)
  2. Deep copy into a host view and initiate the communication from the host

dssgabriel (Collaborator) commented May 13, 2024

@dutkalex wrote:

It can easily be implemented with a few #ifdefs, and it lets users run on GPUs with a slight communication overhead instead of crashing with cryptic segfaults.

We should implement this with if constexpr and let users define a preprocessor variable that tells us whether (a) they know their MPI implementation is GPU-aware and want the raw MPI call, or (b) they do not know whether it is GPU-aware and want the view deep-copied onto the host, with the communication initiated from the host.

Users will always know what to expect, and it will let us remove the fallback mechanisms from the critical path once the code is compiled.
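
A rough sketch of how such a preprocessor variable could feed an if constexpr branch (KOKKOSCOMM_GPU_AWARE and gpu_aware_mpi are assumed names, not an agreed API):

  #include <mpi.h>

  // Turn the user-provided macro into a constexpr flag usable with if constexpr,
  // so the unwanted path is discarded at compile time.
  #ifdef KOKKOSCOMM_GPU_AWARE
  inline constexpr bool gpu_aware_mpi = true;
  #else
  inline constexpr bool gpu_aware_mpi = false;
  #endif

  template <class View>
  void send(View const &v, int dest, int tag, MPI_Comm comm) {
    if constexpr (gpu_aware_mpi) {
      // pass v.data() (a possibly-device pointer) straight to MPI
    } else {
      // deep-copy v into a host mirror first, then call MPI on the mirror
    }
  }

The two branch bodies would be exactly the two paths sketched in the next comment.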

dutkalex (Collaborator) commented May 13, 2024

So how do we handle this?

To me, it's as simple as:

#include <mpi.h>
#include <Kokkos_Core.hpp>

template <class View>
void send(View const &device_view, int dest, int tag, MPI_Comm comm) {
  const int nbytes =
      static_cast<int>(device_view.span() * sizeof(typename View::value_type));
#ifdef KOKKOSCOMM_GPU_AWARE
  // GPU-aware MPI: hand the device pointer to MPI directly.
  MPI_Send(device_view.data(), nbytes, MPI_BYTE, dest, tag, comm);
#else
  // Fallback: stage through a host mirror, then send from the host.
  auto mirror = Kokkos::create_mirror_view(device_view);
  Kokkos::deep_copy(mirror, device_view);
  MPI_Send(mirror.data(), nbytes, MPI_BYTE, dest, tag, comm);
#endif
}

template <class View>
void recv(View const &device_view, int src, int tag, MPI_Comm comm) {
  const int nbytes =
      static_cast<int>(device_view.span() * sizeof(typename View::value_type));
#ifdef KOKKOSCOMM_GPU_AWARE
  // GPU-aware MPI: receive straight into device memory.
  MPI_Recv(device_view.data(), nbytes, MPI_BYTE, src, tag, comm, MPI_STATUS_IGNORE);
#else
  // Fallback: receive into a host mirror, then copy back to the device.
  auto mirror = Kokkos::create_mirror_view(device_view);
  MPI_Recv(mirror.data(), nbytes, MPI_BYTE, src, tag, comm, MPI_STATUS_IGNORE);
  Kokkos::deep_copy(device_view, mirror);
#endif
}

You can replace the #ifdefs with if constexpr branches if you prefer, but the result will be the same, @dssgabriel.

aprokop (Contributor) commented May 13, 2024

I would argue that the GPU-aware call should be a runtime decision, not a compile-time one.

dutkalex (Collaborator) commented:

I would argue that the GPU-aware call should be a runtime decision, not a compile-time one.

Why would you want both compiled in? When linking against an MPI implementation, GPU-aware communications either work or they don't...
I suppose, however, that making this a runtime switch wouldn't incur any measurable overhead, so in that regard why not, but I don't see a use case for it... Maybe to measure the overhead of the degraded mode? 🤔

aprokop (Contributor) commented May 13, 2024

A user may want to dispatch the kernel on the host or the device depending on the size of the data or the MPI call. I think that if a view is on the host, it makes sense to use host MPI routines; if it's on the device, use GPU-aware routines if available, and transfer to the host if not.

dutkalex (Collaborator) commented May 13, 2024

I agree, but the compile-time solution already enables this: if the view is on the host, you get a regular host MPI call regardless of the branch, because the pointer passed to MPI is a host pointer and the deep copy becomes a no-op.

cedricchevalier19 (Member) commented:

So how do we handle this?

To me, it's as simple as:

...

It is more complicated for asynchronous communications, and I am slightly annoyed by the silent extra allocation.
However, we should handle this case. This scenario is one of the motivating examples for this project.

dutkalex (Collaborator) commented May 27, 2024

It is more complicated for asynchronous communications

Agreed. Async semantics require hooking the recv deep_copy into the wait() call and making sure the host views' lifetimes are managed properly. Nothing infeasible, but it's a little more work (see the sketch below).

and I am slightly annoyed by the silent extra allocation.

Agreed, this is suboptimal. I don't see an obvious solution to this problem, but I bet we can come up with something better. Or at the very least more transparent to the user...
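
One possible shape for hooking the copy-back into wait() on the non-GPU-aware fallback path (a minimal sketch; Req, irecv, and every other name here is hypothetical, not an agreed API):

  #include <mpi.h>
  #include <Kokkos_Core.hpp>

  // Hypothetical request wrapper: it keeps the host staging buffer alive and
  // performs the copy back to the device when the user waits.
  template <class View>
  class Req {
   public:
    Req(MPI_Request req, View dst, typename View::HostMirror staging)
        : req_(req), dst_(dst), staging_(staging) {}

    void wait() {
      MPI_Wait(&req_, MPI_STATUS_IGNORE);
      // Effectively a no-op if dst_ already lives on the host
      // (staging_ aliases it in that case).
      Kokkos::deep_copy(dst_, staging_);
    }

   private:
    MPI_Request req_;
    View dst_;                           // destination (possibly device) view
    typename View::HostMirror staging_;  // host buffer MPI actually writes into
  };

  template <class View>
  Req<View> irecv(View const &v, int src, int tag, MPI_Comm comm) {
    auto staging = Kokkos::create_mirror_view(v);
    MPI_Request req;
    MPI_Irecv(staging.data(),
              static_cast<int>(staging.span() * sizeof(typename View::value_type)),
              MPI_BYTE, src, tag, comm, &req);
    return Req<View>(req, v, staging);
  }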

aprokop (Contributor) commented Jun 11, 2024

I have an ArborX branch for kokkos-comm. Currently:

cwpearson (Collaborator, Author) commented:

How do you want the custom datatypes to look? Is it a Kokkos::View of things that are described by custom datatypes?

aprokop (Contributor) commented Jun 12, 2024

How do you want the custom datatypes to look? Is it a Kokkos::View of things that are described by custom datatypes?

It is a Kokkos::View<Box*>, where Box is a struct of PODs.

cwpearson (Collaborator, Author) commented:

Do you ship the whole Box or just some of the fields?

aprokop (Contributor) commented Jun 12, 2024

The whole thing. Right now, just through n * sizeof(Box) and MPI_BYTE.
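
If the "MPI Derived Datatypes" item from the checklist gets picked up, one alternative to raw MPI_BYTE is a contiguous derived type over Box (a hedged sketch; send_boxes and its parameters are made up for illustration):

  #include <mpi.h>
  #include <Kokkos_Core.hpp>

  // Hypothetical helper: ship a contiguous View<Box*> through a derived
  // datatype instead of n * sizeof(Box) bytes. Assumes Box is trivially
  // copyable (a struct of PODs, as described above).
  template <class Box>
  void send_boxes(Kokkos::View<Box *> boxes, int dest, int tag, MPI_Comm comm) {
    MPI_Datatype box_type;
    MPI_Type_contiguous(static_cast<int>(sizeof(Box)), MPI_BYTE, &box_type);
    MPI_Type_commit(&box_type);
    MPI_Send(boxes.data(), static_cast<int>(boxes.extent(0)), box_type, dest,
             tag, comm);
    MPI_Type_free(&box_type);
  }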
