June 17 2024

Carl Pearson edited this page Jun 17, 2024 · 2 revisions

Attendees: Carl, Nicole, Evan, Damien, Gabriel, Junchao, Stephen, Jan, Joseph Schuchart

Note-taker

Carl

General Topics

  • https://github.com/kokkos/kokkos-comm/pull/81

  • The name

  • Init / finalize (Damien, Joseph, Gabriel)

    • why not just say "only valid when MPI & Kokkos are valid"
      • Damien: can't really take init back once you give it. May make sense to wait and see if users find it too hard to manage the initialization themselves.
      • Jan: it does "leak" some implementation details
        • Carl: we have to support interop anyway, which implies leaking
      • automatically handle multiple "backends"
        • Damien: application will know if they configured with NCCL enabled, and then they will need to init NCCL
      • opportunity to manage our own communication handles (e.g. ignore ordering semantics for tags), don't want to change settings on a communicator someone hands to us
        • need a defined point in time when this happens
        • if we have our own communicator, users can't provide us custom communicators
    • Damien: suggests that our init should always call MPI_Init and Kokkos::initialize and not handle any hybrid cases, which complicate things. It is hard or impossible to tell whether Kokkos was previously initialized with other arguments, or to reconcile different initializations. We should require the user to handle these cases, since what needs to happen may be application-specific.
    • Carl/Damien/Joseph: maybe just do nothing until we get some experience with a partial NCCL backend and see what we need
    • Gabriel: Without sessions, the init/finalize PR I wrote may not be very interesting, and we can table it for now
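
For reference, the "only valid when MPI & Kokkos are valid" option leaves the user responsible for call ordering. A minimal sketch of that ordering follows, assuming the conventional sequence of MPI first, then Kokkos, torn down in reverse. `ScopedRuntime` and `lifecycle_order` are hypothetical names for illustration, not part of kokkos-comm, and the logged strings stand in for the real `MPI_Init` / `Kokkos::initialize` / `Kokkos::finalize` / `MPI_Finalize` calls so the sketch builds without either library.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of kokkos-comm): runs `init` immediately
// and `fini` when the guard leaves scope.
class ScopedRuntime {
 public:
  ScopedRuntime(const std::function<void()>& init, std::function<void()> fini)
      : fini_(std::move(fini)) {
    init();
  }
  ~ScopedRuntime() { fini_(); }
  ScopedRuntime(const ScopedRuntime&) = delete;
  ScopedRuntime& operator=(const ScopedRuntime&) = delete;

 private:
  std::function<void()> fini_;
};

// Records the order of lifecycle events. Each string is a placeholder
// for the real call of the same name.
std::vector<std::string> lifecycle_order() {
  std::vector<std::string> log;
  {
    ScopedRuntime mpi([&] { log.push_back("MPI_Init"); },
                      [&] { log.push_back("MPI_Finalize"); });
    ScopedRuntime kokkos([&] { log.push_back("Kokkos::initialize"); },
                         [&] { log.push_back("Kokkos::finalize"); });
    // Communication calls would be valid only inside this scope.
    log.push_back("communicate");
  }  // guards unwind in reverse order: Kokkos first, then MPI
  return log;
}
```

In a real application the lambdas would wrap the actual MPI and Kokkos calls. The point of the sketch is only that the library itself never needs to own initialization under this option.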

Working Groups

  • Kokkos Comm
    • Gabriel, Nicole, Evan looking at NCCL
    • CI build for nvcc, Carl will work on unit tests at Sandia
    • Nicole working on duplicating OSU microbenchmarks
  • Application Usecases
    • Andrey Prokopenko (ORNL) is working on an ArborX fork
    • may want to maintain our own copies of some mini/proxy apps where we are interested but don't have any maintainer buy-in
  • Modern C++ / MPI
  • Accelerator-Initiated Communication / Support
  • Smart NICs

Notes

  • Gabriel

    • docs for MPI init snippet we expect
    • looking at NCCL
    • fix skipped rsend tests (when we lacked irecv)
      • may need the higher-level irecv
    • Hosting Kokkos workshop, will probably discuss naming w/Cedric this week. Christian will be there next week
    • will nudge CEA people to make their naming arguments once more
  • Junchao

    • Clear examples in the docs, especially once we're talking about MPI + NCCL
      • stream-triggered in NCCL looks promising
        • Carl: this we want to support
      • neighborhood communication may be interesting as well
  • Jan

    • ExaMiniMD port for remote-spaces, may be a useful comparison point for kokkos-comm
    • Andrey's PR about mpi_datatype fallback to mpi_byte
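
As an illustration of what a byte fallback can look like (a guess at the general idea, not taken from Andrey's PR): element types with a known MPI datatype map to it directly, and any other trivially copyable type is sent as raw bytes with the count scaled by `sizeof(T)`. The `Datatype` enum is a stand-in for `MPI_Datatype` handles such as `MPI_INT`, `MPI_DOUBLE`, and `MPI_BYTE`, so the sketch compiles without `<mpi.h>`. `TypeDesc`, `describe`, and `Vec3` are hypothetical names. Requires C++17.

```cpp
#include <cstddef>
#include <type_traits>

// Stand-in for MPI_Datatype handles (assumption: real code would use
// MPI_INT, MPI_DOUBLE, MPI_BYTE from <mpi.h>).
enum class Datatype { Int, Double, Byte };

// Datatype plus element count, as would be passed to a send/recv call.
struct TypeDesc {
  Datatype type;
  std::size_t count;
};

// Hypothetical user type with no dedicated datatype.
struct Vec3 {
  float x, y, z;
};

// Maps n elements of T to a (datatype, count) pair, falling back to
// bytes for types without a direct mapping.
template <typename T>
constexpr TypeDesc describe(std::size_t n) {
  static_assert(std::is_trivially_copyable_v<T>,
                "byte fallback is only safe for trivially copyable types");
  if constexpr (std::is_same_v<T, int>) {
    return {Datatype::Int, n};
  } else if constexpr (std::is_same_v<T, double>) {
    return {Datatype::Double, n};
  } else {
    return {Datatype::Byte, n * sizeof(T)};  // count is in bytes here
  }
}
```

A byte fallback like this ignores representation differences between heterogeneous ranks, so it is only a reasonable default when all ranks share the same data layout.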