
Support MPI #752 (Draft)

wants to merge 5 commits into main

Conversation

@mofeing (Collaborator) commented Feb 15, 2025

This PR:

  • Registers MPI routine symbol addresses when MPI.jl gets loaded (see the sketch after this list)
  • Specializes MPI.jl methods so they can be traced by Reactant
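A minimal sketch of the registration hook, assuming a hypothetical register_symbol! entry point on the Reactant side and that MPI.API.libmpi resolves to the loaded library; only the Libdl and MPI.jl calls are standard, the hook name and call site are illustrative:

using Libdl
import MPI

# Hypothetical hook, e.g. called from a package extension's __init__ when
# MPI.jl is loaded. `register_symbol!` stands in for whatever registration
# API Reactant actually exposes.
function register_mpi_symbols!(register_symbol!)
    handle = Libdl.dlopen(MPI.API.libmpi)
    for name in (:MPI_Send, :MPI_Recv, :MPI_Isend, :MPI_Irecv, :MPI_Wait)
        # Libdl.dlsym returns the routine's address in the loaded libmpi.
        register_symbol!(string(name), Libdl.dlsym(handle, name))
    end
end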

Unresolved questions:

  • How can we represent MPI_Request with tensor and StableHLO types?
  • stablehlo.custom_call has a backend_config attribute that could be useful during lowering; e.g. if we want to lower to NCCL instead of MPI, since both have a similar API, we could add our own custom C functions that use NCCL but adapt them to an MPI-like API.
  • @wsmoses can we create @cfunctions in Julia and pass them to the symbol table? Some MPI routines might need a bit of adaptation, and writing them in Julia would be easier and faster (and would also use the correct symbols from the MPI.jl-loaded libmpi). See the sketch after this list.
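For the last question, a minimal sketch of the @cfunction idea, assuming the symbol table accepts raw function pointers; the wrapper below is a made-up example, not code from this PR:

import MPI

# A Julia wrapper around an MPI.jl routine; it goes through whatever libmpi
# MPI.jl itself loaded, so the symbols are guaranteed to match.
function barrier_wrapper()::Cvoid
    MPI.Barrier(MPI.COMM_WORLD)
    return nothing
end

# C-callable pointer that could be handed to the symbol table in place of a
# symbol looked up from libmpi directly.
const barrier_ptr = @cfunction(barrier_wrapper, Cvoid, ())

A top-level const @cfunction pointer like this stays valid for the session, so it could be registered once at load time.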

cc @JBlaschke @hhkit

@wsmoses (Member) commented Feb 15, 2025

You won't; instead you'll emit something like:

function send_wrap(%arg : memref<axb>) {
    mpi.send %arg
}

function main() {
    ...
    enzymexla.jit_call @send_wrap(%x : tensor<...>)
}

And then lower-jit will convert it into a custom call. However, you will need to define a lowering of mpi.send into a corresponding MPI_Send call [which will use the symbol you just registered here].

Re CUDA though, we also need to ensure we are synchronized with respect to the current CUstream, which you can get via enzymexla.get_stream.
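To make that concrete, a hedged Julia sketch of such a wrapper, assuming the lowering passes the raw CUstream from enzymexla.get_stream as an argument; the signature and the direct cuStreamSynchronize ccall are assumptions, not Reactant's actual ABI:

import MPI

# Hypothetical wrapper targeted by the lowered call; `stream` would be the
# raw CUstream handle obtained via enzymexla.get_stream.
function synced_barrier(stream::Ptr{Cvoid})::Cvoid
    # Block until all device work enqueued on this stream has completed, so
    # any buffers the MPI call touches are fully materialized.
    ccall((:cuStreamSynchronize, "libcuda"), Cint, (Ptr{Cvoid},), stream)
    MPI.Barrier(MPI.COMM_WORLD)
    return nothing
end

const synced_barrier_ptr = @cfunction(synced_barrier, Cvoid, (Ptr{Cvoid},))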

@mofeing (Collaborator, Author) commented Feb 16, 2025

Hmm, from our last discussion on this a couple of weeks ago, I understood that we would emit this:

function main() {
    ...
    mpi.send(%arg0, ...)
    ...
}

and it would get lowered to

function send_wrap(%arg : memref<axb>) {
    llvm.call <0xffff> (%arg)
}

function main() {
    ...
    enzymexla.jit_call @send_wrap(%x : tensor<...>)
    ...
}

which would finally lower to the following via the enzymexla.jit pass:

function main() {
    ...
    stablehlo.custom_call @mpi_send_wrap(%x : tensor<...>)
    ...
}

Is this correct, or do we need to emit the enzymexla.jit_call directly from Reactant?

Ah, or do you mean that any wrapping we need to do around MPI should be done in this way?

> Re CUDA though, we also need to ensure we are synchronized with respect to the current CUstream, which you can get via enzymexla.get_stream.

Okay, this will probably be required for NCCL.
