Incorrect events using MPI: only gets registered to the first rank #21

Open
jofrevalles opened this issue Oct 30, 2024 · 8 comments

@jofrevalles
Member

Hey!
I have been using Extrae.jl with MPI.jl, and I can't get events to be emitted on every rank. For example, I ran the following code using mpiexec with 10 workers:

using MPI
using Extrae

MPI.Init()
Extrae.init()

comm = MPI.COMM_WORLD
mpi_size = MPI.Comm_size(comm)
mpi_rank = MPI.Comm_rank(comm)

Extrae.emit(UInt32(80_000), UInt64(1))
println("Hello from user function")
sleep(1 + mpi_rank)
Extrae.emit(UInt32(80_000), UInt64(0))

Extrae.finish()
MPI.Finalize()

I would expect this simple code to emit the event 80_000 on each rank, but when I view the trace in Paraver, the event only appears on the first rank:
[Image: Paraver trace view]

I also tried the macro Extrae.@user_function and Extrae.FFI.MPItrace_user_function, and in both cases the trace only shows events on the first rank.

@jofrevalles
Member Author

jofrevalles commented Oct 31, 2024

Update:
I have been looking at the Extrae API for Python (pyextrae), and it differs from your implementation in a few ways. Importantly, it defines:

def event(type, value):
  if (Extrae):
    Extrae[os.getpid()].Extrae_event(type, value)

This seems relevant to me, since I guess it selects the process with os.getpid(), which your emit implementation may not be doing. For MPI, they also have the following lines:

from pyextrae.common.extrae import *

TracingLibrary = "libmpitrace.so"

startTracing( TracingLibrary )

where startTracing is defined here.

I don't know whether you should link against "libmpitrace.so", but it certainly looks like it is needed here. @giordano What do you think? Have you used Extrae.jl with MPI, or am I the first one here? :)

Thanks!
Jofre

@giordano
Collaborator

giordano commented Oct 31, 2024

> Have you used Extrae.jl with MPI

I haven't used it with MPI myself 🙂 I know Sergio had a little more success by LD_PRELOADing the tracing library, but that's more annoying to set up because you have to do it manually (the other day Sergio and I were discussing how to simplify this slightly for users, but I haven't had time to look into it yet).

@jofrevalles
Member Author

@mofeing I don't know how to do this LD_PRELOAD in our code, but it still seems strange to me that the event is not registered. I will run some tests, but my guess is that pyextrae does something that we need here.

@giordano
Collaborator

If you can figure out how to make this work without doing the preload dance, that'd be amazing! It's totally possible we're missing something here 😉

@mofeing
Member

mofeing commented Oct 31, 2024

> @mofeing I don't know how to do this LD_PRELOAD in our code, but for me it is still strange that the event is not registered. I will do some tests but my guess is that there is something that pyextrae has that we need here.

You're already doing the preload when you configure "preloads" for MPIPreferences in LocalPreferences.toml.
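One way to check whether the preload actually took effect is to list the shared libraries loaded in each process. The following is only a sketch: it uses Julia's Libdl standard library, the helper name loaded_libs_matching is made up for illustration, and it assumes the tracing library's path contains "libmpitrace". To be meaningful it would have to run after `using MPI` and `using Extrae` in the traced script.

```julia
using Libdl

# Hypothetical helper: paths of all shared libraries loaded in this
# process whose path contains `needle`.
loaded_libs_matching(needle::AbstractString) =
    filter(path -> occursin(needle, path), Libdl.dllist())

# Assumption: the tracing library is named libmpitrace, as in this thread.
matches = loaded_libs_matching("libmpitrace")
if isempty(matches)
    println("pid ", getpid(), ": no libmpitrace loaded")
else
    println("pid ", getpid(), ": loaded ", join(matches, ", "))
end
```

If some processes report that nothing is loaded, the preload is probably not reaching every rank.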

@clasqui
Contributor

clasqui commented Nov 6, 2024

Hey Jofre @jofrevalles, if you can give me steps to reproduce this (detailed ones, it's been a long time since I worked with Julia), I can try to help you!

@mofeing
Member

mofeing commented Nov 6, 2024

@clasqui one question: do you need to call extrae_set_tracing_task (or similar) before emitting a custom event? I believe the problem is that all events are being emitted as rank 0.
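Before looking at Extrae itself, it may be worth ruling out the MPI side: if the launcher and the MPI library used by MPI.jl do not match, every process reports rank 0 of 1. A minimal sketch of that check, using only MPI.jl calls already present in the example (the variable names are illustrative):

```julia
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)     # 0-based rank of this process
nranks = MPI.Comm_size(comm)   # total number of processes in the job

# With `mpirun -n 4`, four distinct ranks (0 to 3) and nranks == 4 are expected.
println("rank ", rank, " of ", nranks, ", pid ", getpid())

MPI.Finalize()
```

If every process prints rank 0 of 1, the problem lies in the MPI setup rather than in how Extrae assigns events to tasks.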

@jofrevalles
Member Author

jofrevalles commented Nov 7, 2024

Good morning, Marc @clasqui!

I just ran the following code using mpirun -n 4 julia --project=. julia_example.jl:

using MPI
using Extrae

MPI.Init()
Extrae.init()

comm = MPI.COMM_WORLD
mpi_size = MPI.Comm_size(comm)
mpi_rank = MPI.Comm_rank(comm)

Extrae.emit(UInt32(80_000), UInt64(1))
println("Hello from user function")
sleep(1 + mpi_rank)
Extrae.emit(UInt32(80_000), UInt64(0))

Extrae.finish()
MPI.Finalize()

However, you first have to set up your project from the terminal (with an internet connection). Run julia --project=. to open the REPL, and then run:

using Pkg
Pkg.add("MPI")
Pkg.add("MPIPreferences")
Pkg.add("Extrae")

using MPIPreferences
MPIPreferences.use_system_binary()

using Extrae
Extrae.use_system_binary(; library_names=["libmpitrace"], extra_paths=["/apps/GPP/BSCTOOLS/extrae/4.2.3/impi_2021_10_0/lib"], export_prefs=true)
Pkg.build("Extrae")

You can then exit the REPL, and you will see that a LocalPreferences.toml file has been created. Open it and change the line preloads = [] to:

preloads = ["/apps/GPP/BSCTOOLS/extrae/4.2.3/impi_2021_10_0/lib/libmpitrace.so"]

(or wherever you have that installed)

Then I put the following in the Slurm submit file:

export EXTRAE_ON=1
export EXTRAE_SKIP_AUTO_LIBRARY_INITIALIZE=1
export EXTRAE_CONFIG_FILE=<PATH_TO_EXTRAE.XML_FILE>
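
To confirm that these variables actually reach every Julia process started by the launcher, something like the following could be added near the top of the script. It is a sketch in plain Julia: the function name extrae_env_report is made up for illustration, and it only checks the three variables listed above.

```julia
# Hypothetical helper: one status line per Extrae-related variable.
# `env` defaults to the process environment; a Dict can be passed instead.
function extrae_env_report(env = ENV)
    # Variable names taken from the Slurm submit file above.
    names = ("EXTRAE_ON",
             "EXTRAE_SKIP_AUTO_LIBRARY_INITIALIZE",
             "EXTRAE_CONFIG_FILE")
    lines = String[]
    for name in names
        value = get(env, name, nothing)
        if value === nothing
            push!(lines, "$name is not set")
        elseif name == "EXTRAE_CONFIG_FILE" && !isfile(value)
            # The variable is set but does not point to an existing file.
            push!(lines, "$name = $value (file not found)")
        else
            push!(lines, "$name = $value")
        end
    end
    return lines
end

foreach(line -> println("pid ", getpid(), ": ", line), extrae_env_report())
```

Each process prints its own view of the environment, so a rank that did not inherit the variables shows up as "is not set".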
