
Segmentation Faults Due to Exceeding Libfabric Tx/Rx Context Limits in SST RDMA Transport #4329

Open
abhishek1297 opened this issue Aug 29, 2024 · 5 comments


@abhishek1297

I am running into segmentation faults (SIGSEGV), most likely due to running out of libfabric contexts with the SST RDMA transport. I am running a simple reader-writer scenario from your examples, where I submit 80 writers (2 nodes) and 1 reader (1 node); the reader runs on a separate node. Each job runs a single MPI rank.
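For reference, the writer side boils down to something like the following sketch (variable names and sizes are placeholders rather than the exact example code; "DataTransport" is set explicitly to "RDMA", which is also what gets auto-selected here):

```cpp
#include <adios2.h>
#include <mpi.h>
#include <vector>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    {
        adios2::ADIOS adios(MPI_COMM_WORLD);
        adios2::IO io = adios.DeclareIO("writer");
        io.SetEngine("SST");
        // Request the RDMA data plane (backed by libfabric/psm2 on this machine).
        io.SetParameters({{"DataTransport", "RDMA"}});

        auto var = io.DefineVariable<double>("data", {}, {}, {100});
        std::vector<double> data(100, 1.0);

        adios2::Engine writer = io.Open("stream", adios2::Mode::Write);
        writer.BeginStep();
        writer.Put(var, data.data());
        writer.EndStep();
        writer.Close();
    }
    MPI_Finalize();
    return 0;
}
```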

After increasing the libfabric log level, this is what I see:

libfabric:1035878:1724682976:psm2:core:psmx2_trx_ctxt_alloc():273<warn> number of Tx/Rx contexts exceeds limit (40).
DP Reader 0 (0x55a3b5926330): opening endpoint failed with -22 (Unknown error -22). This is fatal.
libfabric:1035878:1724682976:psm2:core:psmx2_progress_func():110<info> 
[r1i0n23:1035878] *** Process received signal ***
[r1i0n23:1035878] Signal: Segmentation fault (11)
[r1i0n23:1035878] Signal code: Address not mapped (1)

If I run 40 clients, the scripts work fine, but beyond 40 the segfaults occur. Given the context-related warning, we are probably running out of resources. Looking at libfabric's pages, I see that there is a way to share contexts across multiple endpoints (sketched below), but I am not sure how to proceed.

Is there any workaround for this situation?
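For context, this is the shared Tx/Rx context mechanism from the fi_endpoint(3) man page that I mean. It is only a hedged sketch of the libfabric calls involved (error paths, CQ/AV binding, and cleanup omitted), not something I know SST or the psm2 provider to actually support:

```cpp
#include <rdma/fabric.h>
#include <rdma/fi_endpoint.h>

// Open n endpoints that all share a single Tx context and a single Rx context,
// instead of each endpoint allocating its own private pair (which is what
// appears to exhaust the psm2 limit of 40).
static int open_shared_context_endpoints(struct fid_domain *domain,
                                         struct fi_info *info,
                                         struct fid_ep **eps, int n)
{
    struct fid_stx *stx = nullptr; // shared transmit context
    struct fid_ep  *srx = nullptr; // shared receive context

    int rc = fi_stx_context(domain, info->tx_attr, &stx, nullptr);
    if (rc) return rc;
    rc = fi_srx_context(domain, info->rx_attr, &srx, nullptr);
    if (rc) return rc;

    // Endpoints must be created knowing they will use shared contexts.
    info->ep_attr->tx_ctx_cnt = FI_SHARED_CONTEXT;
    info->ep_attr->rx_ctx_cnt = FI_SHARED_CONTEXT;

    for (int i = 0; i < n; ++i)
    {
        rc = fi_endpoint(domain, info, &eps[i], nullptr);
        if (rc) return rc;
        rc = fi_ep_bind(eps[i], &stx->fid, 0); // bind to the shared Tx context
        if (rc) return rc;
        rc = fi_ep_bind(eps[i], &srx->fid, 0); // bind to the shared Rx context
        if (rc) return rc;
        // ... bind completion queues / address vector and fi_enable() as usual ...
    }
    return 0;
}
```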

Specs

  • Adios 2.10.0
  • Libfabric 1.14.1
  • OpenMPI 4.1.5

Node details

  • 2 Intel Cascade Lake 6248 processors (20 cores at 2.5 GHz), or 40 cores per node
  • 192 GB of memory per node
@eisenhauer
Member

Hi, and sorry for the delay in responding. Can you tell me what machine you're running on? I don't know of a workaround for this in libfabric right now (it would probably require us to be able to reproduce the problem and try to find a fix), but you might try using the MPI data plane if it's available. (MPI has some significant disadvantages when used in SST, but it does have the advantage that on HPC resources it has probably seen a lot of machine-specific optimization...)
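If the MPI data plane is present in your build, requesting it should just be a matter of the SST DataTransport parameter; a minimal sketch (writer shown, the reader side is analogous):

```cpp
#include <adios2.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    {
        adios2::ADIOS adios(MPI_COMM_WORLD);
        adios2::IO io = adios.DeclareIO("writer");
        io.SetEngine("SST");
        // Ask SST for the MPI data plane instead of letting it pick RDMA.
        io.SetParameters({{"DataTransport", "MPI"}});
        // ... DefineVariable / Open / BeginStep / Put / EndStep / Close as usual ...
    }
    MPI_Finalize();
    return 0;
}
```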

@abhishek1297
Author

Hi,

I am running on the Jean Zay supercomputer with these specs.

I tried the MPI option with SST, but it always falls back to RDMA and hits the same segmentation fault.

DP Writer 0 (0x560069d1c960): Prefered dataplane name is "mpi"
DP Writer 0 (0x560069d1c960): Considering DataPlane "evpath" for possible use, priority is 1
DP Writer 0 (0x560069d1c960): Considering DataPlane "rdma" for possible use, priority is 10
Warning:  Preferred DataPlane "mpi" not found.
DP Writer 0 (0x560069d1c960): Selecting DataPlane "rdma", priority 10 for use
DP Writer 0 (0x560069d1c960): seeing candidate fabric psm2, will use this unless we see something better.

Should ADIOS be compiled without libfabric so that it doesn't fall back to RDMA?

The MPI available on the cluster is not built against libfabric (--without-ofi) but is compiled with Omni-Path support (--with-psm2).

@eisenhauer
Member

From the log, the MPI data plane isn't available in this build, likely because CMake decided it wasn't likely to work. You can force it to be included by adding "-DADIOS2_HAVE_MPI_CLIENT_SERVER=true". This is probably worth trying, because sometimes our test for that is too conservative. Compiling without libfabric won't affect this, and it doesn't matter whether MPI itself uses libfabric or not.

@abhishek1297
Author

Hi, sorry, I was on vacation. Okay, thanks for sharing this; I will try to recompile. But it seems that using ADIOS with Omni-Path has been difficult, to say the least. There is always some segfault popping up.

@eisenhauer
Member

eisenhauer commented Sep 18, 2024

But it seems that using ADIOS with Omni-Path has been difficult, to say the least. There is always some segfault popping up.

If libfabric fulfilled its promise of being a nice universal interface to all RDMA hardware, life would be easier. Unfortunately it falls far short, requiring code to be customized for each variation, which makes things difficult on the more research-oriented end of things like data streaming.
