
problem linking mpi memory routines from C++ #12247

Open
floquet-cxx opened this issue Jan 19, 2024 · 25 comments

@floquet-cxx

Thank you for taking the time to submit an issue!

Background information

I am compiling an application (my own) that has C++ at the top level but also uses some C and F77 libraries. The build system is cmake, on OS X via macports. The application is about two decades old now and has successfully used various versions of MPI-1 for most of that time. Recently I added some new MPI memory and IO routines to do with Cartesian communicators and parallel IO, and this has created a small set of linkage problems when using openmpi (the code links successfully if I use mpich). The problem has persisted across two OS X versions, so I suspect it lies in the openmpi code base. It has the flavour of a misplaced "extern" declaration in a header file.

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

4.1.6

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Installed via macports. (Of course this leaves open the possibility that the problem lies there...)

sudo port -N install openmpi +gfortran
sudo port select --set mpi openmpi-mp-fortran

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

Please describe the system on which you are running

  • Operating system/version: macOS Sonoma (14.2.1) + Xcode 15
  • Computer hardware: Mac Studio and Macbook Pro (ARM M2)
  • Network type: SMP/single machine

Details of the problem

The build fails at the linkage stage, unable to find 4 low-level (not API-level) MPI routines called from various object files. Below is a clip of the relevant messages.

[ 53%] Linking CXX executable elliptic_mp
Undefined symbols for architecture arm64:
  "__ZN3MPI3Win4FreeEv", referenced from:
      __ZTVN3MPI3WinE in auxfield.cpp.o
      __ZTVN3MPI3WinE in data2df.cpp.o
      __ZTVN3MPI3WinE in domain.cpp.o
      __ZTVN3MPI3WinE in geometry.cpp.o
      __ZTVN3MPI3WinE in mesh.cpp.o
      __ZTVN3MPI3WinE in message.cpp.o
      __ZTVN3MPI3WinE in helmholtz.cpp.o
      ...
  "__ZN3MPI4CommC2Ev", referenced from:
      __ZNK3MPI9Intracomm5CloneEv in auxfield.cpp.o
      __ZNK3MPI9Graphcomm5CloneEv in auxfield.cpp.o
      __ZNK3MPI8Cartcomm3SubEPKb in auxfield.cpp.o
      __ZNK3MPI9Intracomm12Create_graphEiPKiS2_b in auxfield.cpp.o
      __ZNK3MPI8Cartcomm5CloneEv in auxfield.cpp.o
      __ZNK3MPI9Intracomm11Create_cartEiPKiPKbb in auxfield.cpp.o
      __ZNK3MPI9Intercomm5MergeEb in auxfield.cpp.o
      ...
  "__ZN3MPI8Datatype4FreeEv", referenced from:
      __ZTVN3MPI8DatatypeE in auxfield.cpp.o
      __ZTVN3MPI8DatatypeE in data2df.cpp.o
      __ZTVN3MPI8DatatypeE in domain.cpp.o
      __ZTVN3MPI8DatatypeE in geometry.cpp.o
      __ZTVN3MPI8DatatypeE in mesh.cpp.o
      __ZTVN3MPI8DatatypeE in message.cpp.o
      __ZTVN3MPI8DatatypeE in helmholtz.cpp.o
      ...
  "_ompi_mpi_cxx_op_intercept", referenced from:
      __ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb in auxfield.cpp.o
      __ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb in data2df.cpp.o
      __ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb in domain.cpp.o
      __ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb in geometry.cpp.o
      __ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb in mesh.cpp.o
      __ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb in message.cpp.o
      __ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb in helmholtz.cpp.o
      ...
ld: symbol(s) not found for architecture arm64
collect2: error: ld returned 1 exit status
make[2]: *** [elliptic_mp] Error 1
make[1]: *** [CMakeFiles/elliptic_mp.dir/all] Error 2
make: *** [all] Error 2

These linkage errors do not occur if I use mpich instead of openmpi (though right now I have issues with mpich not working on Sonoma). As I said, this looks like it could be an issue with C++ name mangling as a result of a misplaced "extern" declaration in a header.

An older code version which used a smaller subset of MPI continues to compile and run fine.

@floquet-cxx floquet-cxx changed the title from "problem linking mpi-2 memory routines from C++" to "problem linking mpi memory routines from C++" on Jan 19, 2024
@ggouaillardet
Contributor

Your application is using the MPI C++ bindings, which were removed from the MPI standard more than a decade ago.

The right fix is to modernize your code and stop using them.

Meanwhile, you can rebuild Open MPI by passing --enable-mpi-cxx to the configure command line
(note that this will no longer be possible with Open MPI 5).

@floquet-cxx
Author

Thanks - now could you please suggest a simple way to revert to using the C bindings? It seems I can't just write extern "C" { #include <mpi.h> }. Perhaps I could put extern "C" {} around each and every instance of MPI use, but I suspect that won't work either.

An alternative may be to pull all the routines I've used out of the C++ code and put them into a C code file; effectively, to re-wrap them - that used to be my approach when I was only calling a small set of MPI routines. (Forgive me, but that somehow doesn't seem like "modernization"!)
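
For concreteness, the re-wrapping I have in mind looks roughly like this (just a sketch with hypothetical names, not my actual code): a plain-C interface declared with C linkage, implemented in the one file that includes mpi.h.

/* mpi_wrap.h -- plain-C interface, callable from both C and C++ */
#ifdef __cplusplus
extern "C" {
#endif
void wrap_init (int* argc, char*** argv);
void wrap_finalize (void);
#ifdef __cplusplus
}
#endif

/* mpi_wrap.c -- compiled as C; the only file that includes mpi.h */
#include <mpi.h>
#include "mpi_wrap.h"
void wrap_init (int* argc, char*** argv) { MPI_Init (argc, argv); }
void wrap_finalize (void) { MPI_Finalize (); }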

@ggouaillardet
Contributor

This is not straightforward, but it is not rocket science either.

For example, you can compare examples/ring_c.c and examples/ring_cxx.cc to get an idea of what has to be changed.

Stopping using an API that was removed from the MPI standard more than a decade ago is indeed modernization.
But if you believe/expect something more C++ish is the way to modernize your code, then feel free to do your own research, starting for example with Boost.MPI or Elemental (do not ask me to help you, though).

@rhc54
Contributor

rhc54 commented Jan 19, 2024

Another, perhaps simpler, option might be to just use OMPI v3.x - IIRC, the C++ bindings were still supported at that time. Unless you really need something in the 4.x series, you can probably find an older version that works for you.

@floquet-cxx
Author

Another question occurs to me, though: why don't I get the same issues when I call a more restricted set of MPI routines directly from C++? That approach seems to work fine with openmpi. I would have thought I'd see many more instances of linkage failure if the problem is related to the removal of the C++ bindings.

@ggouaillardet
Contributor

you are probably using the C bindings.

@floquet-cxx
Author

But a top-level #include <mpi.h> appears in every C++ file I have, so how could I be using the C bindings? Maybe because testing for CXX is not uniform/consistent across the openmpi include files? That seems a bit unlikely.

However, I'm happy to take your word that I should be using the C bindings now, and to make the adjustments if I have to.

@ggouaillardet
Contributor

There is only one include file, and it is mpi.h.

For example, you use the C bindings if you write something like MPI_Send(..., MPI_COMM_WORLD), and you use the C++ bindings if you write MPI::COMM_WORLD.Send(...).

That being said, maybe the C++ bindings have been built and your link command is missing -lmpi_cxx.
That can happen if you link manually or if you link with mpicc instead of mpicxx.
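
A minimal side-by-side sketch (hypothetical file, not one of the shipped examples):

// bindings.cpp -- the same operation in both styles
#include <mpi.h>

int main (int argc, char** argv)
{
    MPI_Init (&argc, &argv);                  // C binding
    int rank;
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);    // C binding
    // The removed C++ bindings would have spelled this:
    //   MPI::Init (argc, argv);
    //   int rank = MPI::COMM_WORLD.Get_rank ();
    MPI_Finalize ();
    return 0;
}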

@floquet-cxx
Author

Oh for sure, I am just using the C bindings within my C++ code. For example:

MPI_File_set_view
  (fh, skip, Geometry::contig(), Geometry::fileview(),
   "native", MPI_INFO_NULL);

Thank you, changing the link command is a good idea. I don't have such easy direct control over link flags because I'm using macports and cmake. I will have a go though, and also feed that back to the macports maintainers.

Also, I can't find ring_cxx.cc in the examples...

@floquet-cxx
Author

Hmmmm. I should just be able to use the C bindings directly as I am doing though, right? Again, I am wondering if there is a missing extern "C" in some header file. I see a few remarks about this when I look through the headers.

@ggouaillardet
Contributor

That's a C binding, but not the one the error messages are about.

Try grepping for Clone and see what you get.

The example can be seen at https://github.com/open-mpi/ompi/blob/v4.1.x/examples/ring_cxx.cc

@floquet-cxx
Author

Hmmm. No Clone, but MPI_Datatype certainly does get used, and so does MPI_Type_free(). All in one simple base file, and both come up in the errors. Perhaps I'm misusing something; let me review that file. (Why that would have worked with mpich, I don't know...) Thanks again for your suggestions.

@ggouaillardet
Contributor

Since you are not using the MPI C++ bindings, you should be able to compile with -DOMPI_SKIP_MPICXX.
By doing so, no MPI C++ headers will be pulled in.
But if the compilation then fails, that would suggest your code is trying to use them.
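
For example, a quick check (hypothetical file name; adjust the include path for your install):

// skipcxx.cpp -- with -DOMPI_SKIP_MPICXX, mpi.h does not pull in the
// C++ bindings header, so no MPI:: symbols should be referenced
// compile: g++ -DOMPI_SKIP_MPICXX -I<your openmpi include dir> -c skipcxx.cpp
// check:   nm -u skipcxx.o | grep __ZN3MPI    (should print nothing)
#include <mpi.h>

int main () { return 0; }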

@jsquyres jsquyres added this to the v4.1.7 milestone Jan 19, 2024
@jsquyres
Member

I see there's been a bunch of back-n-forth here -- let me throw in a few things to check:

  1. It would be good to see exactly what underlying command is being run at [ 53%] Linking CXX executable elliptic_mp.
  2. It would probably be good to make a small example and see if you can narrow down the problem from there. E.g., seeing a linker complain about not finding __ZN3MPI3Win4FreeEv -- I can see it's clearly looking for some kind of variant of MPI_Win_free, but I don't know why the "Free" is capitalized in the missing symbol error message (it's lower case in the C bindings), and I don't know why the name would be munged (it's not munged in the C bindings). It's been a long, long time since I've worked with C++ so I don't remember these kinds of details, but is there a chance you're calling MPI_Win_Free() somewhere instead of MPI_Win_free()?

@ggouaillardet
Contributor

FWIW, __ZN3MPI3Win4FreeEv is the mangled symbol for MPI::Win::Free().

That suggests the MPI C++ bindings are indeed available, but libmpi_cxx.so is not used at link time.

I would naively expect these undefined references not to be present if the application never used them, but you know, C++ does C++, so maybe I should drop my expectations.
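
To illustrate the mechanism with a toy model (illustrative names, not the real Open MPI headers): inline C++ code in a header can reference symbols whose definitions exist only in the library, so the references show up in every object file built against that header.

// toy.h -- Comm's constructor is defined only in some library,
// but Intracomm's inline constructor and Clone() live in the header
struct Comm {
    Comm ();                      // no inline body: defined in the library
    virtual ~Comm () {}
};
struct Intracomm : Comm {
    Intracomm () : Comm () {}     // inline: emits a call to Comm::Comm()
    virtual Intracomm Clone () const { return Intracomm (); }
};

Any translation unit that instantiates Intracomm (or merely emits its vtable) then carries an undefined reference to Comm::Comm() - the same __ZN3MPI4CommC2Ev pattern as in the error output - until it is linked against the library that defines it.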

@jsquyres
Member

FWIW __ZN3MPI3Win4FreeEv is the mangled symbol for MPI::Win::Free()

Ah, there it is. Ok.

Then I think it would be very good to see exactly what the underlying command is for [ 53%] Linking CXX executable elliptic_mp. It could be as simple as accidentally using mpicc instead of mpic++.

@floquet-cxx
Author

I have no idea why any MPI C++ bindings get invoked when I have only ever used the C bindings. And why only 3 or 4 undefined symbols, when I have used quite a few MPI routines? (Pondering these issues again leads me to wonder about something buried in the openmpi header files.)

Below is the link command for elliptic_mp. As I suspected, (Apple/Xcode) g++ is used as the linker, not mpicxx. For completeness, I have first listed the compile command for one of the object files (helmholtz.o) implicated in the undefined-symbol messages. The -DMPI_EX definition is issued by me.

[ 54%] Building CXX object CMakeFiles/elliptic_mp.dir/elliptic/helmholtz.cpp.o
g++ -DMPI_EX -I/Users/hmb/develop-git/semtex-xxt/veclib -I/Users/hmb/develop-git/semtex-xxt/femlib -I/opt/local/include/openmpi-mp -I/Users/hmb/develop-git/semtex-xxt/src -I/Users/hmb/develop-git/semtex-xxt/elliptic -w -std=c++11 -O3 -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk -MD -MT CMakeFiles/elliptic_mp.dir/elliptic/helmholtz.cpp.o -MF CMakeFiles/elliptic_mp.dir/elliptic/helmholtz.cpp.o.d -o CMakeFiles/elliptic_mp.dir/elliptic/helmholtz.cpp.o -c /Users/hmb/develop-git/semtex-xxt/elliptic/helmholtz.cpp

and

[ 55%] Linking CXX executable elliptic_mp
/opt/local/bin/cmake -E cmake_link_script CMakeFiles/elliptic_mp.dir/link.txt --verbose=1
g++   -w -std=c++11 -O3 -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk -Wl,-search_paths_first -Wl,-headerpad_max_install_names CMakeFiles/elliptic_mp.dir/src/auxfield.cpp.o CMakeFiles/elliptic_mp.dir/src/data2df.cpp.o CMakeFiles/elliptic_mp.dir/src/domain.cpp.o CMakeFiles/elliptic_mp.dir/src/geometry.cpp.o CMakeFiles/elliptic_mp.dir/src/mesh.cpp.o CMakeFiles/elliptic_mp.dir/src/message.cpp.o CMakeFiles/elliptic_mp.dir/elliptic/helmholtz.cpp.o CMakeFiles/elliptic_mp.dir/elliptic/drive.cpp.o -o elliptic_mp   -L/opt/local/lib/gcc12/gcc/arm64-apple-darwin23/12.3.0  -L/opt/local/lib/gcc12  src/libsrc.a femlib/libfem.a veclib/libvec.a /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/lib/libblas.tbd /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/lib/liblapack.tbd /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/lib/libblas.tbd /opt/local/lib/openmpi-mp/libmpi.dylib /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/lib/liblapack.tbd /opt/local/lib/openmpi-mp/libmpi.dylib -lgfortran -lemutls_w -lgcc -lquadmath -lemutls_w -lgcc -lgcc 

Finally, here again are the undefined symbol messages:

-macosx_version_min has been renamed to -macos_version_min
Undefined symbols for architecture arm64:
  "__ZN3MPI3Win4FreeEv", referenced from:
      __ZTVN3MPI3WinE in auxfield.cpp.o
      __ZTVN3MPI3WinE in data2df.cpp.o
      __ZTVN3MPI3WinE in domain.cpp.o
      __ZTVN3MPI3WinE in geometry.cpp.o
      __ZTVN3MPI3WinE in mesh.cpp.o
      __ZTVN3MPI3WinE in message.cpp.o
      __ZTVN3MPI3WinE in helmholtz.cpp.o
      ...
  "__ZN3MPI4CommC2Ev", referenced from:
      __ZNK3MPI9Intracomm5CloneEv in auxfield.cpp.o
      __ZNK3MPI9Graphcomm5CloneEv in auxfield.cpp.o
      __ZNK3MPI8Cartcomm3SubEPKb in auxfield.cpp.o
      __ZNK3MPI9Intracomm12Create_graphEiPKiS2_b in auxfield.cpp.o
      __ZNK3MPI8Cartcomm5CloneEv in auxfield.cpp.o
      __ZNK3MPI9Intracomm11Create_cartEiPKiPKbb in auxfield.cpp.o
      __ZNK3MPI9Intercomm5MergeEb in auxfield.cpp.o
      ...
  "__ZN3MPI8Datatype4FreeEv", referenced from:
      __ZTVN3MPI8DatatypeE in auxfield.cpp.o
      __ZTVN3MPI8DatatypeE in data2df.cpp.o
      __ZTVN3MPI8DatatypeE in domain.cpp.o
      __ZTVN3MPI8DatatypeE in geometry.cpp.o
      __ZTVN3MPI8DatatypeE in mesh.cpp.o
      __ZTVN3MPI8DatatypeE in message.cpp.o
      __ZTVN3MPI8DatatypeE in helmholtz.cpp.o
      ...
  "_ompi_mpi_cxx_op_intercept", referenced from:
      __ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb in auxfield.cpp.o
      __ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb in data2df.cpp.o
      __ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb in domain.cpp.o
      __ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb in geometry.cpp.o
      __ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb in mesh.cpp.o
      __ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb in message.cpp.o
      __ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb in helmholtz.cpp.o
      ...
ld: symbol(s) not found for architecture arm64
collect2: error: ld returned 1 exit status
make[2]: *** [elliptic_mp] Error 1
make[1]: *** [CMakeFiles/elliptic_mp.dir/all] Error 2
make: *** [all] Error 2

Thanks again for your thoughts.

@ggouaillardet
Contributor

Thanks for the feedback.

The easiest workaround is probably cmake -DCMAKE_CXX_COMPILER=mpicxx

A slightly better one would be to understand why cmake did not pick up libmpi_cxx.so (it might be because the CMake files only request the C bindings for MPI instead of the C++ ones).

Or you can pass -DOMPI_SKIP_MPICXX to the C++ compiler.

FWIW

$ cat foo.cc
#include <mpi.h>
$  ~/local/ompi-v4.1.x/bin/mpicxx -c foo.cc
$ nm -u foo.o | grep MPI | grep -v _MPI_
__ZN3MPI3Win4FreeEv
__ZN3MPI4CommC2Ev
__ZN3MPI8Datatype4FreeEv

so yeah, some undefined C++ symbols are generated even if they are not used!

@floquet-cxx
Author

Pardon me, but I think these outcomes point to an error in the openmpi preprocessing system. Here is a slightly more extended example, albeit using g++ rather than mpicxx.

 semtex-xxt (xxt) >$ cat foo.cpp
#include <mpi.h>

int main() { return 0 ; }

Now, what happens with the openmpi headers and g++:

semtex-xxt (xxt) >$ g++ -I /opt/local/include/openmpi-mp foo.cpp
-macosx_version_min has been renamed to -macos_version_min
Undefined symbols for architecture arm64:
  "_MPI_Abort", referenced from:
      __ZN3MPI4Comm5AbortEi in ccvzKD2X.o
  "_MPI_Accumulate", referenced from:
      __ZNK3MPI3Win10AccumulateEPKviRKNS_8DatatypeEiliS5_RKNS_2OpE in ccvzKD2X.o
  "_MPI_Allgather", referenced from:
      __ZNK3MPI4Comm9AllgatherEPKviRKNS_8DatatypeEPviS5_ in ccvzKD2X.o
  "_MPI_Allgatherv", referenced from:
      __ZNK3MPI4Comm10AllgathervEPKviRKNS_8DatatypeEPvPKiS8_S5_ in ccvzKD2X.o
  "_MPI_Allreduce", referenced from:
      __ZNK3MPI4Comm9AllreduceEPKvPviRKNS_8DatatypeERKNS_2OpE in ccvzKD2X.o
  "_MPI_Alltoall", referenced from:
      __ZNK3MPI4Comm8AlltoallEPKviRKNS_8DatatypeEPviS5_ in ccvzKD2X.o
  "_MPI_Alltoallv", referenced from:
      __ZNK3MPI4Comm9AlltoallvEPKvPKiS4_RKNS_8DatatypeEPvS4_S4_S7_ in ccvzKD2X.o
  "_MPI_Alltoallw", referenced from:
      __ZNK3MPI4Comm9AlltoallwEPKvPKiS4_PKNS_8DatatypeEPvS4_S4_S7_ in ccvzKD2X.o
  "_MPI_Barrier", referenced from:
      __ZNK3MPI4Comm7BarrierEv in ccvzKD2X.o
  "_MPI_Bcast", referenced from:
      __ZNK3MPI4Comm5BcastEPviRKNS_8DatatypeEi in ccvzKD2X.o
  "_MPI_Bsend", referenced from:
      __ZNK3MPI4Comm5BsendEPKviRKNS_8DatatypeEii in ccvzKD2X.o
  "_MPI_Bsend_init", referenced from:
      __ZNK3MPI4Comm10Bsend_initEPKviRKNS_8DatatypeEii in ccvzKD2X.o
  "_MPI_Cancel", referenced from:
      __ZNK3MPI7Request6CancelEv in ccvzKD2X.o
  "_MPI_Cart_coords", referenced from:
      __ZNK3MPI8Cartcomm10Get_coordsEiiPi in ccvzKD2X.o
  "_MPI_Cart_create", referenced from:
      __ZNK3MPI9Intracomm11Create_cartEiPKiPKbb in ccvzKD2X.o
...
... Around 100 lines of messages ...
...
  "__ZN3MPI3Win4FreeEv", referenced from:
      __ZTVN3MPI3WinE in ccvzKD2X.o
  "__ZN3MPI4CommC2Ev", referenced from:
      __ZN3MPI9IntracommC2Ev in ccvzKD2X.o
      __ZN3MPI9IntracommC1EP19ompi_communicator_t in ccvzKD2X.o
  "__ZN3MPI8Datatype4FreeEv", referenced from:
      __ZTVN3MPI8DatatypeE in ccvzKD2X.o
  "_ompi_mpi_comm_null", referenced from:
      __ZN3MPI9IntracommC1EP19ompi_communicator_t in ccvzKD2X.o
      __ZN3MPI8CartcommC1ERKP19ompi_communicator_t in ccvzKD2X.o
      __ZN3MPI9GraphcommC1ERKP19ompi_communicator_t in ccvzKD2X.o
  "_ompi_mpi_cxx_op_intercept", referenced from:
      __ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb in ccvzKD2X.o
  "_ompi_op_set_cxx_callback", referenced from:
      __ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb in ccvzKD2X.o
ld: symbol(s) not found for architecture arm64
collect2: error: ld returned 1 exit status

I suggest that this is clearly a problem: since I didn't ask for any MPI routines, nothing MPI-related should arise. Here is what occurs with the mpich headers:

semtex-xxt (xxt) >$ g++ -I /opt/local/include/mpich-gcc12 foo.cpp
-macosx_version_min has been renamed to -macos_version_min
semtex-xxt (xxt) >$ nm a.out | grep MPI
semtex-xxt (xxt) >$ 

That is exactly what I'd expect to happen. I'm unsure about the macosx warning, but I think it comes from Xcode/g++ and is unrelated.

@ggouaillardet
Contributor

I don't know...

Unlike MPICH, Open MPI has a lot of inlined C++ subroutines/constructors that invoke the C bindings.
Even though I was not able to demonstrate this with a small reproducer, the fact is that a lot of undefined references get pulled in by the compiler. I would hope the compiler gets rid of the unused inline subroutines, but I am not sure that is a valid expectation, nor that Open MPI did anything wrong here.
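
Something like this toy pattern (illustrative, not the actual headers) shows how an inlined method can drag in C symbols the application never calls:

// toy model: a virtual method defined inline in a header. Emitting the
// class vtable forces the method to be emitted too, and its call to the
// C binding then becomes an undefined reference even if nobody calls it.
#include <mpi.h>

struct CommToy {                            // illustrative name only
    virtual void Abort (int code) { MPI_Abort (MPI_COMM_WORLD, code); }
    virtual ~CommToy () {}
};

CommToy toy;   // any instantiation emits the vtable, and with it
               // a reference to MPI_Abort, used or not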

Anyway, I strongly doubt this issue will be addressed, so I suggest you use one of the described workarounds.
You can also upgrade to Open MPI 5, or rebuild Open MPI without --enable-mpi-cxx.

@floquet-cxx
Author

(Setting the C++ compiler to mpicxx did fix some issues, but I still couldn't get everything to work, since I use C and F77 too; I had trouble getting it all up and running.)

HOWEVER: I think I missed a step in my macports setup, which was to install the port openmpi-default. That is meant to set all the compilers correctly and consistently. Using the equivalent cured my problems with mpich, so I can believe it will work with openmpi too. (The macports documentation could use some improvement! But this is entirely understandable.) Thank you for your explanations and patience.

@floquet-cxx
Author

Later... I went back, re-installed openmpi, and added the macports installation of openmpi-default ("sudo port install openmpi-default +gcc12"). That did not fix the problem, which remains as described above. But I found that the suggested workaround, passing -DOMPI_SKIP_MPICXX to the C++ compiler, does work. I am still using openmpi version 4.1.6 as installed by macports.

@floquet-cxx floquet-cxx reopened this Nov 1, 2024
@floquet-cxx
Author

Further: I found the same error occurring on Ubuntu Linux (20.04) with openmpi (but, again, not with mpich). Adding the suggested workaround again overcame the problem. So it seems more deep-rooted than something associated with a particular installation set.

BTW, these errors only seem to have arisen when I started to use the Cartesian routines (e.g. MPI_Cart_create) - I had previously (for over a decade?) used openmpi without issue.

@ggouaillardet
Contributor

I am not sure what you are expecting here.
The workaround (e.g. -DOMPI_SKIP_MPICXX) is needed with Open MPI because of how inlining is done by Open MPI and the compiler. The C++ bindings have been removed from Open MPI 5, so upgrading is also an option, and it should not need any workarounds.

@floquet-cxx
Author

Thanks. I am waiting for the port maintainers to release v5 upgrades.

But I think the surprising thing for me is that the issue was triggered just by using a (very old) part of MPI that I hadn't previously accessed, whereas I started out thinking it was package-dependent (macports). It seems so general that I'm surprised other users don't appear to have reported it.

On the plus side, I am finding that openmpi is quite a bit faster than mpich on my Mac Studio. The thing that caused me to revisit openmpi was finding only 80% CPU use with mpich, but execution with openmpi is faster by much more than the missing 20% would explain.
