
ERF coupling with WW3 (MPI communicator assumptions) #1261

Open
jmsexton03 opened this issue Jul 2, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@jmsexton03

Is your feature request related to a problem? Please describe.
WW3 uses MPI_COMM_WORLD in many places, which makes it more difficult to couple with other models.

Describe the solution you'd like
Hi! We're interested in coupling WW3 with ERF (a new C++, GPU-ready code that offers an alternative to WRF), and it would be helpful if WW3 didn't assume it uses the whole MPI communicator. We've forked WW3 and have a version that defaults to the whole communicator but allows us to split it. Would this be something you'd be interested in having in the main GitHub repo, or should we maintain a separate fork?
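
As a rough illustration of the pattern we have in mind (the module and routine names below are made up for this comment, not the actual code in our fork): the model keeps a module-level communicator that defaults to MPI_COMM_WORLD, and a coupled driver can replace it before initialization.

module ww3_comm_example
  ! Illustrative only -- not the actual code in the fork.
  ! The model-wide communicator defaults to MPI_COMM_WORLD, but a
  ! coupled driver can hand in a different (e.g. split) communicator.
  use mpi
  implicit none
  integer :: model_comm = MPI_COMM_WORLD
contains
  subroutine set_model_comm(comm)
    integer, intent(in), optional :: comm
    if (present(comm)) model_comm = comm
  end subroutine set_model_comm
end module ww3_comm_example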

Describe alternatives you've considered
We looked into existing coupling solutions currently in WW3, such as OASIS and ESMF.

Additional context
ERF lives at https://github.com/erf-model/ERF on GitHub
Our current WW3 fork is located at https://github.com/erf-model/WW3
My initial tests were with:
./model/bin/w3_setup model -c gnu -s Ifremer1
cd regtests
./bin/run_cmake_test -C MPMD -n 2 -p mpirun -f -s PR1_MPI ../model ww3_tp2.2
For the non-split case, it appears the GitHub regtest action caught a bug, which may be due to incorrect placement of initialization calls. I'm still trying to fully reproduce this bug on my local machine, but Spack is going rather slowly: https://github.com/jmsexton03/WW3/actions/runs/9670620268/job/26680790816#step:5:6995:7002

@jmsexton03 added the enhancement (New feature or request) label on Jul 2, 2024
@JessicaMeixner-NOAA
Collaborator

@jmsexton03 - I'm not 100% sure how the coupling with OASIS is done, but when we couple with ESMF we essentially use a different driver and obtain the MPI communicator from that. I'd be curious to see your changes; which branch on https://github.com/erf-model/WW3 has them?

@jmsexton03
Author

I'd take a look at the mpmd branch. We're currently working on documentation; for context, we've been testing using edits to regtests/run_cmake_test.

@jmsexton03
Author

These instructions should let you reproduce our current test (which just sends two variables from WW3 to ERF and runs 5 steps of an example)

git clone --recursive git@github.com:erf-model/ERF
cd ERF/Exec/ABL
make -j4 USE_WW3_COUPLING=TRUE
cd ../../Submodules/WW3
./model/bin/w3_setup model -c gnu -s Ifremer1
cd regtests
./bin/run_cmake_test -C MPMD -n 2 -p mpirun -f -s PR1_MPI ../model ww3_tp2.2

@jmsexton03
Author

https://github.com/NOAA-EMC/WW3/compare/develop...erf-model:WW3:mpmd?expand=1

Summary of changes in mpmd branch:

  • add mpicomm module to track split communicator
  • use mpicomm module to replace MPI_COMM_WORLD statements
  • brief changes in ww3_shel.F90 to implement communicator splitting (see the sketch after this list)
  • brief changes in w3iogomd.F90 to use split communicator to send data
  • all communicator splitting sections currently have an ifdef W3_MPMD compile guard (which still needs to be worked into the build system)
  • minor changes to run_cmake_test to add MPMD as a coupling option (which currently assumes a specific ERF problem setup)
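
As a rough sketch of what the guarded split looks like (module, routine, and variable names here are illustrative, not the exact code in the branch):

module mpicomm_sketch
  ! Rough sketch only: every former MPI_COMM_WORLD reference goes through
  ! a module-level communicator, which is either the world communicator
  ! (the default) or a split sub-communicator when W3_MPMD is defined.
  use mpi
  implicit none
  integer :: ww3_comm = MPI_COMM_WORLD
contains
  subroutine split_ww3_comm(color, ierr)
    integer, intent(in)  :: color   ! e.g. 0 for WW3 ranks, 1 for ERF ranks
    integer, intent(out) :: ierr
    integer :: rank
#ifdef W3_MPMD
    ! Ranks with the same color end up in the same sub-communicator.
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_split(MPI_COMM_WORLD, color, rank, ww3_comm, ierr)
#else
    ! Non-MPMD builds keep the existing behavior: the whole communicator.
    ww3_comm = MPI_COMM_WORLD
    ierr = 0
#endif
  end subroutine split_ww3_comm
end module mpicomm_sketch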

@jmsexton03
Author

@JessicaMeixner-NOAA thanks for looking at my changes; I'm still investigating the regtest differences locally. Do you have any further feedback on whether this type of change (or something along those lines) is something you'd be interested in having in the main repo, or any further follow-up questions?

@JessicaMeixner-NOAA
Collaborator

@jmsexton03 I'm a bit confused by some of the changes in routines such as ww3_ounp that are not ww3_shel or ww3_multi. To my knowledge these are not routines that would be run within a coupled model, but perhaps I didn't look closely enough.

@jmsexton03
Author

Originally we put some of the more complicated routines in ww3_shel directly, but we moved some of them to w3iogomd.F90, which seemed to get called from ww3_shel. Would it be better to move the mpi_send calls to ww3_shel?

We did use ww3_ounf when debugging, since we were validating variable names by comparing against the NetCDF output; it was clearer there how the variables we were most interested in were used and declared. Most of the changes in files like ww3_ounp were aimed at making the MPI communicator a global variable, because the way we want to launch MPI would make using MPI_COMM_WORLD in many files incorrect: some processors would be assigned to a different executable.
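
To illustrate the problem (this is a standalone sketch, not code from our branch, and the executable names are made up): in an MPMD launch such as mpirun -np 2 ./ww3_shel : -np 2 ./erf_abl, every rank of both executables shares MPI_COMM_WORLD, so something like the predefined MPI_APPNUM attribute has to be used as the color to split off a per-executable communicator:

program mpmd_split_demo
  ! Standalone sketch: in an MPMD launch, MPI_COMM_WORLD spans both
  ! executables, and the predefined MPI_APPNUM attribute tells each rank
  ! which executable it was launched as, so it can serve as the split color.
  use mpi
  implicit none
  integer :: ierr, world_rank, color, local_comm, local_rank
  integer(kind=MPI_ADDRESS_KIND) :: appnum
  logical :: flag

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, world_rank, ierr)

  ! Which executable of the MPMD launch started this rank?
  call MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_APPNUM, appnum, flag, ierr)
  color = 0
  if (flag) color = int(appnum)

  ! Ranks from the same executable end up in the same sub-communicator.
  call MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, local_comm, ierr)
  call MPI_Comm_rank(local_comm, local_rank, ierr)

  print *, 'world rank', world_rank, 'app', color, 'local rank', local_rank

  call MPI_Finalize(ierr)
end program mpmd_split_demo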
