
update scrip_remap_conservative to use MPI #1268

Open
wants to merge 6 commits into base: develop

Conversation

jcwarner-usgs

Pull Request Summary

When using nesting, the SCRIP routines are called to compute remapping weights, but currently each processor computes the full set of weights. This update modifies SCRIP/scrip_remap_conservative.F so that the MPI processors each compute a separate portion of the remap weights, and a full set of weights is then distributed to all the nodes.
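
As a rough illustration of the approach (a minimal sketch with hypothetical names such as compute_weights_mpi, wts_local, and wts_full, not the actual code in this PR; SCRIP's sparse link-list storage is simplified here to a fixed-size array), each rank sweeps only its own block of source cells and the partial results are then combined so that every rank ends up with the full set of weights:

      ! Illustrative sketch only: hypothetical subroutine and variable names.
      SUBROUTINE compute_weights_mpi (grid1_size, num_wts, wts_full, MyComm)
        USE MPI
        IMPLICIT NONE
        INTEGER, INTENT(IN)  :: grid1_size, num_wts, MyComm
        REAL(8), INTENT(OUT) :: wts_full(num_wts)
        REAL(8) :: wts_local(num_wts)
        INTEGER :: my_rank, n_ranks, ierr, chunk, cell, cell_str, cell_end

        CALL MPI_COMM_RANK(MyComm, my_rank, ierr)
        CALL MPI_COMM_SIZE(MyComm, n_ranks, ierr)

        ! Each rank sweeps only its contiguous share of the source cells.
        chunk    = (grid1_size + n_ranks - 1)/n_ranks
        cell_str = my_rank*chunk + 1
        cell_end = MIN((my_rank + 1)*chunk, grid1_size)

        wts_local = 0.0d0
        DO cell = cell_str, cell_end
          ! ... accumulate the weights contributed by source cell 'cell'
          !     into wts_local (the actual SCRIP sweep is omitted here) ...
        END DO

        ! Combine the per-rank partial arrays so every rank holds the full set.
        CALL MPI_ALLREDUCE(wts_local, wts_full, num_wts,                    &
                           MPI_DOUBLE_PRECISION, MPI_SUM, MyComm, ierr)
      END SUBROUTINE compute_weights_mpi

An allgather of each rank's links would avoid reducing zeros, but the allreduce form matches the style of the calls quoted later in this conversation.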

Description

The results should be identical with or without this update.
Reviewer could be: Erick Rogers, NRL

Issue(s) addressed

Related to Discussion #1252

Commit Message

Update scrip_remap_conservative to be more efficient on first use.

Check list

Testing

  • How were these changes tested? I ran several personal tests on a local HPC; nothing too extensive.
  • Are the changes covered by regression tests? (If not, why? Do new tests need to be added?)
  • Have the matrix regression tests been run (if yes, please note HPC and compiler)?
  • Please indicate the expected changes in the regression test output. (Note the list of known non-identical tests.)
  • Please provide the summary output of matrix.comp (matrix.Diff.txt, matrixCompFull.txt and matrixCompSummary.txt):

@JessicaMeixner-NOAA
Collaborator

@ErickRogers pinging you as you were suggested as a possible reviewer.

Contributor

@ErickRogers left a comment


This looks safe to me, since it's put behind a switch. I'm not proficient with MPI programming, so I can't comment on that, but I expect that it is good, since John says it works. Let me know if you want an MPI-fluent reviewer.
NB: We have some rudimentary MPI in WMGHGH. However, it splits things up like "one process per remapping", e.g., if you have 2 grid-pairs, it only splits into 2 processes. So this "within grid" parallelism should be significantly faster.

@MatthewMasarik-NOAA
Collaborator

Thanks for submitting @jcwarner-usgs. I'll start reviewing.

@MatthewMasarik-NOAA
Collaborator

This looks safe to me, since it's put behind a switch. I'm not proficient with MPI programming, so I can't comment on that, but I expect that it is good, since John says it works. Let me know if you want an MPI-fluent reviewer. NB: We have some rudimentary MPI in WMGHGH. However, it splits things up like "one process per remapping", e.g., if you have 2 grid-pairs, it only splits into 2 processes. So this "within grid" parallelism should be significantly faster.

Thanks for reviewing @ErickRogers.

@jcwarner-usgs
Author

I think the MPI could be re-written to be more WW3 centric. And my switches in there are not active.
If you want me to spend more time on this, I can. I was not sure if WW3 people would re-write the code or not.

@MatthewMasarik-NOAA
Collaborator

MatthewMasarik-NOAA commented Jul 10, 2024

I think the MPI could be re-written to be more WW3 centric. And my switches in there are not active. If you want me to spend more time on this i can. I was not sure if WW3 people would re-write the code or not.

Hi @jcwarner-usgs, thanks for this follow-up. I had two thoughts on the switches you mentioned. Currently the switch is W3_SCRIP_JCW. It should be renamed to remove your initials from the end, maybe to W3_SCRIP_MPI or something similar. We should also have the switches be active and set a default value (I believe if MPI isn't enabled then we wouldn't want to enter those code blocks, so maybe the default should be .false.?).
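
For example (a rough sketch only, not code from this PR: l_scrip_mpi is a hypothetical flag name, and W3_MPI is assumed here to be the macro set when the MPI switch is active), a logical that defaults to .false. would keep the new blocks inactive in non-MPI builds:

      LOGICAL :: l_scrip_mpi = .FALSE.   ! default: keep the MPI code blocks inactive
#ifdef W3_MPI
      l_scrip_mpi = .TRUE.               ! enable only when the build includes MPI
#endif
      IF ( l_scrip_mpi ) THEN
        ! ... distributed computation of the remap weights ...
      ELSE
        ! ... original serial path ...
      END IF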

Regarding re-writing to be more WW3 centric: being as WW3 centric as possible is obviously best for the code base, though I can say that in my initial review nothing jumped out as problematic in that regard. It seemed clean and well commented, which is great. We do have a Code Style Standards page that may help out here. For now I would just suggest renaming some of these My* variables here

https://github.com/jcwarner-usgs/WW3/blob/0baa3df08da8bfeb8be1fe1c13755e1a9938c46c/model/src/SCRIP/scrip_remap_conservative.f#L269-L270

to something more descriptive if possible.

@ErickRogers
Contributor

@jcwarner-usgs I'm not sure I understand what you mean re: WW3 centric. What was your motivation for adding this feature? For you to use in your WW3 applications? Or some other applications?

@jcwarner-usgs
Author

Oh, I just meant that I used:

  call mpi_allreduce(Asend, Arecv, grid1_size, MPI_DOUBLE,                 &
 &                   MPI_SUM, MyComm, MyError)

but in other parts of WW3 the MPI calls look like:

        CALL MPI_ALLREDUCE(SND_BUFF(1:NSEA), &
             RCV_BUFF(1:NSEA), &
             NSEA,     &
             MPI_REAL, &
             MPI_SUM,  &
             ID_LCOMM, &
             IL_ERR)

Should I go through and make the variables all capitals?
Should I use SND_BUF and RCV_BUF instead of Asend and Arecv?
The My* names (Mystr for my start, Myend for my end, Myrank, MyComm, etc.) come from ROMS.
I can give it a clean-up if you want.

@ErickRogers
Contributor

@jcwarner-usgs OK, I see. Matt and Jessica may disagree, but from my point of view, it is fine to use the conventions of the original file, rather than the conventions of WW3. I think of the stuff in the /SCRIP/ subdirectory as not strictly part of the WW3 code proper, but rather as a semi-external library that is adapted (with as light a touch as possible) to work with WW3, to be compiled into WW3. We maintain it in the same version control with WW3 because maintaining it separately would just result in extra work. (2 cents)

@ErickRogers
Contributor

All that aside, renaming the My* variables wouldn't hurt, for clarity purposes.

@JessicaMeixner-NOAA
Collaborator

JessicaMeixner-NOAA commented Jul 11, 2024

I agree that keeping the style consistent with what is already in the routine is okay. It's more about the use of your initials or "My" variable names versus making them generic and descriptive of what they are.

That's really great to hear from @aliabdolali that it sped things up; we'll have to see if that can help us in some set-ups here. I don't see any regression tests that are using these updates.

@jcwarner-usgs Can you tell me a little more about your tests that you've run on your end so we can help advise how to best put in a test for the regression tests so we ensure this feature continues to work? Also, did you obtain identical results with and without this update?

@JessicaMeixner-NOAA
Collaborator

@jcwarner-usgs Can you tell me a little more about your tests that you've run on your end so we can help advise how to best put in a test for the regression tests so we ensure this feature continues to work? Also, did you obtain identical results with and without this update?

@jcwarner-usgs - I realized I wasn't clear that this question was directed to you, so I updated my comment and am pinging you on this.

Collaborator

@MatthewMasarik-NOAA left a comment


@jcwarner-usgs I wanted to touch base following the discussion on code style. I only have one request here, that is to rename the switch W3_SCRIP_JCW to something without your initials (maybe W3_SCRIP_MPI or other).

Also, I wanted to mention that a couple of GitHub CI jobs failed. I'm re-running these manually now to see if they will resolve themselves.

@jcwarner-usgs
Author

Thanks for checking back.

  • I did make a small change today, for the size of the remap weights in scrip_remap_conservative.f.
  • I will change the switch to something else. You already have SCRIP and SCRIPNC. So how about SCRIPMPI?
  • I am running some more tests here. We have a set of 4 nested grids that is challenging to get going.

@MatthewMasarik-NOAA
Collaborator

-I will change the switch to something else. You already have SCRIP and SCRIPNC. So how about SCRIPMPI?

SCRIPMPI is fine with me.

@MatthewMasarik-NOAA
Collaborator

Hi @jcwarner-usgs, just a note to say we are taking a short pause on our PR processing due to some temporary changes for our group. I just posted our group's Commit Queue: Temporary PR Policy. We look forward to picking this back up after our short break.

@jcwarner-usgs
Author

Getting back onto this pull request now.
I have just updated 3 files in my code:

  1. model/src/cmake/src_list.cmake
    changed
    ${CMAKE_CURRENT_SOURCE_DIR}/SCRIP/scrip_remap_conservative.f
    to
    ${CMAKE_CURRENT_SOURCE_DIR}/SCRIP/scrip_remap_conservative.F
    (notice the capital F extension)

  2. model/src/cmake/switches.json
    added:
    {
      "name": "scripmpi",
      "num_switches": "upto2",
      "description": "",
      "valid-options": [
        {
          "name": "SCRIPMPI"
        }
      ]
    },

  3. model/src/SCRIP/scrip_remap_conservative.F

  • changed the file extension from .f to .F. This allows cpp to process the file.
  • put all my changes within #define SCRIPMPI blocks. I think we should have 3 options of SCRIP, SCRIPNC, and SCRIPMPI.
  • made use of MPI_COMM_WAVE, IAPROC, and NTPROC from w3adatmd and w3odatmd.
  • removed Myrank, MyError, MyComm, Mystr, and Myend and replaced them with IAPROC, IERR_MPI, MPI_COMM_WAVE, grid1/2_str, and grid1/2_end
  • cleaned up the MPI send/recv calls (a brief sketch of the guarded pattern is included after this list)
I tested these changes with a simple Hurricane Sandy case we use, with a nested grid. I ran the case with the original version of scrip_remap_conservative.f and with the modified scrip_remap_conservative.F. In each case a new version of rmp_src_to_dst_conserv_002_001.nc was created. I compared the contents of these files (src_address, dst_address, dst_grid_frac, and remap_matrix) and they were identical, so the scrip files being generated are the same. I also looked at the results for hs and dtp, and they were identical. Happy to run any tests that you have.
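
For reference, here is a minimal sketch of the guarded pattern described in the list above (illustrative only: chunk and cell are placeholder names, the guard W3_SCRIPMPI assumes the W3_ prefix used earlier for W3_SCRIP_JCW, and the module split in the USE lines is inferred from the description above rather than taken verbatim from the PR):

#ifdef W3_SCRIPMPI
      USE W3ADATMD, ONLY: MPI_COMM_WAVE
      USE W3ODATMD, ONLY: IAPROC, NTPROC
#endif
      ...
#ifdef W3_SCRIPMPI
      ! Each rank handles a contiguous block of source cells (IAPROC is 1-based).
      chunk     = (grid1_size + NTPROC - 1)/NTPROC
      grid1_str = (IAPROC - 1)*chunk + 1
      grid1_end = MIN(IAPROC*chunk, grid1_size)
#else
      grid1_str = 1
      grid1_end = grid1_size
#endif
      DO cell = grid1_str, grid1_end
        ! ... sweep this source cell and accumulate its remap weights ...
      END DO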

Can you try this merge again?

@MatthewMasarik-NOAA
Collaborator

Hi @jcwarner-usgs, thank you for addressing the previous comments with these updates, and including the description of the changes. Much appreciated.

A few weeks ago we posted a temporary PR Policy. It is in response to temporary resource reductions, where we don't have the manpower to review PRs, and so are limiting what we take on to just what is needed for the next GFS/GEFS releases being worked on. This is just till mid-November. I really apologize because I know you submitted this prior to that. We will pick this back up from our end just as soon as our temporary status ends sometime in November. We'll look forward to completing the review then. Thanks for understanding.

@jcwarner-usgs
Author

All good! No worries.
I had time to make it cleaner, and so I worked on it. We will keep using it and ensure it is working.
Thanks,
-j

@MatthewMasarik-NOAA
Collaborator

All good! No worries. I had time to make it cleaner, and so I worked on it. We will keep using it and ensure it is working. Thanks, -j

Awesome, thanks so much for understanding. Having some more time to ensure it's working is a side benefit!

@MatthewMasarik-NOAA
Collaborator

@jcwarner-usgs just to let you know we are back from our temporary PR policy, and I've started working on the review for this. I'll be back in touch again next week.
