When using latest GEOS model in the DAS must use O-server #88

Open
bena-nasa opened this issue Apr 26, 2021 · 6 comments
@bena-nasa (Collaborator)

Another issue related to updating the GEOSadas to use newer MAPL/model versions.

With the changes to History that use the new O-server and ESMF regridding, at GEOSadas scales (c720, ~80 collections) it is necessary to use the o-server to achieve optimal performance. We have been benchmarking the DAS History with the latest version of MAPL, which has the newest o-server options.
It was found that on the Skylake nodes, 8 o-server nodes in this configuration provided good performance, with limited gains beyond 8 nodes:
mpirun -np 5720 ./GEOSgcm.x --npes_model 5400 --nodes_output_server 8 --oserver_type multigroup --npes_backend_pernode 8 --fast_oclient true

@rtodling (Collaborator)

Ben: What's the GCM layout in this configuration (NX, NY)?
Also: I assume you are referring to the version of the GCM in FP (5.27.1) ... which does not use MAPL2.0 - am I right?

@bena-nasa (Collaborator, Author) commented Apr 27, 2021

No, I'm not talking about 5.27.1. I am just giving a heads-up that when testing the GEOSadas with newer model versions that use MAPL2, use of the IO server is mandatory, so the scripting will need to reflect this. That's all.
For example, Hamid has been testing the latest model version (c720, 181 levels) with the current OPS History and has found that this works:
mpirun -np 5720 ./GEOSgcm.x --npes_model 5400 --nodes_output_server 8 --oserver_type multigroup --npes_backend_pernode 8 --fast_oclient true

@rtodling (Collaborator)

Question on this: how does the total number of PEs above relate to ncpus-per-node? We are on Skylake with ncpus-per-node=40, but if we request a batch job with ncpus-per-node=36, would the math leading up to the number 5720 above change? That is,

5720 = 5400 + (8 * 40)

so in the case of 36, should the entry in the -np option change to

5400 + (36 * 8) = 5688

?

@bena-nasa (Collaborator, Author) commented Nov 15, 2021

Yes, if you are only running 36 cores per node then you would change the -np to 5400 + (36 * 8) = 5688.
In general, the -np value needs to be equal to or greater than: npes_model (= NX * NY) + nodes_output_server * cores_per_node.
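A minimal sketch of that rule in shell arithmetic, using the Skylake numbers from this thread (the variable names are just illustrative):

#!/bin/bash
# Sketch only: minimum mpirun -np for model + o-server, using
# -np >= npes_model + nodes_output_server * cores_per_node.
npes_model=5400          # NX * NY
oserver_nodes=8          # --nodes_output_server

for cores_per_node in 40 36; do
  np=$(( npes_model + oserver_nodes * cores_per_node ))
  echo "cores_per_node=${cores_per_node}: -np ${np}"
done
# Prints 5720 for 40 cores/node and 5688 for 36 cores/node.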

@bena-nasa (Collaborator, Author) commented Nov 15, 2021

More clarification: the infrastructure detects the node information and partitions the MPI communicators among the nodes as appropriate. npes_model does not have to be evenly divisible by the number of nodes; if there are a few leftover MPI tasks, they simply aren't used by either the model or the o-server. To make up numbers, let's say you had 23 cores per node, needed 540 MPI tasks for the model, and wanted 4 o-server nodes. You would use (24 * 23) MPI tasks (as you need 24 nodes to run the model) + (4 * 23) MPI tasks, like so:

-np 644, --npes_model 540, --nodes_output_server 4

You need 24 nodes to run the model, so 12 MPI tasks just don't get used in the actual GEOSgcm.x run; they just wait for the rest to finish.
In other words, on nodes 1-23 all 23 MPI ranks run the model; on node 24, 11 MPI ranks are used for the model and 12 are not used at all (they just wait for everyone else); and the ranks on nodes 25-28 are used for the o-server.
644 > 540 + (23 * 4) = 632
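The same bookkeeping for this made-up 23-cores-per-node case, as a small sketch; the only extra step is rounding the model tasks up to whole nodes (names here are illustrative):

#!/bin/bash
# Sketch only: the model is packed onto whole nodes, so round up,
# then add whole nodes for the o-server.
cores_per_node=23
npes_model=540           # MPI tasks actually used by the model
oserver_nodes=4          # --nodes_output_server

model_nodes=$(( (npes_model + cores_per_node - 1) / cores_per_node ))   # ceil(540/23) = 24
idle_ranks=$(( model_nodes * cores_per_node - npes_model ))             # 552 - 540 = 12
np=$(( (model_nodes + oserver_nodes) * cores_per_node ))                # 28 * 23 = 644

echo "-np ${np} --npes_model ${npes_model} --nodes_output_server ${oserver_nodes} (${idle_ranks} ranks idle)"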

@rtodling (Collaborator)

Thanks Ben.
