When using latest GEOS model in the DAS must use O-server #88

Open
bena-nasa opened this issue Apr 26, 2021 · 6 comments
@bena-nasa (Collaborator)

Another issue related to updating the GEOSadas to use newer MAPL/model versions.

With the changes to History that use the new O-server and ESMF regridding, at GEOSadas scales (c720, ~80 collections) it is necessary to use the o-server to achieve optimal performance. We have been benchmarking the DAS History with the latest version of MAPL, which has the newest o-server options.
It was found that on the Skylake nodes, 8 o-server nodes in this configuration provided good performance, with limited gains beyond 8 nodes:
mpirun -np 5720 ./GEOSgcm.x --npes_model 5400 --nodes_output_server 8 --oserver_type multigroup --npes_backend_pernode 8 --fast_oclient true

@rtodling (Collaborator)

Ben: What's the GCM layout in this configuration (NX, NY)?
Also: I assume you are referring to the version of the GCM in FP (5.27.1) ... which does not use MAPL2.0 - am I right?

@bena-nasa (Collaborator, Author) commented Apr 27, 2021

No, I'm not talking about 5.27.1. I am just giving a heads-up that when testing the GEOSadas with newer model versions that use MAPL2, use of the IO server is mandatory, so the scripting will need to reflect this. That's all.
For example, Hamid has been testing the latest model version (c720, 181 levels) with the current OPS History and has found that this works:
mpirun -np 5720 ./GEOSgcm.x --npes_model 5400 --nodes_output_server 8 --oserver_type multigroup --npes_backend_pernode 8 --fast_oclient true

@rtodling (Collaborator)

Question on this: how does the total number of PEs above relate to ncpus-per-node? We are on Skylake with ncpus-per-node=40, but if we request a batch job with ncpus-per-node=36, would the math leading up to the number 5720 above change? That is,

5720 = 5400 + (8 * 40)

so in the case of 36, should the entry in the -np option change to

5400 + (36 * 8) = 5688

?

@bena-nasa (Collaborator, Author) commented Nov 15, 2021

Yes, if you are only running 36 cores per node then you would change the -np to 5400 + (36 * 8) = 5688.
In general, the -np value needs to be equal to or greater than: npes_model (= NX * NY) + nodes_output_server * cores_per_node.
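A minimal sketch of that rule in shell arithmetic, using the Skylake numbers from this thread (the variable names are just illustrative):

#!/bin/bash
# Sketch only: minimum mpirun -np for model + o-server, using
# -np >= npes_model + nodes_output_server * cores_per_node.
npes_model=5400          # NX * NY
oserver_nodes=8          # --nodes_output_server

for cores_per_node in 40 36; do
  np=$(( npes_model + oserver_nodes * cores_per_node ))
  echo "cores_per_node=${cores_per_node}: -np ${np}"
done
# Prints 5720 for 40 cores/node and 5688 for 36 cores/node.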

@bena-nasa (Collaborator, Author) commented Nov 15, 2021

More clarification: the infrastructure detects the node information and partitions the MPI communicators among the nodes as appropriate. npes_model does not have to be evenly divisible by the number of nodes; if there are a few leftover MPI tasks, they simply aren't used by either the model or the o-server. To make up numbers, let's say you had 23 cores per node, needed 540 MPI tasks for the model, and wanted 4 o-server nodes. You would use (24 * 23) MPI tasks (as you need 24 nodes to run the model) + (4 * 23) MPI tasks, like so:

-np 644, --npes_model 540, --nodes_output_server 4

You need 24 nodes to run the model, so 12 MPI tasks just don't get used in the actual GEOSgcm.x run; they just wait for the rest to finish.
In other words, on nodes 1-23 all 23 MPI ranks run the model; on node 24, 11 MPI ranks are used for the model and 12 are not used at all (they just wait for everyone else); and the ranks on nodes 25-28 are used for the o-server.
644 > 540 + (23 * 4) = 632
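The same bookkeeping for this made-up 23-cores-per-node case, as a small sketch; the only extra step is rounding the model tasks up to whole nodes (names here are illustrative):

#!/bin/bash
# Sketch only: the model is packed onto whole nodes, so round up,
# then add whole nodes for the o-server.
cores_per_node=23
npes_model=540           # MPI tasks actually used by the model
oserver_nodes=4          # --nodes_output_server

model_nodes=$(( (npes_model + cores_per_node - 1) / cores_per_node ))   # ceil(540/23) = 24
idle_ranks=$(( model_nodes * cores_per_node - npes_model ))             # 552 - 540 = 12
np=$(( (model_nodes + oserver_nodes) * cores_per_node ))                # 28 * 23 = 644

echo "-np ${np} --npes_model ${npes_model} --nodes_output_server ${oserver_nodes} (${idle_ranks} ranks idle)"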

@rtodling (Collaborator)

Thanks Ben.
