Compile-time ng in ecCKD and TripleClouds + GPTL timing library #20

Open: peterukk wants to merge 2 commits into master

@peterukk peterukk commented May 21, 2024

Currently the official repo is missing two major optimizations found in my ecRad-OTP: compile-time ng (the number of g-points fixed at compile time) and cloudy layer batching. I'm making a pull request for compile-time ng first. TripleClouds might not get much additional benefit from cloudy layer batching, since it already collapses the two cloudy regions in the shortwave, but we'll see.

Here the compile-time NG is implemented in TripleClouds-LW, TripleClouds-SW and ecCKD, and the #ifdef NG_LW blocks are added to ecrad_config.h, which helps avoid code bloat. To avoid compilation errors I needed to move the #include "ecrad_config.h" in all files so that it occurs after the module statement.
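
For readers unfamiliar with the pattern, here is a minimal sketch of what a compile-time ng switch looks like. It is written in C rather than Fortran and is not taken from the PR; the function `sum_over_gpoints` and the array it sums are hypothetical. The only name borrowed from the PR is the `NG_LW` macro. The idea is that when `NG_LW` is defined at build time the loop trip count is a constant the compiler can unroll and vectorize against, and otherwise the code falls back to the runtime value.

```c
#include <assert.h>

/* Hypothetical illustration: in the PR the NG_LW switch lives in
   ecrad_config.h; here it would be set with e.g. -DNG_LW=32. */

/* Sum a per-g-point array. ng_runtime is the g-point count that the
   gas-optics configuration reports at run time. */
double sum_over_gpoints(const double *flux, int ng_runtime)
{
#ifdef NG_LW
    /* Compile-time trip count: the runtime configuration must agree. */
    assert(ng_runtime == NG_LW);
    const int ng = NG_LW;
#else
    /* Fallback: trip count only known at run time. */
    const int ng = ng_runtime;
#endif
    double total = 0.0;
    for (int jg = 0; jg < ng; ++jg) {
        total += flux[jg];
    }
    return total;
}
```

Keeping the `#ifdef` in one place and using a single local `ng` everywhere else is one way to get the fixed-size loops without duplicating each routine, which is presumably the kind of code bloat the shared header is meant to avoid.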

Support for the General Purpose Timing Library (GPTL) is also added in order to measure the performance impact. I can remove it if you wish, but it adds minimal extra code: a few lines in the main Makefile, the driver (I only added it to the blocked IFS driver, since this is what should be used for timing tests) and a few instrumentation calls in radiation_interface.
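
As a rough picture of how little instrumentation this involves, the sketch below uses the GPTL C API (`GPTLinitialize`, `GPTLstart`, `GPTLstop`, `GPTLpr`, `GPTLfinalize`). It is not the PR's Fortran code. Using `GPTL_TIMING` as a preprocessor macro is an assumption (the PR only shows it as a Makefile variable), and `solver_longwave_stub` and `run_timed` are hypothetical placeholders. Without the macro, the timer calls compile to no-ops, so the sketch builds without GPTL installed.

```c
#ifdef GPTL_TIMING            /* assumed macro name, see lead-in */
#include <gptl.h>             /* real GPTL C API */
#else
/* No-op fallbacks so the sketch builds when GPTL is absent. */
static int GPTLinitialize(void)        { return 0; }
static int GPTLstart(const char *name) { (void)name; return 0; }
static int GPTLstop(const char *name)  { (void)name; return 0; }
static int GPTLpr(int id)              { (void)id; return 0; }
static int GPTLfinalize(void)          { return 0; }
#endif

/* Hypothetical stand-in for one solver call on a block of columns. */
static double solver_longwave_stub(int ncol)
{
    double acc = 0.0;
    for (int jcol = 1; jcol <= ncol; ++jcol) {
        acc += (double)jcol;
    }
    return acc;
}

/* Time nblocks solver calls inside an outer "radiation" region.
   Returns 0 if every GPTL call succeeded. */
int run_timed(int nblocks, double *checksum)
{
    int err = GPTLinitialize();
    err |= GPTLstart("radiation");
    double sum = 0.0;
    for (int jb = 0; jb < nblocks; ++jb) {
        err |= GPTLstart("solver_longwave");
        sum += solver_longwave_stub(8);
        err |= GPTLstop("solver_longwave");
    }
    err |= GPTLstop("radiation");
    err |= GPTLpr(0);          /* with real GPTL, writes a timing report */
    err |= GPTLfinalize();
    *checksum = sum;
    return err;
}
```

Nested start/stop pairs are what produce an indented call tree in the report, with the inner region listed under the outer one.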

Compile-time ng has minimal impact on TripleClouds but speeds up ecCKD gas optics by roughly a factor of 3 when using the Intel compiler on Atos:

make PROFILE=intel_atos SINGLE_PRECISION=1 GPTL_TIMING=1 
cd test/ifs; make test_ifsdriver_blocked_ecckd_tc 
cat timing.ECCKD_Tripleclouds_block8_nrep10_0038
                                   Called  Recurse     Wall      max      min %_of_radi
 radiation                              1     -       0.114    0.114    0.114   100.00
   radiation_interface:radiation       40     -       0.111 4.76e-03 2.39e-03    97.13
     gas_optics                        40     -       0.030 1.00e-03 7.44e-04    26.61
     cloud_optics                      40     -    4.22e-03 4.44e-04 8.00e-05     3.69
     aerosol_optics                    40     -       0.011 3.75e-04 2.71e-04     9.83
     solver_longwave                   40     -       0.032 1.77e-03 6.86e-04    27.59
     solver_shortwave                  40     -       0.033 1.10e-03 5.17e-04    29.08
make PROFILE=intel_atos SINGLE_PRECISION=1 GPTL_TIMING=1 NG_SW=32 NG_LW=32
cd test/ifs; make test_ifsdriver_blocked_ecckd_tc 
                                   Called  Recurse     Wall      max      min %_of_radi
 radiation                              1     -       0.087    0.087    0.087   100.00
   radiation_interface:radiation       40     -       0.083 2.56e-03 1.79e-03    96.29
     gas_optics                        40     -       0.011 4.27e-04 2.55e-04    12.16
     cloud_optics                      40     -    4.08e-03 3.17e-04 8.10e-05     4.71
     aerosol_optics                    40     -       0.011 3.45e-04 2.71e-04    12.79
     solver_longwave                   40     -       0.028 8.28e-04 6.35e-04    32.40
     solver_shortwave                  40     -       0.029 9.20e-04 4.69e-04    33.85

So a 20-25% speed-up in total (radiation wall time drops from 0.114 s to 0.087 s).
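
The ratios behind that summary can be checked directly from the two reports. The short C sketch below copies the Wall column for a few timers and computes the speed-up factor and the percentage of wall time saved. The struct and function names are made up for this illustration, and the numbers are only as precise as the three significant digits printed by GPTL.

```c
#include <stdio.h>

/* Wall times in seconds, copied from the two timing reports:
   base = runtime ng, fixed_ng = built with NG_SW=32 NG_LW=32. */
typedef struct {
    const char *name;
    double base;
    double fixed_ng;
} Timing;

static const Timing timings[] = {
    {"radiation",        0.114, 0.087},
    {"gas_optics",       0.030, 0.011},
    {"solver_longwave",  0.032, 0.028},
    {"solver_shortwave", 0.033, 0.029},
};

/* How many times faster the fixed-ng build is. */
double speedup(double base, double fixed_ng)
{
    return base / fixed_ng;
}

/* Percentage of the baseline wall time that was saved. */
double percent_saved(double base, double fixed_ng)
{
    return 100.0 * (base - fixed_ng) / base;
}

/* Print one line per timer. */
void print_summary(void)
{
    const int n = (int)(sizeof timings / sizeof timings[0]);
    for (int i = 0; i < n; ++i) {
        printf("%-18s %5.2fx  %5.1f%% saved\n", timings[i].name,
               speedup(timings[i].base, timings[i].fixed_ng),
               percent_saved(timings[i].base, timings[i].fixed_ng));
    }
}
```

On these numbers gas_optics comes out at about 2.7x and the total radiation time at about 24% less wall time, while the two solvers gain roughly 12-14% each.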

@FussyDuck
CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Peter Ukkonen seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.