Fb reg gpuport #40
base: develop
The performance of the GPU version of the code has not been fully analysed, as there are known areas in which we expect drastic improvement, e.g. more GPU-resident code (source term, intra-spectral routines) and data optimisations. It has also only been explicitly tested with managed memory; compiling the code with explicit transfers is more complex and will require more intrusive OpenACC directives.
Regression tests run:
./bin/run_cmake_test -o both -S -s PR2_UNO_MPI -w work_PR2_UNO_MPI -f -p mpiexec -n 16 ../model ww3_tp2.1
This has been run on both Isambard and XCE. We ran comparisons using matrix.comp against develop for the pre-existing regression tests, and nccmp -d comparisons (original vs GPU) for the new regression tests. All of these produced identical output, which we consider a pass for the regression tests.
The port we have included is not optimised for the performance of the regression tests; instead, the aim has been to mimic the GPU acceleration that our manual port uses. This means several sections that could be accelerated are being run sequentially, and there are some data optimisations that we have not included; making these functional will take further research. We have chosen specific regression tests because the regular propagation typically activates a very large group of tests that are not necessary for our current experiments.
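For readers unfamiliar with the managed-memory versus explicit-transfer distinction mentioned above, the following is a minimal sketch in C rather than WW3's Fortran. The function `scale_steps` and its kernel are hypothetical and are not taken from this PR. Under managed memory the runtime migrates data on demand, so the data region can typically be omitted; with explicit transfers, data directives like these have to be added around the kernels, which is what makes that build more intrusive.

```c
#include <stddef.h>

/* Hypothetical kernel (not from WW3): scale a field in place over several steps. */
void scale_steps(double *field, size_t n, double factor, int nsteps)
{
    /* Explicit-transfer style: copy `field` to the device once, keep it
     * resident for every step, and copy it back once at the end.
     * A compiler without OpenACC enabled ignores these pragmas and runs
     * the loops on the CPU. */
    #pragma acc data copy(field[0:n])
    {
        for (int s = 0; s < nsteps; ++s) {
            /* `present` asserts the data is already on the device. */
            #pragma acc parallel loop present(field[0:n])
            for (size_t i = 0; i < n; ++i) {
                field[i] *= factor;
            }
        }
    }
}
```

Calling `scale_steps(f, 3, 2.0, 3)` on `{1.0, 2.0, 3.0}` multiplies each element by 2 three times, whether or not the pragmas are active.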
This is looking fine, thanks Lewis.
As with the SMC PR, I will leave this open for now until we decide how and where we will merge it.
* origin/develop:
  Enable doxygen documentation in the cmake build system (NOAA-EMC#1281)
  Simplify MPI ifdefs in subroutine W3MPIO (NOAA-EMC#1266)
  Add depth scaling value to SMC regression tests. (NOAA-EMC#1264)
  Updates to NCEP regtests for Orion Rocky9 OS (NOAA-EMC#1263)
  Fix code stability issue in ww3_outp (NOAA-EMC#1258)
  Fix GNU regtest CI failure (NOAA-EMC#1253)
Pull Request Summary
A section (regular grid spatial propagation) of the manual GPU porting efforts at the Met Office has been brought into the WW3 operational framework for testing and understanding the compatibility and flexibility of using OpenACC parallelism.
Description
The PR includes the successfully merged port of the regular grid propagation routines, which is enabled via a new GPU switch. The switch activates the OpenACC directives required for performant GPU acceleration. For regular CPU compilations this has no effect: unless the code is compiled on a GPU architecture (such as Isambard) with the correct compiler flags, it will perform as standard WW3.
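As a rough illustration of why a switch-guarded directive leaves CPU builds unchanged, here is a minimal sketch in C rather than WW3's Fortran. The macro `DEMO_GPU` is a hypothetical stand-in for the new GPU switch, and `propagate_x` with its first-order upwind update is an invented kernel, not the propagation scheme in this PR. When the macro is undefined the preprocessor removes the directive and the loop is ordinary sequential code.

```c
#include <stddef.h>

/* Hypothetical first-order upwind update along one grid row.
 * cfl is the Courant number (0 <= cfl <= 1); both arrays hold n cells. */
void propagate_x(const double *q_old, double *q_new, size_t n, double cfl)
{
    if (n == 0) {
        return;
    }
    q_new[0] = q_old[0];  /* inflow boundary cell is held fixed */

#ifdef DEMO_GPU  /* hypothetical stand-in for the PR's GPU switch */
    /* Only seen by the compiler when the switch is on; an OpenACC
     * compiler then offloads the loop to the GPU. */
    #pragma acc parallel loop copyin(q_old[0:n]) copy(q_new[0:n])
#endif
    for (size_t i = 1; i < n; ++i) {
        q_new[i] = q_old[i] - cfl * (q_old[i] - q_old[i - 1]);
    }
}
```

Because each iteration reads only `q_old` and writes only its own `q_new[i]`, the loop has no cross-iteration dependence, so the offloaded and sequential versions compute the same update.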
We have used a separate build system on Isambard so that the GPUs can be targeted and tested as required. It is currently available under the /projects/metoffice/WW3_Isambard directory structure on Isambard.
Commit Message
Integration of the regular grid propagation from manual GPU porting efforts at the Met Office into the WW3 operational framework for testing and understanding the compatibility and flexibility of using OpenACC parallelism.
Check list
Testing