Fb reg gpuport #40

Open
wants to merge 15 commits into
base: develop

UKMO-lsampson

Pull Request Summary

A section (regular grid spatial propagation) of the manual GPU porting effort at the Met Office has been brought into the WW3 operational framework, in order to test and better understand the compatibility and flexibility of OpenACC parallelism.

Description

The PR includes the successfully merged port of the regular grid propagation routines, which is enabled via a new GPU switch. The switch activates the OpenACC directives required for performant GPU acceleration. It has no effect on regular CPU compilations: unless the code is compiled on a GPU architecture (such as Isambard) with the correct compiler flags, it will perform as standard WW3.

We have used a separate build system on Isambard so that the GPUs can be targeted and tested as required. These are currently available under the /projects/metoffice/WW3_Isambard directory structure on Isambard.

Commit Message

Integration of the regular grid propagation from manual GPU porting efforts at the Met Office into the WW3 operational framework for testing and understanding the compatibility and flexibility of using OpenACC parallelism.

Check list

  • Branch is up to date with the authoritative repository (ukmo-waves) develop branch.
  • Relevant regression tests have been run, including additional GPU regression tests run on Isambard and XCE.

Testing

  • How were these changes tested? ...
  • Are the changes covered by regression tests? CPU changes are covered; GPU switches have been added to the current regression tests.
  • Have the matrix regression tests been run (if yes, please note HPC and compiler)? N
  • Please indicate the expected changes in the regression test output, (Note the list of known non-identical tests.) No expected changes.
  • Please provide the summary output of matrix.comp (matrix.Diff.txt, matrixCompFull.txt and matrixCompSummary.txt): [TBC] matrix.comp reports identical output for the original regression tests, and the new regression tests produce netCDF output that is identical with and without the GPU switch, according to nccmp (on XCE).

@UKMO-lsampson
Author

The performance of the GPU version of the code has not been fully analysed, as there are known areas in which we expect to see drastic improvement, e.g. more GPU-resident code (source term and intra-spectral routines) and data optimisations.

This has also only been explicitly tested with managed memory; compiling the code with explicit data transfers is more complex and will require more intrusive OpenACC directives.

Regression tests run:

./bin/run_cmake_test -o both -S -s PR2_UNO_MPI -w work_PR2_UNO_MPI -f -p mpiexec -n 16 ../model ww3_tp2.1
./bin/run_cmake_test -o both -S -s PR2_UNO_MPI -w work_PR2_UNO_MPI -f -p mpiexec -n 16 ../model ww3_tp2.2
./bin/run_cmake_test -o both -S -s PR2_UNO_MPI -w work_PR2_UNO_MPI -f -p mpiexec -n 16 ../model ww3_tp2.3
./bin/run_cmake_test -o both -S -s PR2_UNO_MPI -w work_PR2_UNO_MPI -f -p mpiexec -n 16 ../model ww3_tp2.4
./bin/run_cmake_test -o both -S -s PR2_UNO_MPI_GPU -w work_PR2_UNO_MPI_GPU -f -p mpiexec -n 16 ../model ww3_tp2.1
./bin/run_cmake_test -o both -S -s PR2_UNO_MPI_GPU -w work_PR2_UNO_MPI_GPU -f -p mpiexec -n 16 ../model ww3_tp2.2
./bin/run_cmake_test -o both -S -s PR2_UNO_MPI_GPU -w work_PR2_UNO_MPI_GPU -f -p mpiexec -n 16 ../model ww3_tp2.3
./bin/run_cmake_test -o both -S -s PR2_UNO_MPI_GPU -w work_PR2_UNO_MPI_GPU -f -p mpiexec -n 16 ../model ww3_tp2.4
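The eight invocations above differ only in the switch file and the test name, so they can be generated with a loop. The following is a minimal sketch that only prints the commands (a dry run); the function name `print_regtest_cmds` is hypothetical and not part of WW3, and the flags are copied unchanged from the commands listed above.

```shell
# Dry run: print the regression-test commands rather than executing them.
# print_regtest_cmds is a hypothetical helper name, not part of WW3.
print_regtest_cmds() {
  # CPU switch file first, then the GPU-enabled variant.
  for switch in PR2_UNO_MPI PR2_UNO_MPI_GPU; do
    for tp in ww3_tp2.1 ww3_tp2.2 ww3_tp2.3 ww3_tp2.4; do
      echo "./bin/run_cmake_test -o both -S -s ${switch} -w work_${switch} -f -p mpiexec -n 16 ../model ${tp}"
    done
  done
}

print_regtest_cmds
```

Assuming the working directory is the one the original commands were run from, piping the output to `sh` would execute the tests in the same order as listed.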

These tests have been run on both Isambard and XCE. We compared against develop using matrix.comp for the pre-existing regression tests, and used nccmp -d comparisons (original vs GPU) for the new regression tests. All comparisons produced identical output, which we consider a pass for the regression tests.
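The original-vs-GPU comparison with `nccmp -d` could be scripted roughly as follows. This is a sketch under assumptions: the helper name `compare_nc_dirs`, the `NCCMP_CMD` override variable and the two directory arguments are all hypothetical, and the actual layout of the regression test output is not shown in this PR.

```shell
# compare_nc_dirs is a hypothetical helper; both directory arguments are placeholders.
# NCCMP_CMD defaults to "nccmp -d" (compare variable data), the command named in this PR.
compare_nc_dirs() {
  ref_dir=$1
  gpu_dir=$2
  status=0
  for ref in "$ref_dir"/*.nc; do
    [ -e "$ref" ] || continue   # skip if ref_dir contains no netCDF files
    name=$(basename "$ref")
    # Left unquoted on purpose so "nccmp -d" splits into command and flag.
    if ${NCCMP_CMD:-nccmp -d} "$ref" "$gpu_dir/$name"; then
      echo "IDENTICAL $name"
    else
      echo "DIFFERENT $name"
      status=1
    fi
  done
  return $status
}

# Example call (placeholder paths):
#   compare_nc_dirs path/to/original_output path/to/gpu_output
```

The function returns non-zero if any file differs or is missing from the second directory, so it can gate a pass/fail decision in a wrapper script.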

The port we have included is not optimised for the performance of the regression tests; instead, the aim has been to mimic the GPU acceleration that our manual port uses. This means that several sections that could be accelerated are being run sequentially, and there are some data optimisations that we have not included; making these functional will take further research. We have chosen specific regression tests because the regular propagation typically activates a very large group of tests that are not necessary for our current experiments.

@UKMO-lsampson added the enhancement (New feature or request) label on Apr 14, 2023
Member

@ukmo-ccbunney left a comment

This is looking fine, thanks Lewis.
As with the SMC PR, I will leave this open for now until we decide how and where we will merge it.

Labels
enhancement New feature or request
Projects
Status: Todo

2 participants