Run icon4py on gpu #579

Closed
wants to merge 406 commits into from


OngChia
Contributor

@OngChia OngChia commented Oct 30, 2024

Enable GPU run for icon4py.
Changes:

  1. Remove backend and xp imports from common.setting.py. The backend is instead exposed as one of the click options. xp is chosen explicitly by calling a global function in icon4py_configuration.py that returns np if a CPU backend is used or cp if a GPU backend is used.
  2. Computation in vertical.py now uses numpy, and a backend argument is added to get_vct_a_and_vct_b so that it returns the gt4py vct_a and vct_b fields with the correct allocator.
  3. A new field_operator _interpolate_to_half_levels_wp is used in compute_virtual_potential_temperatures_and_pressure_gradient.py to avoid a CUDA illegal memory access whose cause is still unknown.
  4. Add backend when as_field is called to generate gt4py fields from serialized data. However, xp and backend are still read from common.setting.py, so it is still not possible to run the driver with a CPU backend if the environment variable ICON4PY_BACKEND is set to GPU. This remains to be done in a separate PR.
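
The backend-dependent choice of `xp` described in item 1 can be sketched roughly as follows. This is only an illustration, not the code in icon4py_configuration.py: the helper name `select_array_ns` and the `"cpu"`/`"gpu"` labels are assumptions made for the example.

```python
import importlib

import numpy as np


def select_array_ns(device: str):
    """Return the array namespace for a device label (hypothetical helper).

    "cpu" maps to numpy; "gpu" maps to cupy, which is imported lazily so
    that CPU-only environments never need cupy installed.
    """
    if device == "cpu":
        return np
    if device == "gpu":
        # Raises ImportError if cupy is not installed.
        return importlib.import_module("cupy")
    raise ValueError(f"unknown device: {device!r}")


xp = select_array_ns("cpu")
levels = xp.linspace(0.0, 1.0, 5)  # the same call works with numpy or cupy
```

Calling the function at the point of use, rather than importing a module-level `xp`, keeps the choice tied to the backend that was actually requested on the command line.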

halungge and others added 30 commits November 15, 2023 10:02
* fix cffi_utils.py tests in tools

* Fix: Explicit roundtrip backend set on advection tests

* add backend fixture to test_diffusion_utils.py and test_face_val_ppm_stencil_02.py

* pre-commit fix

* Add backend fixture to advection tests

* fix stencil test:
 - switch to as_field from deprecated np_as_located_field
 - use asnumpy() instead of np.asarray()

* fix test_cffi_utils.py:
 - use asnumpy() to convert to numpy array

* pre-commit fix for model/common

* fix datatest for model/common

* fix datatest for model/driver

* pre-commit fixes in test_vertical.py

* Pre-commit fix in diffusion

* fixing datatest for diffusion: ignoring assertion failure in verification which needs to be understood better

* fix datatest test_velocity_advection.py

* fix datatest test_solve_nonhydro.py
 preliminary ignore of failing predictor and single step test.

* Replace np.asarray with .asnumpy in test_divide_flux_area_list_stencil_02.py

---------

Co-authored-by: Magdalena Luz <[email protected]>
Co-authored-by: Nina Burgdorfer <[email protected]>
(Fix) Part 2 of adapting to new `gt4py` field representation.

- remove remaining `np.asarray` 
- fix bugs due to run dimension in serialbox input in `Diffusion` and `SolveNonHydro`

Co-authored-by: Nina Burgdorfer <[email protected]>
Co-authored-by: ninaburg <[email protected]>
(Fix) 
- pass `divdamp_fac_o2` as argument to `SolveNonHydro.predictor_step` as it changes dynamically in the timeloop
-  use `mean_cell_area` from `CellParams`
- cleanup tests.
* Constructing time loop

* Constructing time loop

* Constructing time loop test

* Constructing timeloop

* Constructing time loop

* time loop under construction

* Created a test for timeloop

* Finalizing timeloop and its test

* Finalized timeloop before attempting to merge

* Resolving pre-commit failed runs

* Attempting to resolve shadowing of Python builtin in stencil 14

* Resolving A004 error code for abs import in dycore velocity_stencil 14, 18, and 20, and utils

* Removed adding A004 to the flake8 ignore list in dycore

* Fixed bugs due to recent update of grid infrastructure, and add second time step test for timeloop and non hydro multistep

* Cleaning up timeloop

* Passed qa after resolving conflicts

* Resolving issues brought up in review

* Removed a bug in r04b09_iconrun_config where appla_initial_stabilization was wrongly set to true after the review

* Made changes according to second review

* Made changes according to icon4py qa

* Added TODO of diagnostic variable preparation in time_integration necessary for JW test
Pass domain sizes to all programs
…time parameter (#323)

- use runtime parameter in corrector step
* Change gt4py branch to main in spack-PR-icon

* Update spack-PR-icon
* set of stencil reordering to enforce BFB reproducibility with ICON GPU

* fixes and applying precommit

* fixing one more

* double check

* fix

* fix style

* fix style

* fix style

* applying precommit

---------

Co-authored-by: Abishek Gopal <[email protected]>
Co-authored-by: Daniel Hupp <[email protected]>
This PR introduces more fused stencils in the velocity advection.
The time limit is increased for the cscs-ci.
A uniform naming convention is introduced for k, cell, edge.
* fix computation order
* replace power of two with multiplication
Several fixes for greenline from recent merges.

---------

Co-authored-by: Nicoletta Farabullini <[email protected]>
* move states classes of dycore into one file
* fix circular dependency from model/common to model/atmosphere/dycore through serialbox_utils
Added GPU benchmarking to the CI pipeline.

---------

Co-authored-by: Rico Haeuselmann <[email protected]>
 * Add CLI option to run dace backend, if dace module is installed in the python environment.
 * Add CI job configuration to run tests on dace backend, after each PR is merged on main, but ignore test failures.
 * Use manual trigger to run benchmarks on dace backend.
Calculate NhConstants inside NonHydrostaticParams and remove NhConstants from the arguments to the timestep.
Add data tests for verification for a global model experiment (exclaim_ape_R02B04)
Add additional parameter in apply_diffusion_to_w_and_compute_horizontal_gradients_for_turbulence.py for type_sher != 2
Remove circular dependency from common upon diffusion
Remove and rename duplicate stencils
Allocates `ZFields` locally in `solve_nonhydro` and removes them from the `time_step` interface. The dataclass is renamed to `IntermediateFields`.
Modified CI pipeline to run benchmarks on the icon grid using serialised data, and simplified pre-commit execution.
* run timestep that is present in reduced global set
fix typo

* update download URIs for single node data to new version.

* rename CI config file
Added default and benchmark CI pipelines.
Update boost download server
This PR removes code duplication, by calling already implemented stencils instead of using the same code.
halungge and others added 24 commits November 22, 2024 09:09
* add some cell version of math stencils
fix wrong return value in gvec2cvec
* add convenience functions for
- import of array_ns depending on backend
- transfer field to a given backend

* extract CUPY devices
extract _size function and use in all allocation functions
Pass gt4py backend to grid manager and allocate gt4py fields on the backend.

---------

Co-authored-by: Nicoletta Farabullini <[email protected]>
Add mch v7 as upstream to speed up build times
bugfixes in interpolation coefficients
…s.py (#613)

- renaming of functions in grid_utils.py
- add a poor man's grid_manager cache in test_grid_manager.py to speed up tests (replacement for @functools.cache, which no longer works)
- add convenience as_numpy function
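
A hand-rolled cache of the kind mentioned above could look roughly like this. It is a sketch under assumptions: the names `cached_call` and `fake_grid_manager` are hypothetical and do not come from test_grid_manager.py, and the commit does not say why `@functools.cache` stopped working.

```python
from typing import Callable, Dict, Hashable, Tuple

# Module-level store: one entry per distinct argument tuple.
_cache: Dict[Tuple[Hashable, ...], object] = {}


def cached_call(factory: Callable[..., object], *key: Hashable) -> object:
    """Build the object once per key and reuse it on later calls."""
    if key not in _cache:
        _cache[key] = factory(*key)
    return _cache[key]


calls = []


def fake_grid_manager(grid_file: str, num_levels: int) -> dict:
    # Stand-in for an expensive grid-manager construction (hypothetical).
    calls.append(grid_file)
    return {"file": grid_file, "num_levels": num_levels}


first = cached_call(fake_grid_manager, "grid.nc", 65)
second = cached_call(fake_grid_manager, "grid.nc", 65)  # served from the cache
```

Keying the dictionary on explicit, hashable values gives the test author control over what counts as "the same" grid, which a decorator-based cache decides implicitly from the call arguments.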
* Add vertical advection granule with PPM

---------

Co-authored-by: Nicoletta Farabullini <[email protected]>
…590)

* Initial refactoring to use Swapping for double buffering

* More refactorings and cleanups in the driver.

* Use keyword arg in DriverParams

* Extend docstrings

* Extend double buffer changes to solve_nonhydro

* Refactor Swapping to generic Pair class, and specialized it for different use cases.

* Change common.utils import alias

* Recover methods of Pair deleted by accident in previous commits

* Format

* Export `namedproperty` utility

* Update Pair to have both accessors read/writable by default

* Replace `ddt_vn_apc_ntl1` and `ddt_vn_apc_ntl2` by `ddt_vn_apc_pc`

* Replace `ddt_w_adv_ntl1` and `ddt_w_adv_ntl2` by `ddt_w_adv_pc`

* Fix Pair

* Format issues

* Fixes

* More replacements of prognostic_states lists

* More replacements

* Final missing replacement (in theory)

* Fixes

* More missing replacements in tests

* Update model/driver/src/icon4py/model/driver/icon4py_driver.py

Co-authored-by: Magdalena <[email protected]>

* More missing replacements and deletions.

* Fix

* More fixes

* Simplify Pair base class.

* Simplify docstrings

* Testing

* Fixes after debugging

* More fixes

* Refactorings and style

* Minor fix to Pair and namedproperty utility classes

* Rename named_property

* Missing changes from previous commits

* New refactoring of Pair adding direct item access.

* More fixes

* Fix remaining failing tests

* Renaming symbols

* Fix merging errors

* Rename NextStepPair

* Fix spellings

* Fix typos and expand documentation

* Readability improvements

* Address reviewer's comments.

* Enhance diagnostic states swap documentation.

* Fix style of comments

* Rename and enhance documentation related to the velocity tendencies in the diagnostic state

* Minor rename

* Add forgotten changes from Pair to TimeStepPair

* Fix bug in dycore wrapper

* Remove unneeded indices in dycore_wrapper

* Make next read-write in TimeStepPair

* fix update_time_levels_for_velocity_tendencies and modify its docstring

* fix duplicated swapping in test_run_solve_nonhydro_multi_step for the second time step

* remove print statement

* fix bug in test multi substeps

* review changes

* fix ddt_w_apc index in test_timeloop

* frozen_first for current of TimeStepPair

* review changes

* remove print swap

* remove comments

* Remove get/set item of pair class

---------

Co-authored-by: Magdalena <[email protected]>
Co-authored-by: Chia Rui Ong <[email protected]>
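
The double-buffering idea behind the commits above can be sketched as a small pair class whose swap exchanges references instead of copying data. This is an assumption-laden illustration, not the `Pair` or `TimeStepPair` implementation from the repository: the class name `TimeStepPairSketch` is hypothetical, and the real classes differ in which accessors are read-only.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class TimeStepPairSketch(Generic[T]):
    """Holds the 'current' and 'next' buffers of a double-buffered state."""

    current: T
    next: T

    def swap(self) -> None:
        # Exchange references only; no field data is copied.
        self.current, self.next = self.next, self.current


state = TimeStepPairSketch(current=[0.0, 0.0], next=[1.0, 1.0])
old_next = state.next
state.swap()  # after the swap, the former 'next' buffer is 'current'
```

Keeping both buffers in one object replaces bookkeeping with separate time-level indices: after each time step a single `swap()` makes the freshly computed state the input for the following step.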
Fixes two bugs in `compute_ppm4gpu_integer_flux` and
`compute_ppm4gpu_fractional_flux`, and refactors them for consistency
Set up a field source for interpolation fields.

(Restrictions) Adds a `FieldProvider` implementation for direct calls of
`gtx.field_operator` (where possible); it currently duplicates
some code from the implementation for `gtx.program`. It also
allows computing on a different backend than the target backend.

---------

Co-authored-by: Nicoletta Farabullini <[email protected]>
(FIX): fixes a test in `test_grid_manager.py` which was previously
asserting nothing relevant.
- moved the following elements from
`model/common/src/icon4py/model/common/settings.py` to
`tools/src/icon4pytools/py2fgen/wrappers/settings.py`:
 ```
config = Icon4PyConfig()
backend = config.gt4py_runner
dace_orchestration = config.icon4py_dace_orchestration
xp = config.array_ns
device = config.device
limited_area = config.limited_area
```
- subsequent edit of imports and related in the code
- cleanup of left-over stencils backend specification on program definition
…614)

This PR moves the development tooling to use modern and more capable
tools like `uv` and `nox`. It also adds a new `icon4py` virtual package
in the root folder (used to gather all the model components) and moves
out of the `icon4py.model.common` subpackage to the new
`icon4py.model.testing` subpackage the utilities only needed for testing
purposes.

`uv` tool is used to manage the dependencies of all subprojects thanks
to its support for the `workspace` concept which fits perfectly the
monorepo structure of this repository. All requirements files have been
replaced by dependency groups in the root `pyproject.toml` used by `uv`.

`nox` is used to manage the testing environments instead of `tox`.
Feature-wise it is similar to `tox`, but it has a more user-friendly
configuration using Python code instead of configuration files.

---------

Co-authored-by: Magdalena Luz <[email protected]>
Co-authored-by: Nicoletta Farabullini <[email protected]>
Co-authored-by: Edoardo Paone <[email protected]>
Co-authored-by: Hannes Vogt <[email protected]>
Fix for module usage in test file after #614 was merged.
Add further details about how to use `uv` properly in the development
workflow.

---------

Co-authored-by: Nicoletta Farabullini <[email protected]>
Fix pre-commit config to work on all model files, and not only on
`model/common` folder. This PR also includes all the format and linter
changes required to make CI pass.

Additionally, add the `esbonio` language server for reStructuredText to
the `docs` dependency group.

Bug originally reported by @havogt

---------

Co-authored-by: Nicoletta Farabullini <[email protected]>

Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run default
  • launch jenkins spack

Optional Tests

To run benchmarks you can use:

  • cscs-ci run benchmark

To run tests and benchmarks with the DaCe backend you can use:

  • cscs-ci run dace

In case your change might affect downstream icon-exclaim, please consider running

  • launch jenkins icon

For more detailed information please look at CI in the EXCLAIM universe.

@OngChia OngChia closed this Jan 14, 2025
@OngChia OngChia force-pushed the run_icon4py_on_gpu branch from ef4df50 to 5f3e0f3 Compare January 14, 2025 23:22