55/refactor fix_latlon_coord #63
Conversation
@@ -1,6 +1,8 @@
import unittest.mock as mock
from dataclasses import dataclass
from collections import namedtuple
import numpy as np
from iris.exceptions import CoordinateNotFoundError

import umpost.um2netcdf as um2nc
Out of interest, how am I meant to get this working for testing? I can't pip install -e the umpost directory in its current state, so I'm getting test collection errors.
Whoops, due to prior history this project has started oddly. Spencer is using a pre-existing gadi environment & I'm reusing a local virtualenv from another related project. There were no setup docs until last week, when I grabbed docs from a related project. Do you feel like yet another PR review for docs? #62
(these docs are missing pip -e)
I do like me some good docs...
Wow, there is much more testing than I initially thought!
Some initial thoughts: fixing um2nc is tricky on account of uncertain requirements (gathered from the code), retrofitting tests, & the fact that the processing code contains substantial nested boolean logic. The latter causes a combinatorial testing blowout (more so than for the fix naming funcs).
In both the earlier cube renaming & lat/long fixes, our testing approach is blowing out in an attempt to cover most/all code paths. Part of the problem with being this thorough is writing tests against internal implementation details. Some of the mocks in test_fix_lat_coord_name() & the process/masking fixes are a symptom of coupling to internal details (although some of this is unavoidable as we work around constraints).
Additionally, I assume writing the lat/long tests was tricky, which implies future maintenance/code changes could be harder (e.g. changing processing code potentially breaks multiple tests). This could happen if cube modification requirements change.
Thus, I've been rethinking our testing strategy, helped by these TDD videos (~1 hour each):
- Ian Cooper: “TDD, Where did it all go wrong?” https://www.youtube.com/watch?v=EZ05e7EMOLM
- Ian Cooper: “TDD Revisited” (more instructional) https://www.youtube.com/watch?v=IN9lftH0cJc
One key message is focusing testing on external interfaces, avoiding references to internal implementation details. This suggests focusing tests around fix_latlon_coords() & skipping direct testing of the sub-functions. Calling the sub-functions can be treated as an implementation detail, with the tests asserting correct end results (thus, we treat the check functions as correct/tested if the cube vars are correct). Also, as um2nc cube mod functions often don't return anything, we'll have to make assertions on the modified cubes. In decoupling from details, we should move away from assertions on mocks (except for the DummyCubes). Some mocking is needed as an intermediate stopgap while the codebase is redesigned, but I think we can reduce it iteratively (there's a rough sketch of this style of test at the end of this comment).
As a next step, I think we should try an experiment to see how this works, trying these steps:
- Temporarily comment out the grid type & bounds tests as an implementation detail
  - e.g. test_is_lat/lon_river_grid, test_add_latlon_coord_bounds_has_bounds()
- Test a common use case with a test_fix_latlon_coords_type_change() type function
  - Remove mock calls, so fix_latlon_coords() executes code for grids, bounds & coord naming fixes etc.
  - Are more compound data fixtures needed to make a valid input?
- Try assertions on the modded cube (check bounds, data types etc.)
- Run coverage tests to explore how this higher level testing covers the code
If the above is straightforward:
- configure a cube with different data to execute other branches
- run tests & re-analyse the coverage
Otherwise, if the testing is fiddly/tricky with setup or otherwise:
- adjust the experiment, testing against the mid-level funcs add_latlon_coord_bounds() & fix_..._coord_name()
- explore the coverage...
Analysing higher level testing & coverage is probably a good pair-dev exercise. It's also likely our code repair efforts will require starting with coupled tests, then iteratively refactoring the processing code & tests. Thus, while things are harder now, it's temporary while we iterate towards a better design.
Hopefully this makes sense!
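To make that concrete, here's a rough sketch of the style of test I mean. It is illustrative only: make_n96_dummy_cube doesn't exist, and the exact fix_latlon_coords signature, grid type & spacing values are assumptions rather than the current code.

import numpy as np
import pytest

import umpost.um2netcdf as um2nc


@pytest.fixture
def n96_cube():
    # Hypothetical fixture: build a small cube whose latitude & longitude
    # coords mimic an N96 output (float32 points, no bounds). The helper
    # called here stands in for whatever compound fixture we settle on.
    return make_n96_dummy_cube()


def test_fix_latlon_coords_standard_grid(n96_cube):
    # Exercise the top-level function; argument names/order are assumed.
    um2nc.fix_latlon_coords(n96_cube, grid_type="EG", dlat=1.25, dlon=1.875)

    lat = n96_cube.coord("latitude")
    lon = n96_cube.coord("longitude")

    # Assert on the end state of the cube rather than on which helpers ran.
    assert lat.points.dtype == np.float64
    assert lon.points.dtype == np.float64
    assert lat.has_bounds()
    assert lon.has_bounds()
    assert lat.var_name == um2nc.VAR_NAME_LAT_STANDARD
    assert lon.var_name == um2nc.VAR_NAME_LON_STANDARD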
Wow, that required substantially more effort than the initial skim indicated. Apologies!
This looks good to go & will help with the process of merging all the current PRs. @marc-white do you want to do a final check?
# TODO: This test will not catch changes to the exceptions in um2nc. E.g.
# if we decided to instead raise a RuntimeError in um2nc when timeseries
# are encountered, the conversion of ESM1.5 outputs would undesirably
# crash, but this test would say everything is fine, since we're
# prescribing the error that is raised.
You could define the error type raised in this instance as a global variable in conversion_driver_esm1p5.py, and then import it for use in the test as the expected error type?
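A minimal sketch of that idea, with entirely hypothetical names (the real module will have its own error class, entry point and fixtures):

# conversion_driver_esm1p5.py (sketch)
import umpost.um2netcdf as um2nc

# Single source of truth for the exception expected when timeseries are hit;
# the class name on the right is an assumption for illustration only.
TIMESERIES_ERROR = um2nc.UnsupportedTimeSeriesError


# test module (sketch)
import pytest
import umpost.conversion_driver_esm1p5 as esm1p5_convert


def test_timeseries_inputs_are_reported(mock_um2nc_process):
    # mock_um2nc_process is a hypothetical fixture standing in for the
    # conversion call made by the driver.
    mock_um2nc_process.side_effect = esm1p5_convert.TIMESERIES_ERROR

    # If the driver ever switches to raising a different error type, the
    # test follows automatically because it imports the same constant.
    with pytest.raises(esm1p5_convert.TIMESERIES_ERROR):
        esm1p5_convert.convert_fields_file("fake_fields_file")  # hypothetical entry point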
test/test_um2netcdf.py
Outdated
D_LAT_N96 = 1.25
D_LON_N96 = 1.875
Could you comment what these numbers are for clarity?
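For example, something along these lines might do it (just a suggested wording; N96 has 192 x 144 grid points, which gives these spacings):

# Degree spacing of the N96 horizontal grid (192 x 144 points):
# 1.25 degrees between latitudes, 1.875 degrees between longitudes.
D_LAT_N96 = 1.25
D_LON_N96 = 1.875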
Are these meant to be DELTA_LAT_N96 & DELTA_LON_N96?
Yes, I'll just go with the comment.
Comments added in 21dc396
umpost/um2netcdf.py
Outdated
-    lat = cube.coord('latitude')
-
-    # Force to double for consistency with CMOR
-    lat.points = lat.points.astype(np.float64)
-    _add_coord_bounds(lat)
-    lon = cube.coord('longitude')
-    lon.points = lon.points.astype(np.float64)
-    _add_coord_bounds(lon)
-
-    lat = cube.coord('latitude')
-    if len(lat.points) == 180:
-        lat.var_name = 'lat_river'
-    elif (lat.points[0] == -90 and grid_type == 'EG') or \
-         (np.allclose(-90.+0.5*dlat, lat.points[0]) and grid_type == 'ND'):
-        lat.var_name = 'lat_v'
+def fix_lat_coord_name(lat_coordinate, grid_type, dlat):
+    """
+    Add a 'var_name' attribute to a latitude coordinate object
+    based on the grid it lies on.
+
+    NB - Grid spacing dlat only refers to variables on the main
+    horizontal grids, and not the river grid.
+
+    Parameters
+    ----------
+    lat_coordinate: coordinate object from iris cube (edits in place).
+    grid_type: (string) model horizontal grid type.
+    dlat: (float) meridional spacing between latitude grid points.
+    """
+
+    if lat_coordinate.name() != LATITUDE:
+        raise ValueError(
+            f"Wrong coordinate {lat_coordinate.name()} supplied. "
+            f"Expected {LATITUDE}."
+        )
+
+    if is_lat_river_grid(lat_coordinate.points):
+        lat_coordinate.var_name = VAR_NAME_LAT_RIVER
+    elif is_lat_v_grid(lat_coordinate.points, grid_type, dlat):
+        lat_coordinate.var_name = VAR_NAME_LAT_V
+    else:
+        lat_coordinate.var_name = VAR_NAME_LAT_STANDARD
+
+
+def fix_lon_coord_name(lon_coordinate, grid_type, dlon):
+    """
+    Add a 'var_name' attribute to a longitude coordinate object
+    based on the grid it lies on.
+
+    NB - Grid spacing dlon only refers to variables on the main
+    horizontal grids, and not the river grid.
+
+    Parameters
+    ----------
+    lon_coordinate: coordinate object from iris cube (edits in place).
+    grid_type: (string) model horizontal grid type.
+    dlon: (float) zonal spacing between longitude grid points.
+    """
+
+    if lon_coordinate.name() != LONGITUDE:
+        raise ValueError(
+            f"Wrong coordinate {lon_coordinate.name()} supplied. "
+            f"Expected {LONGITUDE}."
+        )
+
+    if is_lon_river_grid(lon_coordinate.points):
+        lon_coordinate.var_name = VAR_NAME_LON_RIVER
+    elif is_lon_u_grid(lon_coordinate.points, grid_type, dlon):
+        lon_coordinate.var_name = VAR_NAME_LON_U
+    else:
+        lon_coordinate.var_name = VAR_NAME_LON_STANDARD
Given these two functions have an almost identical structure, is it worth considering refactoring them as follows:
- A private helper function that does the actual work;
- Turn the existing functions into wrappers that call the private function with the expected coordinate name, and comparison options for that coordinate?
I've put together a trial of this - let me know if it's along the lines of what you were thinking! The first attempt was using a private function:
def _fix_horizontal_coord_name(coordinate, grid_type, grid_spacing,
                               river_grid_check, river_grid_name,
                               staggered_grid_check, staggered_name,
                               base_name):
    if river_grid_check(coordinate.points):
        coordinate.var_name = river_grid_name
    elif staggered_grid_check(coordinate.points, grid_type, grid_spacing):
        coordinate.var_name = staggered_name
    else:
        coordinate.var_name = base_name


def fix_lat_coord_name(lat_coordinate, grid_type, dlat):
    _fix_horizontal_coord_name(coordinate=lat_coordinate,
                               grid_type=grid_type,
                               grid_spacing=dlat,
                               river_grid_check=is_lat_river_grid,
                               river_grid_name=VAR_NAME_LAT_RIVER,
                               staggered_grid_check=is_lat_v_grid,
                               staggered_name=VAR_NAME_LAT_V,
                               base_name=VAR_NAME_LAT_STANDARD)
And another option was to put together a dictionary holding the specified checks:
HORIZONTAL_GRID_NAMING_DATA = {
    LATITUDE: {
        "river_grid_check": is_lat_river_grid,
        "river_grid_name": VAR_NAME_LAT_RIVER,
        "staggered_grid_check": is_lat_v_grid,
        "staggered_name": VAR_NAME_LAT_V,
        "base_name": VAR_NAME_LAT_STANDARD,
    },
    LONGITUDE: {
        "river_grid_check": is_lon_river_grid,
        "river_grid_name": VAR_NAME_LON_RIVER,
        "staggered_grid_check": is_lon_u_grid,
        "staggered_name": VAR_NAME_LON_U,
        "base_name": VAR_NAME_LON_STANDARD,
    }
}


def fix_latlon_coord_name(coordinate, grid_type, grid_spacing):
    naming_data = HORIZONTAL_GRID_NAMING_DATA[coordinate.name()]
    if naming_data["river_grid_check"](coordinate.points):
        coordinate.var_name = naming_data["river_grid_name"]
    elif naming_data["staggered_grid_check"](coordinate.points, grid_type, grid_spacing):
        coordinate.var_name = naming_data["staggered_name"]
    else:
        coordinate.var_name = naming_data["base_name"]
In terms of readability, I think I find the separate versions a bit easier to follow, though they do involve some duplicate code. Happy to hear other opinions and ideas though!
Are these experiments on other branches?
While the code & tests work here (with a bit of duplication), refining the code can be deferred in favour of modularisation & test coverage. @blimlim What are your thoughts on adding a neatening task to issue #27, linking it to this discussion & returning to it later?
That sounds like a good idea, I've added a task to #27. The two experiments are up on the following branches:
https://github.com/ACCESS-NRI/um2nc-standalone/tree/55/refactor-fix_latlon_coord-private-name-func
https://github.com/ACCESS-NRI/um2nc-standalone/tree/55/refactor-fix_latlon_coord-dictionary-name-func
Deferred action sounds like a good idea.
My original thought was like your option 2, but even more general - each coordinate could have an iterable of (check, name) tuples that you go through until you hit a match. The last tuple in the iterable would then be (True, <default name>).
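For reference, a rough sketch of that shape (purely illustrative, not code from the PR or its branches; the lambdas just adapt the existing check functions, whose signatures differ):

# Illustrative only: an ordered set of (check, name) rules for latitude.
LAT_NAMING_RULES = (
    (lambda pts, grid_type, dlat: is_lat_river_grid(pts), VAR_NAME_LAT_RIVER),
    (lambda pts, grid_type, dlat: is_lat_v_grid(pts, grid_type, dlat), VAR_NAME_LAT_V),
    (lambda pts, grid_type, dlat: True, VAR_NAME_LAT_STANDARD),  # the (True, <default name>) fallback
)


def _apply_naming_rules(coordinate, grid_type, grid_spacing, rules):
    # Walk the rules in order; the always-True final rule guarantees a
    # default var_name is applied when no river/staggered grid matches.
    for check, var_name in rules:
        if check(coordinate.points, grid_type, grid_spacing):
            coordinate.var_name = var_name
            return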
umpost/um2netcdf.py
Outdated
if coordinate_name not in [LONGITUDE, LATITUDE]:
    raise ValueError(
        f"Wrong coordinate {coordinate_name} supplied. "
        f"Expected one of {LONGITUDE}, {LATITUDE}."
    )
Are we ever likely to get coordinate names other than LATITUDE and LONGITUDE? If so, it might be worth parameterizing out the 'set of coordinate names I expect' for use in this situation. You could also do it using GLOBAL_COORDS_BOUNDS.keys().
I added this error to try to guard against add_latlon_coord_bounds being inadvertently used on the wrong coordinates, and so here we should only be expecting LATITUDE or LONGITUDE.
Would adding a constant, e.g. HORIZONTAL_COORD_NAMES = [LATITUDE, LONGITUDE], and then checking if coordinate_name not in HORIZONTAL_COORD_NAMES be a bit cleaner here?
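i.e. something like this (sketch only, against the snippet above):

HORIZONTAL_COORD_NAMES = [LATITUDE, LONGITUDE]

if coordinate_name not in HORIZONTAL_COORD_NAMES:
    raise ValueError(
        f"Wrong coordinate {coordinate_name} supplied. "
        f"Expected one of {HORIZONTAL_COORD_NAMES}."
    )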
Precisely, although you've possibly already got the 'constant' available as GLOBAL_COORDS_BOUNDS.keys().
Is GLOBAL_COORDS_BOUNDS likely to change, gaining keys that coordinate_name should not be allowed to match?
The advantage of the current line is that the test condition is obvious.
Also a minor formatting thing: lines 266-268 look over-indented.
Have updated the indentation in 37049c7
The comments I've made are only tweaks that you're free to think about and implement or not at your discretion - good job!
As a general observation, I think staggering the two reviews has been beneficial. The later second review brings fresh eyes after a few refinements. I'm pretty sure I inadvertently started to skim the code as it got more familiar.
Thanks @truth-quark and @marc-white for the very helpful reviews! And apologies it took so much work on your end!
This pull request closes #55. Apologies that there's a bit in here... It refactors the fix_latlon_coord function, which was used to add bounds, convert types, and add variable names for each cube's horizontal coordinates.
The main structural changes are to split the function into several more modular ones in order to help with readability and unit testing. Other changes to try and improve readability and testability include:
- … the fix_latlon_coord function, and raises a more specific exception.
To help with the unit tests, the pull request adds a DummyCoordinate class to mimic iris cube coordinate objects, with imitations of the name and has_bounds methods needed to test the new functions. It also adds an extension to the DummyCube class, DummyCubeWithCoords, which can be used to add DummyCoordinates to a DummyCube.
Any suggestions about the structure, logic, tests, and anything else would be great!
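For anyone skimming, a rough illustration of the kind of test double being described (the real DummyCoordinate in the PR may differ in fields and detail; this is only a sketch of the idea):

from dataclasses import dataclass


@dataclass
class DummyCoordinate:
    # Illustrative stand-in for an iris coordinate: just enough surface
    # area (name(), has_bounds(), points/bounds/var_name) to exercise the
    # refactored coordinate-fixing functions without real cube files.
    coordname: str
    points: object = None
    bounds: object = None
    var_name: str = None

    def name(self):
        return self.coordname

    def has_bounds(self):
        return self.bounds is not None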