
Add spatial hashing #1169

Open · wants to merge 40 commits into base: main

Conversation

@fluidnumerics-joe (Collaborator) commented Feb 24, 2025

Closes #1126

Overview

This PR adds the SpatialHash class to the neighbors.py module. This class encapsulates the information and methods needed for conducting unstructured grid searches using spatial hashing. During initialization of a SpatialHash object, the associated unstructured grid is used to construct a uniformly spaced structured grid, called the "hash grid", and to relate cell indices in the hash grid to elements of the unstructured grid. This hash table is the critical element of the spatial hash search: it is used to build a short list of candidate elements to search within.
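The hash-grid idea described above can be sketched as follows. This is illustrative only; the function name, grid parameters, and layout are assumptions, not the actual uxarray internals:

```python
import numpy as np

def hash_index(lon, lat, lon0, lat0, dlon, dlat, nx):
    """Map (lon, lat) to a flat cell index in a uniform 'hash grid'.

    Each hash cell would then hold the list of unstructured-grid faces
    that overlap it, giving a short list of candidates per query point.
    """
    i = np.floor((lon - lon0) / dlon).astype(int)
    j = np.floor((lat - lat0) / dlat).astype(int)
    return j * nx + i

# A 4x4 hash grid covering lon in [0, 4), lat in [0, 4) with unit cells:
idx = hash_index(np.array([0.5, 3.2]), np.array([0.5, 2.9]),
                 lon0=0.0, lat0=0.0, dlon=1.0, dlat=1.0, nx=4)
# idx -> [0, 11]: first point in cell (i=0, j=0), second in cell (i=3, j=2)
```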

The SpatialHash.query method returns the element id that a point resides within (or the element ids for a list of points), alongside the barycentric coordinates. I've included helper methods to compute the barycentric coordinates of a point in relation to a given unstructured grid face (it works for convex faces with n >= 3 vertices). The barycentric coordinates are used to determine if a point is inside or outside a face; if all the barycentric coordinates are in [0, 1], then the point is inside the face.
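The inside/outside test described above can be illustrated for the simplest case, a triangle. This is a minimal sketch using signed areas, not the PR's helper functions:

```python
import numpy as np

def tri_area(a, b, c):
    # Twice the signed area of triangle (a, b, c)
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])

def barycentric(p, v0, v1, v2):
    # Barycentric coordinates of p in triangle (v0, v1, v2); they sum to one
    A = tri_area(v0, v1, v2)
    return np.array([tri_area(p, v1, v2),
                     tri_area(v0, p, v2),
                     tri_area(v0, v1, p)]) / A

v0, v1, v2 = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
w_in = barycentric((0.25, 0.25), v0, v1, v2)   # inside: all coords in [0, 1]
w_out = barycentric((1.0, 1.0), v0, v1, v2)    # outside: a negative coordinate
```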

Currently, only spherical coordinates (in radians or degrees) are supported for this search.

Expected Usage

import uxarray as ux

grid_path = "../../test/meshfiles/fesom/soufflet-netcdf/grid.nc"
uxgrid = ux.open_grid(grid_path)

# `data_path` points to a corresponding data file (not shown here)
uxds = ux.open_dataset(grid_path, data_path)

# Construct the `SpatialHash` object
spatial_hash = uxgrid.get_spatial_hash(
    reconstruct=False,  # a boolean, not the string "False"
)

# Query the `SpatialHash` for the face ids and the barycentric coordinates
face_ids, bcoords = spatial_hash.query([0.9, 1.8])

PR Checklist

General

  • An issue has been created and linked
  • Add appropriate labels
  • Filled out Overview and Expected Usage (if applicable) sections

Testing

  • Adequate tests are created if there is new functionality
  • Tests cover all possible logical paths in your function
  • Tests are not too basic (such as simply calling a function and nothing else)

Documentation

  • Docstrings have been added to all new functions
  • Docstrings have been updated with any function changes
  • Internal functions have a preceding underscore (_) and have been added to docs/internal_api/index.rst
  • User functions have been added to docs/user_api/index.rst

Examples

  • Any new notebook examples added to docs/examples/ folder
  • Clear the output of all cells before committing
  • New notebook files added to docs/examples.rst toctree
  • New notebook files added to new entry in docs/gallery.yml with appropriate thumbnail photo in docs/_static/thumbnails/

fluidnumerics-joe and others added 4 commits February 18, 2025 16:38
I attempted to stay close to the KD-Tree and BallTree API. However, the
purpose of the SpatialHash class is to help locate the faces that a
list of coordinates lie within. Because of this, the intent of the
`query` function for the SpatialHash is a bit different than that for
the KD and Ball trees.

Additional support routines are provided for assessing whether a point
is inside or outside a polygon. This is done by calculating the
barycentric coordinates of a point in a convex polygon. The method is
defined so that the sum of the barycentric coordinates is exactly one.
Because of this, if any of the barycentric coordinates are negative, we
immediately know that a point is outside the polygon.

All coordinate calculations are done using (lon,lat) in radians for the
SpatialHash class. Cartesian coordinates are not supported in this
commit.

@philipc2 philipc2 added the new feature New feature or request label Feb 24, 2025
@philipc2 (Member) commented:

Hi @fluidnumerics-joe

This looks great so far! Could you add a test_spatial_hashing.py module with a few tests and run pre-commit? Let me know if you have any questions.

@philipc2 (Member) left a comment:

Can you add the spatial-hashing notebook here? (both in the outline and the toc tree at the bottom)

I'm thinking between "Subsetting" and "Cross-Sections" would be a good spot.

https://github.com/UXARRAY/uxarray/blob/main/docs/userguide.rst?plain=1

@fluidnumerics-joe (Collaborator, Author) replied:

Hi @fluidnumerics-joe

This looks great so far! Could you add a test_spatial_hashing.py module with a few tests and run pre-commit? Let me know if you have any questions.

Yes, I'll take care of both of these.

fluidnumerics-joe and others added 8 commits February 26, 2025 17:02
* Ensure that the lon bounds are sorted from low to high. This is needed
to make sure that the index loop for filling the hash table actually
executes.

* All barycentric coordinates are calculated directly. Rather than
calculating the last coordinate to ensure they sum to one, we now
compute all coordinates. The coordinates are now calculated to align
with the indexing of the node IDs, so that np.sum(bcoords * nodes) returns
the barycentric interpolation estimate of the point. If the point is within
the element, then the barycentric interpolation estimate matches the
input coordinate (to machine precision). I've added this as a check.
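The reproduction property described in this commit message can be checked with plain numpy (illustrative values, not the PR's code): weighting the node coordinates by barycentric coordinates that sum to one recovers the query point when it lies inside the face.

```python
import numpy as np

nodes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # (lon, lat) per node
bcoords = np.array([0.5, 0.25, 0.25])                   # sums to one

# Barycentric interpolation estimate of the point
recovered = bcoords @ nodes

point = np.array([0.25, 0.25])
# For a point inside the face, the estimate matches to machine precision
assert np.allclose(recovered, point)
```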
@fluidnumerics-joe (Collaborator, Author) commented:

@philipc2 - I've added tests for the spatial hashing, fixed formatting, and added spatial hashing to the toc in the user guide. Let me know if there's anything else you'd like to see updated.

@philipc2 (Member) left a comment:

Looks great! A few small comments attached.

I'm going to play around with this today.

Thanks for putting this together so quickly!

@philipc2 (Member) left a comment:

Some leftover print statements.

@philipc2 (Member) commented Mar 3, 2025:

I'm having some issues with the soufflet-netcdf/grid.nc grid (seems to be crashing the docs too).

It stalls for multiple minutes when trying to construct the tree. This was one of the grids that gave me some trouble in #1013

Co-authored-by: Philip Chmielowiec <[email protected]>
@fluidnumerics-joe (Collaborator, Author) replied:

I'm having some issues with the soufflet-netcdf/grid.nc grid (seems to be crashing the docs too).

It stalls for multiple minutes when trying to construct the tree. This was one of the grids that gave me some trouble in #1013

I suspect this is related to the periodicity on a longitudinal extent of ~4.5 degrees; it's an odd channel model that we've discussed with the Parcels and FESOM teams, and I don't think we plan on supporting it initially. Let me see if I can just get a rectangular ocean basin with closed boundaries.

Co-authored-by: Philip Chmielowiec <[email protected]>
Added dual and primal grids for MPAS. The generalized Wachspress points
give us the correct generalized barycentric coordinates that satisfy the
Lagrange property for barycentric interpolation on convex polygons.
@philipc2 (Member) left a comment:

Looks like one of the GitHub suggestions broke the pre-commit. Can you run it again?

Also, please restart and clear the outputs of the notebook, they will be rendered via our CI. Other than that all looks good.

@fluidnumerics-joe (Collaborator, Author) replied:

Looks like one of the GitHub suggestions broke the pre-commit. Can you run it again?

Also, please restart and clear the outputs of the notebook, they will be rendered via our CI. Other than that all looks good.

I re-ran pre-commit run --all-files and cleared python notebook output.

@philipc2 (Member) commented Mar 4, 2025:

Looks like one of the GitHub suggestions broke the pre-commit. Can you run it again?
Also, please restart and clear the outputs of the notebook, they will be rendered via our CI. Other than that all looks good.

I re-ran pre-commit run --all-files and cleared python notebook output.

Oops, looks like it's complaining about a merge conflict.

Instead, we're favoring a check of the predicted coordinates using
barycentric interpolation
a2 = _triangle_area(point, vi, vi1)
sum_wi += a0 / (a1 * a2)
w.append(a0 / (a1 * a2))
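The fragment above is the core of the Wachspress weight computation. For context, here is a self-contained reconstruction; the helper name `_triangle_area` follows the fragment, but everything else is an assumed sketch, not the PR's exact code:

```python
import numpy as np

def _triangle_area(a, b, c):
    # Unsigned area of triangle (a, b, c) in the plane
    return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))

def wachspress_coords(nodes, point):
    """Generalized barycentric (Wachspress) coordinates of `point`
    in the convex polygon `nodes` (counter-clockwise vertex order)."""
    n = len(nodes)
    w, sum_wi = [], 0.0
    for i in range(n):
        vim1, vi, vi1 = nodes[i - 1], nodes[i], nodes[(i + 1) % n]
        a0 = _triangle_area(vim1, vi, vi1)   # corner triangle at vertex i
        a1 = _triangle_area(point, vim1, vi)
        a2 = _triangle_area(point, vi, vi1)
        wi = a0 / (a1 * a2)                  # matches the fragment above
        sum_wi += wi
        w.append(wi)
    return np.asarray(w) / sum_wi            # normalize so they sum to one

# Unit square: by symmetry, the centroid gets equal weights of 0.25
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
w = wachspress_coords(square, (0.5, 0.5))
```

Note that the division by `a1 * a2` is undefined when the point sits exactly on a polygon edge, which is the divide-by-zero case discussed in the review comments that follow.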

@philipc2 (Member) left a comment:

For some queries, I'm receiving a divide-by-zero error. Can we add a small value to the sum before dividing?

We can import ERROR_TOLERANCE from uxarray.constants.

@fluidnumerics-joe (Collaborator, Author) replied:

Can you share a specific example that is causing this problem so that we can get it added to the tests? I'd like to make sure whatever patch we put in here is covered by the test cases.

@philipc2 (Member) replied:

The error arose when I decorated the function with Numba. In its current state, it raises warnings when using the 30 km MPAS grid I tested above, but I haven't run into it in other instances.

We could extend the existing tests and make sure they don't raise any warnings in the places this function is called.


return index_to_face

def query(
@philipc2 (Member) commented Mar 5, 2025:

For a 30 km MPAS grid, it was taking about 6-7 seconds per query. Here's a slightly revised version that I was able to get down to about 0.5 s per query.

I'm happy to help explore further optimizations, since I assume you'll want to perform a large number of queries, and even after these optimizations 0.5 s per query may be sub-optimal.

coords = _prepare_xy_for_query(coords, in_radians, distance_metric=None)
num_coords = coords.shape[0]
max_nodes = self._source_grid.n_max_face_nodes

# Preallocate results
bcoords = np.zeros((num_coords, max_nodes), dtype=np.double)
faces = np.full(num_coords, -1, dtype=INT_DTYPE)

# Get grid variables
n_nodes_per_face = self._source_grid.n_nodes_per_face.to_numpy()
face_node_connectivity = self._source_grid.face_node_connectivity.to_numpy()

# Precompute radian values for node coordinates
node_lon = np.deg2rad(self._source_grid.node_lon.to_numpy())
node_lat = np.deg2rad(self._source_grid.node_lat.to_numpy())

# Get the list of candidate faces for each coordinate
candidate_faces = [self._face_hash_table[pid] for pid in self._hash_index(coords)]

# `tol` is an error tolerance, e.g. ERROR_TOLERANCE from uxarray.constants
for i, (coord, candidates) in enumerate(zip(coords, candidate_faces)):
    for face_id in candidates:
        n_nodes = n_nodes_per_face[face_id]
        node_ids = face_node_connectivity[face_id, :n_nodes]
        nodes = np.column_stack((node_lon[node_ids], node_lat[node_ids]))
        bcoord = np.asarray(_barycentric_coordinates(nodes, coord))
        # Accept the face only if barycentric interpolation reproduces the point
        err = abs(np.dot(bcoord, nodes[:, 0]) - coord[0]) + abs(np.dot(bcoord, nodes[:, 1]) - coord[1])
        if (bcoord >= 0).all() and err < tol:
            faces[i] = face_id
            bcoords[i, :n_nodes] = bcoord[:n_nodes]
            break

return faces, bcoords

@fluidnumerics-joe (Collaborator, Author) replied:

I'm happy to implement this change you've shared here, but I think we should address additional performance optimizations in a future PR. Let's not keep moving the goal post here.

@fluidnumerics-joe (Collaborator, Author) replied:

In practice, for a Lagrangian code like Parcels, the spatial hash will be used to initially populate the face ids for particle locations. Since we're mostly using explicit time integration methods, lookups for face ids in future time steps will be done using connectivity tables in uxarray (since a particle likely won't have a CFL > 1). When particles are not found by looking through the currently associated face id or its neighbors, the query will be run as a last-ditch effort before determining that a particle is lost.

In our view, functionality is most critical at this point. Once we have concrete examples with benchmark data and user feedback, it's possible performance may need to be addressed. But right now, it's a "down-the-road" concern.

@philipc2 (Member) replied:

I'm happy to implement this change you've shared here, but I think we should address additional performance optimizations in a future PR. Let's not keep moving the goal post here.

That works for me. I'm happy to continue iterating on performance once this is merged. If you are happy with my suggestion and would like to implement it, go ahead!

In our view, functionality is most critical at this point. Once we have concrete examples with benchmark data and user feedback, it's possible performance may need to be addressed. But right now, it's a "down-the-road" concern.

Sounds good! Other than the performance suggestions, I am very pleased with the implementation. Great work!

@philipc2 (Member) left a comment:

Thanks for this contribution! Other than the comments I left earlier, it looks good to me.

Labels: new feature (New feature or request)
Projects: None yet

Successfully merging this pull request may close these issues:

  • Suggested Feature: Spatial Hashing and Point Cloud Interpolation

3 participants