Add spatial hashing #1169
Conversation
I attempted to stay close to the KD-Tree and BallTree API. However, the purpose of the SpatialHash class is to help locate the faces that a list of coordinates lie within. Because of this, the intent of the `query` function for the SpatialHash is a bit different from that of the KD and Ball trees. Additional support routines are provided for assessing whether a point is inside or outside a polygon. This is done by calculating the barycentric coordinates of a point in a convex polygon. The method is defined so that the sum of the barycentric coordinates is exactly one; because of this, if any of the barycentric coordinates is negative, we immediately know that the point is outside the polygon. All coordinate calculations for the SpatialHash class are done using (lon, lat) in radians. Cartesian coordinates are not supported in this commit.
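As a sketch of that inside/outside test (`_point_in_face` is an illustrative name, and `_barycentric_coordinates(nodes, point)` is the helper referenced later in this PR):

```python
import numpy as np

def _point_in_face(nodes, point, tol=1e-12):
    # nodes: (n, 2) array of face vertices as (lon, lat) in radians
    # point: (2,) query coordinate as (lon, lat) in radians
    # The barycentric coordinates sum to exactly one by construction,
    # so a single negative coordinate means the point lies outside.
    bcoord = np.asarray(_barycentric_coordinates(nodes, point))
    return bool((bcoord >= -tol).all())
```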
This looks great so far!
Can you add the spatial-hashing notebook here (both in the outline and the toc tree at the bottom)? I'm thinking between "Subsetting" and "Cross-Sections" would be a good spot.
https://github.com/UXARRAY/uxarray/blob/main/docs/userguide.rst?plain=1
Yes, I'll take care of both of these.
* Ensure that the lon bounds are sorted from low to high. This is needed to make sure that the index looping for filling the hash table actually executes.
* All barycentric coordinates are calculated directly. Rather than calculating the last coordinate to ensure they sum to one, we now compute all coordinates. The coordinates are now calculated to align with the indexing of the node IDs, so that `np.sum(bcoords * nodes)` returns the barycentric interpolation estimate of the point. If the point is within the element, then the barycentric interpolation estimate matches the input coordinate (to machine precision). I've added this as a check (see the sketch below).
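A minimal sketch of that check (array names are illustrative; `bcoords` has shape `(n,)` and `nodes` has shape `(n, 2)` holding the face's (lon, lat) vertices):

```python
import numpy as np

# Barycentric interpolation of the node coordinates; for a point inside
# the face this reproduces the query point to machine precision.
estimate = np.sum(bcoords[:, None] * nodes, axis=0)
assert np.allclose(estimate, point), "point lies outside the face"
```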
@philipc2 - I've added tests for the spatial hashing, fixed formatting, and added spatial hashing to the toc in the user guide. Let me know if there's anything else you'd like to see updated.
Looks great! A few small comments attached.
I'm going to play around with this today.
Thanks for putting this together so quickly!
Co-authored-by: Philip Chmielowiec <[email protected]>
Some leftover print statements.
I'm having some issues with one of the grids: it stalls for multiple minutes when trying to construct the tree. This was one of the grids that gave me some trouble in #1013.
Co-authored-by: Philip Chmielowiec <[email protected]>
I suspect this is related to the periodicity on a longitudinal extent of ~4.5 degrees; it's an odd channel model that we've discussed with the Parcels and FESOM teams, and I don't think we plan on supporting it initially. Let me see if I can just get a rectangular ocean basin with closed boundaries.
Co-authored-by: Philip Chmielowiec <[email protected]>
Added dual and primal grids for MPAS. The generalized Wachspress weights give us the correct generalized barycentric coordinates, which satisfy the Lagrange property for barycentric interpolation in convex polygons.
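For reference, a minimal sketch of those Wachspress weights (the triangle-area helper mirrors the diff further down, but the function names here are illustrative, not the PR's exact code):

```python
import numpy as np

def _triangle_area(a, b, c):
    # Signed area of the triangle (a, b, c) in the 2D (lon, lat) chart
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))

def _wachspress_coordinates(nodes, point):
    # Wachspress weights for a convex polygon:
    #   w_i = A(v_{i-1}, v_i, v_{i+1}) / (A(p, v_{i-1}, v_i) * A(p, v_i, v_{i+1}))
    # Normalizing by their sum yields generalized barycentric coordinates
    # that satisfy the Lagrange property at the vertices.
    n = len(nodes)
    w = np.empty(n)
    for i in range(n):
        vim1, vi, vip1 = nodes[i - 1], nodes[i], nodes[(i + 1) % n]
        a0 = _triangle_area(vim1, vi, vip1)
        a1 = _triangle_area(point, vim1, vi)
        a2 = _triangle_area(point, vi, vip1)
        w[i] = a0 / (a1 * a2)
    return w / w.sum()
```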
Co-authored-by: Philip Chmielowiec <[email protected]>
Looks like one of the GitHub suggestions broke the pre-commit. Can you run it again?
Also, please restart and clear the outputs of the notebook; they will be rendered via our CI. Other than that, all looks good.
I re-ran `pre-commit`.
Oops, looks like it's complaining about a merge conflict.
Instead, we're favoring a check of the predicted coordinates using barycentric interpolation.
```python
a2 = _triangle_area(point, vi, vi1)
sum_wi += a0 / (a1 * a2)
w.append(a0 / (a1 * a2))
```
For some queries, I'm receiving a divide-by-zero error. Can we add a small value to the sum before dividing? We can import `ERROR_TOLERANCE` from `uxarray.constants`.
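One way to apply that suggestion (a sketch only, reusing the `a0`, `a1`, `a2`, `sum_wi`, and `w` variables from the diff above; padding the denominator is one reading of "add a small value before dividing"):

```python
from uxarray.constants import ERROR_TOLERANCE

# a1 * a2 can vanish when the query point sits exactly on a vertex or
# an edge of the triangle fan, so pad the denominator before dividing.
denom = a1 * a2 + ERROR_TOLERANCE
sum_wi += a0 / denom
w.append(a0 / denom)
```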
Can you share a specific example that is causing this problem so that we can get it added to the tests? I'd like to make sure whatever patch we put in here is covered by the test cases.
The error arose when I decorated the function with Numba. In its current state, it raises warnings when using the 30km MPAS grid I tested above, but I haven't run into it in other instances.
We could extend the existing tests and make sure they don't raise any warnings in the parts where this function is called.
```python
return index_to_face

def query(
```
For a 30km MPAS grid, it was taking about 6-7 seconds for each query. Here's a slightly revised version that I was able to get down to about 0.5s per query.
I'm happy to help explore further optimizations, since I assume you'll want to perform a large number of queries, and even after these optimizations, 0.5s per query might be sub-optimal.
```python
coords = _prepare_xy_for_query(coords, in_radians, distance_metric=None)
num_coords = coords.shape[0]
max_nodes = self._source_grid.n_max_face_nodes

# Preallocate results
bcoords = np.zeros((num_coords, max_nodes), dtype=np.double)
faces = np.full(num_coords, -1, dtype=INT_DTYPE)

# Get grid variables
n_nodes_per_face = self._source_grid.n_nodes_per_face.to_numpy()
face_node_connectivity = self._source_grid.face_node_connectivity.to_numpy()

# Precompute radian values for node coordinates
node_lon = np.deg2rad(self._source_grid.node_lon.to_numpy())
node_lat = np.deg2rad(self._source_grid.node_lat.to_numpy())

# Get the list of candidate faces for each coordinate
candidate_faces = [self._face_hash_table[pid] for pid in self._hash_index(coords)]

for i, (coord, candidates) in enumerate(zip(coords, candidate_faces)):
    for face_id in candidates:
        n_nodes = n_nodes_per_face[face_id]
        node_ids = face_node_connectivity[face_id, :n_nodes]
        nodes = np.column_stack((node_lon[node_ids], node_lat[node_ids]))
        bcoord = np.asarray(_barycentric_coordinates(nodes, coord))
        # A point inside the face reproduces itself under barycentric
        # interpolation of the node coordinates (to within `tol`)
        err = abs(np.dot(bcoord, nodes[:, 0]) - coord[0]) + abs(
            np.dot(bcoord, nodes[:, 1]) - coord[1]
        )
        if (bcoord >= 0).all() and err < tol:
            faces[i] = face_id
            bcoords[i, :n_nodes] = bcoord[:n_nodes]
            break

return faces, bcoords
```
I'm happy to implement this change you've shared here, but I think we should address additional performance optimizations in a future PR. Let's not keep moving the goal post here.
In practice, for a Lagrangian code like Parcels, the spatial hash will be used to initially populate the face ids for particle locations. Since we're mostly using explicit time integration methods, lookups for face ids in future time steps will be done using connectivity tables in uxarray (since a particle likely won't have a CFL > 1). When particles are not found by looking through the currently associated face id or its neighbors, the query will be run as a last-ditch effort before determining that a particle is lost. A sketch of that lookup strategy follows.
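This sketch is illustrative only: `point_in_face` is a hypothetical callable, while `face_face_connectivity` follows uxarray's connectivity naming.

```python
def locate_particle(coord, last_face, face_face_connectivity, point_in_face, spatial_hash):
    # 1) Cheapest check: the particle is usually still in its previous face.
    if point_in_face(last_face, coord):
        return last_face
    # 2) With explicit time stepping (CFL < 1), it can have moved at most
    #    one face per step, so check the immediate neighbors next.
    for neighbor in face_face_connectivity[last_face]:
        if neighbor >= 0 and point_in_face(neighbor, coord):
            return neighbor
    # 3) Last-ditch effort: fall back to the spatial hash query.
    faces, _ = spatial_hash.query(coord.reshape(1, -1))
    return faces[0]  # -1 signals a lost particle
```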
In our view, functionality is most critical at this point. Once we have concrete examples with benchmark data and user feedback, it's possible performance may need to be addressed. But right now, it's a "down-the-road" concern.
> I'm happy to implement this change you've shared here, but I think we should address additional performance optimizations in a future PR. Let's not keep moving the goal post here.
This works with me. I'm happy to continue iterating on performance once this is merged. If you are happy with my suggestion and would like to implement it, go ahead!
> In our view, functionality is most critical at this point. Once we have concrete examples with benchmark data and user feedback, it's possible performance may need to be addressed. But right now, it's a "down-the-road" concern.
Sounds good! Other than the performance suggestions, I am very pleased with the implementation. Great work!
Co-authored-by: Philip Chmielowiec <[email protected]>
Co-authored-by: Philip Chmielowiec <[email protected]>
Thanks for this contribution! Other than the comments I left earlier, it looks good to me.
Closes #1126
Overview
This PR adds the `SpatialHash` class in the `neighbors.py` module. This class is used to encapsulate the necessary information and methods for conducting unstructured grid searches using spatial hashes. As part of the initialization of the `SpatialHash` object, an associated unstructured grid is used to construct a uniformly spaced structured grid, called the "hash grid", and relate cell indices in the hash grid to elements in the unstructured grid. This hash table is a critical element of the spatial hash grid search, as it is used to create a short list of elements to search within.

The `SpatialHash.query` method returns the element id that a point (or element ids that a list of points) resides within, alongside the barycentric coordinates. I've included helper methods to compute the barycentric coordinates of a point in relation to a given unstructured grid face (it works for convex faces with n >= 3 vertices). The barycentric coordinates are used to determine if a point is inside or outside a face; if all the barycentric coordinates are between [0, 1], then the point is inside the face.

Currently, only spherical coordinates (in radians or degrees) are supported for this search.
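To make the hash-grid mapping concrete, here is a minimal sketch of how a point can be assigned to a hash-grid cell (all names and the uniform-spacing parameters are assumptions for illustration, not the PR's exact implementation):

```python
import numpy as np

def _hash_index(coords, lon_min, lat_min, dlon, dlat, n_lon):
    # coords: (n, 2) array of (lon, lat) points in radians.
    # Each point maps to the flat index of the uniformly spaced
    # hash-grid cell that contains it; the hash table then maps that
    # cell index to the unstructured-grid faces overlapping the cell.
    i = ((coords[:, 0] - lon_min) / dlon).astype(int)  # column index
    j = ((coords[:, 1] - lat_min) / dlat).astype(int)  # row index
    return j * n_lon + i
```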
Expected Usage
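A hedged usage sketch (the `get_spatial_hash` accessor and the file name are assumptions for illustration; only `SpatialHash.query` is described above):

```python
import numpy as np
import uxarray as ux

uxgrid = ux.open_grid("mesh.nc")          # hypothetical grid file
spatial_hash = uxgrid.get_spatial_hash()  # accessor name assumed

# Query points as (lon, lat) pairs
coords = np.array([[0.0, 0.0], [10.0, -20.0]])
faces, bcoords = spatial_hash.query(coords)

# faces[i] is the id of the face containing coords[i] (-1 if none found);
# bcoords[i, :] holds its barycentric coordinates within that face.
```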
PR Checklist
General
Testing
Documentation
- Internal functions have a preceding underscore (`_`) and have been added to `docs/internal_api/index.rst`
- User-facing functions have been added to `docs/user_api/index.rst`
Examples
- Any new notebook examples added to the `docs/examples/` folder
- New notebook examples added to the `docs/examples.rst` toctree
- New notebook examples added to `docs/gallery.yml` with an appropriate thumbnail photo in `docs/_static/thumbnails/`