Feature/issue 258 - Granules with large numbers of features cannot be loaded #259

Merged
merged 7 commits into from
Oct 31, 2024

Conversation

torimcd
Collaborator

@torimcd torimcd commented Oct 30, 2024

GitHub Issue: Closes #258

Description

Granules with 100,000+ features cannot be loaded because the lambda times out while iterating through the features one at a time.

Overview of work done

  • Updated the shapefile unpacking module to use a dataframe assign operation instead of a for loop
  • Added logging statements to more easily isolate where issues occur in loading operations
  • Removed the logging statement for each item in the database batch writer to make logs easier to parse
  • @nikki-t changed api test validation data to match sigfigs returned from dataframe operations so tests pass
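
The first bullet above can be illustrated with a minimal sketch. This is not the actual Hydrocron code: it assumes a plain pandas DataFrame in place of the geopandas one, and the function names, the `granule_name` column, and the sample values are all hypothetical. It contrasts a per-row loop with a single `DataFrame.assign` call that attaches the shared attribute to every feature at once.

```python
import pandas as pd


def assemble_items_loop(df, granule_name):
    """Row-by-row approach: one Python-level iteration per feature."""
    items = []
    for _, row in df.iterrows():
        item = row.to_dict()
        item["granule_name"] = granule_name  # hypothetical attribute name
        items.append(item)
    return items


def assemble_items_assign(df, granule_name):
    """Column-wise approach: add the shared attribute once with assign()."""
    enriched = df.assign(granule_name=granule_name)
    return enriched.to_dict("records")


# Made-up sample data standing in for shapefile attributes.
features = pd.DataFrame(
    {"lake_id": ["A1", "A2", "A3"], "wse": [123.456, 78.9, 10.0]}
)

loop_items = assemble_items_loop(features, "example_granule.zip")
assign_items = assemble_items_assign(features, "example_granule.zip")
```

Both functions should return the same list of dictionaries for this input. The difference is that the second does its work in column-wise dataframe operations rather than per feature, which is the kind of change that matters when a granule holds 100,000+ features.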

Overview of verification done

  • Unit tests pass
  • Checked that the 'items' object returned from the assemble_attributes module has the same structure as the one produced by the prior method.
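
One way a structure check like the one above could be automated is sketched here. It assumes 'items' is a list of dictionaries, which the PR does not state explicitly, and `same_structure` is a hypothetical helper, not part of Hydrocron. It compares length, keys, and value types, but deliberately not the values themselves.

```python
def same_structure(old_items, new_items):
    """Return True if both lists have matching lengths, keys, and value types."""
    if len(old_items) != len(new_items):
        return False
    for old, new in zip(old_items, new_items):
        if old.keys() != new.keys():
            return False
        # Compare types only; values may differ (for example in sig figs).
        if any(type(old[key]) is not type(new[key]) for key in old):
            return False
    return True


# Made-up items standing in for the output of the old and new methods.
old_items = [{"lake_id": "A1", "wse": "123.4560"}]
new_items = [{"lake_id": "A1", "wse": "123.456"}]
structure_matches = same_structure(old_items, new_items)
```

Comparing types rather than values keeps the check useful even when the dataframe operations return a different number of significant figures.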

Overview of integration done

  • Deployed to SIT and successfully loaded granule SWOT_L2_HR_LakeSP_Prior_020_123_AR_20240825T010843_20240825T011234_PIC0_01.zip, which contains 140,000+ features and is the same granule that was reported as failing.

PR checklist:

  • Linted
  • Updated unit tests
  • Updated changelog
  • Integration testing

See Pull Request Review Checklist for pointers on reviewing this pull request

@torimcd torimcd requested a review from nikki-t October 30, 2024 02:34
Collaborator

@nikki-t nikki-t left a comment

This looks great!

I like the use of a geopandas dataframe to load in the shapefile attributes. It's interesting that it makes such a difference in load time, but it makes sense.

I also like the new logging; it was tough to verify the load data operations with all of the items logged. We may want to think about turning what we currently log within Hydrocron into debug statements so we can flip to more verbose logging as needed, but I think this is a nice-to-have and fairly low priority.
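
A rough sketch of what that could look like with Python's standard `logging` module follows. The logger name, `configure_logging`, and `write_batch` are hypothetical and not taken from Hydrocron. Per-item messages are emitted at DEBUG level, so they only appear when the level is lowered.

```python
import logging


def configure_logging(level_name="INFO"):
    """Return a logger set to the named level, falling back to INFO."""
    logger = logging.getLogger("load_data_example")  # hypothetical name
    logger.setLevel(getattr(logging, level_name.upper(), logging.INFO))
    return logger


def write_batch(items, logger):
    """Log a summary at INFO and each item only at DEBUG."""
    logger.info("Writing %d items", len(items))
    for item in items:
        logger.debug("Writing item: %s", item)
    logger.info("Batch write complete")
    return len(items)


logger = configure_logging("INFO")
written = write_batch([{"lake_id": "A1"}, {"lake_id": "A2"}], logger)
```

The level name could come from an environment variable or deployment setting, so that verbose logging can be switched on without a code change.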

@torimcd torimcd merged commit c15774e into develop Oct 31, 2024
5 checks passed
@torimcd torimcd deleted the feature/issue-258 branch October 31, 2024 00:27
Development

Successfully merging this pull request may close these issues.

Granules with very large feature counts cannot be added to hydrocron due to lambda timeout