Feature/issue 258 - Granules with large numbers of features cannot be loaded #259

torimcd · 2024-10-30T02:34:05Z

GitHub Issue: Closes #258

Description

Granules with 100,000+ features cannot be loaded because the Lambda function times out while iterating through the features.

Overview of work done

Updated the shapefile unpacking module to use a dataframe assign operation instead of a for loop
Added logging statements to make it easier to isolate where issues occur in loading operations
Removed the logging statement for each item in the database batch writer to make logs easier to parse
@nikki-t changed the API test validation data to match the significant figures returned by the dataframe operations so tests pass
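The change from a per-feature loop to a dataframe assign operation can be sketched as follows. This is a minimal illustration, not the actual Hydrocron code: it assumes the shapefile attributes have already been read into a pandas DataFrame (standing in for the geopandas GeoDataFrame), and the function names, column names, and granule-level attribute keys are hypothetical. The loop version does Python-level work once per row; the assign version adds the granule-level attributes as whole columns and converts the table to records in one call.

```python
import pandas as pd


def assemble_attributes_loop(df: pd.DataFrame, attributes: dict) -> list[dict]:
    """Hypothetical per-row version: one Python iteration per feature."""
    items = []
    for _, row in df.iterrows():
        # Merge feature attributes with granule-level attributes (latter win).
        item = row.to_dict() | attributes
        # Stringify every value in the merged item.
        items.append({key: str(value) for key, value in item.items()})
    return items


def assemble_attributes_assign(df: pd.DataFrame, attributes: dict) -> list[dict]:
    """Hypothetical column-wise version: no explicit Python loop over rows."""
    # assign() broadcasts each scalar attribute to every row as a new column
    # (overwriting a same-named column, like the dict merge above).
    merged = df.assign(**attributes)
    # Convert all columns to strings, then emit one dict per row.
    return merged.astype(str).to_dict("records")
```

For example, with `df = pd.DataFrame({"lake_id": ["L1", "L2"], "wse": [10.5, 11.25]})` and `attributes = {"granule_id": "example_granule.zip"}`, both functions return one string-valued dict per feature. With 140,000+ features the difference in per-row overhead between the two approaches becomes significant.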

Overview of verification done

Unit tests pass
Checked that the 'items' object returned from the assemble_attributes module has the same structure as the one returned by the prior method.
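A structural comparison like the one described above could be written as a small helper. This is a hypothetical sketch, not the check used in this PR: it compares list length, key sets, and value types while ignoring the values themselves, since values can legitimately differ in their string representation (as the updated API test data suggests).

```python
def same_structure(old_items: list[dict], new_items: list[dict]) -> bool:
    """Return True if both lists have the same length and each pair of items
    has the same keys with values of the same type (values are not compared)."""
    if len(old_items) != len(new_items):
        return False
    for old, new in zip(old_items, new_items):
        # dict key views compare like sets, so key order does not matter.
        if old.keys() != new.keys():
            return False
        if any(type(old[key]) is not type(new[key]) for key in old):
            return False
    return True
```

For instance, `[{"lake_id": "L1", "wse": "10.5"}]` and `[{"wse": "10.50", "lake_id": "L1"}]` have the same structure, while replacing `"10.5"` with the float `10.5` does not.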

Overview of integration done

Deployed to SIT and successfully loaded granule SWOT_L2_HR_LakeSP_Prior_020_123_AR_20240825T010843_20240825T011234_PIC0_01.zip, which contains 140,000+ features and is the same granule that was reported as failing.

PR checklist:

Linted
Updated unit tests
Updated changelog
Integration testing

See Pull Request Review Checklist for pointers on reviewing this pull request


nikki-t

This looks great!

I like the use of a geopandas dataframe to load the shapefile attributes. It's interesting that it makes such a difference in load time, but it makes sense.

I also like the new logging; it was tough to verify the load data operations with every item logged. We may want to consider turning some of what we currently log within Hydrocron into debug statements so we can switch to more verbose logging as needed, but I think this is a nice-to-have and fairly low priority.
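The debug-level idea suggested above can be sketched with Python's standard logging module. This is a hypothetical illustration: the logger name and write_items function are invented, and a plain list stands in for the database table (it does not use the real batch writer). Per-item messages go to DEBUG and a single summary goes to INFO, so raising or lowering the logger level switches between quiet and verbose output without code changes.

```python
import logging

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("hydrocron_load_sketch")


def write_items(items: list[dict], table: list) -> int:
    """Append items to a stand-in 'table' (a plain list), logging per-item
    details at DEBUG and one summary line at INFO."""
    for item in items:
        table.append(item)
        # %-style arguments are only formatted if DEBUG is enabled.
        logger.debug("Wrote item %s", item)
    logger.info("Wrote %d items", len(items))
    return len(items)
```

With `logger.setLevel(logging.INFO)` only the summary is emitted; switching to `logging.DEBUG` adds one line per item, which is the kind of flip the comment describes.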

torimcd and others added 6 commits October 25, 2024 10:49

change assemble attrs function to avoid for loop

f9627d8

change how attributes are concatenated during shp unpack to avoid slow looping

7743188

remove unused import

2e97285

Update API test data with less precise data coordinates

e4b6a84

remove logging every item in batch writer

3bda9c0

lint

106d444

torimcd requested a review from nikki-t October 30, 2024 02:34

nikki-t approved these changes Oct 30, 2024


Merge branch 'develop' into feature/issue-258

8684187

torimcd temporarily deployed to SIT October 31, 2024 00:24 with GitHub Actions

torimcd merged commit c15774e into develop Oct 31, 2024
5 checks passed

torimcd deleted the feature/issue-258 branch October 31, 2024 00:27
