Feature/issue 212 - Update track ingest table with granule status (#228)

* /version 1.3.0a0 * Update build.yml * /version 1.3.0a1 * /version 1.3.0a2 * Feature/issue 175 - Update docs to point to OPS (#176) * changelog * update examples, remove load_data readme, info moved to wiki * Dependency update to fix snyk scan * issues/101: Support for HTTP Accept header (#172) * Reorganize timeseries code to prep for Accept header * Enable Accept header to return response of specific content-type * Fix whitespace and string continuation * Make error handling consistent and add an additional test where a reach can't be found * Update changelog with issue for unreleased version * Add 415 status code to API definition * Few minor cleanup items * Few minor cleanup items * Update to [email protected] * Fix dependencies --------- Co-authored-by: Frank Greguska <[email protected]> * /version 1.3.0a3 * issues/102: Support compression of API response (#173) * Enable payload compression * Update changelog with issue --------- Co-authored-by: Frank Greguska <[email protected]> * /version 1.3.0a4 * Feature/issue 100 Add option to 'compact' GeoJSON result into single feature (#177) * Reorganize timeseries code to prep for Accept header * Enable Accept header to return response of specific content-type * Fix whitespace and string continuation * Make error handling consistent and add an additional test where a reach can't be found * Update changelog with issue for unreleased version * Add 415 status code to API definition * Few minor cleanup items * Few minor cleanup items * Update to [email protected] * Fix dependencies * Update required query parameters based on current API functionality * Enable return of 'compact' GeoJSON response * Fix linting and add test data * Update documentation for API accept headers and compact GeoJSON response * Fix references to incorrect Accept header examples --------- Co-authored-by: Frank Greguska <[email protected]> * /version 1.3.0a5 * Feature/issue 183 (#185) * Provide introduction to timeseries endpoint * Remove _units in 
fields list * Fix typo * Update examples with Accept headers and compact query parameter * Add issue to changelog * Fix typo in timeseries documentation * Update pymysql * Update pymysql * Provide clarity on accept headers and request parameter fields * /version 1.3.0a6 * Feature/issue 186 Implement API keys (#188) * API Gateway Lambda authorizer to facilitate API keys and usage plans * Unit tests to test Lambda authorizer * Fix terraform file formatting * API Gateway Lambda Authorizer - Lambda function - API Keys and Authorizer definition in OpenAPI spec - API gateway API keys - API gateway usage plans - SSM parameters for API keys * Fix trailing whitespace * Set default region environment variable * Fix SNYK vulnerabilities * Add issue to changelog * Implement custom trusted partner header x-hydrocron-key * Update cryptography for SNYK vulnerability * Update documentation to include API key usage * Update quota and throttle settings for API Gateway * Update API keys documentation to indicate to be implemented * Move API key lookup to Lambda INIT * Remove API key authentication and update API key to x-hydrocron-key * /version 1.3.0a7 * Update changelog for 1.3.0 release * /version 1.4.0a0 * Feature/issue 198 (#207) * Update pylint to deal with errors and fix collection reference * Initial CMR and Hydrocron queries - Includes placeholders for other operations needed to track granule ingest. - GranuleUR query for Hydrocron tables. 
* Add and set up vcrpy for testing CMR API query * Test track ingest operations - Test CMR and hydrocron queries - Test granuleUR query - Update database to include granuleUR GSI * Update to use track_ingest naming consistently * Initial Lambda function and IAM role definition * Replace deprecated path function with as_file * Add SSM read IAM permissions * Add DynamoDB read permissions * Update track ingest lambda memory * Remove duplicate IAM permissions * Add in permissions to query index * Update changelog * Update changelog description * Use python_cmr for CMR API queries * /version 1.4.0a1 * Add doi to documentation pages (#216) * Update intro.md with DOI * Update overview.md with DOI * /version 1.4.0a2 * issue-193: Add Dynamo DB Table for SWOT Prior Lakes (#209) * add code to handle prior lakes shapefiles, add test prior lake data * update terraform to add prior lake table * fix tests, change to smaller test data file, changelog * linting * reconfigure main load_data method to make more readable and pass linting * lint * lint * fix string casting to lower storage req & update test responses to handle different rounding pattern in coords * update load benchmarking function for linting and add unit test * try parent collection for lakes * update version parsing for parent collection * fix case error * fix lake id reference * add logging to troubleshoot too large features * add item size logging and remove error raise for batch write * clean up logging statements & move numeric_columns assignment * update batch logging statement * Rename constant * Fix temp dir security risk https://rules.sonarsource.com/python/RSPEC-5443/ * Fix temp dir security risk https://rules.sonarsource.com/python/RSPEC-5443/ * fix code coverage calculation --------- Co-authored-by: Frank Greguska <[email protected]> * /version 1.4.0a3 * Feature/issue 201 Create a table for tracking granule ingest status (#214) * Define track ingest database and IAM permissions * Update changelog with 
issue * Modify table structure to support sparse status index * Updated to only apply PITR in ops --------- Co-authored-by: Frank Greguska <[email protected]> * /version 1.4.0a4 * Feature/issue 210 - Load large geometry polygons (#219) * add functions to handle null geometries and convert polygons to points * update doi in docs * fix fill null geometries * fix tests and update changelog * /version 1.4.0a5 * Feature/issue 222 - Add granule info to track ingest table on load (#223) * adjust lambdas to populate track ingest table on granule load * changelog * remove test cnm * lint * change error caught when handling checksum * update lambda role permissions to write to track ingest table * fix typo on lake table terraform * set default fill values for checksum and rev date in track status * fix checksum handling in bulk load data * lint * add logging to debug * /version 1.4.0a6 * Add SSM parameter read for last run time * Feature/issue-225: Create one track ingest table per feature type (#226) * add track ingest tables for each feature type and adjust load data to populate * changelog * /version 1.4.0a7 * Feature/issue 196 Add new feature type to query the API for lake data (#224) * Initial API queries for lake data * Unit tests for lake data * Updates after center point calculations - Removed temp code to calculate a point in API - Implemented unit test to test lake data retrieval - Updated fixtures to load in lake data for testing * Add read lake table permissions to lambda timeseries and track ingest roles * Update documenation to include lake data * Updated documentation to include info on lake centerpoints --------- Co-authored-by: Frank Greguska <[email protected]> * /version 1.4.0a8 * Feature/issue 205 - Add Confluence API key (#221) * Fix possible variable references before value is assigned * Define Confluence API key and trusted partner plan limits * Define a list of trusted partner keys and store under single parameter * Define API keys as encrypted 
envrionment variables for Lambda authorizer * Update authorizer and connection class to use KMS to retrieve API keys * Hack to force lambda deployment when ssm value changes (#218) * Add replace_triggered_by to hydrocron_lambda_authorizer * Introduce environment variable that contains random id which will change whenever an API key value changes. This will force lambda to publish new version of the function. * Remove unnecessary hash function * Update to SSM parameter API key storage and null_resource enviroment variable * Update Terraform and AWS provider * Update API key documentation * Set source_code_hash to force deployment of new image * Downgrade AWS provider to 4.0 to remove inline policy errors * Update docs/timeseries.md --------- Co-authored-by: Frank Greguska <[email protected]> * /version 1.4.0a9 * /version 1.4.0a10 * changelog for 1.4.0 release * /version 1.5.0a0 * Initial track ingest table query * Fix linting and code style * Implement feature count operations * Enable S3 permissions and set environment variable for track lambda * Fix trailing white spaces and code format * Update docstrings for class methods * Implement run time storage in SSM * Query track table unit tests * Update CHANGELOG with issue * Update SSM run time parameter * Fix trailing whitespace * Fix reference to IAM policy * Enable specification of temporal range to search revision date by * Fix SSM put parameter policy * Update IAM permissions for reading track ingest * Enable full temporal search on CMR granules * Add capability to download shapefile granule to count features * Update granule UR to include .zip * Count features via Hydrocron table query * Remove unnecessary s3 permissions * Remove whitespace from blank line * Update cryptography to 43.0.1 * Update track ingest table operations * Update changelog with issue * update dependencies * upgrade geopandas * update dependencies --------- Co-authored-by: nikki-t <[email protected]> Co-authored-by: Frank Greguska <[email 
protected]> Co-authored-by: frankinspace <[email protected]> Co-authored-by: Victoria McDonald <[email protected]> Co-authored-by: Cassie Nickles <[email protected]> Co-authored-by: cassienickles <[email protected]> Co-authored-by: podaac-cicd[bot] <podaac-cicd[bot]@users.noreply.github.com> Co-authored-by: Victoria McDonald <[email protected]> Co-authored-by: torimcd <[email protected]>
podaac · Oct 3, 2024 · 01836e2
1 parent 8e768a1
Showing 3 changed files with 75 additions and 1 deletion.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
     - Issue 211 - Query track ingest table for granules with "to_ingest" status
+    - Issue 212 - Update track ingest table with granule status
 ### Changed
 ### Deprecated
 ### Removed

diff --git a/hydrocron/db/track_ingest.py b/hydrocron/db/track_ingest.py
@@ -13,6 +13,7 @@
 
 # Application Imports
 from hydrocron.api.data_access.db import DynamoDataRepository
+from hydrocron.db.load_data import load_data
 from hydrocron.utils import connection
 
 
@@ -178,6 +179,8 @@ def query_track_ingest(self, hydrocron_track_table, hydrocron_table):
                 self.ingested.append(ingest_item)
             else:
                 ingest_item["status"] = "to_ingest"
+                if ingest_item in self.to_ingest:
+                    continue    # Skip if already queued for ingest
                 self.to_ingest.append(ingest_item)
 
         logging.info("Located %s granules that require ingestion.", len(self.to_ingest))
@@ -193,6 +196,11 @@ def update_track_ingest(self, hydrocron_track_table):
         :type hydrocron_track_table: str
         """
 
+        items = self.ingested + self.to_ingest
+        dynamo_resource = connection.dynamodb_resource
+        load_data(dynamo_resource=dynamo_resource, table_name=hydrocron_track_table, items=items)
+        logging.info("Updated %s with %s items.", hydrocron_track_table, len(items))
+
     def update_runtime(self):
         """Update SSM parameter runtime for next execution."""
 

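The two changes to track_ingest.py above (skipping an already-queued granule, then writing both status lists to the track table) can be sketched without AWS. This is a minimal sketch, not the project's implementation: `InMemoryTable`, `queue_for_ingest`, and `update_track_table` are hypothetical names, the table is assumed to be keyed on `granuleUR` alone, and `load_data` is assumed to write items through a batch writer in the style of boto3's `Table.batch_writer()`.

```python
from contextlib import contextmanager


class InMemoryTable:
    """Hypothetical stand-in for a DynamoDB table keyed on granuleUR."""

    def __init__(self):
        self.items = {}

    def put_item(self, Item):
        # A put with an existing key overwrites the stored item.
        self.items[Item["granuleUR"]] = dict(Item)

    @contextmanager
    def batch_writer(self):
        # boto3's batch writer buffers puts; here they are applied directly.
        yield self


def queue_for_ingest(to_ingest, ingest_item):
    """Append ingest_item unless an identical item is already queued."""
    if ingest_item in to_ingest:
        return False
    to_ingest.append(ingest_item)
    return True


def update_track_table(table, ingested, to_ingest):
    """Write both status lists to the table and return the item count."""
    items = ingested + to_ingest
    with table.batch_writer() as writer:
        for item in items:
            writer.put_item(Item=item)
    return len(items)


# Usage: the second call is a duplicate and is skipped.
table = InMemoryTable()
queued = []
granule = {"granuleUR": "example_granule_01.zip", "status": "to_ingest"}
first_queued = queue_for_ingest(queued, granule)
second_queued = queue_for_ingest(queued, dict(granule))
count = update_track_table(table, [], queued)
```

The membership check compares whole dictionaries, so a granule whose checksum or revision date has changed is treated as a new entry rather than a duplicate.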
diff --git a/tests/test_track_ingest.py b/tests/test_track_ingest.py
@@ -195,7 +195,7 @@ def test_query_ingest_to_ingest(track_ingest_fixture):
     track = Track(collection_shortname, collection_start_date)
     track._query_for_granule_ur = MagicMock(name="_query_for_granule_ur")
     track._query_for_granule_ur.return_value = "s3://podaac-swot-ops-cumulus-protected/SWOT_L2_HR_RiverSP_2.0/SWOT_L2_HR_RiverSP_Reach_020_149_NA_20240825T231711_20240825T231722_PIC0_01.zip"
-    
+
     hydrocron_track_table = constants.SWOT_REACH_TRACK_INGEST_TABLE_NAME
     hydrocron_table = constants.SWOT_REACH_TABLE_NAME
     track.query_track_ingest(hydrocron_track_table, hydrocron_table)
@@ -209,3 +209,68 @@ def test_query_ingest_to_ingest(track_ingest_fixture):
         "status": "to_ingest"
     }]
     assert track.to_ingest == expected
+
+
+def test_update_track_to_ingest(track_ingest_fixture):
+    """Test update_track_ingest function for granules that require ingest.
+
+    Parameters
+    ----------
+    track_ingest_fixture: Fixture ensuring the database is configured for track ingest operations
+    """
+    from boto3.dynamodb.conditions import Key
+    from hydrocron.db.track_ingest import Track
+    import hydrocron.utils.connection
+
+    collection_shortname = "SWOT_L2_HR_RiverSP_reach_2.0"
+    collection_start_date = datetime.datetime.strptime("20240630", "%Y%m%d").replace(tzinfo=datetime.timezone.utc)
+    track = Track(collection_shortname, collection_start_date)
+    track.to_ingest = [{
+        "granuleUR": "SWOT_L2_HR_RiverSP_Reach_010_177_NA_20240131T074748_20240131T074759_PIC0_01.zip",
+        "revision_date": "2024-06-30T21:22:23.123Z",
+        "checksum": "1234",
+        "expected_feature_count": -1,
+        "actual_feature_count": 0,
+        "status": "to_ingest"
+    }]
+    track.update_track_ingest(constants.SWOT_REACH_TRACK_INGEST_TABLE_NAME)
+
+    dynamodb = hydrocron.utils.connection._dynamodb_resource
+    table = dynamodb.Table(constants.SWOT_REACH_TRACK_INGEST_TABLE_NAME)
+    table.load()
+    actual_item = table.query(
+        KeyConditionExpression=(Key("granuleUR").eq("SWOT_L2_HR_RiverSP_Reach_010_177_NA_20240131T074748_20240131T074759_PIC0_01.zip"))
+    )
+    assert actual_item["Items"] == track.to_ingest
+
+
+def test_update_track_ingested(track_ingest_fixture):
+    """Test update_track_ingest function for ingested granules.
+
+    Parameters
+    ----------
+    track_ingest_fixture: Fixture ensuring the database is configured for track ingest operations
+    """
+    from boto3.dynamodb.conditions import Key
+    from hydrocron.db.track_ingest import Track
+    import hydrocron.utils.connection
+
+    collection_shortname = "SWOT_L2_HR_RiverSP_reach_2.0"
+    collection_start_date = datetime.datetime.strptime("20240630", "%Y%m%d").replace(tzinfo=datetime.timezone.utc)
+    track = Track(collection_shortname, collection_start_date)
+    track.ingested = [{
+        "granuleUR": "SWOT_L2_HR_RiverSP_Reach_020_149_NA_20240825T231711_20240825T231722_PIC0_01.zip",
+        "revision_date": "2024-05-22T19:15:44.572Z",
+        "checksum": "0823db619be0044e809a5f992e067d03",
+        "expected_feature_count": 664,
+        "actual_feature_count": 664,
+    }]
+    track.update_track_ingest(constants.SWOT_REACH_TRACK_INGEST_TABLE_NAME)
+
+    dynamodb = hydrocron.utils.connection._dynamodb_resource
+    table = dynamodb.Table(constants.SWOT_REACH_TRACK_INGEST_TABLE_NAME)
+    table.load()
+    actual_item = table.query(
+        KeyConditionExpression=(Key("granuleUR").eq("SWOT_L2_HR_RiverSP_Reach_020_149_NA_20240825T231711_20240825T231722_PIC0_01.zip"))
+    )
+    assert actual_item["Items"] == track.ingested
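Both new tests compare the queried items directly against the lists of plain Python dictionaries that were written. With the boto3 resource interface, numeric attributes normally come back as `decimal.Decimal`, so the assertions rely on `Decimal` comparing equal to `int`. A minimal sketch of that comparison, using a hypothetical granule name and hand-built dictionaries rather than a real query result:

```python
from decimal import Decimal

# What a test writes: plain Python ints.
written = [{
    "granuleUR": "example_granule_01.zip",  # hypothetical name
    "expected_feature_count": 664,
    "actual_feature_count": 664,
}]

# What a query is assumed to return: numbers as Decimal.
returned = [{
    "granuleUR": "example_granule_01.zip",
    "expected_feature_count": Decimal("664"),
    "actual_feature_count": Decimal("664"),
}]

# Integer-valued Decimals compare equal to ints, so list equality holds.
integers_match = returned == written

# The same is not true for most floats, which are not exact decimals.
floats_match = Decimal("0.1") == 0.1
```

This is why whole-item equality works for the integer feature counts here, and why it would likely need a tolerance or explicit conversion if fractional values were ever stored.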