Feature/issue 212 - Update track ingest table with granule status (#228)
* /version 1.3.0a0

* Update build.yml

* /version 1.3.0a1

* /version 1.3.0a2

* Feature/issue 175 - Update docs to point to OPS (#176)

* changelog

* update examples, remove load_data readme, info moved to wiki

* Dependency update to fix snyk scan

* issues/101: Support for HTTP Accept header (#172)

* Reorganize timeseries code to prep for Accept header

* Enable Accept header to return response of specific content-type

* Fix whitespace and string continuation

* Make error handling consistent and add an additional test where a reach can't be found

* Update changelog with issue for unreleased version

* Add 415 status code to API definition

* Few minor cleanup items

* Few minor cleanup items

* Update to [email protected]

* Fix dependencies

---------

Co-authored-by: Frank Greguska <[email protected]>

* /version 1.3.0a3

* issues/102: Support compression of API response (#173)

* Enable payload compression

* Update changelog with issue

---------

Co-authored-by: Frank Greguska <[email protected]>

* /version 1.3.0a4

* Feature/issue 100 Add option to 'compact' GeoJSON result into single feature (#177)

* Reorganize timeseries code to prep for Accept header

* Enable Accept header to return response of specific content-type

* Fix whitespace and string continuation

* Make error handling consistent and add an additional test where a reach can't be found

* Update changelog with issue for unreleased version

* Add 415 status code to API definition

* Few minor cleanup items

* Few minor cleanup items

* Update to [email protected]

* Fix dependencies

* Update required query parameters based on current API functionality

* Enable return of 'compact' GeoJSON response

* Fix linting and add test data

* Update documentation for API accept headers and compact GeoJSON response

* Fix references to incorrect Accept header examples

---------

Co-authored-by: Frank Greguska <[email protected]>

* /version 1.3.0a5

* Feature/issue 183 (#185)

* Provide introduction to timeseries endpoint

* Remove _units in fields list

* Fix typo

* Update examples with Accept headers and compact query parameter

* Add issue to changelog

* Fix typo in timeseries documentation

* Update pymysql

* Update pymysql

* Provide clarity on accept headers and request parameter fields

* /version 1.3.0a6

* Feature/issue 186 Implement API keys (#188)

* API Gateway Lambda authorizer to facilitate API keys and usage plans

* Unit tests to test Lambda authorizer

* Fix terraform file formatting

* API Gateway Lambda Authorizer

- Lambda function
- API Keys and Authorizer definition in OpenAPI spec
- API gateway API keys
- API gateway usage plans
- SSM parameters for API keys

* Fix trailing whitespace

* Set default region environment variable

* Fix SNYK vulnerabilities

* Add issue to changelog

* Implement custom trusted partner header x-hydrocron-key

* Update cryptography for SNYK vulnerability

* Update documentation to include API key usage

* Update quota and throttle settings for API Gateway

* Update API keys documentation to indicate to be implemented

* Move API key lookup to Lambda INIT

* Remove API key authentication and update API key to x-hydrocron-key

* /version 1.3.0a7

* Update changelog for 1.3.0 release

* /version 1.4.0a0

* Feature/issue 198 (#207)

* Update pylint to deal with errors and fix collection reference

* Initial CMR and Hydrocron queries

- Includes placeholders for other operations needed to track granule ingest.
- GranuleUR query for Hydrocron tables.

* Add and set up vcrpy for testing CMR API query

* Test track ingest operations

- Test CMR and hydrocron queries
- Test granuleUR query
- Update database to include granuleUR GSI

* Update to use track_ingest naming consistently

* Initial Lambda function and IAM role definition

* Replace deprecated path function with as_file

* Add SSM read IAM permissions

* Add DynamoDB read permissions

* Update track ingest lambda memory

* Remove duplicate IAM permissions

* Add in permissions to query index

* Update changelog

* Update changelog description

* Use python_cmr for CMR API queries

* /version 1.4.0a1

* Add doi to documentation pages (#216)

* Update intro.md with DOI

* Update overview.md with DOI

* /version 1.4.0a2

* issue-193: Add Dynamo DB Table for SWOT Prior Lakes (#209)

* add code to handle prior lakes shapefiles, add test prior lake data

* update terraform to add prior lake table

* fix tests, change to smaller test data file, changelog

* linting

* reconfigure main load_data method to make more readable and pass linting

* lint

* lint

* fix string casting to lower storage req & update test responses to handle different rounding pattern in coords

* update load benchmarking function for linting and add unit test

* try parent collection for lakes

* update version parsing for parent collection

* fix case error

* fix lake id reference

* add logging to troubleshoot too large features

* add item size logging and remove error raise for batch write

* clean up logging statements & move numeric_columns assignment

* update batch logging statement

* Rename constant

* Fix temp dir security risk https://rules.sonarsource.com/python/RSPEC-5443/

* Fix temp dir security risk https://rules.sonarsource.com/python/RSPEC-5443/

* fix code coverage calculation

---------

Co-authored-by: Frank Greguska <[email protected]>

* /version 1.4.0a3

* Feature/issue 201 Create a table for tracking granule ingest status (#214)

* Define track ingest database and IAM permissions

* Update changelog with issue

* Modify table structure to support sparse status index

* Updated to only apply PITR in ops

---------

Co-authored-by: Frank Greguska <[email protected]>

* /version 1.4.0a4

* Feature/issue 210 - Load large geometry polygons (#219)

* add functions to handle null geometries and convert polygons to points

* update doi in docs

* fix fill null geometries

* fix tests and update changelog

* /version 1.4.0a5

* Feature/issue 222 - Add granule info to track ingest table on load (#223)

* adjust lambdas to populate track ingest table on granule load

* changelog

* remove test cnm

* lint

* change error caught when handling checksum

* update lambda role permissions to write to track ingest table

* fix typo on lake table terraform

* set default fill values for checksum and rev date in track status

* fix checksum handling in bulk load data

* lint

* add logging to debug

* /version 1.4.0a6

* Add SSM parameter read for last run time

* Feature/issue-225: Create one track ingest table per feature type (#226)

* add track ingest tables for each feature type and adjust load data to populate

* changelog

* /version 1.4.0a7

* Feature/issue 196 Add new feature type to query the API for lake data (#224)

* Initial API queries for lake data

* Unit tests for lake data

* Updates after center point calculations

- Removed temp code to calculate a point in API
- Implemented unit test to test lake data retrieval
- Updated fixtures to load in lake data for testing

* Add read lake table permissions to lambda timeseries and track ingest roles

* Update documentation to include lake data

* Updated documentation to include info on lake centerpoints

---------

Co-authored-by: Frank Greguska <[email protected]>

* /version 1.4.0a8

* Feature/issue 205 - Add Confluence API key (#221)

* Fix possible variable references before value is assigned

* Define Confluence API key and trusted partner plan limits

* Define a list of trusted partner keys and store under single parameter

* Define API keys as encrypted environment variables for Lambda authorizer

* Update authorizer and connection class to use KMS to retrieve API keys

* Hack to force lambda deployment when ssm value changes (#218)

* Add replace_triggered_by to hydrocron_lambda_authorizer

* Introduce environment variable that contains random id which will change whenever an API key value changes. This will force lambda to publish new version of the function.

* Remove unnecessary hash function

* Update to SSM parameter API key storage and null_resource environment variable

* Update Terraform and AWS provider

* Update API key documentation

* Set source_code_hash to force deployment of new image

* Downgrade AWS provider to 4.0 to remove inline policy errors

* Update docs/timeseries.md

---------

Co-authored-by: Frank Greguska <[email protected]>

* /version 1.4.0a9

* /version 1.4.0a10

* changelog for 1.4.0 release

* /version 1.5.0a0

* Initial track ingest table query

* Fix linting and code style

* Implement feature count operations

* Enable S3 permissions and set environment variable for track lambda

* Fix trailing white spaces and code format

* Update docstrings for class methods

* Implement run time storage in SSM

* Query track table unit tests

* Update CHANGELOG with issue

* Update SSM run time parameter

* Fix trailing whitespace

* Fix reference to IAM policy

* Enable specification of temporal range to search revision date by

* Fix SSM put parameter policy

* Update IAM permissions for reading track ingest

* Enable full temporal search on CMR granules

* Add capability to download shapefile granule to count features

* Update granule UR to include .zip

* Count features via Hydrocron table query

* Remove unnecessary s3 permissions

* Remove whitespace from blank line

* Update cryptography to 43.0.1

* Update track ingest table operations

* Update changelog with issue

* update dependencies

* upgrade geopandas

* update dependencies

---------

Co-authored-by: nikki-t <[email protected]>
Co-authored-by: Frank Greguska <[email protected]>
Co-authored-by: frankinspace <[email protected]>
Co-authored-by: Victoria McDonald <[email protected]>
Co-authored-by: Cassie Nickles <[email protected]>
Co-authored-by: cassienickles <[email protected]>
Co-authored-by: podaac-cicd[bot] <podaac-cicd[bot]@users.noreply.github.com>
Co-authored-by: Victoria McDonald <[email protected]>
Co-authored-by: torimcd <[email protected]>
10 people authored Oct 3, 2024
1 parent 8e768a1 commit 01836e2
Showing 3 changed files with 75 additions and 1 deletion.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added
- Issue 211 - Query track ingest table for granules with "to_ingest" status
- Issue 212 - Update track ingest table with granule status
### Changed
### Deprecated
### Removed
8 changes: 8 additions & 0 deletions hydrocron/db/track_ingest.py
@@ -13,6 +13,7 @@

# Application Imports
from hydrocron.api.data_access.db import DynamoDataRepository
from hydrocron.db.load_data import load_data
from hydrocron.utils import connection


@@ -178,6 +179,8 @@ def query_track_ingest(self, hydrocron_track_table, hydrocron_table):
self.ingested.append(ingest_item)
else:
ingest_item["status"] = "to_ingest"
if ingest_item in self.to_ingest:
continue  # Skip granules already queued for ingestion
self.to_ingest.append(ingest_item)

logging.info("Located %s granules that require ingestion.", len(self.to_ingest))
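The guard added to `query_track_ingest` above appends an item to `to_ingest` only when an equal item is not already in the list. The sketch below isolates that check in a hypothetical helper, `queue_for_ingest`, so its behavior can be tried in isolation; the helper name and the sample record are illustrative assumptions, not part of Hydrocron.

```python
from __future__ import annotations

from typing import Any


def queue_for_ingest(to_ingest: list[dict[str, Any]], ingest_item: dict[str, Any]) -> bool:
    """Hypothetical helper: mark an item "to_ingest" and queue it unless an equal item is queued.

    Returns True if the item was appended, False if it was skipped as a duplicate.
    """
    ingest_item["status"] = "to_ingest"
    # `in` on a list of dicts compares whole dictionaries (every key and value),
    # which is a linear scan over everything queued so far.
    if ingest_item in to_ingest:
        return False
    to_ingest.append(ingest_item)
    return True


queue: list[dict[str, Any]] = []
record = {"granuleUR": "example.zip", "checksum": "1234"}  # hypothetical granule record
print(queue_for_ingest(queue, dict(record)))  # True
print(queue_for_ingest(queue, dict(record)))  # False: identical item already queued
print(len(queue))  # 1
```

Because the check relies on whole-dict equality, two records for the same `granuleUR` that differ in `revision_date` or `checksum` would both be queued. If de-duplication by granule alone were wanted, tracking seen `granuleUR` values in a `set` would be a constant-time alternative; which behavior is intended is not stated in this commit.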
@@ -193,6 +196,11 @@ def update_track_ingest(self, hydrocron_track_table):
:type hydrocron_track_table: str
"""

items = self.ingested + self.to_ingest
dynamo_resource = connection.dynamodb_resource
load_data(dynamo_resource=dynamo_resource, table_name=hydrocron_track_table, items=items)
logging.info("Updated %s with %s items.", hydrocron_track_table, len(items))

def update_runtime(self):
"""Update SSM parameter runtime for next execution."""
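`update_track_ingest` concatenates `self.ingested` and `self.to_ingest` and hands the result to `load_data`, whose internals are not part of this diff. The sketch below assumes, without confirmation from this commit, that the write amounts to a DynamoDB batch write, and shows an equivalent using boto3's `Table.batch_writer()`. The function `update_track_items`, the in-memory `FakeTable` test double, and the table name are hypothetical; this is not the `load_data` implementation.

```python
from __future__ import annotations

import logging
from contextlib import contextmanager
from typing import Any, Iterator, Optional


def update_track_items(
    table: Any,
    ingested: list[dict[str, Any]],
    to_ingest: list[dict[str, Any]],
    overwrite_by_pkeys: Optional[list[str]] = None,
) -> int:
    """Hypothetical: write ingested and to-ingest granule records to a track ingest table.

    `table` is expected to behave like a boto3 DynamoDB `Table` resource.
    Returns the number of items written.
    """
    items = ingested + to_ingest
    # batch_writer buffers put requests and flushes them as BatchWriteItem calls;
    # overwrite_by_pkeys drops earlier buffered items that share the given key attributes.
    with table.batch_writer(overwrite_by_pkeys=overwrite_by_pkeys) as batch:
        for item in items:
            batch.put_item(Item=item)
    logging.info("Updated %s with %s items.", table.name, len(items))
    return len(items)


class _FakeBatchWriter:
    """Test double that records put_item calls instead of calling DynamoDB."""

    def __init__(self, sink: list[dict[str, Any]]) -> None:
        self._sink = sink

    def put_item(self, Item: dict[str, Any]) -> None:  # keyword name mirrors boto3
        self._sink.append(Item)


class FakeTable:
    """Minimal in-memory stand-in for a boto3 Table, for local experimentation."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.items: list[dict[str, Any]] = []

    @contextmanager
    def batch_writer(self, overwrite_by_pkeys: Optional[list[str]] = None) -> Iterator[_FakeBatchWriter]:
        yield _FakeBatchWriter(self.items)


table = FakeTable("track-ingest-example")  # hypothetical table name
written = update_track_items(
    table,
    ingested=[{"granuleUR": "a.zip"}],
    to_ingest=[{"granuleUR": "b.zip", "status": "to_ingest"}],
)
print(written)  # 2
```

Passing the table's key attribute names as `overwrite_by_pkeys` is one way to avoid the error DynamoDB raises when a single `BatchWriteItem` request contains the same key twice; the track table's full key schema is not shown here, so the sketch leaves it optional.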

67 changes: 66 additions & 1 deletion tests/test_track_ingest.py
@@ -195,7 +195,7 @@ def test_query_ingest_to_ingest(track_ingest_fixture):
track = Track(collection_shortname, collection_start_date)
track._query_for_granule_ur = MagicMock(name="_query_for_granule_ur")
track._query_for_granule_ur.return_value = "s3://podaac-swot-ops-cumulus-protected/SWOT_L2_HR_RiverSP_2.0/SWOT_L2_HR_RiverSP_Reach_020_149_NA_20240825T231711_20240825T231722_PIC0_01.zip"

hydrocron_track_table = constants.SWOT_REACH_TRACK_INGEST_TABLE_NAME
hydrocron_table = constants.SWOT_REACH_TABLE_NAME
track.query_track_ingest(hydrocron_track_table, hydrocron_table)
@@ -209,3 +209,68 @@ def test_query_ingest_to_ingest(track_ingest_fixture):
"status": "to_ingest"
}]
assert track.to_ingest == expected


def test_update_track_to_ingest(track_ingest_fixture):
"""Test update_track_ingest function for granules that require ingest.
Parameters
----------
track_ingest_fixture: Fixture ensuring the database is configured for track ingest operations
"""
from boto3.dynamodb.conditions import Key
from hydrocron.db.track_ingest import Track
import hydrocron.utils.connection

collection_shortname = "SWOT_L2_HR_RiverSP_reach_2.0"
collection_start_date = datetime.datetime.strptime("20240630", "%Y%m%d").replace(tzinfo=datetime.timezone.utc)
track = Track(collection_shortname, collection_start_date)
track.to_ingest = [{
"granuleUR": "SWOT_L2_HR_RiverSP_Reach_010_177_NA_20240131T074748_20240131T074759_PIC0_01.zip",
"revision_date": "2024-06-30T21:22:23.123Z",
"checksum": "1234",
"expected_feature_count": -1,
"actual_feature_count": 0,
"status": "to_ingest"
}]
track.update_track_ingest(constants.SWOT_REACH_TRACK_INGEST_TABLE_NAME)

dynamodb = hydrocron.utils.connection._dynamodb_resource
table = dynamodb.Table(constants.SWOT_REACH_TRACK_INGEST_TABLE_NAME)
table.load()
actual_item = table.query(
KeyConditionExpression=(Key("granuleUR").eq("SWOT_L2_HR_RiverSP_Reach_010_177_NA_20240131T074748_20240131T074759_PIC0_01.zip"))
)
assert actual_item["Items"] == track.to_ingest

def test_update_track_ingested(track_ingest_fixture):
"""Test update_track_ingest function for granules that have been ingested.
Parameters
----------
track_ingest_fixture: Fixture ensuring the database is configured for track ingest operations
"""
from boto3.dynamodb.conditions import Key
from hydrocron.db.track_ingest import Track
import hydrocron.utils.connection

collection_shortname = "SWOT_L2_HR_RiverSP_reach_2.0"
collection_start_date = datetime.datetime.strptime("20240630", "%Y%m%d").replace(tzinfo=datetime.timezone.utc)
track = Track(collection_shortname, collection_start_date)
track.ingested = [{
"granuleUR": "SWOT_L2_HR_RiverSP_Reach_020_149_NA_20240825T231711_20240825T231722_PIC0_01.zip",
"revision_date": "2024-05-22T19:15:44.572Z",
"checksum": "0823db619be0044e809a5f992e067d03",
"expected_feature_count": 664,
"actual_feature_count": 664,
}]
track.update_track_ingest(constants.SWOT_REACH_TRACK_INGEST_TABLE_NAME)

dynamodb = hydrocron.utils.connection._dynamodb_resource
table = dynamodb.Table(constants.SWOT_REACH_TRACK_INGEST_TABLE_NAME)
table.load()
actual_item = table.query(
KeyConditionExpression=(Key("granuleUR").eq("SWOT_L2_HR_RiverSP_Reach_020_149_NA_20240825T231711_20240825T231722_PIC0_01.zip"))
)
assert actual_item["Items"] == track.ingested
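Both new tests compare items read back through boto3 with the plain Python dicts that were written. boto3's standard deserializer returns DynamoDB numbers as `decimal.Decimal`, and the equality still holds because an integral `Decimal` compares equal to the matching `int`. The snippet below illustrates this with stdlib types only; the item shapes mirror the test data, and the granule name is a placeholder.

```python
from decimal import Decimal

# Shape of an item as boto3 returns it: DynamoDB numbers arrive as Decimal.
read_back = [{
    "granuleUR": "example.zip",  # placeholder granule
    "expected_feature_count": Decimal("664"),
    "actual_feature_count": Decimal("664"),
}]
# Shape of the item the test wrote, using plain Python ints.
written = [{
    "granuleUR": "example.zip",
    "expected_feature_count": 664,
    "actual_feature_count": 664,
}]

print(read_back == written)  # True: integral Decimals compare equal to ints
print(Decimal("0.1") == 0.1)  # False: the binary float 0.1 is not exactly 0.1
```

This comparison works only because the feature counts are integers. boto3 also refuses to serialize Python `float` values on write, so any non-integral fields added to the track table later would need to be stored and compared as `Decimal`.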
