Release/1.5.0 #262

Open: wants to merge 45 commits into base: main

nikki-t (Collaborator) commented Nov 11, 2024

[1.5.0]

Added

- Issue 211 - Query track ingest table for granules with "to_ingest" status
- Issue 212 - Update track ingest table with granule status
- Issue 203 - Construct CNM to trigger load data operations and ingest granule
- Issue 236 - Allow UAT query of CMR to support querying in different venues
- Issue 250 - Handle overlapping times with unique CRIDs
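The track ingest items above (Issues 211, 212 and 203) amount to selecting records still marked `to_ingest`, checking them against Hydrocron, and updating their status. The sketch below is a hypothetical, in-memory illustration of that loop, not Hydrocron's implementation: it assumes each record carries `granuleUR`, `status` and an expected feature count, and the attribute name `expected_feature_count`, the status value `ingested` and both helper functions are illustrative names. The real service queries DynamoDB track ingest tables.

```python
from typing import Dict, List


def select_to_ingest(records: List[Dict]) -> List[Dict]:
    """Return the records that still have the "to_ingest" status."""
    return [r for r in records if r.get("status") == "to_ingest"]


def update_status(record: Dict, hydrocron_feature_count: int) -> Dict:
    """Return a copy of the record, marked ingested if Hydrocron holds every feature.

    Hypothetical rule: the granule is ingested when the number of features
    found in Hydrocron equals the feature count expected from the granule.
    "ingested" is an illustrative status name.
    """
    updated = dict(record)  # copy so the caller's record is not mutated
    if hydrocron_feature_count == record["expected_feature_count"]:
        updated["status"] = "ingested"
    return updated
```

In this sketch, records that keep the `to_ingest` status after the comparison would be the candidates for a CNM message that triggers the load data operations.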

Changed

- Issue 251 - Add note to README to point to documentation

Deprecated

Removed

Fixed

- Issue 258 - Granules with very large feature counts cannot be added to hydrocron
- Issue 235 - Track ingest table can be populated with granules that aren't loaded into Hydrocron
- Issue 248 - Track ingest operations need to query UAT for granule files if track ingest is running in SIT or UAT
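Issues 236 and 248 concern which CMR venue track ingest queries: per Issue 248, deployments in SIT or UAT look up granule files in UAT. A minimal sketch of that mapping follows; the function names are hypothetical, and while the two hosts are the public CMR OPS and UAT endpoints, how Hydrocron actually configures its venue is an assumption here.

```python
# Public CMR hosts for the two venues used in this sketch.
CMR_SEARCH_HOSTS = {
    "uat": "https://cmr.uat.earthdata.nasa.gov",
    "ops": "https://cmr.earthdata.nasa.gov",
}


def cmr_venue(environment: str) -> str:
    """Map a deployment environment to the CMR venue to query (hypothetical helper).

    SIT and UAT deployments both query UAT CMR; OPS queries OPS CMR.
    """
    env = environment.strip().lower()
    if env in ("sit", "uat"):
        return "uat"
    if env == "ops":
        return "ops"
    raise ValueError(f"Unknown environment: {environment!r}")


def cmr_granule_search_url(environment: str) -> str:
    """Build the CMR granule search URL for the given environment."""
    return f"{CMR_SEARCH_HOSTS[cmr_venue(environment)]}/search/granules.umm_json"
```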

Security

nikki-t and others added 30 commits September 26, 2024 08:47
* Update hydrocron-lambda.tf

* Update pyproject.toml

* update lock

* fix lint

* fix lint
…gest" status (#227)

* /version 1.3.0a0

* Update build.yml

* /version 1.3.0a1

* /version 1.3.0a2

* Feature/issue 175 - Update docs to point to OPS (#176)

* changelog

* update examples, remove load_data readme, info moved to wiki

* Dependency update to fix snyk scan

* issues/101: Support for HTTP Accept header (#172)

* Reorganize timeseries code to  prep for Accept header

* Enable Accept header to return response of specific content-type

* Fix whitespace and string continuation

* Make error handling consistent and add an additional test where a reach can't be found

* Update changelog with issue for unreleased version

* Add 415 status code to API definition

* Few minor cleanup items

* Few minor cleanup items

* Update to [email protected]

* Fix dependencies

---------

Co-authored-by: Frank Greguska <[email protected]>
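The Accept header support in #172 selects the response content type from the request header and adds a 415 status code to the API definition. The sketch below shows one simple way to negotiate; the set of supported types and the `application/json` default are assumptions, and `negotiate` is a hypothetical helper, not the timeseries code.

```python
from typing import Optional, Tuple

# Assumed response types; the authoritative list is in the Hydrocron API definition.
SUPPORTED_TYPES = ("application/json", "application/geo+json", "text/csv")
DEFAULT_TYPE = "application/json"  # assumption: used for a missing or wildcard Accept header


def negotiate(accept: Optional[str]) -> Tuple[int, Optional[str]]:
    """Pick a response content type from an Accept header.

    Returns (status_code, content_type). Quality values (q=...) are ignored and
    the first supported media range wins; if nothing matches, 415 is returned,
    mirroring the status code added to the API definition.
    """
    if not accept:
        return 200, DEFAULT_TYPE
    for media_range in accept.split(","):
        media_type = media_range.split(";")[0].strip().lower()
        if not media_type:
            continue  # tolerate stray commas
        if media_type == "*/*":
            return 200, DEFAULT_TYPE
        if media_type in SUPPORTED_TYPES:
            return 200, media_type
    return 415, None
```

A fuller implementation would rank media ranges by their q-values; HTTP also defines 406 Not Acceptable for an unsatisfiable Accept header, whereas this commit records 415.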

* /version 1.3.0a3

* issues/102: Support compression of API response (#173)

* Enable payload compression

* Update changelog with issue

---------

Co-authored-by: Frank Greguska <[email protected]>
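Issue #173 enables compression of API responses. One common pattern for a Lambda proxy integration behind API Gateway is shown below: gzip the JSON body when the client's `Accept-Encoding` allows it, then base64-encode it and set `isBase64Encoded`. This is a sketch under that assumption. API Gateway REST APIs can also compress responses themselves through the minimum compression size setting, and which approach Hydrocron uses is not stated here. `build_response` is a hypothetical name.

```python
import base64
import gzip
import json
from typing import Dict


def build_response(payload: Dict, accept_encoding: str = "") -> Dict:
    """Build a Lambda proxy response, gzip-compressing the body when the client allows it."""
    body = json.dumps(payload)
    # Strip any ;q= parameters so "gzip;q=1.0" still matches
    encodings = [e.split(";")[0].strip().lower() for e in accept_encoding.split(",")]
    if "gzip" in encodings:
        compressed = gzip.compress(body.encode("utf-8"))
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json", "Content-Encoding": "gzip"},
            # A binary body must be base64 encoded in a proxy integration response
            "body": base64.b64encode(compressed).decode("ascii"),
            "isBase64Encoded": True,
        }
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": body,
        "isBase64Encoded": False,
    }
```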

* /version 1.3.0a4

* Feature/issue 100 Add option to 'compact' GeoJSON result into single feature (#177)

* Reorganize timeseries code to  prep for Accept header

* Enable Accept header to return response of specific content-type

* Fix whitespace and string continuation

* Make error handling consistent and add an additional test where a reach can't be found

* Update changelog with issue for unreleased version

* Add 415 status code to API definition

* Few minor cleanup items

* Few minor cleanup items

* Update to [email protected]

* Fix dependencies

* Update required query parameters based on current API functionality

* Enable return of 'compact' GeoJSON response

* Fix linting and add test data

* Update documentation for API accept headers and compact GeoJSON response

* Fix references to incorrect Accept header examples

---------

Co-authored-by: Frank Greguska <[email protected]>
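Issue #177 adds an option to "compact" a GeoJSON result into a single feature. The sketch below shows one plausible layout: every property becomes a list of per-feature values, and the geometries are gathered into a GeometryCollection. The layout and the `compact_feature_collection` name are assumptions; the Hydrocron documentation defines the actual compact response format.

```python
from typing import Any, Dict, List


def compact_feature_collection(collection: Dict[str, Any]) -> Dict[str, Any]:
    """Merge every feature of a FeatureCollection into one feature (hypothetical layout)."""
    features: List[Dict[str, Any]] = collection.get("features", [])
    # Preserve first-seen property order across all features
    keys: List[str] = []
    for feature in features:
        for key in feature.get("properties") or {}:
            if key not in keys:
                keys.append(key)
    # One list per property, holding each feature's value (None if absent)
    properties = {
        key: [(feature.get("properties") or {}).get(key) for feature in features]
        for key in keys
    }
    # GeometryCollection members must be geometry objects, so skip null geometries
    geometries = [f["geometry"] for f in features if f.get("geometry")]
    return {
        "type": "FeatureCollection",
        "features": [{
            "type": "Feature",
            "geometry": {"type": "GeometryCollection", "geometries": geometries},
            "properties": properties,
        }],
    }
```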

* /version 1.3.0a5

* Feature/issue 183 (#185)

* Provide introduction to timeseries endpoint

* Remove _units in fields list

* Fix typo

* Update examples with Accept headers and compact query parameter

* Add issue to changelog

* Fix typo in timeseries documentation

* Update pymysql

* Update pymysql

* Provide clarity on accept headers and request parameter fields

* /version 1.3.0a6

* Feature/issue 186 Implement API keys (#188)

* API Gateway Lambda authorizer to facilitate API keys and usage plans

* Unit tests to test Lambda authorizer

* Fix terraform file formatting

* API Gateway Lambda Authorizer

- Lambda function
- API Keys and Authorizer definition in OpenAPI spec
- API gateway API keys
- API gateway usage plans
- SSM parameters for API keys

* Fix trailing whitespace

* Set default region environment variable

* Fix SNYK vulnerabilities

* Add issue to changelog

* Implement custom trusted partner header x-hydrocron-key

* Update cryptography for SNYK vulnerability

* Update documentation to include API key usage

* Update quota and throttle settings for API Gateway

* Update API keys documentation to indicate to be implemented

* Move API key lookup to Lambda INIT

* Remove API key authentication and update API key to x-hydrocron-key

* /version 1.3.0a7

* Update changelog for 1.3.0 release

* /version 1.4.0a0

* Feature/issue 198 (#207)

* Update pylint to deal with errors and fix collection reference

* Initial CMR and Hydrocron queries

- Includes placeholders for other operations needed to track granule
ingest.
- GranuleUR query for Hydrocron tables.

* Add and set up vcrpy for testing CMR API query

* Test track ingest operations

- Test CMR and hydrocron queries
- Test granuleUR query
- Update database to include granuleUR GSI

* Update to use track_ingest naming consistently

* Initial Lambda function and IAM role definition

* Replace deprecated path function with as_file

* Add SSM read IAM permissions

* Add DynamoDB read permissions

* Update track ingest lambda memory

* Remove duplicate IAM permissions

* Add in permissions to query index

* Update changelog

* Update changelog description

* Use python_cmr for CMR API queries

* /version 1.4.0a1

* Add doi to documentation pages (#216)

* Update intro.md with DOI

* Update overview.md with DOI

* /version 1.4.0a2

* issue-193: Add Dynamo DB Table for SWOT Prior Lakes (#209)

* add code to handle prior lakes shapefiles, add test prior lake data

* update terraform to add prior lake table

* fix tests, change to smaller test data file, changelog

* linting

* reconfigure main load_data method to make more readable and pass linting

* lint

* lint

* fix string casting to lower storage req & update test responses to handle different rounding pattern in coords

* update load benchmarking function for linting and add unit test

* try parent collection for lakes

* update version parsing for parent collection

* fix case error

* fix lake id reference

* add logging to troubleshoot too large features

* add item size logging and remove error raise for batch write

* clean up logging statements & move numeric_columns assignment

* update batch logging statement

* Rename constant

* Fix temp dir security risk https://rules.sonarsource.com/python/RSPEC-5443/

* Fix temp dir security risk https://rules.sonarsource.com/python/RSPEC-5443/

* fix code coverage calculation

---------

Co-authored-by: Frank Greguska <[email protected]>

* /version 1.4.0a3

* Feature/issue 201 Create a table for tracking granule ingest status (#214)

* Define track ingest database and IAM permissions

* Update changelog with issue

* Modify table structure to support sparse status index

* Updated to only apply PITR in ops

---------

Co-authored-by: Frank Greguska <[email protected]>

* /version 1.4.0a4

* Feature/issue 210 - Load large geometry polygons (#219)

* add functions to handle null geometries and convert polygons to points

* update doi in docs

* fix fill null geometries

* fix tests and update changelog

* /version 1.4.0a5

* Feature/issue 222 - Add granule info to track ingest table on load (#223)

* adjust lambdas to populate track ingest table on granule load

* changelog

* remove test cnm

* lint

* change error caught when handling checksum

* update lambda role permissions to write to track ingest table

* fix typo on lake table terraform

* set default fill values for checksum and rev date in track status

* fix checksum handling in bulk load data

* lint

* add logging to debug

* /version 1.4.0a6

* Add SSM parameter read for last run time

* Feature/issue-225: Create one track ingest table per feature type (#226)

* add track ingest tables for each feature type and adjust load data to populate

* changelog

* /version 1.4.0a7

* Feature/issue 196 Add new feature type to query the API for lake data (#224)

* Initial API queries for lake data

* Unit tests for lake data

* Updates after center point calculations

- Removed temp code to calculate a point in API
- Implemented unit test to test lake data retrieval
- Updated fixtures to load in lake data for testing

* Add read lake table permissions to lambda timeseries and track ingest roles

* Update documentation to include lake data

* Updated documentation to include info on lake centerpoints

---------

Co-authored-by: Frank Greguska <[email protected]>
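The lake work (#219 converting large polygons to points, #224 documenting lake centerpoints) reduces each lake polygon to a single point. How Hydrocron computes that point is not stated here. The sketch below uses the standard area-weighted polygon centroid (shoelace formula) on planar coordinates, which is only an approximation for longitude/latitude. `polygon_centroid` is a hypothetical helper; libraries such as shapely expose `centroid` and `representative_point()` for the same purpose.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_centroid(ring: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon ring (shoelace formula).

    Coordinates are treated as planar x/y. The ring may be open or closed,
    and either clockwise or counter-clockwise.
    """
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # close the ring
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area) = sum / (3 * area2)
    return cx / (3 * area2), cy / (3 * area2)
```

A centroid can fall outside a strongly concave lake outline, which is why a representative point is sometimes preferred when the point must lie inside the polygon.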

* /version 1.4.0a8

* Feature/issue 205 - Add Confluence API key (#221)

* Fix possible variable references before value is assigned

* Define Confluence API key and trusted partner plan limits

* Define a list of trusted partner keys and store under single parameter

* Define API keys as encrypted environment variables for Lambda authorizer

* Update authorizer and connection class to use KMS to retrieve API keys

* Hack to force lambda deployment when ssm value changes (#218)

* Add replace_triggered_by to hydrocron_lambda_authorizer

* Introduce environment variable that contains random id which will change whenever an API key value changes. This will force lambda to publish new version of the function.

* Remove unnecessary hash function

* Update to SSM parameter API key storage and null_resource environment variable

* Update Terraform and AWS provider

* Update API key documentation

* Set source_code_hash to force deployment of new image

* Downgrade AWS provider to 4.0 to remove inline policy errors

* Update docs/timeseries.md

---------

Co-authored-by: Frank Greguska <[email protected]>
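Issues #188 and #221 describe an API Gateway Lambda authorizer, a custom `x-hydrocron-key` header, and a list of trusted partner keys stored under a single parameter and tied to usage plans. The sketch below shows a REQUEST authorizer in that spirit, assuming the API's key source is set to AUTHORIZER so that `usageIdentifierKey` selects the usage plan. The comma-separated key format, the fallback default key and the function names are assumptions, not Hydrocron's code.

```python
import hmac
from typing import Dict, List, Optional


def parse_trusted_keys(parameter_value: str) -> List[str]:
    """Split a single comma-separated parameter into partner keys (assumed format)."""
    return [key.strip() for key in parameter_value.split(",") if key.strip()]


def authorize(event: Dict, trusted_keys: List[str], default_key: str) -> Dict:
    """Return an API Gateway REQUEST authorizer response.

    Requests carrying a trusted x-hydrocron-key are metered against that key's
    usage plan; all other requests fall back to a default (public) key.
    """
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    supplied = headers.get("x-hydrocron-key", "")
    # compare_digest avoids leaking key contents through timing differences
    matched: Optional[str] = next(
        (k for k in trusted_keys if hmac.compare_digest(k.encode(), supplied.encode())),
        None,
    )
    return {
        "principalId": "trusted_partner" if matched else "default_user",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": "Allow",
                "Resource": event["methodArn"],
            }],
        },
        "usageIdentifierKey": matched or default_key,
    }
```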

* /version 1.4.0a9

* /version 1.4.0a10

* changelog for 1.4.0 release

* /version 1.5.0a0

* Initial track ingest table query

* Fix linting and code style

* Implement feature count operations

* Enable S3 permissions and set environment variable for track lambda

* Fix trailing white spaces and code format

* Update docstrings for class methods

* Implement run time storage in SSM

* Query track table unit tests

* Update CHANGELOG with issue

* Update SSM run time parameter

* Fix trailing whitespace

* Fix reference to IAM policy

* Enable specification of temporal range to search revision date by

* Fix SSM put parameter policy

* Update IAM permissions for reading track ingest

* Enable full temporal search on CMR granules

* Add capability to download shapefile granule to count features

* Update granule UR to include .zip

* Count features via Hydrocron table query

* Remove unnecessary s3 permissions

* Remove whitespace from blank line

* Update cryptography to 43.0.1

* update dependencies

* upgrade geopandas

* update dependencies

---------

Co-authored-by: nikki-t <[email protected]>
Co-authored-by: Frank Greguska <[email protected]>
Co-authored-by: frankinspace <[email protected]>
Co-authored-by: Victoria McDonald <[email protected]>
Co-authored-by: Cassie Nickles <[email protected]>
Co-authored-by: cassienickles <[email protected]>
Co-authored-by: podaac-cicd[bot] <podaac-cicd[bot]@users.noreply.github.com>
Co-authored-by: Victoria McDonald <[email protected]>
Co-authored-by: torimcd <[email protected]>
* Update track ingest table operations

* Update changelog with issue

* update dependencies

* upgrade geopandas

* update dependencies

---------

Co-authored-by: nikki-t <[email protected]>
Co-authored-by: Frank Greguska <[email protected]>
Co-authored-by: frankinspace <[email protected]>
Co-authored-by: Victoria McDonald <[email protected]>
Co-authored-by: Cassie Nickles <[email protected]>
Co-authored-by: cassienickles <[email protected]>
Co-authored-by: podaac-cicd[bot] <podaac-cicd[bot]@users.noreply.github.com>
Co-authored-by: Victoria McDonald <[email protected]>
Co-authored-by: torimcd <[email protected]>
* Implement operations to publish CNM messages for granules requiring ingest

* Implement unit test of publication operations

* Fix linting

* Add issue to changelog and fix linting

* Add EventBridge schedules with appropriate Lambda permissions

* Set initial schedule expressions and fix assume policy

* Disable eventbridge schedules by default

* Update schedule to run weekly

* Define 1 hour latency to search by revision_date in CMR

---------

Co-authored-by: nikki-t <[email protected]>
Co-authored-by: Frank Greguska <[email protected]>
Co-authored-by: frankinspace <[email protected]>
Co-authored-by: Victoria McDonald <[email protected]>
Co-authored-by: Cassie Nickles <[email protected]>
Co-authored-by: cassienickles <[email protected]>
Co-authored-by: podaac-cicd[bot] <podaac-cicd[bot]@users.noreply.github.com>
Co-authored-by: Victoria McDonald <[email protected]>
Co-authored-by: torimcd <[email protected]>
* /version 1.3.0a0

* Update build.yml

* /version 1.3.0a1

* /version 1.3.0a2

* Feature/issue 175 - Update docs to point to OPS (#176)

* changelog

* update examples, remove load_data readme, info moved to wiki

* Dependency update to fix snyk scan

* issues/101: Support for HTTP Accept header (#172)

* Reorganize timeseries code to prep for Accept header

* Enable Accept header to return response of specific content-type

* Fix whitespace and string continuation

* Make error handling consistent and add an additional test where a reach can't be found

* Update changelog with issue for unreleased version

* Add 415 status code to API definition

* Few minor cleanup items

* Few minor cleanup items

* Update to [email protected]

* Fix dependencies

---------

Co-authored-by: Frank Greguska <[email protected]>

* /version 1.3.0a3

* issues/102: Support compression of API response (#173)

* Enable payload compression

* Update changelog with issue

---------

Co-authored-by: Frank Greguska <[email protected]>

* /version 1.3.0a4

* Feature/issue 100 Add option to 'compact' GeoJSON result into single feature (#177)

* Reorganize timeseries code to prep for Accept header

* Enable Accept header to return response of specific content-type

* Fix whitespace and string continuation

* Make error handling consistent and add an additional test where a reach can't be found

* Update changelog with issue for unreleased version

* Add 415 status code to API definition

* Few minor cleanup items

* Few minor cleanup items

* Update to [email protected]

* Fix dependencies

* Update required query parameters based on current API functionality

* Enable return of 'compact' GeoJSON response

* Fix linting and add test data

* Update documentation for API accept headers and compact GeoJSON response

* Fix references to incorrect Accept header examples

---------

Co-authored-by: Frank Greguska <[email protected]>

* /version 1.3.0a5

* Feature/issue 183 (#185)

* Provide introduction to timeseries endpoint

* Remove _units in fields list

* Fix typo

* Update examples with Accept headers and compact query parameter

* Add issue to changelog

* Fix typo in timeseries documentation

* Update pymysql

* Update pymysql

* Provide clarity on accept headers and request parameter fields

* /version 1.3.0a6

* Feature/issue 186 Implement API keys (#188)

* API Gateway Lambda authorizer to facilitate API keys and usage plans

* Unit tests to test Lambda authorizer

* Fix terraform file formatting

* API Gateway Lambda Authorizer

- Lambda function
- API Keys and Authorizer definition in OpenAPI spec
- API gateway API keys
- API gateway usage plans
- SSM parameters for API keys

* Fix trailing whitespace

* Set default region environment variable

* Fix SNYK vulnerabilities

* Add issue to changelog

* Implement custom trusted partner header x-hydrocron-key

* Update cryptography for SNYK vulnerability

* Update documentation to include API key usage

* Update quota and throttle settings for API Gateway

* Update API keys documentation to indicate to be implemented

* Move API key lookup to Lambda INIT

* Remove API key authentication and update API key to x-hydrocron-key

* /version 1.3.0a7

* Update changelog for 1.3.0 release

* /version 1.4.0a0

* Feature/issue 198 (#207)

* Update pylint to deal with errors and fix collection reference

* Initial CMR and Hydrocron queries

- Includes placeholders for other operations needed to track granule
ingest.
- GranuleUR query for Hydrocron tables.

* Add and set up vcrpy for testing CMR API query

* Test track ingest operations

- Test CMR and hydrocron queries
- Test granuleUR query
- Update database to include granuleUR GSI

* Update to use track_ingest naming consistently

* Initial Lambda function and IAM role definition

* Replace deprecated path function with as_file

* Add SSM read IAM permissions

* Add DynamoDB read permissions

* Update track ingest lambda memory

* Remove duplicate IAM permissions

* Add in permissions to query index

* Update changelog

* Update changelog description

* Use python_cmr for CMR API queries

* /version 1.4.0a1

* Add doi to documentation pages (#216)

* Update intro.md with DOI

* Update overview.md with DOI

* /version 1.4.0a2

* issue-193: Add Dynamo DB Table for SWOT Prior Lakes (#209)

* add code to handle prior lakes shapefiles, add test prior lake data

* update terraform to add prior lake table

* fix tests, change to smaller test data file, changelog

* linting

* reconfigure main load_data method to make more readable and pass linting

* lint

* lint

* fix string casting to lower storage req & update test responses to handle different rounding pattern in coords

* update load benchmarking function for linting and add unit test

* try parent collection for lakes

* update version parsing for parent collection

* fix case error

* fix lake id reference

* add logging to troubleshoot too large features

* add item size logging and remove error raise for batch write

* clean up logging statements & move numeric_columns assignment

* update batch logging statement

* Rename constant

* Fix temp dir security risk https://rules.sonarsource.com/python/RSPEC-5443/

* Fix temp dir security risk https://rules.sonarsource.com/python/RSPEC-5443/

* fix code coverage calculation

---------

Co-authored-by: Frank Greguska <[email protected]>

* /version 1.4.0a3

* Feature/issue 201 Create a table for tracking granule ingest status (#214)

* Define track ingest database and IAM permissions

* Update changelog with issue

* Modify table structure to support sparse status index

* Updated to only apply PITR in ops

---------

Co-authored-by: Frank Greguska <[email protected]>

* /version 1.4.0a4

* Feature/issue 210 - Load large geometry polygons (#219)

* add functions to handle null geometries and convert polygons to points

* update doi in docs

* fix fill null geometries

* fix tests and update changelog

* /version 1.4.0a5

* Feature/issue 222 - Add granule info to track ingest table on load (#223)

* adjust lambdas to populate track ingest table on granule load

* changelog

* remove test cnm

* lint

* change error caught when handling checksum

* update lambda role permissions to write to track ingest table

* fix typo on lake table terraform

* set default fill values for checksum and rev date in track status

* fix checksum handling in bulk load data

* lint

* add logging to debug

* /version 1.4.0a6

* Add SSM parameter read for last run time

* Feature/issue-225: Create one track ingest table per feature type (#226)

* add track ingest tables for each feature type and adjust load data to populate

* changelog

* /version 1.4.0a7

* Feature/issue 196 Add new feature type to query the API for lake data (#224)

* Initial API queries for lake data

* Unit tests for lake data

* Updates after center point calculations

- Removed temp code to calculate a point in API
- Implemented unit test to test lake data retrieval
- Updated fixtures to load in lake data for testing

* Add read lake table permissions to lambda timeseries and track ingest roles

* Update documentation to include lake data

* Updated documentation to include info on lake centerpoints

---------

Co-authored-by: Frank Greguska <[email protected]>

* /version 1.4.0a8

* Feature/issue 205 - Add Confluence API key (#221)

* Fix possible variable references before value is assigned

* Define Confluence API key and trusted partner plan limits

* Define a list of trusted partner keys and store under single parameter

* Define API keys as encrypted environment variables for Lambda authorizer

* Update authorizer and connection class to use KMS to retrieve API keys

* Hack to force lambda deployment when ssm value changes (#218)

* Add replace_triggered_by to hydrocron_lambda_authorizer

* Introduce environment variable that contains random id which will change whenever an API key value changes. This will force lambda to publish new version of the function.

* Remove unnecessary hash function

* Update to SSM parameter API key storage and null_resource environment variable

* Update Terraform and AWS provider

* Update API key documentation

* Set source_code_hash to force deployment of new image

* Downgrade AWS provider to 4.0 to remove inline policy errors

* Update docs/timeseries.md

---------

Co-authored-by: Frank Greguska <[email protected]>

* /version 1.4.0a9

* /version 1.4.0a10

* changelog for 1.4.0 release

* update dependencies for 1.4.0 release

* /version 1.5.0a0

* fix CMR query in UAT

* /version 1.4.0rc1

* fix typo in load_data lambda

* /version 1.4.0rc2

* Initial track ingest table query

* Fix linting and code style

* Implement feature count operations

* Enable S3 permissions and set environment variable for track lambda

* Fix trailing white spaces and code format

* Update docstrings for class methods

* Implement run time storage in SSM

* Query track table unit tests

* Update CHANGELOG with issue

* Update SSM run time parameter

* Fix trailing whitespace

* Fix reference to IAM policy

* Enable specification of temporal range to search revision date by

* Fix SSM put parameter policy

* Update IAM permissions for reading track ingest

* Enable full temporal search on CMR granules

* Add capability to download shapefile granule to count features

* Update granule UR to include .zip

* Count features via Hydrocron table query

* Remove unnecessary s3 permissions

* Remove whitespace from blank line

* Update cryptography to 43.0.1

* Update track ingest table operations

* Update changelog with issue

* update dependencies

* upgrade geopandas

* update dependencies

* fix index on rev date in load data lambda

* update dependencies

* lint readme

* /version 1.4.0rc3

* /version 1.4.0rc4

* Implement operations to publish CNM messages for granules requiring ingest

* Implement unit test of publication operations

* Fix linting

* Add issue to changelog and fix linting

* Add EventBridge schedules with appropriate Lambda permissions

* Set initial schedule expressions and fix assume policy

* fix cmr env search by venue

* /version 1.4.0rc5

* Disable eventbridge schedules by default

* Update schedule to run weekly

* Define 1 hour latency to search by revision_date in CMR

* Allow CMR UAT query based on HYDROCRON_ENV environment variable

* Update unit tests to accommodate UAT CMR query

* Add earthdata login credentials to Lambda

* Add issue to changelog

* Fix linting white space

---------

Co-authored-by: nikki-t <[email protected]>
Co-authored-by: Frank Greguska <[email protected]>
Co-authored-by: frankinspace <[email protected]>
Co-authored-by: Victoria McDonald <[email protected]>
Co-authored-by: Cassie Nickles <[email protected]>
Co-authored-by: cassienickles <[email protected]>
Co-authored-by: podaac-cicd[bot] <podaac-cicd[bot]@users.noreply.github.com>
Co-authored-by: Victoria McDonald <[email protected]>
Co-authored-by: torimcd <[email protected]>
… that aren't loaded into Hydrocron (#245)

* Raise an error if collection shortname does not match Hydrocron table names

* Raise an error unsupported lake data in load granule operations

* Remove trailing whitespace

* Fix code formatting

* Update CHANGELOG with issue

* Feature/issue 248 - Track ingest operations need to query UAT for granule files (#249)

* Query to return granule files should query UAT when running in SIT or UAT environments

* SIT execution should return UAT files for load granule operations

* Set venue environment variable before running test of query_cmr

* Add issue to CHANGELOG
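A minimal sketch of the venue selection described in this commit and in "Allow CMR UAT query based on HYDROCRON_ENV environment variable". The helper `cmr_search_url` is hypothetical, and defaulting to OPS when HYDROCRON_ENV is unset is an assumption; the two URLs are the public OPS and UAT CMR search endpoints.

```python
import os

# Public CMR search endpoints for the operational and UAT venues.
CMR_OPS_URL = "https://cmr.earthdata.nasa.gov/search/"
CMR_UAT_URL = "https://cmr.uat.earthdata.nasa.gov/search/"


def cmr_search_url(env=None):
    """Return the CMR search root to query for a Hydrocron venue (hypothetical helper).

    SIT and UAT both query UAT CMR; any other venue (e.g. OPS) queries OPS CMR.
    Falls back to the HYDROCRON_ENV variable, then to "OPS" (an assumption).
    """
    venue = env if env is not None else os.environ.get("HYDROCRON_ENV", "OPS")
    return CMR_UAT_URL if venue.upper() in ("SIT", "UAT") else CMR_OPS_URL
```

If the python_cmr client is used for these queries, the returned URL could likely be passed as the `mode` argument of `GranuleQuery`; confirm against the installed version before relying on it.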
* Handle overlapping times with unique CRIDs

* Define unit test to test when reprocessed granule arrives

* Add issue to CHANGELOG

* Add reprocessed CRID to eventbridge schedule input

* Handle cases with empty reprocessed_crid
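A sketch of the unique-CRID rule, under the simplifying assumption that two granules "overlap" when their names are identical apart from the CRID and product counter (the real implementation may compare time ranges instead). The function `filter_unique_crids` and its regex are hypothetical; an empty or missing `reprocessed_crid` leaves the list unchanged, matching "Handle cases with empty reprocessed_crid".

```python
import re

# Hypothetical parser: assumes names end in _<CRID>_<counter>.zip, e.g.
# SWOT_L2_HR_LakeSP_Prior_019_527_EU_20240818T144101_20240818T145310_PIC0_02.zip
CRID_NAME_RE = re.compile(r"^(?P<base>.+)_(?P<crid>[A-Z0-9]{4})_(?P<counter>\d{2})\.zip$")


def filter_unique_crids(granule_urs, reprocessed_crid=""):
    """Drop granules superseded by a reprocessed-CRID granule with the same base name."""
    if not reprocessed_crid:
        # No reprocessed CRID configured: nothing to resolve.
        return list(granule_urs)

    # Base names (pass, tile, times) that have a reprocessed version available.
    reprocessed_bases = set()
    for ur in granule_urs:
        match = CRID_NAME_RE.match(ur)
        if match and match["crid"] == reprocessed_crid:
            reprocessed_bases.add(match["base"])

    kept = []
    for ur in granule_urs:
        match = CRID_NAME_RE.match(ur)
        if match and match["crid"] != reprocessed_crid and match["base"] in reprocessed_bases:
            continue  # superseded by the reprocessed granule
        kept.append(ur)
    return kept
```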
… loaded (#259)

* change assemble attrs function to avoid for loop

* change how attributes are concatenated during shp unpack to avoid slow looping

* remove unused import

* Update API test data with less precise data coordinates

* remove logging every item in batch writer

* lint

---------

Co-authored-by: Nikki <[email protected]>
nikki-t commented Nov 11, 2024

GitHub Issue: #239

Description

Testing in UAT revealed the need for several modifications: #239 (comment)

Overview of work done

  • Increase track ingest lambda timeout to 15 minutes.
  • Increase load granule lambda timeout to 15 minutes and increase memory to 8GB.
  • Provide debug logging for track ingest operations and an environment variable to control debugging logs (DEBUG_LOGS).
  • Modify track ingest operations to query in batches of 500 for granules with a "to_ingest" status to prevent timeouts. The batch size is set by the BATCH_STATUS environment variable.
  • Modify track ingest operations to handle the product counter increments.
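A sketch of the batched "to_ingest" query and the DEBUG_LOGS switch described above. The callable `query_page(limit, start_key)` is a hypothetical stand-in for the DynamoDB status-index query, and treating DEBUG_LOGS="true" as the on value is an assumption; only the BATCH_STATUS default of 500 comes from the description.

```python
import logging
import os

logger = logging.getLogger("track_ingest")
# Assumption: DEBUG_LOGS="true" enables debug-level logging, anything else keeps INFO.
logger.setLevel(
    logging.DEBUG if os.environ.get("DEBUG_LOGS", "false").lower() == "true" else logging.INFO
)


def iter_to_ingest_batches(query_page, batch_size=None):
    """Yield lists of 'to_ingest' granules, one page of at most batch_size items at a time.

    query_page(limit, start_key) is hypothetical and returns (items, next_key);
    a next_key of None means there are no more pages.
    """
    if batch_size is None:
        batch_size = int(os.environ.get("BATCH_STATUS", "500"))
    start_key = None
    while True:
        items, start_key = query_page(batch_size, start_key)
        logger.debug("Retrieved %d granules with 'to_ingest' status.", len(items))
        if items:
            yield items
        if start_key is None:
            break
```

With boto3, `query_page` would map onto `Table.query` using `Limit` and `ExclusiveStartKey`, returning the response's `Items` and `LastEvaluatedKey`; note that `ExclusiveStartKey` must be omitted on the first call rather than passed as None.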

Overview of verification done

  • Existing functionality preserved.
  • Existing unit tests pass.
  • Added test for incremented product counter.
  • Added test for previous product counter.

Overview of integration done

Executed the CRID handling locally against OPS CMR to confirm behavior was preserved, because no overlapping PGC0 and PIC0 granules are loaded into UAT. Did not execute publish_cnm.

Executed against Hydrocron SWOT data in services UAT and ran it several times to process granules that require ingestion and to work through product counter increments. Final logs:

Prior Lakes

2024-11-08T20:40:53.992Z [INFO] 2024-11-08T20:40:53.991Z Collection shortname: SWOT_L2_HR_LakeSP_prior_2.0 
2024-11-08T20:40:53.992Z [INFO] 2024-11-08T20:40:53.992Z Hydrocron table: hydrocron-swot-prior-lake-table 
2024-11-08T20:40:53.992Z [INFO] 2024-11-08T20:40:53.992Z Hydrocron track ingest table: hydrocron-swot-prior-lake-track-ingest-table 
2024-11-08T20:40:53.992Z [INFO] 2024-11-08T20:40:53.992Z Temporal indicator for revision dates: True 
2024-11-08T20:40:53.992Z [INFO] 2024-11-08T20:40:53.992Z Temporal start date: 2024-08-17 00:00:00+00:00 
2024-11-08T20:40:53.992Z [INFO] 2024-11-08T20:40:53.992Z Temporal end date: 2024-08-25 23:59:59+00:00 
2024-11-08T20:40:53.992Z [INFO] 2024-11-08T20:40:53.992Z Reprocessed CRID: PGC0 
2024-11-08T20:40:53.992Z [INFO] 2024-11-08T20:40:53.992Z Environment: UAT 
2024-11-08T20:40:53.992Z [INFO] 2024-11-08T20:40:53.992Z Querying CMR temporal range: 2024-08-17 00:00:00+00:00 to 2024-08-25 23:59:59+00:00. 
2024-11-08T20:41:02.248Z [INFO] 2024-11-08T20:41:02.248Z Located 460 granules in CMR. 
... 
2024-11-08T20:41:24.451Z [INFO] 2024-11-08T20:41:24.451Z Located 22 granules NOT in Hydrocron. 
2024-11-08T20:41:24.451Z [INFO] 2024-11-08T20:41:24.451Z Located 22 unique CRID granules NOT in Hydrocron. 
2024-11-08T20:41:24.471Z [INFO] 2024-11-08T20:41:24.471Z Located 23 granules with 'to_ingest' status. 
2024-11-08T20:41:24.544Z [INFO] 2024-11-08T20:41:24.544Z Located incremented product counter: SWOT_L2_HR_LakeSP_Prior_019_527_EU_20240818T144101_20240818T145310_PIC0_02.zip. 
2024-11-08T20:41:25.506Z [INFO] 2024-11-08T20:41:25.506Z Located 22 granules that require ingestion. 
2024-11-08T20:41:25.506Z [INFO] 2024-11-08T20:41:25.506Z Located 0 granules that are already ingested. 
...

Confirmed that the 22 remaining granules have all been loaded into SWOT-UAT but are only 27 bytes in size. They are "empty" granules that Hydrocron cannot ingest, because they are text files with the following content: HTTP Basic: Access denied.
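One way to flag these placeholder downloads before attempting a load is sketched below. The helper `is_placeholder_granule` is hypothetical; it relies only on the zip local-file-header signature (`PK\x03\x04`) and the message quoted above, which is 26 characters, so the 27-byte size presumably includes a trailing newline.

```python
# Hypothetical check for "empty" granules whose body is an access-denied message.
ACCESS_DENIED = b"HTTP Basic: Access denied."

# Every zip archive containing files begins with a local file header "PK\x03\x04".
ZIP_SIGNATURE = b"PK\x03\x04"


def is_placeholder_granule(payload: bytes) -> bool:
    """Return True if a downloaded granule is the access-denied text, not a shapefile zip."""
    if payload.startswith(ZIP_SIGNATURE):
        return False
    # Ignore surrounding whitespace such as a trailing newline.
    return payload.strip() == ACCESS_DENIED
```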

Reaches

2024-11-11T22:32:30.238Z [INFO] 2024-11-11T22:32:30.237Z Collection shortname: SWOT_L2_HR_RiverSP_reach_2.0
2024-11-11T22:32:30.238Z [INFO] 2024-11-11T22:32:30.238Z Hydrocron table: hydrocron-swot-reach-table
2024-11-11T22:32:30.238Z [INFO] 2024-11-11T22:32:30.238Z Hydrocron track ingest table: hydrocron-swot-reach-track-ingest-table
2024-11-11T22:32:30.238Z [INFO] 2024-11-11T22:32:30.238Z Temporal indicator for revision dates: True
2024-11-11T22:32:30.238Z [INFO] 2024-11-11T22:32:30.238Z Temporal start date: 2023-07-27 00:00:00+00:00
2024-11-11T22:32:30.238Z [INFO] 2024-11-11T22:32:30.238Z Temporal end date: 2024-10-30 23:59:59+00:00
2024-11-11T22:32:30.238Z [INFO] 2024-11-11T22:32:30.238Z Reprocessed CRID: PGC0
2024-11-11T22:32:30.238Z [INFO] 2024-11-11T22:32:30.238Z Environment: UAT
2024-11-11T22:32:30.238Z [INFO] 2024-11-11T22:32:30.238Z Querying CMR temporal range: 2023-07-27 00:00:00+00:00 to 2024-10-30 23:59:59+00:00.
2024-11-11T22:32:34.835Z [INFO] 2024-11-11T22:32:34.835Z Located 490 granules in CMR. 
...
2024-11-11T22:37:42.354Z [INFO] 2024-11-11T22:37:42.354Z Located 256 granules NOT in Hydrocron.
2024-11-11T22:37:42.355Z [INFO] 2024-11-11T22:37:42.354Z Located 104 unique CRID granules NOT in Hydrocron after removing duplicates. 
...
2024-11-11T22:38:00.979Z [INFO] 2024-11-11T22:38:00.978Z Located 72 final granules NOT in Hydrocron after product counter filter.
2024-11-11T22:38:00.999Z [INFO] 2024-11-11T22:38:00.999Z Located 93 granules with 'to_ingest' status.
2024-11-11T22:38:01.000Z [INFO] 2024-11-11T22:38:01.000Z Located 86 unique granules with 'to_ingest' status. 
2024-11-11T22:38:19.743Z [INFO] 2024-11-11T22:38:19.741Z Located 72 granules that require ingestion before de-duplication.
2024-11-11T22:38:19.743Z [INFO] 2024-11-11T22:38:19.743Z Located 72 granules that require ingestion.
2024-11-11T22:38:19.743Z [INFO] 2024-11-11T22:38:19.743Z Located 0 granules that are already ingested. 

The remaining 72 granules include 65 granules that have been loaded into SWOT-UAT but are only 27 bytes in size, so they are "empty" granules that Hydrocron cannot ingest. There are 8 granules that have a feature count of 0. These will always be picked up by the track ingest operations when they run on a temporal range that includes them, but ingesting them has no impact because they contain no features.
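The "product counter filter" step in the reach logs can be sketched as keeping only the highest counter per granule name, as with the incremented `..._PIC0_02.zip` granule located in the prior lakes run. The function `latest_product_counters` and its regex are hypothetical and assume names end in `_<CRID>_<counter>.zip`; the CRID stays in the grouping key so that CRID resolution remains a separate step.

```python
import re

# Hypothetical: the prefix keeps everything up to and including the CRID.
COUNTER_NAME_RE = re.compile(r"^(?P<prefix>.+_[A-Z0-9]{4})_(?P<counter>\d{2})\.zip$")


def latest_product_counters(granule_urs):
    """Keep only the granule with the highest product counter for each name prefix."""
    latest = {}
    for ur in granule_urs:
        match = COUNTER_NAME_RE.match(ur)
        if match is None:
            latest[ur] = (-1, ur)  # names that do not parse pass through unchanged
            continue
        counter = int(match["counter"])
        prefix = match["prefix"]
        if prefix not in latest or counter > latest[prefix][0]:
            latest[prefix] = (counter, ur)
    return sorted(ur for _, ur in latest.values())
```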


def __init__(self, collection_shortname, collection_start_date, hydrocron_table):
FEATURE_ID = {
"SWOT_L2_HR_RiverSP_reach_2.0": "reach_id",

I want to be careful about putting things in too many places throughout the code that will need to be updated with new collection names/versions. Should this be moved to the constants file? If not, let's document that it will need to be updated along with the table names etc. for each version. Maybe this is OK as-is for now, and we roll this change into the 1.6.0 release that will have the new table/collection dictionary in the constants file. What do you think, @nikki-t?
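For illustration, a single per-collection dictionary in a constants module could look like the sketch below. The layout and the helper `collection_config` are hypothetical; the shortnames and table names come from the UAT logs above, `reach_id` comes from the diff, and `lake_id` for the prior lake collection is an assumption. The lookup raises on an unknown shortname, in the spirit of "Raise an error if collection shortname does not match Hydrocron table names".

```python
# Hypothetical constants-module layout: one entry per collection version,
# so adding a new version means editing only this dictionary.
COLLECTIONS = {
    "SWOT_L2_HR_RiverSP_reach_2.0": {
        "table": "hydrocron-swot-reach-table",
        "track_table": "hydrocron-swot-reach-track-ingest-table",
        "feature_id": "reach_id",
    },
    "SWOT_L2_HR_LakeSP_prior_2.0": {
        "table": "hydrocron-swot-prior-lake-table",
        "track_table": "hydrocron-swot-prior-lake-track-ingest-table",
        "feature_id": "lake_id",  # assumption
    },
}


def collection_config(shortname):
    """Return table names and feature ID field for a collection, or raise ValueError."""
    try:
        return COLLECTIONS[shortname]
    except KeyError:
        raise ValueError(f"Unsupported collection shortname: {shortname}") from None
```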
