Issues/23: Enable Harmony write output to user bucket #46

Merged 16 commits on Dec 17, 2024
5 changes: 4 additions & 1 deletion .github/workflows/cicd-pipeline.yml
@@ -137,9 +137,12 @@ jobs:
- name: Install conda
uses: conda-incubator/setup-miniconda@v3
with:
channels: conda-forge
activate-environment: bignbit
environment-file: conda-environment.yaml
auto-activate-base: false
conda-remove-defaults: "true"
miniforge-version: latest
- name: Install package
run: poetry install
- name: Lint
@@ -159,7 +162,7 @@ jobs:
terraform init -backend=false -upgrade
terraform validate -no-color
- name: SonarCloud Scan
uses: sonarsource/sonarcloud-github-action@master
uses: sonarsource/sonarqube-scan-action@v4
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
8 changes: 4 additions & 4 deletions .gitignore
@@ -137,11 +137,11 @@ dmypy.json
# Generated samples
tests/2021*
/terraform/big-lambda.zip
/terraform_deploy/tfplan
/examples/cumulus-tf/tfplan
.terraform.d
.terraform
/.bash_history
/terraform_deploy/cumulus-message-adapter.zip
/terraform_deploy/.terraform.lock.hcl
/examples/cumulus-tf/cumulus-message-adapter.zip
bignbit-lambda.zip
terraform_deploy/bignbit_module/
/examples/cumulus-tf/bignbit_module/
!/examples/cumulus-tf/.terraform.lock.hcl
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -7,12 +7,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]
### Added
- [issues/9](https://github.com/podaac/bignbit/issues/9): Added documentation for installing as a Cumulus module
### Changed
- [issues/15](https://github.com/podaac/bignbit/issues/15): Change 'convertToPNG' choice to a generic send to harmony choice
- [issues/16](https://github.com/podaac/bignbit/issues/16): Change apply opera treatment choice and lambda to be specific to HLS
- [issues/23](https://github.com/podaac/bignbit/issues/23): Harmony requests now include the `destinationUrl` parameter to place output
directly in an S3 bucket instead of requiring data to be copied
- [issues/41](https://github.com/podaac/bignbit/issues/41): Module no longer depends on deprecated hashicorp/template provider
- [issues/42](https://github.com/podaac/bignbit/issues/42): Terraform version upgraded to v1.5.3
### Deprecated
### Removed
- [issues/7](https://github.com/podaac/bignbit/issues/7): Remove the wait for GITC response
- [issues/23](https://github.com/podaac/bignbit/issues/23): Removed `lambda_role` module variable. The lambda role is now created as part of the module; `permissions_boundary_arn` is required instead.
### Fixed
- [issues/36](https://github.com/podaac/bignbit/issues/36): Support datetimes without microseconds
### Security
181 changes: 178 additions & 3 deletions README.md
@@ -1,15 +1,190 @@
<!-- TOC -->
* [About](#about)
* [Installing this module](#installing-this-module)
* [Configuring a collection](#configuring-a-collection)
* [Harmony requests](#harmony-requests)
* [Module Inputs](#module-inputs)
* [Module Outputs](#module-outputs)
* [Assumptions](#assumptions)
* [Step Function](#step-function)
* [Local Development](#local-development)
* [MacOS](#macos)
<!-- TOC -->

# About

bignbit is a Cumulus module that can be installed as a post-ingest workflow to generate browse imagery via Harmony and then transfer that imagery to GIBS.

See an example of how to use this cumulus module in [browse_image_workflow.tf](/examples/cumulus-tf/browse_image_workflow.tf).
In general, the high level steps are:

1. For each configured variable within the granule being processed, generate browse imagery via Harmony and store it in S3.
2. Generate a browse image metadata file for GIBS for each image produced by Harmony.
3. Construct a CNM message for each image that includes the image, metadata, and an optional world file.
4. Send the CNM messages to GIBS via SQS.
5. Wait for GIBS to process the CNM messages and send a success or failure response back to an SNS topic.
6. Record the result of GIBS processing in S3.

# Installing this module

1. Add a post-ingest step to the Cumulus ingest workflow that conditionally queues granules to the BrowseImageWorkflow. For example:
```
{
"StartAt":"BIGChoice",
"States":{
"BIGChoice":{
"Type":"Choice",
"Choices":[
{
"And":[
{
"Variable":"$.meta.collection.meta.workflowChoice.browseimage",
"IsPresent":true
},
{
"Variable":"$.meta.collection.meta.workflowChoice.browseimage",
"BooleanEquals":true
}
],
"Next":"QueueGranulesToBIG"
}
],
"Default":"BIGSucceed"
},
"QueueGranulesToBIG":{
"Parameters":{
"cma":{
"event.$":"$",
"task_config":{
"provider":"{$.meta.provider}",
"internalBucket":"{$.meta.buckets.internal.name}",
"stackName":"{$.meta.stack}",
"granuleIngestWorkflow":"BrowseImageWorkflow",
"queueUrl": "${aws_sqs_queue.big_background_job_queue.id}"
}
}
},
"Type":"Task",
"Resource":"${module.cumulus.queue_granules_task.task_arn}",
"Retry":[
{
"ErrorEquals":[
"States.ALL"
],
"IntervalSeconds":5,
"MaxAttempts":3
}
],
"Catch":[
{
"ErrorEquals":[
"States.ALL"
],
"ResultPath":"$.exception",
"Next":"BIGFail"
}
],
"Next": "BIGSucceed"
},
"BIGFail":{
"Type":"Fail"
},
"BIGSucceed":{
"Type":"Succeed"
}
}
```
2. Add a new terraform script to the `cumulus-deploy-tf` scripts used to deploy cumulus. This script
should define the bignbit module and the bignbit step function state machine. See an example in [browse_image_workflow.tf](/examples/cumulus-tf/browse_image_workflow.tf).
3. [Configure one or more collections](#configuring-a-collection)
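Step 2 above might wire the module in roughly the following way. This is a minimal sketch, not the authoritative configuration: the `source` path is a placeholder and most values are stand-in variables; see [browse_image_workflow.tf](/examples/cumulus-tf/browse_image_workflow.tf) for the real example.
```hcl
# Sketch only -- source path and values are placeholders.
module "bignbit" {
  source = "path/to/bignbit_module"

  stage  = var.stage
  prefix = var.prefix

  config_bucket      = var.config_bucket
  pobit_audit_bucket = var.cumulus_internal_bucket

  gibs_region     = var.gibs_region
  gibs_queue_name = var.gibs_queue_name
  gibs_account_id = var.gibs_account_id

  edl_user_ssm = var.edl_user_ssm
  edl_pass_ssm = var.edl_pass_ssm

  permissions_boundary_arn = var.permissions_boundary_arn

  security_group_ids = var.security_group_ids
  subnet_ids         = var.subnet_ids
}
```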


# Configuring a collection

1. Add a config file to the `config_bucket`. The file should be named "_collection shortname_.cfg" and its contents should be JSON.
2. Associate the UMM-C record with the appropriate Harmony service (HyBIG, net2cog, etc...)
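The config file schema is not documented here; purely as an illustration (every key name below is hypothetical), a file such as `MY_COLLECTION.cfg` might carry the per-collection imaging parameters, including the `width` and `height` discussed under [Harmony requests](#harmony-requests):
```json
{
  "config": {
    "imgWidth": 8192,
    "imgHeight": 4096,
    "imgVariables": ["sea_surface_temperature"]
  }
}
```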

## Harmony requests

> [!IMPORTANT]
> bignbit uses the [user owned bucket](https://harmony.earthdata.nasa.gov/docs#user-owned-buckets-for-harmony-output) parameter
> when making Harmony requests. If an existing bucket is configured for the `bignbit_staging_bucket` parameter, it must
> have a bucket policy that allows Harmony write permission and GIBS read permission. If `bignbit_staging_bucket` is left blank, bignbit will
> create a new S3 bucket (named `svc-${var.app_name}-${var.prefix}-staging`) and apply the correct permissions automatically.
> This bucket will also automatically expire objects older than 30 days.
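
For a pre-existing bucket, the required policy looks roughly like the following. This is a sketch, not the exact policy bignbit applies: the principal ARNs are placeholders for the Harmony service role and the GIBS account, and `my-staging-bucket` is a stand-in name.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowHarmonyWrite",
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::111111111111:root"},
      "Action": ["s3:PutObject", "s3:GetBucketLocation"],
      "Resource": ["arn:aws:s3:::my-staging-bucket", "arn:aws:s3:::my-staging-bucket/*"]
    },
    {
      "Sid": "AllowGibsRead",
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::222222222222:root"},
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::my-staging-bucket", "arn:aws:s3:::my-staging-bucket/*"]
    }
  ]
}
```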

bignbit uses the harmony-py library to construct the Harmony requests for generating images. Most of the parameters
are extracted from the CMA message as a granule is being processed but the `width` and `height` parameters
can be set via configuration. **Each variable** configured for imaging will result in a unique call to Harmony.

See `bignbit.submit_harmony_job.generate_harmony_request` for details on how the Harmony request is constructed.
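
The per-variable fan-out can be illustrated with a small sketch. This is not the real implementation (that lives in `bignbit.submit_harmony_job.generate_harmony_request`); it only shows, with plain dictionaries, how one request description per configured variable might be assembled, with output routed to the staging bucket via a `destination_url`-style parameter.
```python
# Illustrative sketch only: field names mirror common harmony-py Request
# parameters, but this is not bignbit's actual request-building code.
def build_harmony_requests(collection_concept_id, granule_id, variables,
                           staging_bucket, staging_path,
                           width=None, height=None):
    """Build one request description per configured variable."""
    requests = []
    for variable in variables:
        request = {
            "collection": collection_concept_id,
            "granule_id": [granule_id],
            "variables": [variable],
            # Output goes straight to the user-owned staging bucket
            "destination_url": f"s3://{staging_bucket}/{staging_path}",
        }
        # width/height come from the collection configuration, if present
        if width and height:
            request.update({"width": width, "height": height})
        requests.append(request)
    return requests

reqs = build_harmony_requests("C1234-POCLOUD", "G5678-POCLOUD",
                              ["sea_surface_temperature", "wind_speed"],
                              "svc-bignbit-ops-staging",
                              "bignbit-harmony-output",
                              width=8192, height=4096)
print(len(reqs))  # two variables -> two Harmony requests
```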

# Module Inputs

This module uses the following input variables:

| Name | Type | Description | Default Value |
|----------------------------|--------------|----------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------|
| stage | string | Environment used for resource tagging (dev, int, ops, etc...) | |
| prefix | string | Prefix used for resource naming (project name, env name, etc...) | |
| data_buckets | list(string) | List of buckets where data is stored. Lambdas will be given read/write access to these buckets. | [] |
| config_bucket | string | Bucket where dataset configuration is stored | |
| config_dir | string | Path relative to `config_bucket` where dataset configuration is stored | "datset-config" |
| pobit_audit_bucket | string | S3 bucket where messages exchanged with GITC will be saved. Typically the cumulus internal bucket | |
| pobit_audit_path | string | Path relative to `pobit_audit_bucket` where messages exchanged with GITC will be saved. | "pobit-cma-output" |
| bignbit_staging_bucket | string | S3 bucket where generated images will be saved. Leave blank to use bucket managed by this module. | _create new bucket named svc-${var.app_name}-${var.prefix}-staging_ |
| harmony_staging_path | string | Path relative to `bignbit_staging_bucket` where harmony results will be saved. | "bignbit-harmony-output" |
| gibs_region | string | Region where GIBS resources are deployed | |
| gibs_queue_name | string | Name of the GIBS SQS queue where outgoing CNM messages will be sent | |
| gibs_account_id | string | AWS account ID for GIBS | |
| edl_user_ssm | string | Name of SSM parameter containing EDL username for querying CMR | |
| edl_pass_ssm | string | Name of SSM parameter containing EDL password for querying CMR | |
| permissions_boundary_arn | string | Permissions boundary ARN to apply to the roles created by this module. If not provided, no permissions boundary will be applied. | |
| security_group_ids         | list(string) | Security group IDs applied to the lambda functions when running in a VPC                                                           |                                                                     |
| subnet_ids                 | list(string) | Subnet IDs where the lambda functions will run                                                                                     |                                                                     |
| app_name                   | string       | Application name used when naming resources                                                                                        | "bignbit"                                                           |
| default_tags               | map(string)  | Tags applied to all resources created by this module                                                                               | {}                                                                  |
| lambda_container_image_uri | string       | URI of the container image used for the lambda functions                                                                           | ""                                                                  |


# Module Outputs

This module supplies the following outputs:

| Name | Description | Value |
|----------------------------------|-------------------------------------------------------------------|------------------------------------------------------|
| config_bucket_name | Bucket containing dataset configs | var.config_bucket |
| config_path | Path relative to config bucket where configs reside | var.config_dir |
| pobit_handle_gitc_response_arn | ARN of the lambda function | aws_lambda_function.handle_gitc_response.arn |
| pobit_gibs_topic | ARN of SNS topic GIBS replies to | aws_sns_topic.gibs_response_topic.arn |
| pobit_gibs_queue | ARN of SQS queue GIBS replies are published to | aws_sqs_queue.gibs_response_queue.arn |
| pobit_audit_bucket | Name of bucket where messages exchanged with GIBS are stored | var.pobit_audit_bucket |
| pobit_audit_path | Path relative to audit bucket where messages with GIBS are stored | var.pobit_audit_path |
| get_dataset_configuration_arn | ARN of the lambda function | aws_lambda_function.get_dataset_configuration.arn |
| get_granule_umm_json_arn | ARN of the lambda function | aws_lambda_function.get_granule_umm_json.arn |
| get_collection_concept_id_arn | ARN of the lambda function | aws_lambda_function.get_collection_concept_id.arn |
| identify_image_file_arn | ARN of the lambda function | aws_lambda_function.identify_image_file.arn |
| generate_image_metadata_arn | ARN of the lambda function | aws_lambda_function.generate_image_metadata.arn |
| submit_harmony_job_arn | ARN of the lambda function | aws_lambda_function.submit_harmony_job.arn |
| submit_harmony_job_function_name | Name of the lambda function | aws_lambda_function.submit_harmony_job.function_name |
| get_harmony_job_status_arn | ARN of the lambda function | aws_lambda_function.get_harmony_job_status.arn |
| process_harmony_results_arn | ARN of the lambda function | aws_lambda_function.process_harmony_results.arn |
| apply_opera_hls_treatment_arn | ARN of the lambda function | aws_lambda_function.apply_opera_hls_treatment.arn |
| pobit_build_image_sets_arn | ARN of the lambda function | aws_lambda_function.build_image_sets.arn |
| pobit_send_to_gitc_arn | ARN of the lambda function | aws_lambda_function.send_to_gitc.arn |
| pobit_save_cnm_message_arn | ARN of the lambda function | aws_lambda_function.save_cnm_message.arn |
| workflow_definition | Rendered state machine definition | rendered version of state_machine_definition.tpl |
| bignbit_staging_bucket | Name of bignbit staging bucket | var.bignbit_staging_bucket |
| harmony_staging_path | Path to harmony requests relative to harmony staging bucket | var.harmony_staging_path |
| bignbit_lambda_role | Role created by the module applied to lambda functions | aws_iam_role.bignbit_lambda_role |

# Assumptions
- Using `ContentBasedDeduplication` strategy for GITC input queue

# Step Function

_Visual representation of the bignbit step function state machine:_
![image](stepfunctions_graph.png)

# Local Development
## MacOS
