Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HARMONY-2007: Add is_sequential: true where missing for query-cmr in services.yml and validations for same #698

Merged
merged 4 commits into from
Feb 14, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions config/services.yml
Original file line number Diff line number Diff line change
Expand Up @@ -429,6 +429,7 @@ https://cmr.earthdata.nasa.gov:
reprojection: true
steps:
- image: !Env ${QUERY_CMR_IMAGE}
is_sequential: true
- image: !Env ${HYBIG_IMAGE}

- name: harmony/netcdf-to-zarr
Expand Down Expand Up @@ -576,6 +577,7 @@ https://cmr.earthdata.nasa.gov:
reprojection: true
steps:
- image: !Env ${QUERY_CMR_IMAGE}
is_sequential: true
- image: !Env ${OPERA_RTC_S1_BROWSE_IMAGE}
- image: !Env ${HYBIG_IMAGE}

Expand Down Expand Up @@ -1385,5 +1387,6 @@ https://cmr.uat.earthdata.nasa.gov:
reprojection: true
steps:
- image: !Env ${QUERY_CMR_IMAGE}
is_sequential: true
- image: !Env ${OPERA_RTC_S1_BROWSE_IMAGE}
- image: !Env ${HYBIG_IMAGE}
12 changes: 12 additions & 0 deletions docs/guides/adapting-new-services.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ then invokes a single service image. If setting up a more complex service chain
- [5. Error handling](#5-error-handling)
- [6. Defining environment variables in env-defaults](#6-defining-environment-variables-in-env-defaults)
- [7. Registering services in services.yml](#7-registering-services-in-servicesyml)
- [Sequential Steps](#sequential-steps)
- [Aggregation Steps](#aggregation-steps)
- [8. Docker Container Images](#8-docker-container-images)
- [9. Recommendations for service implementations](#9-recommendations-for-service-implementations)
Expand Down Expand Up @@ -135,6 +136,7 @@ The structure of an entry in the [services.yml](../../config/services.yml) file
validate_variables: true # Whether to validate the requested variables exist in the CMR. Defaults to true.
steps:
- image: !Env ${QUERY_CMR_IMAGE} # The image to use for the first step in the chain
is_sequential: true # Required for query-cmr
- image: !Env ${HARMONY_EXAMPLE_IMAGE} # The image to use for the second step in the chain
```

Expand All @@ -151,6 +153,7 @@ The following `steps` entry is for a chain of services including the PODAAC L2 S
```yaml
steps:
- image: !Env ${QUERY_CMR_IMAGE}
is_sequential: true
- image: !Env ${PODAAC_L2_SUBSETTER_IMAGE}
operations: ['spatialSubset', 'variableSubset']
conditional:
Expand All @@ -168,6 +171,7 @@ There is also a `conditional` option on `umm-c` `native_format` that compares wi
```yaml
steps:
- image: !Env ${QUERY_CMR_IMAGE}
is_sequential: true
- image: !Env ${NET_2_COG_IMAGE}
conditional:
umm_c:
Expand All @@ -177,6 +181,13 @@ steps:

Here we have the query-cmr service (this service is the first in every current workflow). This is followed by the optional NetCDF to COG service, which will only be invoked when the collection's UMM-C native format is one of the values that are defined (case insensitive) in the steps configuration (i.e. `[netcdf-4]`). Finally, we have the HyBIG service that converts the GeoTIFF inputs from the previous step to Global Imagery Browse Services (GIBS) compatible PNG or JPEG outputs. See [10. Service chaining](#10-service-chaining) for more info.

### Sequential Steps
Most steps will produce all of the pieces of work (known as work-items) for a service immediately when the step begins. This allows all of the work-items to be worked in parallel. It is possible, however, for new work-items for the same service to be produced as the step is being worked. In this case, the work-items must be worked sequentially. Steps that must be worked sequentially should include `is_sequential: true` in their definition.

An example of this is the query-cmr service. Each invocation of the query-cmr service can only return up to 2000 granules (due to the CMR page size limit), so, if the job has more granules than that, query-cmr is invoked multiple times. Because the number of granules reported by the CMR may change at any time, we cannot know ahead of time exactly how many invocations we need. So, if the job has more granules than 2000, query-cmr is invoked sequentially until all granules are returned.

For most services `is_sequential: true` is not necessary.

### Aggregation Steps
Services that provide aggregation, e.g., concatenation for CONCISE, require that all inputs are
available when they are run. Harmony infers this from the `operations` field in the associated step.
Expand All @@ -195,6 +206,7 @@ The following `steps` entry is an example one might use for an aggregating servi
```yaml
steps:
- image: !Env ${QUERY_CMR_IMAGE}
is_sequential: true
- image: !Env ${EXAMPLE_AGGREGATING_SERVICE_IMAGE}
is_batched: true
max_batch_inputs: 100
Expand Down
1 change: 0 additions & 1 deletion services/harmony/app/models/job.ts
Original file line number Diff line number Diff line change
Expand Up @@ -194,7 +194,6 @@ const stateMachine = createMachine(
active: true,
},
on: Object.fromEntries([
[JobEvent.COMPLETE, { target: JobStatus.SUCCESSFUL }],
[JobEvent.COMPLETE_WITH_ERRORS, { target: JobStatus.COMPLETE_WITH_ERRORS }],
[JobEvent.CANCEL, { target: JobStatus.CANCELED }],
[JobEvent.FAIL, { target: JobStatus.FAILED }],
Expand Down
25 changes: 15 additions & 10 deletions services/harmony/app/models/services/index.ts
Original file line number Diff line number Diff line change
@@ -1,20 +1,21 @@
import * as fs from 'fs';
import * as path from 'path';
import * as yaml from 'js-yaml';
import _, { get as getIn } from 'lodash';
import * as path from 'path';

import { Conjunction, isInteger, listToText } from '@harmony/util/string';

import logger from '../../util/log';
import { HttpError, NotFoundError, ServerError } from '../../util/errors';
import { isMimeTypeAccepted, allowsAny } from '../../util/content-negotiation';
import { CmrCollection } from '../../util/cmr';
import { addCollectionsToServicesByAssociation } from '../../middleware/service-selection';
import { listToText, Conjunction, isInteger } from '@harmony/util/string';
import TurboService from './turbo-service';
import HttpService from './http-service';
import { CmrCollection } from '../../util/cmr';
import { allowsAny, isMimeTypeAccepted } from '../../util/content-negotiation';
import env from '../../util/env';
import { HttpError, NotFoundError, ServerError } from '../../util/errors';
import logger from '../../util/log';
import DataOperation from '../data-operation';
import BaseService, { ServiceConfig } from './base-service';
import RequestContext from '../request-context';
import env from '../../util/env';
import BaseService, { ServiceConfig } from './base-service';
import HttpService from './http-service';
import TurboService from './turbo-service';

let serviceConfigs: ServiceConfig<unknown>[] = null;

Expand Down Expand Up @@ -109,6 +110,10 @@ function validateServiceConfigSteps(config: ServiceConfig<unknown>): void {
+ `Configured to use ${maxBatchInputs}, but will be limited to ${env.maxGranuleLimit}`);
}
}
if (step.image.match(/harmonyservices\/query\-cmr:.*/) && !step.is_sequential) {

throw new TypeError(`Invalid is_sequential ${step.is_sequential}. query-cmr steps must always have sequential = true.`);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should we remove the is_sequential: true setting from query-cmr step and enforce it in the code? This feels like leaking unnecessary implementation details to the client.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a client error. This is an error that shows up when you start Harmony if it is misconfigured. I think I want to keep it this way instead of adding separate handling for query-cmr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have said service provider rather than client. The service providers shouldn't have to know the query-cmr step is sequential when they set up the service chain in services.yml. The fact that Harmony always execute query cmr sequentially shouldn't have to be spelled out in the service configuration especially when it is something the service providers doesn't care or know. Like I said that I am ok with having it spelled out because we do have other steps that also use the is_sequential attribute, but I don't think it is necessary.

}
}
}

Expand Down
33 changes: 32 additions & 1 deletion services/harmony/test/models/index.ts
Original file line number Diff line number Diff line change
@@ -1,7 +1,10 @@
import { expect } from 'chai';
import { loadServiceConfigs, loadServiceConfigsFromFile, getServiceConfigs, validateServiceConfig } from '../../app/models/services';
import _ from 'lodash';

import {
getServiceConfigs, loadServiceConfigs, loadServiceConfigsFromFile, validateServiceConfig,
} from '../../app/models/services';

const cmrEndpoints = {
'uat': 'https://cmr.uat.earthdata.nasa.gov',
'prod': 'https://cmr.earthdata.nasa.gov',
Expand Down Expand Up @@ -74,4 +77,32 @@ describe('Services.yml validation', function () {
expect(() => configs.forEach(validateServiceConfig)).to.throw(/Collections cannot be configured for harmony service: with-collections, use umm_s instead./);
});
});

describe('services.yml with unset is_sequential for query-cmr in UAT is invalid', function () {
it('throws an exception', function () {
const configs = loadServiceConfigsFromFile(cmrEndpoints.uat, '../../../test/resources/services_with_unset_is_sequential_query_cmr_uat.yml');
expect(() => configs.forEach(validateServiceConfig)).to.throw(/Invalid is_sequential undefined. query-cmr steps must always have sequential = true./);
});
});

describe('services.yml with unset is_sequential for query-cmr in PROD is invalid', function () {
it('throws an exception', function () {
const configs = loadServiceConfigsFromFile(cmrEndpoints.prod, '../../../test/resources/services_with_unset_is_sequential_query_cmr_prod.yml');
expect(() => configs.forEach(validateServiceConfig)).to.throw(/Invalid is_sequential undefined. query-cmr steps must always have sequential = true./);
});
});

describe('services.yml with false is_sequential for query-cmr in UAT is invalid', function () {
it('throws an exception', function () {
const configs = loadServiceConfigsFromFile(cmrEndpoints.uat, '../../../test/resources/services_with_false_is_sequential_query_cmr_uat.yml');
expect(() => configs.forEach(validateServiceConfig)).to.throw(/Invalid is_sequential false. query-cmr steps must always have sequential = true./);
});
});

describe('services.yml with false is_sequential for query-cmr in PROD is invalid', function () {
it('throws an exception', function () {
const configs = loadServiceConfigsFromFile(cmrEndpoints.prod, '../../../test/resources/services_with_false_is_sequential_query_cmr_prod.yml');
expect(() => configs.forEach(validateServiceConfig)).to.throw(/Invalid is_sequential false. query-cmr steps must always have sequential = true./);
});
});
});
2 changes: 2 additions & 0 deletions services/harmony/test/resources/services_no_umm_s_prod.yml
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@ https://cmr.earthdata.nasa.gov:
- application/x-netcdf4
steps:
- image: !Env ${QUERY_CMR_IMAGE}
is_sequential: true
- image: !Env ${PODAAC_L2_SUBSETTER_IMAGE}


Expand All @@ -76,4 +77,5 @@ https://cmr.earthdata.nasa.gov:
- application/x-netcdf4
steps:
- image: !Env ${QUERY_CMR_IMAGE}
is_sequential: true
- image: !Env ${PODAAC_L2_SUBSETTER_IMAGE}
2 changes: 2 additions & 0 deletions services/harmony/test/resources/services_no_umm_s_uat.yml
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,7 @@ https://cmr.uat.earthdata.nasa.gov:
- application/x-netcdf4
steps:
- image: !Env ${QUERY_CMR_IMAGE}
is_sequential: true
- image: !Env ${PODAAC_L2_SUBSETTER_IMAGE}

- name: podaac/l2-subsetter
Expand Down Expand Up @@ -75,5 +76,6 @@ https://cmr.uat.earthdata.nasa.gov:
- application/x-netcdf4
steps:
- image: !Env ${QUERY_CMR_IMAGE}
is_sequential: true
- image: !Env ${PODAAC_L2_SUBSETTER_IMAGE}

Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@ https://cmr.earthdata.nasa.gov:
- application/x-netcdf4
steps:
- image: !Env ${QUERY_CMR_IMAGE}
is_sequential: true
- image: !Env ${PODAAC_L2_SUBSETTER_IMAGE}

- name: umm_s_not_string
Expand All @@ -77,4 +78,5 @@ https://cmr.earthdata.nasa.gov:
- application/x-netcdf4
steps:
- image: !Env ${QUERY_CMR_IMAGE}
is_sequential: true
- image: !Env ${PODAAC_L2_SUBSETTER_IMAGE}
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@ https://cmr.uat.earthdata.nasa.gov:
- application/x-netcdf4
steps:
- image: !Env ${QUERY_CMR_IMAGE}
is_sequential: true
- image: !Env ${PODAAC_L2_SUBSETTER_IMAGE}

- name: podaac/l2-subsetter
Expand Down Expand Up @@ -77,5 +78,6 @@ https://cmr.uat.earthdata.nasa.gov:
- application/x-netcdf4
steps:
- image: !Env ${QUERY_CMR_IMAGE}
is_sequential: true
- image: !Env ${PODAAC_L2_SUBSETTER_IMAGE}

2 changes: 2 additions & 0 deletions services/harmony/test/resources/services_with_colls_prod.yml
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ https://cmr.earthdata.nasa.gov:
- application/x-netcdf4
steps:
- image: !Env ${QUERY_CMR_IMAGE}
is_sequential: true
- image: !Env ${PODAAC_L2_SUBSETTER_IMAGE}

- name: podaac/l2-subsetter
Expand Down Expand Up @@ -78,5 +79,6 @@ https://cmr.earthdata.nasa.gov:
- application/x-netcdf4
steps:
- image: !Env ${QUERY_CMR_IMAGE}
is_sequential: true
- image: !Env ${PODAAC_L2_SUBSETTER_IMAGE}

2 changes: 2 additions & 0 deletions services/harmony/test/resources/services_with_colls_uat.yml
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ https://cmr.uat.earthdata.nasa.gov:
- application/x-netcdf4
steps:
- image: !Env ${QUERY_CMR_IMAGE}
is_sequential: true
- image: !Env ${PODAAC_L2_SUBSETTER_IMAGE}

- name: podaac/l2-subsetter
Expand Down Expand Up @@ -78,5 +79,6 @@ https://cmr.uat.earthdata.nasa.gov:
- application/x-netcdf4
steps:
- image: !Env ${QUERY_CMR_IMAGE}
is_sequential: true
- image: !Env ${PODAAC_L2_SUBSETTER_IMAGE}

Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
# Order for each CMR endpoint in this file will reflect precedence of the service when
# multiple services handle a collection

# Default turbo configuration
x-turbo-config: &default-turbo-config
name: turbo
params: &default-turbo-params
env: &default-turbo-env
USE_LOCALSTACK: !Env ${USE_LOCALSTACK}
LOCALSTACK_HOST: !Env ${BACKEND_HOST}
AWS_DEFAULT_REGION: us-west-2
STAGING_BUCKET: !Env ${STAGING_BUCKET}
TEXT_LOGGER: !Env ${TEXT_LOGGER}
BACKEND_HOST: !Env ${BACKEND_HOST}
EDL_USERNAME: !Env ${EDL_USERNAME}
EDL_PASSWORD: !Env ${EDL_PASSWORD}
OAUTH_UID: !Env ${OAUTH_UID}
OAUTH_PASSWORD: !Env ${OAUTH_PASSWORD}
OAUTH_HOST: !Env ${OAUTH_HOST}
OAUTH_CLIENT_ID: !Env ${OAUTH_CLIENT_ID}
OAUTH_REDIRECT_URI: !Env ${OAUTH_REDIRECT_URI}
FALLBACK_AUTHN_ENABLED: !Env ${FALLBACK_AUTHN_ENABLED}

https://cmr.earthdata.nasa.gov:

- name: non-sequential-query-cmr
description: |
testing service configuration with query-cmr with no is_sequential:
data_operation_version: '0.20.0'
type:
<<: *default-turbo-config
params:
<<: *default-turbo-params
env:
<<: *default-turbo-env
STAGING_PATH: public/asf/opera-rtc-s1-browse
umm_s: S1271728813-ASF
maximum_sync_granules: 0
capabilities:
concatenation: false
subsetting:
bbox: false
variable: false
temporal: false
output_formats:
- image/png
reprojection: true
steps:
- image: !Env ${QUERY_CMR_IMAGE}
is_sequential: false
- image: !Env ${OPERA_RTC_S1_BROWSE_IMAGE}
- image: !Env ${HYBIG_IMAGE}
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
# Order for each CMR endpoint in this file will reflect precedence of the service when
# multiple services handle a collection

# Default turbo configuration
x-turbo-config: &default-turbo-config
name: turbo
params: &default-turbo-params
env: &default-turbo-env
USE_LOCALSTACK: !Env ${USE_LOCALSTACK}
LOCALSTACK_HOST: !Env ${BACKEND_HOST}
AWS_DEFAULT_REGION: us-west-2
STAGING_BUCKET: !Env ${STAGING_BUCKET}
TEXT_LOGGER: !Env ${TEXT_LOGGER}
BACKEND_HOST: !Env ${BACKEND_HOST}
EDL_USERNAME: !Env ${EDL_USERNAME}
EDL_PASSWORD: !Env ${EDL_PASSWORD}
OAUTH_UID: !Env ${OAUTH_UID}
OAUTH_PASSWORD: !Env ${OAUTH_PASSWORD}
OAUTH_HOST: !Env ${OAUTH_HOST}
OAUTH_CLIENT_ID: !Env ${OAUTH_CLIENT_ID}
OAUTH_REDIRECT_URI: !Env ${OAUTH_REDIRECT_URI}
FALLBACK_AUTHN_ENABLED: !Env ${FALLBACK_AUTHN_ENABLED}

https://cmr.uat.earthdata.nasa.gov:

- name: non-sequential-query-cmr
description: |
testing service configuration with query-cmr with no is_sequential:
data_operation_version: '0.20.0'
type:
<<: *default-turbo-config
params:
<<: *default-turbo-params
env:
<<: *default-turbo-env
STAGING_PATH: public/asf/opera-rtc-s1-browse
umm_s: S1271728813-ASF
maximum_sync_granules: 0
capabilities:
concatenation: false
subsetting:
bbox: false
variable: false
temporal: false
output_formats:
- image/png
reprojection: true
steps:
- image: !Env ${QUERY_CMR_IMAGE}
is_sequential: false
- image: !Env ${OPERA_RTC_S1_BROWSE_IMAGE}
- image: !Env ${HYBIG_IMAGE}
Loading