Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Das 1893 #8

Merged
merged 13 commits into from
Sep 8, 2023
Merged

Das 1893 #8

merged 13 commits into from
Sep 8, 2023

Conversation

eni-awowale
Copy link
Collaborator

@eni-awowale eni-awowale commented Sep 6, 2023

Description

Added functionality to publish a UMM-Var entry to CMR with the requests library

Jira Issue ID

DAS-1893

Local Test Steps

# Import CMR environment
from cmr import CMR_UAT

# Import varinfo cmr_search functions
from varinfo.cmr_search import (get_granules, get_granule_link, 
                                download_granule)

# Import VarInfoFromNetCDF4 class and umm_var functions
# for reading and netCDF4 files and creating UMM-Var entries 
from varinfo import VarInfoFromNetCDF4
from varinfo.umm_var import get_all_umm_var, publish_all_umm_var

# Get granule response and granule download URL with `get_granules` and
# `get_granule_link`. Then download the granule locally with `download_granule`
# I am not sure if that test granule will work for everyone since I have already 
# published variables to it. 
granule_response = get_granules(concept_id='C1256535511-EEDTEST',
                                cmr_env=CMR_UAT,
                                token=token_uat)  # Change to your UAT Token
link = get_granule_link(granule_response)
download_granule(link, token=token_uat)

# Create UMM-Var entries
umm_var_dict = get_all_umm_var(VarInfoFromNetCDF4('MERRA2_400.inst1_2d_asm_Nx.20220531.nc4'))

# List of published variable concept_ids 
# If there was an error with publishing the response will be added to the list
pub_list = publish_all_umm_var('C1256535511-EEDTEST', umm_var_dict, auth_header, CMR_UAT)

PR Acceptance Checklist

  • [ ] Jira ticket acceptance criteria met.
  • [ ] CHANGELOG.md updated to include high level summary of PR changes.
  • VERSION updated if publishing a release.
  • [ ] Tests added/updated and passing.
  • Documentation updated (if needed).

Copy link
Member

@owenlittlejohns owenlittlejohns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. I've skimmed over things and most of my comments are minor. I think the biggest open question is the nature of the token we're using to publish to CMR. I think right now an EDL bearer token will work, but the overall roadmap is shifting to LaunchPad tokens for the future (here's a wiki page that has some of the relevant documentation for LaunchPad tokens).

It might also be worth reviewing William Valencia's GraphQL PR to ensure that the pieces generally match up.

CHANGELOG.md Outdated
## v1.0.2
### Unreleased
#### 2023-09-06
Added functions `publish_umm_var` and `publish_all_umm_var` to `umm_var.py`
## v1.0.1
### Unreleased
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your main branch might be out of date. 1.0.1 was released to unblock William. I think git might take care of it, because your branch is from after all your other ticket work, so the only differences are in this file and the version file, and I don't think they will cause merge conflicts (because of the specific lines that were changed), but just a heads-up.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was and when I tried to update this in my DAS-1893 branch it caused a conflict. I've been trying to resolve this conflict but it keeps finding a conflict between the new changes. I will leave it like this and then update the main branch with the correct info.


with self.subTest('Check response when the status code is 400'):
self.assertEqual(response.status_code, 400)
self.assertEqual(response.json(), {'concept-id': 'FOO-EEDTEST'})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit of a nit: Nice check for a failed status, but it's a bit weird seeing the JSON response for a successful request with an unsuccessful status.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note - is it worth an extra assertion to know that requests.put was called with the expected JSON content and headers?

Copy link
Collaborator Author

@eni-awowale eni-awowale Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit of a nit: Nice check for a failed status, but it's a bit weird seeing the JSON response for a successful request with an unsuccessful status.

Yeah, I agree. I think that was the best way I thought for checking if the response failed but I can try tinker with some other ways.

Side note - is it worth an extra assertion to know that requests.put was called with the expected JSON content and headers?

I think since we already did the mock_requests_put.assert_called_once_with() test we don't need to have it here unless you really thinks it's helpful.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have new commit that addresses this!

tests/unit/test_umm_var.py Show resolved Hide resolved
self.assertEqual(response, 'FOO-EEDTEST')


def test_publish_all_umm_var(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like something should be mocked here - requests.put again, maybe?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can add that!

self.assertEqual(response, 'FOO-EEDTEST')


def test_publish_all_umm_var(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also - I feel like we need a sub test here for what happens when publishing the first UMM-Var record fails. (I think for now we agreed that we would continue on and try and publish the rest of the records, but I've been out a few days, so might be misremembering)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is what we agreed on. It continues with the ingest process. I will add another test like that. I wrote a comment below for @flamingbear regarding this.

headers_umm_var = {
'Content-type': 'application/vnd.nasa.cmr.umm+json;version='
+ f'{umm_var_dict["MetadataSpecification"]["Version"]}',
'Authorization': f'Bearer {token}',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honest question (maybe for Ryan): For publication, aren't we aiming to use a LaunchPad token, instead of an EDL one? I'm not as familiar with those, but I have a sneaky suspicion they aren't Bearer tokens. I wonder if maybe the best thing to do (here and in the search functionality) is to assume someone has passed us the full Authorization header, and then just pass that through as-is, rather than rebuilding the structure to (or not to) include the Bearer bit?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Do you or @flamingbear have an example of this? I am not too familiar with LaunchPad tokens either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have an example, but it sounds like @owenlittlejohns is saying, rather than pass token as a param to publish_umm_var, just receive the whole auth header header as input. and add that.

Something like

def publish_umm_var(collection_id: str,
                    umm_var_dict: Dict,
                    auth_header: str,
                    cmr_env: CMR_UAT) -> str:

And then

    headers_umm_var = {
        'Content-type': 'application/vnd.nasa.cmr.umm+json;version='
        + f'{umm_var_dict["MetadataSpecification"]["Version"]}',
        'Authorization': auth_header,
        'Accept': 'application/json'}

But that is just me guessing from what Owen typed and not knowing about the different types of tokens.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! That makes sense!

if response.status_code == 200:
return response.json()['concept-id']
# Return the response if there was an error (response != 200)
return response
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to return the full response object, or maybe the error message? I can't remember the JSON format for a CMR ingest error, but maybe the JSON includes a field for the error message (rather than just using response.text).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to return the full response object. So to get the error it would be something like response.json().['errors']
This is what response.json() looks like if there is an error:

{'errors': ['#: required key [LongName] not found',
  '#: required key [Definition] not found']}

varinfo/umm_var.py Outdated Show resolved Hide resolved
varinfo/umm_var.py Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
varinfo/umm_var.py Outdated Show resolved Hide resolved
varinfo/umm_var.py Outdated Show resolved Hide resolved
'Authorization': f'Bearer {token}',
'Accept': 'application/json'}

url_endpoint = (cmr_env.replace('search', 'ingest') + 'collections/'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you consider renaming this to expected_url_endpoint and renaming headers_umm_var to expected_headers_umm_var?

That would just help me see that these are going to be tested rather than used as set up input.

you can say no.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So they will be used as the set up input. Unless I am misunderstanding you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so the two variables I named, are what you expect the mock to be called with, they aren't input to the function called. And like I said, these are suggestions.

timeout=10)

with self.subTest('Check response when the status code is 400'):
self.assertEqual(response.status_code, 400)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move the setting of the status code from L775 to here like you do in the next subtest for code 200. Just for symmetry (and setting close to the testing location)

varinfo/umm_var.py Outdated Show resolved Hide resolved
varinfo/umm_var.py Outdated Show resolved Hide resolved
umm_var_dict,
'foo',
CMR_UAT)
self.assertEqual(len(variable_ids), 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how error handling works here.

What happens if one of the variables in the loop fails?

Then the return value would be something like

['FOO-EEDTEST', {'concept-id': 'FOO-EEDTEST'}]?

Who is in charge of checking for errors when you run publish_all_umm_var?

Should you raise exceptions rather than returning the full response on error?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great question and something that @owenlittlejohns and other folks have been talking about. We don't want to raise an error because raising an error would stop the whole ingest process if a single variable in a collection has a hundred variables. We want to add it into the list and maybe have something that checks later for if they were any errors. But it is a good idea to add a test for checking if the list contains errors. I will add that.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree we do not want to fail the "put" (ingest), but there is a good chance this will be updated/extended in future releases.

@flamingbear
Copy link
Member

I didn't put in a comment when I requested changes. I think it looks good except maybe for error handling and some style comments that are really take them or leave them.



@patch('requests.put')
def test_publish_all_umm_var(self, mock_requests_put):
Copy link
Collaborator Author

@eni-awowale eni-awowale Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this test based @owenlittlejohns suggestions for checking the expected behavior of a successful or failed response with publish_all_umm_var. It's similar to test above



@patch('requests.put')
def test_publish_umm_var_check_response(self, mock_requests_put):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this new test to mock a successful and unsuccessful response

Copy link

@autydp autydp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any real critique here. It all looks very good!

umm_var_dict,
'foo',
CMR_UAT)
self.assertEqual(len(variable_ids), 2)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree we do not want to fail the "put" (ingest), but there is a good chance this will be updated/extended in future releases.

@eni-awowale eni-awowale merged commit 2182ecc into main Sep 8, 2023
@eni-awowale eni-awowale deleted the DAS-1893 branch October 10, 2023 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants