Update ingestor to add dummy email if dataset has bad email #49

Open
dylanmcreynolds opened this issue Dec 23, 2024 · 2 comments

@dylanmcreynolds
Contributor

I'm not sure whether we're not getting an email at all or the email is arriving in a bad format, but we sometimes see SciCat ingestion fail. We should put a check here, since a well-formatted email is a requirement. We can add [email protected] if the dataset doesn't have a good email.

@dylanmcreynolds
Contributor Author

OK, this could be an easier fix than I thought. Instead of the `or "Unknown"` fallback, we can update it to `or "[email protected]"` on this line:

contactEmail=clean_email(scicat_metadata.get("/measurement/sample/experimenter/email"))
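As a rough illustration of the check being proposed, here is a minimal sketch of a validate-or-fallback helper. It is not the repository's `clean_email`; the name `email_or_default`, the loose regex, and the placeholder address `unknown@example.com` are all assumptions made for this example.

```python
import re

# Placeholder fallback address (an assumption for this sketch); the real
# default is whatever address the team settles on.
DEFAULT_EMAIL = "unknown@example.com"

# Deliberately loose pattern: exactly one "@", no whitespace, and at least
# one dot in the domain part. It is a sanity check, not full RFC validation.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def email_or_default(raw, default=DEFAULT_EMAIL):
    """Return a stripped email if it looks well formed, else the default."""
    # Metadata read from files can arrive as bytes rather than str.
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8", errors="ignore")
    # Missing values (None) or unexpected types fall back to the default.
    if not isinstance(raw, str):
        return default
    candidate = raw.strip()
    return candidate if _EMAIL_RE.match(candidate) else default
```

With a helper like this, a missing value and a malformed value both end up as the same default, so the ingestion call always receives something that looks like an email address.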

@davramov
Contributor

I was able to quickly put together a fix for the default email this morning (#50), but I have been running into blockers when trying to test locally.

First, I was having trouble logging in to scicatlive to test the file ingestion. This may stem from a GitHub issue I found: SciCatProject/pyscicat#61

I tried with a few versions of scicatlive (2.8.0, 3.1.1, 3.2.5) in case an older version might work, but I was getting this error:

  File "/Users/david/Documents/code/fork/issue_49/splash_flows_globus/orchestration/flows/scicat/ingest.py", line 57, in ingest_dataset_task
    scicat_client = from_credentials(
                    ^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pyscicat/client.py", line 784, in from_credentials
    token = get_token(base_url, username, password)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pyscicat/client.py", line 830, in get_token
    response = _log_in_via_auth_msad(base_url, username, password)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pyscicat/client.py", line 815, in _log_in_via_auth_msad
    raise ScicatLoginError(response.content)
pyscicat.client.ScicatLoginError: b'{"message":"Cannot POST /auth/msad","error":"Not Found","statusCode":404}'
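The last frame shows the login attempt ending at `/auth/msad`, which this backend answers with a 404. One way to probe which login path a given backend accepts is to try candidates in order. The sketch below is only an illustration: `first_working_login` and the injected `post` callable are hypothetical names, the candidate paths are just the two that appear in this thread, and the `access_token` key is taken from the workaround further down.

```python
from typing import Callable, Optional, Sequence, Tuple

# A poster takes (url, json_payload) and returns (status_code, parsed_body).
# Injecting it keeps the sketch independent of any particular HTTP library.
Poster = Callable[[str, dict], Tuple[int, dict]]


def first_working_login(
    base_url: str,
    username: str,
    password: str,
    post: Poster,
    paths: Sequence[str] = ("auth/login", "auth/msad"),
) -> Optional[str]:
    """Try each login path in order; return the first access token found."""
    base = base_url.rstrip("/")
    payload = {"username": username, "password": password}
    for path in paths:
        status, body = post(f"{base}/{path}", payload)
        # Accept any 2xx response that carries a token.
        if 200 <= status < 300 and "access_token" in body:
            return body["access_token"]
    # No candidate path worked on this backend.
    return None
```

A real `post` could wrap `requests.post` and return `(response.status_code, response.json())`; whether other login paths or token keys exist depends on the SciCat version in use.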

I was using the following env variables based on the scicat documentation:

SCICAT_API_URL="http://localhost:3000/api/v3"
SCICAT_INGEST_USER="admin"
SCICAT_INGEST_PASSWORD="2jf70TPNZsS"

I decided to work around this by calling the ScicatClient like this in ingestor.py, which worked:

response = requests.post(
    url="http://localhost:3000/api/v3/auth/login",
    json={"username": SCICAT_INGEST_USER, "password": SCICAT_INGEST_PASSWORD},
    stream=False,
    verify=True,
)

scicat_client = from_token("http://localhost:3000/api/v3/auth/login", response.json()["access_token"])

But now I get a different error...

  File "/Users/david/Documents/code/fork/issue_49/splash_flows_globus/orchestration/flows/bl832/ingest_tomo832.py", line 159, in upload_raw_dataset
    dataset_id = scicat_client.upload_new_dataset(dataset)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pyscicat/client.py", line 147, in datasets_create
    return self._call_endpoint(
           ^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pyscicat/client.py", line 115, in _call_endpoint
    raise ScicatCommError(f"Error in operation {operation}: {result}")
pyscicat.client.ScicatCommError: Error in operation datasets_create: {'message': 'Cannot POST /api/v3/auth/login/Datasets?access_token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJfaWQiOiI2NzY5YmY5YWYyZjNjN2ZmOTBjYzQzMmQiLCJ1c2VybmFtZSI6ImFkbWluIiwiZW1haWwiOiJzY2ljYXRhZG1pbkB5b3VyLnNpdGUiLCJhdXRoU3RyYXRlZ3kiOiJsb2NhbCIsIl9fdiI6MCwiaWQiOiI2NzY5YmY5YWYyZjNjN2ZmOTBjYzQzMmQiLCJpYXQiOjE3MzQ5OTA4MTIsImV4cCI6MTczNDk5NDQxMn0.KGLOdeIFJKRF7ucKBnE8GCAypDkwixt1BtDCAbzcfFI', 'error': 'Not Found', 'statusCode': 404}
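One possible reading of this second error: the failing path is `/api/v3/auth/login/Datasets`, which is what you would get if the client appends the endpoint name to whatever base URL it was given, and the base URL passed to `from_token` was the login URL rather than the API root. The sketch below only demonstrates that string arithmetic. `endpoint_url` is a hypothetical helper, and the append-to-base behavior is an assumption inferred from the error message, not confirmed from the pyscicat source.

```python
from urllib.parse import urlsplit


def endpoint_url(base_url: str, endpoint: str) -> str:
    """Join a base URL and an endpoint name with exactly one slash."""
    # Assumed behavior: the client simply appends the endpoint to the base.
    base = base_url if base_url.endswith("/") else base_url + "/"
    return base + endpoint


# Base URL as passed to from_token in the workaround above.
login_base = "http://localhost:3000/api/v3/auth/login"
# Base URL from the SCICAT_API_URL setting earlier in this comment.
api_root = "http://localhost:3000/api/v3"

path_from_login_base = urlsplit(endpoint_url(login_base, "Datasets")).path
path_from_api_root = urlsplit(endpoint_url(api_root, "Datasets")).path

print(path_from_login_base)
print(path_from_api_root)
```

If that reading is right, passing the API root (the `SCICAT_API_URL` value) to `from_token` instead of the login URL would be worth trying before switching SciCat versions.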

My patch may work, but it would be nice to verify locally. I will try to get the version of scicat installed that matches what is in production.
