Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InvenioRDM migration from v11 to v12 fails when registering parent DOIs #1851

Open
danielbreves opened this issue Oct 21, 2024 · 7 comments
Open
Labels
bug Something isn't working

Comments

@danielbreves
Copy link

Package version (if known): 10.8.6

Describe the bug

I got the following exception when running https://github.com/inveniosoftware/invenio-app-rdm/blob/master/invenio_app_rdm/upgrade_scripts/migrate_11_0_to_12_0.py:

Updating record : tcz41-yp821
> Error KeyError('doi')
  File "/usr/local/lib/python3.9/site-packages/invenio_app_rdm/upgrade_scripts/migrate_11_0_to_12_0.py", line 128, in update_record
    update_parent(record)
  File "/usr/local/lib/python3.9/site-packages/invenio_app_rdm/upgrade_scripts/migrate_11_0_to_12_0.py", line 104, in update_parent
    current_rdm_records.records_service.pids.register_or_update(
  File "/usr/local/lib/python3.9/site-packages/invenio_records_resources/services/uow.py", line 376, in inner
    res = f(self, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/invenio_rdm_records/services/pids/service.py", line 218, in register_or_update
    pid_manager.register(pid_record, scheme, url=url)
  File "/usr/local/lib/python3.9/site-packages/invenio_rdm_records/services/pids/manager.py", line 237, in register
    provider.register(pid, record=record, url=url)
  File "/usr/local/lib/python3.9/site-packages/invenio_rdm_records/services/pids/providers/datacite.py", line 173, in register
    doc = self.serializer.dump_obj(record)
  File "/usr/local/lib/python3.9/site-packages/flask_resources/serializers/base.py", line 66, in dump_obj
    return self.object_schema.dump(obj)
  File "/usr/local/lib/python3.9/site-packages/marshmallow/schema.py", line 553, in dump
    result = self._serialize(processed_obj, many=many)
  File "/usr/local/lib/python3.9/site-packages/marshmallow/schema.py", line 521, in _serialize
    value = field_obj.serialize(attr_name, obj, accessor=self.get_attribute)
  File "/usr/local/lib/python3.9/site-packages/marshmallow/fields.py", line 341, in serialize
    return self._serialize(value, attr, obj, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/marshmallow/fields.py", line 1985, in _serialize
    return self._serialize_method(obj)
  File "/usr/local/lib/python3.9/site-packages/invenio_rdm_records/resources/serializers/datacite/schema.py", line 421, in get_related_identifiers
    version_doi = version["pids"]["doi"]

This happened because a child record didn't have a DOI. Could that line be version["pids"].get("doi")?

Steps to Reproduce

  1. Setup an instance of Invenio RDM v11
  2. Create and publish a record without a DOI
  3. Create a new version with a DOI
  4. Attempt to migrate to v12
  5. See error

Expected behavior

The migration should succeed since DOIs can be optional

@danielbreves danielbreves added the bug Something isn't working label Oct 21, 2024
@danielbreves danielbreves changed the title InvenioRDM Migration from v11 to v12 fails when registering parent DOIs InvenioRDM migration from v11 to v12 fails when registering parent DOIs Oct 21, 2024
@tmorrell
Copy link
Contributor

Thanks for this report!

There are currently two main DOI configurations in InvenioRDM

DATACITE_ENABLED = True - All records have a DOI
DATACITE_ENABLED = False - No records have a DOI

So then in the migration script https://github.com/inveniosoftware/invenio-app-rdm/blob/810befded43dc5ea49198bf992d6e2393c75ff1e/invenio_app_rdm/upgrade_scripts/migrate_11_0_to_12_0.py#L85 the parent DOI block will skip registering DOIs when DATACITE is not enabled.

In v12 we also added an additional "is_enabled" setting, which provides some ability to determine whether DOIs are minted based on set criteria. This is also checked in the migration script.

It sounds like you have records that don't have DOIs but you have DATACITE enabled. Do you really want parent DOIs minted for these records? That seems like a unusual choice.

I suspect you'll either want to disable parent DOIs with RDM_PARENT_PERSISTENT_IDENTIFIERS={}, or develop a is_enabled setting to exclude those records without DOIs. But please respond back with your use case if neither of those options sound appropriate.

@fenekku
Copy link
Contributor

fenekku commented Oct 21, 2024

I think the underlying issue here is that at time of calling register_or_update (https://github.com/inveniosoftware/invenio-app-rdm/blob/master/invenio_app_rdm/upgrade_scripts/migrate_11_0_to_12_0.py#L104) the records with "pids": {"doi": ...} have not been indexed yet (migrate script only commits them to DB not document engine). The manual call to update the records in the document engine (pipenv run invenio rdm rebuild-all-indices) is only called after the script has been run. But https://github.com/inveniosoftware/invenio-rdm-records/blob/master/invenio_rdm_records/resources/serializers/datacite/schema.py#L415 relies on that indexing to have occurred already to get the versions that are serialized as related identifiers to Datacite.

The migrate script should be split in 2 and interleaved with an enjoinment to the reader to run pipenv run invenio rdm rebuild-all-indices before running the 2nd script that would run the register_or_update code. @danielbreves can use that approach right away too.

@tmorrell
Copy link
Contributor

@fenekku I don't think that's the issue. The child records should already have DOIs before the migration script runs. The migration script is only adding the parent DOI, not the child DOI.

@fenekku
Copy link
Contributor

fenekku commented Oct 21, 2024

Hmm 🤔 ... Here is the situation, I had a similar issue with the migration script and minting the parent and splitting the script in 2 passes is what solved it for me, because I think that despite the "child" records already having "pids: {"doi": ...}, they are not indexed in the new index at this point. They were only in the old one and the new one was consulted (because v12 is running)...

Nevertheless, it may not be quite what the original problem was. @danielbreves How did you create a record with a DOI and a record without a DOI in the same instance? Did you change the datacite settings @tmorrell mentioned above (in-between or otherwise)? Understanding that would help us identify the underlying issue.

@danielbreves
Copy link
Author

@tmorrell @fenekku thanks for the fast reply! Our InvenioRDM instance started with DATACITE_ENABLED = False and had some records created without DOIs, then we enabled DATACITE and created new versions of records with DOIs.

It sounds like you have records that don't have DOIs but you have DATACITE enabled. Do you really want parent DOIs minted for these records? That seems like a unusual choice.

We would like parent DOIs minted, since all future records should have DOIs.

@danielbreves
Copy link
Author

As a side note, I had to add the line below to the migration script to make it print out the stacktrace, since KeyError('doi') wasn't very helpful.

import traceback

traceback_str = ''.join(traceback.format_tb(e.__traceback__))
secho(traceback_str, fg="red")

@fenekku
Copy link
Contributor

fenekku commented Oct 22, 2024

Our InvenioRDM instance started with DATACITE_ENABLED = False and had some records created without DOIs, then we enabled DATACITE and created new versions of records with DOIs.

Ah got it. Then yes, I think it'd be reasonable if you could make a PR with that change from ["doi"] to .get("doi") with a comment explaining how DOI minting may have been enabled after the fact. That way, when no "doi", this will skip serializing that entry which is fine.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants