
Dataset IPT error, NTNU marine invertebrate collection #196

Open
dagendresen opened this issue Dec 12, 2024 · 4 comments

@dagendresen
Member

I noticed that publication of the dataset NTNU marine invertebrate collection fails on the IPT, with an error message about duplicate occurrenceIDs. The dataset was last published on 2024-10-19.

https://ipt.gbif.no/manage/resource.do?r=trh_marine_invertebrate_collection

Publishing version #1.1900 of resource trh_marine_invertebrate_collection failed: Archive generation for resource trh_marine_invertebrate_collection failed: Can't validate DwC-A for resource trh_marine_invertebrate_collection. Each line must have a occurrenceID, and each occurrenceID must be unique (please note comparisons are case insensitive)
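For illustration, here is a minimal Python sketch of the rule the error message states: every row needs an occurrenceID, and no two occurrenceIDs may be equal when compared case-insensitively. It is not the IPT's own validation code, and the function name `find_occurrence_id_problems` is hypothetical. In this sketch, an incomplete value such as "urn:uuid:" is not empty, so it passes the presence check but collides with any other row carrying the same incomplete value.

```python
from collections import Counter


def find_occurrence_id_problems(occurrence_ids):
    """Return (missing_rows, duplicate_ids) for a list of occurrenceID values.

    Hypothetical sketch of the rule stated in the IPT error message, not the
    IPT's own code: every row needs an occurrenceID, and IDs must be unique
    when compared case-insensitively.
    """
    # 0-based row indexes whose occurrenceID is None, empty, or only whitespace.
    missing_rows = [
        i for i, oid in enumerate(occurrence_ids) if not (oid or "").strip()
    ]
    # Count the non-empty IDs after case folding, so IDs differing only in case collide.
    counts = Counter(
        oid.strip().casefold() for oid in occurrence_ids if (oid or "").strip()
    )
    duplicate_ids = sorted(oid for oid, n in counts.items() if n > 1)
    return missing_rows, duplicate_ids
```

With two blank-UUID records in the input, the sketch reports "urn:uuid:" as a duplicate, which matches the single duplicate line the IPT log reports for this dataset.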

@kkongshavn
Collaborator

In case this hasn't been solved yet, I've written to Karstein about it.

@dagendresen
Member Author

I tried publishing manually again today and got the very same error message:

Publishing Status - Monday, 13 January 2025 at 10:02:55 Coordinated Universal Time
Publishing version #1.1900 of resource - trh_marine_invertebrate_collection failed: Archive generation for resource trh_marine_invertebrate_collection failed: Can't validate DwC-A for resource trh_marine_invertebrate_collection. Each line must have a occurrenceID, and each occurrenceID must be unique (please note comparisons are case insensitive)
(...)
10:02:51 Data file written for Darwin Core Occurrence with 115537 records and 50 columns
(...)
10:02:52 ? Validating the core ID filed occurrenceID is always present and unique.
10:02:55 1 line(s) having a duplicate occurrenceID (please note comparisons are case insensitive)
10:02:55 Archive validation failed, because not every line has a unique occurrenceID (please note comparisons are case insensitive)
10:02:55 Restored version #1.1899 of resource trh_marine_invertebrate_collection after publishing failure

@dagendresen
Member Author

The occurrenceIDs in this dataset are all in the urn:uuid:<UUID> format.

I noticed an incomplete occurrenceID, "urn:uuid:" with no UUID after the prefix, for catalogNumber "WET-71121/2", modified on 2024-10-15. The record has individualCount = 0 and was collected on 2011-08-28 in "Trondheimsfjorden Tautrasvaet".

https://www.gbif.org/occurrence/4020833758

Is there a risk that more occurrence records with a NULL UUID could have entered the MUSIT DB?
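Since all occurrenceIDs in this dataset should follow the urn:uuid:<UUID> pattern, a simple format check would catch incomplete values like the one for WET-71121/2 before publishing. This Python sketch assumes the UUID part should be in the canonical hyphenated form (compared case-insensitively); the function name `is_valid_urn_uuid` is hypothetical and the check is not part of the IPT or MUSIT.

```python
import uuid

URN_PREFIX = "urn:uuid:"


def is_valid_urn_uuid(occurrence_id):
    """Return True if occurrence_id is 'urn:uuid:' followed by a canonical UUID.

    Hypothetical helper; assumes the hyphenated 8-4-4-4-12 form, any case.
    """
    if not occurrence_id or not occurrence_id.lower().startswith(URN_PREFIX):
        return False
    suffix = occurrence_id[len(URN_PREFIX):]
    try:
        # uuid.UUID raises ValueError for an empty or malformed string.
        parsed = uuid.UUID(suffix)
    except ValueError:
        return False
    # uuid.UUID also accepts unhyphenated or braced forms; require the canonical one.
    return str(parsed) == suffix.lower()
```

Running this over all occurrenceIDs in an export would list every record with a NULL or malformed UUID, not just the one that happens to collide.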

@dagendresen
Member Author

dagendresen commented Jan 13, 2025

When I download the "MUSIT-dump" (text data export), I find two records with a blank UUID: one added on 15 October and one added on 18 October. The second one is what causes the dataset publication to fail, presumably because both records now get the same incomplete occurrenceID "urn:uuid:". A UUID needs to be added to both records!

catalogNumber   occurrenceID   modified
WET-71121/2     (blank)        2024-10-15
WET-36312/3     (blank)        2024-10-18
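A Python sketch of how blank-UUID records could be listed from a text export and given candidate UUIDs. The layout of the MUSIT-dump is not described in this issue, so this assumes a tab-separated file with a header row and hypothetical column names "catalogNumber", "uuid" and "modified"; the function names are hypothetical too.

```python
import csv
import io
import uuid


def rows_missing_uuid(tsv_text, uuid_column="uuid"):
    """Yield (catalogNumber, modified) for rows whose UUID field is blank.

    Assumes a tab-separated export with a header row; the column names
    ("catalogNumber", "uuid", "modified") are hypothetical.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if not (row.get(uuid_column) or "").strip():
            yield row["catalogNumber"], row["modified"]


def propose_uuids(missing):
    """Map each catalogNumber to a fresh random (version 4) UUID string."""
    return {catalog_number: str(uuid.uuid4()) for catalog_number, _ in missing}
```

A new UUID only becomes a stable occurrenceID once it is written back to the MUSIT DB; generating it only in the export would give the record a different occurrenceID on every publication.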
