This repository has been archived by the owner on Mar 9, 2023. It is now read-only.

Data publishing

Rukaya edited this page Nov 28, 2022 · 11 revisions

Identifiers

Institutions/organisations

Wherever possible, GBIF Norway aligns Norwegian scientific data publishers with their institutional records in:

  • ROR - the Research Organization Registry is a community-led project to develop an open, sustainable, usable, and unique identifier for every research organization in the world.
  • GRID - the Global Research Identifier Database is a free and openly available global database of research-related organisations.

If a scientific organisation has neither a ROR nor a GRID ID, we encourage them to get one. We can determine which department/museum/herbarium is responsible for the data by linking the dataset to a collection in GRSciColl, via collectionID.

When datasets come from different institutions but should be grouped together (e.g. the Nansen Legacy Project datasets), we advise assigning them to a "network"; see https://ipt.gbif.org/manual/en/ipt/2.6/manage-resources#networks. A dataset can also be assigned to a network directly in the GBIF Registry: https://registry.gbif.org/network/search. The ProjectID field is not the best solution, as it doesn't integrate well with GBIF search and hosted portals.

For example, the Living Norway Ecological Data Network is adding their datasets to the Living Norway "network" (https://registry.gbif.org/network/379a0de5-f377-4661-9a30-33dd844e7b9a, https://www.gbif.org/network/379a0de5-f377-4661-9a30-33dd844e7b9a). Datasets are displayed in the Living Norway hosted portal (https://data.livingnorway.no/, https://github.com/gbif/hp-living-norway) based on this criterion: the dataset belongs to the network.

Personal identifiers for people

We strongly recommend publishing data with populated recordedByID and identifiedByID fields. This will allow us to unambiguously credit and link people to specimens they have collected and identified.
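
Before publishing, it can be useful to check which records still lack these identifiers. The following is a minimal sketch, not part of any GBIF tool: it assumes the occurrence data is available as CSV text with an occurrenceID column, and the function name rows_missing_ids and the sample rows (including the placeholder ORCID-style values) are hypothetical.

```python
import csv
import io

# Darwin Core fields that we want populated on every record.
REQUIRED_ID_FIELDS = ("recordedByID", "identifiedByID")


def rows_missing_ids(csv_text, fields=REQUIRED_ID_FIELDS):
    """Return (occurrenceID, [missing field names]) for each incomplete row."""
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A field counts as missing if the column is absent or blank.
        missing = [name for name in fields if not (row.get(name) or "").strip()]
        if missing:
            problems.append((row.get("occurrenceID", ""), missing))
    return problems


# Hypothetical sample data; the identifier values are placeholders only.
SAMPLE = """occurrenceID,recordedByID,identifiedByID
occ-1,https://orcid.org/0000-0002-1825-0097,https://orcid.org/0000-0002-1825-0097
occ-2,,https://orcid.org/0000-0002-1825-0097
occ-3,,
"""

for occurrence_id, missing in rows_missing_ids(SAMPLE):
    print(occurrence_id, "is missing", ", ".join(missing))
```

The check only tests that the fields are non-empty; it does not validate that the identifiers resolve to a real person.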

Artskart

Note that for a dataset to be imported into Artskart, the project name should be in datasetName. You will also need to manually notify Artsdatabanken that a new dataset has been added and should be included in Artskart, although this is supposed to be automated in the future.

Protocol with absence data

GBIF Norway doesn't publish absence data in sampling datasets. The reasoning: consider a scientist who has carried out a systematic survey and noted, e.g., 5 presences of species x and 20 absences, each at specific points in time and space. However:

  1. An absence point depends on sampling effort and should only be interpreted within the scope of the study, so it shouldn't be presented in an amalgamated form as GBIF does.

  2. Absence points are displayed in a misleading way on the GBIF map interface as noted in https://github.com/gbif/portal-feedback/issues/1851.

  3. Absences can be derived anyway if the metadata includes the list of species that the scientist was looking for in the study.

Therefore we think it's not really helpful to publish the absence data explicitly.
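
The derivation mentioned in point 3 can be sketched as follows. This is only an illustration under stated assumptions: the target species list is taken from the dataset metadata, every sampling event is known by an ID, and the function name derive_absences, the event IDs and the sample data are all hypothetical.

```python
def derive_absences(target_species, event_ids, presences):
    """Map each sampling event to the target species NOT recorded there.

    target_species: species the study was looking for (from the metadata)
    event_ids:      all sampling events, including those with no presences
    presences:      iterable of (event_id, species) presence records
    """
    targets = set(target_species)
    # Start every event with an empty set so events without any
    # presence record still appear in the result.
    seen = {event_id: set() for event_id in event_ids}
    for event_id, species in presences:
        seen.setdefault(event_id, set()).add(species)
    return {event_id: sorted(targets - found) for event_id, found in seen.items()}


# Hypothetical survey: three plots, three target species.
targets = ["Salmo trutta", "Salmo salar", "Esox lucius"]
events = ["plot-1", "plot-2", "plot-3"]
presences = [
    ("plot-1", "Salmo trutta"),
    ("plot-1", "Esox lucius"),
    ("plot-2", "Salmo salar"),
]

absences = derive_absences(targets, events, presences)
for event_id in events:
    print(event_id, "->", absences[event_id])
```

The derived absences are only meaningful within the study itself, which is the point made in item 1: they inherit the sampling effort of each event.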

endpoint.py

While most data is published through various IPT instances, we also use the GBIF API to publish DwC-A files created using other tools.

endpoint.py is a simple tool for interacting with the GBIF registry API through the command line:

$ endpoint.py b124e1e0-4755-430f-9eab-894f25a9b59c # Shows all endpoints

$ endpoint.py -w b124e1e0-4755-430f-9eab-894f25a9b59c # Wipes the dataset's endpoints

$ endpoint.py -d "An endpoint" -e https://data.gbif.no/dwca.zip b124e1e0-4755-430f-9eab-894f25a9b59c # Adds a new endpoint to a dataset

$ endpoint.py -c b124e1e0-4755-430f-9eab-894f25a9b59c # Requests a crawl

The username and password can be set using the command line switches -u and -p or using the environment variables GBIF_USERNAME and GBIF_PASSWORD.
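
For orientation, the registry calls behind these commands might look roughly like the sketch below. This is not the original endpoint.py. It assumes the GBIF registry API base URL https://api.gbif.org/v1, the /dataset/{key}/endpoint and /dataset/{key}/crawl paths, HTTP Basic authentication, and the endpoint type DWC_ARCHIVE; check these against the current GBIF API documentation before relying on them. All function names are hypothetical, and the code only builds requests without sending them.

```python
import base64
import json
import os
import urllib.request

API = "https://api.gbif.org/v1"  # assumed base URL of the GBIF registry API


def build_request(method, path, payload=None, username=None, password=None):
    """Build (but do not send) a registry request with HTTP Basic auth."""
    # Fall back to the environment variables described above.
    username = username or os.environ.get("GBIF_USERNAME", "")
    password = password or os.environ.get("GBIF_PASSWORD", "")
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(API + path, data=data, method=method)
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


def list_endpoints(dataset_key):
    """Request listing all endpoints of a dataset."""
    return build_request("GET", f"/dataset/{dataset_key}/endpoint")


def add_endpoint(dataset_key, url, description):
    """Request adding a Darwin Core Archive endpoint (type is an assumption)."""
    payload = {"type": "DWC_ARCHIVE", "url": url, "description": description}
    return build_request("POST", f"/dataset/{dataset_key}/endpoint", payload)


def delete_endpoint(dataset_key, endpoint_key):
    """Request deleting one endpoint; wiping means doing this for each listed key."""
    return build_request("DELETE", f"/dataset/{dataset_key}/endpoint/{endpoint_key}")


def request_crawl(dataset_key):
    """Request asking GBIF to crawl the dataset."""
    return build_request("POST", f"/dataset/{dataset_key}/crawl")


request = add_endpoint(
    "b124e1e0-4755-430f-9eab-894f25a9b59c",
    "https://data.gbif.no/dwca.zip",
    "An endpoint",
)
print(request.get_method(), request.full_url)
```

To actually send a request, pass it to urllib.request.urlopen. Keeping construction separate from sending makes it possible to inspect the URL and payload before anything is changed in the registry.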

Note: endpoint.py does not seem to be present on GitHub or on any of the GBIF.no servers (as of Aug 2019).