Usage of harvested_portal? #27

dev-rke · 2024-06-14T21:36:00Z

Hi there,

What is the configuration option {"harvested_portal": "abcde"} good for?
Is there any benefit to having it?
Why is it required? Unfortunately it is not documented, and the source code didn't give me deeper insight.

Is it possible to make it optional?
Or to define it in a backend field of its own, instead of in the configuration field?
It has to be defined manually, which increases the complexity of harvester definitions, especially when creating harvesters via API or CLI.

seitenbau-govdata · 2024-06-18T13:47:20Z

Hi @dev-rke,

This is missing from the docs and should be added.

Currently the harvester adds a field metadata_harvested_portal to all harvested datasets containing the value of harvested_portal. This is used to identify datasets that are no longer provided to the harvester and should be deleted after a new harvesting run. This could be implemented differently to make the config optional, but it would require a few changes in the code.

However, it currently needs a unique string value, and usually the name of the harvested portal is used here.
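
To illustrate the idea, here is a minimal sketch of how such a marker field can be used to find datasets that should be deleted after a run. This is not the harvester's actual code: the function name `find_stale_datasets` and the plain-dict dataset layout are assumptions made up for the example.

```python
def find_stale_datasets(existing, harvested_ids, portal):
    """Return ids of stored datasets that carry the given portal marker
    but were not delivered in the latest harvesting run.

    Hypothetical helper for illustration only.
    """
    return sorted(
        ds["id"]
        for ds in existing
        # Only datasets tagged with this portal belong to this harvester.
        if ds.get("metadata_harvested_portal") == portal
        # Anything not seen in the latest run is a deletion candidate.
        and ds["id"] not in harvested_ids
    )


# Datasets already stored in the catalogue (made-up sample data).
existing = [
    {"id": "a", "metadata_harvested_portal": "abcde"},
    {"id": "b", "metadata_harvested_portal": "abcde"},
    {"id": "c", "metadata_harvested_portal": "other"},
]

# The latest run for portal "abcde" delivered only dataset "a".
stale = find_stale_datasets(existing, {"a"}, "abcde")
```

In this sample, dataset "c" is left alone because it carries a different marker, which is why the value has to be unique per harvested portal.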

dev-rke · 2024-06-18T22:39:06Z

Hi @seitenbau-govdata

Thanks for the explanation.

When reading the settings from the configuration, why not use the harvester UUID as the relation identifier by default?
I propose defining metadata_harvested_portal explicitly only when you actually need to identify the records in special cases, e.g. when collecting data from a single source via multiple endpoints (e.g. using multiple harvesters in parallel).

This would keep the same behaviour, but without the need to specify the setting manually in "normal" cases.
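
A rough sketch of what I mean, assuming the configuration is available as a JSON string and the harvester's UUID is known at that point. The function name `resolve_portal_marker` and its parameters are hypothetical and not taken from the extension:

```python
import json


def resolve_portal_marker(config_json, harvester_id):
    """Pick the marker value to store in metadata_harvested_portal.

    Hypothetical helper: uses "harvested_portal" from the config when it
    is set, otherwise falls back to the harvester's own UUID.
    """
    config = json.loads(config_json) if config_json else {}
    return config.get("harvested_portal") or harvester_id


# Explicit setting wins, e.g. when several harvesters share one marker.
explicit = resolve_portal_marker('{"harvested_portal": "abcde"}', "uuid-1")

# No setting: the harvester UUID is used, so nothing has to be configured.
fallback = resolve_portal_marker("{}", "uuid-1")
```

With a fallback like this, existing configurations would keep working unchanged, and only the multi-harvester case would need the explicit option.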
