

Usage of harvested_portal? #27

Open
dev-rke opened this issue Jun 14, 2024 · 2 comments


dev-rke commented Jun 14, 2024

Hi there,

What is the configuration option {"harvested_portal": "abcde"} for?
Is there any benefit to having it?
Why is it required? Unfortunately it is not documented, and the source code didn't give me any deeper insight.

Is it possible to make it optional?
Or to define it in a dedicated backend field instead of the configuration field?
Having to define it manually increases the complexity of harvester definitions, especially when creating harvesters via the API or CLI.

@seitenbau-govdata · Member

Hi @dev-rke,

This is missing from the docs and should be added.

Currently the harvester adds a field metadata_harvested_portal to every harvested dataset, containing the value of harvested_portal. This is used to identify datasets that are no longer provided by the source, so that they can be deleted after a new harvesting run. This could be implemented differently to make the config optional, but it would require a few changes to the code.

For now, however, it must be a unique string value; usually the name of the harvested portal is used.
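A minimal sketch of the deletion check described above, to show why the value has to be unique. It assumes datasets are CKAN-style package dicts whose `extras` is a list of `{"key": ..., "value": ...}` pairs; `find_stale_datasets` and `seen_ids` are hypothetical names, not the harvester's actual code.

```python
from typing import Iterable, List, Set

# Field name as described in the thread; the rest of this sketch is hypothetical.
PORTAL_FIELD = "metadata_harvested_portal"


def find_stale_datasets(existing: Iterable[dict], harvested_portal: str,
                        seen_ids: Set[str]) -> List[str]:
    """Return IDs of datasets tagged with this portal that the latest run did not see."""
    stale = []
    for dataset in existing:
        # Flatten the extras list into a plain dict for lookup.
        extras = {e["key"]: e["value"] for e in dataset.get("extras", [])}
        # Only datasets carrying this harvester's marker are candidates for deletion.
        if extras.get(PORTAL_FIELD) == harvested_portal and dataset["id"] not in seen_ids:
            stale.append(dataset["id"])
    return stale


existing = [
    {"id": "a", "extras": [{"key": PORTAL_FIELD, "value": "abcde"}]},
    {"id": "b", "extras": [{"key": PORTAL_FIELD, "value": "abcde"}]},
    {"id": "c", "extras": [{"key": PORTAL_FIELD, "value": "other"}]},
]
print(find_stale_datasets(existing, "abcde", {"a"}))
```

Dataset "c" is left alone because it carries a different marker. If two harvesters shared the same harvested_portal value, each run would treat the other harvester's datasets as stale, which is why the value must be unique.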

@dev-rke · Author

dev-rke commented Jun 18, 2024

Hi @seitenbau-govdata,

Thanks for the explanation.

Since this setting is read from the configuration anyway, why not use the harvester UUID as the relation identifier by default?
I propose setting metadata_harvested_portal explicitly only when you actually need it to identify records in special cases, e.g. when collecting data from a single source via multiple endpoints (such as several harvesters running in parallel).

This would keep the same behaviour, but without the need to specify this setting manually in "normal" cases.
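A minimal sketch of the proposed fallback, assuming the harvester config arrives as a JSON string (possibly empty) and the harvest source's UUID is available at that point; `resolve_portal_id` is a hypothetical helper, not existing code in the project.

```python
import json
from typing import Optional


def resolve_portal_id(config_json: Optional[str], source_id: str) -> str:
    """Use harvested_portal from the config if set, otherwise fall back to the source UUID."""
    # An empty or missing config is treated as "no explicit setting".
    config = json.loads(config_json) if config_json else {}
    # An explicit value is still honoured for special cases such as
    # several harvesters feeding datasets from one source.
    return config.get("harvested_portal") or source_id


print(resolve_portal_id('{"harvested_portal": "abcde"}', "3f2c-uuid"))
print(resolve_portal_id(None, "3f2c-uuid"))
```

Because each harvest source already has a unique UUID, the default stays unique without manual configuration, while parallel harvesters for one source can still share an explicit value.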
