Cannot harvest ISO 19115 metadata hosted in pycsw #219

Open
gpcimino opened this issue Apr 10, 2019 · 2 comments

@gpcimino

Hi all,

I just installed CKAN 2.8.2 and the latest versions of ckanext-harvest and ckanext-spatial from their master branches.

My goal is to have CKAN harvest ISO 19115 XML metadata hosted by pycsw.

I used the harvester run_test command to test the first harvest (see the output below).
The harvester was not able to retrieve any metadata records. Instead, it logs many "Empty record for GUID xxx" messages.
My guess is that the metadata exposed via pycsw starts with

<gmi:MI_Metadata>

while it looks like the CKAN CSW harvester expects

<gmd:MD_Metadata>

Is that correct?
Any suggestions?
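
One way to check this guess is to look at the root element of each record inside a GetRecordById response. The following is only a minimal sketch, not code from ckanext-spatial: `split_tag`, `record_roots` and the `SAMPLE` document are hypothetical names and data made up for illustration, and the sample is a trimmed stand-in rather than real output from the pycsw server. The two namespace URIs are the standard ones for the `gmd` and `gmi` prefixes.

```python
import xml.etree.ElementTree as ET

GMD_NS = "http://www.isotc211.org/2005/gmd"  # namespace of gmd:MD_Metadata
GMI_NS = "http://www.isotc211.org/2005/gmi"  # namespace of gmi:MI_Metadata


def split_tag(tag):
    """Split an ElementTree tag of the form '{namespace}local' into its parts."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def record_roots(response_xml):
    """Return (namespace, local name) for every direct child of the response root."""
    root = ET.fromstring(response_xml)
    return [split_tag(child.tag) for child in root]


# Hypothetical, trimmed response used only for illustration.
SAMPLE = """<csw:GetRecordByIdResponse
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:gmi="http://www.isotc211.org/2005/gmi">
  <gmi:MI_Metadata/>
</csw:GetRecordByIdResponse>"""

print(record_roots(SAMPLE))
# prints [('http://www.isotc211.org/2005/gmi', 'MI_Metadata')]
```

If the real response lists `MI_Metadata` in the `gmi` namespace, a lookup that only searches for `gmd:MD_Metadata` would find nothing, which would be consistent with the "Empty record" messages.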

Thanks

(default) [root@myserver ckan]# /usr/lib/ckan/default/bin/paster --plugin=ckanext-harvest harvester run_test   -c /etc/ckan/default/development.ini 731de3d5-a98b-411e-875d-6408af1ff422
2019-04-10 17:09:11,895 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2019-04-10 17:09:11,905 DEBUG [ckanext.harvest.model] Harvest tables already exist
2019-04-10 17:09:11,937 DEBUG [ckanext.spatial.plugin] Setting up the spatial model
2019-04-10 17:09:11,946 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
2019-04-10 17:09:11,953 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
2019-04-10 17:09:12,252 DEBUG [ckanext.harvest.model] Harvest tables already exist
2019-04-10 17:09:12,284 DEBUG [ckanext.spatial.plugin] Setting up the spatial model
2019-04-10 17:09:12,287 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist

/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py:624: SAWarning: Can't resolve label reference 'error_count desc'; converting to text() (this warning may be suppressed after 10 occurrences)
  util.ellipses_string(element.element))
2019-04-10 17:09:12,558 INFO  [ckanext.harvest.logic.action.create] Harvest job create: {'source_id': u'731de3d5-a98b-411e-875d-6408af1ff422'}
2019-04-10 17:09:12,573 INFO  [ckanext.harvest.logic.action.create] Harvest job saved 327f4595-9c54-4797-832a-fc18eb55f43c
2019-04-10 17:09:12,579 INFO  [ckanext.harvest.logic.action.update] Send job to gather queue: {'id': u'327f4595-9c54-4797-832a-fc18eb55f43c'}
2019-04-10 17:09:12,623 INFO  [ckanext.harvest.logic.action.update] Sent job 327f4595-9c54-4797-832a-fc18eb55f43c to the gather queue
2019-04-10 17:09:12,641 DEBUG [ckanext.spatial.harvesters.csw.CSW.gather] CswHarvester gather_stage for job: <HarvestJob id=327f4595-9c54-4797-832a-fc18eb55f43c created=2019-04-10 15:09:12.571549 gather_started=2019-04-10 15:09:12.641448 gather_finished=None finished=None source_id=731de3d5-a98b-411e-875d-6408af1ff422 status=Running>
2019-04-10 17:09:12,680 DEBUG [ckanext.spatial.harvesters.csw.CSW.gather] Starting gathering for http://myserver:8000/pycsw
2019-04-10 17:09:12,680 INFO  [ckanext.spatial.lib.csw_client] Making CSW request: getrecords2 {'typenames': 'csw:Record', 'maxrecords': 10, 'sortby': <owslib.fes.SortBy object at 0x7fd7cf763250>, 'outputschema': 'http://www.isotc211.org/2005/gmd', 'cql': None, 'startposition': 0, 'esn': 'brief', 'constraints': []}
2019-04-10 17:09:12,749 INFO  [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier etopo180 from the CSW
2019-04-10 17:09:12,750 INFO  [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier etopo360 from the CSW

/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/session.py:2181: SAWarning: Usage of the 'related attribute set' operation is not currently supported within the execution stage of the flush process. Results may not be consistent.  Consider using alternative event listeners or connection-level operations instead.
  % method)
/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/session.py:2181: SAWarning: Usage of the 'collection append' operation is not currently supported within the execution stage of the flush process. Results may not be consistent.  Consider using alternative event listeners or connection-level operations instead.
  % method)
/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/session.py:2276: SAWarning: Attribute history events accumulated on 1 previously clean instances within inner-flush event handlers have been reset, and will not result in database updates. Consider using set_committed_value() within inner-flush event handlers to avoid this warning.
  % len_)

2019-04-10 17:09:12,795 DEBUG [ckanext.spatial.harvesters.csw.CSW.fetch] CswHarvester fetch_stage for object: 6e837ddb-edae-42d9-801a-9bb2aa0595ff
2019-04-10 17:09:12,838 INFO  [ckanext.spatial.lib.csw_client] Making CSW request: getrecordbyid [u'etopo180'] {'esn': 'full', 'outputschema': 'http://www.isotc211.org/2005/gmd'}
2019-04-10 17:09:12,875 DEBUG [ckanext.harvest.model] Empty record for GUID etopo180

2019-04-10 17:09:13,423 DEBUG [ckanext.spatial.harvesters.csw.CSW.fetch] CswHarvester fetch_stage for object: 93fed1f4-039b-46a9-8bf7-bbaf804f103c
2019-04-10 17:09:13,488 INFO  [ckanext.spatial.lib.csw_client] Making CSW request: getrecordbyid [u'etopo360'] {'esn': 'full', 'outputschema': 'http://www.isotc211.org/2005/gmd'}
2019-04-10 17:09:13,524 DEBUG [ckanext.harvest.model] Empty record for GUID etopo360
2019-04-10 17:09:13,626 INFO  [ckanext.harvest.logic.action.update] Harvest job run: {}
2019-04-10 17:09:13,643 INFO  [ckanext.harvest.logic.action.update] Marking job as finished http://myserver:8000/pycsw 327f4595-9c54-4797-832a-fc18eb55f43c
2019-04-10 17:09:13,669 DEBUG [ckanext.harvest.logic.action.update] Updating search index for harvest source: myorg-csw
@benjwadams

benjwadams commented Apr 11, 2019

Related: #210, #209
In short, MI_Metadata isn't handled properly in the current master as of this writing. I'll get back around to tracking down why the automated tests are failing.
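
For illustration only, a lookup that accepts either root element could be sketched as below. This is an assumption about what handling both cases might look like, not the actual ckanext-spatial code or the eventual fix; `find_iso_record`, `CANDIDATE_ROOTS` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gmi": "http://www.isotc211.org/2005/gmi",
}

# Root elements to accept, tried in this order (an illustrative choice).
CANDIDATE_ROOTS = ("gmd:MD_Metadata", "gmi:MI_Metadata")


def find_iso_record(response_root):
    """Return the first matching direct child of the response, or None."""
    for path in CANDIDATE_ROOTS:
        element = response_root.find(path, NAMESPACES)
        # Compare against None explicitly: an Element without children is falsy.
        if element is not None:
            return element
    return None


# Hypothetical, trimmed responses used only for illustration.
MI_RESPONSE = ET.fromstring(
    '<r xmlns:gmi="http://www.isotc211.org/2005/gmi"><gmi:MI_Metadata/></r>'
)
MD_RESPONSE = ET.fromstring(
    '<r xmlns:gmd="http://www.isotc211.org/2005/gmd"><gmd:MD_Metadata/></r>'
)
EMPTY_RESPONSE = ET.fromstring("<r/>")

print(find_iso_record(MI_RESPONSE).tag)
print(find_iso_record(MD_RESPONSE).tag)
print(find_iso_record(EMPTY_RESPONSE))
```

Whether the rest of the harvester (validation, field mapping) copes with an `MI_Metadata` root is a separate question that this sketch does not address.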

@bonnland
Contributor

There are many versions of ISO 19115; CKAN does not support all of them. Here is the version that I believe the harvester is written for:

https://service.ncddc.noaa.gov/rdn/www/metadata-standards/documents/MD-Metadata.pdf

ccancellieri added a commit to ccancellieri/ckanext-spatial that referenced this issue Oct 22, 2021
…mespace and tag name so we can now harvest any kind of metadata (if validation is provided or ignored) ckan#209 ckan#210 ckan#219 ckan#258