Mdr extractor #65

Open
wants to merge 5 commits into
base: master

Conversation

Contributor

@tpeng tpeng commented Aug 27, 2014

Add MdrExtractor to parse listing data. The output is a separate field whose name is the group name set in the annotation (via listingDataGroupName) and whose value is a list of dicts, one extracted from each matched record.
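
For illustration only, the output shape described above might look like the following sketch. The helper `build_listing_field`, the sample annotation, and the field names `title` and `price` are hypothetical and not taken from this PR; only `listingDataGroupName` comes from the description.

```python
# Hypothetical sketch of the output shape; not the PR's implementation.
def build_listing_field(group_name, records):
    """Return a one-key dict mapping the group name to a list of record dicts."""
    return {group_name: [dict(record) for record in records]}


# Assumed annotation: only the listingDataGroupName key is named in the PR.
annotation = {"listingDataGroupName": "products"}

# Assumed records, one dict per matched listing record.
records = [
    {"title": "Widget", "price": "9.99"},
    {"title": "Gadget", "price": "19.99"},
]

item = build_listing_field(annotation["listingDataGroupName"], records)
print(item)
```

The extracted item then carries the listing data under a single key, so consumers can look it up by the group name chosen in the annotation.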

tpeng added 4 commits August 20, 2014 15:32
The MDR extractor is based on https://pypi.python.org/pypi/mdr/, which can
detect listing data automatically and extract it under the supervision of
scrapely annotations.
Sometimes the extracted data is empty, which makes validation fail.
We still want to add such records to the extracted listing data to
indicate that some data is missing on the page.
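
A minimal sketch of this idea, assuming each record arrives as a dict (or `None` when nothing was extracted). The function `collect_records` is hypothetical and only illustrates keeping empty records rather than dropping them.

```python
# Hypothetical sketch: keep empty records as placeholders instead of
# discarding them, so missing data stays visible in the output.
def collect_records(extracted):
    """Return one dict per record; empty or missing records become {}."""
    return [dict(record) if record else {} for record in extracted]


def count_missing(records):
    """Count the placeholder records that carry no extracted fields."""
    return sum(1 for record in records if not record)


rows = collect_records([{"title": "Widget"}, None, {}])
print(rows, count_missing(rows))
```

Keeping the placeholders preserves the one-to-one correspondence between records on the page and entries in the list.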

Also fix a problem that occurred when the annotation was added to a record
other than the seed record. The fix propagates the annotations to the
aligned elements.
also fixed a typo for the group name saved in annotation
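
The propagation fix mentioned above can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's code: elements are represented by plain string ids, `alignment_groups` is a hypothetical list of elements that were aligned across records, and `propagate_annotations` is an invented name.

```python
# Hypothetical sketch: copy an annotation found on any element of an
# alignment group to every other element in that group.
def propagate_annotations(alignment_groups, annotations):
    """Return a new mapping of element id to annotation name.

    alignment_groups: list of lists of element ids aligned across records.
    annotations: dict mapping an element id to its annotation name.
    """
    propagated = dict(annotations)
    for group in alignment_groups:
        names = [annotations[elem] for elem in group if elem in annotations]
        if not names:
            continue  # nothing annotated in this group
        for elem in group:
            # Existing annotations are left untouched.
            propagated.setdefault(elem, names[0])
    return propagated


groups = [["seed_title", "rec2_title"], ["seed_price", "rec2_price"]]
# The annotation was placed on the second record, not the seed record.
result = propagate_annotations(groups, {"rec2_title": "title"})
print(result)
```

With this approach, an annotation placed on any record reaches the seed record as well, while groups with no annotation are left alone.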