Add ability to crowdsource meaning of arbitrary strings on publishers' websites #120

Open
emanuil-tolev opened this issue Jan 13, 2015 · 0 comments

For example, on EPMC the HTML string

<img usemap="#logo-imagemap" alt="" src="/corehtml/pmc/pmcgifs/logo-nihpa.gif"></img>

means it's an author manuscript.

This issue proposes a way to crowdsource such mappings, for instance the fact that the HTML string above means an article is an author manuscript. The UI would be something like:

| input textarea for html string | where in the OAG data model the new field will appear | what value it should take|

e.g.

| <img usemap="#logo-imagemap" alt="" src="/corehtml/pmc/pmcgifs/logo-nihpa.gif"></img> | licence.provenance.accepted_author_manuscript | True |

In short, the scraper could be configured to look for any arbitrary string and record the fact that it was present.
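A minimal sketch of how such configurable rules might be applied, assuming a record is a nested dict and a rule is the (string, field path, value) triple from the table above. `StringRule`, `set_dotted` and `apply_rules` are hypothetical names for illustration, not existing OAG code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class StringRule:
    """One crowdsourced rule (hypothetical structure, not part of OAG)."""
    html_string: str  # exact substring to look for in the page source
    field_path: str   # dotted path in the record where the value goes
    value: Any        # value to store when the string is found


def set_dotted(record: Dict[str, Any], path: str, value: Any) -> None:
    """Set record["a"]["b"]["c"] = value for path "a.b.c", creating dicts as needed."""
    keys = path.split(".")
    node = record
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def apply_rules(page_html: str, rules: List[StringRule],
                record: Dict[str, Any]) -> Dict[str, Any]:
    """Record the configured value for every rule whose string occurs in the page."""
    for rule in rules:
        if rule.html_string in page_html:
            set_dotted(record, rule.field_path, rule.value)
    return record


# Usage with the EPMC example from this issue.
rule = StringRule(
    html_string='<img usemap="#logo-imagemap" alt="" '
                'src="/corehtml/pmc/pmcgifs/logo-nihpa.gif"></img>',
    field_path="licence.provenance.accepted_author_manuscript",
    value=True,
)
page = "<html><body>" + rule.html_string + "</body></html>"
print(apply_rules(page, [rule], {}))
# {'licence': {'provenance': {'accepted_author_manuscript': True}}}
```

Exact substring matching is the simplest option but is sensitive to whitespace and attribute order; a real implementation might normalise the HTML first.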

Note that this EPMC example will be implemented as simple custom code in the epmc_license plugin, so either the initial use case will be something else, or that custom code should be removed from epmc_license when this feature goes in.
