AIP - Copyright statement only in PDF #85

Open
cameronneylon opened this issue Apr 16, 2014 · 3 comments

Comments

@cameronneylon
Contributor

Noting for the future. There is an OA tag on the landing page but nothing that gives license information until you hit the PDF itself.

Something for further down the track.

An example: http://scitation.aip.org/content/aip/journal/jap/114/5/10.1063/1.4817422

@emanuil-tolev
Contributor

Another note for the future:
URL to pdf: http://scitation.aip.org/deliver/fulltext/aip/journal/jap/114/5/1.4817422.pdf?itemId=/content/aip/journal/jap/114/5/10.1063/1.4817422&mimeType=pdf&containerItemId=content/aip/journal/jap

Nothing out of the ordinary; the PDF URL can be generated from the landing page URL by a plugin specific to AIP.
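A rough sketch of how such a plugin might derive the PDF URL, for illustration only. The pattern is inferred from the single landing page / PDF pair quoted in this thread, so it is an assumption that other AIP articles follow it; the function name `aip_pdf_url` is hypothetical and not part of the existing code.

```python
from urllib.parse import urlsplit


def aip_pdf_url(landing_url: str) -> str:
    """Build the PDF URL from an AIP landing page URL.

    Assumes the path layout seen in the one example in this thread:
    /content/aip/journal/<journal>/<volume>/<issue>/<doi prefix>/<doi suffix>
    """
    parts = urlsplit(landing_url)
    segments = parts.path.strip("/").split("/")
    if len(segments) != 8 or segments[:3] != ["content", "aip", "journal"]:
        raise ValueError("unexpected AIP landing page URL: " + landing_url)

    journal, volume, issue, _doi_prefix, doi_suffix = segments[3:]
    container = "/".join(segments[:4])  # e.g. content/aip/journal/jap

    return (
        f"{parts.scheme}://{parts.netloc}/deliver/fulltext/aip/journal/"
        f"{journal}/{volume}/{issue}/{doi_suffix}.pdf"
        f"?itemId={parts.path}&mimeType=pdf&containerItemId={container}"
    )
```

Called with the example landing page above, this reproduces the PDF URL quoted in this comment. Anything that does not match the expected path layout raises `ValueError` rather than guessing.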


Downloading the whole PDF could be problematic: we do have a lot of bandwidth now, but memory consumption could also be a problem. Still, it should work if the license string is present in there, but it will be very brittle. For the size, we could split incoming files into chunks of, say, 1 MB (regardless of whether they are PDFs or not) and run all the needed comparisons on each chunk. If nothing is found, move on to the next chunk, and so on.
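A minimal sketch of the chunked comparison, assuming the license statement appears as plain, uncompressed bytes in the file (PDF content streams are often compressed, which is one reason this would be brittle). The name `find_license_string`, the list of needles and the overlap handling are illustrative assumptions, not existing project code. The tail carried over between chunks is there so that a string straddling a chunk boundary is not missed.

```python
import io
from typing import BinaryIO, Iterable, Optional

CHUNK_SIZE = 1024 * 1024  # 1 MB, as suggested above


def find_license_string(
    stream: BinaryIO,
    needles: Iterable[bytes],
    chunk_size: int = CHUNK_SIZE,
) -> Optional[bytes]:
    """Return the first needle found in the stream, or None.

    Only one chunk plus a small tail is held in memory at a time.
    """
    needles = [n for n in needles if n]
    if not needles:
        return None

    # Keep enough bytes from the previous chunk to catch a needle
    # that is split across two chunks.
    overlap = max(len(n) for n in needles) - 1
    tail = b""

    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return None  # end of stream, nothing matched
        window = tail + chunk
        for needle in needles:
            if needle in window:
                return needle
        tail = window[-overlap:] if overlap else b""


if __name__ == "__main__":
    sample = io.BytesIO(b"... licensed under a Creative Commons Attribution license ...")
    print(find_license_string(sample, [b"Creative Commons Attribution"], chunk_size=16))
```

Because the function returns on the first hit, the rest of the file never has to be read once a statement is found, which helps with both bandwidth and memory.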

@cameronneylon
Contributor Author

MDPI is another publisher that does this: http://www.mdpi.com/2071-1050/5/7/3095 and the relevant pdf is: http://www.mdpi.com/2071-1050/5/7/3095/pdf

@emanuil-tolev
Contributor

Note for future readers here: we don't download PDFs anymore, so in order to eventually support statements in PDFs this would have to change. We use the robust python-magic library to check the file header, so it's pretty unlikely a PDF will slip by.
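For anyone picking this up later, a header check along these lines might look roughly like the sketch below. This is not the project's actual code: `looks_like_pdf` is a hypothetical name, and the fallback to the `%PDF-` signature is an assumption added only so the example still runs where python-magic or libmagic is not installed.

```python
try:
    import magic  # python-magic; needs the libmagic system library

    _from_buffer = magic.from_buffer
except (ImportError, AttributeError):
    _from_buffer = None  # fall back to a plain signature check


def looks_like_pdf(header: bytes) -> bool:
    """Guess from the first bytes of a response whether it is a PDF."""
    if _from_buffer is not None:
        # mime=True makes python-magic return a MIME type string.
        return _from_buffer(header, mime=True) == "application/pdf"
    # Assumed fallback: PDF files start with the "%PDF-" signature.
    return header.startswith(b"%PDF-")
```

Supporting license statements in PDFs would mean branching on this result and handing the stream to a PDF-specific step, rather than skipping the file.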
