Driver script #3

Klortho · 2012-05-20T23:14:05Z

Right now this is fetch-samples, but it needs to morph into a real driver script with these features:

- Maintain a lightweight SQLite database that describes:
  - which articles have been converted and uploaded, and the status of each
  - when the last batch (from oa-service) was retrieved
- Take an argument `articles`, which specifies the list of articles to process: either an explicit list, a reference to an XML file that contains a list, or (default) all the articles that have been updated since last time, according to the oa-service.
- Take an argument `steps`, which specifies which steps in the pipeline to execute (default is all):
  - Download from PMC
  - Unzip
  - Reorganize directory
  - Convert XML
  - Import into Mediawiki
  - Upload media files into Mediawiki
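
To make the features listed above a bit more concrete, here is a rough sketch of how such a driver could be laid out in Python. Everything in it is an assumption for illustration, not existing code from this repository: the table layout, the step labels, the `--articles` / `--steps` flag spelling, and the `handlers` mapping are all hypothetical. The XML list file and the oa-service query for the default case are deliberately left out.

```python
import argparse
import sqlite3

# Hypothetical labels for the pipeline steps listed in this issue.
STEPS = ["download", "unzip", "reorganize", "convert", "import", "upload-media"]


def open_db(path):
    """Open (or create) the status database. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        " pmcid TEXT PRIMARY KEY, last_step TEXT, status TEXT)"
    )
    conn.execute("CREATE TABLE IF NOT EXISTS batches (retrieved_at TEXT NOT NULL)")
    conn.commit()
    return conn


def record_status(conn, pmcid, step, status):
    """Remember the most recent step attempted for an article and its outcome."""
    conn.execute(
        "INSERT OR REPLACE INTO articles (pmcid, last_step, status)"
        " VALUES (?, ?, ?)",
        (pmcid, step, status),
    )
    conn.commit()


def last_batch_time(conn):
    """Return the timestamp of the last retrieved batch, or None if there is none."""
    return conn.execute("SELECT MAX(retrieved_at) FROM batches").fetchone()[0]


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Driver script sketch")
    # An explicit list of article IDs; None means "use the default selection".
    parser.add_argument("--articles", nargs="*", default=None)
    # Which pipeline steps to run; the default is all of them, in order.
    parser.add_argument("--steps", nargs="+", choices=STEPS, default=STEPS)
    return parser.parse_args(argv)


def run(conn, articles, steps, handlers):
    """Run the requested steps per article, stopping an article at its first failure."""
    for pmcid in articles:
        for step in steps:
            try:
                handlers[step](pmcid)
                record_status(conn, pmcid, step, "ok")
            except Exception as exc:
                record_status(conn, pmcid, step, f"failed: {exc}")
                break
```

In this layout `handlers` maps each step label to a function that takes an article ID, so the individual steps can stay as separate scripts or functions and the driver only decides what to run and records the outcome.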


Daniel-Mietchen · 2012-05-20T23:34:14Z

Sounds very similar to the oa-get routine in
https://github.com/erlehmann/open-access-media-importer .

Klortho · 2012-05-20T23:43:40Z

Yes, I saw that. But I didn't think oa-get is ready for prime time, and I just want something simple. But you're right that these should be tied together at some point. I might have to learn Python ...

Daniel-Mietchen · 2012-05-20T23:51:57Z

The OA Media Importer as a whole is not ready yet, but the crawling part mostly is, and using it does not require coding anything in python.

One use case is the "Wikipedia" circle in http://malaria.bibsoup.net/ .

konrad · 2012-05-21T05:07:21Z

I removed download_examples.sh now as fetch-samples.sh does the job.

open-access-media-importer has some dependencies which might be a hurdle for some people. I think for our purpose it is fine to fetch the selected examples with wget. But we should refer to oa-get as the tool for downloading articles other than those used for our testing.

Klortho mentioned this issue May 10, 2013

Script for automating conversion and import #10

Closed

Daniel-Mietchen mentioned this issue Feb 18, 2014

Enable programmatic full-text import from PMC into Wikisource wpoa/OA-signalling#7

Open
