Make a scraper API #109

ArtskydJ · 2019-06-12T14:43:40Z

It would be nice to have a documented API for others to write scrapers.

See #86 (comment)
The current status is not nice.

I might need to make 2 interfaces:

Sites like GoComics, ArcaMax, Comics Kingdom, etc. that host multiple strip series.
Sites like Dilbert, XKCD, Freefall, etc. that have just one strip series.
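
A rough sketch of what the two interfaces could look like, written in Python for brevity. Every name here (`Strip`, `SingleSeriesScraper`, `MultiSeriesScraper`, `SingleAsMulti`) is hypothetical and not taken from the project. The adapter shows one way a single-series site might be treated as a multi-series site with exactly one series, so that downstream code only has to handle one shape.

```python
from dataclasses import dataclass
from typing import Iterator, Protocol


@dataclass
class Strip:
    # Hypothetical record for one comic strip; the field names are assumptions.
    series: str
    date: str
    image_url: str


class SingleSeriesScraper(Protocol):
    # A site that has just one strip series (XKCD, Freefall, ...).
    def strips(self) -> Iterator[Strip]: ...


class MultiSeriesScraper(Protocol):
    # A site that hosts many strip series (GoComics, ArcaMax, ...).
    def series_names(self) -> Iterator[str]: ...

    def strips(self, series: str) -> Iterator[Strip]: ...


class SingleAsMulti:
    """Adapter: expose a single-series scraper through the multi-series shape."""

    def __init__(self, name: str, scraper: SingleSeriesScraper) -> None:
        self.name = name
        self.scraper = scraper

    def series_names(self) -> Iterator[str]:
        # Exactly one series: the wrapped site itself.
        yield self.name

    def strips(self, series: str) -> Iterator[Strip]:
        if series != self.name:
            raise KeyError(series)
        return self.scraper.strips()
```

If an adapter like this turns out to be enough, a single interface might cover both kinds of site.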

Goals:

Eliminate some of the gocomics-specific weirdness. (titleAuthorDate, are you serious?)
Look into generators. I don't want to buffer everything in memory. I used to parse the whole GoComics site in memory, and then ran into memory issues. I need to write to disk as I go (for the multi-series sites). But I'd also like to abstract away the writing to disk. Options:
1. I could pass in a function for them to write to disk.
2. I could make a generator function.
3. I could expose writing to the disk as a library
4. other things that I'm not thinking about yet
5. Do I already write a bunch of stuff to disk while the scrape is in progress? I forget if this is even relevant/necessary...
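
Options 1 and 2 can be combined. Below is a minimal sketch, in Python for brevity (the same shape applies to a JavaScript generator function): the scraper is a generator that yields one strip at a time, and the caller passes in the function that persists each strip. All names (`scrape_site`, `make_jsonl_writer`, `run`) and the JSON-lines output format are assumptions made for illustration, not the project's actual API.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical strip record: a plain dict of string fields.
Strip = Dict[str, str]


def scrape_site(pages: Iterable[Iterable[Strip]]) -> Iterator[Strip]:
    """Yield strips one at a time instead of building one big list."""
    for page in pages:
        for strip in page:
            yield strip


def make_jsonl_writer(path: Path) -> Callable[[Strip], None]:
    """Return a writer that appends each strip as one JSON line."""

    def write(strip: Strip) -> None:
        with path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(strip) + "\n")

    return write


def run(scraper: Iterator[Strip], write: Callable[[Strip], None]) -> int:
    """Drain the generator, persisting each strip as soon as it arrives."""
    count = 0
    for strip in scraper:
        write(strip)  # only one strip is held in memory at a time
        count += 1
    return count


if __name__ == "__main__":
    out = Path(tempfile.mkdtemp()) / "strips.jsonl"
    demo_pages = [
        [{"series": "a", "date": "2019-06-12"}],
        [{"series": "b", "date": "2019-06-13"}],
    ]
    print(run(scrape_site(demo_pages), make_jsonl_writer(out)))
```

With this split, the scraper never touches the disk and the writer never knows which site the strips came from, so either side can be swapped independently.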

ArtskydJ · 2019-06-24T13:01:26Z

I made a scraper API. It has 1 interface. It does not eliminate the gocomics-specific weirdness.

I saw that the current GoComics scraper was actually keeping the whole thing in memory before finally writing to disk, so I kept the same behavior.

After I implement another multi-comics site scraper, I might have a better idea of how to make an API more specific to multi-comics sites.

ArtskydJ added the high-priority label Jun 12, 2019

ArtskydJ closed this as completed Jun 24, 2019
