Sites like GoComics, ArcaMax, Comics Kingdom, etc. that host multiple strip series.
Sites like Dilbert, XKCD, Freefall, etc. that have just one strip series.
Goals:
Eliminate some of the GoComics-specific weirdness. (titleAuthorDate, are you serious?)
Look into generators. I don't want to buffer everything in memory. I used to parse the whole GoComics site in memory and ran into memory issues, so I need to write to disk as I go (for the multiple-strip-series sites). But I'd also like to abstract away the writing to disk.
I could pass in a function for them to write to disk.
I could make a generator function.
I could expose writing to disk as a library.
Other things that I'm not thinking about yet.
Do I write a bunch of stuff to disk while I am in the process? I forget if this is even relevant or necessary...
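To make the generator idea concrete, here is a minimal sketch, not the project's actual API. Every name in it (`Strip`, `scrape_series`, `write_strips`, the `fetch` callback, the `.img` file suffix) is hypothetical and chosen only for illustration. The scraper yields one strip at a time, and a separate consumer writes each one to disk, so nothing beyond the current strip needs to stay in memory.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable, Iterator


@dataclass
class Strip:
    """One comic strip (hypothetical shape, for illustration only)."""
    series: str
    date: str
    image: bytes


def scrape_series(
    series_names: Iterable[str],
    fetch: Callable[[str], Iterable[Strip]],
) -> Iterator[Strip]:
    """Lazily yield strips for each series.

    `fetch` stands in for the site-specific scraping code; it is an
    assumption of this sketch, not an existing function.
    """
    for name in series_names:
        # Strips are handed to the caller one at a time instead of
        # being collected into a list first.
        yield from fetch(name)


def write_strips(strips: Iterable[Strip], root: Path) -> int:
    """Consume strips and write each to root/<series>/<date>.img.

    Returns the number of strips written.
    """
    count = 0
    for strip in strips:
        target_dir = root / strip.series
        target_dir.mkdir(parents=True, exist_ok=True)
        (target_dir / f"{strip.date}.img").write_bytes(strip.image)
        count += 1
    return count
```

With this split, a scraper author would only write the `fetch` part, and something like `write_strips(scrape_series(names, fetch), Path("out"))` would handle the disk side. Passing in a writer function instead of returning a generator would invert the control flow but keep the same separation.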
It would be nice to have a documented API for others to write scrapers.
See #86 (comment)
The current status is not nice.
I might need to make 2 interfaces.