Support multiple html pages #6

sathish316 · 2012-06-04T20:08:19Z

Support multiple HTML pages when content is spread across several pages.

Most lists like Kindle Top 100 books or Time Top 100 books are spread across 5 pages with 20 books/page.

http://www.amazon.com/Best-Sellers-Kindle-Store/zgbs/digital-text

It would be much easier if html accepted an array of pages and crawled all of them:

class KindleTop100
  include Scrapify::Base
  html "http://amazon.com/kindle/1-25", "http://amazon.com/kindle/26-50", "http://amazon.com/kindle/51-75", "http://amazon.com/kindle/76-100"
end

kalarani · 2012-07-18T12:24:36Z

What do you expect KindleTop100.url to return in this case? I'm thinking of renaming it to urls and having it hold all the values. Let me know if you think otherwise.
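
For illustration, the renaming could look roughly like the sketch below. This is not Scrapify's actual code: `MultiPageSketch` is a hypothetical stand-in for `Scrapify::Base`, and the `urls` reader is an assumption about how the values might be held.

```ruby
# Hypothetical stand-in for Scrapify::Base; not the library's actual code.
module MultiPageSketch
  def self.included(klass)
    klass.extend(ClassMethods)
  end

  module ClassMethods
    # Accepts one URL, several URLs, or an array of URLs.
    def html(*urls)
      @urls = urls.flatten
    end

    # Returns every URL registered with html (empty if none were given).
    def urls
      @urls || []
    end
  end
end

class KindleTop100
  include MultiPageSketch
  html "http://amazon.com/kindle/1-25", "http://amazon.com/kindle/26-50",
       "http://amazon.com/kindle/51-75", "http://amazon.com/kindle/76-100"
end
```

With this shape, a single-page model would simply return a one-element array from `urls`, so callers would not need a separate code path.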

kalarani · 2012-07-18T13:03:28Z

Duplicate of #29

sathish316 · 2012-07-18T14:28:15Z

This is still in an experimental branch (nextpage). The implementation is not optimal: there is no caching, so with N attributes each page is fetched N times.
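
One possible way to avoid the repeated fetches is to memoize pages by URL. The sketch below is only an assumption about how that might be done, not what the nextpage branch does: `PageCache` and its injected fetcher block are hypothetical names, and the fetcher is stubbed so the example runs without network access.

```ruby
# Hypothetical page cache; the fetcher is injected so the sketch runs offline.
class PageCache
  attr_reader :fetch_count

  def initialize(&fetcher)
    @fetcher = fetcher
    @pages = {}
    @fetch_count = 0
  end

  # Returns the cached body for url, calling the fetcher only on the first request.
  def get(url)
    @pages.fetch(url) do
      @fetch_count += 1
      @pages[url] = @fetcher.call(url)
    end
  end
end

urls = ["http://amazon.com/kindle/1-25", "http://amazon.com/kindle/26-50"]

# Stub fetcher; a real one might call Net::HTTP.get(URI(url)) instead.
cache = PageCache.new { |url| "<html>#{url}</html>" }

# Three attributes each read every page, but each page is fetched only once.
%i[title author price].each do |_attribute|
  urls.each { |url| cache.get(url) }
end
```

Here `cache.fetch_count` ends up equal to the number of distinct pages rather than pages times attributes.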

#url or #urls won't matter because it's not supposed to be a public method like find and all.

https://github.com/sathish316/scrapify/blob/nextpage/spec/models/magazine.rb

kalarani closed this as completed Jul 18, 2012
