Support multiple html pages #6

Closed
sathish316 opened this issue Jun 4, 2012 · 3 comments

Comments

@sathish316
Owner

Support multiple HTML URLs when the content is spread across several pages.

Most lists like Kindle Top 100 books or Time Top 100 books are spread across 5 pages with 20 books/page.

http://www.amazon.com/Best-Sellers-Kindle-Store/zgbs/digital-text

It would be really easy if html supported an array of pages and crawled every one of them:

class KindleTop100
  include Scrapify::Base
  # Proposed usage: html takes one URL per page of the list
  html "http://amazon.com/kindle/1-25",
       "http://amazon.com/kindle/26-50",
       "http://amazon.com/kindle/51-75",
       "http://amazon.com/kindle/76-100"
end
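
To make the request concrete, here is a rough sketch of how a multi-URL html macro could behave. It is not Scrapify's implementation: MultiPageSketch, crawl, the fetcher/extractor lambdas and FAKE_PAGES are all hypothetical names, and the HTTP call is stubbed so the example runs offline.

```ruby
# Hypothetical sketch; none of this is Scrapify's actual code.
module MultiPageSketch
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Accepts one or more URLs, or a single array of URLs.
    def html(*urls)
      @urls = urls.flatten
    end

    def urls
      @urls || []
    end

    # Visits every page in order and concatenates the extracted items.
    # `fetcher` maps a URL to a page body; `extractor` maps a body to an array.
    def crawl(fetcher:, extractor:)
      urls.flat_map { |url| extractor.call(fetcher.call(url)) }
    end
  end
end

class KindleTop100Sketch
  include MultiPageSketch
  html "http://amazon.com/kindle/1-25", "http://amazon.com/kindle/26-50"
end

# Stand-in for an HTTP client (assumption: real pages would be fetched and parsed).
FAKE_PAGES = {
  "http://amazon.com/kindle/1-25"  => "Book A\nBook B",
  "http://amazon.com/kindle/26-50" => "Book C"
}.freeze

titles = KindleTop100Sketch.crawl(
  fetcher:   ->(url) { FAKE_PAGES.fetch(url) },
  extractor: ->(body) { body.split("\n") }
)
```

With the stubbed pages, titles holds the items from the first page followed by those from the second, so callers see one combined list regardless of how many pages it came from.
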
@kalarani
Collaborator

What do you expect KindleTop100.url to return in this case? I'm thinking of renaming it to urls and having it hold all the values. Let me know if you think otherwise.

@kalarani
Collaborator

Duplicate of #29

@sathish316
Owner Author

This is still in an experimental branch (nextpage). The implementation is not optimal because there is no caching: if there are N attributes, each page is fetched N times.
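
One possible way around the repeated fetches is to memoize page bodies by URL. The sketch below is only an illustration of that idea, not what the nextpage branch does: CachingFetcher, fetch_count and the example.com URLs are made up, and the wrapped lambda stands in for a real HTTP request.

```ruby
# Hypothetical sketch; none of these names come from Scrapify.
class CachingFetcher
  attr_reader :fetch_count

  # `fetcher` is any callable that maps a URL to a page body.
  def initialize(fetcher)
    @fetcher = fetcher
    @cache = {}
    @fetch_count = 0
  end

  # Returns the cached body if the URL was seen before; otherwise fetches once.
  def call(url)
    @cache.fetch(url) do
      @fetch_count += 1
      @cache[url] = @fetcher.call(url)
    end
  end
end

urls = ["http://example.com/page/1", "http://example.com/page/2"]
attributes = [:title, :author, :price]

# Stand-in for a real HTTP request (assumption for the sake of an offline example).
cached = CachingFetcher.new(->(url) { "body of #{url}" })

# Every attribute reads every page, mirroring the access pattern described above.
attributes.each do |_attribute|
  urls.each { |url| cached.call(url) }
end
```

Here three attributes over two pages make six calls, but the wrapped fetcher runs only once per URL, so fetch_count ends at the number of pages rather than pages times attributes.
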

Whether it is #url or #urls won't matter, because it's not supposed to be a public method like find and all.

https://github.com/sathish316/scrapify/blob/nextpage/spec/models/magazine.rb
