Adaptive crawl rate #102

Open · 5 tasks · 0 comments

brendanheywood (Contributor) opened this issue Dec 17, 2019

i.e. if a URL works, and has worked consistently, then slowly ramp down how often it is re-scraped. The key factors we'd want to consider are:

  • when it was last crawled
  • its history of good crawls
  • how often it is viewed (use logs)
  • how often it is edited
  • whether the links going out (and maybe coming in) have changed

I don't want to use a factor like 'is the course inactive'. We already get that for free: the robot user should only have access to content it should crawl, so this can be managed by role assignments to course categories, etc.
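A rough sketch of how these factors might combine into a recrawl interval (every name, weight, and threshold below is hypothetical, not part of the plugin):

```python
from datetime import timedelta

def next_crawl_interval(
    base_interval: timedelta,      # default recrawl period, e.g. 1 day
    consecutive_good_crawls: int,  # history of successful fetches
    views_per_day: float,          # from use logs (hypothetical metric)
    edits_per_day: float,          # how often the page is edited
    outlinks_changed: bool,        # outgoing links changed since last crawl
    max_backoff: float = 16.0,     # cap so nothing goes completely stale
) -> timedelta:
    # A change in outgoing links suggests the page is in flux,
    # so snap back to the base rate.
    if outlinks_changed:
        return base_interval

    # Ramp down slowly: every run of 5 good crawls doubles the
    # interval, capped at max_backoff times the base.
    backoff = min(2 ** (consecutive_good_crawls // 5), max_backoff)

    # Busy or frequently edited pages pull the interval back down.
    activity = 1.0 + views_per_day / 10.0 + edits_per_day * 5.0

    return base_interval * (backoff / activity)
```

Under these assumed weights, a page with 50 consecutive good crawls, few views, and no edits backs off to roughly 16× the base interval, while any change in its outgoing links resets it to the base rate.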
