how does worker pick a site after crash? #231
Comments
If you wait an hour, it should start crawling again. See https://github.com/internetarchive/brozzler/blob/e23fa68d6/brozzler/frontier.py#L117. If you can't wait, you could set
Thanks @nlevitt for your quick reply. The problem is deciding when to set claimed=false. Is there a specific reason for choosing 60 minutes, or is it just arbitrary?
- Make the claimed limit configurable, as it was hard-coded to 60 minutes. After a crash, nodes can come back fairly quickly.
@mishranitin2003 It's not random. It has to be high enough that you will never have one worker claim a site when another is legitimately working on it. The value should not be configurable.
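To illustrate the reasoning above, here is a minimal sketch of a claim check with a staleness timeout. It is not brozzler's actual code: the site is modelled as a plain dict, the `last_claimed` and `claimed_by` fields and the `is_claimable` and `claim` helpers are hypothetical names, and the 60-minute value is simply the figure discussed in this thread.

```python
from datetime import datetime, timedelta, timezone

# Assumed value: the 60 minutes discussed in this thread.
CLAIM_TIMEOUT = timedelta(minutes=60)


def is_claimable(site, now):
    """Return True if a worker may claim this site record.

    `site` is a plain dict standing in for a row of the sites table.
    `last_claimed` is an assumed field name used only for illustration.
    """
    if not site.get("claimed"):
        return True
    last_claimed = site.get("last_claimed")
    if last_claimed is None:
        # A claim without a timestamp is treated as stale (a choice of this sketch).
        return True
    # A claim older than the timeout is presumed to belong to a dead worker.
    return now - last_claimed > CLAIM_TIMEOUT


def claim(site, worker_id, now):
    """Claim the site for worker_id if allowed; return whether it succeeded."""
    if not is_claimable(site, now):
        return False
    site["claimed"] = True
    site["last_claimed"] = now
    site["claimed_by"] = worker_id  # hypothetical bookkeeping field
    return True


t0 = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)
site = {"claimed": True, "last_claimed": t0}  # worker A crashed right after claiming

print(claim(site, "worker-b", t0 + timedelta(minutes=10)))  # claim is still fresh
print(claim(site, "worker-b", t0 + timedelta(minutes=61)))  # claim has gone stale
```

With a short timeout, the first call could succeed while worker A is still alive and merely slow, which is the double-claim case the timeout is meant to rule out. In a real deployment the check and the update would also need to happen atomically in the database, which this in-memory sketch does not attempt.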
Scenario: I have warcprox and a brozzler worker running on my local machine. In the middle of archiving a website, the brozzler worker process is killed, either with 'kill -9 <process_id>' or by closing the console session.
After both the warcprox and brozzler worker instances are restarted (on the same ports as before), the site is not picked up for crawling. This is because the site's claimed property in db('Brozzler').table('sites') is still true.
Query: