Hello,
I've been using brozzler-easy for testing, and brozzler looks to be working wonderfully. I have a very large website I am trying to archive, and I am unsure of a few things that I can't figure out from job-conf.rst.
I'm running a copy of the website on my local machine, so the site is not being served from its public domain. Is there a way to get brozzler to replace my localhost domain with the actual public domain?
Another question: is there any way to boost performance, possibly by configuring it to use more threads? Currently, when I set up a brozzler job and monitor it in the Brozzler Dashboard, it shows two sites being actively crawled. Is that an example of brozzler running two threads to crawl the site?
Maybe there's a writeup somewhere explaining optimal ways to use brozzler on a local machine?
I'd greatly appreciate any insights. Sorry to post this here; I'm not sure how else to get in touch with people on this project.
Thank you.
I'm running a copy of the website on my local machine, so the site is not being served from its public domain. Is there a way to get brozzler to replace my localhost domain with the actual public domain?
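Neither brozzler nor warcprox has that functionality built in. But it sounds doable with /etc/hosts: if the public domain resolves to the machine serving your local copy, the crawl can request the public URLs while the responses come from the local server.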
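To make the /etc/hosts idea concrete, here is a minimal sketch in Python that only builds the line you would add to the hosts file. The helper name `hosts_entry` and the domain `example.org` are placeholders of mine, not anything from brozzler. It assumes the local copy answers on the same port and scheme as the public site (a hosts file maps names to addresses only, not ports) and that the server accepts the public name in the Host header.

```python
import ipaddress


def hosts_entry(address: str, *names: str) -> str:
    """Build one hosts-file line mapping `names` to `address`.

    Hypothetical helper for illustration; it does not touch /etc/hosts.
    """
    if not names:
        raise ValueError("at least one host name is required")
    # Raises ValueError if `address` is not a valid IPv4/IPv6 address.
    ipaddress.ip_address(address)
    return f"{address}\t{' '.join(names)}"


if __name__ == "__main__":
    # Placeholder domain; substitute the site's real public domain.
    print(hosts_entry("127.0.0.1", "example.org", "www.example.org"))
```

You would append the printed line to /etc/hosts (with root privileges) on the machine where the browsers and warcprox run, and remove it again once the crawl is done.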
Another question I have, is there any way to boost the performance? Possibly configure it to use more threads?
You can configure the number of browsers running simultaneously with the -n, --max-browsers option. But only one browser at a time will work on a single site. You might need to reorganize your crawl if you want more parallelization (depending on what you're doing).
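One way to read "reorganize your crawl" is to split the site into several seeds, so that each seed is crawled as its own site and can get its own browser. The sketch below is only an illustration under that assumption: it writes a job configuration using the `id` and `seeds`/`url` keys from job-conf.rst, while the helper name `job_conf`, the job id, and the section paths are made up. Whether each seed stays within its own path depends on brozzler's scoping rules, so check job-conf.rst before relying on it.

```python
def job_conf(job_id: str, base_url: str, sections: list[str]) -> str:
    """Return job configuration YAML text with one seed per site section.

    Hypothetical helper; the section paths are placeholders.
    """
    base = base_url.rstrip("/")
    lines = [f"id: {job_id}", "seeds:"]
    for section in sections:
        # Normalize so each seed URL looks like <base>/<section>/
        lines.append(f"- url: {base}/{section.strip('/')}/")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder domain and sections; substitute the real ones.
    print(job_conf("local-site-crawl", "http://example.org/", ["blog", "docs", "forum"]))
```

With three seeds and at least three browsers allowed by -n, --max-browsers, up to three sections could in principle be crawled at once.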
Maybe there's a writeup somewhere explaining optimal ways to use brozzler on a local machine?