Added `datCrawlerWorker` class:
- This now does the download and crawling "phases"
- New: `datCrawl.worker(url)` and `datCrawl.match(url)`, both of which return a `datCrawlerWorker`.
Modified crawler behaviour
- Every regular expression registered for a URL now needs a group called `url`. That group will be the URL sent to the associated `Downloader`.
- The core sends the match object (the result of `re.compile(...).match()`) to the `Crawler` as a kwarg called `matches`, so you can also work with values from the URL by accessing it via `kwargs.get('matches')`.
- Added an Exception and test cases for this behaviour: checking for the `url` group on a pattern, and checking that the kwargs are sent correctly.
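
The `url` group and `matches` kwarg rules above can be illustrated with a minimal sketch that uses only the standard `re` module. It does not use datCrawl's own classes: `MissingUrlGroup`, `check_pattern`, `dispatch`, `fake_downloader`, `example_crawler` and the example.com pattern are all hypothetical names chosen for illustration.

```python
import re


class MissingUrlGroup(Exception):
    """Hypothetical exception: the pattern has no group named `url`."""


def check_pattern(pattern):
    """Compile a pattern and verify that it defines a `url` named group."""
    compiled = re.compile(pattern)
    if "url" not in compiled.groupindex:
        raise MissingUrlGroup(pattern)
    return compiled


def dispatch(pattern, address, downloader, crawler):
    """Mimic the flow described above (an assumption, not datCrawl's code)."""
    matches = check_pattern(pattern).match(address)
    if matches is None:
        return None
    # The `url` group is what gets sent to the downloader.
    data = downloader(matches.group("url"))
    # The whole match object is passed to the crawler as the `matches` kwarg.
    return crawler(data, matches=matches)


def fake_downloader(url):
    """Stand-in for a real download; returns a string instead of fetching."""
    return "contents of " + url


def example_crawler(data, **kwargs):
    """Read extra values from the URL through the `matches` kwarg."""
    matches = kwargs.get("matches")
    return {"data": data, "page": matches.group("page")}


PATTERN = r"(?P<url>http://example\.com/list/(?P<page>\d+))"
result = dispatch(PATTERN, "http://example.com/list/7",
                  fake_downloader, example_crawler)
```

Here `result["page"]` is the string captured by the extra `page` group, which shows why receiving the whole match object is more flexible than receiving only the URL.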
Added options to Downloaders
- Passed via the `options` kwarg
- This improves the reusability of a Downloader, so you don't need separate classes for proxies, user agents, etc.
- Fixing pypi package
- Initial release
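
As a rough illustration of the Downloader `options` kwarg mentioned in the notes above, the sketch below shows how one class might serve several configurations. `SketchDownloader`, `settings_for` and the option keys (`user_agent`, `proxy`, `timeout`) are assumptions made for this example, not datCrawl's actual API, and no network request is made.

```python
class SketchDownloader:
    """Hypothetical downloader; not datCrawl's real base class."""

    # Assumed defaults, used when the caller passes no options.
    DEFAULTS = {"user_agent": "sketch-agent", "proxy": None, "timeout": 10}

    def settings_for(self, url, **kwargs):
        """Merge the `options` kwarg over the defaults for one download."""
        options = kwargs.get("options") or {}
        settings = dict(self.DEFAULTS)  # copy, so DEFAULTS is never mutated
        settings.update(options)
        settings["url"] = url
        return settings


downloader = SketchDownloader()

# Same class, two configurations: no ProxyDownloader subclass required.
plain = downloader.settings_for("http://example.com/")
proxied = downloader.settings_for(
    "http://example.com/",
    options={"proxy": "http://127.0.0.1:8080", "user_agent": "custom-agent"},
)
```

Keeping the configuration in a plain dictionary means that adding a new option does not require a new class, which is the reusability gain the notes describe.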