
Trawler

A job scheduler and analysis tool for web-scraping (and other) tasks.


Datasources

Currently, the following datasources are implemented:

  • tiktok: fetch video metadata per hashtag, download the videos, and extract on-screen text with EasyOCR

  • gab (nazi-twitter): crawl posts for a given user

  • onionlist: download the Tor catalogue from onionlist.org

  • google dorking: find interesting files and download them

  • facebook posts and reactions: scrape Facebook posts, comments, and reactions (like, heart, etc.)

Features

  • simple configuration of actions/datasources, also from 3rd-party modules/repos (see the job sketch after this list)
  • job monitoring and scheduling
  • SQLite, CSV, and JSON browser
  • separation of datasets/artifacts (one archive per crawl)
  • scalable number of workers (also on other machines)
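
Since jobs are managed through JSON files (see Worker(s) below), a job definition might look roughly like the sketch here. All field names are illustrative assumptions, not the project's actual schema:

    {
      "id": "tiktok-hashtag-example",
      "datasource": "tiktok",
      "schedule": "0 * * * *",
      "params": { "hashtag": "example", "ocr": true },
      "archive": "artifacts/tiktok-example"
    }

The "archive" field reflects the one-archive-per-crawl separation of artifacts listed above.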

Architecture

Frontend and API

  • GUI to create and schedule jobs
  • Displays pending, running, and completed jobs
  • Displays CSV and SQLite datasets

Worker(s)

  • Can be distributed (workers and command-and-control on different machines/servers)
  • Jobs are managed through JSON files (and can be distributed with an adapter like PouchDB)
  • Multithreaded
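
A minimal sketch of how a worker might claim and consume such a JSON job file. The directory layout and the Job shape are assumptions for illustration, not Trawler's documented internals:

    // worker.ts: minimal file-based job consumer (hypothetical layout)
    import { readdirSync, readFileSync, renameSync } from "fs";
    import { join } from "path";

    interface Job {
      id: string;
      datasource: string;               // one of the datasources above
      params: Record<string, unknown>;  // datasource-specific options
    }

    const PENDING = "jobs/pending";     // assumed directory names
    const RUNNING = "jobs/running";

    for (const file of readdirSync(PENDING)) {
      // Claim the job by moving its file before working on it, so several
      // workers can watch the same directory without processing a job twice.
      const claimed = join(RUNNING, file);
      renameSync(join(PENDING, file), claimed);
      const job: Job = JSON.parse(readFileSync(claimed, "utf8"));
      console.log(`picked up job ${job.id} (${job.datasource})`);
      // ...dispatch to the matching datasource handler here...
    }

Because the queue is just files, a shared or replicated directory (or a sync adapter such as PouchDB, as noted above) is enough to spread workers across machines.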

Install

Using Docker Compose

Build and start all services with Docker Compose by running:

docker-compose up
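
The compose file defines the frontend/API alongside the workers. Assuming the worker service is named worker (check the repository's docker-compose.yml for the actual service name), additional workers can be started with Compose's built-in scaling flag:

docker-compose up --scale worker=4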