
Open Crawler || Open Source and Contributing to an open database of crawled data

darkmash-org/OpenCrawler

 
 

Open Crawler

Open Source Website Crawler

About The Project

An Open Source Crawler/Spider

It can be used by anyone and runs on Windows and Linux computers. It is not a crawler for industrial use: it is written in a slow programming language (Python) and may have its own issues.

The project can easily be used with MongoDB.

The project can also be used for pentesting.

Features

  • Cross Platform
  • Installer for linux
  • Related CLI tools (CLI access to the crawler, a basic search tool, etc.)
  • Memory efficient (not benchmarked)
  • Pool crawling: use multiple crawlers at the same time
  • Supports robots.txt
  • MongoDB [DB]
  • Language Detection
  • 18+ / offensive content checks
  • Proxies
  • Multi Threading
  • URL scanning
  • Logging of keywords, descriptions, and recurring words
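
A few of the features above (robots.txt support, pool crawling with threads, and recurring-word logging) can be sketched with the Python standard library alone. This is only an illustration of the general idea, not OpenCrawler's actual code: the names `USER_AGENT`, `build_robots`, `recurring_words`, and `crawl_pool` are hypothetical, and the `fetch` callable is supplied by the caller so the sketch itself does no network access.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleBot"  # hypothetical user agent, not OpenCrawler's


def build_robots(robots_txt):
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def recurring_words(text, top=3):
    """Return the `top` most frequent lowercase words with their counts."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words).most_common(top)


def crawl_pool(urls, robots, fetch, workers=4):
    """Fetch the allowed URLs in a thread pool and summarise each page.

    `fetch` is any callable that takes a URL and returns the page text.
    """
    # Skip every URL that robots.txt disallows for our user agent.
    allowed = [url for url in urls if robots.can_fetch(USER_AGENT, url)]
    # pool.map keeps results in the same order as `allowed`.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = list(pool.map(fetch, allowed))
    return {url: recurring_words(page) for url, page in zip(allowed, pages)}
```

In a real crawler, `fetch` would download the page (for example through a proxy) and the returned summaries would be written to the database. Here it is left as a parameter so the filtering and pooling logic can be tried offline.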

Getting Started

First, install the project. The provided installer is for Linux only.

On Windows, the application is not added to PATH and the requirements are not installed automatically, so follow the installation procedure for Windows.

Installation

Linux

git clone https://github.com/merwin-asm/OpenCrawler.git
cd OpenCrawler
chmod +x install.sh && ./install.sh

Windows

You need git, Python 3, and pip installed.

git clone https://github.com/merwin-asm/OpenCrawler.git
cd OpenCrawler
pip install -r requirements.txt

Usage

The project can be used for:

  • Making a (not that good) search engine
  • OSINT
  • Pentesting

Linux

To see available commands

opencrawler help

or

man opencrawler

Windows

To see available commands

python opencrawler help

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  • If you have suggestions for adding or removing projects, feel free to open an issue to discuss it, or directly create a pull request after you edit the README.md file with necessary changes.
  • Please make sure you check your spelling and grammar.
  • Create an individual PR for each suggestion.

License

Distributed under the MIT License. See LICENSE for more information.

Authors

  • Merwin A J - CS Student - Built OpenCrawler
