GitHub - darkmash-org/OpenCrawler: Open Crawler || Open Source and Contributing to an open database of crawled data

Open Crawler

Open Source Website Crawler

Explore the docs »

Report Bug . Request Feature

About The Project

An Open Source Crawler/Spider

It can be used by anyone and runs on any Windows or Linux computer. It is not a crawler for industrial use: it is written in a slow programming language and may have its own issues.

The project can easily be used with MongoDB.
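As a rough illustration of how crawled pages could be stored in MongoDB, here is a minimal sketch. The document fields, the function names `build_page_document` and `save_page`, and the database and collection names in the comments are assumptions made for this example, not OpenCrawler's actual schema. The only driver call used is pymongo's `Collection.update_one` with `upsert=True`.

```python
from datetime import datetime, timezone


def build_page_document(url, title, text):
    """Build a plain dict describing one crawled page (hypothetical schema)."""
    return {
        "url": url,
        "title": title,
        "text": text,
        "crawled_at": datetime.now(timezone.utc),
    }


def save_page(collection, doc):
    """Upsert the page keyed by its URL so re-crawling does not create duplicates.

    `collection` is expected to behave like a pymongo Collection.
    """
    collection.update_one({"url": doc["url"]}, {"$set": doc}, upsert=True)


# Possible usage, assuming pymongo is installed and a MongoDB server is
# listening locally (the names below are made up for the example):
#
#   from pymongo import MongoClient
#   pages = MongoClient("mongodb://localhost:27017")["crawler_sketch"]["pages"]
#   save_page(pages, build_page_document("https://example.com/", "Example", "..."))
```

Keying the upsert on the URL is one possible design choice; it keeps a single record per page and overwrites it on each crawl.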

The project can also be used for pentesting.

Features

Cross-platform
Installer for Linux
Related CLI tools (including CLI access to the crawler and a not-that-good search tool)
Memory efficient (probably)
Pool crawling: use multiple crawlers at the same time
Supports robots.txt
MongoDB as the database
Language detection
18+ checks / offensive content check
Proxies
Multithreading
URL scanning
Logging of keywords, descriptions, and recurring words
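To make the robots.txt feature above concrete, here is a minimal sketch of how a crawler can check whether a URL may be fetched. It uses Python's standard `urllib.robotparser` module. The helper name `make_robots_checker`, the user agent string, and the sample rules are assumptions for illustration; this is not necessarily how OpenCrawler's own `robots_txt.py` works.

```python
from urllib.robotparser import RobotFileParser


def make_robots_checker(robots_txt, user_agent="OpenCrawlerSketch"):
    """Return a function that says whether `user_agent` may fetch a URL.

    `robots_txt` is the text of a robots.txt file that was already downloaded.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    return allowed


# Sample rules, made up for the example.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

allowed = make_robots_checker(SAMPLE_ROBOTS)
print(allowed("https://example.com/index.html"))         # True
print(allowed("https://example.com/private/data.html"))  # False
```

A crawler would typically download and parse robots.txt once per host and reuse the checker for every URL on that host.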

Getting Started

The first step is to install the project. The provided installer works only on Linux.

On Windows, the application is not added to the PATH and the requirements are not installed automatically, so follow the Windows installation procedure below.

Installation

Linux

git clone https://github.com/merwin-asm/OpenCrawler.git

cd OpenCrawler

chmod +x install.sh && ./install.sh

Windows

You need Git, Python 3, and pip installed.

git clone https://github.com/merwin-asm/OpenCrawler.git

cd OpenCrawler

pip install -r requirements.txt

Usage

The project can be used for:

Making a (not that good) search engine
OSINT
Pentesting

Linux

To see available commands

opencrawler help

or

man opencrawler

Windows

To see available commands

python opencrawler help

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have suggestions for adding or removing projects, feel free to open an issue to discuss them, or create a pull request directly after editing the README.md file with the necessary changes.
Please make sure you check your spelling and grammar.
Create an individual PR for each suggestion.

License

Distributed under the MIT License. See LICENSE for more information.

Authors

Merwin A J - CS Student - Built OpenCrawler

Uses Materials From:

https://github.com/coffee-and-fun/google-profanity-words
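A word list like the one linked above can drive a simple offensive content check. The sketch below shows one possible approach, assuming the list is a plain text file with one word per line. The function names are hypothetical and this is not necessarily how OpenCrawler performs its check.

```python
import re


def load_word_list(lines):
    """Build a lowercase set of words from an iterable of lines (one word per line)."""
    return {line.strip().lower() for line in lines if line.strip()}


def find_flagged_words(text, word_list):
    """Return the sorted list of words from `word_list` that occur in `text`."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return sorted(set(tokens) & word_list)


# Possible usage, assuming a local word list file with one word per line:
#
#   with open("bad_words.txt", encoding="utf-8") as fh:
#       words = load_word_list(fh)
#   flagged = find_flagged_words(page_text, words)
```

Matching whole tokens rather than substrings avoids flagging harmless words that merely contain a listed word, at the cost of missing multi-word phrases.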

