CrawlnPeek | A Micro WebCrawler and Network Visualizer

The program crawls the given URL, following anchor tags in breadth-first order, and indexes the website's pages. It then saves the relevant crawl data as JSON and visualizes the domain's connectivity.
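At its core this is a standard breadth-first traversal over hyperlinks. As a rough sketch of the idea (not the repository's exact code; names such as crawl, maxdepth, and maxpages are illustrative, and only the standard library plus Requests is assumed):

    import requests
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin

    class LinkParser(HTMLParser):
        """Collects href values from anchor tags."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(start_url, maxdepth=3, maxpages=100):
        """Breadth-first crawl; returns {url: (depth, predecessor)}."""
        indexed = {start_url: (0, None)}
        queue = deque([start_url])
        while queue and len(indexed) < maxpages:
            url = queue.popleft()
            depth, _ = indexed[url]
            if depth >= maxdepth:
                continue
            try:
                resp = requests.get(url, timeout=5)
            except requests.RequestException:
                continue  # skip broken hyperlinks
            parser = LinkParser()
            parser.feed(resp.text)
            for href in parser.links:
                link = urljoin(url, href)  # resolves relative links like "/source"
                if link not in indexed and len(indexed) < maxpages:
                    indexed[link] = (depth + 1, url)
                    queue.append(link)
        return indexed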

Usage:

python main.py http://www.example.com

Features:

- Creates a list of all pages indexed on a website
- Creates a list of indexed pages with their relative depths and respective predecessors
- Creates an image of the website network
- Saves the indexed URLs to a JSON file (see the sketch after this list)
- Handles complex data parsing and broken hyperlinks robustly
- Limits crawling by maximum depth (maxdepth) and maximum pages indexed (maxpages)
- Supports relative links, e.g. href="/source"
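The JSON output step might look roughly like the following, reusing the indexed mapping from the crawl sketch above (the filename and record layout are assumptions, not the repository's exact format):

    import json

    def save_index(indexed, path="crawl_data.json"):
        """Serialize {url: (depth, predecessor)} records to a JSON file."""
        records = [
            {"url": url, "depth": depth, "predecessor": pred}
            for url, (depth, pred) in indexed.items()
        ]
        with open(path, "w") as f:
            json.dump(records, f, indent=2)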

Requirements:

- Requests (or Requests[security]) for verified SSL connections
- json (Python standard library) to serialize the crawl data
- Matplotlib to plot the graph
- NetworkX to build the graph from the list data
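The third-party dependencies can be installed with pip, for example:

python -m pip install requests[security] matplotlib networkx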

Examples:

An image of the codeacademy.com network
An image of the google.com network at 100 pages
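Graph images like those above could be produced from the depth/predecessor data with NetworkX and Matplotlib roughly as follows (a minimal sketch; the repository's actual plotting code may differ):

    import matplotlib.pyplot as plt
    import networkx as nx

    def draw_network(indexed, out="network.png"):
        """Build a directed graph from predecessor edges and save it as an image."""
        graph = nx.DiGraph()
        for url, (depth, pred) in indexed.items():
            graph.add_node(url)
            if pred is not None:
                graph.add_edge(pred, url)
        nx.draw(graph, node_size=20, with_labels=False)
        plt.savefig(out)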

NOTE: Crawling can sometimes take a really long time depending on the maxpages specified; the default is 100 pages.

Future:

- Set up a web app to perform crawling on a given user query and then present an interactive visualization.
- Use d3.js to visualize the website tree.

Author - bagarwa2
