WebCrawler

A web crawler that follows the first link on Wikipedia pages recursively. Go to a Wikipedia page you find interesting, or just a random one, and click the first link in the main body of the article text. Then do the same on the page you land on, and just keep going.
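The "click the first link and keep going" loop can be sketched in a few lines. This is only an illustration, not the code in WebCrawler.py: the function name `follow_first_links`, the `max_steps` limit, and the toy link table are all assumptions, and the lookup function stands in for whatever actually fetches a page and finds its first link.

```python
from typing import Callable, List, Optional


def follow_first_links(
    start: str,
    first_link: Callable[[str], Optional[str]],
    max_steps: int = 25,
) -> List[str]:
    """Follow first links from `start`, returning the pages visited in order."""
    chain = [start]
    for _ in range(max_steps):
        nxt = first_link(chain[-1])
        # Stop on a dead end (no link) or when we would revisit a page (a loop).
        if nxt is None or nxt in chain:
            break
        chain.append(nxt)
    return chain


# Hypothetical stand-in for Wikipedia: each page maps to its first link.
toy_links = {"A": "B", "B": "C", "C": "A"}

print(follow_first_links("A", toy_links.get))  # ['A', 'B', 'C']
```

The step limit and the repeat check are there so the crawl always ends, even when the pages link to each other in a circle.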

Automating the Wikipedia Crawl

Whilst it's interesting to click through Wikipedia, clicking through and reading all those articles takes a lot of time. We're going to automate this process, ending up with a program that goes through Wikipedia for us, keeping track of the first link on each page and seeing where it leads. To do this, we'll need to find out how web pages work and get to know some of the Python tools we can use to interact with the web and web content.
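As a rough idea of what "finding the first link" means in terms of page content, here is a minimal sketch that uses only Python's standard-library `html.parser`. It is not taken from WebCrawler.py, which may use other tools; the class name, the helper `find_first_link`, and the sample HTML are made up for illustration, and real Wikipedia pages need more careful handling (footnotes, italic notes, links in parentheses).

```python
from html.parser import HTMLParser
from typing import Optional


class FirstParagraphLink(HTMLParser):
    """Remember the href of the first <a> tag that appears inside a <p> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.p_depth = 0          # how many <p> tags we are currently inside
        self.first_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.p_depth += 1
        elif tag == "a" and self.p_depth > 0 and self.first_href is None:
            # attrs is a list of (name, value) pairs
            self.first_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "p" and self.p_depth > 0:
            self.p_depth -= 1


def find_first_link(html: str) -> Optional[str]:
    """Return the first link found in paragraph text, or None if there is none."""
    parser = FirstParagraphLink()
    parser.feed(html)
    return parser.first_href


# Made-up page: the navigation link is outside any paragraph, so it is skipped.
sample_html = """
<div id="nav"><a href="/wiki/Main_Page">Main page</a></div>
<p>Python is a <a href="/wiki/Programming_language">programming language</a>
and a <a href="/wiki/Snake">snake</a>.</p>
"""

print(find_first_link(sample_html))  # /wiki/Programming_language
```

Restricting the search to paragraph tags is one simple way to approximate "the main body of the article text" and skip menus and sidebars.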



tejasvin/WebCrawler
