
A web crawler that recursively follows links on Wikipedia to attain some "Philosophy" :)


tejasvin/WebCrawler


WebCrawler

A web crawler that recursively follows links on Wikipedia. The idea: go to a Wikipedia page you find interesting, or just a random one, and click the first link in the main body of the article text. On the page that opens, do the same again, and just keep going.

Automating the Wikipedia Crawl

While it's interesting to click through Wikipedia by hand, reading through all those articles takes a lot of time. We're going to automate the process, ending up with a program that goes through Wikipedia for us, keeping track of the first link on each page and seeing where the links lead. To do this, we'll need to learn how web pages work and get to know some of the Python tools for interacting with the web and web content.
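As a rough illustration of the idea, here is a minimal sketch of the two pieces such a program needs: finding the first article link in a page's HTML, and following those links while recording the chain. It is not the repository's actual code. The names `FirstLinkParser`, `first_link`, `crawl`, and `PAGES` are hypothetical, the `/wiki/` link filter is an assumed simplification, and the pages are served from a small in-memory dictionary so the example runs without network access. Only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class FirstLinkParser(HTMLParser):
    """Record the href of the first article link found inside a <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraph_depth = 0
        self.first_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.paragraph_depth += 1
        elif tag == "a" and self.paragraph_depth and self.first_href is None:
            href = dict(attrs).get("href") or ""
            # Assumption: article links look like /wiki/Title, and a colon
            # marks non-article namespaces such as File: or Help:.
            if href.startswith("/wiki/") and ":" not in href:
                self.first_href = href

    def handle_endtag(self, tag):
        if tag == "p" and self.paragraph_depth:
            self.paragraph_depth -= 1


def first_link(html):
    """Return the first article link in the page body, or None if there is none."""
    parser = FirstLinkParser()
    parser.feed(html)
    return parser.first_href


def crawl(start, fetch, target="/wiki/Philosophy", max_steps=25):
    """Follow first links from start until target, a dead end, a loop, or max_steps."""
    chain = [start]
    while chain[-1] != target and len(chain) <= max_steps:
        next_href = first_link(fetch(chain[-1]))
        if next_href is None or next_href in chain:
            break  # dead end, or a page we have already visited
        chain.append(next_href)
    return chain


# Hypothetical stand-in for real pages, so the sketch runs offline.
PAGES = {
    "/wiki/Cat": '<p>A cat is an <a href="/wiki/Animal">animal</a>.</p>',
    "/wiki/Animal": '<p>See <a href="/wiki/Philosophy">philosophy</a>.</p>',
}

if __name__ == "__main__":
    print(" -> ".join(crawl("/wiki/Cat", PAGES.__getitem__)))
```

Passing `fetch` as a parameter keeps the link-following logic separate from the network code. A real crawler would replace `PAGES.__getitem__` with a function that downloads each page, for example with `urllib.request`, and would probably pause between requests to avoid hammering the site.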
