ScaleOfWebsites

crawl the scale of websites

This is a python crawler to get the number of pages of one website. I want to study crawler which implements on distributed system such as hadoop. So, it is the beginning.

I got this idea from a interview for a internship of Google. Although I failed, the interviewer taught me how to write a crowler to get the number of pages of a website in an hour.

I am so sad about the loss, which motivates me a lot to write this crawler.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
README.md		README.md
crawl.py		crawl.py
crawl2.py~		crawl2.py~
crawlBfs.py		crawlBfs.py
crawlDfs.py		crawlDfs.py
logs		logs
logs~		logs~
res_bfs		res_bfs
res~		res~

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ScaleOfWebsites

About

Releases

Packages

Languages

wubdut/ScaleOfWebsites

Folders and files

Latest commit

History

Repository files navigation

ScaleOfWebsites

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages