crawl the scale of websites
This is a python crawler to get the number of pages of one website. I want to study crawler which implements on distributed system such as hadoop. So, it is the beginning.
I got this idea from a interview for a internship of Google. Although I failed, the interviewer taught me how to write a crowler to get the number of pages of a website in an hour.
I am so sad about the loss, which motivates me a lot to write this crawler.