Personal Project: A multithreaded web scraper written in Python using the BeautifulSoup and requests libraries.
Result: For each Geographical Site in a Country on Wikimapia.org, the program extracts the site's coordinates in decimal format. The final result is a JSON file organized as:
{"Country Name": {"District": {"City": {"Geographical Site": {"Latitude": ..., "Longitude": ...}}}}}
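For illustration, a single entry in the output file might look like the following (the country, district, city, site name, and coordinate values are placeholders, not actual program output):

    {
      "France": {
        "Ile-de-France": {
          "Paris": {
            "Eiffel Tower": {"Latitude": 48.8584, "Longitude": 2.2945}
          }
        }
      }
    }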
HTTP Connection: To avoid getting banned from the site, the program rotates through a pool of User-Agent strings, sending a different one with each request, and waits a random amount of time between requests.
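A minimal sketch of this request step, assuming a small hand-picked list of User-Agent strings and a 1-3 second delay (both the list and the delay range are illustrative assumptions, not the exact values used):

    import random
    import time
    import requests

    USER_AGENTS = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Mozilla/5.0 (X11; Linux x86_64)",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    ]

    def fetch(url):
        # Send a different User-Agent header with each request.
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        response = requests.get(url, headers=headers, timeout=10)
        # Pause for a random interval so requests are not sent in a fixed rhythm.
        time.sleep(random.uniform(1.0, 3.0))
        return response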
Threads: Because the work is I/O-bound, the program splits the workload across five threads, each assigned one Country to scrape.
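One way to structure this split, sketched with one threading.Thread per Country (the worker function and the country URL list are placeholders standing in for the real scraping code):

    import threading

    def scrape_country(country_url):
        # Placeholder for the per-country scraping logic described in "Scraping" below.
        ...

    # Hypothetical list of five country pages, one per thread.
    country_urls = ["..."]

    threads = [threading.Thread(target=scrape_country, args=(url,)) for url in country_urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()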
Scraping: The Wikimapia website is organized by Country, District, and City. A City's webpage can contain many subpages, each holding up to 50 links to Geographical Sites. The program iterates through all of those subpages, follows every link they hold, and extracts data about each Geographical Site (in this case, Latitude and Longitude).
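A simplified sketch of the per-City pagination step, reusing the fetch helper from the HTTP sketch above; the subpage URL scheme, CSS selectors, and coordinate attributes are illustrative assumptions, not Wikimapia's actual markup:

    from bs4 import BeautifulSoup

    def scrape_city(city_url):
        sites = {}
        page = 1
        while True:
            # Hypothetical pagination scheme; the real subpage URLs may differ.
            soup = BeautifulSoup(fetch(f"{city_url}/?page={page}").text, "html.parser")
            links = soup.select("a.site-link")  # assumed selector for the up-to-50 site links
            if not links:
                break  # no more subpages for this City
            for link in links:
                site_page = BeautifulSoup(fetch(link["href"]).text, "html.parser")
                # Assumed markup: coordinates exposed as data attributes on the site's page.
                name = site_page.select_one("h1").get_text(strip=True)
                lat = site_page.select_one("[data-latitude]")["data-latitude"]
                lon = site_page.select_one("[data-longitude]")["data-longitude"]
                sites[name] = {"Latitude": float(lat), "Longitude": float(lon)}
            page += 1
        return sites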