vgchartz-full-crawler.py is a Python 3 crawler script based on BeautifulSoup. It creates a CSV dataset covering more than 57,000 games, using data from the VGChartz site.
The dataset is saved to the file specified in cfg/resources.json, which defaults to "dataset/vgsales.csv".
You will need the dependencies listed in requirements.txt.
They can be installed with pip.
# Install dependencies
$> pip install -r requirements.txt
# Run
$> python vgchartz-full-crawler.py
The dataset is composed of the following fields, collected as described below.
Field | Description |
---|---|
Rank | Ranking of overall sales |
Name | The game's name |
Genre | Genre of the game |
Platform | Platform the game was released on (e.g. PC, PS4) |
Developer | Developer of the game |
Publisher | Publisher of the game |
Vgchartz_Score | Score on the VGChartz site |
Critic_Score | Score given by critics |
User_Score | Score given by VGChartz site users |
Total_Shipped | Total worldwide shipments (in millions) |
Total_Sales | Total worldwide sales (in millions) |
NA_Sales | Sales in North America (in millions) |
EU_Sales | Sales in Europe (in millions) |
JP_Sales | Sales in Japan (in millions) |
Other_Sales | Sales in the rest of the world (in millions) |
Release_Date | Year of the game's release |
Last_Update | Date this record was last updated |
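
As a rough illustration of how the generated file could be consumed, the sketch below reads rows with Python's standard `csv` module and converts two of the sales columns to numbers. It assumes the CSV has a header row using the field names from the table above and that missing figures appear as empty cells; neither is confirmed here. The helper names (`to_float`, `load_rows`) and the two sample rows are hypothetical placeholders, not real VGChartz figures.

```python
import csv
import io

# Hypothetical two-row sample using a subset of the columns above.
# The values are placeholders, not real VGChartz data.
SAMPLE_CSV = """Rank,Name,Platform,Total_Shipped,Total_Sales,Release_Date
1,Example Game A,PC,,1.50,2015
2,Example Game B,PS4,3.20,,2017
"""


def to_float(value):
    """Convert a numeric CSV cell to float; empty cells become None (assumption)."""
    value = (value or "").strip()
    return float(value) if value else None


def load_rows(handle):
    """Read rows from an open CSV handle and parse the two sales columns."""
    rows = []
    for row in csv.DictReader(handle):
        for column in ("Total_Shipped", "Total_Sales"):
            if column in row:
                row[column] = to_float(row[column])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["Name"], row["Total_Shipped"], row["Total_Sales"])
```

To try this against the real output, pass a handle opened on the default path, for example `open("dataset/vgsales.csv", newline="", encoding="utf-8")`, to `load_rows` instead of the in-memory sample.
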
- Remap the columns according to the values selected in resources.json
- Add some unit testing
- Dockerize (with alpine-python) to ease use and avoid installations
- Publish on Docker Hub
Thanks to Chris Albon