Analysis of Spotify music statistics and characteristics for the most popular tracks, artists, and albums from 2017-2021, as determined by Billboard year-end Top Album charts.
Completed for the Final Project requirement of the Entity Academy / Woz-U Data Science curriculum.
A collaborative project of Bianca Serrano and Katie Ravenwood.
The dataset was created via Python using Spotify's public API and playlists created based on the Billboard Top 200 Albums charts for 2017-2021.
Billboard 200 Top Albums 2017
Billboard 200 Top Albums 2018
Billboard 200 Top Albums 2019
Billboard 200 Top Albums 2020
Billboard 200 Top Albums 2021
- Billboard Album Chart Name & Year
- Album ID, Name, Release Year
- Album artists' names, IDs, popularity, and associated genres
- Track artists' names, IDs, popularity, and associated genres
- Explicit designation
- Audio features
Master Chart Table Creation
All Album Track Table Creation
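As a rough illustration of how the fields listed above could end up in one table row, here is a minimal sketch. It assumes the album, track, and audio-feature records are plain dictionaries shaped like the JSON objects returned by the Spotify Web API. The function name `flatten_track`, the column names, and the sample values are hypothetical placeholders, not taken from the project notebooks.

```python
# Audio feature keys as they appear in Spotify's audio-features objects.
AUDIO_FEATURES = [
    "danceability", "energy", "loudness", "speechiness", "acousticness",
    "instrumentalness", "liveness", "valence", "tempo",
]


def flatten_track(chart_name, chart_year, album, track, audio_features):
    """Combine an album, one of its tracks, and the track's audio features into one flat row."""
    row = {
        "chart_name": chart_name,
        "chart_year": chart_year,
        "album_id": album["id"],
        "album_name": album["name"],
        # Spotify release dates start with the year ("2017", "2017-03", or "2017-03-03").
        "album_release_year": int(album["release_date"][:4]),
        "album_artist_names": [a["name"] for a in album["artists"]],
        "track_id": track["id"],
        "track_name": track["name"],
        "track_artist_names": [a["name"] for a in track["artists"]],
        "explicit": track["explicit"],
    }
    # Missing features become None rather than raising an error.
    for feature in AUDIO_FEATURES:
        row[feature] = audio_features.get(feature)
    return row


# Made-up placeholder records, for illustration only.
album = {
    "id": "album123",
    "name": "Example Album",
    "release_date": "2017-03-03",
    "artists": [{"id": "artist1", "name": "Example Artist"}],
}
track = {
    "id": "track456",
    "name": "Example Track",
    "explicit": False,
    "artists": [
        {"id": "artist1", "name": "Example Artist"},
        {"id": "artist2", "name": "Guest Artist"},
    ],
}
features = {"danceability": 0.7, "energy": 0.5, "tempo": 120.0}

row = flatten_track("Billboard 200 Top Albums", 2017, album, track, features)
print(row["album_release_year"], row["track_artist_names"], row["energy"])
```

Artist popularity and genres are left out of this sketch because they live on the full artist object, which needs a separate lookup per artist ID.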
Data was cleaned and recoded for analysis and machine learning predictions.
Wrangling, Cleaning, and Recoding
Exploratory analyses included visualization and standardization of audio feature variables, as well as correlation analysis and plotting.
Exploratory Analysis Notebook
Exploratory Analysis RScript
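The standardization and correlation steps can be sketched with the Python standard library alone. This example assumes z-score standardization (subtract the mean, divide by the population standard deviation), which may differ from the scaling used in the notebooks. The helper names and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: each value minus the mean, divided by the population std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def pearson(xs, ys):
    """Pearson correlation, computed as the mean product of paired z-scores."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Made-up values for two audio features on very different scales.
energy = [0.2, 0.4, 0.6, 0.8]
loudness = [-12.0, -9.0, -6.0, -3.0]

z_energy = standardize(energy)
r = pearson(energy, loudness)
print(z_energy, r)
```

Standardizing first puts features such as loudness (decibels) and energy (0 to 1) on a common scale before they are plotted or compared. Note that `standardize` divides by zero if a feature has no variance, so constant columns would need to be dropped or handled separately.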
Linear regression and dependent t-tests were used to analyze the relationships between several audio features, and McNemar's chi-square test was used to determine whether genre presence changed significantly over the dataset's time frame. Tracks were grouped using K-Means clustering. Classification of track genres was tested with K-Nearest Neighbors and Random Forest algorithms.
Data Analysis and Machine Learning Notebook
Data Analysis RScript
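For readers unfamiliar with McNemar's test, the following is a minimal sketch of the statistic on paired yes/no observations, for example whether a genre is present at two points in time for the same unit. How the project actually paired its observations is not described here, so the pairing, the function name `mcnemar`, and the counts are assumptions for illustration only.

```python
import math


def mcnemar(b, c, continuity=True):
    """McNemar's chi-square test on the two discordant cell counts of a paired 2x2 table.

    b: pairs that changed from "present" to "absent"
    c: pairs that changed from "absent" to "present"
    Returns (statistic, p_value) using a chi-square distribution with 1 degree of freedom.
    """
    n = b + c
    if n == 0:
        raise ValueError("McNemar's test needs at least one discordant pair")
    diff = abs(b - c)
    if continuity:
        # Continuity correction: shrink the difference by 1, never below zero.
        diff = max(diff - 1, 0)
    statistic = diff ** 2 / n
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Made-up discordant counts, for illustration only.
stat, p = mcnemar(15, 5, continuity=False)
print(stat, p)
```

Only the discordant pairs enter the statistic; pairs whose status did not change carry no information about the direction of change. For small discordant counts an exact binomial version of the test is usually preferred over the chi-square approximation.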
Visualizations of the analyses and models were created in Python and R.
The project was presented via Zoom to Woz U / Entity Academy faculty and students for internal review on 2 February 2022.
View the project presentation video on VIMEO:
Good Vibrations? Spotify x Billboard Top 200 Albums Five-Year Analysis Project