This script processes and analyzes your Spotify listening history data stored in JSON files. The analysis includes:
- First song ever played in each year.
- Total music listened to over the years.
- Most and least skipped songs.
- Most and least played songs.
- Saving the analysis results to an Excel file.
The script requires:
- Python 3.x
- Pandas library
- openpyxl library
- JSON files from your Spotify listening history
You can install the required libraries using pip:
pip install pandas openpyxl
The script expects the Spotify data in JSON files with the following structure:
- Each JSON file contains an array of listening events.
- Each event contains details such as DateTime, Track Name, Artist, Time Played(ms), and whether the track was skipped.
[
  {
    "DateTime": "2024-12-21T12:00:00Z",
    "Platform": "Spotify",
    "Time Played(ms)": 200000,
    "Country": "IN",
    "IP_Addr": "192.168.1.1",
    "Track Name": "Track A",
    "Artist": "Artist A",
    "Album": "Album A",
    "URL": "https://spotify.com/track/abc123",
    "episode_name": null,
    "show_name": null,
    "episode_uri": null,
    "reason_start": "user",
    "reason_end": "user",
    "shuffle": false,
    "skipped": false,
    "offline": false,
    "Offline_timestamp": null,
    "Incognito Mode": false
  },
  ...
]
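As a minimal sketch of what this structure means in practice, the snippet below parses a small array of events and reports which of the fields named above are absent from each one. The sample values, the `REQUIRED_FIELDS` tuple, and the `missing_fields` helper are assumptions made for illustration; they are not taken from the script.

```python
import json

# Fields this README names as the details each event carries.
# The tuple itself is a hypothetical helper, not part of the script.
REQUIRED_FIELDS = ("DateTime", "Track Name", "Artist", "Time Played(ms)", "skipped")

# Made-up sample data; the second event deliberately lacks "skipped".
sample_json = """
[
  {"DateTime": "2024-12-21T12:00:00Z", "Track Name": "Track A",
   "Artist": "Artist A", "Time Played(ms)": 200000, "skipped": false},
  {"DateTime": "2024-12-21T12:05:00Z", "Track Name": "Track B",
   "Artist": "Artist B", "Time Played(ms)": 15000}
]
"""

events = json.loads(sample_json)  # a list of dicts, one per listening event


def missing_fields(event):
    """Return the required field names that are absent from one event."""
    return [field for field in REQUIRED_FIELDS if field not in event]


for index, event in enumerate(events):
    print(index, event["Track Name"], missing_fields(event))
```

A check along these lines can flag files that lack a field before any analysis is attempted.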
- Reads all JSON files containing Spotify listening data.
- Aggregates the data across all files and processes it using pandas.
- For each year, it performs the following:
  - Extracts the first song played.
  - Sums up the total time played.
  - Analyzes the most and least skipped songs.
  - Analyzes the most and least played songs.
- Saves the results to an Excel file.
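The per-year steps above could be sketched with pandas roughly as follows. This is an illustration under assumptions, not the script's actual code: the sample rows are invented, the `Year` column is a name chosen here, and only the first-song and total-time steps are shown.

```python
import pandas as pd

# Invented events; column names follow the JSON structure in this README.
df = pd.DataFrame(
    {
        "DateTime": [
            "2023-03-01T08:00:00Z",
            "2023-01-15T09:30:00Z",
            "2024-02-10T18:45:00Z",
            "2024-01-01T00:05:00Z",
        ],
        "Track Name": ["Track B", "Track A", "Track D", "Track C"],
        "Time Played(ms)": [180000, 200000, 60000, 240000],
    }
)

# Parse the timestamps, order the events chronologically, and tag each with its year.
df["DateTime"] = pd.to_datetime(df["DateTime"], utc=True)
df = df.sort_values("DateTime")
df["Year"] = df["DateTime"].dt.year  # "Year" is a column name assumed for this sketch

# First song of each year: the earliest row per year after sorting by time.
first_songs = df.groupby("Year")["Track Name"].first()

# Total listening time per year, in milliseconds.
total_ms = df.groupby("Year")["Time Played(ms)"].sum()

for year in sorted(df["Year"].unique()):
    hours = total_ms.loc[year] / (1000 * 60 * 60)
    print(f"{year}: first song {first_songs.loc[year]!r}, {hours:.2f} hours played")
```

Sorting before grouping is what makes `first()` return the chronologically first track rather than whichever row happened to appear first in the files.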
- get_years(df): Extracts unique years from the data.
- first_song(i, df): Prints the first song listened to for the year (or overall).
- sum_heard_music(i, df): Prints the total time spent listening to music in different units (milliseconds, seconds, minutes, hours, days).
- filtering(df): Filters the data and computes the total time played, skip count, and play count for each song.
- most_skipped_least_skipped(df): Finds the most and least skipped songs.
- most_played_least_played(df): Finds the most and least played songs.
- save_to_csv(most_skipped, least_skipped, most_played, least_played, year, run_loop, all_lines_df=None): Saves the analysis results to an Excel file.
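To make the per-song aggregation more concrete, here is a rough pandas sketch of computing total time played, skip count, and play count, then picking the extremes. It is not the body of the functions listed above: the sample data, the `per_song` frame, and its column names (`total_ms`, `skip_count`, `play_count`) are assumptions, and a "play" is assumed here to be any event, skipped or not.

```python
import pandas as pd

# Invented listening events using the field names from this README.
df = pd.DataFrame(
    {
        "Track Name": ["Track A", "Track B", "Track A", "Track C", "Track B", "Track A"],
        "Artist": ["Artist A", "Artist B", "Artist A", "Artist C", "Artist B", "Artist A"],
        "Time Played(ms)": [200000, 5000, 180000, 210000, 3000, 190000],
        "skipped": [False, True, False, False, True, True],
    }
)

# Cast the boolean flag to integers so that summing it counts the skips.
df["skipped"] = df["skipped"].astype(int)

# One row per song: total time played, number of skips, number of plays.
per_song = (
    df.groupby(["Track Name", "Artist"])
    .agg(
        total_ms=("Time Played(ms)", "sum"),
        skip_count=("skipped", "sum"),
        play_count=("skipped", "count"),
    )
    .reset_index()
)

# Pick the extremes by sorting on the relevant count.
most_skipped = per_song.sort_values("skip_count", ascending=False).iloc[0]
least_skipped = per_song.sort_values("skip_count").iloc[0]
most_played = per_song.sort_values("play_count", ascending=False).iloc[0]
least_played = per_song.sort_values("play_count").iloc[0]

print(per_song)
print("Most skipped:", most_skipped["Track Name"])
print("Most played:", most_played["Track Name"])
```

Grouping on both track name and artist keeps two different songs that share a title from being merged into one row.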
To run the script, execute the Python file:
python main.py
- The script expects the JSON files to follow the naming convention: Streaming_History_Audio_<Year>_<Unique Digit>.json
- Example: Streaming_History_Audio_2022-2023_1.json, Streaming_History_Audio_2024_2.json, etc.
- Ensure that your JSON files are placed in a folder named my_spotify_data\Spotify Extended Streaming History. If your data is stored in a different folder, update the path_to_json variable in the script accordingly.
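One way to check file names against this convention is a regular expression, sketched below. The pattern, the `FILENAME_PATTERN` name, and the non-matching sample names are assumptions for illustration; the script may select its files differently. The folder name is the one given above.

```python
import re
from pathlib import PureWindowsPath

# Folder name from this README; adjust it if your export lives elsewhere.
path_to_json = PureWindowsPath(r"my_spotify_data\Spotify Extended Streaming History")

# Hypothetical pattern for Streaming_History_Audio_<Year>_<Unique Digit>.json,
# where <Year> is either a single year or a range such as 2022-2023.
FILENAME_PATTERN = re.compile(r"^Streaming_History_Audio_(\d{4}(?:-\d{4})?)_(\d+)\.json$")

candidates = [
    "Streaming_History_Audio_2022-2023_1.json",
    "Streaming_History_Audio_2024_2.json",
    "Streaming_History_Video_2024_0.json",  # illustrative non-matching name
    "notes.txt",                            # illustrative non-matching name
]

# Keep only the names that follow the convention.
matching = [name for name in candidates if FILENAME_PATTERN.match(name)]

for name in matching:
    print(path_to_json / name)
```

The first capture group holds the year or year range and the second holds the trailing number, should either be needed later.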
The script generates the following output:
- Excel files for each year analyzed, saved as .xlsx. These include:
  - Most skipped tracks.
  - Least skipped tracks.
  - Most played tracks.
  - Least played tracks.
- A comprehensive file named Entire History.xlsx, containing the combined analysis of all data.
- A raw data CSV file named Entire raw Audio Data from start to end.csv, containing the entire dataset used for analysis.
- The script converts timestamps to Indian Standard Time (IST).
- Make sure your Spotify JSON files contain the necessary fields for accurate analysis.
- If no suitable JSON files are found, the script will display an error message in the console.
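The IST conversion mentioned in these notes matters because a shifted timestamp can land in a different year than the UTC one. Below is a small standard-library sketch of the conversion; the script itself may well do this with pandas instead, and the `to_ist` helper is a hypothetical name used only here.

```python
from datetime import datetime, timedelta, timezone

# IST is a fixed offset of UTC+05:30 with no daylight saving time.
IST = timezone(timedelta(hours=5, minutes=30), name="IST")


def to_ist(timestamp: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as '2024-12-21T12:00:00Z' and convert it to IST."""
    # Replace the trailing 'Z' so that datetime.fromisoformat accepts it on older Python versions.
    utc_time = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    return utc_time.astimezone(IST)


print(to_ist("2024-12-21T12:00:00Z").isoformat())  # 2024-12-21T17:30:00+05:30
print(to_ist("2023-12-31T20:00:00Z").year)         # 2024
```

The second call shows a track played at 20:00 UTC on 31 December 2023 being counted under 2024 once converted, which affects any per-year result such as the first song of the year.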
Possible future improvements:
- A GUI!
- Select multiple time zones for output
- Output to CSV should be optional
- Year filtering according to the user's wishes
- Given that the Skipped/Skipped_Data variable was added to the extended data in late 2022, a workaround to find skipped tracks before 2023 should be found.
- Integrate with the liked songs playlist and then find out what songs you've played the least, most, etc.
- Simplify the initial data frame input workflow
This script is provided "as is" without any guarantees or warranties. The author is not liable for any errors, data loss, or other issues that may arise from its use. Use it at your own risk.