This script processes and analyzes Spotify's extended streaming history data. It calculates metrics such as total playback time, most skipped songs, least skipped songs, and most played songs

mohdaadilf/Spotify-Stat-Wrapper

Spotify Stat Wrapper

This script processes and analyzes your Spotify listening history data stored in JSON files. The analysis includes:

  • First song ever played in each year.
  • Total music listened to over the years.
  • Most and least skipped songs.
  • Most and least played songs.
  • Saving the analysis results to an Excel file.

Requirements

  • Python 3.x
  • Pandas library
  • openpyxl library
  • JSON files from your Spotify listening history

You can install the required libraries using pip:

pip install pandas openpyxl

Structure of the Data

The script expects the Spotify data in JSON files with the following structure:

  • Each JSON file contains an array of listening events.
  • Each event contains details such as DateTime, Track Name, Artist, Time Played(ms), and whether the track was skipped.

Example JSON structure

[
    {
        "DateTime": "2024-12-21T12:00:00Z",
        "Platform": "Spotify",
        "Time Played(ms)": 200000,
        "Country": "IN",
        "IP_Addr": "192.168.1.1",
        "Track Name": "Track A",
        "Artist": "Artist A",
        "Album": "Album A",
        "URL": "https://spotify.com/track/abc123",
        "episode_name": null,
        "show_name": null,
        "episode_uri": null,
        "reason_start": "user",
        "reason_end": "user",
        "shuffle": false,
        "skipped": false,
        "offline": false,
        "Offline_timestamp": null,
        "Incognito Mode": false
    },
    ...
]
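Since the analyses depend on a handful of these fields, it can help to check an export before running the script. The snippet below is a minimal sketch using only the standard library; the REQUIRED_FIELDS tuple and the missing_fields helper are hypothetical names chosen for illustration (they are not part of the script), and the field list is an assumption based on the example above.

```python
import json

# Assumed set of fields the analyses rely on, taken from the example above.
# This list is illustrative; the script itself may need more or fewer fields.
REQUIRED_FIELDS = ("DateTime", "Track Name", "Artist", "Time Played(ms)", "skipped")


def missing_fields(event, required=REQUIRED_FIELDS):
    """Return the required field names that are absent from one listening event."""
    return [name for name in required if name not in event]


# Two made-up events: the first is complete, the second lacks two fields.
sample = json.loads("""
[
    {"DateTime": "2024-12-21T12:00:00Z", "Track Name": "Track A",
     "Artist": "Artist A", "Time Played(ms)": 200000, "skipped": false},
    {"DateTime": "2024-12-21T12:05:00Z", "Track Name": "Track B",
     "Artist": "Artist B"}
]
""")

for index, event in enumerate(sample):
    print(index, missing_fields(event))
```

An empty list means the event has every field the sketch checks for.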

Script Overview

main() function:

  • Reads all JSON files containing Spotify listening data.
  • Aggregates the data across all files and processes it using pandas.
  • For each year, it performs the following:
    • Extracts the first song played.
    • Sums up the total time played.
    • Analyzes the most and least skipped songs.
    • Analyzes the most and least played songs.
    • Saves the results to an Excel file.
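The per-year steps above can be sketched with pandas roughly as follows. This is an illustration under assumptions, not the script's actual code: the events are made up, the per_year dictionary is a hypothetical structure, and only the first-song and total-time steps are shown.

```python
import pandas as pd

# Made-up listening events using the column names from the example JSON.
events = pd.DataFrame({
    "DateTime": ["2023-03-01T10:00:00Z", "2023-01-05T08:30:00Z", "2024-02-10T21:15:00Z"],
    "Track Name": ["Track B", "Track A", "Track C"],
    "Time Played(ms)": [180000, 200000, 240000],
})
events["DateTime"] = pd.to_datetime(events["DateTime"], utc=True)

# Group the events by calendar year and compute two of the per-year metrics.
per_year = {}
for year, group in events.groupby(events["DateTime"].dt.year):
    first = group.sort_values("DateTime").iloc[0]  # earliest event of the year
    per_year[int(year)] = {
        "first_song": first["Track Name"],
        "total_ms": int(group["Time Played(ms)"].sum()),
    }

print(per_year)
```

Sorting within each group matters because the rows in an export are not guaranteed to be in chronological order once several files are combined.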

Analysis Functions:

  • get_years(df): Extracts unique years from the data.
  • first_song(i, df): Prints the first song listened to for the year (or overall).
  • sum_heard_music(i, df): Prints the total time spent listening to music in different units (milliseconds, seconds, minutes, hours, days).
  • filtering(df): Filters the data and computes the total time played, skip count, and play count for each song.
  • most_skipped_least_skipped(df): Finds the most and least skipped songs.
  • most_played_least_played(df): Finds the most and least played songs.
  • save_to_csv(most_skipped, least_skipped, most_played, least_played, year, run_loop, all_lines_df=None): Saves the analysis results to an Excel file.
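To make the aggregation behind these functions more concrete, here is a minimal sketch of one way to compute per-song totals and convert a playback total into larger units. The helper names summarize_tracks and listening_time_units are hypothetical, and the grouping keys are an assumption; the script's own filtering() and sum_heard_music() may work differently.

```python
import pandas as pd


def summarize_tracks(df):
    """Aggregate per-track totals: time played, skip count, and play count."""
    return (
        df.groupby(["Track Name", "Artist"], as_index=False)
          .agg(total_ms=("Time Played(ms)", "sum"),
               skip_count=("skipped", "sum"),    # True counts as 1
               play_count=("skipped", "size"))   # number of events per track
    )


def listening_time_units(total_ms):
    """Express a millisecond total in seconds, minutes, hours, and days."""
    seconds = total_ms / 1000
    minutes = seconds / 60
    hours = minutes / 60
    return {"ms": total_ms, "seconds": seconds, "minutes": minutes,
            "hours": hours, "days": hours / 24}


# Made-up events using the column names from the example JSON.
events = pd.DataFrame({
    "Track Name": ["Track A", "Track A", "Track B", "Track B", "Track B"],
    "Artist": ["Artist A", "Artist A", "Artist B", "Artist B", "Artist B"],
    "Time Played(ms)": [200000, 5000, 180000, 3000, 2000],
    "skipped": [False, True, False, True, True],
})

summary = summarize_tracks(events)
most_skipped = summary.sort_values("skip_count", ascending=False).iloc[0]
most_played = summary.sort_values("play_count", ascending=False).iloc[0]
print(summary)
print(listening_time_units(int(events["Time Played(ms)"].sum())))
```

Grouping on both track name and artist is a design choice in this sketch: it keeps two different songs that share a title from being merged into one row.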

Running the Script

To run the script, execute the Python file:

python main.py

Naming Convention and location of JSON files

  • The script expects the JSON files to follow the naming convention: Streaming_History_Audio_<Year>_<Unique Digit>.json
  • Example: Streaming_History_Audio_2022-2023_1.json, Streaming_History_Audio_2024_2.json, etc.
  • Ensure that your JSON files are placed in a folder named: my_spotify_data\Spotify Extended Streaming History. If your data is stored in a different folder, update the path_to_json variable in the script accordingly.
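As an illustration of the naming convention, the sketch below filters a list of file names with a glob-style pattern. The PATTERN value and the select_history_files helper are assumptions made for this example; the script may match files in a different way.

```python
from fnmatch import fnmatch

# Assumed glob form of the naming convention described above.
PATTERN = "Streaming_History_Audio_*.json"


def select_history_files(names, pattern=PATTERN):
    """Return the file names that match the audio history pattern, sorted."""
    return sorted(name for name in names if fnmatch(name, pattern))


# Made-up folder listing: two audio history files and two unrelated files.
names = [
    "Streaming_History_Audio_2024_2.json",
    "Streaming_History_Video_2024.json",
    "Streaming_History_Audio_2022-2023_1.json",
    "notes.txt",
]

print(select_history_files(names))
```

The same pattern can be passed to pathlib's Path.glob to list matching files in the folder that path_to_json points to.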

Output

The script generates the following output:

  • Excel files for each year analyzed, saved as .xlsx. Each includes:
    • Most skipped tracks.
    • Least skipped tracks.
    • Most played tracks.
    • Least played tracks.
  • A comprehensive file named Entire History.xlsx, containing the combined analysis of all data.
  • A raw data CSV file named Entire raw Audio Data from start to end.csv, containing the entire dataset used for analysis.

Notes

  • The script converts timestamps to Indian Standard Time (IST).
  • Make sure your Spotify JSON files contain the necessary fields for accurate analysis.
  • If no suitable JSON files are found, the script will display an error message in the console.
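The timezone conversion matters for the per-year results, because a play late on 31 December in UTC already falls in the next year in IST. The sketch below shows the idea with the standard library only; it assumes the DateTime values are UTC timestamps in the format shown in the example JSON, and to_ist is a hypothetical helper, not a function from the script.

```python
from datetime import datetime, timedelta, timezone

# IST is a fixed UTC+05:30 offset with no daylight saving time.
IST = timezone(timedelta(hours=5, minutes=30), name="IST")


def to_ist(timestamp):
    """Parse a UTC timestamp such as '2024-12-21T12:00:00Z' and convert it to IST."""
    utc_time = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return utc_time.replace(tzinfo=timezone.utc).astimezone(IST)


converted = to_ist("2024-12-31T20:00:00Z")
print(converted.isoformat())  # 2025-01-01T01:30:00+05:30
print(converted.year)         # 2025
```

With pandas, a timezone-aware column can be converted in a similar way using Series.dt.tz_convert("Asia/Kolkata").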

TODO:

  • A GUI!
    • Select multiple timezone for output
    • Output to CSV should be optional
    • Year Filtering according to user's wishes
  • Because the Skipped/Skipped_Data variable was only added to the extended data in late 2022, find a workaround to identify skipped tracks before 2023.
  • Integrate with the liked songs playlist and then find out what songs you've played the least, most, etc.
  • Simplify the initial data frame input workflow

Disclaimer

This script is provided "as is" without any guarantees or warranties. The author is not liable for any errors, data loss, or other issues that may arise from its use. Use it at your own risk.
