I was given a simulated dataset that mimics customer behavior on the Starbucks rewards mobile app. I performed data engineering, analysis, and visualization to gain more insight into the data. I also built a model that predicts how many of the given offers each user will complete.
- Installation
- Project Motivation
- Libraries
- File Descriptions
- Results
- Licensing, Authors, and Acknowledgements
The code should run without issues on Python 3.x.
I used the simulated dataset to:
- Inspect and process the data, transforming it into more interpretable formats
- Create an offer-user dataset for computing offer responder statistics
- Visualize and determine which demographic groups respond best to each offer type
- Preprocess and merge all available data for each user to produce a complete, clean user-offer dataset for model training
- Apply a machine learning classifier to build a model that predicts how many of the given offers each user will respond to
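The offer-user table and the per-user target described in these steps could be built roughly as follows. This is a minimal sketch on hypothetical toy rows, not the notebook's actual code. It assumes `transcript` has `person`, `event`, and `value` columns, that the offer id inside `value` is stored under either `"offer id"` or `"offer_id"`, and that a completed offer counts as a response.

```python
import pandas as pd

# Hypothetical toy rows shaped like transcript.json; not real data.
transcript = pd.DataFrame({
    "person": ["u1", "u1", "u1", "u2", "u2", "u3"],
    "event": ["offer received", "offer viewed", "offer completed",
              "offer received", "offer viewed", "offer received"],
    "value": [{"offer id": "bogo1"}, {"offer id": "bogo1"}, {"offer_id": "bogo1"},
              {"offer id": "disc1"}, {"offer id": "disc1"}, {"offer id": "bogo1"}],
})
# Hypothetical toy rows shaped like profile.json and portfolio.json.
profile = pd.DataFrame({
    "id": ["u1", "u2", "u3"],
    "gender": ["F", "M", "F"],
    "age": [34, 51, 27],
})
portfolio = pd.DataFrame({"id": ["bogo1", "disc1"], "offer_type": ["bogo", "discount"]})


def extract_offer_id(value):
    """Return the offer id stored in a value dict, accepting either key spelling."""
    return value.get("offer id", value.get("offer_id"))


# Keep only offer events (drops transactions) and pull out the offer id.
offers = transcript[transcript["event"].str.startswith("offer")].copy()
offers["offer_id"] = offers["value"].apply(extract_offer_id)

# One row per (user, offer) with a count of each offer event type.
offer_user = (
    offers.groupby(["person", "offer_id", "event"]).size()
    .unstack("event", fill_value=0)
    .reset_index()
)
offer_user.columns.name = None

# Attach demographics and offer metadata.
offer_user = (
    offer_user.merge(profile, left_on="person", right_on="id")
    .drop(columns="id")
    .merge(portfolio, left_on="offer_id", right_on="id")
    .drop(columns="id")
)

# Share of received offers that were completed, by demographic group and offer type.
offer_user["completed"] = offer_user["offer completed"] > 0
response_rate = offer_user.groupby(["gender", "offer_type"])["completed"].mean()

# Model target: number of offers each user completed.
completions_per_user = offer_user.groupby("person")["offer completed"].sum()
print(response_rate)
print(completions_per_user)
```

On real data, `response_rate` is the kind of table that could feed a seaborn bar plot comparing demographic groups, and `completions_per_user` is one way to define the label the classifier predicts.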
Python libraries used in this project:
- pandas
- numpy
- math
- statistics
- datetime
- json
- matplotlib.pyplot
- seaborn
- sklearn.model_selection
- sklearn.ensemble
- sklearn.metrics
- pickle
Data folder:
- portfolio.json: offer IDs and metadata about each offer (duration, type, etc.)
- profile.json: demographic data for each customer
- transcript.json: records of transactions, offers received, offers viewed, and offers completed
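The three data files can be loaded with pandas. The sketch below assumes each file stores one JSON record per line (newline-delimited JSON). The `load_data` helper and the toy records are hypothetical and only stand in for the real files.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def load_data(folder):
    """Load portfolio, profile, and transcript, assuming one JSON record per line."""
    folder = Path(folder)
    return {
        name: pd.read_json(folder / f"{name}.json", orient="records", lines=True)
        for name in ("portfolio", "profile", "transcript")
    }


# Demo with hypothetical toy records written to a temporary folder.
toy = {
    "portfolio": [{"id": "bogo1", "offer_type": "bogo", "duration": 7}],
    "profile": [{"id": "u1", "gender": "F", "age": 34}],
    "transcript": [{"person": "u1", "event": "offer received",
                    "value": {"offer id": "bogo1"}}],
}
with tempfile.TemporaryDirectory() as tmp:
    for name, records in toy.items():
        lines = "\n".join(json.dumps(record) for record in records)
        Path(tmp, f"{name}.json").write_text(lines)
    data = load_data(tmp)

# Nested objects such as "value" stay as Python dicts in the DataFrame.
print(data["transcript"]["value"][0])
```

Pointed at the project's data folder instead of the temporary one, the same helper would return the three DataFrames keyed by file name, provided the line-delimited assumption holds.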
image folder: contains all the visualization plots produced during the analysis
Starbucks_Capstone_notebook.ipynb: a Jupyter notebook containing all the code used for this analysis
Starbucks_model.sav: the final trained model, saved for use on new data
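Since `pickle` and scikit-learn both appear in the library list, the `.sav` file was presumably written with pickle; that is an assumption. The sketch below trains a `RandomForestClassifier` (one possible choice from `sklearn.ensemble`, not necessarily the one used here) on three hypothetical numeric features, evaluates it, saves it to a `.sav` file, and loads it back to predict. Only unpickle files from sources you trust.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data: three numeric features and a target counting
# completed offers (0, 1, or 2), treated as a class label.
rng = np.random.default_rng(0)
X = rng.integers(18, 90, size=(200, 3))
y = (X[:, 0] > 50).astype(int) + (X[:, 2] > 50).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Save the fitted model to disk, then load it back as new code would.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "Starbucks_model.sav"
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

predictions = loaded.predict(X_test)
```

With the real file, only the loading step would be needed, and the new data would have to contain the same feature columns, in the same order, that the model was trained on.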
The main findings are summarized in the post available here.
You must give credit to Starbucks and Udacity for providing the data. Otherwise, feel free to use the code as you like!