SooyeonWon/sales_forecasting_for_new_stores

Product Sales Forecasting in Retail Industry

Project: Predictive Analytics Capstone

by Sooyeon Won

Keywords

  • Unsupervised Learning
  • Data Visualisations
  • K-Centroids Clustering/ Segmentation
  • Classification
  • Time Series Forecasting

Summary of Findings

In this analysis, I forecast sales for a retail company's newly opening stores. The analysis requires the following datasets:

  • Product category sales records for existing stores
  • Demographic data around existing and new stores
  • Store information for each store

In the first part of the analysis, I explored the company's current situation through visualisations of the following:

  • Traffic Trend
  • Overall Product Sales Trend
  • Demographic Characteristics Comparisons between Existing and New Stores

In the second part of the analysis, I conducted a cluster analysis as follows:

  • K-Centroids Diagnostics based on various Methods
  • K-Centroids Clustering Analysis - Clusters Comparisons
  • Classification Models (Decision Tree, Random Forest, Boosted Model) Comparisons
  • Prediction of the appropriate clusters for new stores
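The clustering and classification steps above can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the project's actual notebook code. The synthetic data, the choice of category sales shares as clustering features, the use of the silhouette score as the single diagnostic, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: 60 existing stores drawn from 3 hidden store formats.
true_format = np.repeat([0, 1, 2], 20)
# Assumed clustering features: share of sales in three product categories.
share_centers = np.array([[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]])
sales_share = share_centers[true_format] + rng.normal(0, 0.02, size=(60, 3))
# Assumed demographic features around each store (e.g. median age, income in k).
demo_centers = np.array([[30.0, 40.0], [45.0, 70.0], [60.0, 100.0]])
demographics = demo_centers[true_format] + rng.normal(0, 2.0, size=(60, 2))

# Standardise so that no single feature dominates the distance metric.
X = StandardScaler().fit_transform(sales_share)

# Diagnostics: score each candidate number of clusters.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)

# Final clustering of the existing stores.
cluster_labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)

# New stores have no sales history, so a classifier maps demographics to clusters.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(demographics, cluster_labels)

# Two hypothetical new stores described only by their demographics.
new_store_demo = np.array([[31.0, 42.0], [59.0, 98.0]])
new_store_clusters = clf.predict(new_store_demo)
print(best_k, new_store_clusters)
```

In practice, several diagnostics and several classifiers (decision tree, random forest, boosted model) would be compared on held-out data before one is chosen. The sketch shows only one of each to keep the flow visible.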

In the last part of the analysis, I forecast product sales for the company's existing and new stores.

  • Time Series Forecasting Models Comparisons
  • Sales Forecasting for the existing and new stores for the next 12 months
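A model comparison of this kind usually follows a holdout pattern: fit each candidate on the earlier months, score it on the most recent months, then refit the winner on the full series. The sketch below illustrates that pattern in plain Python. The two candidates are simple stand-in baselines (seasonal naive and overall mean), not the ETS or ARIMA models used in the project. The synthetic sales series, the 12-month holdout length, and all function names are assumptions made for the example.

```python
import math

def seasonal_naive(train, horizon, season=12):
    """Repeat the last observed season forward for `horizon` steps."""
    start = len(train) - season
    return [train[start + (h % season)] for h in range(horizon)]

def mean_forecast(train, horizon):
    """Forecast every future step as the historical average."""
    avg = sum(train) / len(train)
    return [avg] * horizon

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical monthly sales: 4 years with a mild trend and yearly seasonality.
sales = [100 + 0.2 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(48)]

# Hold out the most recent 12 months for validation (an assumed holdout length).
train, holdout = sales[:-12], sales[-12:]

candidates = {"seasonal_naive": seasonal_naive, "mean": mean_forecast}
results = {}
for name, model in candidates.items():
    predicted = model(train, len(holdout))
    results[name] = {"MAE": mae(holdout, predicted), "RMSE": rmse(holdout, predicted)}

# Pick the candidate with the lowest holdout RMSE.
best_name = min(results, key=lambda name: results[name]["RMSE"])

# Refit on the full history and forecast the next 12 months.
forecast = candidates[best_name](sales, 12)
print(best_name, results[best_name])
```

With real models, each entry in `candidates` would wrap a fitted ETS or seasonal ARIMA model instead of a baseline. The scoring and selection logic stays the same.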

This was the capstone project of a Udacity Nanodegree program, so it combines all the predictive analytics skills covered in the program. The outputs are not exactly the same as the results from Alteryx, the software I used, but they are very close. What matters to me is not just the outcome of the analysis, but a thorough understanding of the predictive analytics techniques hidden behind the Alteryx tools and, of course, practical experience with Python coding.

References

K-Centroids Diagnostics Tool - Alteryx
K-Means Clustering in Python: A Practical Guide
KMeans - scikit-learn
ETS models - statsmodels
SARIMAX - statsmodels
Time Series Forecasting Performance Measures With Python
Seasonal ARIMA with Python
K-Centroids Diagnostics - Alteryx Community
K-Centroids Cluster Analysis Tool - Alteryx
ARIMA Tool - Alteryx