This is a reviewed notebook for the DSN AI+ OAU July 2020 challenge, where I took first position. The project demonstrates exploratory data analysis, categorical data handling, data visualization, feature selection and engineering, model selection, and hyperparameter tuning of various regression algorithms.

DeleLinus/Supermarket-Sales-Prediction

Python Pandas scikit-learn SciPy keras mySQL Jupyter

Product Supermarket Sales Prediction

Abstract

The growth stage of a business life cycle requires strategic and careful measures to be put in place. It comes after the business has been launched, and part of that growth can be expansion into more locations.

Expansion, however, requires careful analysis to know which product types are best suited to a specific location. For example, a tin of milk that sells for N100 in one supermarket branch may sell for N110 at another branch within the same chain of supermarkets. Hence, there is a need to understand which product types, market clusters and supermarket types (location, age, size) will give more margin as the business expands to more locations.

In this analysis, a predictive model is developed using machine learning algorithms to forecast product sales more accurately. The proposed model is specifically aimed at identifying the key characteristics of products and supermarkets that drive sales, so as to be better informed on an optimal template for the expansion of Chukwudi Supermarket to other states in Nigeria. The model is not intended to change current subjective forecasting methods. A model based on real supermarket data is developed in order to validate the use of the various machine learning algorithms.

Introduction

This is an in-house Kaggle competition organized by AI+ OAU, where the task is to predict product supermarket sales. The goal is to help identify the key characteristics of products and supermarkets that drive sales, so as to be better informed on an optimal template for the expansion of Chukwudi Supermarket to other states in Nigeria.

For this problem, I treated the data as a supervised learning problem. To forecast the sales, I compared different regression models: Linear Regression, Decision Tree, ExtraTreeRegressor, Gradient Boosting, Random Forest, XGBoost and a Neural Network.
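The comparison described above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the data is a synthetic stand-in generated with `make_regression`, the helper name `compare_models` is hypothetical, and only the scikit-learn models are included (XGBoost and the neural network are left out to keep the example dependency-free). The cross-validated RMSE metric is also an assumption.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor, ExtraTreeRegressor

# Assumption: synthetic data stands in for the real train table.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# A subset of the regressors named above, with fixed seeds for repeatability.
models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "ExtraTreeRegressor": ExtraTreeRegressor(random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
}


def compare_models(models, X, y, cv=5):
    """Return {model name: mean cross-validated RMSE}, best (lowest) first."""
    results = {}
    for name, model in models.items():
        # scikit-learn reports negated RMSE so that higher is better; flip the sign.
        scores = cross_val_score(
            model, X, y, cv=cv, scoring="neg_root_mean_squared_error"
        )
        results[name] = -scores.mean()
    return dict(sorted(results.items(), key=lambda item: item[1]))


rmse_by_model = compare_models(models, X, y)
for name, rmse in rmse_by_model.items():
    print(f"{name}: RMSE = {rmse:.2f}")
```

Scoring every model with the same cross-validation folds and the same metric keeps the comparison fair before any hyperparameter tuning is attempted.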

The data comes as multiple files, but to demonstrate my SQL proficiency I loaded it into a MySQL database. The train table contains the sales by supermarket, product and so on. The test table contains the same features without the product supermarket sales information, which I am tasked to predict.

Data Sources

The data, which can be seen here, comes as multiple files, but to demonstrate my SQL proficiency I loaded it into a MySQL database. The file containing the MySQL scripts that create the database and its data is saved as chukwudi_supermarket.sql in the db and data scripts folder. The database contains the following tables:

  • train contains the sales by supermarket, product and so on.
  • test contains the same features as train but without the product supermarket sales information.
  • sample_submission contains the supermarket id and dummy product supermarket sales values. This serves as a submission template.
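Reading the three tables into pandas might look like the sketch below. To stay runnable without a database server, it uses an in-memory SQLite database as a stand-in for the MySQL database; with MySQL, the connection would typically come from a database driver or an SQLAlchemy engine instead. The column names (`Supermarket_Id`, `Product_Supermarket_Sales`), the sample rows, and the helper `load_tables` are all hypothetical.

```python
import sqlite3

import pandas as pd

# The table names listed above; anything else is rejected before it reaches SQL.
KNOWN_TABLES = {"train", "test", "sample_submission"}


def load_tables(conn, table_names):
    """Read each named table into a DataFrame, keyed by table name."""
    frames = {}
    for name in table_names:
        if name not in KNOWN_TABLES:
            raise ValueError(f"unexpected table name: {name}")
        frames[name] = pd.read_sql_query(f"SELECT * FROM {name}", conn)
    return frames


# Assumption: in-memory SQLite stands in for the MySQL database,
# with hypothetical columns and rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE train (Supermarket_Id TEXT, Product_Supermarket_Sales REAL);
    CREATE TABLE test (Supermarket_Id TEXT);
    CREATE TABLE sample_submission (Supermarket_Id TEXT, Product_Supermarket_Sales REAL);
    INSERT INTO train VALUES ('S1', 100.0), ('S2', 110.0);
    INSERT INTO test VALUES ('S3');
    INSERT INTO sample_submission VALUES ('S3', 0.0);
    """
)

frames = load_tables(conn, ["train", "test", "sample_submission"])
conn.close()

# The test table has the same features as train, minus the sales target.
print(frames["train"].columns.tolist())
print(frames["test"].columns.tolist())
```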

Installation

Install all requirements by running the following command:

pip install -r requirements.txt

Issues

In case you have any difficulties or issues while trying to run the app, you can raise them in the issues section.

Pull Requests

If you have something to add or a new idea to implement, you are welcome to create a pull request.

Give it a Star

If you find this repo useful, give it a star so that more people can get to know it.

Credits
