Time series data, collected at regular time intervals, is essential for businesses to understand how past events influence the future. Forecasting is the process of estimating future observations based on historical data. Time series forecasting is a statistical method used to analyze time-based patterns in data, helping organizations model and predict future behavior. It serves as a bridge connecting the past, present, and future.
Forecasting is vital in many domains, including supply chain management, stock prediction, weather forecasting, and biomedical monitoring. In this project, we aim to predict store sales using Greykite, a Python forecasting library developed by LinkedIn, and Neural Prophet, a neural-network-based forecasting library inspired by Facebook's Prophet.
The objective is to predict future weekly sales (demand) from historical data and other relevant features, using Greykite and Neural Prophet.
We use Walmart store sales data, which includes historical sales data for 45 Walmart stores located in different regions. Each store contains multiple departments. The dataset comprises four main files:
- stores.csv: Information about the 45 stores, including their type and size.
- train.csv: Historical training data covering the period from 2010-02-05 to 2012-11-01.
- test.csv: Identical in structure to train.csv, except that it omits the weekly sales, which are the values to be predicted.
- features.csv: Additional data related to stores, departments, and regional activity for specific dates.
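As a rough illustration of how the files above fit together, the sketch below joins small stand-in frames the way train.csv, features.csv, and stores.csv could be joined with Pandas. The column names (`Store`, `Dept`, `Date`, `Weekly_Sales`, `Temperature`, `Type`, `Size`), the join keys, and the helper name `merge_sales` are assumptions for illustration, and all values are made up.

```python
import pandas as pd

# Made-up stand-ins for train.csv, features.csv and stores.csv.
# In the project these would be loaded with pd.read_csv(...) instead.
train = pd.DataFrame({
    "Store": [1, 1, 2],
    "Dept": [1, 1, 1],
    "Date": ["2010-02-05", "2010-02-12", "2010-02-05"],
    "Weekly_Sales": [100.0, 150.0, 200.0],
    "IsHoliday": [False, True, False],
})
features = pd.DataFrame({
    "Store": [1, 1, 2],
    "Date": ["2010-02-05", "2010-02-12", "2010-02-05"],
    "Temperature": [40.0, 38.0, 41.0],
    "IsHoliday": [False, True, False],
})
stores = pd.DataFrame({"Store": [1, 2], "Type": ["A", "B"], "Size": [1000, 2000]})


def merge_sales(train, features, stores):
    """Attach per-week features and per-store attributes to each sales row."""
    # Left joins keep every sales row even if a feature row is missing.
    df = train.merge(features, on=["Store", "Date", "IsHoliday"], how="left")
    df = df.merge(stores, on="Store", how="left")
    # Parse dates so the frame can be sorted and resampled chronologically.
    df["Date"] = pd.to_datetime(df["Date"])
    return df.sort_values(["Store", "Dept", "Date"]).reset_index(drop=True)


merged = merge_sales(train, features, stores)
print(merged)
```

Left joins are used here so that a gap in the feature or store data shows up as missing values rather than as silently dropped sales rows.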
The key features in the dataset include:
- Store number
- Date (week)
- Department number
- Average temperature in the region
- Fuel price
- MarkDown1-5 (anonymized data related to promotional markdowns)
- Consumer price index (CPI)
- Unemployment rate
- Special holiday weeks (IsHoliday)
- Weekly sales for a given department in a store
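Several of the columns listed above usually need light preparation before modeling. The sketch below shows one possible approach with Pandas: splitting the date into day, month, and year, encoding `IsHoliday` as 0/1, and filling missing markdowns with 0. The helper name `add_date_features`, the exact column names, and the choice to treat a missing markdown as "no promotion" are assumptions, not something the dataset prescribes.

```python
import pandas as pd


def add_date_features(df):
    """Return a copy with calendar columns, a numeric holiday flag,
    and missing MarkDown values filled with 0."""
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"])
    out["Day"] = out["Date"].dt.day
    out["Month"] = out["Date"].dt.month
    out["Year"] = out["Date"].dt.year
    # Encode the boolean holiday flag as 0/1.
    out["IsHoliday"] = out["IsHoliday"].astype(int)
    # Assumption: a missing markdown means no promotion ran that week.
    markdown_cols = [c for c in out.columns if c.startswith("MarkDown")]
    if markdown_cols:
        out[markdown_cols] = out[markdown_cols].fillna(0.0)
    return out


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Date": ["2010-02-05", "2011-11-25"],
    "IsHoliday": [False, True],
    "MarkDown1": [None, 500.0],
})
prepared = add_date_features(sample)
print(prepared)
```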
- Language: Python
- Libraries: Greykite, Neural Prophet, scikit-learn, Pandas, Pandas Profiling, Matplotlib, Datetime, Plotly, Seaborn, NumPy
- Exploratory Data Analysis (EDA):
  - Feature analysis
  - Data visualization using Pandas Profiling
- Data Cleaning:
  - Handling missing values
  - Detecting and handling outliers
- Feature Engineering:
  - Extracting day, month, and year from the date
  - Mapping and encoding
- Time Series Component Analysis:
  - Analyzing trends and seasonality
- Model Building:
  - Greykite
  - Neural Prophet
- Model Evaluation:
  - Mean Absolute Percentage Error (MAPE)
  - Root Mean Squared Error (RMSE)
- Forecasting Using Trained Models
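For the model evaluation step, the two metrics can be computed directly with NumPy. This is a minimal sketch rather than the project's own implementation: the function names `mape` and `rmse` are hypothetical, the sample numbers are made up, and skipping weeks with zero actual sales in MAPE is an assumption made here to avoid division by zero.

```python
import numpy as np


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Assumption: ignore rows where the actual value is zero,
    # since the percentage error is undefined there.
    mask = actual != 0
    return float(np.mean(np.abs((actual[mask] - predicted[mask]) / actual[mask])) * 100)


def rmse(actual, predicted):
    """Root Mean Squared Error, in the same units as the sales values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Made-up weekly sales and forecasts for illustration.
actual_sales = [100.0, 200.0, 400.0]
forecast_sales = [110.0, 180.0, 400.0]

mape_value = mape(actual_sales, forecast_sales)
rmse_value = rmse(actual_sales, forecast_sales)
print(f"MAPE: {mape_value:.2f}%  RMSE: {rmse_value:.2f}")
```

MAPE is scale-free, which makes it convenient for comparing departments of very different size, while RMSE penalizes large misses more heavily and is reported in sales units.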
- Input: Contains the data files used for analysis (features.csv, stores.csv, test.csv, and train.csv).
- Src: The core of the project, containing modularized code for the various steps:
  - ML_pipeline
  - engine.py
  - server.py
- Output: Contains trained models for future use.
- Lib: Reference materials, including the original IPython notebook.
- requirements.txt: Lists all required libraries and their versions. Install these libraries using `pip install -r requirements.txt`.
Note: For installing the Neural Prophet and Greykite libraries, refer to the document "Steps to Install Neural Prophet and Greykite Libraries."