This project aims to develop a sales forecasting model using data science techniques. It leverages historical sales data to predict future sales trends, enabling better inventory management and sales strategies.
The dataset contains 8,524 rows of sales data with the following key features:
- Item_Identifier: Unique identifier for each product.
- Item_Weight: Weight of the item (in kg).
- Item_Fat_Content: Categorization of item fat content (e.g., Low Fat, Regular).
- Item_Visibility: Visibility score of the item.
- Item_Type: Type of item (e.g., Dairy, Snacks).
- Item_MRP: Maximum retail price (in INR).
- Outlet_Identifier: Unique identifier for each outlet.
- Outlet_Establishment_Year: Year of outlet establishment.
- Outlet_Size: Size of the outlet (e.g., Small, Medium).
- Outlet_Location_Type: Location type of the outlet (e.g., Tier 1, Tier 2).
- Outlet_Type: Type of outlet (e.g., Supermarket).
- Item_Outlet_Sales: Sales of the item at the outlet (in INR).
- Total Items: 3,300 unique items across different categories.
- Top Selling Category: Dairy products account for 35% of total sales. 🥛
- Average Item MRP: ₹150.00
- Highest Sales Record: The highest recorded sales for a single item reached ₹4,022.76. 💰
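Headline figures like these can be computed from the raw table with a few pandas aggregations. The sketch below is minimal and assumes the data has been loaded into a DataFrame that uses the column names listed above. The helper `summarize` and the four-row `toy` frame are hypothetical, and the toy values are made up, so they do not reproduce the project's numbers.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Compute headline statistics of the kind listed above (hypothetical helper)."""
    # Total sales per item category, then each category's share of overall sales.
    sales_by_type = df.groupby("Item_Type")["Item_Outlet_Sales"].sum()
    share = sales_by_type / sales_by_type.sum()
    return {
        "unique_items": int(df["Item_Identifier"].nunique()),
        "top_category": share.idxmax(),
        "top_category_share": float(share.max()),
        "mean_mrp": float(df["Item_MRP"].mean()),
        "max_sales": float(df["Item_Outlet_Sales"].max()),
    }


# Made-up rows for illustration only. In practice, load the project's CSV
# (its file name is not given in this README) with pd.read_csv.
toy = pd.DataFrame({
    "Item_Identifier": ["ITEM01", "ITEM01", "ITEM02", "ITEM03"],
    "Item_Type": ["Dairy", "Dairy", "Snacks", "Household"],
    "Item_MRP": [249.81, 249.81, 48.27, 53.86],
    "Item_Outlet_Sales": [3735.14, 3000.00, 443.42, 994.71],
})

if __name__ == "__main__":
    print(summarize(toy))
```

Counting unique `Item_Identifier` values, rather than rows, matters because the same item appears once per outlet that stocks it.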
- Data Cleaning: Addressed missing values and outliers.
- Exploratory Data Analysis (EDA): Analyzed trends and patterns in sales data.
- Model Development: Implemented machine learning algorithms to forecast sales.
- Performance Evaluation: Evaluated the model using metrics like RMSE and MAE.
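The README does not say which algorithms were used. The following is therefore only a minimal sketch of how the cleaning, modelling, and evaluation steps could fit together in scikit-learn. Several parts are assumptions rather than the project's method: the choice of `RandomForestRegressor`, the median and most-frequent imputation, the feature lists, and the helper names `make_synthetic_sales` and `train_and_evaluate`. The rows are also synthetic, so the resulting metrics say nothing about the project's real performance. RMSE is computed as the square root of `mean_squared_error`, which avoids the `squared` keyword deprecated in recent scikit-learn releases.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Assumed feature selection; the project's actual features are not documented.
CATEGORICAL = ["Item_Fat_Content", "Item_Type", "Outlet_Size", "Outlet_Type"]
NUMERIC = ["Item_Weight", "Item_MRP", "Outlet_Establishment_Year"]
TARGET = "Item_Outlet_Sales"


def make_synthetic_sales(n: int = 500, seed: int = 0) -> pd.DataFrame:
    """Made-up rows using the README's column names; NOT the project's data."""
    rng = np.random.default_rng(seed)
    outlet_type = rng.choice(["Grocery Store", "Supermarket"], size=n)
    mrp = rng.uniform(30, 270, size=n)
    # Assumed relationship: sales grow with MRP, plus a supermarket uplift and noise.
    sales = 15 * mrp + np.where(outlet_type == "Supermarket", 800.0, 0.0) + rng.normal(0, 100, size=n)
    df = pd.DataFrame({
        "Item_Fat_Content": rng.choice(["Low Fat", "Regular"], size=n),
        "Item_Type": rng.choice(["Dairy", "Snacks", "Soft Drinks"], size=n),
        "Outlet_Size": rng.choice(["Small", "Medium", "High"], size=n),
        "Outlet_Type": outlet_type,
        "Item_Weight": rng.uniform(5, 21, size=n),
        "Item_MRP": mrp,
        "Outlet_Establishment_Year": rng.integers(1985, 2010, size=n),
        TARGET: sales,
    })
    # Blank out roughly 10% of weights and outlet sizes to mimic missing values.
    df.loc[rng.random(n) < 0.1, "Item_Weight"] = np.nan
    df.loc[rng.random(n) < 0.1, "Outlet_Size"] = np.nan
    return df


def train_and_evaluate(df: pd.DataFrame, seed: int = 0) -> dict:
    """Fit an impute + encode + random-forest pipeline and report hold-out errors."""
    X, y = df[CATEGORICAL + NUMERIC], df[TARGET]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=seed)
    preprocess = ColumnTransformer([
        # Fill missing categories with the most frequent value, then one-hot encode.
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]), CATEGORICAL),
        # Fill missing numeric values with the training-set median.
        ("num", SimpleImputer(strategy="median"), NUMERIC),
    ])
    model = Pipeline([
        ("preprocess", preprocess),
        ("regressor", RandomForestRegressor(n_estimators=100, random_state=seed)),
    ])
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Naive baseline for comparison: always predict the training-set mean.
    baseline = np.full(len(y_test), y_train.mean())
    return {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "mae": float(mean_absolute_error(y_test, pred)),
        "baseline_rmse": float(np.sqrt(mean_squared_error(y_test, baseline))),
    }


if __name__ == "__main__":
    print(train_and_evaluate(make_synthetic_sales()))
```

Keeping imputation inside the pipeline means the fill values are learned from the training split only, so no information from the test rows leaks into preprocessing. Comparing against the mean-prediction baseline gives RMSE and MAE a reference point.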
- Python
- Pandas
- Scikit-Learn
- Matplotlib
- Seaborn
To run this project locally, follow these steps:
- Clone this repository.
- Install the required packages:
  - Pandas: for data manipulation and analysis
  - NumPy: for numerical computations
  - Matplotlib: for data visualization
  - Seaborn: for statistical data visualization
  - Scikit-Learn: for machine learning algorithms and data preprocessing
  - StatsModels: for statistical modeling
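The packages can be installed with pip, for example `pip install pandas numpy matplotlib seaborn scikit-learn statsmodels`. Afterwards, a small script like the one below can confirm that each package imports and report its version. The `check_environment` helper is hypothetical and not part of the repository. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib

# Import names for the packages listed above (scikit-learn imports as "sklearn").
REQUIRED = ("pandas", "numpy", "matplotlib", "seaborn", "sklearn", "statsmodels")


def check_environment(modules=REQUIRED) -> dict:
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in check_environment().items():
        print(f"{name:12s} {version or 'MISSING'}")
```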
For any queries, feel free to reach out: