Walmart Product ETL Project

Welcome to the Walmart Product ETL Project repository! This project demonstrates the process of extracting, transforming, and loading (ETL) product data from Walmart's website, using Python for web scraping, data manipulation, and visualisation.

Project Overview

This project was developed as part of the TTTC3213 Data Engineering course at Universiti Kebangsaan Malaysia (UKM). It focuses on extracting product data from Walmart's website, transforming it into a structured format, and analysing it to gain insights such as pricing trends, brand distribution, and customer reviews.

Key Objectives:

Scrape product data (e.g., names, prices, ratings) from Walmart's search results.
Clean and transform the scraped data into a structured format.
Visualise the data to uncover trends and insights.

You can read the detailed explanation of this project in our Medium article:
Mastering Walmart Product ETL: A Data Engineering Journey from Web Scraping to Insights.

Features

Web Scraping: Extract product details (name, price, rating, reviews) from Walmart using Python's requests and BeautifulSoup.
Data Transformation: Clean and structure the data using pandas, including handling missing values and categorising prices.
Data Visualisation: Generate insightful visualisations such as pie charts, scatter plots, and heatmaps using matplotlib and seaborn.
Data Export: Save the cleaned dataset as a CSV file for further analysis.

Technologies Used

The following Python libraries were used in this project:

Web Scraping: requests, BeautifulSoup
Data Manipulation: pandas, numpy
Visualisation: matplotlib, seaborn
Others: json, re, time

Installation

Clone this repository:

git clone https://github.com/jonathanfernandi/Walmart-Product-ETL.git
cd Walmart-Product-ETL

Create a virtual environment (optional but recommended):

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the required dependencies:
```
pip install -r requirements.txt
```

Usage

Open the Jupyter Notebook file (Walmart_Product_ETL.ipynb) in your preferred IDE or Jupyter environment.
Update the search term in the code to scrape specific product categories:
```
search_term = "laptop"
```
Run all cells to execute the ETL pipeline:
- Extract product data from Walmart.
- Transform and clean the data.
- Visualize insights and save the cleaned dataset.
The final cleaned dataset will be saved as a CSV file:
```
walmart_products.csv
```

Project Structure

Walmart-Product-ETL/
│
├── Walmart_Product_ETL.ipynb    # Main Jupyter Notebook for the ETL process
├── requirements.txt             # List of required Python libraries
├── walmart_products.csv         # Cleaned dataset (generated after running the notebook)
└── README.md                    # Project documentation

Results and Visualisations

1. Pie Chart: Brand Distribution

Displays the proportion of popular laptop brands in the dataset.

2. Heat Map: Feature Correlations

Shows correlations between numerical features like price, rating, reviews, and RAM size.

3. Scatter Plots

RAM vs Price: Highlights how RAM size influences laptop prices.
Rating vs Price: Explores relationships between customer ratings and price categories.

4. Count Plots

Visualises distributions of RAM sizes and ratings across products.

5. KDE Plot

Shows density distributions of ratings across different price categories.

For detailed visualisations, refer to our Medium article linked above.

Contributors

This project was developed by Group INT-2 for the Data Engineering course:

Jonathan Alvindo Fernandi (A207961)
Kevin Maverick (A208051)
Lai Junlin (A197837)

Thank you for exploring our project! If you have any questions or feedback, feel free to reach out or open an issue in this repository!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Walmart Product ETL Project

Project Overview

Features

Technologies Used

Installation

Usage

Project Structure

Results and Visualisations

1. Pie Chart: Brand Distribution

2. Heat Map: Feature Correlations

3. Scatter Plots

4. Count Plots

5. KDE Plot

Contributors

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
README.md		README.md
Walmart_Product_ETL.ipynb		Walmart_Product_ETL.ipynb
walmart_products.csv		walmart_products.csv

jonathanfernandi/Walmart-Product-ETL

Folders and files

Latest commit

History

Repository files navigation

Walmart Product ETL Project

Project Overview

Features

Technologies Used

Installation

Usage

Project Structure

Results and Visualisations

1. Pie Chart: Brand Distribution

2. Heat Map: Feature Correlations

3. Scatter Plots

4. Count Plots

5. KDE Plot

Contributors

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages