This capstone project classifies images into one of six scene categories using a deep learning model:
[ Buildings, Forest, Glacier, Mountain, Sea, and Street ].
The goals are to:
- Train a robust image classifier using transfer learning.
- Deploy the model as a web application using Streamlit.
- Package the deployment using Docker for easy reproducibility.
The dataset used for this project is the Intel Image Classification Dataset, which consists of labeled images for six different scene categories. The dataset is preprocessed and split into training, validation, and test sets.
Kaggle Link: Intel Image Classification on Kaggle
To better understand the dataset, we performed an extensive EDA, which includes:
- Data Structure: Verified the organization of training, validation, and test sets.
- Image/Label Shape: Checked image consistency and label correctness.
- Class Distribution: Ensured balanced distribution across categories.
- Sample Visualization: Displayed sample images to confirm dataset integrity.
For dataset analysis, refer to the Jupyter Notebook in the repository.
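The class-distribution check above can be sketched as a small helper that counts images per class, assuming the dataset follows the Kaggle layout of one subdirectory per class label (the path in the usage comment is an assumption; adjust it to your local setup):

```python
from collections import Counter
from pathlib import Path

def class_distribution(split_dir: str) -> Counter:
    """Count images per class, assuming one subdirectory per class label."""
    counts = Counter()
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.suffix.lower() in {".jpg", ".jpeg", ".png"}
            )
    return counts

# Example (path is an assumption based on the Kaggle archive layout):
# print(class_distribution("seg_train/seg_train"))
```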
- Architecture: Pretrained Xception model with fine-tuning.
- Loss Function: Categorical Cross-Entropy.
- Optimizer: Adam Optimizer.
- Evaluation Metrics: Accuracy, Validation Loss.
- Data Augmentation: Applied techniques like flipping, rotation, and brightness adjustment.
- ModelCheckpoint: Saves the best-performing model.
- EarlyStopping: Stops training if no improvement is observed.
- ReduceLROnPlateau: Reduces learning rate on validation loss plateau.
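The training setup above can be sketched as follows. This is a minimal illustration, not the repository's exact `train.py`: the input size, learning rate, checkpoint filename, and patience values are assumptions, and `weights=None` keeps the sketch offline (real training would use `weights="imagenet"` for transfer learning):

```python
from tensorflow import keras

NUM_CLASSES = 6
IMG_SIZE = (150, 150)  # Intel images are 150x150; an assumption for this sketch

def build_model() -> keras.Model:
    # Augmentation mirrors the techniques listed above: flip, rotation, brightness.
    augment = keras.Sequential([
        keras.layers.RandomFlip("horizontal"),
        keras.layers.RandomRotation(0.1),
        keras.layers.RandomBrightness(0.2),  # requires TF/Keras >= 2.9
    ])
    # weights=None keeps this sketch offline; use weights="imagenet" in practice.
    base = keras.applications.Xception(
        include_top=False, weights=None, input_shape=IMG_SIZE + (3,), pooling="avg"
    )
    inputs = keras.Input(shape=IMG_SIZE + (3,))
    x = augment(inputs)
    x = keras.applications.xception.preprocess_input(x)
    x = base(x)
    outputs = keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer=keras.optimizers.Adam(1e-3),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# Callbacks matching the list above (filename and patience values are assumptions):
callbacks = [
    keras.callbacks.ModelCheckpoint("best_model.keras", save_best_only=True,
                                    monitor="val_loss"),
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                  restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2),
]
```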
This model is deployed using Streamlit and Docker.
- Upload an image (JPG, PNG, JPEG).
- Select image processing method:
- Resize with padding (Maintain Aspect Ratio).
- Central crop (Remove edges).
- Prediction: The model classifies the image and outputs:
- Predicted Class.
- Confidence Score.
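The two preprocessing options above can be sketched with `tf.image`. The 150x150 target size is an assumption matching the Intel dataset; the app's actual implementation may differ:

```python
import tensorflow as tf

TARGET = 150  # model input size; an assumption for this sketch

def resize_with_padding(img: tf.Tensor) -> tf.Tensor:
    """Resize preserving aspect ratio, zero-padding to a square."""
    return tf.image.resize_with_pad(img, TARGET, TARGET)

def central_crop(img: tf.Tensor) -> tf.Tensor:
    """Crop the central square region, then resize to the model input size."""
    h = tf.shape(img)[0]
    w = tf.shape(img)[1]
    side = tf.minimum(h, w)
    img = tf.image.crop_to_bounding_box(
        img, (h - side) // 2, (w - side) // 2, side, side
    )
    return tf.image.resize(img, (TARGET, TARGET))
```

After either transform, add a batch dimension and run the model; the predicted class is the argmax of the softmax output, and the confidence score is its maximum probability.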
- Streamlit Cloud for web deployment. [Live Demo: Image Classifier]
- Google Drive (gdown) for downloading model weights.
- Docker for containerized application deployment.
git clone https://github.com/Pei-Tong/ml-zoomcamp-capstone.git
cd ml-zoomcamp-capstone
pip install -r requirements.txt
python train.py
streamlit run app.py
docker build -t image-classifier .
docker run -p 5000:5000 image-classifier
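A minimal Dockerfile consistent with the `docker run -p 5000:5000` command above might look like this; the base image, flags, and port are assumptions, and the repository's actual Dockerfile may differ:

```dockerfile
FROM python:3.10-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 5000
CMD ["streamlit", "run", "app.py", "--server.port=5000", "--server.address=0.0.0.0"]
```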
Live Demo: Image Classifier
File | Description |
---|---|
app.py | Streamlit UI for the image classification app. |
train.py | Model training script using TensorFlow. |
predict.py | Model inference script. |
service.py | BentoML service file for model serving. |
Dockerfile | Docker setup for deployment. |
requirements.txt | Required dependencies for the project. |
The final model achieves roughly 89% accuracy on the held-out test set. The evaluation metrics are as follows:
Metric | Score |
---|---|
Training Accuracy | 88.05% |
Validation Accuracy | 89.68% |
Test Accuracy | 89.27% |
The following is a screenshot of the image classification app interface:
Criteria | Status |
---|---|
Problem Description | ✅ Described in README |
EDA | ✅ Included dataset analysis in Jupyter Notebook |
Model Training | ✅ Used transfer learning, tuning, and augmentation |
Exporting Notebook to Script | ✅ train.py provided for training |
Reproducibility | ✅ Dataset link provided, setup instructions included |
Model Deployment | ✅ Streamlit & Docker used for deployment |
Dependency Management | ✅ requirements.txt included |
Containerization | ✅ Dockerfile provided |
Cloud Deployment | ✅ Hosted on Streamlit Cloud |
- Fine-tuning the model with larger datasets.
- Implementing real-time image classification.
- Deploying on cloud platforms like AWS/GCP.