Skip to content

gittysam1/Email-spam-classification

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Email Spam Classification Project

Overview

This project aims to develop a machine learning model for classifying emails as spam or not spam (ham) based on their content. The model is trained using a dataset containing labeled examples of spam and ham emails. Various preprocessing techniques and classification algorithms are employed to build an efficient email spam classifier.

Features

  • Dataset preprocessing: The dataset is cleaned, missing values are handled, and duplicate entries are removed. Text preprocessing techniques such as tokenization, removing stopwords, punctuation, and stemming are applied to clean the text data.
  • Model Training: Several classification algorithms including Naive Bayes, Support Vector Machines, Decision Trees, Logistic Regression, Random Forest, AdaBoost, Bagging, Extra Trees, Gradient Boosting, and XGBoost are trained and evaluated for accuracy and precision.
  • Model Evaluation: Trained models are evaluated using accuracy and precision metrics. The best-performing models are identified based on precision scores.
  • Deployment: A user-friendly web application is developed using Streamlit framework to deploy the best-performing model. Users can input a message, and the application predicts whether the message is spam or not spam.

Use Cases

  • Email Filtering: The model can be integrated into email servers to automatically filter out spam emails, thereby reducing clutter in users' inboxes and improving productivity.
  • Fraud Detection: Similar techniques can be applied to classify fraudulent messages in banking or online platforms, enhancing security and trust.
  • Content Moderation: Social media platforms can utilize this model to detect and filter out spammy or abusive content posted by users, ensuring a safer and more enjoyable user experience.

Requirements

  • Python 3.x
  • Libraries: numpy, pandas, scikit-learn, nltk, matplotlib, seaborn, wordcloud, streamlit

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published