Commit

upload to github
narittt committed Jun 21, 2023
1 parent a8c3d48 commit 71c58f4
Showing 19 changed files with 14,744 additions and 0 deletions.
Binary file added .DS_Store
Binary file not shown.
20 changes: 20 additions & 0 deletions README.md
@@ -0,0 +1,20 @@
# Email Spam Filter

This project, built for my data analysis and visualization class, is an email spam filter that uses supervised learning to classify emails as either spam (unwanted) or ham (legitimate).

I used two supervised learning algorithms, K Nearest Neighbors (KNN) and Naive Bayes, and compared their performance. To train and evaluate these classifiers, I used the Enron spam email dataset, which consists of approximately 34,000 emails. Once the classifiers were trained, I ran them in a Jupyter Notebook to predict whether new emails are spam or ham.
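
To make the comparison concrete, here is a minimal NumPy sketch of the two algorithms on toy word-count features. It is an illustration under stated assumptions, not the code from this repository: the function names (`knn_predict`, `nb_train`, `nb_predict`), the Euclidean distance, the Laplace smoothing, and the label encoding (1 = spam, 0 = ham) are all choices made for this example.

```python
import numpy as np


def knn_predict(train_x, train_y, test_x, k=3):
    """Predict each test label by majority vote among the k nearest training samples."""
    preds = np.empty(len(test_x), dtype=int)
    for i, sample in enumerate(test_x):
        # Euclidean distance from this test sample to every training sample
        dists = np.sqrt(((train_x - sample) ** 2).sum(axis=1))
        nearest = np.argsort(dists)[:k]
        preds[i] = np.bincount(train_y[nearest]).argmax()
    return preds


def nb_train(train_x, train_y, num_classes=2):
    """Estimate log priors and Laplace-smoothed log word likelihoods for each class."""
    num_features = train_x.shape[1]
    log_priors = np.zeros(num_classes)
    log_likelihoods = np.zeros((num_classes, num_features))
    for c in range(num_classes):
        class_x = train_x[train_y == c]
        log_priors[c] = np.log(len(class_x) / len(train_x))
        word_totals = class_x.sum(axis=0)
        # Add-one smoothing keeps unseen words from producing log(0)
        log_likelihoods[c] = np.log((word_totals + 1) / (word_totals.sum() + num_features))
    return log_priors, log_likelihoods


def nb_predict(test_x, log_priors, log_likelihoods):
    """Pick the class with the highest log posterior for each test sample."""
    return (log_priors + test_x @ log_likelihoods.T).argmax(axis=1)


# Toy data (hypothetical): columns count the words "money" and "meeting".
# Assumed label encoding: 1 = spam, 0 = ham.
train_x = np.array([[3, 0], [2, 1], [0, 3], [1, 2]])
train_y = np.array([1, 1, 0, 0])
test_x = np.array([[4, 0], [0, 4]])
test_y = np.array([1, 0])

knn_preds = knn_predict(train_x, train_y, test_x, k=3)
nb_preds = nb_predict(test_x, *nb_train(train_x, train_y))
knn_accuracy = (knn_preds == test_y).mean()
nb_accuracy = (nb_preds == test_y).mean()
```

Comparing `knn_accuracy` and `nb_accuracy` on a held-out test set is the simplest way to contrast the two classifiers. Note that KNN defers all of its work to prediction time, while Naive Bayes does its counting once during training.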

## Goals

- Explore and implement the KNN and Naive Bayes algorithms.
- Gain hands-on experience in preprocessing text data, specifically converting emails into numeric features suitable for model processing.
- Set up a supervised learning problem and analyze the results.
- Understand and follow a typical end-to-end supervised machine learning workflow.
- Work with a large, real text dataset.
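
The preprocessing goal above, converting emails into numeric features, can be sketched as a word-count (bag-of-words) representation. This is one common approach and an assumption here; the helper names (`tokenize`, `build_vocab`, `count_features`), the regex tokenizer, and the vocabulary size are hypothetical and may differ from what the project actually does.

```python
import re
from collections import Counter

import numpy as np


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(emails, num_words):
    """Return the `num_words` most frequent words across all emails."""
    counts = Counter()
    for email in emails:
        counts.update(tokenize(email))
    return [word for word, _ in counts.most_common(num_words)]


def count_features(emails, vocab):
    """Build an (n_emails, len(vocab)) matrix of per-email word counts."""
    index = {word: j for j, word in enumerate(vocab)}
    features = np.zeros((len(emails), len(vocab)), dtype=int)
    for i, email in enumerate(emails):
        for word in tokenize(email):
            j = index.get(word)
            if j is not None:  # words outside the vocabulary are ignored
                features[i, j] += 1
    return features


# Two made-up emails standing in for the real dataset
emails = ["Win money now, win big", "Meeting notes for Monday"]
vocab = build_vocab(emails, num_words=3)
x = count_features(emails, vocab)
```

Each row of `x` is one email and each column is one vocabulary word, which is the shape a KNN or Naive Bayes classifier expects. Restricting the vocabulary to the most frequent words keeps the feature matrix manageable on a dataset of this size.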

## Dataset

I used the Enron spam email dataset for this project. You can download the dataset using the following links:

- [Enron Emails Dataset - Dev Set](https://cs.colby.edu/courses/S23/cs251/projects/p6spam/data/enron_dev.zip)
- [Enron Emails Dataset](https://cs.colby.edu/courses/S23/cs251/projects/p6spam/data/enron.zip)
Binary file added data/.DS_Store
Binary file not shown.
Binary file added data/email_test_inds.npy
Binary file not shown.
Binary file added data/email_test_x.npy
Binary file not shown.
Binary file added data/email_test_y.npy
Binary file not shown.
Binary file added data/email_train_inds.npy
Binary file not shown.
Binary file added data/email_train_x.npy
Binary file not shown.
Binary file added data/email_train_y.npy
Binary file not shown.