# Email Spam Filter

This project builds an email spam filter using supervised learning. The filter classifies each email as either spam (unwanted) or ham (legitimate). I built it for my data analysis and visualization class.

I used two supervised learning algorithms, K-Nearest Neighbors (KNN) and Naive Bayes, and compared their performance. To train and evaluate these classifiers, I used the Enron spam email dataset, which contains approximately 34,000 emails. Once the classifiers were trained, I used them in a Jupyter Notebook to predict whether new emails are spam or ham.

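Below is a minimal sketch of one way a Naive Bayes spam classifier can work. It is not the code used in this project. It assumes a multinomial Naive Bayes model over word counts with add-one (Laplace) smoothing, a simple lowercase alphabetic tokenizer, and a tiny made-up training set in place of the Enron emails. The names `tokenize` and `NaiveBayesSpamFilter` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase an email and split it into alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing over word counts."""

    def fit(self, emails, labels):
        self.classes = sorted(set(labels))
        # Per-class word counts aggregated over all training emails of that class.
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(emails, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        # Log prior: fraction of training emails that belong to each class.
        self.log_priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        # Total number of tokens seen per class (denominator of the word likelihood).
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_likelihood(self, word, c):
        # Add 1 to every count so a word never seen in class c does not zero out its score.
        return math.log((self.word_counts[c][word] + 1) / (self.totals[c] + len(self.vocab)))

    def predict(self, text):
        # Words outside the training vocabulary are ignored.
        tokens = [w for w in tokenize(text) if w in self.vocab]
        # Sum log probabilities instead of multiplying raw probabilities to avoid underflow.
        scores = {
            c: self.log_priors[c] + sum(self.log_likelihood(w, c) for w in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Tiny made-up training set, for illustration only (not Enron data).
train_emails = [
    "win money now claim your free prize",
    "free offer click now to win cash",
    "meeting moved to monday please confirm",
    "attached is the quarterly report for review",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayesSpamFilter().fit(train_emails, train_labels)
print(model.predict("claim your free cash prize now"))          # spam
print(model.predict("please review the report before monday"))  # ham
```

Training only needs to count words per class, so fitting is a single pass over the emails. Working in log space keeps the scores numerically stable even for long emails.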
## Goals

- Explore and implement the KNN and Naive Bayes algorithms.
- Gain hands-on experience in preprocessing text data, specifically converting emails into numeric features suitable as model input.
- Set up a supervised learning problem and analyze the results.
- Understand and follow a typical end-to-end supervised machine learning workflow.
- Work with a large, real text dataset.

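The following is a minimal sketch of converting emails into numeric features and classifying them with KNN. It is an illustration, not this project's implementation. It assumes a bag-of-words representation (counts of the most frequent training words), Euclidean distance, and majority voting among the `k` nearest neighbors. The helper names and the toy emails are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase an email and split it into alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(emails, num_features):
    """Return the num_features most frequent words across the training emails."""
    counts = Counter()
    for text in emails:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(num_features)]


def to_feature_vector(text, vocab):
    """Encode an email as the number of times each vocabulary word appears in it."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]


def knn_predict(train_vectors, train_labels, query, k):
    """Return the majority label among the k training vectors closest to query (Euclidean)."""
    neighbors = sorted(
        (math.dist(vector, query), label)
        for vector, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Tiny made-up training set, for illustration only (not Enron data).
train_emails = [
    "win money now claim your free prize",
    "free offer click now to win cash",
    "meeting moved to monday please confirm",
    "attached is the quarterly report for review",
]
train_labels = ["spam", "spam", "ham", "ham"]

# num_features is large enough here to keep every word; a real corpus would use a cutoff.
vocab = build_vocabulary(train_emails, num_features=50)
train_vectors = [to_feature_vector(text, vocab) for text in train_emails]

for email in ["claim your free cash prize now", "please review the report before monday"]:
    query = to_feature_vector(email, vocab)
    print(email, "->", knn_predict(train_vectors, train_labels, query, k=3))
```

Unlike Naive Bayes, KNN has no real training step: every prediction compares the query against all stored training vectors, so prediction cost grows with the size of the training set.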
## Dataset

I used the Enron spam email dataset for this project. You can download the dataset from the following links:

- [Enron Emails Dataset - Dev Set](https://cs.colby.edu/courses/S23/cs251/projects/p6spam/data/enron_dev.zip)
- [Enron Emails Dataset](https://cs.colby.edu/courses/S23/cs251/projects/p6spam/data/enron.zip)