Email Spam Filter

This project, built for my data analysis and visualization class, is a supervised-learning email spam filter that classifies emails as either spam (unwanted) or ham (legitimate).

I used two supervised learning algorithms, K-Nearest Neighbors (KNN) and Naive Bayes, and compared their performance. To train and evaluate the classifiers, I used the Enron spam email dataset, which consists of approximately 34,000 emails. Once the classifiers were trained, I ran them in a Jupyter Notebook to predict whether new emails are spam or ham.
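To make the comparison concrete, here is a minimal sketch of both classifiers in plain Python. It is not the code in knn.py or naive_bayes.py; the function names, the four-word vocabulary, and the toy count vectors are all hypothetical, and it assumes Euclidean distance for KNN and a multinomial Naive Bayes with add-one (Laplace) smoothing.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training vectors."""
    # Pair each training label with its Euclidean distance to x, nearest first.
    neighbors = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def nb_train(train_X, train_y):
    """Estimate log priors and add-one-smoothed log word likelihoods per class."""
    n_features = len(train_X[0])
    log_prior, log_like = {}, {}
    for c in sorted(set(train_y)):
        rows = [row for row, label in zip(train_X, train_y) if label == c]
        log_prior[c] = math.log(len(rows) / len(train_X))
        # Total count of each word across all emails of class c.
        totals = [sum(col) for col in zip(*rows)]
        denom = sum(totals) + n_features  # add-one smoothing
        log_like[c] = [math.log((t + 1) / denom) for t in totals]
    return log_prior, log_like


def nb_predict(model, x):
    """Return the class with the highest log posterior score for x."""
    log_prior, log_like = model
    scores = {
        c: log_prior[c] + sum(n * w for n, w in zip(x, log_like[c]))
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical word counts for the vocabulary ["free", "money", "meeting", "report"].
train_X = [[3, 2, 0, 0], [2, 3, 0, 1], [0, 0, 2, 3], [0, 1, 3, 2]]
train_y = ["spam", "spam", "ham", "ham"]
model = nb_train(train_X, train_y)

new_email = [2, 1, 0, 0]  # mentions "free" twice and "money" once
print(knn_predict(train_X, train_y, new_email, k=3))
print(nb_predict(model, new_email))
```

Both classifiers label the toy email as spam. KNN does no training and compares the new email against every stored example at prediction time, while Naive Bayes summarizes the training set into per-class word probabilities once, which is one reason the two can differ in speed and accuracy on a large dataset.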

Goals

Explore and implement the KNN and Naive Bayes algorithms.
Gain hands-on experience in preprocessing text data, specifically converting emails into numeric features suitable for model processing.
Set up a supervised learning problem and analyze the results.
Understand and follow a typical end-to-end supervised machine learning workflow.
Work with a large, real text dataset.
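
The preprocessing and evaluation goals above can be sketched as follows. This is an illustration under stated assumptions, not the contents of email_preprocessor.py: the helper names, the regex tokenizer, the tiny vocabulary size, and the three sample emails are all hypothetical. It assumes a bag-of-words representation in which each email becomes a vector of counts over the most frequent words in the dataset.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(emails, num_words):
    """Return the num_words most frequent words across all emails."""
    counts = Counter(word for email in emails for word in tokenize(email))
    return [word for word, _ in counts.most_common(num_words)]


def to_count_vector(email, vocabulary):
    """Count how often each vocabulary word appears in one email."""
    counts = Counter(tokenize(email))
    return [counts[word] for word in vocabulary]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical sample emails, standing in for the real dataset.
emails = [
    "Free money! Claim your free prize",
    "Meeting moved to Monday",
    "Free meeting notes",
]
vocabulary = build_vocabulary(emails, num_words=2)
features = [to_count_vector(email, vocabulary) for email in emails]

print(vocabulary)
print(features)
print(accuracy(["spam", "ham", "ham"], ["spam", "ham", "spam"]))
```

In a full workflow, the vocabulary would be built from the training split only and then reused to vectorize the test split, so that the accuracy reflects emails the classifier has not seen.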

Dataset

I used the Enron spam email dataset for this project. You can download the dataset using the following links:

