Email spam classifier based on supervised learning algorithms. Compares the performance of KNN vs. a Naive Bayes approach on the Enron emails dataset.

narittt/spam-classifier

Email Spam Filter

This project, built for my data analysis and visualization class, creates an email spam filter based on supervised learning that classifies emails as either spam (unwanted) or ham (legitimate).

I used two supervised learning algorithms, K Nearest Neighbors (KNN) and Naive Bayes, and compared their performance. To train and evaluate the classifiers, I used the Enron spam email dataset, which consists of approximately 34,000 emails. Once the classifiers were trained, I ran them in a Jupyter Notebook to predict whether new emails are spam or ham.
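
To make the two approaches concrete, here is a minimal sketch of both classifiers in plain Python. It is not the code from this repository: the toy training data, the three-word vocabulary, and the function names `knn_predict` and `nb_predict` are all assumptions made for illustration. KNN labels an email by majority vote among its nearest training examples, while this Naive Bayes variant (multinomial, with Laplace smoothing) picks the class with the highest log-probability.

```python
import math
from collections import Counter

# Hypothetical toy data: each row holds counts of the words
# ["free", "money", "meeting"] in one email.
train_x = [[3, 2, 0], [2, 3, 0], [0, 0, 2], [0, 1, 3]]
train_y = ["spam", "spam", "ham", "ham"]


def knn_predict(x, k=3):
    """Label x by majority vote among its k nearest training rows (Euclidean)."""
    dists = sorted(
        (math.dist(x, row), label) for row, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def nb_predict(x, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    best_label, best_score = None, -math.inf
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        word_totals = [sum(col) for col in zip(*rows)]  # per-word counts in class
        total = sum(word_totals)
        # Start from the log prior, then add each word's log-likelihood.
        score = math.log(len(rows) / len(train_x))
        for count, word_total in zip(x, word_totals):
            prob = (word_total + alpha) / (total + alpha * len(x))
            score += count * math.log(prob)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


new_email = [2, 2, 0]  # mentions "free" and "money" twice each
print(knn_predict(new_email), nb_predict(new_email))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together, which matters once the vocabulary grows beyond a toy example.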

Goals

  • Explore and implement the KNN and Naive Bayes algorithms.
  • Gain hands-on experience in preprocessing text data, specifically converting emails into numeric features suitable for model processing.
  • Set up a supervised learning problem and analyze the results.
  • Understand and follow a typical end-to-end supervised machine learning workflow.
  • Work with a large, real text dataset.
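
As an illustration of the preprocessing goal above, the following sketch turns raw email text into word-count vectors (a bag-of-words representation). The sample emails, the vocabulary size, and the helper names `tokenize`, `build_vocab`, and `to_features` are assumptions for this example and may differ from what the project actually does.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase an email body and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(emails, top_n=3):
    """Return the top_n most frequent words across all emails."""
    counts = Counter(word for email in emails for word in tokenize(email))
    # Sort by frequency (descending), then alphabetically, for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


def to_features(email, vocab):
    """Count how often each vocabulary word appears in one email."""
    counts = Counter(tokenize(email))
    return [counts[word] for word in vocab]


# Hypothetical sample emails, not taken from the Enron dataset.
emails = [
    "Free money now, free prizes!",
    "Meeting moved to Monday",
    "Free meeting room",
]
vocab = build_vocab(emails, top_n=3)
features = [to_features(email, vocab) for email in emails]
print(vocab)
print(features)
```

Limiting the vocabulary to the most frequent words keeps the feature vectors short, which is one common way to make a dataset of tens of thousands of emails manageable for KNN and Naive Bayes.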

Dataset

I used the Enron spam email dataset for this project. You can download the dataset using the following links:
