
Email Spam Classification with Spark streaming and Predictive Data Modelling


Abhishek4848/Email_Spam_Classifier


Real-world Spark Streaming and predictive data modelling.

Email Spam Classification

  1. Each record consists of 3 fields: the subject, the email content, and the label.

  2. Each email belongs to one of 2 classes: spam or ham.

  3. The training set has 30k examples and the test set has 3k.

Dataset Link: Email spam

How to run

Run the Python file, which sends the data over a TCP connection:

python3 stream.py -f <dataset name> -b <batch size>
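The repository's stream.py is not reproduced here, so the following is only a minimal sketch of how a sender of this kind might batch records and push them over TCP. The field names (`subject`, `content`, `label`), the newline-delimited JSON framing, port `6100`, and all function names are assumptions made for illustration, not the project's actual code.

```python
import json
import socket
import time
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, str]


def make_batches(records: Iterable[Record], batch_size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def encode_batch(batch: List[Record]) -> bytes:
    """Serialize one batch as a single JSON line (assumed wire format)."""
    return (json.dumps(batch) + "\n").encode("utf-8")


def serve_batches(records: Iterable[Record], batch_size: int,
                  host: str = "localhost", port: int = 6100,
                  delay: float = 1.0) -> None:
    """Wait for one client (e.g. the Spark job) and send it every batch.

    host, port and delay are hypothetical defaults.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen(1)
        conn, _ = server.accept()
        with conn:
            for batch in make_batches(records, batch_size):
                conn.sendall(encode_batch(batch))
                time.sleep(delay)  # pace the stream between batches
```

Under these assumptions, calling `serve_batches(rows, batch_size=1000)` with `rows` loaded from the dataset would block until a client connects and then send one JSON line per batch.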

Then run spark_fetch.py with spark-submit, redirecting stderr to log.txt:

$SPARK_HOME/bin/spark-submit spark_fetch.py 2>log.txt
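As a rough illustration of the receiving side, the sketch below reads lines from a TCP socket with Spark's legacy DStream API and flattens each line into individual records. It is not the repository's spark_fetch.py: the one-JSON-array-per-line format, port `6100`, the 5-second batch interval, and the names `parse_batch` and `run_stream` are all assumptions.

```python
import json
from typing import Dict, List


def parse_batch(line: str) -> List[Dict[str, str]]:
    """Decode one JSON line into a list of records; malformed input yields []."""
    try:
        batch = json.loads(line)
    except json.JSONDecodeError:
        return []
    if not isinstance(batch, list):
        return []
    return [record for record in batch if isinstance(record, dict)]


def run_stream(host: str = "localhost", port: int = 6100,
               interval_seconds: int = 5) -> None:
    """Connect to the sender and count the records in each micro-batch.

    host, port and interval_seconds are hypothetical defaults.
    """
    # Imported here so parse_batch can be used without PySpark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="EmailSpamStream")
    ssc = StreamingContext(sc, interval_seconds)

    lines = ssc.socketTextStream(host, port)  # one text line per sent batch
    records = lines.flatMap(parse_batch)      # one element per email record
    records.foreachRDD(
        lambda rdd: print("records in micro-batch:", rdd.count())
    )

    ssc.start()
    ssc.awaitTermination()
```

A script that calls `run_stream()` at module level could be launched with spark-submit in the same way as the command above; a real job would replace the `print` with feature extraction and model training or prediction.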

Demo to run

The batch size still needs experimentation (values > 1000).

Screenshot: running the stream.py file

Screenshot: running the spark_fetch.py file