kanagalingamsm/Email-Spam-Detection


EMAIL SPAM DETECTION

20945480

One of the primary methods for spam mail detection is email filtering. It involves categorizing incoming emails as spam or non-spam. Machine learning algorithms can be trained to filter out spam mails based on their content and metadata.

DESCRIPTION

• The project code is written entirely in Python.

• The dataset is taken from Kaggle: https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset/code

• Packages used: pandas, nltk, sklearn, seaborn, matplotlib, and tqdm, along with the standard-library modules re and time (which need no installation).

• The operations performed are data preprocessing, NLP, classification, and generation of a classification report.

• Logistic Regression is used as the classification model, applied to the text features produced by the NLP operations, with the aim of high accuracy.

• The confusion matrix is visualised as a heatmap to give a clear view of the classification model's performance.

• Finally, a classification report is generated.
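The steps listed above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the toy messages, the `clean_text`, `train_model`, and `evaluate` helpers, and the choice of TF-IDF features are all assumptions made here, and nltk-based steps are left out because the README does not spell them out.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.pipeline import make_pipeline

# Toy messages invented for illustration; the project itself uses the Kaggle dataset.
TRAIN = [
    ("WIN a FREE prize now!!! Call 09061701461", "spam"),
    ("URGENT! You have won a free cash prize, claim now", "spam"),
    ("Free entry to win cash, text WIN to 80086", "spam"),
    ("Claim your free prize today, call now", "spam"),
    ("Are we still meeting for lunch today?", "ham"),
    ("I'll call you when I get home", "ham"),
    ("Can you send me the notes from class?", "ham"),
    ("See you at the meeting tomorrow morning", "ham"),
]
TEST = [
    ("Win a free cash prize, call now", "spam"),
    ("See you at lunch tomorrow", "ham"),
]


def clean_text(text):
    """Lowercase the text and replace every run of non-letters with one space."""
    return re.sub(r"[^a-z]+", " ", text.lower()).strip()


def train_model(pairs):
    """Fit a TF-IDF + Logistic Regression pipeline on (text, label) pairs."""
    texts = [text for text, _ in pairs]
    labels = [label for _, label in pairs]
    model = make_pipeline(
        TfidfVectorizer(preprocessor=clean_text),  # assumed feature extractor
        LogisticRegression(max_iter=1000),
    )
    model.fit(texts, labels)
    return model


def evaluate(model, pairs):
    """Return the confusion matrix (rows = true ham/spam) and a text report."""
    texts = [text for text, _ in pairs]
    y_true = [label for _, label in pairs]
    y_pred = model.predict(texts)
    matrix = confusion_matrix(y_true, y_pred, labels=["ham", "spam"])
    report = classification_report(y_true, y_pred, zero_division=0)
    return matrix, report


if __name__ == "__main__":
    model = train_model(TRAIN)
    matrix, report = evaluate(model, TEST)
    print(matrix)
    print(report)
```

The returned matrix could then be passed to `seaborn.heatmap(matrix, annot=True)` to draw the heatmap mentioned above. Plotting is left out here so the sketch runs without a display.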

Other key steps in spam mail detection:

• Email Filtering: One of the primary methods for spam mail detection is email filtering. It involves categorizing incoming emails as spam or non-spam. Machine learning algorithms can be trained to filter out spam mails based on their content and metadata.

• Natural Language Processing: Natural Language Processing (NLP) is a technique that enables machines to understand and process human language. It plays a crucial role in spam detection, as it helps in extracting meaningful features from emails such as subject, body, and attachments.

• Text Classification: Text classification is a supervised learning technique used for spam detection. It involves labelling emails as spam or non-spam based on their features, such as the presence of certain keywords, tone, or grammar.

• Feature Engineering: Feature engineering is the process of selecting relevant features from the email to classify it as spam or non-spam. It involves extracting features such as the sender's email address, the presence of certain words or phrases, and the length of the email.

• Supervised Learning: Supervised learning is a technique that involves training the model on labelled data to predict the labels of new, unlabeled data. It is widely used in spam detection for text classification tasks.

• Unsupervised Learning: Unsupervised learning is a technique used to find hidden patterns in the data without the need for labelled data. It can be used for anomaly detection, clustering, and association rule mining.

• Deep Learning: Deep learning is a subfield of machine learning that involves training deep neural networks with multiple hidden layers to learn complex features from the data. It has shown great promise in spam detection tasks.

• Neural Networks: Neural networks are machine learning models loosely inspired by the human brain, and they are the building blocks of deep learning. They can be trained to extract meaningful features from emails and classify them as spam or non-spam.
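As a small illustration of the feature engineering step described above, the sketch below derives a few hand-crafted features from one email: the sender's address, the presence of certain words or phrases, and the message length. The `extract_features` function, the `SPAM_KEYWORDS` list, and the example address are hypothetical and chosen only for illustration. They are not taken from the project.

```python
import re

# Hypothetical keyword list, chosen for illustration only.
SPAM_KEYWORDS = ("free", "winner", "prize", "urgent", "click here")


def extract_features(sender, body):
    """Turn one email into a flat dict of simple hand-crafted features."""
    text = body.lower()
    local_part, _, domain = sender.partition("@")
    features = {
        "length": len(body),                      # length of the email body
        "num_words": len(text.split()),           # whitespace-separated tokens
        "num_exclamations": body.count("!"),
        "sender_has_digits": int(bool(re.search(r"\d", local_part))),
        "sender_domain": domain.lower(),          # categorical feature
    }
    # One binary feature per keyword or phrase.
    for keyword in SPAM_KEYWORDS:
        features["has_" + keyword.replace(" ", "_")] = int(keyword in text)
    return features


if __name__ == "__main__":
    example = extract_features(
        "promo123@deals.example",
        "URGENT!!! Click here to claim your FREE prize",
    )
    for name, value in example.items():
        print(name, value)
```

A dict like this can be converted into a numeric matrix, for example with scikit-learn's `DictVectorizer`, and then passed to a classifier in the same way as text features.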