This Python code uses OCR through the pytesseract library to extract words from images, then uses a spam/ham dataset to classify each image as spam or ham.

Isvarya12/spam-classifier

Image Email Spam/Ham Classifier using Pytesseract and OCR

Overview

This project is a simple image-based email spam/ham classifier. It uses Pytesseract for Optical Character Recognition (OCR) to extract the text from images of emails, then classifies each email as spam or ham based on that text.
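
The OCR step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `extract_text` and `normalize_text` are hypothetical, and the sketch assumes the `pytesseract` and `Pillow` packages, plus the Tesseract engine itself, are installed.

```python
import re


def normalize_text(raw: str) -> str:
    """Collapse the runs of whitespace and newlines that OCR output often contains."""
    return re.sub(r"\s+", " ", raw).strip()


def extract_text(image_path: str) -> str:
    """Run Tesseract OCR on one image and return whitespace-normalized text.

    The OCR imports are local so this module still loads on a machine
    where the OCR dependencies are not installed.
    """
    from PIL import Image  # provided by the Pillow package
    import pytesseract

    with Image.open(image_path) as img:
        raw = pytesseract.image_to_string(img)
    return normalize_text(raw)
```

The normalized string returned by `extract_text` would then be passed to whatever text classifier the project trains.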

Dataset

The dataset used for training and testing consists of images containing email content labeled as spam or ham. Ensure that you have the dataset available and properly organized.
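
One possible way to organize the dataset is one folder per label. The sketch below assumes a hypothetical layout of `root/spam/` and `root/ham/` containing image files; the repository may organize its data differently, and `list_labeled_images` is an illustrative name.

```python
from pathlib import Path

# Assumed image extensions; extend this set if your dataset uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_labeled_images(root):
    """Return (path, label) pairs found under root/ham and root/spam."""
    samples = []
    for label in ("ham", "spam"):
        folder = Path(root) / label
        if not folder.is_dir():
            continue  # tolerate a missing class folder
        for path in sorted(folder.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label))
    return samples
```

Each returned pair can be fed to the OCR step to build a list of texts and a matching list of labels.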

Requirements

  • Python 3.x
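  • Install the required libraries using pip install -r requirements.txt
  • Pytesseract is a wrapper around the Tesseract OCR engine, so Tesseract itself must also be installed on your system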

To classify a new image, run:

```
python predict.py path/to/new/image.jpg
```

Model Evaluation

Evaluate the model's performance by analyzing metrics such as accuracy, precision, recall, and F1 score. The results will be displayed after training and testing.
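
For reference, the four metrics named above can be computed from true and predicted labels as in the sketch below. It treats "spam" as the positive class, which is an assumption, and `binary_metrics` is a hypothetical helper rather than a function from this repository.

```python
def binary_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall, and F1 for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Precision shows how many images flagged as spam really were spam, while recall shows how much of the actual spam was caught. Looking at both is useful when the two classes are imbalanced.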

Additional Notes

  • Make sure to adjust the paths and configurations in the scripts according to your dataset structure.
  • Fine-tune hyperparameters and consider using additional preprocessing techniques for better performance.
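
As one example of additional preprocessing, OCR output can be cleaned before it reaches the classifier. The sketch below is only an illustration: `tokenize` and its `min_length` parameter are hypothetical, and the right cleanup depends on your data.

```python
import re


def tokenize(text, min_length=2):
    """Lowercase the text, keep alphanumeric runs, and drop very short tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [tok for tok in tokens if len(tok) >= min_length]
```

Dropping one-character tokens removes many of the stray fragments that OCR tends to produce from noisy images.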

Feel free to modify and enhance the code according to your specific requirements. Happy coding!
