This project is a simple image-based email spam/ham classifier that uses Optical Character Recognition (OCR) through Pytesseract. It classifies an email as spam or ham based on the text that OCR extracts from an image of the email.
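As a rough illustration of the OCR step, the sketch below reads the text out of one image with `pytesseract.image_to_string` and tidies it up. The function names `extract_text` and `clean_text` are hypothetical and not taken from this repository, and the sketch assumes the Tesseract OCR engine is installed on the system alongside the `pytesseract` and `Pillow` packages.

```python
import re


def extract_text(image_path):
    """Run Tesseract OCR on one image and return the raw text."""
    # Imported lazily so clean_text() stays usable without OCR installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img)


def clean_text(raw):
    """Lowercase OCR output and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", raw).strip().lower()
```

Calling `clean_text(extract_text("path/to/new/image.jpg"))` would yield a single lowercase line of text that a downstream text classifier can consume.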
The dataset used for training and testing consists of images of email content, each labeled as spam or ham. Make sure the dataset is available locally and organized the way the scripts expect.
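The README does not specify a directory layout, so the following is only one possible arrangement: a root folder with one subfolder per label (`spam/` and `ham/`). Under that assumption, a small helper (the name `list_labeled_images` is hypothetical) could collect the image paths and their labels like this:

```python
from pathlib import Path

# Assumed image extensions; adjust to match your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_labeled_images(root):
    """Return (path, label) pairs from root/spam and root/ham."""
    samples = []
    for label in ("spam", "ham"):
        folder = Path(root) / label
        if not folder.is_dir():
            continue  # skip a label folder that does not exist
        for path in sorted(folder.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label))
    return samples
```

If your dataset uses a different structure (for example a CSV file mapping file names to labels), the loading logic would need to change accordingly.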
- Python 3.x
- Install the required libraries using:

```
pip install -r requirements.txt
```

To classify a new image, run:

```
python predict.py path/to/new/image.jpg
```
Evaluate the model's performance by analyzing metrics such as accuracy, precision, recall, and F1 score. The results will be displayed after training and testing.
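For reference, the four metrics named above can be computed from the true and predicted labels as in the sketch below. It treats `spam` as the positive class, which is an assumption, and the helper name `binary_metrics` is hypothetical; the project's own scripts may compute these values differently (for example with scikit-learn).

```python
def binary_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    # Guard every division so empty or one-sided inputs return 0.0.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Precision answers "of the emails flagged as spam, how many really were spam?", while recall answers "of the real spam emails, how many were caught?"; F1 is their harmonic mean.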
- Make sure to adjust the paths and configurations in the scripts according to your dataset structure.
- Fine-tune hyperparameters and consider using additional preprocessing techniques for better performance.
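One example of an additional preprocessing technique is filtering out OCR noise before the text reaches the classifier. The sketch below is an illustration only, not part of this repository: the hypothetical `filter_tokens` helper keeps purely alphabetic words and drops stray symbols and tokens that OCR has likely misread.

```python
import string


def filter_tokens(ocr_text, min_len=2):
    """Keep lowercase alphabetic words of at least min_len characters."""
    tokens = []
    for raw in ocr_text.lower().split():
        # Trim leading/trailing punctuation such as "!!!" or "|".
        word = raw.strip(string.punctuation)
        if len(word) >= min_len and word.isalpha():
            tokens.append(word)
    return tokens
```

Note that this filter also discards deliberately obfuscated words such as `c1ick`, which can themselves be a spam signal, so whether it helps depends on the dataset.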
Feel free to modify and enhance the code according to your specific requirements. Happy coding!