
Spam SMS detection project implemented using NLP and Transformers. DistilBERT, a Hugging Face Transformer model for text classification, is fine-tuned on the data to achieve the best results. Multinomial Naive Bayes achieved an F1 score of 0.94; the model was deployed on a Flask server, and the application is hosted on Google Cloud Platform.


Transformer BERT SMS Spam Detection

Google Cloud Platform - https://nlp-sms-spam-detection.wm.r.appspot.com/
Heroku - https://spam-sms-detect-nlp.herokuapp.com/


Dataset

https://www.kaggle.com/uciml/sms-spam-collection-dataset

Libraries and Technologies Used

1. Flask
2. gunicorn
3. itsdangerous
4. Jinja2
5. MarkupSafe
6. Werkzeug
7. Pillow
8. Pickle
9. NLTK
10. Numpy
11. Scikit-learn
12. Pandas
13. Seaborn
14. Joblib
15. Matplotlib
16. HTML
17. CSS
18. Bootstrap
19. JavaScript

Project Walkthrough

1. Exploratory Data Analysis (EDA)
2. Data Cleaning
3. Data Manipulation
4. Feature Engineering
5. Applied stemming and lemmatization techniques (Snowball Stemmer, Porter Stemmer, and WordNet Lemmatizer) - sketched below
6. Implemented a Bag of Words model on the dataset
7. Implemented TF-IDF - a sketch covering steps 6-7 follows this list
8. Model Building - used Multinomial Naive Bayes and LightGBM classifiers. Achieved an F1 score of 0.94 after hyperparameter tuning with Multinomial Naive Bayes - sketched below
9. Exported the Multinomial Naive Bayes classifier model using the Joblib library
10. Implemented DistilBERT - a Hugging Face Transformer model
11. Fine-tuned DistilBERT - a fine-tuning sketch follows this list
12. Developed Front End
13. Created a Flask server and deployed the code - a minimal serving sketch follows this list
14. Web app working successfully on Google Cloud Platform and Heroku
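
The stemming and lemmatization step (item 5) can be reproduced roughly as below. This is a minimal sketch assuming NLTK and its WordNet corpus are available; the example tokens are illustrative, not taken from the dataset.

```python
# Minimal sketch of step 5, assuming NLTK is installed and the WordNet
# corpus has been downloaded.
import nltk
from nltk.stem import SnowballStemmer, PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)

snowball = SnowballStemmer("english")
porter = PorterStemmer()
lemmatizer = WordNetLemmatizer()

tokens = ["winning", "prizes", "claimed", "freely"]          # illustrative tokens

print([snowball.stem(t) for t in tokens])                    # e.g. ['win', 'prize', 'claim', 'freeli']
print([porter.stem(t) for t in tokens])                      # e.g. ['win', 'prize', 'claim', 'freeli']
print([lemmatizer.lemmatize(t) for t in tokens])             # e.g. ['winning', 'prize', 'claimed', 'freely']
```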
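
The Bag of Words and TF-IDF steps (items 6-7) were built with Scikit-learn, which is in the libraries list; the sketch below shows the general idea. The `message`/`label` column names are assumptions about how the Kaggle dataframe was arranged.

```python
# Minimal sketch of steps 6-7: Bag of Words (raw counts) vs. TF-IDF weighting.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

df = pd.DataFrame({
    "message": ["Free entry in a weekly prize draw!", "Are we still meeting for lunch?"],
    "label":   ["spam", "ham"],
})

bow = CountVectorizer(stop_words="english")        # Bag of Words: raw token counts
X_bow = bow.fit_transform(df["message"])

tfidf = TfidfVectorizer(stop_words="english")      # TF-IDF: counts re-weighted by term rarity
X_tfidf = tfidf.fit_transform(df["message"])

print(X_bow.shape, X_tfidf.shape)
```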
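
Model building and export (items 8-9) could look roughly like the following sketch: a TF-IDF + Multinomial Naive Bayes pipeline evaluated with the F1 score and saved with Joblib. The `alpha` value, split parameters, and file name are illustrative assumptions; the README reports an F1 score of 0.94 after hyperparameter tuning.

```python
# Minimal sketch of items 8-9: train a Multinomial Naive Bayes pipeline,
# report F1, and export the fitted model with joblib.
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def train_and_export(messages, labels, path="spam_nb.joblib"):
    X_train, X_test, y_train, y_test = train_test_split(
        messages, labels, test_size=0.2, stratify=labels, random_state=42
    )
    model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB(alpha=0.1))
    model.fit(X_train, y_train)
    print("F1:", f1_score(y_test, model.predict(X_test), pos_label="spam"))
    joblib.dump(model, path)                     # exported model used later by the Flask app
    return model
```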
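
Fine-tuning DistilBERT (items 10-11) with the Hugging Face transformers Trainer API would look roughly like this sketch; the checkpoint name, hyperparameters, and dataset wrapper are assumptions rather than the exact settings used in the repository.

```python
# Minimal sketch of items 10-11: wrap the SMS texts/labels in a torch Dataset
# and fine-tune distilbert-base-uncased for binary classification.
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

class SmsDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=128)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

def fine_tune(train_texts, train_labels):
    # train_texts / train_labels: lists of SMS strings and 0/1 (ham/spam) labels
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2
    )
    args = TrainingArguments(output_dir="distilbert-sms", num_train_epochs=3,
                             per_device_train_batch_size=16)
    trainer = Trainer(model=model, args=args,
                      train_dataset=SmsDataset(train_texts, train_labels, tokenizer))
    trainer.train()
    trainer.save_model("distilbert-sms")
```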
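
For serving (item 13), a minimal Flask app loads the exported pipeline and exposes a predict route, roughly as below. Route names, the template name, and the model file name are assumptions, not taken from the repository.

```python
# Minimal sketch of item 13: a Flask app that serves predictions from the
# exported Naive Bayes pipeline.
import joblib
from flask import Flask, render_template, request

app = Flask(__name__)
model = joblib.load("spam_nb.joblib")    # pipeline exported in the model-building step

@app.route("/")
def home():
    return render_template("index.html")

@app.route("/predict", methods=["POST"])
def predict():
    message = request.form.get("message", "")
    prediction = model.predict([message])[0]        # 'spam' or 'ham'
    return render_template("index.html", prediction=prediction)

if __name__ == "__main__":
    app.run(debug=True)
```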

Email - tejasta@gmail.com
LinkedIn - https://www.linkedin.com/in/tejas-ta/
Blogs - https://tejasta.medium.com/