The main goal of this project is to use NLP techniques to predict the name of an author from their quotes.
Dataset link: https://www.kaggle.com/datasets/rafsunahmad/popular-quotes-author-classifier
The code in this project applies NLP techniques to predict which author wrote a given quote. The dataset has two columns: one for the quote and one for the name of its author.
- Load the data from the CSV file using pandas
- Apply the preprocessing steps: removal of missing values, removal of duplicates, data augmentation, and a few other minor steps
- Use the following models to make the predictions:
- BERT
- Word2Vec + RNN(LSTM)
- TF-IDF + Softmax Regression
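
As a rough illustration of the load, clean, and classify steps, here is a minimal sketch of the TF-IDF + Softmax Regression variant. It is not the project's actual code. The column names `quote` and `author`, the helper names `clean_quotes` and `build_tfidf_softmax`, and the toy rows are all assumptions made for the example. Data augmentation and the other minor preprocessing steps are left out, and scikit-learn's `LogisticRegression` stands in for softmax regression.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def clean_quotes(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing values, then drop exact duplicate rows.

    The column names "quote" and "author" are assumed, not taken from the dataset.
    """
    df = df.dropna(subset=["quote", "author"])
    df = df.drop_duplicates(subset=["quote", "author"])
    return df.reset_index(drop=True)


def build_tfidf_softmax():
    """TF-IDF features fed into a logistic regression classifier.

    With the default lbfgs solver, recent scikit-learn versions fit a
    multinomial (softmax) model when there are more than two classes.
    """
    return make_pipeline(
        TfidfVectorizer(),
        LogisticRegression(max_iter=1000),
    )


# Made-up rows standing in for the real CSV, which would normally be
# loaded with pd.read_csv(<path to the downloaded file>).
toy = pd.DataFrame(
    {
        "quote": [
            "the sea is calm tonight",
            "the sea is calm tonight",  # duplicate row
            None,                       # missing quote
            "stars burn over the desert",
            "calm sea and quiet waves",
            "desert stars and burning sand",
        ],
        "author": [
            "author_a", "author_a", "author_b",
            "author_b", "author_a", "author_b",
        ],
    }
)

cleaned = clean_quotes(toy)
model = build_tfidf_softmax()
model.fit(cleaned["quote"], cleaned["author"])
print(model.predict(["calm sea"]))
```

Wrapping the vectorizer and classifier in one pipeline keeps the TF-IDF vocabulary tied to the training data, so the same transformation is applied automatically at prediction time.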
Required libraries:
- tensorflow
- pandas
- nlpaug
- numpy
- transformers
- scikit-learn
- gensim
- matplotlib
- nltk
Accuracy of each model:
- BERT - 91.32%
- Word2Vec + RNN(LSTM) - 50.00%
- TF-IDF + Softmax Regression - 86.57%
Based on the accuracies above, BERT performs best of the three models.
Hi, I am Iman Kalyan Chakraborty, a passionate ML and DL developer from Kolkata. Here are my socials:
Twitter / X - https://twitter.com/ikc1975
LinkedIn - https://www.linkedin.com/in/imankalyanchakraborty/