This project trains classifiers offline on the Stanford 'Sentiment140' dataset to classify tweets into sentiment classes.
The aim is then to test the trained classifiers online on real-time tweets.
Motivation: The project can support market research, product review summaries, and campaign analysis, helping businesses make better decisions.
Python & PySpark
TechStack:
Python libraries:
- NLTK
- BeautifulSoup
- sklearn
- pyspark
- tweepy
- textblob
- matplotlib
- pandas
- numpy
Phases completed:
- Data cleaning, tokenizing
- Word Vectorizing
- Performing NLP
- Feature extraction
- N-gram testing using Logistic Regression
- Training and evaluating using Multinomial Naive Bayes, Bernoulli Naive Bayes, Ridge Classifier, and AdaBoost Classifier
- Ongoing project
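
The phases above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: it uses scikit-learn only (no PySpark), replaces the BeautifulSoup/NLTK cleaning with plain regular expressions, and runs on a handful of made-up tweets with hypothetical labels (1 = positive, 0 = negative) instead of Sentiment140. The names `clean_tweet`, `build_models`, and `compare_models` are invented for this example.

```python
import html
import re

from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.pipeline import make_pipeline


def clean_tweet(text):
    """Decode HTML entities, drop URLs and @mentions, keep letters, lowercase."""
    text = html.unescape(text)
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)                   # @mentions
    text = re.sub(r"[^a-zA-Z']", " ", text)             # digits, punctuation
    return " ".join(text.lower().split())


def build_models(ngram_range=(1, 2)):
    """Return one vectorizer + classifier pipeline per model family."""
    classifiers = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "multinomial_nb": MultinomialNB(),
        "bernoulli_nb": BernoulliNB(),
        "ridge": RidgeClassifier(),
        "adaboost": AdaBoostClassifier(n_estimators=50, random_state=0),
    }
    return {
        name: make_pipeline(
            # The vectorizer cleans each tweet, then counts word n-grams.
            CountVectorizer(preprocessor=clean_tweet, ngram_range=ngram_range),
            clf,
        )
        for name, clf in classifiers.items()
    }


def compare_models(train_texts, train_labels, test_texts, test_labels,
                   ngram_range=(1, 2)):
    """Fit every pipeline and return its accuracy on the test tweets."""
    scores = {}
    for name, model in build_models(ngram_range).items():
        model.fit(train_texts, train_labels)
        scores[name] = model.score(test_texts, test_labels)
    return scores


# Made-up toy data for illustration only (1 = positive, 0 = negative).
TRAIN_TWEETS = [
    "I love this phone, it's great!",
    "What a wonderful day :)",
    "Best coffee ever, so happy",
    "@friend thanks, that was awesome",
    "I hate waiting in line",
    "This is terrible, worst service ever",
    "So sad and tired today",
    "Ugh, my flight got cancelled http://example.com",
]
TRAIN_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]
TEST_TWEETS = ["love this great day", "worst day, I hate this"]
TEST_LABELS = [1, 0]

if __name__ == "__main__":
    # Compare unigram, unigram+bigram and unigram..trigram features.
    for max_n in (1, 2, 3):
        scores = compare_models(TRAIN_TWEETS, TRAIN_LABELS,
                                TEST_TWEETS, TEST_LABELS,
                                ngram_range=(1, max_n))
        print(max_n, scores)
```

Scores on a toy set this small are not meaningful; the sketch only shows how cleaning, n-gram vectorizing, and the listed classifiers fit together in one pipeline. On the real dataset, a held-out validation split would take the place of `TEST_TWEETS`.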