Check out my Medium post "Sentiment Classification with Logistic Regression — Analyzing Yelp Reviews" here.
Check out my Kaggle kernel here.
I built a sentiment classification model using logistic regression and tried several strategies to improve on the simple model. Of those, including bigrams as features yielded the largest improvement in F1 score. For both the simple model and the improved model, I also analyzed their most important textual features.
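To make the bigram idea concrete, here is a minimal pure-Python sketch of what adding bigrams to the feature set means, along with how F1 is computed from predictions. It is not the code from the kernel: the function names `ngram_features` and `f1_score`, the whitespace tokenizer, and the sample review are all assumptions made for illustration.

```python
from collections import Counter


def ngram_features(text, max_n=2):
    """Count all n-grams from unigrams up to max_n-grams.

    Assumption: simple lowercasing and whitespace splitting stand in
    for whatever tokenization the real pipeline uses.
    """
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical review text, not taken from the Yelp dataset.
review = "the food was not good"
unigrams = ngram_features(review, max_n=1)
with_bigrams = ngram_features(review, max_n=2)

# Features that only exist once bigrams are included.
bigram_only = sorted(set(with_bigrams) - set(unigrams))
print(bigram_only)
```

With unigrams alone, the review contributes the feature "good", which looks positive in isolation. Once bigrams are included, "not good" becomes its own feature, so a linear model such as logistic regression can assign it a separate weight.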
Sentiment analysis is an effective tool for a business not only to gauge overall brand perception, but also to evaluate customer attitudes and emotions towards a specific product line or service. This data-driven approach can help the business better understand its customers and detect subtle shifts in their opinions in order to meet changing demand.
- Peek at the Review Data
- Convert Stars into Categories
- Decide on Evaluation Metric
- Text Processing & Vectorization
- Model Development and Evaluation
- Visualize Feature Importance
- Analyze Improvement Strategies
I did my analysis in a Kaggle kernel and I recommend you do the same, mainly for two reasons:
- The Yelp dataset is quite large, but it is pre-loaded in the Kaggle kernel, so you don't need to download it locally.
- Most libraries are already available in this environment, so there is no need to install them locally.