This project automates the scraping, preprocessing, and analysis of user reviews from the Google Play Store. Using unsupervised machine learning, it extracts the topics users discuss and pairs them with sentiment analysis to summarize how users feel about an app.
- Scrape reviews from Google Play Store apps using the `google_play_scraper` library.
- Save the scraped reviews in a structured CSV file for easy processing.
- Tokenize and clean text data.
- Remove stopwords, apply lemmatization, and vectorize text using TF-IDF.
- Apply Latent Dirichlet Allocation (LDA) to extract hidden topics from reviews.
- Visualize topics with key terms and explore their distributions interactively.
- Classify the sentiment of each review as positive, negative, or neutral.
- Generate metrics such as overall sentiment scores for apps and sentiment trends.
- Export cleaned data, sentiment scores, and topics to CSV.
- Use the exported data in dashboards or reports for further analysis.
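
The scrape, clean, and export steps listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names (`fetch_reviews`, `clean_text`, `save_reviews`), the chosen CSV columns, and the tiny stopword set are all assumptions made for the example. A real run would likely use the NLTK stopword list and a lemmatizer in place of the stand-in set.

```python
import csv
import re

# Tiny stand-in stopword set (assumption); nltk.corpus.stopwords would normally replace it.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "this", "to", "of", "i", "but"}

# Columns kept in the export (assumption); other scraped fields are ignored.
FIELDS = ["reviewId", "content", "clean_content", "score", "at"]


def fetch_reviews(app_id, count=200):
    """Fetch the newest reviews for one app (requires network access)."""
    # Imported lazily so the other helpers work without the scraper installed.
    from google_play_scraper import Sort, reviews

    rows, _continuation_token = reviews(
        app_id, lang="en", country="us", sort=Sort.NEWEST, count=count
    )
    return rows


def clean_text(text):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(t for t in tokens if t not in STOPWORDS)


def save_reviews(rows, path):
    """Write the selected review fields plus the cleaned text to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        for row in rows:
            content = row.get("content") or ""
            writer.writerow({**row, "clean_content": clean_text(content)})
    return path
```

With network access, something like `save_reviews(fetch_reviews("com.example.app"), "reviews.csv")` would produce the export; the app ID and file name here are placeholders.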
Interactive plots displaying key terms for each topic.
Line graphs showing changes in sentiment over time.
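
As a rough sketch of how review-level sentiment labels and a trend over time might be derived: the example below uses a hypothetical six-word lexicon as a stand-in for whatever NLP model the project actually uses, and the ±0.05 cutoff for the neutral band is likewise an assumption. Only the labeling and monthly aggregation logic is the point here.

```python
from collections import defaultdict
from datetime import date

# Hypothetical mini-lexicon (assumption); a real run would use a trained model
# or a full sentiment lexicon instead.
LEXICON = {"great": 1.0, "love": 1.0, "good": 0.5, "bad": -0.5, "crash": -1.0, "hate": -1.0}


def score_review(text):
    """Average lexicon score of the matched words, in [-1, 1]; 0.0 if none match."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def label(score, threshold=0.05):
    """Map a numeric score to positive, negative, or neutral."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def monthly_trend(reviews):
    """Mean sentiment score per (year, month) from (date, text) pairs."""
    buckets = defaultdict(list)
    for day, text in reviews:
        buckets[(day.year, day.month)].append(score_review(text))
    return {month: sum(s) / len(s) for month, s in sorted(buckets.items())}


sample = [
    (date(2024, 1, 5), "great app love it"),
    (date(2024, 1, 20), "bad update"),
    (date(2024, 2, 1), "constant crash hate it"),
]
trend = monthly_trend(sample)
```

The keys and values of `trend` can then be passed to a line plot (for example with matplotlib) to show how sentiment changes from month to month.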
Latent Dirichlet Allocation (LDA) is a probabilistic generative model that discovers hidden topics in a collection of text documents. It is widely used in natural language processing (NLP) and text mining for tasks such as topic modeling and document classification.
The LDA model assumes the following:
- Each document is a mixture of topics.
- Each topic is a probability distribution over words.
- Programming Language: Python
- Libraries: `google_play_scraper`, `pandas`, `nltk`, `scikit-learn`, `Gensim`, and `matplotlib` and `seaborn` for visualizations
- Machine Learning Techniques:
  - Topic Modeling with LDA
  - Sentiment Analysis using NLP models
- Clone the repository:

      git clone https://github.com/yourusername/google-play-reviews-analysis.git
      cd google-play-reviews-analysis