Google Play Reviews Analysis: Topic Modeling and Sentiment Analysis

📄 Overview

This project automates the scraping, preprocessing, and analysis of user reviews from the Google Play Store. It applies topic modeling, an unsupervised machine learning technique, to surface what users are writing about, and sentiment analysis to summarize how they feel about an app.


🚀 Features

1️⃣ Automated Review Scraping

  • Scrape reviews from Google Play Store apps using the google_play_scraper library.
  • Save the scraped reviews in a structured CSV file for easy processing.
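A minimal sketch of the scrape-and-save step, assuming the `reviews` function and `Sort` enum from `google_play_scraper`. The helper names (`fetch_reviews`, `save_reviews`), the selected columns, and the language and country settings are illustrative assumptions, not necessarily what this project uses.

```python
import csv

# Columns kept from each review. The key names follow the dictionaries
# returned by google_play_scraper.reviews(); the selection is an assumption.
FIELDS = ["reviewId", "content", "score", "thumbsUpCount", "at"]


def fetch_reviews(app_id, count=200):
    """Fetch the newest reviews for one app (requires network access)."""
    # Imported lazily so save_reviews() works without the package installed.
    from google_play_scraper import Sort, reviews

    result, _continuation_token = reviews(
        app_id, lang="en", country="us", sort=Sort.NEWEST, count=count
    )
    return result


def save_reviews(review_dicts, path):
    """Write the selected fields of each review to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        for review in review_dicts:
            writer.writerow({field: review.get(field, "") for field in FIELDS})
    return len(review_dicts)
```

Typical use would be `save_reviews(fetch_reviews("com.example.app"), "reviews.csv")`, where the app ID is a placeholder.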

2️⃣ Text Preprocessing

  • Tokenize and clean text data.
  • Remove stopwords, apply lemmatization, and vectorize text using TF-IDF.

3️⃣ Topic Modeling

  • Apply Latent Dirichlet Allocation (LDA) to extract hidden topics from reviews.
  • Visualize topics with key terms and explore their distributions interactively.

4️⃣ Sentiment Analysis

  • Classify the sentiment of each review as positive, negative, or neutral.
  • Generate metrics such as overall sentiment scores for apps and sentiment trends.
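The labeling and trend metrics can be sketched independently of whichever sentiment model produces the scores. The example below assumes each review already has a polarity score in [-1, 1]; the threshold of 0.05, the helper names, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def label_sentiment(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to positive, negative, or neutral."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def monthly_trend(scored_reviews):
    """Average polarity per calendar month from (date, score) pairs."""
    buckets = defaultdict(list)
    for day, score in scored_reviews:
        buckets[day.strftime("%Y-%m")].append(score)
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}


# Made-up scores for illustration only.
sample = [(date(2024, 1, 5), 0.5), (date(2024, 1, 20), -0.5), (date(2024, 2, 1), 1.0)]
print(monthly_trend(sample))  # {'2024-01': 0.0, '2024-02': 1.0}
```

Averaging the same scores over all reviews, rather than per month, gives a single overall sentiment score for the app.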

5️⃣ Export and Integration

  • Export cleaned data, sentiment scores, and topics to CSV.
  • Use the exported data in dashboards or reports for further analysis.

📈 Visualization Examples

1️⃣ Sentiment Trends

image

Line graphs showing changes in sentiment over time.

2️⃣ Topic Modeling

image

Interactive plots displaying key terms for each topic.


Latent Dirichlet Allocation (LDA) Model

Latent Dirichlet Allocation (LDA) is a probabilistic generative model for discovering hidden topics in a collection of text documents. It is widely used in natural language processing (NLP) and text mining, most often for topic modeling and as a source of features for tasks such as document classification.

How LDA Works

The LDA model assumes the following:

  • Each document is a mixture of topics.
  • Each topic is a probability distribution over words.
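The two assumptions above can be made concrete by simulating how LDA imagines a document being written: draw a topic mixture for the document, then for each word position pick a topic from that mixture and a word from that topic. The sketch below uses only the standard library; the two topics, their word probabilities, and the `alpha` values are made up for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution via normalized gamma samples."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Generate one document: a topic mixture, then a topic and a word per position."""
    theta = sample_dirichlet(alpha, rng)  # this document's mixture over topics
    words = []
    for _ in range(length):
        k = rng.choices(range(len(topics)), weights=theta)[0]  # pick a topic
        vocab, probs = zip(*topics[k].items())
        words.append(rng.choices(vocab, weights=probs)[0])  # pick a word from it
    return theta, words


# Two made-up topics, each a probability distribution over words.
TOPICS = [
    {"crash": 0.5, "bug": 0.3, "update": 0.2},
    {"love": 0.5, "easy": 0.3, "update": 0.2},
]
theta, words = generate_document(TOPICS, alpha=[0.5, 0.5], length=8, rng=random.Random(0))
```

Fitting LDA runs this story in reverse: given only the words, it estimates the topics and each document's mixture.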

LDA Workflow

image


🛠️ Tech Stack

  • Programming Language: Python
  • Libraries:
    • google_play_scraper
    • pandas
    • nltk
    • scikit-learn
    • Gensim
    • matplotlib and seaborn for visualizations
  • Machine Learning Techniques:
    • Topic Modeling with LDA
    • Sentiment Analysis using NLP models

🛠️ Installation and Setup

  1. Clone the repository:
    git clone https://github.com/yourusername/google-play-reviews-analysis.git
    cd google-play-reviews-analysis