YouTube Video Sentiment Analysis and Regression

Overview

This project performs sentiment analysis on YouTube video comments collected from three sets of videos: the most liked, the most disliked, and a random sample. The sentiment scores feed machine learning models such as Random Forest, Ridge, and Gradient Boosting, which predict video performance metrics and views. The project also includes hyperparameter tuning and model stacking to improve predictive accuracy.

Data Collection

The dataset is collected from YouTube using the Google API. Comments are extracted for three categories:

Most Liked Videos: Top 8,000 videos with the highest number of likes.
Most Disliked Videos: Top 8,000 videos with the highest number of dislikes.
Random Videos: 8,000 randomly selected videos from the dataset.
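
The three subsets listed above could be drawn from a single table of video statistics roughly as follows. This is only a sketch: it assumes the statistics are already in a pandas DataFrame, the column names `video_id`, `likes`, and `dislikes` are hypothetical, and the random sample is taken from the whole table. The notebook's actual columns and sampling procedure may differ.

```python
import pandas as pd


def split_video_sets(videos: pd.DataFrame, n: int = 8000, seed: int = 0):
    """Return (most_liked, most_disliked, random_sample), each with up to n rows.

    Column names "likes" and "dislikes" are assumptions made for this sketch.
    """
    n = min(n, len(videos))
    most_liked = videos.nlargest(n, "likes")        # sorted by likes, descending
    most_disliked = videos.nlargest(n, "dislikes")  # sorted by dislikes, descending
    random_sample = videos.sample(n=n, random_state=seed)
    return most_liked, most_disliked, random_sample


# Tiny made-up table, used only to show the call.
demo = pd.DataFrame(
    {
        "video_id": ["a", "b", "c", "d", "e"],
        "likes": [10, 500, 30, 7, 250],
        "dislikes": [1, 20, 90, 0, 5],
    }
)
liked, disliked, sampled = split_video_sets(demo, n=2)
```

With the project's default of `n=8000`, each returned frame would hold up to 8,000 rows.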

Sentiment Analysis

We use the VADER (Valence Aware Dictionary and sEntiment Reasoner) sentiment analysis tool to analyze the extracted comments. The sentiment scores are then used to help predict video performance in the machine learning phase.

Machine Learning Models

Several regression models are applied to predict video metrics (likes/dislikes):

Random Forest Regression
Ridge Regression
K-Nearest Neighbors Regression
Gradient Boosting Regression

These models are trained and tested on the dataset, with metrics such as R² scores used to evaluate their performance.
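
The train/test/score loop described above might look like the following sketch with scikit-learn. The data here is synthetic and stands in for the real sentiment features and target, and the hyperparameters are library defaults rather than the values used in the notebook.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for sentiment features (X) and a video metric (y).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 3))
y = 100 * X[:, 0] + 20 * X[:, 1] + rng.normal(0, 5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "Random Forest": RandomForestRegressor(random_state=0),
    "Ridge": Ridge(),
    "K-Nearest Neighbors": KNeighborsRegressor(),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
r2_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    r2_scores[name] = r2_score(y_test, model.predict(X_test))
```

Scoring on a held-out split, rather than on the training data, is what makes the R² values comparable across models.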

Model Tuning and Stacking

To improve model performance, hyperparameter tuning is done for both Random Forest and Gradient Boosting models. Stacking is also implemented to combine the strengths of multiple models for better results.
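
One way to combine tuning and stacking in scikit-learn is sketched below, using `GridSearchCV` for the two tree ensembles and `StackingRegressor` with a Ridge meta-learner. The parameter grids, the choice of meta-learner, and the synthetic data are all assumptions made for illustration, not the settings used in the notebook.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for sentiment features (X) and a video metric (y).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 3))
y = 100 * X[:, 0] + 20 * X[:, 1] + rng.normal(0, 5, size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def tune(estimator, grid):
    """Grid-search the estimator with 3-fold CV and return the best refit model."""
    search = GridSearchCV(estimator, grid, cv=3, scoring="r2")
    search.fit(X_train, y_train)
    return search.best_estimator_


# Small illustrative grids; real grids would usually be wider.
rf = tune(
    RandomForestRegressor(random_state=0),
    {"n_estimators": [50, 100], "max_depth": [None, 5]},
)
gb = tune(
    GradientBoostingRegressor(random_state=0),
    {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
)

# The meta-learner is trained on cross-validated predictions of the base models.
stack = StackingRegressor(
    estimators=[("rf", rf), ("gb", gb), ("ridge", Ridge())],
    final_estimator=Ridge(),
    cv=3,
)
stack.fit(X_train, y_train)
stack_r2 = stack.score(X_test, y_test)  # R² on the held-out split
```

`StackingRegressor` refits clones of the base models internally, so the tuned estimators passed in mainly carry their selected hyperparameters.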

Conclusion and Future Work

Stacking, an ensemble technique, generalizes well, as shown by its performance on the testing set. It aggregates the predictions of multiple base regression models, drawing on their complementary strengths to achieve better accuracy than any single model. This offsets the limitations of the individual models and makes the approach well suited to applications that prioritize precision.

There are several avenues for future work. One is to reduce the Mean Squared Error (MSE) by refining feature selection or by introducing new features that capture additional relevant information. Another is to make the stacked model more interpretable: understanding how each base model contributes to the final prediction would build trust and support informed decision-making in real-world use. Continued refinement of the stacking approach has the potential to deliver more precise and actionable predictions.

Usage

To run this notebook, you will need the following libraries installed:

pandas
vaderSentiment
matplotlib
google-api-python-client
scikit-learn

You can install them using pip:

pip install pandas vaderSentiment matplotlib google-api-python-client scikit-learn
