Python application to perform Sentiment Analysis on a YouTube Comments Dataset

Overall prerequisites

Python packages

python-dotenv
clean-text
langdetect
textblob
google-api-python-client

Overall code organization/ directory structure

getVideoIds.py : Python script that fetches the IDs of YouTube videos
videoIds : directory containing the .txt files that store the video IDs fetched by getVideoIds.py
getComments.py : Python script that reads video IDs from the videoIds directory and fetches the comments for each video
cleanText.py : Python script that takes the raw comments and performs data cleaning
generateDataset.py : Python script that takes the cleaned comments file and produces a dataset ready for Sentiment Analysis tasks
commentsDatasetLarge.csv : CSV file produced by running generateDataset.py
gatherData.sh : Shell script that automates collecting the data, cleaning it, and creating a dataset from it
processed_dataset.pkl : Cleaned and preprocessed dataset
best_models : directory containing saved LSTM and CNN models
Lstm.ipynb : Fully executed notebook for LSTM models
CNN.ipynb : Fully executed notebook for CNN models
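
To make the comment-fetching step more concrete, here is a minimal sketch of how a script like getComments.py could page through the YouTube Data API v3 commentThreads endpoint. The function name fetch_comments, the max_pages cap, and the choice to keep only top-level comments are assumptions for illustration and are not taken from the repository. The youtube argument is expected to be a client built with google-api-python-client.

```python
def fetch_comments(youtube, video_id, max_pages=5):
    """Collect top-level comment texts for one video, following pagination.

    `youtube` is assumed to be a YouTube Data API v3 client object.
    `max_pages` is a hypothetical safety cap, not something the repo defines.
    """
    comments = []
    page_token = None
    for _ in range(max_pages):
        params = {
            "part": "snippet",
            "videoId": video_id,
            "maxResults": 100,  # largest page size the endpoint accepts
            "textFormat": "plainText",
        }
        if page_token:
            params["pageToken"] = page_token

        response = youtube.commentThreads().list(**params).execute()

        for item in response.get("items", []):
            snippet = item["snippet"]["topLevelComment"]["snippet"]
            comments.append(snippet["textDisplay"])

        # A missing nextPageToken means the last page has been reached.
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return comments
```

Because the client is passed in as an argument, the function can be exercised with a stub object in tests without spending API quota.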

Flow of operations

The diagram below shows the sequence of operations that take place when this application is run to perform Sentiment Analysis.
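
As a text companion to the diagram, the data-gathering part of the flow can be sketched as an ordered list of script invocations. The order follows the script descriptions in the directory listing above. Running each script with no command-line arguments, and the helper names build_commands and run_pipeline, are assumptions. The actual gatherData.sh may differ.

```python
import subprocess
import sys

# Order inferred from the script descriptions in this README.
# Invoking each script without arguments is an assumption.
PIPELINE = [
    "getVideoIds.py",      # fetch video IDs into the videoIds directory
    "getComments.py",      # fetch comments for each stored video ID
    "cleanText.py",        # clean the raw comments
    "generateDataset.py",  # build the dataset CSV
]


def build_commands(python=sys.executable):
    """Return one argv list per pipeline step, in execution order."""
    return [[python, script] for script in PIPELINE]


def run_pipeline():
    """Run every step in order and stop at the first failure."""
    for command in build_commands():
        subprocess.run(command, check=True)
```

With check=True, a failing step raises subprocess.CalledProcessError, so later steps never run on incomplete data.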

Running the application

Steps to run this application

Make sure the required Python packages listed in the section above are installed
Set up the environment variables file .env in the root (current) directory:

API_KEY="YOUR_API_KEY_CREATED_FROM_GOOGLE_CONSOLE"
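
For illustration, a script could read this key and create an API client roughly as follows. This is a sketch under assumptions: the helper names require_api_key and build_youtube_client are hypothetical, and the repository's scripts may load the key differently. It relies on load_dotenv from python-dotenv and build from google-api-python-client.

```python
import os

PLACEHOLDER = "YOUR_API_KEY_CREATED_FROM_GOOGLE_CONSOLE"


def require_api_key(env=None):
    """Return API_KEY from `env` (default: os.environ) or raise if unset."""
    env = os.environ if env is None else env
    key = env.get("API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError("API_KEY is missing; set it in the .env file")
    return key


def build_youtube_client():
    """Load .env and build a YouTube Data API v3 client (hypothetical helper)."""
    # Third-party imports are kept local so require_api_key can be used
    # without these packages installed.
    from dotenv import load_dotenv
    from googleapiclient.discovery import build

    load_dotenv()  # copies variables from .env into os.environ
    return build("youtube", "v3", developerKey=require_api_key())
```

Failing early on a missing or placeholder key gives a clearer error than a rejected API request later in the run.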

Phase 1: Execute the following command to fetch YouTube comments, preprocess them, and generate a dataset from them:

gatherData.sh
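
This README does not spell out how the preprocessing and labeling work. Given the listed packages (clean-text, langdetect, textblob), one plausible shape for a single comment is sketched below. Treat all of it as an assumption: the function names, the English-only filter, the neutral_band threshold, and the use of TextBlob polarity as the label source are illustrative and not confirmed by the repository.

```python
def polarity_to_label(polarity, neutral_band=0.05):
    """Map a polarity score in [-1, 1] to a sentiment label.

    `neutral_band` is a hypothetical threshold chosen for this sketch.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


def label_comment(text):
    """Clean one raw comment and label it; return None for non-English text."""
    # Third-party imports are local so polarity_to_label works on its own.
    from cleantext import clean
    from langdetect import detect
    from langdetect.lang_detect_exception import LangDetectException
    from textblob import TextBlob

    cleaned = clean(
        text, lower=True, no_urls=True, no_emails=True, no_line_breaks=True
    )
    try:
        if detect(cleaned) != "en":
            return None
    except LangDetectException:
        # Raised when the text has no detectable language features.
        return None
    return polarity_to_label(TextBlob(cleaned).sentiment.polarity)
```

Keeping the threshold logic in its own function makes it easy to adjust the neutral band without touching the cleaning or language-detection code.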

Phase 2: Execute the Lstm.ipynb notebook
Phase 3: Execute the CNN.ipynb notebook

Environment specifications

Phase 1

The following are the specifications of the environment on which this part of the application was executed/tested:

MacBook Air M1
OS: macOS Monterey
Memory: 16 GB
Python version: 3.9.13

Phase 2

The following are the specifications of the environment on which this part of the application was executed/tested:

Google Colab Notebook
Connected to a custom GCP VM:
- System RAM: 102.2 GB
- Disk: 186.0 GB

mitrjain/Youtube_Comments_Sentiment_Analysis
