- python-dotenv
- clean-text
- langdetect
- textblob
- google-api-python-client
- getVideoIds.py: Python script that fetches the IDs of YouTube videos
- videoIds: directory containing the .txt files that store the video IDs fetched by getVideoIds.py
- getComments.py: Python script that reads the video IDs from the videoIds directory and fetches the comments for each video
- cleanText.py: Python script that takes the raw comments and cleans them
- generateDataset.py: Python script that takes the cleaned comments file and produces a dataset ready for sentiment analysis tasks
- commentsDatasetLarge.csv: CSV file produced by running generateDataset.py
- gatherData.sh: shell script that automates collecting the data, cleaning it, and creating a dataset from it
- processed_dataset.pkl: cleaned and preprocessed dataset
- best_models: directory containing the saved LSTM and CNN models
- Lstm.ipynb: fully executed notebook for the LSTM models
- CNN.ipynb: fully executed notebook for the CNN models
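The data-gathering stage that gatherData.sh automates can be pictured as four scripts run one after another. The sketch below is an illustration in Python, not a copy of gatherData.sh: the stage order is inferred from the file descriptions above, and it assumes each script takes no command-line arguments and is run from the repository root. The names `PIPELINE_SCRIPTS`, `build_commands`, and `run_pipeline` are hypothetical.

```python
import subprocess
import sys

# Stage order inferred from the file descriptions (an assumption):
# fetch video IDs -> fetch comments -> clean text -> build dataset.
PIPELINE_SCRIPTS = [
    "getVideoIds.py",
    "getComments.py",
    "cleanText.py",
    "generateDataset.py",
]


def build_commands(python=sys.executable):
    """Return one command (as an argument list) per pipeline stage."""
    return [[python, script] for script in PIPELINE_SCRIPTS]


def run_pipeline(dry_run=False):
    """Run each stage in order and stop at the first failing stage."""
    for command in build_commands():
        print("Running:", " ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later stages never run on incomplete data.
            subprocess.run(command, check=True)
```

Calling `run_pipeline(dry_run=True)` only prints the commands, which is a cheap way to confirm the order before spending YouTube API quota.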
The diagram below shows the sequence of operations that take place when this application is run to perform sentiment analysis.
Steps to run this application
- Make sure the required Python packages listed in the sections above are installed
- Set up an environment variables file named .env in the root (current) directory:
API_KEY="YOUR_API_KEY_CREATED_FROM_GOOGLE_CONSOLE"
- Phase 1: Run the following command to fetch YouTube comments, preprocess them, and generate a dataset:
gatherData.sh
- Phase 2: Execute the Lstm.ipynb notebook
- Phase 3: Execute the CNN.ipynb notebook
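Before running Phase 1, it can help to confirm that the API_KEY entry in .env is actually readable. The following is a minimal sketch, not the code used by the scripts in this repository: it calls `load_dotenv` from python-dotenv when that package is installed and otherwise falls back to a very small KEY=VALUE parser. The helper names `get_api_key` and `_load_env_fallback` are hypothetical.

```python
import os
from pathlib import Path

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:
    load_dotenv = None


def _load_env_fallback(path):
    """Tiny KEY=VALUE parser, used only if python-dotenv is not installed."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Like load_dotenv's default, do not override existing variables.
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def get_api_key(env_path=".env"):
    """Return API_KEY, reading env_path first if that file exists."""
    if Path(env_path).is_file():
        if load_dotenv is not None:
            load_dotenv(env_path)
        else:
            _load_env_fallback(env_path)
    api_key = os.getenv("API_KEY")
    if not api_key:
        raise RuntimeError("API_KEY is not set; add it to the .env file")
    return api_key
```

Failing early with a clear error is a deliberate choice here: a missing key would otherwise only surface later as a failed YouTube API request.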
The following are the specifications of the environment on which this part of the application was executed and tested:
- MacBook Air M1
- OS: macOS Monterey
- Memory: 16 GB
- Python version: 3.9.13
The following are the specifications of the environment on which this part of the application was executed and tested:
- Google Colab Notebook
- Connected to a custom GCP VM:
  - System RAM: 102.2 GB
  - Disk: 186.0 GB