Sentiment Analysis on Reddit Data (Hurricane Helene)

This project implements sentiment analysis models for Reddit data related to Hurricane Helene. It combines deep learning with traditional models such as logistic regression, Naive Bayes, random forests, and gradient boosting, and it uses the Hugging Face distilbert-base-uncased-emotion model for emotion classification.

Data Collection

The datasets used in this project were collected with a web scraper, available at: Web-Scraper. The scraper pulls Reddit posts about Hurricane Helene, which are then processed and analyzed for sentiment in this repository.

Overview

The goal of this project is to analyze and predict the sentiment and emotions expressed in Reddit posts related to Hurricane Helene. This involves:

Cleaning and preprocessing data.
Implementing sentiment analysis using machine learning models.
Fine-tuning a transformer model.
Evaluating and visualizing results.

The project includes both traditional machine learning models and a deep learning-based transformer model for emotion classification.

Files and Their Descriptions

Data

data/: This folder contains raw datasets (Reddit posts, comments, and other text data) used for training, testing, and evaluation.

Scripts

combine.py: Script for combining multiple datasets and cleaning the data (e.g., removing duplicates and irrelevant text).
analysistransformers.py: Script for applying sentiment analysis and emotion classification using transformer models (e.g., fine-tuned distilbert).
deep_learning.py: Script for applying deep learning-based models to sentiment analysis (including training and evaluation).
gradient_boosting.py: Script that applies gradient boosting techniques (e.g., XGBoost) to predict sentiment.
logistic_regression.py: Script for sentiment analysis using logistic regression.
naive_bayes.py: Script that applies the Naive Bayes classifier for sentiment analysis.
random_forest.py: Script for training a Random Forest classifier for sentiment analysis.
clean_manually_labeled_further.py: Script to further clean and preprocess manually labeled data for sentiment analysis.
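
The combining and cleaning steps described for combine.py and clean_manually_labeled_further.py can be sketched as follows. This is a minimal, standard-library illustration, not the repository's code: the record format (dicts with a "text" key), the regular expressions, and the function names clean_text and combine_datasets are all hypothetical. Duplicates are detected on the cleaned text, so posts that differ only in case, punctuation, or links collapse into one record, and Reddit's "[deleted]" and "[removed]" placeholders are dropped.

```python
import re

# Hypothetical cleaning rules; the real scripts may use different ones.
URL_RE = re.compile(r"https?://\S+")
NON_TEXT_RE = re.compile(r"[^a-z0-9\s']")  # anything except lowercase letters, digits, whitespace, apostrophes
SPACE_RE = re.compile(r"\s+")
PLACEHOLDERS = {"deleted", "removed"}  # "[deleted]" / "[removed]" after punctuation is stripped


def clean_text(text):
    """Lowercase, drop URLs and punctuation/symbols, collapse whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = NON_TEXT_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def combine_datasets(*datasets):
    """Merge lists of {'text': ...} records, skipping empty, placeholder, and duplicate posts."""
    seen = set()
    combined = []
    for dataset in datasets:
        for record in dataset:
            cleaned = clean_text(record.get("text", ""))
            if not cleaned or cleaned in PLACEHOLDERS or cleaned in seen:
                continue
            seen.add(cleaned)
            # Keep the other fields, replacing the text with its cleaned form.
            combined.append({**record, "text": cleaned})
    return combined


if __name__ == "__main__":
    posts = [{"id": 1, "text": "Helene flooded our street!! https://example.com"}]
    comments = [{"id": 2, "text": "helene flooded our street"}, {"id": 3, "text": "[deleted]"}]
    print(combine_datasets(posts, comments))  # [{'id': 1, 'text': 'helene flooded our street'}]
```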

Results and Visualizations

combined.csv: A CSV file containing the combined dataset after merging multiple raw datasets.
combined_clean.csv: A cleaned version of the combined.csv file after removing irrelevant or noisy data.
Figure_1.png, Figure_2.png, Figure_3.png: Images containing visualizations and results from the analysis (e.g., bar plots of emotion scores).
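
Emotion-score bar plots of the kind mentioned above can be produced by averaging per-post scores for each emotion. The sketch below assumes each post's scores arrive as a dict mapping emotion label to score (a hypothetical format); the helper names and the output filename are illustrative only. matplotlib (installed as a seaborn dependency) is imported inside the plotting function, so the averaging helper runs without it.

```python
from collections import defaultdict


def mean_emotion_scores(predictions):
    """Average each emotion's score over all posts.

    `predictions` is a list with one dict per post, mapping emotion label -> score.
    """
    if not predictions:
        return {}
    totals = defaultdict(float)
    for scores in predictions:
        for label, score in scores.items():
            totals[label] += score
    return {label: total / len(predictions) for label, total in totals.items()}


def plot_emotion_scores(means, path="emotion_scores.png"):
    """Save a bar plot of mean scores, highest emotion first (hypothetical filename)."""
    import matplotlib.pyplot as plt

    labels = sorted(means, key=means.get, reverse=True)
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(labels, [means[label] for label in labels])
    ax.set_ylabel("Mean score")
    ax.set_title("Average emotion scores")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

For example, mean_emotion_scores([{"fear": 0.75, "sadness": 0.25}, {"fear": 0.5, "sadness": 0.5}]) averages to 0.625 for fear and 0.375 for sadness.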

Model

tokenizer_fine_tuned_distilbert/: Contains the tokenizer that converts raw text into model inputs for the fine-tuned DistilBERT model.
model_fine_tuned_distilbert/: Directory containing the fine-tuned distilbert-base-uncased-emotion model, specifically trained for emotion classification.
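
Loading the saved tokenizer and model for inference might look like the sketch below. It assumes both directories were written with Hugging Face's save_pretrained(), so that from_pretrained() can read them; classify, top_emotion, and softmax are hypothetical helper names. Label names are taken from the model's own config rather than hard-coded. Passing the Hub ID bhadresh-savani/distilbert-base-uncased-emotion instead of the local paths would load Bhadresh Savani's published model rather than this project's fine-tuned copy.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_emotion(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def classify(texts,
             model_dir="model_fine_tuned_distilbert",
             tokenizer_dir="tokenizer_fine_tuned_distilbert"):
    """Predict the top emotion for each text with the saved model (assumed save_pretrained() layout)."""
    # Imported here so the pure helpers above work without torch/transformers installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return [top_emotion(row.tolist(), model.config.id2label) for row in logits]
```

Calling classify(["We lost power for a week"]) would return a list with one (emotion, probability) pair per input text.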

Other

requirements.txt: A text file listing all the required Python dependencies for the project.

Process

Data Preprocessing:
- Combine multiple raw datasets (combine.py).
- Clean and preprocess text data (clean_manually_labeled_further.py).
- Tokenize the text data (analysistransformers.py).
Modeling:
- Traditional Machine Learning Models: Implement models like Logistic Regression, Naive Bayes, Random Forest, and Gradient Boosting (logistic_regression.py, naive_bayes.py, random_forest.py, gradient_boosting.py).
- Deep Learning Models: Implement and train deep learning-based models (deep_learning.py and deep_learning_balanced.py).
- Emotion Classification with Transformer Model: Fine-tune the pre-trained distilbert-base-uncased-emotion model for emotion classification and apply it to the data (analysistransformers.py).
Evaluation and Results:
- Evaluate model performance and visualize emotion scores (analysistransformers.py).
- Save and analyze results in the results/ folder.
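
To illustrate what a classifier like the one in naive_bayes.py computes, together with the kind of accuracy check used during evaluation, here is a from-scratch multinomial Naive Bayes with add-one (Laplace) smoothing. It is a sketch, not the project's implementation (the scripts presumably rely on scikit-learn, which is listed under Requirements); the whitespace tokenizer, the positive/negative labels, and the example sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(counts.values()) for c, counts in self.word_counts.items()}
        return self

    def predict_one(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(class) + sum of log P(word | class); words never seen in training are skipped.
            score = math.log(count / n_docs)
            for word in tokenize(text):
                if word in self.vocab:
                    score += math.log((self.word_counts[label][word] + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    train_texts = ["so grateful for the volunteers", "thank you rescue crews",
                   "no power and no water", "lost everything in the flood"]
    train_labels = ["positive", "positive", "negative", "negative"]
    model = NaiveBayes().fit(train_texts, train_labels)
    test_texts = ["grateful for the crews", "the flood took our power"]
    test_labels = ["positive", "negative"]
    preds = model.predict(test_texts)
    print(preds, accuracy(test_labels, preds))  # ['positive', 'negative'] 1.0
```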

Model Citation

The analysistransformers.py script performs emotion classification with a fine-tuned version of the distilbert-base-uncased-emotion model. The base model was developed by Bhadresh Savani and can be accessed at:

@article{savani2020distilbert,
  title={distilbert-base-uncased-emotion},
  author={Bhadresh Savani},
  journal={Hugging Face},
  year={2020},
  url={https://huggingface.co/bhadresh-savani/distilbert-base-uncased-emotion}
}

Requirements

Python 3.6 or higher
Install the dependencies from requirements.txt:

pip install -r requirements.txt

Ensure that all the required libraries are installed, such as pandas, scikit-learn (imported as sklearn), transformers, torch, and seaborn.
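
A quick way to confirm these libraries are importable is a small check like the one below, a standard-library sketch. The module list mirrors the libraries named above and is not read from requirements.txt; missing_modules is a hypothetical helper name.

```python
import importlib.util

# Import names for the libraries listed above; scikit-learn is imported as "sklearn".
REQUIRED = ["pandas", "sklearn", "transformers", "torch", "seaborn"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing), "(run: pip install -r requirements.txt)")
    else:
        print("All required libraries are installed.")
```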

By following the instructions above, you can replicate this analysis and sentiment classification. The process is flexible and can be adapted to other text datasets or analysis needs.

sjanefullerton/Sentiment-Analysis
