The goal of this project is to perform sentiment analysis on a large dataset of tweets and classify each tweet as positive or negative. Sentiment analysis helps gauge how users feel about specific topics, brands, or events, enabling better decision-making and strategy formulation.
The dataset used in this project is the Sentiment140 dataset, which contains 1,600,000 tweets. Each tweet carries a polarity label; although the label scheme allows a neutral class, the training file contains only negative and positive tweets.
training.1600000.processed.noemoticon.csv: The main dataset containing tweets with sentiment labels.
- sentiment: The polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)
- ids: The id of the tweet
- date: The date of the tweet
- flag: The query (if any)
- user: The user who tweeted
- text: The text of the tweet
- Data Loading: Load the dataset.
- Data Cleaning: Drop the unneeded columns and remap the positive label from 4 to 1, so sentiments are encoded as 0 (negative) and 1 (positive).
- EDA: Plot the distribution of sentiments and create word clouds for negative and positive tweets.
- Text Preprocessing:
- Convert text to lowercase.
- Replace URLs, emojis, and usernames with placeholders.
- Remove non-alphanumeric characters and stopwords.
- Lemmatize the words.
- TF-IDF Vectorization: Convert text data into numerical features using TF-IDF.
- Train/Test Split: Split the data into training and test sets.
- Model Training: Train three models - Bernoulli Naive Bayes, LinearSVC, and Logistic Regression.
- Model Evaluation: Evaluate the models using precision, recall, F1-score, and a confusion matrix (minimal sketches of these steps follow this list).
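The sketch below illustrates the loading, cleaning, and preprocessing steps. It assumes the standard Sentiment140 column order and latin-1 encoding; the `preprocess` function name, placeholder tokens, and regex patterns are illustrative rather than the project's exact choices, and emoji replacement is omitted for brevity.

```python
import re

import nltk
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)

# Sentiment140 ships without a header row; this is the documented column order.
COLUMNS = ["sentiment", "ids", "date", "flag", "user", "text"]
df = pd.read_csv("training.1600000.processed.noemoticon.csv",
                 encoding="latin-1", names=COLUMNS)

# Keep only the label and the text; remap 4 -> 1 so labels are 0/1.
df = df[["sentiment", "text"]]
df["sentiment"] = df["sentiment"].replace(4, 1)

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()
url_pattern = re.compile(r"https?://\S+|www\.\S+")
user_pattern = re.compile(r"@\w+")

def preprocess(text: str) -> str:
    """Lowercase, replace URLs and @usernames with placeholders,
    strip non-alphanumeric characters, drop stopwords, and lemmatize."""
    text = text.lower()
    text = url_pattern.sub("url", text)
    text = user_pattern.sub("user", text)
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [lemmatizer.lemmatize(tok) for tok in text.split()
              if tok not in stop_words]
    return " ".join(tokens)

df["text"] = df["text"].astype(str).apply(preprocess)
```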
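Continuing from the cleaned `df` in the previous sketch, the following sketch covers TF-IDF vectorization, the train/test split, training the three models, and printing the evaluation metrics. The hyperparameters (n-gram range, feature count, split ratio) are assumptions, not necessarily the project's actual settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import LinearSVC

# Vectorize the cleaned text, then split, following the step order above.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=500_000)
X = vectorizer.fit_transform(df["text"])
y = df["sentiment"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    "Bernoulli Naive Bayes": BernoulliNB(),
    "LinearSVC": LinearSVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Fit each model and report per-class metrics plus the confusion matrix.
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f"=== {name} ===")
    print(classification_report(y_test, y_pred, digits=2))
    print(confusion_matrix(y_test, y_pred))
```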
The models were evaluated on the test set using precision, recall, F1-score, and the confusion matrix:
| Model | Precision (Neg / Pos) | Recall (Neg / Pos) | F1-Score (Neg / Pos) | Accuracy |
|---|---|---|---|---|
| Bernoulli Naive Bayes | 0.81 / 0.80 | 0.79 / 0.81 | 0.80 / 0.80 | 0.80 |
| LinearSVC | 0.82 / 0.81 | 0.81 / 0.83 | 0.82 / 0.82 | 0.82 |
| Logistic Regression | 0.83 / 0.82 | 0.82 / 0.84 | 0.83 / 0.83 | 0.83 |
- Word Cloud for Negative Tweets: Visual representation of the most frequent words in negative tweets.
- Word Cloud for Positive Tweets: Visual representation of the most frequent words in positive tweets.
- Confusion Matrix: Heatmap of the confusion matrix showing each model's true positives, false positives, false negatives, and true negatives (a visualization sketch follows this list).
- Model Saving: Save the trained models and vectorizer using pickle for future use.
- Model Loading: Load the saved models and vectorizer to make predictions on new data.
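A hedged sketch of the two kinds of figures described above, assuming the wordcloud, seaborn, and matplotlib packages are installed and reusing `df`, `y_test`, and `y_pred` from the earlier sketches.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix
from wordcloud import WordCloud

# Word cloud of frequent words in negative tweets (use sentiment == 1 for positive).
negative_text = " ".join(df.loc[df["sentiment"] == 0, "text"])
cloud = WordCloud(width=800, height=400,
                  background_color="white").generate(negative_text)
plt.figure(figsize=(10, 5))
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()

# Confusion-matrix heatmap for the last model evaluated above.
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=["negative", "positive"],
            yticklabels=["negative", "positive"])
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
```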
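A minimal sketch of the pickle save/load round trip, reusing `vectorizer`, `models`, and `preprocess` from the earlier sketches; the file names are illustrative, not the project's actual artifacts.

```python
import pickle

# Persist the fitted vectorizer and one trained model.
with open("tfidf_vectorizer.pkl", "wb") as f:
    pickle.dump(vectorizer, f)
with open("logistic_regression.pkl", "wb") as f:
    pickle.dump(models["Logistic Regression"], f)

# Later: reload both objects and classify new tweets.
with open("tfidf_vectorizer.pkl", "rb") as f:
    loaded_vectorizer = pickle.load(f)
with open("logistic_regression.pkl", "rb") as f:
    loaded_model = pickle.load(f)

new_tweets = ["I love this!", "This is awful..."]
features = loaded_vectorizer.transform([preprocess(t) for t in new_tweets])
print(loaded_model.predict(features))  # 1 = positive, 0 = negative
```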
This project demonstrates a comprehensive approach to sentiment analysis on Twitter data using machine learning. All three models reach between 0.80 and 0.83 test accuracy, with Logistic Regression performing best, and they can serve as a basis for real-time sentiment analysis applications.