Analyzing Cyberbullying Tweets with LSTM Networks

This project builds a tool that detects cyberbullying in tweets and classifies each tweet by type: gender, religion, age, ethnicity, or other cyberbullying. The primary objectives are:

Utilizing the Cyberbullying Classification Dataset sourced from Kaggle.
Conducting data cleaning procedures to enhance data quality.
Applying data preprocessing techniques to prepare the cleaned data for analysis.
Constructing a Recurrent Neural Network (RNN) model using Long Short-Term Memory (LSTM) layers and evaluating its performance on a separate test dataset.
Implementing a client-facing API using Flask for seamless integration and usability.

Technologies and Resources

Python Version: 3.10
Libraries: numpy, pandas, matplotlib, seaborn, nltk, tensorflow, scikit-learn, flask, json
Flask API Setup:
- pip install -r requirements.txt
- conda env create -n <ENVNAME> -f environment.yaml (Anaconda environment)
Dataset: https://www.kaggle.com/datasets/andrewmvd/cyberbullying-classification

Data Acquisition

The project relies on the Cyberbullying Classification Dataset obtained from Kaggle. This dataset comprises over 47,000 labeled tweets, each assigned to one of six classes:

Not Cyberbullying
Gender
Religion
Other types of cyberbullying
Age
Ethnicity
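
Because the model is trained with a categorical cross-entropy loss, each class name has to become a one-hot vector at some point. The sketch below shows one way to do that mapping. The label strings (for example "not_cyberbullying" and "other_cyberbullying") are an assumption about how the classes are spelled in the CSV file and should be checked against the downloaded data.

```python
# Assumed label spellings; verify against the actual dataset column values.
CLASSES = [
    "not_cyberbullying",
    "gender",
    "religion",
    "other_cyberbullying",
    "age",
    "ethnicity",
]

# Map each class name to a stable integer index.
CLASS_TO_INDEX = {name: index for index, name in enumerate(CLASSES)}


def one_hot(label):
    """Return a one-hot list for a class name, e.g. for a softmax target."""
    vector = [0] * len(CLASSES)
    vector[CLASS_TO_INDEX[label]] = 1
    return vector


print(one_hot("religion"))
```

Keeping the class order in a single list means the same list can later translate a predicted index back into a readable class name.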

Data Cleaning

A custom Python script cleans the raw tweets by applying the following steps:

Removal of punctuation marks
Elimination of numerical characters
Conversion of text to lowercase
Elimination of stop words
Lemmatization/Stemming of words
Removal of URLs
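
The cleaning steps above can be sketched as a single function. This is a minimal illustration, not the project's actual script: the function name clean_tweet, the tiny STOP_WORDS set, and the pluggable lemmatize argument are all hypothetical. In practice the stop-word list and lemmatizer would likely come from nltk, which the project lists as a dependency. URLs are removed first so that punctuation stripping does not break them into leftover word fragments.

```python
import re
import string

# Tiny illustrative stop-word set (assumption); a real run would likely use nltk's list.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of", "in", "you", "i"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
DIGIT_RE = re.compile(r"\d+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text, lemmatize=lambda word: word):
    """Apply the listed cleaning steps to one tweet and return the cleaned string.

    `lemmatize` is a hypothetical hook: pass a callable such as a lemmatizer's
    or stemmer's word-level function. The default leaves words unchanged.
    """
    text = URL_RE.sub(" ", text)          # remove URLs before touching punctuation
    text = text.lower()                   # convert to lowercase
    text = DIGIT_RE.sub(" ", text)        # remove numerical characters
    text = text.translate(PUNCT_TABLE)    # remove punctuation marks
    tokens = [lemmatize(t) for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)


print(clean_tweet("You are SO annoying!!! 123 http://t.co/abc"))
```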

Data Preprocessing

To prepare the cleaned tweets for analysis, the TextVectorization layer from Keras is applied. This layer splits each tweet into tokens and maps every token to an integer index in a learned vocabulary (integer encoding, not one-hot encoding), so each input string becomes a sequence of integers. The sequences are then padded to a uniform length.
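
A minimal sketch of this step follows. The vocabulary size, the sequence length, and the toy corpus are illustrative assumptions, not the values used in the project.

```python
import tensorflow as tf

MAX_TOKENS = 10_000  # assumed vocabulary cap
SEQ_LEN = 8          # assumed padded length, kept short for readability

vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=MAX_TOKENS,
    output_mode="int",               # integer index per token
    output_sequence_length=SEQ_LEN,  # pad or truncate to a fixed length
)

# Learn the vocabulary from (toy) cleaned tweets.
corpus = tf.constant(["so annoying", "nice day today", "so nice"])
vectorizer.adapt(corpus)

# Index 0 is reserved for padding and index 1 for out-of-vocabulary tokens.
encoded = vectorizer(tf.constant(["so annoying today", "unseen words"]))
print(vectorizer.get_vocabulary()[:2])
print(encoded.numpy())
```

The vocabulary should be adapted on the training tweets only, so that the test set does not influence which tokens receive an index.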

Model Building

Train-Test Split: Data is divided into 80% training and 20% testing sets.
Bidirectional LSTM Model: Build an RNN architecture utilizing Bidirectional LSTM layers.
Training and Evaluation: The model is compiled with the "categorical_crossentropy" loss and the "RMSprop" optimizer, then evaluated on the held-out test set.
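
The three steps above might look roughly like the sketch below. Only the 80/20 split, the Bidirectional LSTM layer, the loss, and the optimizer come from this README. The layer sizes, vocabulary size, sequence length, and the random stand-in data are assumptions chosen so that the example runs on its own.

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

NUM_CLASSES = 6      # six tweet classes in the dataset
VOCAB_SIZE = 10_000  # assumed vocabulary size
SEQ_LEN = 50         # assumed padded sequence length


def build_model(vocab_size=VOCAB_SIZE, num_classes=NUM_CLASSES):
    """Small Bidirectional LSTM classifier; layer widths are illustrative."""
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, 64),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        loss="categorical_crossentropy",
        optimizer="rmsprop",
        metrics=["accuracy"],
    )
    return model


# Random stand-in data (assumption): integer-encoded tweets and one-hot labels.
rng = np.random.default_rng(0)
X = rng.integers(1, VOCAB_SIZE, size=(100, SEQ_LEN))
y = tf.keras.utils.to_categorical(
    rng.integers(0, NUM_CLASSES, size=100), NUM_CLASSES
)

# 80% training, 20% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = build_model()
model.fit(X_train, y_train, epochs=1, batch_size=32, verbose=0)
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"test loss={loss:.3f} accuracy={accuracy:.3f}")
```

With real data, passing stratify=y.argmax(axis=1) to train_test_split would keep the class proportions similar in both sets. Whether the project does this is not stated here.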

Model Visualization: [figure: model visualization]

Model Performance: [figure: model performance]

Productionization

A Flask-based user interface (UI) lets users submit a tweet and receive the predicted cyberbullying type in real time.
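
As a rough sketch of how such an endpoint could be wired up, the example below exposes a JSON route. The route name /predict, the "tweet" field, and the predict_label function are hypothetical. predict_label is a stand-in that always returns the same class, where a real application would clean the text, vectorize it, and call the trained model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_label(text):
    """Hypothetical stand-in for the trained model's prediction."""
    return "not_cyberbullying"


@app.route("/predict", methods=["POST"])
def predict():
    # Accept only a JSON object; anything else is treated as an empty payload.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}

    tweet = str(payload.get("tweet", "")).strip()
    if not tweet:
        return jsonify(error="field 'tweet' is required"), 400

    return jsonify(tweet=tweet, prediction=predict_label(tweet))
```

If this code were saved as app.py, the development server could be started with "flask --app app run", and a client could then POST a body such as {"tweet": "some text"} to /predict.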
