Utilizing Recurrent Neural Networks (RNNs) to predict the types of cyberbullying in tweets

polaternez/cyberbullying-tweet-detection-rnn

Analyzing Cyberbullying Tweets with LSTM Networks

This project aims to develop a tool that detects cyberbullying in tweets and classifies each tweet by type: gender, religion, age, ethnicity, or other cyberbullying. The primary objectives are:

  • Utilizing the Cyberbullying Classification Dataset sourced from Kaggle.
  • Conducting data cleaning procedures to enhance data quality.
  • Applying data preprocessing techniques to prepare the cleaned data for analysis.
  • Constructing a Recurrent Neural Network (RNN) model using Long Short-Term Memory (LSTM) layers and evaluating its performance on a separate test dataset.
  • Implementing a client-facing API using Flask for seamless integration and usability.

Technologies and Resources

Data Acquisition

The project relies on the Cyberbullying Classification Dataset obtained from Kaggle. This dataset comprises over 47,000 labeled tweets, each assigned to one of six classes (five types of cyberbullying plus a non-cyberbullying class):

  • Not Cyberbullying
  • Gender
  • Religion
  • Other types of cyberbullying
  • Age
  • Ethnicity

Data Cleaning

A custom Python script cleans the raw tweets. The cleaning steps are:

  • Removal of punctuation marks
  • Elimination of numerical characters
  • Conversion of text to lowercase
  • Elimination of stop words
  • Lemmatization/Stemming of words
  • Removal of URLs
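
The steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual script: the function name `clean_tweet`, the regular expressions, and the tiny `STOP_WORDS` set are assumptions made for the example. Lemmatization/stemming is left out because it needs an external library (for example NLTK or spaCy). URLs are removed first, since stripping punctuation beforehand would break them into fragments that no longer match.

```python
import re
import string

# Assumption: a tiny illustrative stop-word list; a real pipeline would
# normally use a fuller list from a library such as NLTK.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "you", "i", "it"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text: str) -> str:
    """Apply the cleaning steps listed above (except lemmatization/stemming)."""
    text = URL_RE.sub(" ", text)        # remove URLs before touching punctuation
    text = text.lower()                 # convert to lowercase
    text = text.translate(PUNCT_TABLE)  # remove punctuation marks
    text = re.sub(r"\d+", " ", text)    # remove numerical characters
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]  # drop stop words
    return " ".join(tokens)


print(clean_tweet("You are the WORST!!! 100% see https://t.co/abc123 #loser"))
# prints: worst see loser
```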

Data Preprocessing

To prepare the cleaned tweets for analysis, the TextVectorization layer from Keras is applied. This layer performs integer encoding (not one-hot encoding): it builds a vocabulary from the training text and maps each word (or token) in the input string to its integer index in that vocabulary. Additionally, sequences are padded to ensure uniform length.
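
To make the encoding concrete, here is a small pure-Python sketch of integer encoding with padding. It mimics what an integer-output text vectorizer produces but is not the Keras layer itself: `build_vocab` and `encode` are hypothetical helpers, and the alphabetical tie-breaking is chosen only to keep the example deterministic. As in Keras, index 0 is reserved for padding and index 1 for out-of-vocabulary tokens.

```python
from collections import Counter


def build_vocab(texts):
    """Map each token to an integer index, most frequent tokens first.

    Index 0 is reserved for padding and index 1 for unknown tokens.
    """
    counts = Counter(tok for text in texts for tok in text.split())
    # Sort by descending frequency, then alphabetically for determinism.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i + 2 for i, word in enumerate(ordered)}


def encode(text, vocab, seq_len):
    """Encode a string as a fixed-length list of token indices."""
    ids = [vocab.get(tok, 1) for tok in text.split()][:seq_len]  # truncate
    return ids + [0] * (seq_len - len(ids))                      # pad with zeros


corpus = ["worst see loser", "see you later"]
vocab = build_vocab(corpus)
print(encode("see loser now", vocab, seq_len=5))
# prints: [2, 4, 1, 0, 0]   ("now" is out of vocabulary, so it maps to 1)
```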

Model Building

  1. Train-Test Split: Data is divided into 80% training and 20% testing sets.
  2. Bidirectional LSTM Model: Build an RNN architecture utilizing Bidirectional LSTM layers.
  3. Compilation and Evaluation: Compile the model with "categorical_crossentropy" as the loss and "RMSprop" as the optimizer, then evaluate it on the held-out test set.
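
The three steps above might look roughly like the following in Keras. This is a sketch under stated assumptions, not the repository's actual architecture: the vocabulary size, sequence length, embedding width, LSTM units, and the random placeholder data are all invented for illustration. Only the 80/20 split, the Bidirectional LSTM, the six output classes, the loss, and the optimizer come from the description above.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 6      # six classes listed under Data Acquisition
VOCAB_SIZE = 10_000  # assumption: vocabulary size of the vectorizer
SEQ_LEN = 50         # assumption: padded sequence length

# Placeholder data standing in for the vectorized tweets and their labels.
rng = np.random.default_rng(0)
X = rng.integers(1, VOCAB_SIZE, size=(100, SEQ_LEN))
y = keras.utils.to_categorical(rng.integers(0, NUM_CLASSES, size=100), NUM_CLASSES)

# 1. Train-test split: shuffle, then take 80% for training and 20% for testing.
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
X_train, X_test = X[idx[:split]], X[idx[split:]]
y_train, y_test = y[idx[:split]], y[idx[split:]]

# 2. Bidirectional LSTM model (layer sizes are illustrative assumptions).
model = keras.Sequential([
    keras.Input(shape=(SEQ_LEN,)),
    layers.Embedding(VOCAB_SIZE, 64),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# 3. Compile with the loss and optimizer named above.
model.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=["accuracy"])
```

Training and evaluation would then be calls such as `model.fit(X_train, y_train, ...)` followed by `model.evaluate(X_test, y_test)`. The one-hot labels produced by `to_categorical` are what the categorical cross-entropy loss expects.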

Model Visualization:

[Figure: model visualization]

Model Performance:

[Figure: model performance]

Productionization

A Flask-based user interface (UI) lets users submit a tweet and receive the predicted cyberbullying type in real time.
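
A prediction endpoint for such an app could be sketched as below. The `/predict` route, the JSON field names, and the `predict_label` stub are hypothetical and not taken from the repository. In a real deployment the stub would clean and vectorize the tweet and call the trained model.

```python
from flask import Flask, jsonify, request

# Class names as listed under Data Acquisition.
LABELS = [
    "Not Cyberbullying",
    "Gender",
    "Religion",
    "Other types of cyberbullying",
    "Age",
    "Ethnicity",
]

app = Flask(__name__)


def predict_label(text: str) -> str:
    """Hypothetical stub: a real version would run the trained model."""
    return LABELS[0]


@app.route("/predict", methods=["POST"])
def predict():
    # Accept a JSON body such as {"tweet": "some text"}.
    payload = request.get_json(silent=True) or {}
    tweet = payload.get("tweet", "")
    if not isinstance(tweet, str) or not tweet.strip():
        return jsonify(error="field 'tweet' is required"), 400
    return jsonify(tweet=tweet, prediction=predict_label(tweet))
```

With the code saved as `app.py`, the server can be started with `flask run`, and a client can POST a JSON body containing a `tweet` field to `/predict`.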

