Vietnamese Hate Speech Detection on YouTube using PhoBERT-CNN, CNN-BiLSTM, BiLSTM and Logistic Regression
This repository aims to build a website that detects the following types of YouTube comments and lets users act on them:
- Clean
- Offensive
- Hate
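As a rough illustration of the three classes, the sketch below maps a classifier's score vector to one of the label names. The index order (0 = Clean, 1 = Offensive, 2 = Hate) and the helper name label_from_scores are assumptions for illustration, not taken from this repository's code.

```python
from typing import Sequence

# Assumed label order; check the training code for the real index mapping.
LABELS = ("Clean", "Offensive", "Hate")


def label_from_scores(scores: Sequence[float]) -> str:
    """Return the label name with the highest score."""
    if len(scores) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} scores, got {len(scores)}")
    # Index of the largest score (first one wins on ties).
    best = max(range(len(LABELS)), key=lambda i: scores[i])
    return LABELS[best]
```

For example, label_from_scores([0.1, 0.7, 0.2]) picks the second class, which is "Offensive" under the assumed order.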
This repository re-implements the code of the paper Vietnamese-Hate-and-Offensive-Detection-using-PhoBERT-CNN-and-Social-Media-Streaming-Data
as a Data Mining final project.
Because the model files are larger than 500 MB, they cannot be stored on GitHub; you need to go to this Google Colab to download them:
in the folder final/HateSpeechDetectionApp/Models/result
- CNN_BiLSTM_model_reddit.h5 and tokenizer_reddit.pickle for CNN-BiLSTM
- BiLSTM.h5 and tokenizer_1.pickle for BiLSTM
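A quick way to confirm the download is complete is to check that each listed file is present. This is a minimal sketch that assumes the files end up in final/HateSpeechDetectionApp/Models/result relative to the current directory; the function name missing_model_files is hypothetical.

```python
from pathlib import Path
from typing import List

# Folder and file names as listed in this README.
RESULT_DIR = Path("final/HateSpeechDetectionApp/Models/result")
EXPECTED_FILES = (
    "CNN_BiLSTM_model_reddit.h5",
    "tokenizer_reddit.pickle",
    "BiLSTM.h5",
    "tokenizer_1.pickle",
)


def missing_model_files(result_dir: Path = RESULT_DIR) -> List[str]:
    """Return the expected file names that are not present in result_dir."""
    return [name for name in EXPECTED_FILES if not (result_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All listed model files are present.")
```

Pass a different Path to missing_model_files if you keep the final folder somewhere else.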
- PhoBERT: Pre-trained language models for Vietnamese
- BiLSTM for text classification
- BiLSTM: Research on Text Classification Based on CNN and LSTM
- Convolutional Neural Networks for Sentence Classification
- spark: a unified engine for big data processing
- kafka: a distributed event-store and streaming platform
- Install the necessary packages from the requirements.txt file:
pip install -r HateSpeechDetectionApp/requirements.txt
Set up a Kafka cluster locally: please refer to this document
Create the topics for data storage (run these commands in the Kafka directory you have just set up):
bin/ --create --topic rawData --bootstrap-server localhost:9092
bin/ --create --topic anomalies --bootstrap-server localhost:9092
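The topic commands above expect a broker listening on localhost:9092. The sketch below is one way to check that beforehand using only the Python standard library. It tests TCP connectivity only, not whether the listener is really Kafka, and the function name broker_reachable is hypothetical.

```python
import socket


def broker_reachable(host: str = "localhost", port: int = 9092,
                     timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "reachable" if broker_reachable() else "not reachable"
    print(f"localhost:9092 is {state}")
```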
1.1. Install Visual Studio Code:
- Visit the official Visual Studio Code page:
- Download and install the Visual Studio Code build that matches your operating system.
1.2. Install Node.js and npm:
- Visit the official Node.js page:
- Download and install Node.js and npm.
- Check that Node.js is installed with the command:
node -v
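If you prefer to check several tools at once, the sketch below looks up each executable on PATH. It only reports where a command is found and does not check versions. The default tool list and the function name find_tools are assumptions for illustration.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_tools(names: Iterable[str] = ("node", "npm")) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```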
1.3. Install Kafka:
- Visit the Apache Kafka homepage to download the version of Kafka you want to install. Choose the version suitable for your operating system.
1.4. Install MongoDB:
- Download MongoDB: Visit the MongoDB homepage, choose the version suitable for your operating system, and download it.
- Install MongoDB: Run the installation file (.msi) you just downloaded. Select "Complete" when asked about the installation type. Select "Run service as Network Service user" to run MongoDB as a Windows service.
- Environment setup: Create the folder C:\data\db to store MongoDB data.
2.1. Create a website using npm
- Go to the folder where you want to create the website
- Open a terminal and type:
npx create-react-app ${Name of the website you want}
2.2. Unzip the source code files
- Move the src and server folders into the web folder you just created (overwrite the originals)
- Move the final folder to a separate location (ideally on drive D, so you do not need to change the paths inside its files)
2.3. Install libraries for ReactJS:
- Go to the server directory, open a terminal, and type:
pip install Flask pymongo requests google-api-python-client beautifulsoup4
npm install express cors
- Go to the main directory of the website, open a terminal, and type:
npm install react recharts axios react-icons qrcode.react react-spinners
2.4. Set up database and collection for data storage:
- Change the database and collection names used in the server, visomences and src files to match the database and collections you have created.
- Open the Google Cloud Console, go to APIs & Services, and create a new project
- Go to the OAuth consent screen and create a new app with the External user type
- Go to the Credentials section and use Create credentials to create an API key and an OAuth client ID
- Go to Library, find YouTube Data API v3, and enable it
- Finally, open the file (in the server) and the file (in the final folder), replace the value of Developer_Key with the new API key, and change the path of Client_secret_life (in to the path of the OAuth client ID file
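To show where the API key ends up, the sketch below builds a request URL for the commentThreads endpoint of YouTube Data API v3 using only the standard library. How this repository actually calls the API is not shown here (google-api-python-client is among the installed packages), so the function name and defaults are assumptions, and the key and video ID in the usage note are placeholders.

```python
from urllib.parse import urlencode

# Public REST endpoint for comment threads in YouTube Data API v3.
API_BASE = "https://www.googleapis.com/youtube/v3/commentThreads"


def comment_threads_url(video_id: str, developer_key: str,
                        max_results: int = 20) -> str:
    """Build a commentThreads.list URL for one video (hypothetical helper)."""
    if not 1 <= max_results <= 100:
        raise ValueError("maxResults must be between 1 and 100")
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": max_results,
        "key": developer_key,  # the API key created in the steps above
    }
    return f"{API_BASE}?{urlencode(params)}"
```

Calling comment_threads_url("VIDEO_ID", "YOUR_API_KEY") returns a URL you can open in a browser to confirm that the key is accepted.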
4.1. Start the Zookeeper server
.\bin\windows\zookeeper-server-start.bat .\config\
.\bin\windows\kafka-server-start.bat .\config\
kafka-topics.bat --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic "topic_name"
4.2. Start Consumer
Run file
4.3. Launch Web
- Open a terminal in the server folder and type: node server.js
- Open a terminal in the server folder and type: python
- Open a terminal in the main web folder and type:
npm start