Satire Detector is my capstone project, which distinguishes satirical news headlines from legitimate ones.
The requirements for each virtual environment used are included in the requirement
folder.
Please read the notebooks in this order:
1_WEBSCRAPE
2_CLEANING
3_EARLY_MODELS
4_ADJUST_DATA
5_RERUN_MODELS
6A_BERT_BINARY_SPLIT
6B_BERT
7A_DATA_MULTICLASS
7B_BERT_MULTICLASS
BERT_reload_final_binary
BERT_reload_final_multi
This project used BERT_base_uncased as the pre-trained model. The files can be downloaded from:
https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
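The zip above unpacks into bert_config.json, vocab.txt, and the bert_model.ckpt checkpoint files. Below is a minimal sketch of how these files are typically consumed; it assumes the archive is extracted next to the notebooks, that Google's tokenization.py and modeling.py are importable in the environment from the requirement folder, and the example headline is illustrative only.

```python
import tokenization  # tokenization.py from Google's BERT code
import modeling      # modeling.py from Google's BERT code

BERT_DIR = "uncased_L-12_H-768_A-12"  # assumed extraction path

# WordPiece tokenizer built from the released vocabulary
tokenizer = tokenization.FullTokenizer(
    vocab_file=f"{BERT_DIR}/vocab.txt", do_lower_case=True)

# Model hyperparameters shipped with the checkpoint
bert_config = modeling.BertConfig.from_json_file(f"{BERT_DIR}/bert_config.json")

tokens = tokenizer.tokenize("Area man reads project README")  # example headline
input_ids = tokenizer.convert_tokens_to_ids(tokens)
print(tokens, input_ids)
```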
Google's sample code for the BERT model was modified to fit this particular problem. The original code is available in Google's public BERT repository.
The saved models were too large for the submission portal, so I have zipped them and uploaded them to Google Drive. Please find the link below.
https://drive.google.com/file/d/1LbynyDRns_doutM5OhMIUC2olUXskgoj/view?usp=sharing
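After downloading the zip from the link above, the archive can be unpacked with Python's standard library before running the two reload notebooks. The local file name below is an assumption, not the actual name on Drive.

```python
import zipfile

# Assumed local file name for the downloaded archive.
with zipfile.ZipFile("saved_models.zip") as archive:
    archive.extractall("saved_models")  # extract next to the notebooks
```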
The code and files used for the Streamlit app are also hosted on Google Drive. Please find the link below.
https://drive.google.com/file/d/11p-AOcp2Agoc3Mg5Gh3rT3pTvuIZJ2s0/view?usp=sharing
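For orientation, a Streamlit app for this task reduces to a text box plus a prediction call. The sketch below is illustrative only and uses a hypothetical predict_satire() stand-in rather than the actual code in the zip above.

```python
import streamlit as st

def predict_satire(headline: str) -> float:
    """Hypothetical stand-in for the saved BERT model's prediction call."""
    # The real app would tokenize the headline and run the fine-tuned model here.
    return 0.5

st.title("Satire Detector")
headline = st.text_input("Paste a news headline")

if st.button("Classify") and headline:
    score = predict_satire(headline)
    label = "Satire" if score >= 0.5 else "Legitimate news"
    st.write(f"{label} (satire probability: {score:.2f})")
```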
Since all the data used in this project was scraped from the web, I have uploaded a zip file containing all of the scraped data to Google Drive. Please find the link below.
https://drive.google.com/file/d/1y8FN8qpKrTKaa_M7AOZSPDC_gVs_7SKA/view?usp=sharing
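Re-collecting the data yourself (see 1_WEBSCRAPE) follows the usual requests + BeautifulSoup pattern. The URL and tag selector below are placeholders, not the actual sources or selectors used in the notebook.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://example.com/headlines"  # placeholder; see 1_WEBSCRAPE for the real sources

response = requests.get(URL, timeout=30)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# Placeholder selector: each scraped site needs its own tag/class.
headlines = [h.get_text(strip=True) for h in soup.find_all("h2")]
print(headlines[:10])
```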