gabrielaeaton/capstone

Satire Detector

Satire Detector is my capstone project, which distinguishes satirical news headlines from legitimate ones.

Required packages

The requirements for each virtual environment used are included in the requirement folder.

Notebooks

Please read the notebooks in this order:

1_WEBSCRAPE
2_CLEANING
3_EARLY_MODELS
4_ADJUST_DATA
5_RERUN_MODELS
6A_BERT_BINARY_SPLIT
6B_BERT
7A_DATA_MULTICLASS
7B_BERT_MULTICLASS
BERT_reload_final_binary
BERT_reload_final_multi

BERT transfer learning

This project used BERT_base_uncased as the pre-trained model. The files can be downloaded from:

https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
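
A minimal sketch for fetching and unpacking the checkpoint, assuming the archive extracts into a single top-level folder named after the zip file (uncased_L-12_H-768_A-12) that holds bert_config.json, vocab.txt and the bert_model.ckpt.* files. The helper names download_bert and extract_checkpoint, and the bert_model destination folder, are hypothetical and not taken from the notebooks.

```python
import urllib.request
import zipfile
from pathlib import Path

BERT_URL = "https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip"
# Files assumed to be inside the archive's top-level folder (based on the
# public BERT release; the checkpoint itself is split into .index/.data/.meta).
EXPECTED_FILES = ("bert_config.json", "vocab.txt", "bert_model.ckpt.index")


def extract_checkpoint(zip_path, dest_dir):
    """Unzip the BERT archive into dest_dir and return the checkpoint folder."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    # Assumption: the folder inside the zip has the same name as the zip file.
    ckpt_dir = Path(dest_dir) / Path(zip_path).stem
    missing = [name for name in EXPECTED_FILES if not (ckpt_dir / name).exists()]
    if missing:
        raise FileNotFoundError(f"missing from {ckpt_dir}: {missing}")
    return ckpt_dir


def download_bert(dest_dir="bert_model", url=BERT_URL):
    """Download the archive once (skipped if already present), then extract it."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    zip_path = dest / url.rsplit("/", 1)[-1]
    if not zip_path.exists():
        urllib.request.urlretrieve(url, str(zip_path))
    return extract_checkpoint(zip_path, dest)
```

Calling download_bert() would leave the checkpoint under bert_model/uncased_L-12_H-768_A-12. Keeping extraction separate from downloading makes it possible to reuse extract_checkpoint on an archive that was downloaded by hand.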

Google's sample code for the BERT model was modified to fit this particular problem. The original code can be found at:

https://colab.research.google.com/github/bentoml/gallery/blob/master/tensorflow/bert/bert_movie_reviews.ipynb#scrollTo=dCpvgG0vwXAZ
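
Google's BERT fine-tuning examples turn each text into fixed-length input_ids, input_mask and segment_ids before training a classifier on top of the pre-trained model. The following is a minimal sketch of that step for a single headline, assuming a lowercase whitespace tokenizer in place of BERT's WordPiece tokenizer and a small hypothetical vocabulary; convert_headline and InputFeatures are illustrative names, not the code used in the notebooks.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InputFeatures:
    input_ids: List[int]
    input_mask: List[int]
    segment_ids: List[int]
    label_id: int


def convert_headline(headline: str, label_id: int, vocab: Dict[str, int],
                     max_seq_length: int = 16) -> InputFeatures:
    # Stand-in tokenizer (assumption): lowercase and split on whitespace.
    # The real pipeline uses WordPiece with the model's vocab.txt.
    tokens = headline.lower().split()
    # Truncate, reserving two positions for the [CLS] and [SEP] markers.
    tokens = ["[CLS]"] + tokens[: max_seq_length - 2] + ["[SEP]"]
    input_ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens]
    # 1 marks real tokens; 0 marks padding the model should ignore.
    input_mask = [1] * len(input_ids)
    padding = max_seq_length - len(input_ids)
    input_ids += [0] * padding  # id 0 is [PAD] in BERT's vocab.txt
    input_mask += [0] * padding
    # A single-sentence task uses segment 0 for every position.
    segment_ids = [0] * max_seq_length
    return InputFeatures(input_ids, input_mask, segment_ids, label_id)
```

With the toy vocabulary {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "area": 4, "man": 5, "wins": 6}, convert_headline("Area man wins lottery", 1, vocab, max_seq_length=8) gives input_ids [2, 4, 5, 6, 1, 3, 0, 0] and input_mask [1, 1, 1, 1, 1, 1, 0, 0]; the out-of-vocabulary word "lottery" maps to [UNK].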

Trained model

The saved models were too large to fit into the submission portal, so I zipped them and uploaded them to Google Drive. Please find the link below.

https://drive.google.com/file/d/1LbynyDRns_doutM5OhMIUC2olUXskgoj/view?usp=sharing

Streamlit App

The code and files used for the Streamlit app are also hosted on Google Drive. Please find the link below.

https://drive.google.com/file/d/11p-AOcp2Agoc3Mg5Gh3rT3pTvuIZJ2s0/view?usp=sharing

Data

Since all the data used in this project was scraped from the web, I have uploaded a zip file containing all of the scraped data to Google Drive. Please find the link below.

https://drive.google.com/file/d/1y8FN8qpKrTKaa_M7AOZSPDC_gVs_7SKA/view?usp=sharing
