Fake-and-Real-News

The main objective was to predict whether a news article is real or fake, given its title and/or body text.

The dataset was acquired from Kaggle and contained news predominantly from 2016 and 2017.
After cleaning and wrangling, four features were carried forward into modeling: title, text, title-length, and text-length. Text-length was eventually dropped after statistical analysis showed that it did not distinguish real from fake news.
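
The two engineered length features can be sketched as below. This is a minimal illustration, not the notebook's code: the function name `add_length_features` and the keys `title_length` and `text_length` are hypothetical, and length is assumed here to mean character count (a word count would work the same way with `len(s.split())`).

```python
def add_length_features(article):
    """Return a copy of the article dict with two length features added.

    Assumption: "length" is measured in characters.
    """
    enriched = dict(article)
    enriched["title_length"] = len(article["title"])
    enriched["text_length"] = len(article["text"])
    return enriched


# Made-up sample record, for illustration only.
sample = {"title": "Breaking news", "text": "Some body text."}
features = add_length_features(sample)
print(features["title_length"], features["text_length"])
```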

EDA revealed that certain words (e.g. Trump, President) appeared in both real and fake news, while others were more prevalent in real news (e.g. reuters, Washington, state) or in fake news (e.g. hillary, donald, obama). Additionally, fake news titles were longer, with higher word counts.
It is interesting to note that in 2016 (an election year) there were twice as many fake news articles as real ones. While 2017 had twice as much news overall, it had 2.5 times fewer fake news articles than real ones.

Among the three models compared (Naive Bayes, Logistic Regression, and Passive Aggressive), Naive Bayes was the fastest (3 to 4 times faster than the others). Even though it had low accuracy on the 'test' data, it gave superior predictions on a 'blind' test set.
A final big leap in model performance came from dropping all non-alphabetic characters from the article text and modeling with Naive Bayes. The final blind test accuracy was 70%.
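
The cleaning step described above, dropping non-alphabetic characters, might look roughly like the following sketch. The function name `keep_alphabetic` is hypothetical, and both the choice to replace each run of removed characters with a single space and the restriction to ASCII letters are assumptions; the notebook may handle these details differently.

```python
import re

# Matches any run of characters that are not ASCII letters.
NON_ALPHA = re.compile(r"[^A-Za-z]+")


def keep_alphabetic(text):
    """Replace every run of non-alphabetic characters with one space."""
    return NON_ALPHA.sub(" ", text).strip()


cleaned = keep_alphabetic("Trump's 2016 win: 100%!")
print(cleaned)
```

The cleaned string would then be passed to whatever vectorizer feeds the Naive Bayes model.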

This repository contains reports, code and slides pertaining to this project.

Milestone Report is a Jupyter notebook with code and documentation on data wrangling and EDA (exploratory data analysis).
Statistical Analysis is a summary of simple statistics on the original and engineered data.
Machine Learning and Analysis is the bulk of the work, with various ML hyper-parameter and model tests. The improvements were a learning process that culminated in a 10% increase in overall F1 score and a marked improvement (a 40% F1 increase) in fake news prediction.
Slides is a summary presentation of the project.
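
For readers unfamiliar with the per-class and overall F1 scores mentioned above, here is a small sketch of how they can be computed from predictions. The labels below are made up for illustration and are not project results, and treating "overall" F1 as the unweighted (macro) average of the two classes is an assumption.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only.
y_true = ["fake", "fake", "real", "real", "real", "fake"]
y_pred = ["fake", "real", "real", "real", "fake", "fake"]

f1_fake = f1_for_class(y_true, y_pred, "fake")
f1_real = f1_for_class(y_true, y_pred, "real")
macro_f1 = (f1_fake + f1_real) / 2  # assumed meaning of "overall" F1
print(f1_fake, f1_real, macro_f1)
```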

ssatti0/Fake-and-Real-News

Folders and files

Latest commit

History

Repository files navigation

Fake-and-Real-News

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages