Fake News Detector README

This project was created to fulfill the final project requirement for ECE 143 WI'21 at UC San Diego. It uses the 'Fake News' Kaggle dataset (https://www.kaggle.com/c/fake-news/data) to visualize features of the data, including the most commonly used words, names, and country names in fake and real articles.

Authors: Akshay Gopalkrishnan, Bolun Liu, Pu Cheng, and Madison Wilson

Requirements

Python 3.7+

Usage

Instructions

To recreate the environment that contains all the modules necessary to run the code and then run the project:

1. Git clone the 'ECE-143-Final-Project' repository.
2. Run 'conda env create -f environment.yml' in the terminal.
3. Run final_project.ipynb.

Project Figure Generation

All figures used in our final project presentation can be generated by running the final_project.ipynb file in the 'Code-and-Notebooks' folder. Supporting methods used within this file are described below.

Supporting Methods

Several custom methods were written to process the fake news data, plot it, and predict whether an article is real or fake. The files containing these methods can be found in the 'Code-and-Notebooks' folder. Descriptions of each file are as follows:

geo.py: This file downloads and modifies a GeoPandas world file to count the number of mentions of each country in fake and real news articles. The returned value is a DataFrame containing country counts and geographic dimensions that can be used to plot a heat map of the world.
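The counting step described above can be sketched with the standard library alone. This is a minimal illustration, not the code in geo.py: the function name `count_country_mentions`, the sample articles, and the country list are all hypothetical, and whole-word, case-insensitive matching is an assumption about how a mention is defined.

```python
import re
from collections import Counter


def count_country_mentions(articles, countries):
    """Count total whole-word mentions of each country across all articles."""
    counts = Counter()
    # Pre-compile one whole-word, case-insensitive pattern per country name.
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in countries
    }
    for text in articles:
        for name, pattern in patterns.items():
            counts[name] += len(pattern.findall(text))
    return counts


# Hypothetical sample data, for illustration only.
articles = [
    "Talks between France and Germany resumed on Monday.",
    "Officials in germany declined to comment.",
    "The report did not name any country.",
]
mention_counts = count_country_mentions(articles, ["France", "Germany", "Niger"])
```

Counts like these could then be joined onto a GeoPandas world DataFrame by country name to drive the heat map; the column used for that join in geo.py is not stated here.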

ml.py: This file contains all the methods necessary for the machine learning pipeline, including preprocessing text, creating and training the model, and graphing the training and validation accuracy. It also includes an interactive feature where the user can enter an article name to see whether the model classifies it as real or fake.
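As a rough idea of what the text preprocessing step might involve, the sketch below lowercases an article, keeps alphabetic tokens, and drops stop words. It is an assumption about a typical approach, not the code in ml.py: the `preprocess` function and the tiny `STOP_WORDS` set are hypothetical.

```python
import re

# Tiny illustrative stop-word list (an assumption); a real pipeline
# would likely use a much fuller list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


tokens = preprocess("The Senate is expected to vote in 2021!")
```

Tokens produced this way would still need to be converted to numeric features before being passed to a model.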

most_common_names.py: This file extracts the name drops (names of people mentioned) from the articles and builds bar charts showing the most frequently mentioned names and their number of mentions.
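One simple way to approximate name extraction is to treat runs of two or more capitalized words as a name and count them. The sketch below does that with a regular expression. It is a heuristic chosen for illustration, not necessarily the method used in most_common_names.py, and `NAME_PATTERN`, `top_names`, and the sample articles are hypothetical.

```python
import re
from collections import Counter

# Heuristic (an assumption): a "name" is two or more consecutive
# capitalized words, e.g. "Jane Doe".
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")


def top_names(articles, n=10):
    """Return the n most frequently mentioned names as (name, count) pairs."""
    counts = Counter()
    for text in articles:
        counts.update(NAME_PATTERN.findall(text))
    return counts.most_common(n)


# Hypothetical sample data, for illustration only.
articles = [
    "Yesterday, Jane Doe met with John Smith.",
    "Critics say Jane Doe avoided the question.",
]
ranked = top_names(articles, n=2)
```

The resulting (name, count) pairs map directly onto the labels and bar lengths of a bar chart. A heuristic like this will also pick up capitalized place names and organizations, so a real pipeline would probably need extra filtering.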

wordle.py: This file creates a wordle (word cloud) based on the article text from the real and fake news files.

Data

All data files used for this project can be found in the 'Datasets' folder. test.csv, test_data_labels.csv, and train.csv were downloaded from Kaggle (https://www.kaggle.com/c/fake-news/data). The rest of the datasets were created by our group for this project. Descriptions of each dataset are as follows:

fake_name.csv: Stores all the name drops from fake articles.

new0.csv: Contains all the real news articles and their labels.

new1.csv: Contains all the fake news articles and their labels.

real_name.csv: Stores all the name drops from real articles.

test.csv: Contains all the real and fake news articles used for testing the machine learning model.

test_data_labels.csv: Contains the true value labels for the test data describing whether each article is real or fake.

train.csv: Contains all the real and fake news articles used for training the machine learning model.
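To illustrate how a labeled file such as train.csv relates to the per-class files new0.csv and new1.csv, here is a minimal sketch that splits rows by label. How the group actually produced those files is not described, so this is an assumption. The `split_by_label` function and the in-memory sample are hypothetical, and the `label` column with 0 = real and 1 = fake is assumed from the Kaggle dataset description and the file names.

```python
import csv
import io


def split_by_label(csv_file):
    """Return (real_rows, fake_rows) from a CSV that has a 'label' column."""
    real_rows, fake_rows = [], []
    for row in csv.DictReader(csv_file):
        # Assumed convention: label "0" = real article, "1" = fake article.
        if row["label"] == "1":
            fake_rows.append(row)
        elif row["label"] == "0":
            real_rows.append(row)
    return real_rows, fake_rows


# Hypothetical in-memory sample standing in for a labeled CSV file.
sample = io.StringIO(
    "id,title,text,label\n"
    "0,Budget passes,The bill was approved.,0\n"
    "1,Aliens land,Sources confirm nothing.,1\n"
    "2,Rates hold,The bank kept rates steady.,0\n"
)
real_rows, fake_rows = split_by_label(sample)
```

With a real file, the same function could be called on an open file handle instead of the `io.StringIO` sample.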
