RedBlue

A political language classifier for news articles

Here is a presentation that provides a high-level overview of the project.

Here is a report that goes over the entire project in detail.

Quick Start

This quick start is intended to help you replicate our process.

Clone the repository:

$ git clone https://github.com/samgoodgame/RedBlue.git
$ cd RedBlue

Create a virtualenv and install the dependencies:

$ virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt

Normally, you would run the dem_parse.py and rep_parse.py scripts to download the training data and parse it into usable form. Because this repository already includes the training data in the /data/debate_data/ directory, you can skip these scripts.

Build the models by running the classification script. First, edit the script so that it pickles the models into the right directory (modify the paths on lines 68, 357, 365, and 371).

$ cd scripts
$ python classify_svm.py
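
The hard-coded paths mentioned above could instead be derived from a single directory setting. The following is a minimal sketch, not the project's actual code: `save_model`, `load_model`, and the file naming are hypothetical, and it assumes the models are written with Python's standard `pickle` module, as the step above suggests.

```python
import pickle
from pathlib import Path


def save_model(model, model_dir, name):
    """Pickle `model` to <model_dir>/<name>.pkl and return the path.

    `model_dir` and `name` are illustrative parameters, not names
    taken from classify_svm.py.
    """
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)  # create the directory if needed
    path = model_dir / f"{name}.pkl"
    with path.open("wb") as fh:
        pickle.dump(model, fh)
    return path


def load_model(model_dir, name):
    """Load an object previously written by save_model."""
    with (Path(model_dir) / f"{name}.pkl").open("rb") as fh:
        return pickle.load(fh)
```

With helpers like these, the training script and the prediction script would only need to agree on one directory, rather than on several separately edited paths.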

The script prints several results. The most important is the last number, which is the accuracy of the SVM model.

Classify the RSS data. In predict.py, set the path to the dataset (news source) that you want to analyze, and make sure the script loads the pickled models from the right directory (modify the paths on lines 51 and 81).

$ python predict.py

The results are printed to your terminal. To see results for each news source, point the classify_svm.py script at that source's directory under /data/sources/text/.
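
To avoid editing the dataset path by hand for every source, the source directories can be enumerated first. This is a small sketch under stated assumptions: `source_directories` is a hypothetical helper, and the layout (one subdirectory per news source under data/sources/text, relative to the repository root) is inferred from the path given above.

```python
from pathlib import Path


def source_directories(text_root):
    """Return the news-source subdirectories under text_root, sorted by name."""
    root = Path(text_root)
    return sorted(p for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Assumed location, relative to the repository root.
    text_root = Path("data/sources/text")
    if text_root.is_dir():
        for source_dir in source_directories(text_root):
            print(source_dir.name)
```

Each returned path could then be substituted for the dataset path before the script is run for that source.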

About

RedBlue is a political language classifier for news articles. It trains a Support Vector Machine (SVM) on training data from the 2016 Democratic and Republican presidential primary debates. It then uses Baleen to ingest RSS feeds into MongoDB, parse the feeds, remove stop words, and vectorize the data.

Once the RSS data is in the proper format (a sparse matrix with words as features and documents as instances), we pass it to our fitted model, which predicts whether each article is "red" (Republican) or "blue" (Democratic).
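
The "sparse matrix with words as features and documents as instances" can be illustrated with a small pure-Python sketch. It is not the project's code: the stop-word list, the tokenizer, and the function names are all hypothetical, and a real pipeline would more likely rely on a library vectorizer. Each document becomes one row that stores counts only for the words it actually contains.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on"}


def tokenize(text):
    """Lowercase the text, keep runs of letters, and drop stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted token order."""
    tokens = sorted({t for doc in documents for t in tokenize(doc)})
    return {token: i for i, token in enumerate(tokens)}


def vectorize(documents, vocabulary):
    """Return one sparse row per document: {column index: count}.

    Tokens that are not in the vocabulary are ignored in this sketch.
    """
    rows = []
    for doc in documents:
        counts = Counter(t for t in tokenize(doc) if t in vocabulary)
        rows.append({vocabulary[t]: n for t, n in counts.items()})
    return rows


if __name__ == "__main__":
    # Made-up sentences standing in for debate text and an RSS article.
    training = [
        "Cut taxes and secure the border",
        "Expand health care and raise the minimum wage",
    ]
    vocab = build_vocabulary(training)
    print(vectorize(["Taxes, taxes, and the border wall"], vocab))
```

The vocabulary is built from the training documents only, so new articles are mapped onto the same columns that the model was fitted on.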

Attribution

We generated our word cloud from an open-source Python word cloud package. The words are from Democratic and Republican presidential primary debates.
