GitHub - adunmore/Recidivism_Risk_Final_Project: Final project for 95791 Data Mining at CMU Heinz College

Introduction

About This Project

We process a large data set of arrest records (about 100k records) and train models that predict the risk of two-year recidivism and two-year violent recidivism at the time of arrest. We then evaluate our models, compare their performance with that of a professional model, and assess whether they perform differently across demographic categories.

Attributions

I share co-authorship with Minseon Lee. She has hosted this project on her GitHub.

The data for this project was published by ProPublica, in their analysis of the COMPAS recidivism risk assessment instrument.

This project was completed for 95791 Data Mining at CMU Heinz College, taught by Prof. Alexandra Chouldechova.

Using the project

Prerequisites

The project requires the sqlite3 command-line tools, or some other way to execute Data_Processing/feature_engineering.sql against compas.db.
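A quick way to confirm the sqlite3 prerequisite before running anything is to check that the command is on your PATH. The helper below is a minimal sketch; `require_tool` is a hypothetical name introduced here and is not part of the repository.

```shell
# Hypothetical helper (not part of the repository).
require_tool() {
    # Succeed if the named command is on PATH; otherwise print a hint and fail.
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing prerequisite: $1" >&2
    return 1
}

require_tool sqlite3 || echo "install sqlite3, or run the SQL file with another SQLite client"
```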

final_report.Rmd and data_export.R each require several R packages. These must be installed manually.
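Since the required packages are not listed here, one way to find them is to scan the R sources for `library()` and `require()` calls. The function below is a rough sketch under that assumption: `list_r_packages` is a hypothetical name, and it will miss packages that are only used through `pkg::function` syntax.

```shell
# Hypothetical helper (not part of the repository).
list_r_packages() {
    # Print the unique package names passed to library() or require()
    # in the given files, one per line.
    grep -hoE '(library|require)\("?[A-Za-z0-9._]+"?\)' "$@" |
        sed -E 's/^(library|require)\("?//; s/"?\)$//' |
        sort -u
}
```

For example, `list_r_packages Notebooks/final_report.Rmd Data_Processing/data_export.R` would print the candidates to install, assuming data_export.R sits in Data_Processing/ (its location is not stated above).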

data_processing.sh

This repository contains the source code and data required to build the project from scratch.

Before executing Notebooks/final_report.Rmd, simply run data_processing.sh. This shell script:

creates temp folders Data/modified/ and Cache/
creates a copy of the source data (Data/compas.db) in Data/modified/compas.db
in the copy of the source data, executes feature_engineering.sql, which creates a number of additional variables and makes several performance optimizations (adding indexes, denormalizing tables).
executes data_export.R, generating several temporary .csv files in Data/modified, which are used by the modeling code.
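If you cannot run the script directly, the four steps above can be approximated by hand. The sketch below is illustrative and is not the repository's actual script: `run_pipeline` is a hypothetical name, the commands are assumed to run from the repository root, and the location of data_export.R in Data_Processing/ is an assumption.

```shell
# Hypothetical sketch of the steps listed above; not the repository's script.
run_pipeline() {
    # 1. Create the temporary folders.
    mkdir -p Data/modified Cache || return 1

    # 2. Work on a copy so the source database stays untouched.
    cp Data/compas.db Data/modified/compas.db || return 1

    # 3. Run the feature engineering SQL against the copy.
    sqlite3 Data/modified/compas.db < Data_Processing/feature_engineering.sql || return 1

    # 4. Export the .csv files used by the modeling code.
    #    Assumption: data_export.R lives in Data_Processing/.
    Rscript Data_Processing/data_export.R || return 1
}
```

Calling `run_pipeline` from the repository root runs the steps in order and stops at the first one that fails.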

final_report.Rmd

This notebook is the main product of this project. It contains all of the code for our exploratory analysis, model training and selection, and model evaluation. This document renders to final_report.md.
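One way to render the notebook from the command line, rather than from an IDE, is through `rmarkdown::render`. This is a sketch that assumes the rmarkdown package is installed and that you run it from the repository root; `render_report` is a hypothetical name. The output format is whatever the notebook's YAML header declares.

```shell
# Hypothetical helper (not part of the repository).
render_report() {
    # Render the notebook using the output format declared in its YAML header.
    Rscript -e 'rmarkdown::render("Notebooks/final_report.Rmd")'
}
```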

Cache/

We cache models that are computationally expensive to train in Cache/. If you would like to retrain all models from scratch, be sure to delete the contents of Cache/ first.
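Clearing the cache can be done by hand or with a small helper such as the one below. It is a sketch only: `clear_cache` is a hypothetical name, and it assumes a `find` that supports `-mindepth` and `-delete` (both GNU and BSD versions do). It empties the folder but keeps the folder itself.

```shell
# Hypothetical helper (not part of the repository).
clear_cache() {
    # Remove everything inside the cache directory, including hidden files,
    # while keeping the directory itself. Defaults to Cache/.
    cache_dir="${1:-Cache}"
    [ -d "$cache_dir" ] || return 0
    find "$cache_dir" -mindepth 1 -delete
}
```

Run `clear_cache` from the repository root, or pass the path to the cache folder as the first argument.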

