Predicting U.S. Crime Rates

Authors	LinkedIn	GitHub
Anand Ramakrishnan
Nick Van Bergen	planetvb	nvbergen
Steven Tran	steven-tran	mr-steven-tran

Data Collection

Notebook	Description
target_data_retrieval	Retrieve crime data from the United States Federal Bureau of Investigation Crime Data API. An API key is required before the script can be run again.
cpi_retrieval	Retrieve Consumer Price Index data from the United States Bureau of Labor Statistics Public Data API. An API key is required before the script can be run again.
unemployment_rates_retrieval	Retrieve state unemployment rate data from the United States Bureau of Labor Statistics Public Data API. An API key is required before the script can be run again.
AG_scrape	Retrieve the political affiliation of each state attorney general from their Wikipedia pages.
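As a rough illustration of what a keyed request to the Bureau of Labor Statistics Public Data API can look like, here is a minimal sketch using only the Python standard library. It is not taken from the notebooks. The function names, the `BLS_API_KEY` environment variable, and the example series ID (`CUUR0000SA0`, CPI-U all items, U.S. city average) are assumptions chosen for illustration; the series the notebooks actually request may differ.

```python
import json
import os
import urllib.request

# Version 2 time-series endpoint of the BLS Public Data API.
BLS_URL = "https://api.bls.gov/publicAPI/v2/timeseries/data/"


def build_bls_payload(series_ids, start_year, end_year, api_key):
    """Build the JSON body for a BLS v2 time-series request."""
    if start_year > end_year:
        raise ValueError("start_year must not exceed end_year")
    return {
        "seriesid": list(series_ids),
        "startyear": str(start_year),
        "endyear": str(end_year),
        "registrationkey": api_key,
    }


def fetch_bls_series(payload, url=BLS_URL):
    """POST the payload and return the decoded JSON response (needs network access)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# BLS_API_KEY is a hypothetical variable name; an empty string stands in if it is unset.
example_payload = build_bls_payload(
    ["CUUR0000SA0"], 2010, 2020, os.environ.get("BLS_API_KEY", "")
)
```

With a real key in the environment, `fetch_bls_series(example_payload)` would send the request. Keeping the key in an environment variable rather than in the notebook avoids committing it to the repository.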

EDA and Modeling

Once the data are collected, progress through the following notebooks in the order presented to replicate our EDA and modeling results:

Step	Notebook	Description
0	join_all	Joins each of the predictor data sources with the target data sources into one dataframe for EDA and modeling.
1	EDA	We conduct exploratory data analysis to identify states where crime rates are exceptional. We also test the crime time series for stationarity to guide the choice of modeling approach.
2	ARIMAX	Because most states' crime rates exhibit stationarity after second-order differencing, we fit an ARIMAX model and discuss its predictive performance metrics.
3	RNN_LSTM_Model	NOTE: This notebook may need to be run in Google Colab. Because not every state's time series exhibited stationarity, we construct a Recurrent Neural Network with Long Short-Term Memory to attempt to predict crime rates in the first out-of-sample year (2021 in our case).
4	Discussion_LSTM	We discuss the modeling results of the RNN LSTM model, including limitations and potential next steps.
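To make "second-order differencing" concrete: the series is differenced once, and the result is differenced again (d = 2 in ARIMA notation). The sketch below uses a made-up series with a quadratic trend, not project data, and the helper name `difference` is hypothetical. It only shows the transformation; deciding whether a differenced series is stationary would normally rely on a statistical test, and this README does not state which test the EDA notebook uses.

```python
def difference(series, order=1):
    """Return the series differenced `order` times (each pass shortens it by one)."""
    if order < 0:
        raise ValueError("order must be non-negative")
    values = list(series)
    for _ in range(order):
        values = [later - earlier for earlier, later in zip(values, values[1:])]
    return values


# Made-up series with a quadratic trend: y_t = t**2 for t = 0..5.
toy_rates = [t ** 2 for t in range(6)]   # [0, 1, 4, 9, 16, 25]

first = difference(toy_rates, order=1)   # [1, 3, 5, 7, 9]: still trending upward
second = difference(toy_rates, order=2)  # [2, 2, 2, 2]: constant level
```

The first difference still grows over time, while the second difference is flat, which is the pattern the ARIMAX step relies on when it differences twice.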

Lastly, we provide presentation slides summarizing our project.

Problem Statement

We are of the opinion that current research demonstrates that person-based predictive policing is inherently racially biased and unfairly categorizes large groups of people. As such, our goal is to attempt an alternative solution that:

Produces a prediction.
Has applications in law enforcement.
Escapes internal biases.

Read our Executive Summary in executive_summary.md.

Data

Source	Purpose	Credentials Needed
United States Federal Bureau of Investigation Crime Data API	Target Data: Crime counts by state by year. We selected only the 50 states and DC.	API Key Required
United States Bureau of Labor Statistics Public Data API	Monthly Unemployment by state (we annualized). National CPI per month per year.	API Key Required
Wikipedia (Python library)	Political affiliation of state attorneys general over time.	None
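The table mentions annualizing monthly unemployment and combining sources by state and year. The sketch below shows one way those two steps could fit together, using only the standard library. Averaging the monthly values is an assumption (the README does not say how the annualization was done), the function names are hypothetical, and all numbers are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def annualize(monthly_rows):
    """Average monthly rates into one value per (state, year)."""
    grouped = defaultdict(list)
    for state, year, _month, rate in monthly_rows:
        grouped[(state, year)].append(rate)
    return {key: mean(rates) for key, rates in grouped.items()}


def join_on_state_year(target, predictor):
    """Inner-join two {(state, year): value} mappings on their shared keys."""
    return {
        key: (target[key], predictor[key])
        for key in target
        if key in predictor
    }


# Made-up values for illustration only; not real BLS or FBI figures.
monthly_unemployment = [
    ("AK", 2019, 1, 6.0),
    ("AK", 2019, 2, 5.0),
    ("AK", 2020, 1, 7.0),
]
crime_counts = {("AK", 2019): 100, ("AK", 2021): 90}

annual = annualize(monthly_unemployment)
joined = join_on_state_year(crime_counts, annual)
```

Only the (state, year) pairs present in both sources survive the join, so here `joined` keeps Alaska 2019 and drops 2020 and 2021. The project lists pandas among its requirements, where a `groupby` followed by a `merge` would express the same idea.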

Software Requirements

If you want to set up an environment identical to the one we used, please install the following libraries at the versions listed:

pandas, v 1.2.4
NumPy, v 1.20.1
matplotlib, v 3.3.4
seaborn, v 0.11.1
scikit-learn, v 0.24.1

NOTE: The code notebook RNN_LSTM_Model.ipynb must be run in Google Colab.

TensorFlow (with Keras), v 2.7.0
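One way to confirm that an environment matches pinned versions is to compare them against what is installed. The following is a minimal sketch using `importlib.metadata` from the standard library (Python 3.8+). The names `PINNED`, `installed_version`, and `find_mismatches` are hypothetical, and the mapping holds only a subset of the pins as an example.

```python
from importlib import metadata

# Subset of the pins listed above; extend the mapping with the remaining libraries.
PINNED = {
    "matplotlib": "3.3.4",
    "seaborn": "0.11.1",
    "scikit-learn": "0.24.1",
    "tensorflow": "2.7.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (pin, found)."""
    mismatches = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(PINNED)` returns an empty dictionary when every listed package is installed at its pinned version. Otherwise each entry shows the expected version next to the one found, or `None` if the package is absent.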

