Authors | GitHub | |
---|---|---|
Anand Ramakrishnan | ||
Nick Van Bergen | planetvb | nvbergen |
Steven Tran | steven-tran | mr-steven-tran |
To replicate our work, start with these Data Collection notebooks, which can be run in any order.
Notebook | Description |
---|---|
target_data_retrieval | Retrieve crime data from the United States Federal Bureau of Investigation Crime Data API. An API key is required before the script can be run again. |
cpi_retrieval | Retrieve Consumer Price Index data from the United States Bureau of Labor Statistics Public Data API. An API key is required before the script can be run again. |
unemployment_rates_retrieval | Retrieve state unemployment rate data from the United States Bureau of Labor Statistics Public Data API. An API key is required before the script can be run again. |
AG_scrape | Retrieve attorney general political affiliation from their pages maintained on Wikipedia. |
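Because three of the notebooks above need an API key, the following is a minimal sketch of how a key could be kept out of the notebook and passed to the Bureau of Labor Statistics Public Data API (version 2). It is not taken from our notebooks: the environment variable name `BLS_API_KEY`, the helper names, and the series ID in the usage note are all hypothetical placeholders.

```python
import json
import os
import urllib.request

# Endpoint of the BLS Public Data API, version 2.
BLS_URL = "https://api.bls.gov/publicAPI/v2/timeseries/data/"


def load_api_key(env_var="BLS_API_KEY"):
    """Read an API key from an environment variable (the name is hypothetical)."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key


def build_bls_payload(series_ids, start_year, end_year, api_key):
    """Build the JSON body for a BLS time series request."""
    return {
        "seriesid": list(series_ids),
        "startyear": str(start_year),
        "endyear": str(end_year),
        "registrationkey": api_key,
    }


def fetch_bls(payload):
    """POST the payload to the BLS API and return the decoded JSON response."""
    request = urllib.request.Request(
        BLS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call might look like `fetch_bls(build_bls_payload(["YOUR_SERIES_ID"], 2010, 2020, load_api_key()))`, where `YOUR_SERIES_ID` stands in for a real BLS series identifier.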
Once the data are collected, progress through the following notebooks in the order presented to replicate our EDA and modeling results:
Step | Notebook | Description |
---|---|---|
0 | join_all | Joins each of the predictor data sources with the target data sources into one dataframe for EDA and modeling. |
1 | EDA | We conduct exploratory data analysis to identify states where crime rates are exceptional. We also test the time series crime data for stationarity to help guide the modeling approach. |
2 | ARIMAX | Because most states' crime rates over time exhibit stationarity at the second-order differencing level, we fit the ARIMAX model and discuss predictive performance metrics. |
3 | RNN_LSTM_Model | NOTE: This notebook may need to be run in Google Colab. Because not every state's time series exhibited stationarity, we construct a Recurrent Neural Network with Long Short-Term Memory to attempt to predict crime rates in the first out-of-sample year (2021 in our case). |
4 | Discussion_LSTM | We discuss the modeling results of the RNN LSTM model, including limitations and potential next steps. |
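For readers unfamiliar with the term used in the ARIMAX step, the sketch below illustrates what second-order differencing does to a series. It is a plain-Python illustration only, not the code from our notebooks, and the `difference` helper and the sample values are hypothetical.

```python
def difference(series, order=1):
    """Apply first differences `order` times; each pass shortens the series by one."""
    values = list(series)
    for _ in range(order):
        values = [later - earlier for earlier, later in zip(values, values[1:])]
    return values


# A series with a quadratic trend is not stationary in its mean.
quadratic = [1, 4, 9, 16, 25]

first = difference(quadratic, order=1)   # still trending upward
second = difference(quadratic, order=2)  # constant, so the trend is removed
```

Here the first difference still trends upward, while the second difference is constant. That is the sense in which a series can be stationary "at the second-order differencing level".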
Lastly, we provide presentation slides summarizing our project.
We are of the opinion that current research demonstrates that person-based predictive policing is inherently racially biased and unfairly categorizes large groups of people. As such, our goal is to attempt an alternative solution that:
- Produces a prediction.
- Has applications in law enforcement.
- Escapes internal biases.
Read our Executive Summary here
Source | Purpose | Credentials Needed |
---|---|---|
United States Federal Bureau of Investigation Crime Data API | Target Data: Crime counts by state by year. We selected only the 50 states and DC. | API Key Required |
United States Bureau of Labor Statistics Public Data API | Monthly Unemployment by state (we annualized). National CPI per month per year. | API Key Required |
Wikipedia (python library) | Political affiliation of state attorneys general over time. | None |
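The table notes that monthly unemployment was annualized. One simple way to do that is sketched below, assuming annualization means averaging the monthly rates within each state and year. That averaging rule, the `annualize` helper, and the sample rows are assumptions for illustration; the join_all and retrieval notebooks remain the reference for what we actually did.

```python
from collections import defaultdict
from statistics import mean


def annualize(monthly_rows):
    """Average monthly rates per (state, year).

    `monthly_rows` is an iterable of (state, year, month, rate) tuples.
    """
    grouped = defaultdict(list)
    for state, year, _month, rate in monthly_rows:
        grouped[(state, year)].append(rate)
    return {key: mean(rates) for key, rates in grouped.items()}


# Made-up sample values, for illustration only.
sample = [
    ("CA", 2020, 1, 4.0),
    ("CA", 2020, 2, 6.0),
    ("NY", 2020, 1, 3.0),
]
annual = annualize(sample)
```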
If you want to set up an environment identical to the one we used, please install the following libraries at the versions listed:
- pandas, v 1.2.4
- NumPy, v 1.20.1
- matplotlib, v 3.3.4
- seaborn, v 0.11.1
- scikit-learn v 0.24.1
NOTE: The code notebook RNN_LSTM_Model.ipynb must be run in Google Colab.
- TensorFlow (with Keras) v 2.7.0