Explore the public Chicago crimes data set with the latest open source Python libs for big data visualization: Dask, HoloViews, Datashader, GeoViews and Bokeh.
- Download and install Anaconda: https://www.continuum.io/downloads
- Install the required libraries from the conda-forge channel with a single command:
conda install -c conda-forge fastparquet snappy python-snappy pyspark \
  bokeh dask distributed numba scikit-learn pyarrow matplotlib palettable \
  seaborn bottleneck pymc3 brewer2mpl holoviews datashader
To save a spec file listing the installed packages, run:
conda list --explicit > spec-file.txt
See conda.io for more info on managing environments: https://conda.io/docs/using/envs.html
On the Chicago data portal page for the Crimes 2001 to present data set, click the Download > CSV menu in the top right corner to download all crime data reported since 2001.
Convert the raw CSV data to a Dask DataFrame and save it in Parquet format with Snappy compression for preliminary data scrubbing and analysis in Jupyter notebooks:
>cd scripts
/scripts>python convert_crime_csv_to_parquet.py
Then start Jupyter from the notebooks folder:
/notebooks>jupyter notebook
- See the CSV data preview notebook for a preview of the 2017 Chicago crimes sample data (included in this repo's /data folder).
- The crime plots notebook contains crime type stats and matplotlib charts for Chicago crimes reported in 2017.
- The all Chicago crime charts notebook requires downloading the Crimes-2001-to-present CSV data from the Chicago data portal and running the conversion script to produce the Snappy-compressed Parquet file. You can still preview selected matplotlib chart output in this notebook without going through those steps.
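The crime type stats in these notebooks boil down to grouping records by crime type and by date. The notebooks themselves are not shown here, so the sketch below is a standard-library stand-in rather than the notebooks' code: a hypothetical `crime_type_stats` helper that counts crimes per `Primary Type` and per calendar month. It assumes the same `Date` and `Primary Type` columns and date format as the portal CSV, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Assumption: portal-style dates such as "01/01/2017 12:00:00 AM".
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def crime_type_stats(rows):
    """Count crimes by type and by calendar month from CSV dict rows."""
    by_type = Counter()
    by_month = Counter()
    for row in rows:
        crime_type = (row.get("Primary Type") or "").strip()
        date_text = (row.get("Date") or "").strip()
        if not crime_type or not date_text:
            continue  # skip incomplete records
        by_type[crime_type] += 1
        by_month[datetime.strptime(date_text, DATE_FORMAT).month] += 1
    return by_type, by_month


# Made-up sample rows, not real crime records.
sample = """ID,Date,Primary Type
1,01/01/2017 12:00:00 AM,THEFT
2,01/15/2017 09:10:00 PM,BATTERY
3,02/03/2017 03:45:00 AM,THEFT
"""
by_type, by_month = crime_type_stats(csv.DictReader(io.StringIO(sample)))
print(by_type.most_common())  # [('THEFT', 2), ('BATTERY', 1)]
print(dict(by_month))         # {1: 2, 2: 1}
```

On the full 2001-to-present data the same grouping is better done on the Parquet file with Dask, which parallelizes it across partitions instead of streaming rows one at a time.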
TODO: convert the matplotlib charts to interactive Chicago crime charts and maps with HoloViews, Bokeh, Datashader and GeoViews.
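For the planned Datashader maps, the key idea is that every point is first aggregated into a fixed-size grid of bins and only the per-bin counts are then mapped to colors, so the rendered image stays the same size no matter how many crimes are plotted. The sketch below illustrates just that counting step with a hypothetical `aggregate_points` helper in plain Python; it is not Datashader's API, which performs this aggregation (for example through `Canvas.points`) far more efficiently. The Chicago bounding box in the usage lines is an approximation and the coordinates are made up.

```python
def aggregate_points(points, x_range, y_range, width, height):
    """Count (x, y) points per cell of a width x height grid.

    Returns a list of `height` rows, each a list of `width` counts.
    Row 0 holds the minimum y values (south), so an image renderer
    would flip the rows vertically.
    """
    x0, x1 = x_range
    y0, y1 = y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # ignore points outside the bounding box
        # Scale into bin indices; clamp the upper edge into the last bin.
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid


# Made-up (longitude, latitude) pairs inside an approximate Chicago box.
points = [(-87.65, 41.85), (-87.64, 41.86), (-87.70, 41.95)]
grid = aggregate_points(points, x_range=(-87.94, -87.52),
                        y_range=(41.64, 42.03), width=4, height=4)
for row in reversed(grid):  # print north at the top
    print(row)
```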
