I wanted to make something easily deployable, so after some research I cloned this repo and made the appropriate changes. A big thanks to omahoco.
This project contains the following containers:
- A Spark cluster with two worker nodes.
- A Jupyter notebook server for exploration.
- A shared filesystem for blob storage.
This setup runs only on Unix systems; please do not run it on Windows. The only prerequisite is Docker.
```bash
# Clone the repo
git clone https://github.com/b1ackout/trg_challenge

# Create the environment
cd trg_challenge
make all

# Unpack the data
make unpack_data
```
To submit the Spark job that generates the parquet files, run:

```bash
make spark-submit
```

To submit the Spark job that generates the JSON KPIs, run:

```bash
make spark-kpis
```
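These two targets wrap `spark-submit` calls. The sketch below is only meant to show the general shape of those jobs, not the actual code in this repo; the paths, the column name and the aggregation are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("trg-jobs").getOrCreate()

# make spark-submit (roughly): convert the unpacked raw data, assumed here to
# be CSV under /data, into parquet on the shared storage.
street = spark.read.csv("/data/raw/street", header=True)
street.write.mode("overwrite").parquet("/data/output/street.parquet")

# make spark-kpis (roughly): aggregate the parquet data and write the KPIs
# out as JSON. The grouping column is a placeholder.
crimes = spark.read.parquet("/data/output/street.parquet")
kpis = crimes.groupBy("crime_type").count()
kpis.coalesce(1).write.mode("overwrite").json("/data/output/kpis.json")
```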
Navigate to http://localhost:9999. Jupyter will ask you for an authentication token. To get the token from the Jupyter server, run:

```bash
make jupyter_token
```
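Once inside a notebook, a session against the cluster can be opened roughly as below. The master URL and the /data paths are assumptions about how the containers are wired together, not values taken from this repo.

```python
from pyspark.sql import SparkSession

# Assumed service name and port for the Spark master on the Docker network.
spark = (
    SparkSession.builder
    .appName("exploration")
    .master("spark://spark-master:7077")
    .getOrCreate()
)

# The shared filesystem is assumed to be mounted at /data on every node,
# so anything written there is visible to the notebook and the workers alike.
spark.read.csv("/data/raw/street", header=True).show(5)
```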
In Jupyter, I did some data exploration. I concluded that a left join of the street dataset with the outcome dataset is the way to go, because the outcome records that do not join must come from a time period older than the scope of this exercise. A window function was also used to ensure that the outcome kept for each crime is actually the latest one.
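A minimal sketch of that logic, assuming hypothetical file paths and column names (`crime_id` as the join key and `date` as the outcome timestamp):

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("street-outcome-join").getOrCreate()

# Hypothetical paths and column names, for illustration only.
street = spark.read.parquet("/data/output/street.parquet")
outcome = spark.read.parquet("/data/output/outcome.parquet")

# Keep only the latest outcome per crime before joining.
latest = Window.partitionBy("crime_id").orderBy(F.col("date").desc())
latest_outcome = (
    outcome
    .withColumn("rn", F.row_number().over(latest))
    .filter(F.col("rn") == 1)
    .drop("rn")
)

# Left join: every street record is kept; outcome records that do not match
# (i.e. those from an older time period) are dropped.
joined = street.join(latest_outcome, on="crime_id", how="left")
```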
I also wanted to deploy an Elasticsearch and a Kibana node, but I was not able to make Elasticsearch ingest the parquet data; I kept running into errors. I used this for reference.
My other thought was to use Databricks for visualization. Since Databricks uses its own data sources, let's assume that the shared storage of the Spark containers (the /data directory) is blob storage like S3 and that Databricks can access it.
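Under that assumption, pulling the results into a Databricks notebook is just a read against the shared path. The paths below are hypothetical, `spark` is the session that Databricks notebooks provide out of the box, and `display` is the Databricks notebook helper.

```python
# Hypothetical paths: /data is assumed to be reachable from Databricks the way
# a mounted blob store (e.g. an S3 bucket) would be.
crimes = spark.read.parquet("/data/output/street.parquet")
kpis = spark.read.json("/data/output/kpis.json")

# display() is Databricks' built-in rendering for tables and quick charts.
display(kpis)
```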
An HTML export of the Databricks notebook containing the KPI visualizations is included.